Project 4: Develop an interactive application to help understand alpha and beta diversity metrics choices #4

abaghela · 2017-08-02T19:24:23Z

Develop an interactive application to facilitate informed sequencing quality control decisions for downstream analysis on many samples

There's a saying in computer science, "garbage in, garbage out": the quality of your input determines the quality of your downstream analyses. Genome sequencing has decreased in cost, so experiments can include many more samples. Manually checking each sample is time consuming and less precise. So I propose developing a web application or tool where you can drop in your samples and interactively explore their quality. This tool could be built by various means. One option would be to develop a Shiny R application, which would require knowledge of R, the Shiny package, and possibly HTML/CSS/JavaScript. Another would be to rely on web development standards (HTML/CSS/JS) to build something like an Electron application that is cross-platform and user friendly. This idea stems from my experience dealing with 16S rRNA sequencing samples. I had a single experiment collect about 200 samples, with a total of about 400 samples for paired-end sequencing. Manually viewing all 400 samples is time consuming. Additionally, further analysis of sequencing reads typically requires some trimming, because quality diminishes toward the end of longer reads. This tool could also be designed to recommend an ideal trim length based on your specification: either a hard threshold that trims all samples to the same length, or a dynamic threshold on a per-sample basis. Which trimming approach is appropriate will depend on whether the downstream tools can handle varying read lengths.
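As a rough illustration of the trim-length idea, here is a minimal base R sketch. It assumes you already have a matrix of mean Phred quality per read position (columns) for each sample (rows). The function name `recommend_trim`, the toy numbers, and the cutoff of 25 are all made up for illustration and are not taken from any existing tool.

```r
# Hypothetical helper: for each sample, return the number of leading read
# positions whose mean quality stays at or above `min_q`.
recommend_trim <- function(qual, min_q = 25) {
  apply(qual, 1, function(q) {
    below <- which(q < min_q)
    if (length(below) == 0) length(q) else below[1] - 1
  })
}

# Toy data (made up): mean quality at 6 read positions for 3 samples
qual <- rbind(
  sampleA = c(36, 35, 33, 30, 24, 20),
  sampleB = c(37, 36, 35, 34, 31, 27),
  sampleC = c(35, 32, 28, 22, 18, 15)
)

dynamic_trim <- recommend_trim(qual)  # one trim length per sample
hard_trim <- min(dynamic_trim)        # one trim length shared by all samples
```

With these toy numbers the dynamic trim lengths are 4, 6, and 3, so the hard threshold would trim every sample to 3 positions. A real tool would need to read quality scores from the sequencing files and would probably use a smarter rule than "first position below a cutoff".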

erictleung · 2017-09-10T02:26:22Z

Sooo I may have found someone's solution to my proposed project called MultiQC (GitHub link). It was published just over a year ago and is even more robust and has more functionality than just for my 16S rRNA use case. A quick Biostars/Google search could have saved me time 😅

@abaghela if you allow me, I have another proposition for a project I could lead that is specific to microbiome analysis. Let me know if you have any concerns with this new proposed project. Thanks.

Title: Develop an interactive application to help understand alpha and beta diversity metrics choices

Problem: There are many alpha and beta diversity metrics for analyzing microbial ecology or microbiome data. Alpha diversity describes the diversity within a single sample, for example an estimate of the total number of species in it. Beta diversity describes the differences between samples. Below are some examples of the number of metrics you can use.

Plot from "Alpha diversity graphics" page for phyloseq showing various alpha diversity metrics to choose from http://joey711.github.io/phyloseq/plot_richness-examples
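To make a few of these alpha diversity metrics concrete, here is a minimal base R sketch that computes three common ones by hand from a vector of counts. The counts are made up for illustration. In practice you would likely use existing functions such as `diversity()` in vegan or `estimate_richness()` in phyloseq rather than writing these yourself.

```r
# Toy counts (made up) for one sample: reads observed per species
counts <- c(sp1 = 50, sp2 = 30, sp3 = 15, sp4 = 4, sp5 = 1)

# Observed richness: number of species with at least one read
richness <- sum(counts > 0)

# Relative abundances of the species that are present
p <- counts[counts > 0] / sum(counts)

# Shannon index: -sum(p * ln(p))
shannon <- -sum(p * log(p))

# Simpson index (Gini-Simpson form): 1 - sum(p^2)
simpson <- 1 - sum(p^2)

round(c(richness = richness, shannon = shannon, simpson = simpson), 3)
```

Richness only counts species, while Shannon and Simpson also weight them by relative abundance, which is one reason the metrics can rank samples differently.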

Below are just a few of the beta diversity metrics to choose from:

> library(phyloseq)
> unlist(distanceMethodList)
    UniFrac1     UniFrac2        DPCoA          JSD     vegdist1     vegdist2
   "unifrac"   "wunifrac"      "dpcoa"        "jsd"  "manhattan"  "euclidean"
    vegdist3     vegdist4     vegdist5     vegdist6     vegdist7     vegdist8
  "canberra"       "bray" "kulczynski"    "jaccard"      "gower"   "altGower"
    vegdist9    vegdist10    vegdist11    vegdist12    vegdist13    vegdist14
  "morisita"       "horn"  "mountford"       "raup"   "binomial"       "chao"
   vegdist15   betadiver1   betadiver2   betadiver3   betadiver4   betadiver5
       "cao"          "w"         "-1"          "c"         "wb"          "r"
  betadiver6   betadiver7   betadiver8   betadiver9  betadiver10  betadiver11
         "I"          "e"          "t"         "me"          "j"        "sor"
 betadiver12  betadiver13  betadiver14  betadiver15  betadiver16  betadiver17
         "m"         "-2"         "co"         "cc"          "g"         "-3"
 betadiver18  betadiver19  betadiver20  betadiver21  betadiver22  betadiver23
         "l"         "19"         "hk"        "rlb"        "sim"         "gl"
 betadiver24        dist1        dist2        dist3   designdist
         "z"    "maximum"     "binary"  "minkowski"        "ANY"

With so many metrics to choose from, how do you know which is the "best" and how will your data affect the calculation of these metrics?
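As a small illustration of how the choice of metric can change the answer, here is a base R sketch comparing two beta diversity metrics on the same pair of samples. The counts are made up, and the two functions are hand-written for illustration rather than taken from phyloseq or vegan. Note that the Jaccard version here is the presence/absence form.

```r
# Toy counts (made up): rows are samples, columns are species
x <- rbind(
  s1 = c(10, 10, 0, 0),
  s2 = c(100, 1, 1, 0)
)

# Bray-Curtis dissimilarity: uses abundances
bray <- function(a, b) sum(abs(a - b)) / sum(a + b)

# Jaccard distance on presence/absence: ignores abundances
jaccard_pa <- function(a, b) {
  a <- a > 0
  b <- b > 0
  1 - sum(a & b) / sum(a | b)
}

bray_d <- bray(x["s1", ], x["s2", ])
jaccard_d <- jaccard_pa(x["s1", ], x["s2", ])
round(c(bray = bray_d, jaccard = jaccard_d), 3)
```

Bray-Curtis comes out at about 0.82 (very different samples) while presence/absence Jaccard comes out at about 0.33 (fairly similar samples), because the two samples share most of their species but at very different abundances.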

Proposed Project: Create an interactive Shiny application that shows how your chosen alpha or beta diversity metrics change based on simulated or real data. Some of these metrics are sensitive to species with single or double counts (singletons and doubletons), so it will be useful to see how different distributions of counts change these metrics and your interpretation of them. The application should be designed to give an intuitive understanding of how these metrics work.
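Here is a rough base R sketch of that sensitivity, using the bias-corrected form of the Chao1 richness estimator as an example of a metric driven by singletons and doubletons. The two toy communities are made up, and the functions are hand-written for illustration.

```r
# Bias-corrected Chao1 richness estimate:
#   S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
# where F1 = number of singletons and F2 = number of doubletons
chao1 <- function(counts) {
  s_obs <- sum(counts > 0)
  f1 <- sum(counts == 1)
  f2 <- sum(counts == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

# Shannon index for comparison
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Toy communities (made up): same abundant species, different rare tails
no_rare   <- c(40, 30, 20, 10)
with_rare <- c(40, 30, 20, 10, 1, 1, 1, 1)

c(chao1_no_rare = chao1(no_rare), chao1_with_rare = chao1(with_rare))
c(shannon_no_rare = shannon(no_rare), shannon_with_rare = shannon(with_rare))
```

Adding four singletons moves Chao1 from 4 to 14, while the Shannon index changes much less. This is the kind of behavior the application could let people explore interactively.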

Possible Requirements:

Knowledge of R programming
Knowledge of (or willingness to learn) the Shiny R package
Local computer is sufficient to develop
RStudio installed (this will make it easier to make Shiny apps)
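
For a sense of what a starting point might look like, here is a minimal Shiny sketch, assuming the shiny package is installed. The helper functions `shannon` and `make_counts`, the input and output IDs, and the toy community are all hypothetical and only meant to show the general shape of such an app, not a planned design.

```r
library(shiny)

# Hypothetical helper: Shannon index from a vector of counts
shannon <- function(counts) {
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Hypothetical helper: toy community of 5 abundant species plus n singletons
make_counts <- function(n_singletons) c(rep(20, 5), rep(1, n_singletons))

ui <- fluidPage(
  titlePanel("Diversity metric explorer (sketch)"),
  sidebarLayout(
    sidebarPanel(
      sliderInput("n_single", "Number of singleton species",
                  min = 0, max = 50, value = 5)
    ),
    mainPanel(
      plotOutput("abundance"),
      textOutput("metrics")
    )
  )
)

server <- function(input, output) {
  # Recompute the simulated counts whenever the slider moves
  counts <- reactive(make_counts(input$n_single))

  output$abundance <- renderPlot(
    barplot(counts(), xlab = "Species", ylab = "Count")
  )

  output$metrics <- renderText(
    sprintf("Observed richness: %d | Shannon: %.3f",
            sum(counts() > 0), shannon(counts()))
  )
}

app <- shinyApp(ui, server)
# In an interactive R session, launch the app with: runApp(app)
```

Keeping the metric calculations in plain functions outside `server` should make them easier to test and to swap for phyloseq or vegan functions later.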

abaghela · 2017-09-11T01:08:10Z

@erictleung Hi Eric, we approve your change in project. We are looking forward to this new one!

ampatzia · 2017-10-01T10:58:19Z

Assignments are out, really looking forward to collaborating on this 👍
@erictleung Need any help with preparation?

erictleung · 2017-10-02T18:08:13Z

@ampatzia thanks for your interest! I've created a bare repository for this project. I plan on putting up a base Shiny application later this week so people can get up and running, along with some ideas of what could be in the application itself. If I come up with anything else, I'll let you know! 😄

erictleung · 2017-10-04T21:56:41Z

Some good articles to use while working on this project will be http://shiny.rstudio.com/articles/. It has lots of content on getting started, building the structure, frontend and backend sides of the application, and improving it.

jakelever · 2017-10-10T23:48:35Z

Hey team lead, we've been gathering Github IDs for your team members. We see that you've already started a repo for this project. So could you please add the following people as collaborators to that project?

aimirza
amanji
rnoronha00
ampatzia
vnsriniv

Once the people are added, it'd be a great idea to start a discussion on that repo with information to get your team members started (e.g. some small suggested reading, things to look up, etc). We will also be adding everyone to Slack and creating a specific channel for each project. This may be an easier way to communicate.

We'll forward on any remaining Github IDs through this issue.

Thanks, Jake
obo the Hackseq organising committee

erictleung · 2017-10-11T18:46:34Z

@jakelever thanks!

jakelever · 2017-10-11T22:15:30Z

Hi, one more Github ID for you:

cabrerad

Thanks, Jake

jakelever · 2017-10-13T04:08:29Z

And one last one: scatcher125

Cheers, Jake

erictleung · 2017-10-14T01:43:40Z

@jakelever both added! Thanks for the update.

jakelever · 2017-10-17T20:16:09Z

And actually one more Github ID: szhan

knausb mentioned this issue Sep 5, 2017

Project 5: Developing advanced R tutorials for genomic data analysis #5

Open

abaghela changed the title ~~Project 4: Develop an interactive application to facilitate informed sequencing quality control decisions for downstream analysis on many samples~~ Project 4: Develop an interactive application to help understand alpha and beta diversity metrics choices Sep 11, 2017
