Project 4: Develop an interactive application to help understand alpha and beta diversity metrics choices #4

Open
abaghela opened this issue Aug 2, 2017 · 11 comments

@abaghela
Contributor

abaghela commented Aug 2, 2017

Develop an interactive application to facilitate informed sequencing quality control decisions for downstream analysis on many samples

There's a saying in computer science, "garbage in, garbage out": the quality of your input determines the quality of your downstream analyses. Genome sequencing has decreased in cost, so experiments can include many more samples. Manually checking each sample is time consuming and less precise. So I propose developing a web application or tool where you can drop in your samples and interactively explore their quality. This tool could be built in various ways. One option would be to develop a Shiny R application, which would require knowledge of R, the Shiny package, and possibly HTML/CSS/JavaScript. Another would be to rely on web development standards (HTML/CSS/JS) to build something like an Electron application that is cross-platform and user friendly. This idea stems from my experience dealing with 16S rRNA sequencing samples. A single experiment of mine collected about 200 samples, for a total of about 400 sequence files with paired-end sequencing. Manually viewing all 400 is time consuming. Additionally, further analysis of sequencing reads typically requires some trimming, because quality diminishes toward the end of longer reads. This tool could also be designed to recommend an ideal trim length based on your specifications: either a hard threshold that trims all samples to the same length, or a dynamic threshold set on a per-sample basis. Which trimming approach is appropriate will depend on whether the downstream tools can handle varying read lengths.
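As a rough illustration of the hard versus per-sample trimming idea, here is a minimal sketch in base R. The quality matrix, the `trim_length` helper, and the `min_q` cutoff are all invented for this example and are not taken from any existing tool.

```r
# Hypothetical mean Phred quality per read position: one row per sample,
# one column per position (values invented for illustration).
quality <- rbind(
  sample_a = c(36, 36, 35, 34, 33, 31, 28, 24, 19, 15),
  sample_b = c(35, 35, 34, 34, 32, 30, 29, 27, 25, 22),
  sample_c = c(37, 36, 36, 35, 34, 33, 31, 29, 26, 21)
)

# Length of the longest read prefix whose quality stays at or above min_q.
trim_length <- function(q, min_q) {
  below <- which(q < min_q)
  if (length(below) == 0) length(q) else below[1] - 1
}

min_q <- 25  # assumed cutoff; pick one that suits the downstream tool

# Dynamic threshold: one trim length per sample.
per_sample <- apply(quality, 1, trim_length, min_q = min_q)

# Hard threshold: trim every sample to the shortest per-sample length.
hard_trim <- min(per_sample)

print(per_sample)
print(hard_trim)
```

With these made-up numbers the per-sample lengths differ, while the hard threshold is set by the worst sample, which is the trade-off a downstream tool would need to tolerate.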

Team Lead: Eric Leung | [email protected] | @erictleung | Grad Student | Oregon Health & Science University, USA |

@erictleung

erictleung commented Sep 10, 2017

Sooo I may have found someone's solution to my proposed project called MultiQC (GitHub link). It was published just over a year ago and is even more robust and has more functionality than just for my 16S rRNA use case. A quick Biostars/Google search could have saved me time 😅

@abaghela if you allow me, I have another proposition for a project I could lead that is specific to microbiome analysis. Let me know if you have any concerns about this newly proposed project. Thanks.


Title: Develop an interactive application to help understand alpha and beta diversity metrics choices

Problem: There are many alpha and beta diversity metrics for analyzing microbial ecology or microbiome data. Alpha diversity describes the diversity within a single sample, for example an estimate of the total number of species in it. Beta diversity describes the differences between samples. Below are some examples of the number of metrics you can use.

Plot from "Alpha diversity graphics" page for phyloseq showing various alpha diversity metrics to choose from http://joey711.github.io/phyloseq/plot_richness-examples

Below are just a few of the beta diversity metrics to choose from:

> library(phyloseq)
> unlist(distanceMethodList)
    UniFrac1     UniFrac2        DPCoA          JSD     vegdist1     vegdist2
   "unifrac"   "wunifrac"      "dpcoa"        "jsd"  "manhattan"  "euclidean"
    vegdist3     vegdist4     vegdist5     vegdist6     vegdist7     vegdist8
  "canberra"       "bray" "kulczynski"    "jaccard"      "gower"   "altGower"
    vegdist9    vegdist10    vegdist11    vegdist12    vegdist13    vegdist14
  "morisita"       "horn"  "mountford"       "raup"   "binomial"       "chao"
   vegdist15   betadiver1   betadiver2   betadiver3   betadiver4   betadiver5
       "cao"          "w"         "-1"          "c"         "wb"          "r"
  betadiver6   betadiver7   betadiver8   betadiver9  betadiver10  betadiver11
         "I"          "e"          "t"         "me"          "j"        "sor"
 betadiver12  betadiver13  betadiver14  betadiver15  betadiver16  betadiver17
         "m"         "-2"         "co"         "cc"          "g"         "-3"
 betadiver18  betadiver19  betadiver20  betadiver21  betadiver22  betadiver23
         "l"         "19"         "hk"        "rlb"        "sim"         "gl"
 betadiver24        dist1        dist2        dist3   designdist
         "z"    "maximum"     "binary"  "minkowski"        "ANY"

With so many metrics to choose from, how do you know which is the "best" and how will your data affect the calculation of these metrics?
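To make the question concrete, here is a minimal base R sketch that computes a few of these metrics by hand on a toy count table. The counts and helper names are invented for illustration. `simpson` is written as 1 minus the sum of squared proportions, and `jaccard_binary` is the presence/absence form, which may differ from a given package's default.

```r
# Toy count table (invented): rows are samples, columns are taxa.
counts <- rbind(
  even   = c(25, 25, 25, 25, 0),
  skewed = c(97,  1,  1,  1, 0),
  sparse = c(50, 50,  0,  0, 0)
)

# Alpha diversity: three common within-sample metrics.
richness <- function(x) sum(x > 0)
shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}
simpson <- function(x) {
  p <- x / sum(x)
  1 - sum(p^2)
}

alpha <- data.frame(
  richness = apply(counts, 1, richness),
  shannon  = apply(counts, 1, shannon),
  simpson  = apply(counts, 1, simpson)
)
print(round(alpha, 3))

# Beta diversity: two common between-sample dissimilarities.
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)
jaccard_binary <- function(x, y) {
  1 - sum(x > 0 & y > 0) / sum(x > 0 | y > 0)
}

bc <- bray_curtis(counts["even", ], counts["skewed", ])
jb <- jaccard_binary(counts["even", ], counts["skewed", ])
print(c(bray_curtis = bc, jaccard_binary = jb))
```

In this toy table `even` and `skewed` have the same richness and contain the same taxa, so richness and the presence/absence Jaccard dissimilarity cannot tell them apart, while Shannon, Simpson, and Bray-Curtis can. Which behaviour is "best" depends on the question being asked.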

Proposed Project: Create an interactive Shiny application that shows how your chosen alpha or beta diversity metrics change in response to simulated or real data. Some of these metrics are sensitive to species observed only once or twice (singletons and doubletons), so it will be useful to see how different distributions of counts change these metrics and your interpretation of them. The application should be designed to give an intuitive understanding of how these metrics work.
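To illustrate the sensitivity to taxa counted only once or twice, here is a small base R simulation. It assumes the bias-corrected form of the Chao1 richness estimator, and the starting counts and the `add_singletons` helper are invented for this sketch.

```r
# Bias-corrected Chao1: observed richness plus a term driven by
# singletons (f1) and doubletons (f2).
chao1 <- function(x) {
  s_obs <- sum(x > 0)
  f1 <- sum(x == 1)
  f2 <- sum(x == 2)
  s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
}

shannon <- function(x) {
  p <- x[x > 0] / sum(x)
  -sum(p * log(p))
}

base_counts <- c(40, 30, 20, 10)  # invented starting community

# Append k extra taxa, each observed exactly once.
add_singletons <- function(x, k) c(x, rep(1, k))

ks <- 0:5
result <- data.frame(
  singletons = ks,
  observed   = sapply(ks, function(k) sum(add_singletons(base_counts, k) > 0)),
  chao1      = sapply(ks, function(k) chao1(add_singletons(base_counts, k))),
  shannon    = sapply(ks, function(k) shannon(add_singletons(base_counts, k)))
)
print(round(result, 3))
```

Printing `result` shows the Chao1 estimate pulling away from the observed richness as singletons are added, while Shannon moves comparatively little. An interactive application could expose `k` as a slider.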

Possible Requirements:

  • Knowledge of R programming
  • Knowledge of (or willingness to learn) the Shiny R package
  • Local computer is sufficient to develop
  • RStudio installed (this will make it easier to make Shiny apps)
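For a sense of scale, a starting point for such an application could look like the sketch below. This is only an assumed layout, not the project's actual design: the input names, the fixed background counts, and the `diversity_summary` and `make_app` helpers are invented. The Shiny calls are namespaced and wrapped in a function, so the metric code runs even where the shiny package is not installed.

```r
# Metrics for one vector of counts (plain R, no Shiny needed).
diversity_summary <- function(x) {
  p <- x[x > 0] / sum(x)
  c(richness = sum(x > 0),
    shannon  = -sum(p * log(p)),
    simpson  = 1 - sum(p^2))
}

# Build the app object; calling this requires the shiny package.
make_app <- function() {
  ui <- shiny::fluidPage(
    shiny::titlePanel("Diversity metric explorer (sketch)"),
    shiny::sidebarLayout(
      shiny::sidebarPanel(
        shiny::sliderInput("dominant", "Count of the dominant taxon",
                           min = 1, max = 500, value = 25),
        shiny::sliderInput("n_rare", "Number of rare taxa (count of 1)",
                           min = 0, max = 50, value = 5)
      ),
      shiny::mainPanel(
        shiny::plotOutput("abundance"),
        shiny::tableOutput("metrics")
      )
    )
  )

  server <- function(input, output) {
    # Simulated community: one adjustable taxon, three fixed taxa,
    # and a user-chosen number of singletons.
    counts <- shiny::reactive(
      c(input$dominant, 25, 25, 25, rep(1, input$n_rare))
    )
    output$abundance <- shiny::renderPlot(
      barplot(counts(), xlab = "Taxon", ylab = "Count")
    )
    output$metrics <- shiny::renderTable(
      as.data.frame(t(diversity_summary(counts())))
    )
  }

  shiny::shinyApp(ui, server)
}
```

With Shiny installed, `shiny::runApp(make_app())` should launch it locally. Moving either slider recomputes the metrics table, which is the kind of immediate feedback the proposal is after.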

@abaghela
Contributor Author

@erictleung Hi Eric, we approve your change in project. We are looking forward to this new one!

@abaghela changed the title from "Project 4: Develop an interactive application to facilitate informed sequencing quality control decisions for downstream analysis on many samples" to "Project 4: Develop an interactive application to help understand alpha and beta diversity metrics choices" on Sep 11, 2017
@ampatzia

ampatzia commented Oct 1, 2017

Assignments are out, really looking forward to collaborating in this 👍
@erictleung Need any help with preparation?

@erictleung

erictleung commented Oct 2, 2017

@ampatzia thanks for your interest! I've created a bare repository to hold this project. I plan on getting a base Shiny application up later this week so people can get up and running, along with some ideas of what could be in the application itself. If I come up with anything else, I'll let you know! 😄

@erictleung

Some good articles to use while working on this project are at http://shiny.rstudio.com/articles/. They cover getting started, building the structure, the frontend and backend sides of the application, and improving it.

@jakelever

jakelever commented Oct 10, 2017

Hey team lead, we've been gathering Github IDs for your team members. We see that you've already started a repo for this project. So could you please add the following people as collaborators to that project?

aimirza
amanji
rnoronha00
ampatzia
vnsriniv

Once the people are added, it'd be a great idea to start a discussion on that repo with information to get your team members started (e.g. some small suggested reading, things to look up, etc). We will also be adding everyone to Slack and creating a specific channel for each project. This may be an easier way to communicate.

We'll forward on any remaining Github IDs through this issue.

Thanks, Jake
obo the Hackseq organising committee

@erictleung

@jakelever thanks!

@jakelever

Hi, one more Github ID for you:

cabrerad

Thanks, Jake

@jakelever

And one last one: scatcher125

Cheers, Jake

@erictleung

@jakelever both added! Thanks for the update.

@jakelever

And actually one more Github ID: szhan
