Automatic threshold determination #126
Comments
That's a good question! The choice of stringent ( You may consider this example for choosing thresholds: You may need to choose different thresholds if you use a different peak caller or different experiment setup. We are interested in developing a method that can suggest values for these thresholds based on given input. Do you have any particular method in mind?
Not really. I'm working with multiple setups, including ChIP-Seq (histones, TFs), ATAC-Seq, RIP-Seq, and so on, which result in different peak sizes, distributions, and signal-to-noise ratios. Peak calling is done with MACS2, EPIC2, and MUSIC. If you plot these peaks after sorting by ascending -log10(FDR), you get something like this (example from MACS2 with q-value 1e-4 on ATAC-Seq, resulting in ~80k peaks). A generalized algorithm would have to find the lower/upper bounds here dynamically, and I have no good idea how to do that. You could look at the slope, i.e. where the curve changes from linear to exponential. Or at the area under the curve, but that basically amounts to percentiles, which seems kind of crude. BTW: in my experience the "extreme" peaks are often artefacts that result from erroneously calling multiple peaks as one, or they are located in weird regions that should really be blacklisted. I know you can't fix issues with the peak calling itself; that should be done before your software is run. Just saying that the super peaks are often fishy, despite being highly significant and reproducible. Edit: I just included your two suggested thresholds.
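The "look at the slope" idea above can be sketched with a simple elbow heuristic: sort the per-peak -log10(q) scores in ascending order and pick the point that lies farthest below the straight line joining the first and last points. This is only an illustration under stated assumptions, not part of MSPC and not a validated method. The function names `knee_index` and `suggest_cutoff` are hypothetical, the input is assumed to be one -log10(q) value per peak, and how the resulting single cut-off would map onto the -w/-s thresholds is left open.

```python
def knee_index(sorted_scores):
    """Return the index of the elbow of an ascending score curve.

    Heuristic: normalize both axes to [0, 1] and take the point with the
    largest vertical gap below the chord from the first to the last point.
    """
    n = len(sorted_scores)
    if n < 3:
        raise ValueError("need at least three scores")
    lo, hi = sorted_scores[0], sorted_scores[-1]
    if hi == lo:
        raise ValueError("all scores are equal, so there is no elbow")

    best_i, best_gap = 0, float("-inf")
    for i, y in enumerate(sorted_scores):
        x_norm = i / (n - 1)           # rank position scaled to [0, 1]
        y_norm = (y - lo) / (hi - lo)  # score scaled to [0, 1]
        gap = x_norm - y_norm          # distance below the diagonal
        if gap > best_gap:
            best_i, best_gap = i, gap
    return best_i


def suggest_cutoff(neg_log10_scores):
    """Return (score, value) at the elbow; value is 10 ** -score."""
    scores = sorted(neg_log10_scores)
    score = scores[knee_index(scores)]
    return score, 10.0 ** (-score)


if __name__ == "__main__":
    # Toy scores: a slow linear rise followed by a steep tail.
    toy = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 20, 40, 80, 160]
    print(suggest_cutoff(toy))
```

Because the heuristic only looks at the shape of the sorted curve, it would still need to be checked against each peak caller and assay before being trusted as a starting point.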
This is a good suggestion, we can give it a try. We can work on it together if you're interested. Regarding the second point: yes, agreed. We discussed excluding blacklisted regions (#85); we can probably prioritize it.
Automatic thresholding would indeed be a cool improvement for MSPC, but it's very hard to do since it depends a lot on the particular experiment and on the peak caller used. Regarding the extreme peaks, excluding ENCODE blacklisted regions as a first MSPC step would certainly be useful, and not complex to do.
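As a rough illustration of why blacklist exclusion is not complex, here is a minimal sketch that drops every peak overlapping a blacklisted interval. It is not MSPC's implementation. The function name `drop_blacklisted` is hypothetical, and intervals are assumed to be `(chrom, start, end)` tuples in BED-style 0-based, half-open coordinates.

```python
from collections import defaultdict


def drop_blacklisted(peaks, blacklist):
    """Return the peaks that do not overlap any blacklisted interval.

    Both arguments are iterables of (chrom, start, end) tuples using
    half-open coordinates, so intervals that merely touch do not overlap.
    """
    # Group blacklist intervals by chromosome to limit the comparisons.
    by_chrom = defaultdict(list)
    for chrom, start, end in blacklist:
        by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in peaks:
        overlaps = any(
            start < b_end and b_start < end
            for b_start, b_end in by_chrom.get(chrom, ())
        )
        if not overlaps:
            kept.append((chrom, start, end))
    return kept


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
    blacklist = [("chr1", 150, 300), ("chr2", 200, 400)]
    print(drop_blacklisted(peaks, blacklist))
```

The linear scan per chromosome keeps the sketch short; for genome-scale inputs a sorted sweep or an interval tree would be the more usual choice.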
I agree that it's probably impossible to automatically find the perfect threshold for every experimental variation. But you might be able to identify certain types of distributions that behave similarly. And an imperfect auto-threshold option would already be a great help: just something to start off from that is not totally arbitrary. Regarding the ENCODE blacklisted regions: these were already removed from my example peaks. I guess that most people do this as part of the normal peak calling. But it sure can't hurt to have the option here.
@ckuenne I agree that suggesting a value for these thresholds is important. I think it would be a very useful addition; would you be interested in leading its method development and implementation? I can help you as much as needed. Some questions we can probably start with:
Well, if I had a viable idea for this I would probably just have written a small wrapper and that's that. Sadly, I don't have any. And this is not really something I can devote myself to right now. Sorry.
Thanks for the suggestion anyway :) I'd keep the issue open for a while in case anyone comes up with a method.
Hello! I've been experimenting with MSPC a bit to call peaks from two replicates and I've also been struggling with setting thresholds.
That is a good suggestion! Some thoughts:
Hi,
is there a systematic way to find meaningful cutoffs for -s/-w? How do you decide on these?
Best,
Carsten