updated usage example #39
Comments
hi! I just wanted to check in again and see if this is something you could provide guidance on. Due to changes in the CLI I wasn't able to figure out how to run kSpider to cluster transcripts by shared k-mers. I'm super excited by the initial results you shared and would love to try it on more data
Hi, thanks for your interest in kSpider! I appreciate your patience on this issue, and I will update it as soon as possible with clear instructions on how to use the dev branch on this project to cluster your sequences. As the project's emphasis has recently shifted towards clustering large datasets rather than individual sequences, this might involve some development time to resolve the issue. Thank you!
Thanks for the update @mr-eyes! I look forward to trying it out when you have the time to put it together!
Hi Mo, I just wanted to circle back on this and see if you have any bandwidth for this issue any time soon. I built another transcriptome today and experienced simultaneous deep loathing for how I'm clustering isoforms currently and excitement that I could potentially use this approach. I know you're busy but I just wanted to check in. If you don't have bandwidth any time soon, do you think it is possible to do this type of clustering with sourmash (either the CLI or the Python API)? I was very excited by the results you shared with me in Slack about this type of approach as a proof of concept that the approach works.
Hi @taylorreiter, I believe Branchwater can tackle this with its efficient brute force approach with a scale of 1. The pairwise command is implemented in branchwater here: sourmash-bio/sourmash_plugin_branchwater#181. Please let me know if you have further questions.
@ctb @bluegenes Do you have better approaches than having a signature for every sequence?
No (not yet), but the next branchwater plugin release will be able to sketch singletons to make sketching faster, at least
@taylorreiter I/we would love your feedback if you try out A couple tips:
@taylorreiter, then you can write a signature for every sequence through the Sourmash API, and then Branchwater can handle it from here.
@bluegenes I didn't understand why you need to keep edgeless nodes. Could you please elaborate on why this could be a problem in sparse comparisons?
Oh, I got it; you mean single-node clusters.
@mr-eyes it's really just a question of your output expectation. I was wanting all of my original sequences to be represented in the "clusters" output, whether they are singletons or part of a larger cluster. If we are just using the original Then, when we run cluster from this output file, I keep the 'singletons', which would be sketches that show up, but had similarity that did not pass threshold for connecting via an edge. But then the output has singletons, but not singletons that were not in the
yep!
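To make the singleton point concrete, here is a minimal pure-Python sketch of threshold-based clustering as connected components. It is not branchwater's implementation; the function name `cluster_with_singletons`, the `(name_a, name_b, similarity)` tuple layout, and the example values are all assumptions made for illustration. Every pair is assumed to use names present in `names`. Passing the full list of sequence names up front is what keeps both kinds of singletons: sequences whose comparisons fell below the threshold, and sequences that never appear in the pairwise results at all.

```python
def cluster_with_singletons(names, pairs, threshold):
    """Group names into connected components, keeping unconnected names.

    names     -- every sequence name that should appear in the output
    pairs     -- iterable of (name_a, name_b, similarity) tuples (hypothetical layout)
    threshold -- minimum similarity for two names to share an edge
    """
    parent = {name: name for name in names}

    def find(x):
        # Follow parent links to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, similarity in pairs:
        if similarity >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Illustrative data only: t3 is compared but falls below the threshold,
# and t4 never appears in the pairwise results.
names = ["t1", "t2", "t3", "t4"]
pairs = [("t1", "t2", 0.9), ("t2", "t3", 0.1)]
clusters = cluster_with_singletons(names, pairs, threshold=0.16)
print(clusters)
```

If only the names seen in `pairs` were used to seed the clusters, `t4` would be missing from the output entirely, which is the gap described above.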
Thank you both! I think I should be able to try this approach out next week! I think Mo showed that clustering with kSpider using a threshold of 16 (if I'm interpreting some graphs he shared with me correctly)...and I'm assuming the threshold here means containment of 16% of k-mers. Is this something that sourmash can do now, cluster by containment threshold? (I'll read the docs next week, but I wasn't aware that sourmash had that functionality!)
@taylorreiter Yes, branchwater can do that. The columns you will get in the output include However, I also recommend exploring the community detection ideas here sourmash-bio/sourmash_plugin_branchwater#252
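For readers unsure what a containment threshold means in terms of k-mers, the following is a small pure-Python sketch. It is not how sourmash or branchwater compute it: those tools hash canonical k-mers (collapsing a k-mer with its reverse complement), whereas this sketch compares raw forward-strand k-mer strings. The helper names, the toy sequences, and `k=4` are assumptions chosen to keep the example readable.

```python
def kmer_set(sequence, k):
    """Return the set of all length-k substrings of the sequence."""
    sequence = sequence.upper()
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def containment(query_kmers, target_kmers):
    """Fraction of the query's distinct k-mers that also occur in the target."""
    if not query_kmers:
        return 0.0
    return len(query_kmers & target_kmers) / len(query_kmers)


# Toy sequences: seq_b is seq_a with four extra bases appended.
seq_a = "ACGTACGTGG"
seq_b = "ACGTACGTGGTTTT"
kmers_a = kmer_set(seq_a, k=4)
kmers_b = kmer_set(seq_b, k=4)

print(containment(kmers_a, kmers_b))  # seq_a is fully contained in seq_b
print(containment(kmers_b, kmers_a))  # only part of seq_b is contained in seq_a
```

Containment is asymmetric, so a threshold such as 16% gives different edges depending on which direction (or which of the two values, e.g. the maximum) is compared against it.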
Maybe a related issue: sourmash-bio/sourmash#2816
I tried to follow the usage example outlined at https://dib-lab.github.io/kSpider/, but the instructions no longer work. Specifically, the indexing step seems to have changed from kSpider index_kmers... to kSpider index, with many of the arguments in the example no longer available as options to the new command. Would you be willing to provide updated instructions for how to cluster with kSpider? My use case is clustering isoforms in a de novo transcriptome when we have no knowledge of which genes each isoform/contig encodes. All of my transcripts are in a single FASTA file and I would like to predict which encode the same isoforms by clustering.