How to get MeShClust v3.0.0 #10

Open
nvucic opened this issue Sep 21, 2022 · 3 comments

Comments

@nvucic

nvucic commented Sep 21, 2022

Sorry, there must be some resource I'm missing, but I could not find the latest v3.0.0.

@nvucic changed the title from "How to get v3.0.0" to "How to get MeShClust v3.0.0" on Sep 21, 2022
@hani-girgis
Member

This is the right repository. Please follow the posted instructions. Once compilation is done, you will find identity and meshclust v3.0.

@sguizard

@hani-girgis, thanks for this tool.
I think the confusion comes from the version the program displays when it runs: it shows MeShClust v2.0.

MeShClust 2.0 is developed by Hani Z. Girgis, PhD.

This program clusters DNA sequences using identity scores obtained without alignment.

Copyright (C) 2021-2022 Hani Z. Girgis, PhD

Academic use: Affero General Public License version 1.

Any restrictions to use for profit or non-academics: Alternative commercial license is required.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Please contact Dr. Hani Z. Girgis ([email protected]) if you need more information.

Please cite the following papers: 
	MeShClust v3.0: High-quality clustering of DNA sequences using the mean shift algorithm
	and alignment-free identity scores (2022). Hani Z. Girgis, BMC Genomics, 23(1):423.

	Identity: Rapid alignment-free prediction of sequence alignment identity scores using
	self-supervised general linear models (2021). Hani Z. Girgis, Benjamin T. James, and
	Brian B. Luczak. NAR Genom Bioinform, 13(1), lqab001.

	A survey and evaluations of histogram-based statistics in alignment-free sequence
	comparison (2019). Brian B. Luczak, Benjamin T. James, and Hani Z. Girgis. Briefings
	in Bioinformatics, 20(4):1222–1237.

	MeShClust: An intelligent tool for clustering DNA sequences (2018). Benjamin T. James,
	Brian B. Luczak, and Hani Z. Girgis. Nucleic Acids Res, 46(14):e83.

Database file: mono.fasta
Output file: test.txt
Cores: 16
Provided threshold: 0.8
Block size for all vs. all: 25000
Block size for reading sequences: 100000
Number of data passes: 10
Can assign all: No


Average: 2273
K: 5
Histogram size: 1024
A histogram entry is 16 bits.
Generating data.
Preparing data ...
	Positive examples: 10000
	Training size: 5000
	Validation size: 5000
Better performance of: 0.00155154
	jeffrey_divergence x simMM
Better performance of: 0.0012286
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
Better performance of: 0.00110226
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
Better performance of: 0.0010716
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
Better performance of: 0.00103351
	jeffrey_divergence
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
Better performance of: 0.000955872
	minkowski
	jeffrey_divergence
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
Better performance of: 0.000905404
	minkowski
	jeffrey_divergence
	chi_squared x sim_ratio
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
Better performance of: 0.000880007
	minkowski
	jeffrey_divergence
	chi_squared x sim_ratio
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
	squared_chord^2 x simMM^2
Better performance of: 0.000835517
	minkowski
	jeffrey_divergence
	chi_squared x sim_ratio
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
	squared_chord^2 x sim_ratio^2
	squared_chord^2 x simMM^2
Better performance of: 0.000806042
	minkowski
	jeffrey_divergence
	chi_squared x sim_ratio
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
	sim_ratio x d2_s_r^2
	chi_squared^2 x d2_s_r^2
	squared_chord^2 x sim_ratio^2
	squared_chord^2 x simMM^2
Selected statistics:
	minkowski
	jeffrey_divergence
	chi_squared x sim_ratio
	minkowski x sim_ratio^2
	minkowski x simMM^2
	correlation x d2_s_r^2
	jeffrey_divergence x simMM
	sim_ratio x d2_s_r^2
	chi_squared^2 x d2_s_r^2
	squared_chord^2 x sim_ratio^2
	squared_chord^2 x simMM^2
Finished training.
	MAE: 0.0177417
	MSE: 0.000806042
Optimizing ...
Validating ...
	MAE: 0.0231115
	MSE: 0.00118335

Clustering ... 

Data run 1 ...
	Processed sequences: 13486
	Unprocessed sequences: 0
	Found centers: 149

Assigning ...
Finished.

Thanks for using MeShClust v2.0. Please post any questions or problems on GitHub: 
https://github.com/BioinformaticsToolsmith/Identity or email Dr. Hani Z. Girgis.

@simonorozcoarias

Hi @hani-girgis, thank you for this amazing tool. I am trying to get MeShClust 3.0 from the latest release (v2.0). However, after successfully compiling Identity, MeShClust does not appear. In fact, I looked for the MeShClust source code and it is not even there.

I also tried to get MeShClust 3.0 from the master branch, but building it produces Identity v1.2 and MeShClust v2.0.

So, how can I get MeShClust version 3.0?

Thank you for the help.

Best,

Simon.
