This repository provides the information needed to reproduce a benchmark of how compressible a cassava genome sequence (TME 204 [article,sequence]) is under different data compressors.
The 762,392,783 cassava DNA symbols have been compressed (lossless) to
(*) The (data compression) Factor is calculated against a baseline of 2 bits per symbol. It is the percentage by which the compressed output is smaller than that baseline, given by `100-((CompressedBytes*8)/(762392783*2)*100)`. The experiments were run on a desktop computer running Linux with an Intel® Core™ i7-6700 CPU @ 3.40GHz × 8, 31.2 GiB of RAM, and a 3 TB disk. The ranking is by the lowest number of bytes (an approximation of the Kolmogorov complexity).
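The Factor formula above can be checked by hand for any compressed size. The following is a minimal sketch, not part of the repository's scripts: the function name `compression_factor` is an assumption, and the byte count in the usage line is a hypothetical value chosen only for illustration (half of the 2-bit baseline).

```shell
#!/bin/sh
# Number of DNA symbols in the cassava sequence (taken from the text above).
SYMBOLS=762392783

# compression_factor BYTES
# Prints the Factor, in percent, relative to the 2-bits-per-symbol baseline:
#   100 - ((BYTES * 8) / (SYMBOLS * 2) * 100)
# The name of this helper is hypothetical; it is not a tool from the repository.
compression_factor() {
  LC_ALL=C awk -v bytes="$1" -v n="$SYMBOLS" \
    'BEGIN { printf "%.2f\n", 100 - (bytes * 8) / (n * 2) * 100 }'
}

# Hypothetical compressed size (not a measured result): half of the baseline.
compression_factor 95299098
```

With this hypothetical input the function prints `50.00`, since the output occupies half of the bits that a plain 2-bit encoding would need. A Factor near zero means the compressor did no better than packing each symbol into 2 bits.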
Data Compressor | Repository | Description |
---|---|---|
GeCo3 | code | article |
GeCo2 | code | article |
paq8l | code | article |
nncp v3.1 | code | article |
NAF | code | article |
lzma 5.2.5 | code | article |
JARVIS | code | article |
bzip2 1.0.8 | code | article |
MFCompress | code | article |
bsc-m03 v0.2.1 | code | article |
JARVIS2 | code | article |
Change directory, give execute permissions, and run the scripts:

    cd scripts/
    chmod +x *.sh
    ./Install_Tools.sh
    ./GetCassava.sh
    ./RunCassava.sh
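The ranking criterion stated earlier (lowest number of bytes) can be reproduced on any set of compressed outputs. The following is a minimal sketch under assumptions: the function name `rank_by_size` and the `results/` directory in the usage comment are hypothetical, and the actual output locations of the repository's scripts may differ.

```shell
#!/bin/sh
# rank_by_size FILE...
# Prints one "BYTES NAME" line per file, smallest file first.
# Hypothetical helper; not one of the repository's scripts.
rank_by_size() {
  for f in "$@"; do
    # wc -c may pad its output with spaces; $(( )) normalises it to a number.
    size=$(($(wc -c < "$f")))
    printf '%s %s\n' "$size" "$f"
  done | sort -n
}

# Usage (hypothetical directory holding one compressed file per tool):
#   rank_by_size results/*
```

Sorting numerically on the first column puts the best (smallest) result at the top, which matches the order used for the ranking.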