This repo contains the code for the capstone project of Rob Hazell, John Partee, and Anjli Solsi, with significant contributions and advisement by Dr. John Santerre.
K-mer analysis has been shown to be effective and highly accurate for classifying bacteria. However, the feature space grows exponentially with k-mer size (a four-letter nucleotide alphabet yields 4^k possible k-mers), and larger k is necessary for higher-accuracy analysis. In this paper, we explore different tools and compression methods for more space- and time-efficient analysis, with the goal of lowering the barrier to entry for k-mer analysis and ultimately releasing our tools as a Python package.
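To make the space problem concrete, here is a minimal Python sketch of overlapping k-mer counting and of how the number of possible k-mers grows with k. It is an illustration only, not the code in this repo: the function names `count_kmers` and `feature_space_size` are hypothetical, and it assumes a four-letter nucleotide alphabet (A, C, G, T).

```python
from collections import Counter


def count_kmers(sequence: str, k: int) -> Counter:
    """Count every overlapping k-mer in a sequence (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # Slide a window of width k across the sequence, one base at a time.
    # If the sequence is shorter than k, the range is empty and so is the result.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def feature_space_size(k: int, alphabet_size: int = 4) -> int:
    """Number of distinct k-mers possible over the alphabet (assumed 4 for DNA)."""
    return alphabet_size ** k


if __name__ == "__main__":
    toy_sequence = "ATGCGATGCA"  # made-up example, not real data
    print(count_kmers(toy_sequence, 3))
    for k in (3, 5, 10, 15):
        print(k, feature_space_size(k))
```

A sequence of length n contributes at most n - k + 1 distinct k-mers, so most of the 4^k possible features are absent from any single genome once k is large. That sparsity is part of why compressed or sparse representations are worth exploring.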
At this stage the methodology has been mostly proven. We are now exploring ways to speed up the trials so that we can test larger k and token sizes.