This repository embeds all the tools needed to analyse projects using Sniffer. For a detailed description of the tool and its associated research work, you can refer to the following research papers:
- The rise of Android code smells: Who is to blame?
- On the survival of Android code smells in the wild.
First step of the analysis: calling SmellDetector on each commit of the selected applications.
Main tool for creating smell databases from application analysis and for querying their content.
Transformation: GitHub -> Neo4j -> (CSV | Java stream)
Retrieve data about the GitHub project associated with the Android application (e.g. developers, issues, commits, ...).
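As an illustration only (not GitHubMiner's actual implementation), retrieving this kind of data could look like the sketch below, assuming the `org.kohsuke:github-api` library and a placeholder `owner/app` repository:

```java
import org.kohsuke.github.*;

import java.io.IOException;
import java.util.List;

public class GitHubDataSketch {
    public static void main(String[] args) throws IOException {
        // Anonymous connection; real mining would need a token to avoid rate limits.
        GitHub gitHub = GitHub.connectAnonymously();
        // "owner/app" is a placeholder for the Android application's repository.
        GHRepository repository = gitHub.getRepository("owner/app");

        // Commits of the default branch.
        for (GHCommit commit : repository.listCommits()) {
            System.out.println(commit.getSHA1());
        }

        // Issues, both open and closed.
        List<GHIssue> issues = repository.getIssues(GHIssueState.ALL);
        System.out.println("Issues: " + issues.size());

        // Contributors (developers) of the project.
        for (GHRepository.Contributor contributor : repository.listContributors()) {
            System.out.println(contributor.getLogin());
        }
    }
}
```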
Perform metrics calculation on the given project (e.g. the number of introduced smells per commit, per developer, ...).
Transformation: CSV -> CSV
This project is deprecated: transforming smell CSV files into metrics CSV files proved to underperform.
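For illustration, the kind of per-commit aggregation it performed could be sketched as follows; the CSV layout (one `commitSha,smellName` pair per line) is an assumption, not the tool's actual format:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class IntroducedSmellsPerCommit {
    public static void main(String[] args) throws IOException {
        Map<String, Integer> smellsPerCommit = new HashMap<>();
        // Each line is assumed to be "commitSha,smellName"; the real CSV columns may differ.
        for (String line : Files.readAllLines(Paths.get("introduced-smells.csv"))) {
            String commit = line.split(",")[0];
            smellsPerCommit.merge(commit, 1, Integer::sum);
        }
        smellsPerCommit.forEach((commit, count) ->
                System.out.println(commit + ";" + count));
    }
}
```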
New module handling all analyses (currently only commits and smells) from Paprika databases and inserting the results in a PostgreSQL or an SQLite database.
Transformation: Java stream -> PostgreSQL
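As a rough sketch of this last step (not SmellTracker's actual schema: the table and column names below are made up), persisting tracked smells into PostgreSQL over JDBC could look like:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SmellPersistenceSketch {
    public static void main(String[] args) throws SQLException {
        // Connection parameters are placeholders.
        try (Connection connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/tracker", "user", "password")) {
            // Hypothetical table: the real schema is defined in the SmellTracker module itself.
            String insert = "INSERT INTO smell (project, commit_sha, smell_type, instance) VALUES (?, ?, ?, ?)";
            try (PreparedStatement statement = connection.prepareStatement(insert)) {
                statement.setString(1, "owner/app");
                statement.setString(2, "abc123");
                statement.setString(3, "HashMapUsage");
                statement.setString(4, "com.example.MyClass#myMethod");
                statement.executeUpdate();
            }
        }
    }
}
```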
Selection of deprecated scripts assembling the old tools together. See the folder for more information.
Before performing any operation on the project, initialize the submodules using the command `git submodule init && git submodule update`.
To build all artifacts of this repository, use the command `./gradlew packages`.
This section explains the whole process of this toolkit and proposes a few ways of integrating new code smell definitions or data sources.
The process can be cut down as follows:
- The script `projectLooper.sh` clones all input projects and starts the script `commitLooper.sh` with each project as input.
- For each commit of the input project, `commitLooper.sh` performs a checkout and calls `SmellDetector.jar` on the source code files (see the sketch after this list).
- `SmellDetector.jar` analyses the source code of each commit by going through the following steps:
  - Send the source code through our `Spoon` processors to generate an AST using `JDT`.
  - Process the generated AST to create a `Paprika` model.
  - Persist the model, with as much metadata as possible, as a graph in a `Neo4j` database. The commits of the same project all have their models stored in one database. That is, by the end of this step, we have a `Neo4j` database per project.
- We run `SmellTracker.jar` on each project database to fill a `PostgreSQL` database containing valuable data for all applications. The `SmellTracker` process is detailed in this document and can be summarized as follows:
  - Extract the commits data and their order from the Git repository and the `Neo4j` database.
  - Retrieve the branch data and order from the Git repository. This step ensures the precision of our smell history tracking.
  - Detect code smells by launching the queries defined in `SmellDetector` on the `Neo4j` database.
  - Based on the extracted commits order and the detected code smells, track the history of each code smell instance and store it in the `PostgreSQL` database.
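The per-commit loop driven by `commitLooper.sh` could be sketched in Java with JGit as follows; this is a simplified illustration, not the scripts' actual logic, and the analysis call is a placeholder:

```java
import org.eclipse.jgit.api.Git;
import org.eclipse.jgit.revwalk.RevCommit;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CommitLoopSketch {
    public static void main(String[] args) throws Exception {
        File project = new File("/path/to/cloned/project");
        try (Git git = Git.open(project)) {
            // Collect the commit SHAs first, then check each one out in turn.
            List<String> shas = new ArrayList<>();
            for (RevCommit commit : git.log().call()) {
                shas.add(commit.getName());
            }
            for (String sha : shas) {
                git.checkout().setName(sha).call();
                // Placeholder for the real step: invoking SmellDetector.jar on the checked-out sources.
                analyse(sha, project);
            }
        }
    }

    private static void analyse(String sha, File sources) {
        System.out.println("Analysing " + sha + " in " + sources);
    }
}
```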
The easiest way to integrate a new code smell in this process is to create a new definition in `SmellDetector`.
This means writing a `Neo4j` query relying on the metadata persisted by `SmellDetector`.
It may require adding some new metadata to the persisted model, but it will not be much of a hassle to integrate.
This is most likely the easiest way, but the least sustainable one if the `Paprika` model does not fit all your needs.
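For example, a new definition could boil down to a Cypher query over the persisted model. The sketch below uses the Neo4j Java driver; the node label (`Class`), the property (`number_of_methods`), and the threshold are assumptions rather than the actual Paprika schema:

```java
import org.neo4j.driver.AuthTokens;
import org.neo4j.driver.Driver;
import org.neo4j.driver.GraphDatabase;
import org.neo4j.driver.Record;
import org.neo4j.driver.Result;
import org.neo4j.driver.Session;

public class NewSmellQuerySketch {
    public static void main(String[] args) {
        // Connection details are placeholders.
        try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                AuthTokens.basic("neo4j", "password"));
             Session session = driver.session()) {
            // Hypothetical "Blob-like" definition: classes with many methods.
            Result result = session.run(
                    "MATCH (c:Class) WHERE c.number_of_methods > 40 RETURN c.name AS name");
            while (result.hasNext()) {
                Record record = result.next();
                System.out.println("Smelly class: " + record.get("name").asString());
            }
        }
    }
}
```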
TODO: Add links & details
If your smell detection is already handled by a visitor pattern, you may want to add this detection before persisting the model, since `Spoon` already works with visitors (see the sketch below).
It would then be possible to add new metadata to the `SmellDetector` database to represent the detected code smells.
The point is that you will have less work to do to transform your code smell definition.
- However, if you want to track your new code smells, you will be required to write a new detection query for them. Those queries should be trivial to write, though.
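A minimal sketch of such a detection, assuming a hypothetical "too many methods" smell and the standard Spoon processor API (this is not an existing SmellDetector processor):

```java
import spoon.processing.AbstractProcessor;
import spoon.reflect.declaration.CtClass;

/**
 * Hypothetical processor flagging classes with too many methods.
 * A real integration would record the result as metadata in the persisted model
 * instead of printing it.
 */
public class TooManyMethodsProcessor extends AbstractProcessor<CtClass<?>> {
    private static final int THRESHOLD = 40;

    @Override
    public void process(CtClass<?> ctClass) {
        if (ctClass.getMethods().size() > THRESHOLD) {
            System.out.println("Smelly class: " + ctClass.getQualifiedName());
        }
    }
}
```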
TODO: Add links & details
In the long run, we want to be datasource agnostic and have a `SmellTracker` able to interact with multiple code smell detectors.
This will require a rework of the toolkit on the `SmellDetector` side, and on `SmellTracker` to a certain extent.
Since `SmellTracker` queries all smells at once to track their history, it will not be possible to stream the information between the two tools.
However, we should be able to do this by integrating multiple code parsers and smell detection tools to create a model that could be read by `SmellTracker` through an interface defined in `SmellDetector`.
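A possible shape for that interface, purely as a sketch (none of these types exist in the current codebase):

```java
import java.util.List;

/**
 * Hypothetical abstraction that SmellTracker could consume, letting any
 * parser/detector pair act as a datasource.
 */
interface SmellDataSource {
    /** Commit identifiers of the analysed project, in topological order. */
    List<String> commits();

    /** Code smell instances detected in the given commit. */
    List<SmellInstance> smells(String commitSha);
}

/** Minimal representation of one detected smell instance. */
class SmellInstance {
    final String type;
    final String location;

    SmellInstance(String type, String location) {
        this.type = type;
        this.location = location;
    }
}
```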
TODO: Add details