This code base contains the pipeline that processes messages received during a disaster
and classifies them, using a RandomForest classifier,
across 36 categories that determine
the response or help to be sent.
The messages are processed using the following:
- Tokenise (Lemmatisation)
- CountVectoriser
- TF-IDF
- MultiOutputClassifier using RandomForest
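
The steps above can be sketched as a scikit-learn pipeline. This is a minimal illustration, not the code in train_classifier.py: the step names, the toy messages, the two example categories and the RandomForest settings are all assumptions, and the tokeniser is a simplified stand-in that lowercases and splits text without the lemmatisation step.

```python
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import Pipeline


def tokenize(text):
    # Stand-in tokeniser (assumption): lowercase and keep alphanumeric runs.
    # Lemmatisation is left out to keep the sketch free of extra dependencies.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_pipeline():
    # Token counts -> TF-IDF weights -> one RandomForest per output category.
    return Pipeline([
        ("vect", CountVectorizer(tokenizer=tokenize, token_pattern=None)),
        ("tfidf", TfidfTransformer()),
        ("clf", MultiOutputClassifier(
            RandomForestClassifier(n_estimators=10, random_state=0))),
    ])


# Toy data (hypothetical): four messages and two categories instead of 36.
messages = [
    "We need drinking water urgently",
    "The bridge has collapsed and the road is blocked",
    "No clean water in our village",
    "Roads are flooded, cars cannot pass",
]
labels = [[1, 0], [0, 1], [1, 0], [0, 1]]  # columns: water, transport

model = build_pipeline()
model.fit(messages, labels)

# One row per message, one 0/1 column per category.
predictions = model.predict(["please send water"])
print(predictions.shape)
```

MultiOutputClassifier fits a separate RandomForest for each category column, so a single message can be flagged for several categories at once.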
The project is organised as follows:
- data: Contains the messages and their category classifications as csv files. It also has process_data.py, which reads the csv files and prepares the data for the model.
- model: Contains train_classifier.py, which has the logic to set up, train and evaluate the model using data from the data folder.
- app: This is the flask app to interact with the model. (by Udacity)
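
As a rough illustration of the kind of work process_data.py does (read the two csv files, combine them and store the result in a database), here is a small sketch using only the Python standard library. It is not the project's implementation: the column names, the table name and the inline csv text are hypothetical, and the real script may rely on other libraries.

```python
import csv
import io
import sqlite3

# Hypothetical stand-ins for the two csv files; real column names may differ.
MESSAGES_CSV = "id,message\n1,We need water\n2,Road is blocked\n1,We need water\n"
CATEGORIES_CSV = "id,category\n1,water\n2,transport\n"


def read_rows(csv_text):
    # Parse csv text into a list of dictionaries keyed by column name.
    return list(csv.DictReader(io.StringIO(csv_text)))


def build_database(messages, categories, connection):
    # Join messages to their category on the shared id column.
    category_by_id = {row["id"]: row["category"] for row in categories}
    connection.execute(
        "CREATE TABLE messages (id INTEGER PRIMARY KEY, message TEXT, category TEXT)"
    )
    # INSERT OR IGNORE skips rows whose id is already stored (duplicate removal).
    connection.executemany(
        "INSERT OR IGNORE INTO messages VALUES (?, ?, ?)",
        [
            (int(row["id"]), row["message"].strip(), category_by_id.get(row["id"]))
            for row in messages
        ],
    )
    connection.commit()


# An in-memory database keeps the sketch self-contained.
connection = sqlite3.connect(":memory:")
build_database(read_rows(MESSAGES_CSV), read_rows(CATEGORIES_CSV), connection)
rows = connection.execute(
    "SELECT id, message, category FROM messages ORDER BY id"
).fetchall()
print(rows)
```

Passing a file path to sqlite3.connect instead of ":memory:" would write the table to disk.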
- Run the following commands in the project's root directory to set up your database and model.
  - To run the ETL pipeline that cleans the data and stores it in the database:
    python data/process_data.py data/disaster_messages.csv data/disaster_categories.csv data/DisasterResponse.db
  - To run the ML pipeline that trains the classifier and saves it:
    python models/train_classifier.py data/DisasterResponse.db models/classifier.pkl
- Run the following command in the app's directory to run your web app.
    python run.py
- Go to http://0.0.0.0:3001/