This project, Mental Health Assistant RAG, was developed to support individuals with mental health concerns by implementing a Retrieval-Augmented Generation (RAG) system. The assistant intelligently answers mental health-related queries by integrating a knowledge base and large language models (LLMs). The system is designed to assist with questions about issues such as depression, trauma, stress, grief, and relationships. By leveraging retrieval and generation capabilities, this project offers users accurate, supportive, and personalized responses, making it a valuable tool for mental health guidance and education.
Mental health is a crucial aspect of well-being, affecting millions of people worldwide. However, individuals seeking guidance often face challenges in finding trustworthy, relevant, and personalized information quickly. The complexity and sensitivity of mental health issues, coupled with the vast amount of available resources, make it difficult for people to get timely, accurate advice.
This project aims to address these challenges by developing an intelligent mental health assistant that can handle complex queries and provide contextually relevant and accurate answers. By leveraging RAG techniques, the system combines the reasoning power of LLMs with the precision of curated mental health knowledge, ensuring that users receive personalized support and guidance based on expert sources. This assistant makes mental health resources more accessible and helps reduce the stigma associated with seeking help, offering a bridge between individuals and the mental health care they need.
- Docker: Containerizes the application to ensure seamless deployment and consistent execution across various environments, making it easier to manage dependencies and configurations.
- Grafana: Provides real-time monitoring and visualization dashboards to track application performance, user interactions, and usage metrics, supporting continuous improvement.
- Streamlit: Offers a simple, interactive user interface that allows users to engage with the Mental Health Assistant and ask mental health-related questions.
- PostgreSQL: Serves as the relational database for storing user questions, responses, and feedback, ensuring data consistency and scalability.
- gemma2-9b-it: Used to reformulate user questions, optimizing them for better clarity and understanding by the assistant.
- llama3-70b-8192: Powers the retrieval-augmented generation process by handling complex mental health queries and delivering accurate, contextually relevant responses.
- mixtral-8x7b-32768: Contributes to processing large volumes of text, refining responses for depth and relevance.
- Pytest: Used for unit and integration testing to ensure the reliability and robustness of the code.
- Git: Version control system for tracking project changes, enabling collaboration, and maintaining code integrity.
- Visual Studio Code: IDE used for coding, debugging, and managing the development workflow.
- Jupyter Notebook: Utilized for exploratory data analysis, prototyping, and preprocessing, offering an interactive platform for data exploration.
- MinSearch: A lightweight in-memory search index that handles document indexing and retrieval, enabling precise query-to-answer matching for the knowledge base.
- Groq: Serves the LLMs listed above through its fast inference API, keeping response times low during answer generation and evaluation.
The dataset in this project contains information related to mental health queries, structured for use in the knowledge base of the Retrieval-Augmented Generation (RAG) system. The data is stored in the dataset directory and serves as the foundation for generating mental health-related responses in areas such as depression, trauma, stress, and relationship issues.
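As a rough illustration, the dataset can be loaded into the list of documents that the ingestion step indexes. This is a sketch, not code from the repository; the column layout of `data.csv` is an assumption:

```python
# Sketch only: load the Q&A dataset used as the knowledge base.
# The path appears elsewhere in this README; the columns are assumptions.
import pandas as pd

df = pd.read_csv("dataset/data.csv")
documents = df.to_dict(orient="records")   # one dict per Q&A entry
print(len(documents), list(documents[0].keys()))
```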
The backend of the Mental Health Assistant application handles various components of the RAG system, including data ingestion, retrieval, and interaction with the Large Language Models (LLMs). Below is an overview of the key backend files:
1. app.py
- Streamlit app that processes user queries by retrieving relevant data from the knowledge base, generates a response using the LLM, and collects user feedback on the responses to continuously improve the system.
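A minimal sketch of this flow, assuming `rag.py` exposes a `rag(question)` entry point and `db.py` provides `save_conversation`/`save_feedback` helpers (names are illustrative, not the exact code):

```python
import streamlit as st

from rag import rag                               # assumed: rag.py exposes a rag(question) entry point
from db import save_conversation, save_feedback   # assumed helper names in db.py

st.title("Mental Health Assistant")

question = st.text_input("Ask a mental health related question")

if st.button("Ask") and question:
    result = rag(question)                                   # retrieve documents and generate an answer
    st.write(result["answer"])
    conversation_id = save_conversation(question, result)    # persist the Q&A pair in PostgreSQL
    st.session_state["conversation_id"] = conversation_id

if "conversation_id" in st.session_state:
    col_up, col_down = st.columns(2)
    if col_up.button("+1"):
        save_feedback(st.session_state["conversation_id"], 1)    # thumbs up
    if col_down.button("-1"):
        save_feedback(st.session_state["conversation_id"], -1)   # thumbs down
```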
2. rag.py
- Core RAG Logic: Implements the primary logic for the Retrieval-Augmented Generation process.
- Key Functions:
- Query Minsearch: Searches for relevant mental health-related documents in the Minsearch knowledge base.
- Build Prompts: Constructs the input prompt for the LLM based on the retrieved documents.
- Evaluate Answers: Assesses the relevance of the AI-generated answers to ensure they are contextually accurate and meaningful.
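The core flow can be sketched roughly as follows, assuming a Minsearch index built at ingestion time and the Groq chat-completions client; function names, prompt wording, and field names are illustrative (the field names mirror the boost parameters reported in the evaluation section):

```python
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

def search(index, query):
    # Query the Minsearch knowledge base for the most relevant documents.
    return index.search(
        query=query,
        boost_dict={"questions": 1.0, "answers": 1.0},
        num_results=20,
    )

def build_prompt(query, documents):
    # Assemble the retrieved documents into the CONTEXT section of the prompt.
    context = "\n\n".join(doc.get("answers", "") for doc in documents)
    return (
        "You are a supportive mental health assistant. "
        "Answer the QUESTION using only the CONTEXT below.\n\n"
        f"CONTEXT:\n{context}\n\nQUESTION: {query}"
    )

def llm(prompt, model="llama3-70b-8192"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def rag(index, query):
    documents = search(index, query)
    prompt = build_prompt(query, documents)
    return llm(prompt)
```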
3. db.py
- Database Management: Manages interaction with the PostgreSQL database.
- Key Features:
- Initialize Database Schema: Creates the necessary database tables to store conversations and user feedback.
- Save Conversations: Stores question-answer pairs and feedback to analyze user interactions over time.
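A simplified sketch of what db.py does with psycopg2; the exact table and column names are assumptions, apart from the conversations table and the fields shown on the Grafana dashboard:

```python
import os

import psycopg2

def get_connection():
    return psycopg2.connect(
        host=os.getenv("POSTGRES_HOST", "postgres"),
        database=os.getenv("POSTGRES_DB", "mental_health_assistant"),
        user=os.getenv("POSTGRES_USER", "your_username"),
        password=os.getenv("POSTGRES_PASSWORD", ""),
    )

def init_db():
    # Create the tables used to store conversations and user feedback.
    with get_connection() as conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS conversations (
                id TEXT PRIMARY KEY,
                question TEXT NOT NULL,
                answer TEXT NOT NULL,
                model_used TEXT,
                relevance TEXT,
                timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            );
            CREATE TABLE IF NOT EXISTS feedback (
                id SERIAL PRIMARY KEY,
                conversation_id TEXT REFERENCES conversations(id),
                feedback INTEGER,
                timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP
            );
        """)

def save_conversation(conversation_id, question, answer_data):
    # Store a question-answer pair so user interactions can be analyzed over time.
    with get_connection() as conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO conversations (id, question, answer, model_used, relevance) "
            "VALUES (%s, %s, %s, %s, %s)",
            (conversation_id, question, answer_data["answer"],
             answer_data.get("model_used"), answer_data.get("relevance")),
        )
```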
4. ingest.py
- Document Ingestion: Handles the ingestion of the dataset into Minsearch.
- Key Features:
- Text Cleaning: Pre-processes and cleans mental health-related documents before indexing.
- Document Indexing: Pushes the cleaned documents into the Minsearch index for later retrieval during RAG queries.
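A minimal sketch of the ingestion step, assuming the minsearch package and `questions`/`answers` text fields (field names follow the boost parameters reported later in this README):

```python
import pandas as pd
import minsearch

def load_and_clean(data_path="dataset/data.csv"):
    # Basic cleaning before indexing; the real script may do more.
    df = pd.read_csv(data_path)
    df = df.dropna().drop_duplicates()
    return df.to_dict(orient="records")

def build_index(documents):
    index = minsearch.Index(
        text_fields=["questions", "answers"],   # free-text fields used for scoring
        keyword_fields=["id"],                  # exact-match fields (assumed)
    )
    index.fit(documents)
    return index
```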
5. init.py
- Grafana Setup: Configures Grafana for monitoring the system's performance.
- Key Features:
- Data Source Configuration: Sets up PostgreSQL as a data source for tracking system metrics such as query response times, user activity, and feedback analysis.
- Dashboard Initialization: Initializes Grafana dashboards for visualizing key metrics like query efficiency, response accuracy, and feedback trends.
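A rough sketch of how init.py can register the datasource and dashboard through the Grafana HTTP API; the endpoint paths are standard Grafana API calls, while variable names and defaults are assumptions:

```python
import json
import os

import requests

GRAFANA_URL = os.getenv("GRAFANA_URL", "http://localhost:3000")
AUTH = ("admin", "admin")  # default Grafana credentials mentioned below

def create_datasource():
    # Register PostgreSQL as a Grafana datasource.
    payload = {
        "name": "PostgreSQL",
        "type": "postgres",
        "access": "proxy",
        "url": f"{os.getenv('POSTGRES_HOST', 'postgres')}:5432",
        "database": os.getenv("POSTGRES_DB", "mental_health_assistant"),
        "user": os.getenv("POSTGRES_USER", "your_username"),
        "secureJsonData": {"password": os.getenv("POSTGRES_PASSWORD", "")},
        "jsonData": {"sslmode": "disable"},
    }
    return requests.post(f"{GRAFANA_URL}/api/datasources", json=payload, auth=AUTH)

def create_dashboard(path="dashboard.json"):
    # Upload the dashboard definition stored in the grafana folder.
    with open(path) as f:
        dashboard = json.load(f)
    payload = {"dashboard": dashboard, "overwrite": True}
    return requests.post(f"{GRAFANA_URL}/api/dashboards/db", json=payload, auth=AUTH)
```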
6. Dockerfile
- Base Image: Uses python:3.12-slim as the base image for a lightweight container.
- Working Directory: Sets the working directory to /app.
- Pipenv Installation: Installs pipenv to manage dependencies.
- Data and Dependency Copying: Copies the dataset and dependency files into the container.
- Streamlit Launch: Specifies the command to run the Streamlit app on port 8501.
7. docker-compose.yaml
- Service Definitions: Sets up three services: PostgreSQL, Streamlit, and Grafana.
- PostgreSQL Configuration: Defines environment variables for the database setup.
- Streamlit Configuration: Builds the Streamlit service using the Dockerfile, specifying dependencies and environment variables.
- Grafana Configuration: Sets up Grafana with the necessary environment variables and dependencies.
This backend architecture supports the Mental Health Assistant application by efficiently processing user queries, generating contextually relevant mental health responses, and providing insightful metrics for system performance monitoring.
This guide outlines how to run the Mental Health Assistant application, configure the database, and interact with the system using Docker, Docker Compose, and local environments.
Before starting the application for the first time, the PostgreSQL database needs to be initialized.
To run PostgreSQL using Docker Compose, execute the following command:
docker-compose up postgres
After PostgreSQL is running, initialize the database schema by running the db_prep.py script:
pipenv shell
cd mental_health_assistant
export POSTGRES_HOST=localhost
python db_prep.py
To inspect the contents of the database, you can use pgcli, which is installed through pipenv. Access the PostgreSQL instance with the following command:
pipenv run pgcli -h localhost -U your_username -d mental_health_assistant -W
Once inside pgcli, you can view the schema with the \d command:
\d conversations;
To select and display data from the conversations table:
SELECT * FROM conversations;
The easiest way to run the application is by using Docker Compose. This command will bring up the services defined in the docker-compose.yaml file, including PostgreSQL, Grafana, and the Streamlit application:
docker-compose up
If you want to run the application locally without Dockerizing the entire environment, you can start only the PostgreSQL and Grafana services using Docker Compose:
docker-compose up postgres grafana
If you previously started all services using docker-compose up, stop the Streamlit application before proceeding:
docker-compose stop streamlit
Now, you can run the application on your host machine:
pipenv shell
cd mental_health_assistant
export POSTGRES_HOST=localhost
python app.py
You may want to run the application in Docker without Docker Compose, especially for debugging or development purposes.
Before running the application standalone in Docker, ensure the environment is properly set up by running Docker Compose as explained above.
To build the Docker image manually, run the following command:
docker build -t mental-health-assistant .
To run the application in a Docker container without using Docker Compose, use this command:
docker run -it --rm \
--network="mental_health_assistant_default" \
--env-file=".env" \
-e GROQ_API_KEY=${GROQ_API_KEY} \
-e DATA_PATH="dataset/data.csv" \
-p 8501:8501 \
mental-health-assistant
This will run the container, exposing the Streamlit application on port 8501.
For experiments, we use Jupyter notebooks. They are in the notebooks folder.
To start Jupyter, run:
cd notebooks
pipenv run jupyter notebook
We have the following notebooks:
- data-generation.ipynb: Generating the ground truth dataset for retrieval evaluation.
- rag-test.ipynb: The RAG flow and evaluating the system.
Retrieval performance is critical in ensuring that the Mental Health Assistant provides accurate and relevant answers to user queries. The evaluation was carried out using two versions of our retrieval system: the basic version (without boosting) and the improved version (with tuned boosting parameters).
The initial retrieval system, without any additional boosting mechanisms, yielded the following results:
- Hit Rate: 92%. The hit rate measures the proportion of user queries that successfully retrieve at least one relevant document from the knowledge base. A 92% hit rate means that 92 out of every 100 queries returned relevant results.
- MRR (Mean Reciprocal Rank): 71%. The MRR measures the effectiveness of the retrieval system by averaging the reciprocal ranks of the relevant documents retrieved. An MRR of 71% indicates that relevant documents are generally ranked well, but with some room for improvement.
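For reference, these two metrics can be computed from per-query relevance flags as follows (a generic sketch, not the project's evaluation code):

```python
# relevance_total: one list per query, where each boolean marks whether the
# document at that rank matches the ground truth.
def hit_rate(relevance_total):
    return sum(any(row) for row in relevance_total) / len(relevance_total)

def mrr(relevance_total):
    total = 0.0
    for row in relevance_total:
        for rank, is_relevant in enumerate(row):
            if is_relevant:
                total += 1.0 / (rank + 1)   # reciprocal rank of the first hit
                break
    return total / len(relevance_total)
```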
By adjusting the boosting parameters, the retrieval performance improved significantly. Boosting helps prioritize more relevant results by increasing the weight of important features (e.g., giving more emphasis to certain keywords in queries or documents).
- Hit Rate: 97%. The hit rate increased to 97%, demonstrating a much higher likelihood of retrieving relevant results for user queries.
- MRR: 72%. A slight improvement in the MRR shows that the tuned system ranks relevant results marginally better than the basic version.
The best performance was achieved with the following boosting configuration:
- Boost Parameters:
  - questions: 1.0
  - answers: 1.0
  - num_results: 20
This provided a balanced approach by equally boosting the significance of both questions and answers, and by returning up to 20 results for each query, ensuring more diverse retrieval options.
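In Minsearch terms, this configuration corresponds to a search call along these lines (`index` and `user_question` are placeholders):

```python
results = index.search(
    query=user_question,
    boost_dict={"questions": 1.0, "answers": 1.0},  # equal weight for both fields
    num_results=20,                                 # return up to 20 candidates per query
)
```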
In addition to retrieval performance, the entire Retrieval-Augmented Generation (RAG) flow was evaluated. This flow combines retrieved documents with a Large Language Model (LLM) to generate relevant responses to user queries.
We used the LLM-as-a-Judge metric to evaluate the relevance of the answers generated by the RAG system. This metric involves having the LLM assess its own responses, classifying them into three categories:
- RELEVANT: The response fully answers the query based on the retrieved documents.
- PARTLY_RELEVANT: The response partially addresses the query but lacks full relevance.
- NON_RELEVANT: The response is not relevant to the query.
For this evaluation, we used the Mixtral-8x7b-32768 model, a powerful language model optimized for understanding and generating natural language.
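A sketch of what the judging step looks like, reusing a Groq-style chat-completions client; the actual prompt wording in the evaluation notebook may differ:

```python
import json

JUDGE_PROMPT = """You are an expert evaluator for a RAG system.
Classify the generated answer relative to the question as one of:
"RELEVANT", "PARTLY_RELEVANT", "NON_RELEVANT".

Question: {question}
Generated answer: {answer}

Respond in JSON: {{"Relevance": "...", "Explanation": "..."}}
"""

def judge_relevance(client, question, answer, model="mixtral-8x7b-32768"):
    prompt = JUDGE_PROMPT.format(question=question, answer=answer)
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    # Assumes the model returns valid JSON; the notebook may add error handling.
    return json.loads(response.choices[0].message.content)
```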
- 348 RELEVANT: The vast majority of responses generated by the RAG system were classified as fully relevant, meaning they accurately addressed the user queries.
- 17 PARTLY_RELEVANT: A small percentage of the responses were only partially relevant, indicating that the system could still improve its handling of some queries or edge cases.
- 1 NON_RELEVANT: Only one response was deemed non-relevant, showing that the system is highly reliable in producing useful responses.
The following chart illustrates the distribution of the relevance categories for the responses generated by the Mixtral-8x7b-32768 model:
For comparison, we also tested the system using the Llama LLM, which produced the following results:
- 317 RELEVANT: This model generated relevant responses for most user queries.
- 31 NON_RELEVANT: A higher rate of irrelevant responses compared to the Mixtral model, indicating that this model is more prone to generating inappropriate answers.
- 18 PARTLY_RELEVANT: Some responses were only partially relevant to the query.
The following chart illustrates the distribution of the relevance categories for the responses generated by the Llama model:
We use Grafana for monitoring the application. All Grafana configurations are in the grafana folder:
- init.py - for initializing the datasource and the dashboard.
- dashboard.json - the actual dashboard (taken from LLM Zoomcamp without changes).

To initialize the dashboard, first ensure Grafana is running (it starts automatically when you do docker-compose up). Then run:
pipenv shell
cd grafana
env | grep POSTGRES_HOST
python init.py
Access Grafana at localhost:3000 with the default credentials (admin/admin).
The monitoring dashboard contains several panels:
- Last 5 Conversations (Table): Displays a table showing the five most recent conversations, including details such as the question, answer, relevance, and timestamp. This panel helps monitor recent interactions with users.
- +1/-1 (Pie Chart): A pie chart that visualizes the feedback from users, showing the count of positive (thumbs up) and negative (thumbs down) feedback received. This panel helps track user satisfaction.
- Relevancy (Gauge): A gauge chart representing the relevance of the responses provided during conversations. The chart categorizes relevance and indicates thresholds using different colors to highlight varying levels of response quality.
- Model Used (Bar Chart): A bar chart displaying the count of conversations based on the different models used. This panel provides insights into which AI models are most frequently used.
- Response Time (Time Series): A time series chart showing the response time of conversations over time. This panel is useful for identifying performance issues and ensuring the system's responsiveness.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
- Issue Tracker: Use GitHub Issues for bug reports and feature requests.
For questions, suggestions, or collaboration opportunities:
Check out my latest articles here: