wandbot v1.3 - There be changes ahead #85

Open · wants to merge 106 commits into base: main

Commits (106)
4d23861
Upgrade to python 3.11, remove poetry for uv, move to lazy loading in…
morganmcg1 Dec 26, 2024
ed0b3d9
ruff fixes
morganmcg1 Dec 26, 2024
284ad1d
run black
morganmcg1 Dec 26, 2024
94adeca
update to weave.op
morganmcg1 Dec 26, 2024
324164e
Update pyproject.toml
morganmcg1 Dec 26, 2024
6cbabe3
remove poetry.lock
morganmcg1 Dec 26, 2024
960c2a4
add disk-usage route
morganmcg1 Dec 26, 2024
f9bceb2
fix disk usage
morganmcg1 Dec 26, 2024
25941a9
add no-user to installs
morganmcg1 Dec 26, 2024
47bc78b
simple, newer .replit file
morganmcg1 Dec 26, 2024
143a06c
Update README
morganmcg1 Dec 26, 2024
182c667
add clear pip cache
morganmcg1 Dec 26, 2024
8eae613
add disk usage increment logging
morganmcg1 Dec 26, 2024
5d40093
Add dotenv for better devX
morganmcg1 Dec 26, 2024
a520738
add bulid.sh debug disk usage logging
morganmcg1 Dec 26, 2024
380527d
add top 20 disk usage logging
morganmcg1 Dec 26, 2024
8d81f01
tidy up
morganmcg1 Dec 26, 2024
555d433
add wandb cache cleanup
morganmcg1 Dec 26, 2024
eaeedc5
readme
morganmcg1 Dec 26, 2024
825741e
Update readme
morganmcg1 Dec 27, 2024
a8cefd7
Add args and configs to eval script, update reqs, update readme
morganmcg1 Dec 27, 2024
eea41f0
update env var, rename main.py to eval
morganmcg1 Dec 27, 2024
e06ab67
Remove langchain-cohere
morganmcg1 Dec 27, 2024
230e0bf
disable feedback logging for now
morganmcg1 Dec 27, 2024
41f2bfb
Readme
morganmcg1 Dec 27, 2024
822ff1a
remove feedback
morganmcg1 Dec 31, 2024
a58d27a
remove poetry lock and eval deps
morganmcg1 Dec 31, 2024
b0893be
add eval_requiremnts.txt
morganmcg1 Dec 31, 2024
bf65a39
OpenHands: Add native chromadb implementation with optimized MMR search
openhands-agent Jan 1, 2025
8f47776
OpenHands: Switch to native chromadb with numpy 2.2.0 support
openhands-agent Jan 1, 2025
03fb31b
OpenHands: Update chromadb to 0.6.0 for numpy 2.2.0 compatibility
openhands-agent Jan 1, 2025
a3a1fdd
python 3.12, add retries for evals
morganmcg1 Jan 1, 2025
8567ba1
Merge branch 'make_wandbot_great_again' of https://github.com/wandb/w…
morganmcg1 Jan 1, 2025
17df3fb
update eval naming and logging
morganmcg1 Jan 2, 2025
dcdb23d
Add validation error retry to query enhancer chain
morganmcg1 Jan 2, 2025
914a1e9
update default index artifact, add config logging to evals
morganmcg1 Jan 2, 2025
6dc42da
remove emojis from disk usage message
morganmcg1 Jan 3, 2025
7c2d779
formatting
morganmcg1 Jan 3, 2025
b43c2d6
update readme
morganmcg1 Jan 3, 2025
cf8a633
modify similarity search in retrieval
morganmcg1 Jan 3, 2025
e0a54f4
Replace langchain-chroma with native ChromaDB implementation
openhands-agent Jan 4, 2025
6ace947
remove mistaken openhands chroma commit
morganmcg1 Jan 5, 2025
8c0ea03
rename chroma wrapper
morganmcg1 Jan 5, 2025
e91d81b
Fix native ChromaDB implementation to match langchain-chroma behavior
openhands-agent Jan 5, 2025
0ce71f5
remove langchain embeddings, add native embeddings models
morganmcg1 Jan 5, 2025
8b5b856
make EmbeddingsModel callable
morganmcg1 Jan 5, 2025
60c66f3
Centralise all configs
morganmcg1 Jan 6, 2025
4ad024c
fix configs
morganmcg1 Jan 6, 2025
66313fc
prompt updates
morganmcg1 Jan 6, 2025
61ddd45
update readme
morganmcg1 Jan 6, 2025
f6e4f7b
increase retries for query handler
morganmcg1 Jan 6, 2025
255cd3f
Add e2b dockerfile
morganmcg1 Jan 8, 2025
80278d4
update docker file
morganmcg1 Jan 8, 2025
0da7bdc
docker fixes
morganmcg1 Jan 8, 2025
b6fd1d6
Update readme and dockerfile
morganmcg1 Jan 9, 2025
d52c23d
add docker temp dir clearnup
morganmcg1 Jan 9, 2025
c002c88
Use code interpreter with python 3.12
jakubno Jan 9, 2025
67a7005
Merge pull request #87 from jakubno/make_wandbot_great_again
morganmcg1 Jan 9, 2025
a3d0714
fix retry in query handler
morganmcg1 Jan 9, 2025
51afeb4
tidy up configs app routes, tidy up web search
morganmcg1 Jan 9, 2025
3aa0103
Add async methods through entire RAG pipeline
morganmcg1 Jan 12, 2025
b56e0eb
fix up evals script
morganmcg1 Jan 12, 2025
8160686
evals import fix
morganmcg1 Jan 12, 2025
700efa0
evals config updates
morganmcg1 Jan 13, 2025
c756f8d
gitignire
morganmcg1 Jan 13, 2025
ffcdc74
modify QueryEnhancer prompts
morganmcg1 Jan 15, 2025
6d93e4f
Eval: pass experiment name to wandbot call; update eval config import
morganmcg1 Jan 16, 2025
d50618e
tidy up response synthesis args and update readme
morganmcg1 Jan 19, 2025
174d9f8
commit for now
morganmcg1 Jan 19, 2025
1ce0f59
undo QueryEnhancer prompt changes for eval
morganmcg1 Jan 19, 2025
12d0b9a
update index to chroma v34 for eval
morganmcg1 Jan 19, 2025
c9a205c
fix app to eval config
morganmcg1 Jan 19, 2025
41dafc5
quieten disk usage logs on startup
morganmcg1 Jan 19, 2025
96a58f2
Add de-dupe of retrieved contexts before re-ranking
morganmcg1 Jan 19, 2025
4dca364
Experiment: try fetch_k = 20
morganmcg1 Jan 20, 2025
6ddc48a
Downgrade some requirements for old compatibility
morganmcg1 Jan 20, 2025
b1e45cf
Adds float or base64 encoding option to EmbeddingModel, set config to…
morganmcg1 Jan 20, 2025
417df7c
actually switch embedding encoding format to base64, update eval.py f…
morganmcg1 Jan 20, 2025
212d32e
change search type from mmr to similarity
morganmcg1 Jan 20, 2025
ef2111e
Tidy up retriever and configs
morganmcg1 Jan 22, 2025
4bbe178
Implement hacky MMR for langchain MMR equivlancy
morganmcg1 Jan 22, 2025
c8b4648
Decompose EmbeddingModel into provider-specific classes
morganmcg1 Jan 22, 2025
cf4636b
fix EmbeddingCall
morganmcg1 Jan 22, 2025
dcc379a
Return to fetch_k 60 for experiment.
morganmcg1 Jan 22, 2025
c64a382
Update config naming
morganmcg1 Jan 22, 2025
8b7d067
Add rich logging
morganmcg1 Jan 24, 2025
8d93cc3
Remove llamaindex, langchain and Raga eval dependencies
morganmcg1 Jan 24, 2025
4f0fab1
remove llama_index, ragas package deps
morganmcg1 Jan 24, 2025
73da804
base64 handling in EmbeddingModel
morganmcg1 Jan 24, 2025
0582a58
silence prints in hacky mmr
morganmcg1 Jan 24, 2025
840427d
Add Evaluation tests
morganmcg1 Jan 24, 2025
a28274f
Re-organise evaluation folder, keep relevancy, faithfulness prompts
morganmcg1 Jan 24, 2025
7777783
Update rich console log style
morganmcg1 Jan 24, 2025
999e6b3
Tidy up evaluation output format
morganmcg1 Jan 24, 2025
f58621e
Add LLMModel class to swap out model providers
morganmcg1 Jan 24, 2025
5655d1c
Fix up LLLModel and correctness evaluator
morganmcg1 Jan 24, 2025
9ba48b8
Better Evaluation and LLM parsing and error handling
morganmcg1 Jan 24, 2025
8e13842
Switch eval prompt back to use "reason"
morganmcg1 Jan 25, 2025
89f7a90
add anthropic to requirements.txt
morganmcg1 Jan 25, 2025
5b470cd
Remove langchain from QueryEnhancer
morganmcg1 Jan 25, 2025
3fceb99
Fix anthropic response_model parsing
morganmcg1 Jan 25, 2025
9781cfa
modify eval message
morganmcg1 Jan 25, 2025
b235d90
Update QeuryEnhancer and tests
morganmcg1 Jan 25, 2025
25e4d35
Update tests
morganmcg1 Jan 25, 2025
7545e37
update .gitignore
morganmcg1 Jan 25, 2025
bc093d7
Add more logging for api call error
morganmcg1 Jan 25, 2025
10 changes: 10 additions & 0 deletions .gitignore
@@ -1,11 +1,19 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

temp_index/
e2b*

# C extensions
*.so

testing_*.py
testing_*.ipynb
temp_*

# Distribution / packaging
.Python
build/
@@ -105,6 +113,8 @@ celerybeat.pid
# Environments
.env
.venv
wandbot_venv/
3-10_env/
env/
venv/
ENV/
19 changes: 8 additions & 11 deletions .replit
@@ -1,19 +1,16 @@
run = "bash run.sh"
entrypoint = "main.py"
modules = ["python-3.10:v18-20230807-322e88b"]
modules = ["python-3.12"]

disableInstallBeforeRun = true
[nix]
channel = "stable-24_05"

hidden = [".pythonlibs"]
[unitTest]
language = "python3"

[nix]
channel = "stable-23_05"
[gitHubImport]
requiredFiles = [".replit", "replit.nix"]

[deployment]
run = ["sh", "-c", "bash run.sh"]
build = ["sh", "-c", "bash build.sh"]
deploymentTarget = "gce"

[[ports]]
localPort=8000
externalPort=80
build = ["sh", "-c", "bash build.sh"]
203 changes: 139 additions & 64 deletions README.md
@@ -1,9 +1,24 @@
# wandbot

WandBot is a question-answering bot designed specifically for Weights & Biases Models and Weave [documentation](https://docs.wandb.ai/).

## What's New

### wandbot v1.3.0
**New:**

- **Move to uv for package management**: Installs and dependency checks cut down from minutes to seconds
- **Support python 3.12 on replit**
- **Move to lazy loading in app.py to help with startup**: Replit app deployments can't seem to handle the delay from loading the app, despite attempts at async or background tasks
- **Add wandb artifacts cache cleanup**: Saved 1.2GB of disk space
- **Turn off web search**: Currently we don't have a web search provider to use.
- **Refactored EvalConfig and evals script**: Switched config to using simple_parsing for free cli arguments. Added n_trials, debug mode. Undid hardcoding of ja weave eval dataset.
- **Removed langchain-cohere**: Started hitting dependency errors, removed it in favor of raw cohere client.
- **wandb Tables Feedback logging disabled in prep for Weave feedback**
- **Small formatting updates for weave.op**
- **Add dotenv in app.py for easy env var loads**
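The lazy-loading change above can be sketched as follows. This is an illustrative pattern, not wandbot's actual `app.py` code, and `expensive_pipeline` is a hypothetical stand-in for loading the RAG stack:

```python
class LazyLoader:
    """Defer expensive initialisation until first access (illustrative sketch)."""

    def __init__(self, factory):
        self._factory = factory   # callable that builds the expensive object
        self._instance = None

    def get(self):
        # Build only on the first request, then reuse the cached instance,
        # so app startup itself stays fast.
        if self._instance is None:
            self._instance = self._factory()
        return self._instance


def expensive_pipeline():
    # Hypothetical stand-in for loading the vector store, embeddings, etc.
    return {"ready": True}


pipeline = LazyLoader(expensive_pipeline)
```

With this shape, deployment health checks see a fast startup; the first `pipeline.get()` call pays the loading cost instead.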


### wandbot v1.2.0

This release introduces a number of exciting updates and improvements:
@@ -35,76 +50,69 @@ Japanese

## Features

- WandBot uses:
  - a local ChromaDB vector store
  - OpenAI's v3 embeddings
  - GPT-4 for query enhancement and response synthesis
  - Cohere's re-ranking model
- It features periodic data ingestion and report generation, contributing to the bot's continuous improvement. You can view the latest data ingestion report [here](https://wandb.ai/wandbot/wandbot-dev/reportlist).
- The bot is integrated with Discord and Slack, facilitating seamless integration with these popular collaboration platforms.
- Performance monitoring and continuous improvement are made possible through logging and analysis with Weights & Biases Weave.
- It has a fallback mechanism for model selection.
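The retrieval side pairs vector search with Cohere re-ranking, and the PR's commit history also mentions an in-house MMR ("maximal marginal relevance") implementation for langchain equivalency. A minimal, dependency-free sketch of MMR, not wandbot's actual code, with illustrative vectors and weights:

```python
def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def mmr(query_vec, doc_vecs, k=2, lambda_mult=0.5):
    """Pick k document indices trading off query relevance vs. redundancy.

    lambda_mult=1.0 is pure relevance; lower values favour diversity.
    """
    selected = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        best, best_score = None, float("-inf")
        for i in candidates:
            relevance = cosine(query_vec, doc_vecs[i])
            # Penalise similarity to anything already selected.
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
        candidates.remove(best)
    return selected
```

A de-duplication pass over retrieved contexts (also added in this PR) is the degenerate case: with identical vectors, the redundancy penalty pushes duplicates to the bottom.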

## Installation

The project is built with Python `3.12` and uses `uv` for dependency management. Follow the steps below to install the necessary dependencies:

```bash
git clone git@github.com:wandb/wandbot.git
cd wandbot
bash build.sh
```

## Usage

### Running WandBot

Before running the Q&A bot, ensure the following environment variables are set:

```bash
OPENAI_API_KEY
COHERE_API_KEY
WANDB_API_KEY
WANDBOT_API_URL="http://localhost:8000"
WANDB_TRACING_ENABLED="true"
LOG_LEVEL=INFO
WANDB_PROJECT="wandbot-dev"
WANDB_ENTITY= <your W&B entity>

```

If you're running the slack or discord apps you'll also need the following keys/tokens set as env vars:

```
SLACK_EN_APP_TOKEN
SLACK_EN_BOT_TOKEN
SLACK_EN_SIGNING_SECRET
SLACK_JA_APP_TOKEN
SLACK_JA_BOT_TOKEN
SLACK_JA_SIGNING_SECRET
DISCORD_BOT_TOKEN
```
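Rather than exporting each variable by hand, the app loads them from a `.env` file via dotenv (see the v1.3 notes above). What that loading amounts to can be sketched as below; wandbot itself uses the `python-dotenv` package, so this hand-rolled parser is only for illustration:

```python
import os


def load_env_file(path=".env"):
    """Minimal .env loader: KEY=VALUE lines; blank lines and '#' comments ignored.

    Existing environment variables win over file values (like dotenv's default).
    """
    loaded = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            value = value.strip().strip('"').strip("'")
            loaded[key.strip()] = value
            os.environ.setdefault(key.strip(), value)
    return loaded
```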

Then build the app to install all dependencies in a virtual env.

```
bash build.sh
```

Start the Q&A bot application using the following command:

```bash
bash run.sh
```

Then call the endpoint to trigger the final wandbot app initialisation:
```bash
curl http://localhost:8000/startup
```

For more detailed instructions on installing and running the bot, please refer to the [run.sh](./run.sh) file located in the root of the repository.
@@ -113,44 +121,111 @@ Executing these commands will launch the API, Slackbot, and Discord bot applications.

### Running the Evaluation pipeline

**Eval Config**

Modify the evaluation config file here: `wandbot/src/wandbot/evaluation/config.py`

- `evaluation_strategy_name`: attribute name in the Weave Evaluation dashboard
- `eval_dataset`:
  - [Latest English evaluation dataset](https://wandb.ai/wandbot/wandbot-eval/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval%2Fobjects%2Fwandbot_eval_data%2Fversions%2FeCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU%3F%26): "weave:///wandbot/wandbot-eval/object/wandbot_eval_data:eCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU"
  - [Latest Japanese evaluation dataset](https://wandb.ai/wandbot/wandbot-eval-jp/weave/datasets?peekPath=%2Fwandbot%2Fwandbot-eval-jp%2Fobjects%2Fwandbot_eval_data_jp%2Fversions%2FoCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA%3F%26): "weave:///wandbot/wandbot-eval-jp/object/wandbot_eval_data_jp:oCWifIAtEVCkSjushP0bOEc5GnhsMUYXURwQznBeKLA"
- `eval_judge_model`: model used as the LLM judge
- `wandb_entity`: wandb entity name for logging
- `wandb_project`: wandb project name for logging
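Since v1.3 the eval config uses `simple_parsing`, which turns a dataclass into free CLI flags (e.g. `--n_trials 1`). The config boils down to something like the sketch below; field names mirror the options above, while the defaults shown (notably the judge model) are illustrative assumptions, not necessarily the repo's values:

```python
from dataclasses import dataclass


@dataclass
class EvalConfig:
    """Illustrative sketch of the evaluation config; defaults are assumptions."""

    evaluation_strategy_name: str = "baseline"
    eval_dataset: str = (
        "weave:///wandbot/wandbot-eval/object/wandbot_eval_data:"
        "eCQQ0GjM077wi4ykTWYhLPRpuGIaXbMwUGEB7IyHlFU"
    )
    eval_judge_model: str = "gpt-4"  # assumption, set to the actual judge model
    wandb_entity: str = "wandbot"
    wandb_project: str = "wandbot-eval"
    n_trials: int = 3      # each sample evaluated this many times
    debug: bool = False    # evaluate only a few samples when True
```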

**Dependencies**

Ensure wandbot is installed by installing the production dependencies, activate the virtual env that was created, and then install the evaluation dependencies:

```
bash build.sh
source wandbot_venv/bin/activate
uv pip install -r eval_requirements.txt
```
**Environment variables**

Make sure to set the environment variables (i.e. LLM provider keys etc.) from the `.env` file:

```bash
set -o allexport; source .env; set +o allexport
```

**Launch the wandbot app**
You can use either `uvicorn` or `gunicorn` to launch N workers so that eval requests can be served in parallel. Note that Weave Evaluations also limit the number of parallel calls made, via the `WEAVE_PARALLELISM` env variable, which is set further down in `eval.py` using the `n_weave_parallelism` flag. Launch wandbot with 8 workers for faster evaluation. The `WANDBOT_FULL_INIT` env var triggers the full wandbot app initialization.
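The parallelism cap amounts to setting one environment variable before weave runs. The function name below is hypothetical, but `WEAVE_PARALLELISM` is the variable weave reads:

```python
import os


def set_weave_parallelism(n: int) -> None:
    # Weave reads this env var to cap concurrent evaluation calls;
    # eval.py sets it from the n_weave_parallelism flag (sketch).
    os.environ["WEAVE_PARALLELISM"] = str(n)


set_weave_parallelism(8)
```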

`uvicorn`
```bash
WANDBOT_FULL_INIT=1 uvicorn wandbot.api.app:app \
--host 0.0.0.0 \
--port 8000 \
--workers 8 \
--timeout-keep-alive 75 \
--loop uvloop \
--http httptools \
--log-level debug
```

Alternatively, you can run wandbot with `gunicorn`:

```bash
WANDBOT_FULL_INIT=1 \
./wandbot_venv/bin/gunicorn wandbot.api.app:app \
--preload \
--bind 0.0.0.0:8000 \
--timeout=200 \
--workers=20 \
--worker-class uvicorn.workers.UvicornWorker
```

Testing: you can check that the app is running correctly by making a request to the `chat/query` endpoint; you should receive a response payload from wandbot after 30 to 90 seconds:

```bash
curl -X POST \
http://localhost:8000/chat/query \
-H 'Content-Type: application/json' \
-d '{"question": "How do I log a W&B artifact?"}'
```
**Debugging**

For debugging purposes during evaluation you can run a single instance of the app by changing the `uvicorn` command above to use `--workers 1`.

**Run the evaluation**

Launch the W&B Weave evaluation from the root `wandbot` directory. Ensure that your virtual environment is active. By default, each sample is evaluated 3 times in order to account for both the stochasticity of wandbot and of our LLM judge. For debugging, pass the `--debug` flag to evaluate only a small number of samples. To adjust the number of parallel evaluation calls weave makes, use the `--n_weave_parallelism` flag when calling `eval.py`.
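Averaging each sample's trials can be sketched as below; this is illustrative, as the real aggregation is handled inside the weave Evaluation:

```python
def aggregate_trials(trial_scores):
    """Average per-trial scores (e.g. correctness) for one eval sample."""
    if not trial_scores:
        raise ValueError("no trials to aggregate")
    return sum(trial_scores) / len(trial_scores)
```

With 3 trials, a sample judged correct twice and incorrect once scores 2/3 rather than flipping between 0 and 1 across runs.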

Set up your environment and launch the evaluation:

```
source wandbot_venv/bin/activate
python src/wandbot/evaluation/weave_eval/eval.py
```

Debugging, only running evals on 1 sample and for 1 trial:

```
python src/wandbot/evaluation/weave_eval/eval.py --debug --n_debug_samples=1 --n_trials=1
```

Evaluate on the Japanese dataset:

```
python src/wandbot/evaluation/weave_eval/eval.py --lang ja
```

To only evaluate each sample once:

```
python src/wandbot/evaluation/weave_eval/eval.py --n_trials 1
```


### Data Ingestion

The data ingestion module pulls code and markdown from the Weights & Biases repositories [docodile](https://github.com/wandb/docodile) and [examples](https://github.com/wandb/examples) and ingests them into vectorstores for the retrieval-augmented generation pipeline.
To ingest the data, run the following command from the root of the repository:

```bash
python -m wandbot.ingestion
```

You will notice that the data is ingested into the `data/cache` directory and stored in separate subdirectories, e.g. `raw_data` and `vectorstore`, with individual files for each step of the ingestion process.

These datasets are also stored as wandb artifacts in the project defined in the environment variable `WANDB_PROJECT` and can be accessed from the [wandb dashboard](https://wandb.ai/wandb/wandbot-dev).
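One step of such a pipeline, splitting pulled markdown into chunks before embedding, can be sketched as follows; this is illustrative and not the repo's actual chunker:

```python
def chunk_markdown(text, max_chars=500):
    """Split a markdown document into chunks on paragraph boundaries.

    Paragraphs are kept whole; a new chunk starts whenever adding the next
    paragraph would exceed max_chars.
    """
    chunks, current = [], ""
    for para in text.split("\n\n"):
        if len(current) + len(para) + 2 > max_chars and current:
            chunks.append(current.strip())
            current = ""
        current += para + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each chunk would then be embedded and written to the vectorstore, with the intermediate stages cached to disk and logged as wandb artifacts as described above.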
68 changes: 65 additions & 3 deletions build.sh
@@ -1,4 +1,66 @@
echo "Running build.sh"
set -x # Enable command echo
set -e # Exit on error

# Debug disk usage
du -sh .
top_usage=$(du -ah . | sort -rh | head -n 20)
current_dir_usage=$(du -sm . | awk '{print $1}')
initial_disk_usage=$current_dir_usage  # baseline for the increment computed at the end
echo -e "Current directory usage: ${current_dir_usage}M"
echo -e "Top files/dirs usage: ${top_usage}\n"

# Find libstdc++ to use
for dir in /nix/store/*-gcc-*/lib64 /nix/store/*-stdenv-*/lib /nix/store/*-libstdc++*/lib; do
echo "Checking directory: $dir"  # debug logging
if [ -f "$dir/libstdc++.so.6" ]; then
export LD_LIBRARY_PATH="$dir:$LD_LIBRARY_PATH"
LIBSTDCXX_DIR="$dir"  # recorded for the debug listing below
echo "Found libstdc++.so.6 in $dir"
break
fi
done

# Create virtualenv & set up
rm -rf .venv
python3.12 -m venv wandbot_venv --clear
export VIRTUAL_ENV=wandbot_venv
export PATH="$VIRTUAL_ENV/bin:$PATH"
export PYTHONPATH="$(pwd)/src:$PYTHONPATH"

# Use uv for faster installs
pip install --no-user pip uv --upgrade

# Clear any existing installations that might conflict
rm -rf $VIRTUAL_ENV/lib/python*/site-packages/typing_extensions*
rm -rf $VIRTUAL_ENV/lib/python*/site-packages/pydantic*
rm -rf $VIRTUAL_ENV/lib/python*/site-packages/fastapi*

# Install dependencies
uv pip install -r requirements.txt --no-cache

# Re-install problematic package
uv pip install --no-cache-dir --force-reinstall typing_extensions==4.12.2

# Install app
uv pip install . --no-deps

# Check if the package is installed correctly
python -c "import wandbot; print('Wandbot package installed successfully')"

# Free up disk space
pip cache purge

mkdir -p ./data/cache

# Debug information
echo "LD_LIBRARY_PATH: $LD_LIBRARY_PATH"
ls -la $LIBSTDCXX_DIR/libstdc++.so* || true
ldd $VIRTUAL_ENV/lib/python*/site-packages/pandas/_libs/*.so || true

# Debug disk usage
du -sh .
top_usage=$(du -ah . | sort -rh | head -n 20)
current_disk_usage=$(du -sm . | awk '{print $1}')
echo -e "Current directory usage: ${current_disk_usage}M"
echo -e "Top files/dirs usage: ${top_usage}\n"
increment=$((current_disk_usage - initial_disk_usage))
echo -e "Disk usage increment: ${increment}M\n"