CALMS: Context-Aware Language Model for Science

CALMS is a retrieval- and tool-augmented large language model (LLM) that assists scientists in designing experiments around, and performing science with, complex scientific instrumentation.

Paper: https://www.nature.com/articles/s41524-024-01423-2




Getting started

  1. conda create --name calms python=3.12

  2. conda activate calms

  3. conda install pip

  4. git clone https://github.com/AdvancedPhotonSource/CALMS

  5. Navigate to the cloned folder and, with your conda environment active, install the requirements file for your OS:

    pip install --no-deps -r requirements_linux.txt (or requirements_win.txt on Windows)

  • The --no-deps flag is required because of tight version pinning in a couple of packages.
  6. Go to https://pytorch.org/get-started/locally/ and run the appropriate command to install torch.

  7. Start the app:

  • The VERY FIRST time you run each model, you will have to compute embeddings over the document stores. To do this, set init_docs = True in params.py before starting the chat app. This will take a LONG time, but it only needs to be done once per model.

  • python chat_app.py --openai

    for OpenAI models (choose which one, e.g. GPT-3.5 or GPT-4, in params.py)

(OR)

  • python chat_app.py --hf

    for open-source models (choose which one, e.g. Vicuna, in params.py)

    At least 50 GB of GPU memory is recommended for the LLaMA family of models.

Please note that you will have to provide your own OpenAI and Materials Project API keys.
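
Before launching the app, it can help to confirm that both keys are actually available. The sketch below is only an illustration: it assumes the keys are exported as environment variables named OPENAI_API_KEY and MP_API_KEY, and the helper missing_keys is hypothetical. CALMS itself may expect the keys in a different place, so check the code base for where they are read.

```python
import os

# Assumed variable names, not confirmed by this README; adjust them to
# match wherever CALMS actually reads its keys from.
REQUIRED_KEYS = ("OPENAI_API_KEY", "MP_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All expected API keys are set.")
```

Running this before python chat_app.py gives a quick warning if a key was forgotten, rather than failing later inside a chat session.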

  8. In your browser, navigate to localhost:2023 for the open-source model or localhost:2024 for the OpenAI model.

    Ports can be set in chat_app.py.
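
If the page does not load, a quick way to tell whether the app is up is to test the two ports from Python. This is a minimal sketch using only the standard library. The port numbers come from the step above, the helper is_listening is hypothetical, and it only checks that something accepts TCP connections, not that it is CALMS.

```python
import socket

# Default ports from the step above; update these if you changed them
# in chat_app.py.
PORTS = {"open-source": 2023, "openai": 2024}


def is_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for name, port in PORTS.items():
        state = "up" if is_listening(port) else "not reachable"
        print(f"{name} model on localhost:{port}: {state}")
```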



DISCLAIMER

The content presented in this paper has been generated using pre-trained Large Language Models (LLMs), specifically GPT 3.5 and Vicuna, by injecting contextual prompts into these LLM pipelines through a retrieval and augmentation tool. The generated content is reported as is, without any manipulation or alteration of the LLM outputs. The authors acknowledge that LLM-generated content may contain errors, biases, or inaccuracies, which could significantly impact the scientific workflows in which they are incorporated. It is important to note that the current code base is not production-ready and requires additional checks and balances before being used for large-scale deployment. Furthermore, the authors disclaim any responsibility or liability for the accuracy, completeness, or reliability of LLM-generated content presented in this paper.