Generalizes GPT3CompletionModel to work with other providers, adds Anthropic #71
base: main
Conversation
…der, updated docs and tests accordingly.
…ffecting later tests
…oved spaces in checked response to identify replacement 'break down' as keyword 'breakdown'.
1. If you haven't already, [make an OpenAI account](https://openai.com/api/) and [create an API key](https://platform.openai.com/api-keys).
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `OPENAI_API_KEY` and paste in your API key as the secret.
1. If you haven't already, follow the directions above to create an account and get an API key for your chosen model provider.
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `<PROVIDER>_API_KEY` and paste in your API key as the secret. Replace `<PROVIDER>` with the uppercase name of your chosen provider, e.g. `OPENAI` or `ANTHROPIC`.
We might want to make a PR to add the Anthropic key to the rootstock workflow here:
https://github.com/manubot/rootstock/blob/main/.github/workflows/ai-revision.yaml#L59
If at some point in the future we theoretically support like a dozen or more services, maybe we just instruct the user to update their ai-revision workflow accordingly for whatever services they're using.
Excellent point; I've converted this PR into a draft until I figure out the implications upstream, including the one you raised. I'm wondering if we should relax the requirement that `<PROVIDER>_API_KEY` exists and has a non-empty value for every provider, and just check that it's valid when we actually use it to query the API.
I don't know how many services we'll end up providing, but ideally we won't have to make PRs in multiple repos to support the changes going forward. Let me think on it; perhaps we can take in a value in a structured format from rootstock for all the AI Editor options, and the definition of that format can be in this repo, too.
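To illustrate the lazy-check idea, a minimal sketch (the helper name and error message here are illustrative, not code from this PR):

```python
import os


def require_api_key(provider: str) -> str:
    """Look up e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY only at query time,
    so configuring one provider never forces a key for every other provider."""
    env_var = f"{provider.upper()}_API_KEY"
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; it is required to query the {provider} API.")
    return key
```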
I can take care of that small rootstock PR. Per our discussion, we'll add:
- comment above workflow step saying something like "duplicate step as necessary to use different providers"
- rename "open ai key" var to just "ai key"
- add provider env var
Nice job, wanted to add some comments in case they're helpful along the journey here.
…support whichever model providers LangChain supports. That said, we currently support OpenAI and Anthropic models only, and are working to add support for other model providers.

When using OpenAI models, [our evaluations](https://github.com/pivlab/manubot-ai-editor-evals) show that `gpt-4-turbo` currently performs best.
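For context on the LangChain-backed providers mentioned above, a rough sketch of mapping a provider name to a chat model class; the registry and function below are illustrative, not the repository's actual `MODEL_PROVIDERS` definition, and they assume the `langchain-openai` and `langchain-anthropic` packages are installed:

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# Illustrative registry only; the real mapping lives in
# manubot_ai_editor.models.MODEL_PROVIDERS and may be shaped differently.
PROVIDERS = {
    "openai": (ChatOpenAI, "gpt-4-turbo"),
    "anthropic": (ChatAnthropic, "claude-3-opus-20240229"),
}


def make_chat_model(provider: str, model: str | None = None):
    try:
        cls, default_model = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown model provider: {provider!r}") from None
    # Each class reads its own key (OPENAI_API_KEY / ANTHROPIC_API_KEY) from the environment.
    return cls(model=model or default_model)
```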
Slightly outside the bounds of this PR: I wondered if versioning the evals could make sense (perhaps through a DOI per finding or maybe through the poster which was shared). There could come a time (probably sooner than we think) that GPT-4-Turbo isn't available or relevant.
That's a good point; I wonder if we should move the statement about which model was best in evaluation to the https://github.com/pivlab/manubot-ai-editor-evals repo, so that it can be updated without having to keep this repo up to date as well. I suppose @vincerubinetti and @miltondp might have opinions there, since they're the primary contributors on the evals repo.
Co-authored-by: Dave Bunten <[email protected]>
We need to figure out how we're going to handle the additional API key environment variables for new providers, since they require updates to rootstock as @vincerubinetti mentioned, and might quickly get unmanageable as the number of providers we support grows. I'd be in favor of resolving the API key like so:
Happy to hear differing opinions, of course!
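One possible shape for that resolution, sketched with illustrative names rather than as the exact proposal from the discussion: an explicit argument wins, then a provider-specific environment variable, then a generic fallback, with any failure deferred until the provider is actually queried.

```python
import os


def resolve_api_key(provider: str, api_key: str | None = None) -> str | None:
    """Hypothetical resolution order for a provider's API key."""
    return (
        api_key  # an explicitly passed key always wins
        or os.environ.get(f"{provider.upper()}_API_KEY")  # e.g. ANTHROPIC_API_KEY
        or os.environ.get("AI_API_KEY")  # generic fallback; the name is illustrative
        or None  # a missing key only becomes an error when the provider is queried
    )
```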
This PR generalizes `GPT3CompletionModel` to work with API clients for other model providers. The class now takes a `model_provider` string parameter, which must be a valid key in the `manubot_ai_editor.models.MODEL_PROVIDERS` dictionary. Explicit references to OpenAI have been generalized to apply to other model providers, e.g. the `openai_api_key` parameter is now just `api_key`. `GPT3CompletionModel` now supports Anthropic as a second model provider, and more can be added by extending the `MODEL_PROVIDERS` dict mentioned previously.

The PR modifies the "cost" end-to-end test `tests.test_prompt_config.test_prompts_apply_gpt3` to also check Anthropic. To run the tests against both OpenAI and Anthropic, be sure that you've exported both `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` with valid API keys for each, then run `poetry run pytest --runcost` to run the end-to-end tests.

End-to-end test tweaks: Note that the "cost" test always has the potential to break, since the LLM doesn't always obey the prompt's request to insert a special keyword into the text. This morning, the OpenAI test was unable to add "bottle" to the "abstract" section, so I changed it to "violin", which appeared to pass. Also, it was inserting the keyword "breakdown" as "break down", so I modified the test to remove the spaces in the response before checking for the keyword.
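The space-stripping check described above amounts to something like this simplified sketch (not the exact test code):

```python
def response_contains_keyword(response: str, keyword: str) -> bool:
    # Strip spaces so a response containing "break down" still matches the
    # requested keyword "breakdown" (simplified from the actual test change).
    return keyword in response.replace(" ", "")


assert response_contains_keyword("We break down the methods in detail.", "breakdown")
```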
Documentation: I've gone through the README and tried to tweak it to explain that we now support multiple model providers, but it may require further tweaking. Also, I'm unsure if "model provider" is the preferred term for companies like OpenAI and Anthropic that provide APIs to query LLMs, or if we should use something else; feedback appreciated!