Generalizes GPT3CompletionModel to work with other providers, adds Anthropic #71
base: main
Conversation
…der, updated docs and tests accordingly.
…ffecting later tests
…oved spaces in checked response to identify replacement 'break down' as keyword 'breakdown'.
1. If you haven't already, [make an OpenAI account](https://openai.com/api/) and [create an API key](https://platform.openai.com/api-keys).
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `OPENAI_API_KEY` and paste in your API key as the secret.
1. If you haven't already, follow the directions above to create an account and get an API key for your chosen model provider.
1. In your fork's "⚙️ Settings" tab, make a new Actions repository secret with the name `<PROVIDER>_API_KEY` and paste in your API key as the secret. Replace `<PROVIDER>` with the uppercase name of your chosen provider, e.g. `OPENAI` or `ANTHROPIC`.
We might want to make a PR to add the Anthropic key to the rootstock workflow here:
https://github.com/manubot/rootstock/blob/main/.github/workflows/ai-revision.yaml#L59
If at some point in the future we theoretically support like a dozen or more services, maybe we just instruct the user to update their ai-revision workflow accordingly for whatever services they're using.
Excellent point; I've converted this PR into a draft until I figure out the implications upstream, including the one you raised. I'm wondering if we should relax the requirement that `<PROVIDER>_API_KEY` exists and has a non-empty value for every provider, and just check that it's valid when we actually use it to query the API.
I don't know how many services we'll end up providing, but ideally we won't have to make PRs in multiple repos to support the changes going forward. Let me think on it; perhaps we can take in a value in a structured format from rootstock for all the AI Editor options, and the definition of that format can be in this repo, too.
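To illustrate the lazy-check idea, a minimal sketch (the helper name and error message here are illustrative, not code from this PR):

```python
import os


def require_api_key(provider: str) -> str:
    """Look up e.g. OPENAI_API_KEY or ANTHROPIC_API_KEY only at query time,
    so configuring one provider never forces a key for every other provider."""
    env_var = f"{provider.upper()}_API_KEY"
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; it is required to query the {provider} API.")
    return key
```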
I can take care of that small rootstock PR. Per our discussion, we'll add:
- comment above workflow step saying something like "duplicate step as necessary to use different providers"
- rename "open ai key" var to just "ai key"
- add provider env var
Nice job, wanted to add some comments in case they're helpful along the journey here.
…support whichever model providers LangChain supports. That said, we currently support OpenAI and Anthropic models only, and are working to add support for other model providers.

When using OpenAI models, [our evaluations](https://github.com/pivlab/manubot-ai-editor-evals) show that `gpt-4-turbo` currently performs best.
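For context on the LangChain-backed providers mentioned above, a rough sketch of mapping a provider name to a chat model class; the registry and function below are illustrative, not the repository's actual `MODEL_PROVIDERS` definition, and they assume the `langchain-openai` and `langchain-anthropic` packages are installed:

```python
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

# Illustrative registry only; the real mapping lives in
# manubot_ai_editor.models.MODEL_PROVIDERS and may be shaped differently.
PROVIDERS = {
    "openai": (ChatOpenAI, "gpt-4-turbo"),
    "anthropic": (ChatAnthropic, "claude-3-opus-20240229"),
}


def make_chat_model(provider: str, model: str | None = None):
    try:
        cls, default_model = PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown model provider: {provider!r}") from None
    # Each class reads its own key (OPENAI_API_KEY / ANTHROPIC_API_KEY) from the environment.
    return cls(model=model or default_model)
```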
Slightly outside the bounds of this PR: I wondered if versioning the evals could make sense (perhaps through a DOI per finding or maybe through the poster which was shared). There could come a time (probably sooner than we think) that GPT-4-Turbo isn't available or relevant.
That's a good point; I wonder if we should move the statement about which model was best in evaluation to the https://github.com/pivlab/manubot-ai-editor-evals repo, so that it can be updated without having to keep this repo up to date as well. I suppose @vincerubinetti and @miltondp might have opinions there, since they're the primary contributors on the evals repo.
Co-authored-by: Dave Bunten <[email protected]>
We need to figure out how we're going to handle the additional API key environment variables for new providers, since they require updates to rootstock as @vincerubinetti mentioned, and might quickly get unmanageable as the number of providers we support grows. I'd be in favor of resolving the API key like so:
Happy to hear differing opinions, of course!
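One possible shape for that resolution, sketched with illustrative names rather than as the exact proposal from the discussion: an explicit argument wins, then a provider-specific environment variable, then a generic fallback, with any failure deferred until the provider is actually queried.

```python
import os


def resolve_api_key(provider: str, api_key: str | None = None) -> str | None:
    """Hypothetical resolution order for a provider's API key."""
    return (
        api_key  # an explicitly passed key always wins
        or os.environ.get(f"{provider.upper()}_API_KEY")  # e.g. ANTHROPIC_API_KEY
        or os.environ.get("AI_API_KEY")  # generic fallback; the name is illustrative
        or None  # a missing key only becomes an error when the provider is queried
    )
```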
This PR generalizes `GPT3CompletionModel` to work with API clients for other model providers. The class now takes a `model_provider` string parameter, which must be a valid key in the `manubot_ai_editor.models.MODEL_PROVIDERS` dictionary. Explicit references to OpenAI have been generalized to apply to other model providers, e.g. the `openai_api_key` parameter is now just `api_key`. `GPT3CompletionModel` now supports Anthropic as a second model provider, and more can be added by extending the `MODEL_PROVIDERS` dict mentioned previously.

The PR modifies the "cost" end-to-end test `tests.test_prompt_config.test_prompts_apply_gpt3` to also check Anthropic. To run the tests against both OpenAI and Anthropic, be sure that you've exported both `OPENAI_API_KEY` and `ANTHROPIC_API_KEY` with valid API keys for each, then run `poetry run pytest --runcost` to run the end-to-end tests.

End-to-end test tweaks: Note that the "cost" test always has the potential to break, since the LLM doesn't always obey the prompt's request to insert a special keyword into the text. This morning, the OpenAI test was unable to add "bottle" to the "abstract" section, so I changed it to "violin", which appeared to pass. Also, it was inserting the keyword "breakdown" as "break down", so I modified the test to remove the spaces in the response before checking for the keyword.
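The space-stripping check described above amounts to something like this simplified sketch (not the exact test code):

```python
def response_contains_keyword(response: str, keyword: str) -> bool:
    # Strip spaces so a response containing "break down" still matches the
    # requested keyword "breakdown" (simplified from the actual test change).
    return keyword in response.replace(" ", "")


assert response_contains_keyword("We break down the methods in detail.", "breakdown")
```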
Documentation: I've gone through the README and tried to tweak it to explain that we now support multiple model providers, but it may require further tweaking. Also, I'm unsure if "model provider" is the preferred term for companies like OpenAI and Anthropic that provide APIs to query LLMs, or if we should use something else; feedback appreciated!