Add Outlines #1364
base: main

Conversation
Force-pushed from 8cb052d to 26b14b9.
We should allow functions that are not decorated, such as:

```python
def build_prompt(a: int) -> str:
    return f"What is {a} squared?"
```

because we do not want to force the use of templates on users.
Also, you can deprecate the code in
outlines/outline.py (Outdated)

```python
prompt = self.template(*args)

# Generate the response using the model
response = self.model.generate(prompt)
```
This should use either `outlines.generate.json`, `outlines.generate.regex`, etc. The interface is necessarily going to be brittle for now. You can take a look at the v1 branch to see how this is going to be more robust with the ongoing refactor.
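For context, here is a rough sketch of how those two entry points are typically used in the pre-v1 API; the model name, schema, and prompts below are placeholders, not part of this PR:

```python
import outlines
from pydantic import BaseModel

class Answer(BaseModel):
    result: int

model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Constrain the output to JSON matching the Answer schema
json_generator = outlines.generate.json(model, Answer)
answer = json_generator("What is 3 squared? Reply in JSON.")

# Or constrain the output with a regular expression
int_generator = outlines.generate.regex(model, r"[+-]?[0-9]+")
number = int_generator("What is 3 squared? Reply with digits only.")
```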
Hmm, okay, but which one of those should I use by default? And how should I let users switch to the other one? By adding a fourth optional argument, e.g. `generation_kind`? WDYT?
I just realized that maybe I misunderstood the issue, and this PR should infer the correct regex to match the user's required output type. E.g., if a user asks for an `int`, I should ask `outlines.generate.regex` for something like `^[+-]?[0-9]+$`. IIUC, I'm unsure how to scale this to arbitrary return types. Should we define a list of supported basic return types? Using `ast.literal_eval` was the most flexible solution I came up with, but I'm not sure how well it handles custom data structures. In such cases (when the return type isn't in the list of straightforward Python builtins), I think we should check that users provide types that implement deserialization from JSON (and then use `outlines.generate.json`). WDYT?
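To make the "list of supported basic return types" idea concrete, here is a minimal sketch of the kind of mapping being discussed; the patterns and names are illustrative, not something the library ships:

```python
# Illustrative only: map a few Python builtins to regexes that could be
# handed to outlines.generate.regex; anything else would fall back to
# JSON-based generation.
TYPE_TO_REGEX = {
    int: r"[+-]?[0-9]+",
    float: r"[+-]?([0-9]*[.])?[0-9]+",
    bool: r"(True|False)",
}

def regex_for(return_type):
    """Return a regex for a supported builtin, or None to signal a JSON fallback."""
    return TYPE_TO_REGEX.get(return_type)
```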
Let's only accept Pydantic models / JSON Schema strings for now and use `outlines.generate.json`. We'll be able to easily generalise after we push the refactor in the v1 branch.
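A minimal sketch of what that decision implies for the class in this PR; the attribute names follow the diff snippets in this thread, and since whether `generate.json` returns a raw string or an already-parsed object depends on the outlines version, the explicit parsing step is illustrative:

```python
import json
import outlines

class Outline:
    """Sketch only: output_type is a Pydantic model class or a JSON Schema string."""

    def __init__(self, model, template, output_type):
        self.model = model
        self.template = template
        self.output_type = output_type

    def __call__(self, *args):
        prompt = self.template(*args)
        generator = outlines.generate.json(self.model, self.output_type)
        response = generator(prompt)
        if isinstance(response, str):
            # A JSON Schema string constraint yields raw JSON text
            if isinstance(self.output_type, str):
                return json.loads(response)
            return self.output_type.model_validate_json(response)
        return response
```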
I've updated the PR (and the usage example in the docstring). Does it still look like what we want?
Force-pushed from 26b14b9 to e328144.
That's already how this PR works; I just wanted to confirm that this was the intended design and not an omission!
Should I handle that in this PR or a separate one? By "deprecate", do you mean adding a user-facing warning when it's used, rather than removing it entirely?
We can do it in this PR. Let's add a user-facing warning, and announce that it will be removed in favour of the
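A minimal sketch of the kind of user-facing warning being discussed; the function name is a placeholder, since the exact entry point being deprecated is not spelled out in this thread:

```python
import warnings

def deprecated_entry_point(*args, **kwargs):
    """Placeholder for the code path being deprecated in this PR."""
    warnings.warn(
        "deprecated_entry_point is deprecated and will be removed in a "
        "future release; use the replacement API instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # ... existing behaviour continues below ...
```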
Force-pushed from ddd5409 to 214a2f3.
Force-pushed from b2512e9 to 9a9acba.
Force-pushed from 9a9acba to 90361fa.
outlines/outline.py (Outdated)

```python
try:
    if isinstance(self.output_type, str):
        return json.loads(response)
    return self.output_type.model_validate_json(response)
```
Let's instantiate the model with the JSON data, like we do here to keep a consistent interface.
I got this linter warning:

> The method "parse_raw" in class "BaseModel" is deprecated. The `parse_raw` method is deprecated; if your data is JSON use `model_validate_json`, otherwise load the data then use `model_validate` instead. (Pylance)
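For reference, a generic sketch of the Pydantic v2 equivalents the warning points to; `Answer` is a placeholder model, not code from this PR:

```python
import json
from pydantic import BaseModel

class Answer(BaseModel):
    result: int

raw = '{"result": 9}'

# Pydantic v1 style, now deprecated:
# answer = Answer.parse_raw(raw)

# Pydantic v2: validate JSON text directly...
answer = Answer.model_validate_json(raw)

# ...or load the data first, then validate the resulting dict.
answer = Answer.model_validate(json.loads(raw))
```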
You're right, my bad
Force-pushed from 1892f30 to aa4a904.
I'm a bit frustrated by this failing test: I can't reproduce it locally, and I'm not sure how to work around it :/
Update the tests to not use the deprecated module. (Nice to see the deprecation working properly, though!)
Also, I found a mention in the quickstart docs that needs to be updated too: lines 186 to 190 in 79100b2.
Force-pushed from aa4a904 to 7991d18.
I'm definitely struggling with this issue. Thanks to action-tmate, I was able to get an interactive shell in a GitHub Actions job, but I'm not sure what to do next. Maybe I should compare my local and CI `pip freeze` outputs? I'd really like to avoid this kind of issue in the future; I've tried to address it in #1387!
Can you run
And the `output_type` should now be a Pydantic model or a JSON Schema `str`.
Since the module was deprecated, the tests were failing with a warning.
Thanks for the tips! I ran it inside the GitHub Actions job and got the following traceback:

```
outlines/fsm/guide.py:76: in cached_create_states_mapping
    return uncached_create_states_mapping(regex_string, tokenizer, *args, **kwargs)
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/outlines_core/fsm/guide.py:141: in create_states_mapping
    return create_states_mapping_from_fsm(regex_fsm, tokenizer, frozen_tokens)
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/outlines_core/fsm/guide.py:178: in create_states_mapping_from_fsm
    states_to_token_maps, empty_token_ids = create_fsm_index_tokenizer(
/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/outlines_core/fsm/regex.py:473: in create_fsm_index_tokenizer
    tokens_to_token_ids, empty_token_ids = reduced_vocabulary(tokenizer)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

tokenizer = <Mock name='mock.tokenizer' id='140081937198256'>

    @lru_cache
    def reduced_vocabulary(
        tokenizer,
    ) -> Tuple[Dict[str, List[int]], Set[int]]:
        """Create a map from decoded vocabulary tokens to lists of equivalent token ids."""
        # TODO FIXME: See if we can get the underlying Rust tokenizers from HF and
        # do all this in Rust
        empty_token_ids = set()
        vocabulary: Dict[str, List[int]] = {}
>       for token, token_idx in tokenizer.vocabulary.items():
E       TypeError: 'Mock' object is not iterable

/opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/outlines_core/fsm/regex.py:394: TypeError
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> entering PDB >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PDB post_mortem (IO-capturing turned off) >>>>>>>>>>>>>>>>>
> /opt/hostedtoolcache/Python/3.10.16/x64/lib/python3.10/site-packages/outlines_core/fsm/regex.py(394)reduced_vocabulary()
-> for token, token_idx in tokenizer.vocabulary.items():
(Pdb) p tokenizer
<Mock name='mock.tokenizer' id='140081937198256'>
(Pdb) p dir(tokenizer)
['assert_any_call', 'assert_called', 'assert_called_once', 'assert_called_once_with', 'assert_called_with', 'assert_has_calls', 'assert_not_called', 'attach_mock', 'call_args', 'call_args_list', 'call_count', 'called', 'configure_mock', 'method_calls', 'mock_add_spec', 'mock_calls', 'reset_mock', 'return_value', 'side_effect', 'vocabulary']
(Pdb) p tokenizer.vocabulary
<Mock name='mock.tokenizer.vocabulary' id='140081937198736'>
(Pdb) p tokenizer.vocabulary.items
<Mock name='mock.tokenizer.vocabulary.items' id='140081937198832'>
```

This made me realize that
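As a sketch of what that traceback implies for the test setup (the names follow the traceback; the actual fix in this PR may differ), the mocked tokenizer would need a real `vocabulary` mapping so `reduced_vocabulary` can iterate over it:

```python
from unittest.mock import MagicMock

tokenizer = MagicMock()
# Give the mock a real dict so tokenizer.vocabulary.items() is iterable
# instead of returning another bare Mock.
tokenizer.vocabulary = {"a": 0, "b": 1}

for token, token_idx in tokenizer.vocabulary.items():
    print(token, token_idx)
```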
Force-pushed from 8bf90a8 to 535cd5a.
Fix #1346. WDYT about the integration with `outlines.prompt`? (Here, the `template` argument is just a function that returns a `str`, so it could be either an `@outlines.prompt`-decorated function or a regular one.)