Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Sid Mohan authored and Sid Mohan committed Sep 20, 2024
1 parent ccff087, commit 5ee1788
Showing 1 changed file with 6 additions and 91 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -35,97 +35,6 @@ curl -X POST "http://localhost:8000/extract-pii" \ | |
-d '{"content": "My name is John Doe and my email is [email protected]. My phone number is 123-456-7890."}' | ||
``` | ||
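For reference, the same request can be built from Python with only the standard library. This is a minimal sketch: the endpoint and the `content` field come from the curl command above, while the `Content-Type` header and the helper names `build_extract_pii_request` and `send` are assumptions, since the header lines of the curl call are not shown in this hunk.

```python
import json
import urllib.request

# Endpoint taken from the curl example; adjust host and port for your deployment.
EXTRACT_PII_URL = "http://localhost:8000/extract-pii"


def build_extract_pii_request(content: str, url: str = EXTRACT_PII_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying {"content": ...} as JSON."""
    body = json.dumps({"content": content}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        # Assumption: the service expects a JSON body, as the -d payload suggests.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `send(build_extract_pii_request("My name is John Doe."))` requires the service to be running locally; the response format is not described here, so the sketch returns it unparsed.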
|
||
A default config file containing the model, tokenizer, and regex_pattern settings is created in your working directory. | ||
|
||
You can see the contents of that file by typing: | ||
|
||
``` | ||
datafog-instructor show-fogprint | ||
``` | ||
|
||
What is a fogprint? A fogprint is a template that you can re-use, with specific configuration settings for the models, filenames, model_ids, and other important information to instruct an LLM to detect entities. This file is currently saved as fogprint.json. | ||
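Because the fogprint is stored as JSON, it can also be inspected from Python. The sketch below assumes only that `fogprint.json` holds a JSON object; it makes no assumption about field names, and the helpers `load_fogprint` and `summarize` are hypothetical, not part of datafog-instructor.

```python
import json
from pathlib import Path


def load_fogprint(path: str = "fogprint.json") -> dict:
    """Read a fogprint file and return its contents as a dictionary."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return data


def summarize(fogprint: dict) -> list:
    """Return one 'key: value' line per top-level setting, sorted by key."""
    return [f"{key}: {fogprint[key]!r}" for key in sorted(fogprint)]
```

Printing `"\n".join(summarize(load_fogprint()))` gives a quick view of the current settings without relying on the CLI.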
|
||
### Verify the installation: | ||
|
||
``` | ||
datafog-instructor list-entities | ||
``` | ||
|
||
You should see a list of default entity types: PERSON, COMPANY, LOCATION, and ORG. | ||
|
||
## Sample Operations | ||
|
||
### Detect Entities in Text | ||
|
||
``` | ||
datafog-instructor detect-entities --prompt "Apple Inc. was founded by Steve Jobs in Cupertino, California." | ||
``` | ||
|
||
This will output a table of detected entities, their positions, and types. | ||
|
||
### Display Current Configuration | ||
|
||
``` | ||
datafog-instructor show-fogprint | ||
``` | ||
|
||
This command will show you the current configuration stored in `fogprint.json`. | ||
|
||
### Reinitialize with Custom Settings | ||
|
||
To change the default model or pattern: | ||
|
||
1. Edit the `fogprint.json` file directly, or | ||
2. Use the `init` command with the `--force` flag: | ||
|
||
``` | ||
datafog-instructor init --force | ||
``` | ||
|
||
Follow the prompts to update your configuration. | ||
|
||
## Advanced Usage | ||
|
||
- Adjust the maximum number of tokens generated: | ||
|
||
``` | ||
datafog-instructor detect-entities --prompt "Your text here" --max-new-tokens 100 | ||
``` | ||
|
||
- For batch processing or integration into your Python projects, import the `EntityDetector` class from `models.py`. | ||
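
The `EntityDetector` API is not documented in this README, so one alternative for batch processing is to drive the CLI from Python. This is a minimal sketch that uses only the `detect-entities` command and the `--prompt` and `--max-new-tokens` options shown above; the helper names are hypothetical, and the CLI output is returned as plain text because its exact format is not specified here.

```python
import subprocess
from typing import Iterable, List, Optional


def build_detect_command(prompt: str, max_new_tokens: Optional[int] = None) -> List[str]:
    """Build the argv list for one detect-entities call."""
    argv = ["datafog-instructor", "detect-entities", "--prompt", prompt]
    if max_new_tokens is not None:
        argv += ["--max-new-tokens", str(max_new_tokens)]
    return argv


def detect_batch(prompts: Iterable[str], max_new_tokens: Optional[int] = None) -> List[str]:
    """Run the CLI once per prompt and collect each call's standard output."""
    outputs = []
    for prompt in prompts:
        result = subprocess.run(
            build_detect_command(prompt, max_new_tokens),
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the CLI exits non-zero
        )
        outputs.append(result.stdout)
    return outputs
```

Passing the arguments as a list avoids shell quoting problems when prompts contain quotes or other special characters.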
|
||
## Development and Testing | ||
|
||
For development purposes, you can install additional dependencies: | ||
|
||
``` | ||
python -m venv venv && source venv/bin/activate && pip install -r requirements-dev.txt | ||
``` | ||
|
||
## Documentation | ||
|
||
||
To build the documentation locally: | ||
|
||
||
``` | ||
pip install datafog-instructor[docs] | ||
cd docs | ||
sphinx-build -b html . _build/html | ||
``` | ||
|
||
||
The documentation will be available in the `docs/_build/html` directory. | ||
|
||
||
## Contributing | ||
|
||
Contributions to the DataFog Instructor SDK are welcome! Please feel free to submit a Pull Request. | ||
|
@@ -138,6 +47,12 @@ This project is licensed under the MIT License. | |
|
||
If you encounter any problems or have any questions, please open an issue on the GitHub repository or join our Discord community at https://discord.gg/bzDth394R4. | ||
|
||
## Acknowledgements | ||
|
||
- Logfire: https://logfire.pydantic.dev | ||
- Pydantic: https://pydantic.dev | ||
- Instructor: https://github.com/jxnl/instructor | ||
|
||
## Links | ||
|
||
- Homepage: https://datafog.ai | ||
|