Releases: jparkerweb/semantic-chunking
v2.4.1 - ✂️ Clean Split
What's New
2.4.1
📦 Updated
- Updated sentence splitter to use `@stdlib/nlp-sentencize`
- Updated embedding cache to use `lru-cache`
v2.4.0
✨ Added
- `sentenceit` function (split by sentence and return embeddings); see the sketch below
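A rough sketch of how the new `sentenceit` function might be called. The call shape (document-array input and an options object mirroring `chunkit`) and every name other than `sentenceit` itself are assumptions, not confirmed by these notes:

```js
// Hypothetical usage sketch -- input shape and option names are assumptions.
import { sentenceit } from 'semantic-chunking';

const documents = [
    { document_name: 'example.txt', document_text: 'First sentence. Second sentence. Third sentence.' },
];

// Split each document into sentences and return an embedding per sentence.
const sentences = await sentenceit(documents, {
    returnEmbedding: true, // include an embedding vector for each sentence
});

console.log(sentences.length, 'sentences returned');
```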
Please consider sending me a tip to support my work
💵 tip me here
• 💻 Visit eQuill Labs
• 💬 Join the Discord
v2.4.0 - ✂️ Clean Split
What's New
v2.4.0
✨ Added
- `sentenceit` function (split by sentence and return embeddings)
Please consider sending me a tip to support my work
💵 tip me here
• 💻 Visit eQuill Labs
• 💬 Join the Discord
v2.3.7 - 📦 Transformers.js v3
[2.3.1] - [2.3.7] 2024-11-25
Updated
- Updated Web UI to v1.3.1
- Updated Documentation
- Updated default values in both the library and Web UI
- Web UI defaults can be set in `webui/public/default-form-values.js`
- Only print version if logging is enabled (default is false)
  - was adding console noise to upstream applications
- Updated `string-segmenter` patch version dependency
- Misc cleanup and optimizations
[2.3.0] - 2024-11-11
Updated
- Updated `transformers.js` from v2 to v3
- Migrated quantization option from `onnxEmbeddingModelQuantized` (boolean) to `dtype` ('fp32', 'fp16', 'q8', 'q4'); see the before/after sketch below
- Updated Web UI to use new `dtype` option
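A minimal before/after sketch of this option change, using the document-array input described in the v2.1.0 notes; everything apart from the `onnxEmbeddingModelQuantized` and `dtype` option names is illustrative:

```js
import { chunkit } from 'semantic-chunking';

const documents = [{ document_name: 'doc', document_text: 'Some text to chunk.' }];

// Before (transformers.js v2): quantization was a boolean flag.
// const chunks = await chunkit(documents, { onnxEmbeddingModelQuantized: true });

// After (v2.3.0, transformers.js v3): choose a dtype instead.
const chunks = await chunkit(documents, { dtype: 'q8' }); // 'fp32' | 'fp16' | 'q8' | 'q4'
console.log(chunks.length, 'chunks');
```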
If you enjoy this package please consider sending me a tip to support my work
💵 tip me here
v2.3.4 - 📦 Transformers.js v3
[2.3.1] - [2.3.4] 2024-11-12
Updated
- Updated Web UI to v1.3.1
- Updated Documentation
- Updated default values in both the library and Web UI
- Web UI defaults can be set in `webui/public/default-form-values.js`
- Misc cleanup and optimizations
[2.3.0] - 2024-11-11
Updated
- Updated `transformers.js` from v2 to v3
- Migrated quantization option from `onnxEmbeddingModelQuantized` (boolean) to `dtype` ('fp32', 'fp16', 'q8', 'q4')
- Updated Web UI to use new `dtype` option
If you enjoy this package please consider sending me a tip to support my work
💵 tip me here
v2.2.4 - 🎯 Web UI for Tuning
[2.2.4] - 2024-11-08
Fixed
- Fixed issue with Web UI embedding cache not being cleared when a new model is initialized.
[2.2.3] - 2024-11-07
Added
- Web UI adjustments for display of truncated JSON results on screen but still allowing download of full results.
[2.2.2] - 2024-11-07
Added
- Web UI css adjustments for smaller screens
[2.2.1] - 2024-11-06
Added
- Added Highlight.js to Web UI for syntax highlighting of JSON results and code samples
- Added JSON results toggle button to turn line wrapping on/off
[2.2.0] - 2024-11-05
Added
- New Web UI tool for experimenting with semantic chunking settings
- Interactive form interface for all chunking parameters
- Real-time text processing and results display
- Visual feedback for similarity thresholds
- Model selection and configuration
- Results download in JSON format
- Code generation for settings
- Example texts for testing
- Dark mode interface
- Added `excludeChunkPrefixInResults` option to `chunkit` and `cramit` functions (see the sketch below)
  - Allows removal of chunk prefix from final results while maintaining prefix for embedding calculations
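A minimal sketch of how `excludeChunkPrefixInResults` might be combined with `chunkPrefix`; the option names come from these notes, while the document contents and everything else shown is illustrative:

```js
import { chunkit } from 'semantic-chunking';

const documents = [{ document_name: 'guide', document_text: 'Text to be chunked for retrieval.' }];

const chunks = await chunkit(documents, {
    chunkPrefix: 'search_document: ',  // prefix applied while computing embeddings
    excludeChunkPrefixInResults: true, // strip the prefix from the returned chunk text
    returnEmbedding: true,
});

// chunk.text no longer starts with "search_document: ",
// but its embedding was computed with the prefix applied.
console.log(chunks[0].text);
```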
Updated
- Improved error handling and feedback in chunking functions
- Enhanced documentation with Web UI usage examples
- Added more embedding models to supported list
Fixed
- Fixed issue with chunk prefix handling in embedding calculations
- Improved token length calculation reliability
If you enjoy this package please consider sending me a tip to support my work
💵 tip me here
v2.2.0 - 🎯 Web UI for Tuning
[2.2.0] - 2024-11-05
Added
- New Web UI tool for experimenting with semantic chunking settings
- Interactive form interface for all chunking parameters
- Real-time text processing and results display
- Visual feedback for similarity thresholds
- Model selection and configuration
- Results download in JSON format
- Code generation for settings
- Example texts for testing
- Dark mode interface
- Added `excludeChunkPrefixInResults` option to `chunkit` and `cramit` functions
  - Allows removal of chunk prefix from final results while maintaining prefix for embedding calculations
If you enjoy this package please consider sending me a tip to support my work
💵 tip me here
v2.1.4 - Rich Output
What's New
[2.1.4] - 2024-11-04
Updated
- Updated README `cramit` example script to use updated document object input format.
[2.1.3] - 2024-11-04
Fixed
- Fixed `cramit` function to properly pack sentences up to `maxTokenSize` (see the sketch below)
Updated
- Improved chunk creation logic to better handle both chunkit and cramit modes
- Enhanced token size calculation efficiency
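A short sketch of `cramit` usage in light of this fix; `cramit`, `maxTokenSize`, `returnTokenLength`, and the `tokenLength` property are named in these notes, while the input document and option values are illustrative:

```js
import { cramit } from 'semantic-chunking';

const documents = [{ document_name: 'notes', document_text: 'Sentence one. Sentence two. Sentence three. Sentence four.' }];

// cramit packs consecutive sentences into each chunk until maxTokenSize is reached,
// without the similarity-based splitting that chunkit performs.
const chunks = await cramit(documents, {
    maxTokenSize: 300,       // pack sentences up to this many tokens per chunk
    returnTokenLength: true, // report each chunk's token count in the output
});

for (const chunk of chunks) {
    console.log(chunk.tokenLength, chunk.text.slice(0, 40));
}
```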
[2.1.2] - 2024-11-04
Fixed
- Improved semantic chunking accuracy with stricter similarity thresholds
- Enhanced logging in similarity calculations for better debugging
- Fixed chunk creation to better respect semantic boundaries
Updated
- Default similarity threshold increased to 0.5
- Default dynamic threshold bounds adjusted (0.4 - 0.8)
- Improved chunk rebalancing logic with similarity checks
- Updated logging for similarity scores between sentences
[2.1.1] - 2024-11-01
Updated
- Updated example scripts in README.
[2.1.0] - 2024-11-01
Updated
⚠️ BREAKING: Input format now accepts array of document objects
- Output array of chunks extended with the following new properties (one chunk object is sketched below):
  - `document_id`: Timestamp in milliseconds when processing started
  - `document_name`: Original document name or ""
  - `number_of_chunks`: Total number of chunks for the document
  - `chunk_number`: Current chunk number (1-based)
  - `model_name`: Name of the embedding model used
  - `is_model_quantized`: Whether the model is quantized
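An illustrative shape for one element of the returned chunks array; the property names come from the list above, the values are invented:

```js
// Example of a single chunk object in the returned array (values invented).
const exampleChunk = {
    document_id: 1730419200000,            // timestamp (ms) when processing started
    document_name: 'example.txt',          // original document name or ""
    number_of_chunks: 4,                   // total chunks produced for this document
    chunk_number: 1,                       // 1-based index of this chunk
    model_name: 'Xenova/all-MiniLM-L6-v2', // embedding model used (assumed value)
    is_model_quantized: true,              // whether the model is quantized
    text: 'First semantic chunk of the document...',
};
```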
[2.0.0] - 2024-11-01
Added
- Added `returnEmbedding` option to `chunkit` and `cramit` functions to include embeddings in the output.
- Added `returnTokenLength` option to `chunkit` and `cramit` functions to include token length in the output.
- Added `chunkPrefix` option to prefix each chunk with a task instruction (e.g., "search_document: ", "search_query: ").
- Updated README to document new options and add RAG tips for using `chunkPrefix` with embedding models that support task prefixes.
Updated
⚠️ BREAKING: Returned array of chunks is now an array of objects with `text`, `embedding`, and `tokenLength` properties. Previous versions returned an array of strings.
If you find this library useful please consider sending me a tip to support my work
💵 tip me here
v2.1.0 - Rich Output
What's New
[2.1.0] - 2024-11-01
Updated
⚠️ BREAKING: Input format now accepts array of document objects
- Output array of chunks extended with the following new properties:
  - `document_id`: Timestamp in milliseconds when processing started
  - `document_name`: Original document name or ""
  - `number_of_chunks`: Total number of chunks for the document
  - `chunk_number`: Current chunk number (1-based)
  - `model_name`: Name of the embedding model used
  - `is_model_quantized`: Whether the model is quantized
If you find this library useful please consider sending me a tip to support my work
💵 tip me here
v2.0.0 - Embeddings, Tokens and Prefixes Oh My!
What's New
Added
- Added `returnEmbedding` option to `chunkit` and `cramit` functions to include embeddings in the output.
- Added `returnTokenLength` option to `chunkit` and `cramit` functions to include token length in the output.
- Added `chunkPrefix` option to prefix each chunk with a task instruction (e.g., "search_document: ", "search_query: ").
- Updated README to document new options and add RAG tips for using `chunkPrefix` with embedding models that support task prefixes.
⚠️ Breaking Change
- Returned array of chunks is now an array of objects with `text`, `embedding`, and `tokenLength` properties. Previous versions returned an array of strings (see the usage sketch below).
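A hedged sketch of these options used together, shown with the document-array input introduced later in v2.1.0; the option and property names come from the notes above, while the document contents and values are illustrative:

```js
import { chunkit } from 'semantic-chunking';

const documents = [{ document_name: 'faq', document_text: 'How do I install it? Run npm install semantic-chunking.' }];

const chunks = await chunkit(documents, {
    returnEmbedding: true,            // include an embedding vector per chunk
    returnTokenLength: true,          // include each chunk's token length
    chunkPrefix: 'search_document: ', // task instruction prepended to each chunk
});

// Each element is now an object rather than a plain string.
const { text, embedding, tokenLength } = chunks[0];
console.log(tokenLength, embedding.length, text);
```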
If you find this library useful please consider sending me a tip to support my work
💵 tip me here
v1.4.0
What's Changed
- breakup main chunkit file into modules by @jparkerweb in #6
Full Changelog: 1.3.0...1.4.0
If you enjoy this plugin please consider sending me a tip to support my work
💵 tip me here