v2.1.4 - Rich Output
What's New
[2.1.4] - 2024-11-04
Updated
- Updated README `cramit` example script to use the updated document object input format.
[2.1.3] - 2024-11-04
Fixed
- Fixed `cramit` function to properly pack sentences up to `maxTokenSize`
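
The packing behavior described in this fix can be pictured with a minimal sketch. It is not the library's implementation: `countTokens` and `packSentences` are hypothetical helpers, and the word-count tokenizer is a stand-in assumption for whatever tokenizer matches your embedding model.

```javascript
// Hypothetical helper: approximates token count by whitespace-separated words.
// A real tokenizer for the chosen embedding model would replace this.
function countTokens(text) {
  return text.split(/\s+/).filter(Boolean).length;
}

// Greedily pack consecutive sentences into chunks of at most maxTokenSize tokens.
// A single sentence longer than maxTokenSize still becomes its own (oversized) chunk.
function packSentences(sentences, maxTokenSize) {
  const chunks = [];
  let current = [];
  let currentTokens = 0;

  for (const sentence of sentences) {
    const tokens = countTokens(sentence);
    // Flush the current chunk when adding this sentence would exceed the limit.
    if (current.length > 0 && currentTokens + tokens > maxTokenSize) {
      chunks.push(current.join(" "));
      current = [];
      currentTokens = 0;
    }
    current.push(sentence);
    currentTokens += tokens;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}

const packed = packSentences(
  ["One two three.", "Four five.", "Six seven eight nine.", "Ten."],
  5
);
console.log(packed);
```

Each chunk is filled until the next sentence would push it past the limit, which is the sense in which sentences are packed "up to" the maximum.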
Updated
- Improved chunk creation logic to better handle both chunkit and cramit modes
- Enhanced token size calculation efficiency
[2.1.2] - 2024-11-04
Fixed
- Improved semantic chunking accuracy with stricter similarity thresholds
- Enhanced logging in similarity calculations for better debugging
- Fixed chunk creation to better respect semantic boundaries
Updated
- Default similarity threshold increased to 0.5
- Default dynamic threshold bounds adjusted (0.4 - 0.8)
- Improved chunk rebalancing logic with similarity checks
- Updated logging for similarity scores between sentences
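
To make the threshold values above concrete, here is a rough sketch of how a similarity threshold and a pair of bounds are typically used when deciding chunk boundaries. The function names are hypothetical, and how the library actually derives its dynamic threshold is not described in these notes; the clamp below only illustrates what lower and upper bounds of 0.4 and 0.8 mean.

```javascript
// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as having no similarity to anything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical: keep a computed threshold inside the configured bounds.
function clampThreshold(value, lower = 0.4, upper = 0.8) {
  return Math.min(upper, Math.max(lower, value));
}

// Hypothetical: a sentence starts a new chunk when it is not similar
// enough to the previous sentence.
function startsNewChunk(prevEmbedding, nextEmbedding, threshold = 0.5) {
  return cosineSimilarity(prevEmbedding, nextEmbedding) < threshold;
}

console.log(clampThreshold(0.95), clampThreshold(0.1));
console.log(startsNewChunk([1, 0], [0, 1]));
```

A higher threshold makes boundaries more frequent, since fewer neighboring sentences count as similar enough to stay together.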
[2.1.1] - 2024-11-01
Updated
- Updated example scripts in README.
[2.1.0] - 2024-11-01
Updated
- ⚠️ BREAKING: Input format now accepts an array of document objects
- Output array of chunks extended with the following new properties:
  - `document_id`: Timestamp in milliseconds when processing started
  - `document_name`: Original document name or ""
  - `number_of_chunks`: Total number of chunks for the document
  - `chunk_number`: Current chunk number (1-based)
  - `model_name`: Name of the embedding model used
  - `is_model_quantized`: Whether the model is quantized
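
As an illustration of how the new properties might be consumed, the sketch below builds two made-up chunk objects and puts them back in order. All values are invented for the example, `reassemble` is a hypothetical helper, and the `text` property is the one introduced in 2.0.0.

```javascript
// Illustrative chunk objects: property names follow the release notes,
// values are made up for this example.
const exampleChunks = [
  {
    document_id: 1730419200000,
    document_name: "example.txt",
    number_of_chunks: 2,
    chunk_number: 2,
    model_name: "example-embedding-model",
    is_model_quantized: true,
    text: "Second chunk.",
  },
  {
    document_id: 1730419200000,
    document_name: "example.txt",
    number_of_chunks: 2,
    chunk_number: 1,
    model_name: "example-embedding-model",
    is_model_quantized: true,
    text: "First chunk.",
  },
];

// Hypothetical helper: order one document's chunks by chunk_number (1-based)
// and verify that none are missing before joining their text.
function reassemble(chunks) {
  const ordered = [...chunks].sort((a, b) => a.chunk_number - b.chunk_number);
  const expected = ordered.length > 0 ? ordered[0].number_of_chunks : 0;
  if (ordered.length !== expected) {
    throw new Error(`expected ${expected} chunks, got ${ordered.length}`);
  }
  return ordered.map((chunk) => chunk.text).join(" ");
}

console.log(reassemble(exampleChunks));
```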
[2.0.0] - 2024-11-01
Added
- Added `returnEmbedding` option to `chunkit` and `cramit` functions to include embeddings in the output.
- Added `returnTokenLength` option to `chunkit` and `cramit` functions to include token length in the output.
- Added `chunkPrefix` option to prefix each chunk with a task instruction (e.g., "search_document: ", "search_query: ").
- Updated README to document new options and add RAG tips for using `chunkPrefix` with embedding models that support task prefixes.
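
The idea behind a task prefix can be sketched in a few lines. `applyChunkPrefix` is a hypothetical helper written for this example, not a function exported by the library, and whether the library keeps the prefix in the returned text is not stated in these notes.

```javascript
// Hypothetical helper: prepend a task instruction to each chunk's text
// before it is sent to an embedding model that expects such prefixes.
function applyChunkPrefix(texts, chunkPrefix = "") {
  if (!chunkPrefix) return [...texts];
  return texts.map((text) => `${chunkPrefix}${text}`);
}

// Documents and queries usually get different prefixes so the model can
// tell indexed content apart from search input.
const indexed = applyChunkPrefix(["Cats purr when content."], "search_document: ");
const query = applyChunkPrefix(["why do cats purr"], "search_query: ");

console.log(indexed[0]);
console.log(query[0]);
```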
Updated
- ⚠️ BREAKING: Returned array of chunks is now an array of objects with `text`, `embedding`, and `tokenLength` properties. Previous versions returned an array of strings.
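
Code written against the pre-2.0.0 string output may need a small adapter. The sketch below is one possible approach, with `chunkTexts` as a hypothetical helper name; it accepts either result shape and returns plain strings.

```javascript
// Hypothetical adapter: accepts either the old result shape (array of strings)
// or the new one (array of objects with a `text` property) and returns strings.
function chunkTexts(result) {
  return result.map((item) => (typeof item === "string" ? item : item.text));
}

// Old shape: array of strings.
const oldResult = ["First chunk.", "Second chunk."];

// New shape: array of objects. The embedding and tokenLength values here are
// made up for illustration.
const newResult = [
  { text: "First chunk.", embedding: [0.1, 0.2], tokenLength: 3 },
  { text: "Second chunk.", embedding: [0.3, 0.4], tokenLength: 3 },
];

console.log(chunkTexts(oldResult));
console.log(chunkTexts(newResult));
```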
If you find this library useful, please consider sending me a tip to support my work.