All notable changes to this project will be documented in this file.
- [breaking] Refactor the `Input` implementors with automatic padding (#276).
- Padding and alignment are now handled automatically by the input types, allowing them to work safely without copying the entire input. The overhead is now limited to the padding, which is at most 256 bytes in total.
- `BorrowedBytes` is now safe to construct.
- `OwnedBytes` no longer copies the entire source on construction.
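The padding idea can be illustrated with a minimal sketch. It assumes a hypothetical block size of 128 bytes and pads only the end of the input with spaces; `BLOCK_SIZE` and `split_and_pad` are illustrative names, not part of the rsonpath API, and the real implementation may differ.

```rust
/// Hypothetical block size used for this illustration only.
const BLOCK_SIZE: usize = 128;

/// Splits `input` into a block-aligned prefix that can be borrowed as-is
/// and a freshly allocated, space-padded copy of the trailing partial block.
/// Only the tail (fewer than `BLOCK_SIZE` bytes) is ever copied.
fn split_and_pad(input: &[u8]) -> (&[u8], Vec<u8>) {
    let aligned_len = input.len() / BLOCK_SIZE * BLOCK_SIZE;
    let (body, tail) = input.split_at(aligned_len);

    let mut padded = Vec::with_capacity(BLOCK_SIZE);
    if !tail.is_empty() {
        padded.extend_from_slice(tail);
        // Spaces are insignificant whitespace in JSON, so they are a safe filler.
        padded.resize(BLOCK_SIZE, b' ');
    }
    (body, padded)
}
```

With a 300-byte input, the first 256 bytes stay borrowed and only the remaining 44 bytes are copied into a single padded block.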
- Atomic values getting invalid spans. (#327)
- Fixed an issue where atomic values would be matched with all trailing characters up until the next closing.
- Improve SIMD codegen.
- Improved the way we dispatch to SIMD-intensive functions. This results in slightly larger binaries, but massive speedups: throughput increases of 5, 10, 20, or, in the case of `google_map::travel_modes/rsonpath_direct_count`, 59 (fifty-nine) percent.
- Harden GitHub Actions.
- We now use the StepSecurity harden-runner in audit mode to test a more secure approach to GitHub CI.
- End to end test refactor.
- Tests are now generated into many separate files instead of one gigantic file. This improves compilation times and the responsiveness of rust-analyzer, and in general makes the tooling happier.
- Bump arbitrary from 1.3.0 to 1.3.2.
- Bump clap from 4.4.6 to 4.4.7.
- Bump thiserror from 1.0.49 to 1.0.50.
- Missing openings from node results. (#297)
- Fixed an issue where the opening characters of matched nodes would not be included in the result when head-skipping and the opening happened on a block boundary.
- Lib MSRV.
- In v0.8.0 we inadvertently broke the MSRV, and the project only built with 1.71.1. It was restored to 1.70.0 for the binary and 1.67.1 for the lib.
- Bump clap from 4.4.4 to 4.4.6.
- Bump memmap2 from 0.7.1 to 0.9.0.
- Bump thiserror from 1.0.48 to 1.0.49.
- Improved handling of the root-only query `$`. (#160)
- Full nodes result when asking for root: 2 times throughput increase.
- Indices/count result when asking for root: basically unboundedly faster, no longer looks at the entire document.
- Clarified the `approximate_spans` guarantees.
- The documentation now mentions that the returned `MatchSpan`s can potentially have their end indices farther than one would expect the input to logically end, due to internal padding.
- Fixed handling of the root-only query `$` on atomic documents. (#160)
- Previously only object and array roots were supported.
- Fixed a bug where head-skipping to a single-byte key would panic. (#281)
- This was detected by fuzzing!
- The queries `$..["{"]` and `$..["["]` would panic on inputs starting with the bytes `{"` or `["`, respectively.
- Fixed a bug where disabling the `simd` feature would not actually disable SIMD acceleration.
- Made the ClusterFuzzLite batch workflow automatically create an issue on failure to make sure the maintainers are notified.
- [breaking] Refactored the [`Match`]/[`MatchSpan`] types.
- [`Match`] now takes 32 bytes, down from 40.
- All fields are now private, accessible via associated functions.
- Added the `len` function to [`MatchSpan`].
- Added `approximate_spans` result mode. (#242)
- Engine can return an approximate span of the match, where "approximate" means the start index is correct, but the end index might include trailing whitespace after the match.
- This mode is much faster than full `matches`, close to the performance of `count`, especially for large result sets.
- This is a library-only feature.
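Since an approximate span has a correct start but a possibly overshooting end, a caller who needs the exact value can tighten it afterwards. The following is a minimal sketch of that post-processing step; `Span` and `tighten` are hypothetical names for illustration and do not reproduce the library's `MatchSpan` type.

```rust
/// Hypothetical half-open byte span `[start, end)`, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Span {
    start: usize,
    end: usize,
}

/// Shrinks an approximate span so that it no longer covers trailing
/// JSON whitespace. The start index is trusted as-is.
fn tighten(input: &[u8], approx: Span) -> Span {
    // Clamp first: the reported end may lie past the logical end of the input.
    let mut end = approx.end.min(input.len());
    while end > approx.start && matches!(input[end - 1], b' ' | b'\t' | b'\n' | b'\r') {
        end -= 1;
    }
    Span { start: approx.start, end }
}
```

The trimming is linear in the amount of trailing whitespace, so it only costs anything for the matches a caller actually inspects.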
- Library exposes a new optional feature, `arbitrary`.
- When enabled, includes `arbitrary` as a dependency and provides an `Arbitrary` impl for `JsonPathQuery`, `JsonString`, and `NonNegativeArrayIndex`.
- Fixed a bug where memmem acceleration would fail for empty keys.
- This was detected by fuzzing! The query `$..[""]` would panic on certain inputs due to invalid indexing.
- Fixed a panic when parsing invalid queries with wide UTF-8 characters.
- This was detected by fuzzing! Parsing a query with invalid syntax caused by a longer-than-byte UTF-8 character would panic when the error handler tried to resume parsing from the next byte instead of respecting char boundaries.
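The class of bug described here can be shown with a small sketch using only the standard library. Slicing a `&str` at an index that is not a char boundary panics, so error recovery has to skip to the next boundary rather than to the next byte. `next_char_boundary` is a hypothetical helper, not the parser's actual code.

```rust
/// Returns the first char boundary strictly after `idx`, capped at `s.len()`.
/// Resuming from `idx + 1` directly could land inside a multi-byte character.
fn next_char_boundary(s: &str, idx: usize) -> usize {
    let mut next = idx + 1;
    while next < s.len() && !s.is_char_boundary(next) {
        next += 1;
    }
    next.min(s.len())
}
```

For the query string `"$.ł!"`, the character `ł` occupies bytes 2 and 3, so resuming after an error at byte 2 must continue from byte 4, not byte 3.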
- Fixed a panic caused by node results in invalid JSON documents.
- This was detected by fuzzing! Invalid JSON documents could cause the NodeRecorder to panic if the apparent match span was of length 1.
- Fixed erroneous match span end reporting. (#247)
- Fixed a bug where `MatchSpan` values given by the engine were almost always invalid.
- Fuzzing integration with libfuzzer and ClusterFuzzLite.
- `cargo-fuzz` can be used to fuzz the project with libfuzzer. Currently we have three fuzzing targets, one for stressing the query parser, one for stressing the engine with arbitrary bytes, and one stressing the engine with structure-aware queries and JSONs.
- Fuzzing is now enabled on every PR. Using ClusterFuzzLite we will fuzz the project every day on a cron schedule to establish a corpus.
- Added correctness tests for match spans reporting (#247)
- Bump clap from 4.4.2 to 4.4.4.
- Bump vergen from 8.2.4 to 8.2.5.
- Portable binaries. (#231)
- SIMD capabilities are now discovered at runtime, allowing us to distribute one binary per target.
- Requirements for SIMD are now more granular, allowing weaker CPUs to still get some of the acceleration:
- Base SIMD is either SSE2, SSSE3, or AVX2.
- Structural classification works on SSSE3 and above.
- Quote classification works if `pclmulqdq` is available.
- Depth classification works if `popcnt` is available.
- To counteract the increased binary size debug info is no longer included in distributed binaries.
- Codegen for distributed binaries is improved with fat LTO and setting codegen units to 1.
- SIMD capabilities are listed with `rq --version`.
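Runtime discovery of this kind can be sketched with the standard library's `is_x86_feature_detected!` macro. This is an illustration of the general technique, not the project's dispatch code; the function names `detected_simd_features` and `base_simd` are hypothetical.

```rust
/// Lists which of the CPU features named above are available at runtime.
/// On non-x86 targets the list is simply empty.
fn detected_simd_features() -> Vec<&'static str> {
    #[allow(unused_mut)]
    let mut features = Vec::new();
    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
    {
        if std::is_x86_feature_detected!("sse2") {
            features.push("sse2");
        }
        if std::is_x86_feature_detected!("ssse3") {
            features.push("ssse3");
        }
        if std::is_x86_feature_detected!("avx2") {
            features.push("avx2");
        }
        if std::is_x86_feature_detected!("pclmulqdq") {
            features.push("pclmulqdq");
        }
        if std::is_x86_feature_detected!("popcnt") {
            features.push("popcnt");
        }
    }
    features
}

/// Picks the strongest base SIMD level out of a detected feature list.
fn base_simd(features: &[&str]) -> Option<&'static str> {
    ["avx2", "ssse3", "sse2"]
        .iter()
        .copied()
        .find(|f| features.contains(f))
}
```

Because the check happens at runtime, a single binary can fall back gracefully on CPUs that lack some of the features.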
- Change clippy to auguwu/clippy-action
- The "official" action has not been maintained for 3 years. This one is actively maintained (thanks Noel!).
- Panic when head-skipping block boundary. (#249)
- Fixed an issue when head-skipping acceleration in nodes result mode would panic in very specific input circumstances, or if the input had really long JSON keys.
- Bump thiserror from 1.0.47 to 1.0.48.
- Added 32-bit and SSSE3 SIMD support.
- Refactored all SIMD code to enable modularity and more target feature types.
- Building for x86 now chooses one of four SIMD implementations:
- AVX2 64-bit
- AVX2 32-bit
- SSSE3 64-bit
- SSSE3 32-bit
- These are also now distributed as separate binaries.
- Fine-grained action permissions.
- Actions now use explicit, lowest possible permissions for all jobs.
- Add SLSA3 provenance to the release pipeline.
- Future releases will include cryptographically signed provenance for all binaries. See: https://slsa.dev/spec/v1.0/about
- StepSecurity Apply security best practices.
- All CI uses hash-pinned dependencies now.
- Run the OSSF Scorecard check on each PR.
- Add Dependency review.
- Removed test-codegen deps from `Cargo.lock`.
- By removing the codegen crate from the workspace, its deps are now separated and don't pollute the lock of the actual end product.
- `cargo-deny` now runs with the CI to keep tabs on our deps.
- Configured to reject Medium+ CVEs and non-compatible licenses.
- Bump clap from 4.3.19 to 4.4.2.
- Bump log from 0.4.19 to 0.4.20.
- Bump thiserror from 1.0.44 to 1.0.47.
- Bump trycmd from 0.14.16 to 0.14.17.
- Removed `memchr` as a dependency.
- It was no longer needed after the custom `memmem` classifier introduced in v0.6.0.
- Removed `replace_with` as a dependency.
- That code path was refactored earlier, so the dep was now unused.
- Added the OpenSSF badge.
- We will be trying to achieve the Passing level before v1.0.0.
- Added the scorecard badge.
- [breaking] Remove the `unique-members` feature.
- This clutters the API more than anything. If supporting duplicate keys is required in the future, it can be easily added as a `const` config option, not a compilation feature.
- Add the `--json` CLI option for passing JSONs inline.
- Added snapshot tests for `rq` using `trycmd`.
- This is another layer of E2E tests. It makes sure documentation examples in the book are correct, and that our `--help` and `--version` outputs remain consistent.
- We have a book!
- The first part is a usage guide for `rq`, and contains a short JSONPath reference.
- Other parts will follow, with a plan to finalize at least the library usage guide before 1.0.0.
- [breaking] Full match result mode. (#56) This includes a revamp of all the internals that would be too long to describe in the log. In short:
- `memmem` was rewritten to a custom implementation (courtesy of @charles-paperman).
- Each of the result modes has a separate `Recorder` that takes care of producing the results.
- The results are written to a `Sink`, provided by the user; this might be a `Vec`, the stdout, or some other `io::Write` implementation.
- Matches contain the full byte span of the value matched.
- A lot of `Input` and classifier APIs have massive breaking changes to accommodate this.
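The benefit of writing results to a user-provided destination can be sketched with the standard `io::Write` trait alone. The library's own `Sink` trait is not reproduced here; `write_matches` and the `(start, end)` span tuples are hypothetical stand-ins for illustration.

```rust
use std::io::{self, Write};

/// Writes the bytes of each matched span to `out`, one match per line.
/// `out` can be a `Vec<u8>`, a locked stdout, a file, or any other writer.
fn write_matches<W: Write>(
    input: &[u8],
    spans: &[(usize, usize)],
    out: &mut W,
) -> io::Result<()> {
    for &(start, end) in spans {
        out.write_all(&input[start..end])?;
        out.write_all(b"\n")?;
    }
    Ok(())
}
```

Passing `&mut Vec::<u8>::new()` collects the matches in memory, while passing `&mut io::stdout().lock()` streams them out without buffering the whole result set.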
- [breaking] Removed the Recursive engine.
- The Recursive implementation has outlived its usefulness. Over time it became a near-duplicate of Main, which was manifested by a need to implement the same features twice with the exact same code and to refactor/fix bugs with the exact same code changes but in two different files. We will focus efforts on the Main engine. The `--engine` CLI option was disabled, as there is only one engine now.
- QoL improvement by separate test gen crate.
- This removes the confusing `gen-tests` feature from lib, reduces its build dependencies, and should improve build times.
- Bump clap from 4.3.10 to 4.3.19.
- Bump colored (dependency of simple_logger) from 2.0.0 to 2.0.4.
- This removes a transitive dependency on atty with a CVE.
- Bump rustflags from 0.1.3 to 0.1.4.
- Bump smallvec from 1.10.0 to 1.11.0.
- Bump thiserror from 1.0.40 to 1.0.44.
- Consistent index result output. (#161)
- The `--result bytes` mode now consistently reports the first byte of the value it matched. This can be used to extract the actual value from the JSON by parsing from the reported byte.
- Remove SHA from --version on crates.io. (#157)
- The commit SHA part was incorrect, and there seems to be no way to get it when the crate is in the registry.
- [breaking] Remove `tail-skip` and `head-skip` features.
- These are now non-optional and integrated into the engines.
- Generate strings in classifier tests. (#173, #20)
- Improve classifier correctness tests by including quoted strings with escapes in the generated proptest cases.
- More tests for wildcard compilation.
- Added more cases for compiling the NFA and minimizing for queries with wildcards.
- Automated declarative end-to-end engine tests. (#134)
- Engine tests were rewritten to use declarative TOML configurations to make creating, maintaining, and debugging tests easier. Test coverage was increased, since compressed variants of inputs are automatically generated and tested, and we now test all combinations of input-engine-result types.
- Bump clap from 4.3.4 to 4.3.10.
- Bump memmap2 from 0.7.0 to 0.7.1.
- Bump vergen from v8.2.1 to v8.2.3
- Rearrange readme to put usage first.
- Update bug report issue form.
- Changed the issue form to be more streamlined and use more polite language.
- Add MSRV to README.
- Parser support for array index selector. (#60)
- Parser now recognizes the array index selector with positive index values conforming to the I-JSON specification.
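I-JSON (RFC 7493) recommends keeping integers within the range that IEEE 754 doubles represent exactly, which caps them at 2^53 - 1. A bounds check of that kind might look like the sketch below; `ArrayIndex` and `MAX_INDEX` are hypothetical names and are not claimed to match the library's own index type.

```rust
/// Largest integer I-JSON considers safely interoperable: 2^53 - 1.
const MAX_INDEX: u64 = (1u64 << 53) - 1;

/// Hypothetical validated array index, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ArrayIndex(u64);

impl ArrayIndex {
    /// Accepts `value` only if it fits within the I-JSON integer range.
    fn new(value: u64) -> Result<Self, String> {
        if value <= MAX_INDEX {
            Ok(Self(value))
        } else {
            Err(format!("array index {value} exceeds the I-JSON limit {MAX_INDEX}"))
        }
    }
}
```

Validating at construction time means the rest of the code can treat any `ArrayIndex` as already in range.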
- Index selector engine support. (#132)
- The automaton transition model has been changed to incorporate index-labelled transitions.
- Both engines now support queries with the index selector.
- New `Input` API. (#23)
- A more abstract API to access the underlying byte stream, replacing the reliance of the engines on direct `&[u8]` slice access, to allow adding buffered input streams (#23) in the future. Two types were added, `OwnedBytes` and `BorrowedBytes`, to support the current easy scenario of having the bytes already in memory.
- Rename bin to `rq` and lib to `rsonpath`.
- Add long version to CLI.
- Mmap support. (#23)
- Added `MmapInput` which maps a file into memory on unix and windows.
- The CLI app now automatically decides which input to use, favoring mmap in most cases. This can be overridden with `--force-input`.
- Rename `Label` to `JsonString`. (#139, #131)
- `query::Label` is now `query::JsonString`.
- The `unique-labels` feature is now `unique-members`.
- `EngineError::MalformedLabelQuotes` renamed to `EngineError::MalformedStringQuotes`.
- Proptests for parsing array indices queries. (#51)
- Bump clap from 4.1.11 to 4.3.4.
- Bump log from 0.4.17 to 0.4.19.
- Bump proptest from 1.1.0 to 1.2.0.
- Bump simple_logger from 4.1.0 to 4.2.0.
- Wildcard descendant support.
- You can now use the `..*`/`..[*]` selector that selects all nodes in the document it acts upon.
- Switch `Structural` to `BracketType`. (#10)
- The `Opening` and `Closing` variants now differentiate between curly and square brackets with a value of the `BracketType` enum.
- Fix parser incorrectly escaping labels.
- Queries like `$['\'']` would cause a parsing error, even though they were valid (match a child with key equal to "`'`").
- The `\u` escape sequence is no longer recognized, since without UTF-8 handling such sequences were meaningless. See #117.
- Empty query array behavior.
- Running the query `$` on a document `[]` was giving zero results. Now correctly matches the root array.
- The grammar in top-level documentation now matches the implementation.
- Added proptests for query parsing.
- Currently checks that correct queries are parsed correctly. We still need tests for error conditions (see #51).
- Properly flow simd feature to dependencies. (#111)
- This fixes build issues with the `aarch64` target. It also turns out our CI did not actually compile to all the targets we claimed it did, which is a bit embarrassing. We now do actually support all Rust Tier 1 targets and run tests for all except `aarch64-unknown-linux-gnu`, because there's no image for aarch64 on GitHub.
- Bump clap from 4.1.6 to 4.1.11.
- Bump thiserror from 1.0.38 to 1.0.40.
- Bump simple_logger from 4.0.0 to 4.1.0.
- Bump dev-dependency tempfile from v3.3.0 to v3.4.0.
- Resolves a dev-dependency security vulnerability of `remove-dir-all` by removing the dependency entirely.
- Faster toggling of commas/colons.
- Shortened toggle by 1 SIMD instruction, improving perf by ~5% on heavily switching queries
- Duplicate results with `.*` on singleton list. (#100, #96)
- If the query ended with a wildcard selector and was applied to a list with a singleton complex value, that value was being matched twice.
- Bump clap from 4.1.4 to 4.1.6. (#99)
- Update main plot in README. (#98)
- Better error reporting. (#88)
- Added separate `engine::error::DepthError` type.
- Additional context for depth-related `EngineError`s including the character at which depth overflow occurred.
- New error, `EngineError::MissingClosingCharacter`, reported if the engine reaches end of JSON and cannot match opening characters.
- Improvements to the CLI error reporting/display.
- Increase max automaton size to 256 from 128.
- Compiling wildcard child selectors. (#90, #7)
- Expressions parsed in #6 are now compiled into correct automata.
- Wildcard child support in engines. (#9, #8, #73)
- Large overhaul to the query engines to enable processing the wildcard child selector. This closes the #9 epic of wildcard child support.
- Both `main` and `recursive` engines now support wildcard child selectors.
- The `commas` feature flag was removed.
- Feature flags of `head-skip`, `tail-skip`, and `unique-members` were introduced to guard optimization paths.
- The `head-skip` and `tail-skip` features make the code faster without significant tradeoffs.
- The `unique-members` feature utilizes the assumption of key uniqueness within a single JSON object to speed up query execution, but it will not work correctly when an object with duplicate keys is given. Currently only the first occurrence of such a key will be processed.
- Many changes to the library structure and module visibility.
- Too complex query now produces an error. (#88)
- Previously the compiled automaton was silently truncated, which would cause incorrect results.
- Rename engine modules. (#88)
- The `Runner` trait was renamed to `Engine`.
- The `stackless` module is now `engine::main`.
- The `stack_based` module is now `engine::recursive`.
- The `StacklessRunner` is now the `MainEngine`, and is also reexported as `engine::RsonpathEngine`.
- The `StackBasedRunner` is now the `RecursiveEngine`.
- Added the `Compiler` trait. (#88)
- The `compile_query` function creating engines is now part of that trait.
- Rename `NotSupportedError` to `NotSupported`.
- Moved `result` to a standalone module.
- Move all classifiers to `classification`.
- Module `classify` renamed to `classification`.
- Moved all resumption related things to `classification` proper.
- Removed only use of `unsafe` outside of SIMD.
- Forbid unsafe code outside of simd.
- Added test for heterogeneous list.
- Hide `debug` and `bin` macros.
- Added `Compiler::from_compiled_query`.
- Bump clap from 4.0.25 to 4.1.4.
- Bumped a number of dependencies.
- backtrace (required by color-eyre) from 0.3.65 to 0.3.67.
- once_cell (required by color-eyre) from 1.16.0 to 1.17.0.
- owo-colors (required by color-eyre) from 3.3.0 to 3.5.0.
- ppv-lite86 (required by thiserror) from 0.2.16 to 0.2.17.
- itoa (required by simple_logger) from 1.0.2 to 1.0.5.
- Remove benchmarks crate from workspace.
- This drastically reduces the number of dependencies tracked for the binary.
- Make some deps optional.
- `memchr` is now included only with the `head-skip` feature.
- `replace_with` is now included only with the `tail-skip` feature.
- Update toplevel lib docs.
- Add separate README for lib.
- Updated most of module docs.
- Added architecture diagram to lib README.
- Wildcard child selector parser support. (#6)
- Both shorthand `.*` and full `[*]` forms are recognised.
- Compile-only CLI flag. (#76)
- Specifying `--compile` or `-c` will cause rsonpath to compile the query and output its automaton, without running the engine. This option is mutually exclusive with `--engine` or providing an input path.
- Compile error on `cargo install rsonpath`. (#86)
- Added install check to release CI/CD. (#86)
- This will catch issues with the simplest `cargo install rsonpath` invocation before release to avoid these issues in the future.
- Librification. (#41)
- Project split into two crates: binary `rsonpath` and library `rsonpath-lib`.
- Separate quote from structural classifiers. (#17)
- Implemented flexible classifiers.
- Implemented depth tail-skipping.
- Escape classifier boundary error.
- Correctly set features for rsonpath-lib.
- Flaky jsonski benches due to their bugs.
- Reenable Windows tests.
- Update for benchmarks integration.
- Update workflows and create Release workflow. (#44)
- Created a `release` workflow that automatically builds the crate on supported targets and creates a GitHub Release with appropriate artifacts.
- Updated the `rust` workflow to run tests for all configurations supported in `release`, and properly run clippy on both SIMD and no-SIMD versions of the code.
List of supported targets at this point:
| Target triple | nosimd build | SIMD support |
|---|---|---|
| aarch64-unknown-linux-gnu | Yes | No |
| i686-unknown-linux-gnu | Yes | Yes, avx2+pclmulqdq |
| x86_64-unknown-linux-gnu | Yes | Yes, avx2+pclmulqdq |
| x86_64-apple-darwin | Yes | No |
| i686-pc-windows-gnu | Yes | Yes, avx2+pclmulqdq |
| i686-pc-windows-msvc | Yes | Yes, avx2+pclmulqdq |
| x86_64-pc-windows-gnu | Yes | Yes, avx2+pclmulqdq |
| x86_64-pc-windows-msvc | Yes | Yes, avx2+pclmulqdq |
- `query` module is now panic-free. (#38)
- All errors are now reported via `QueryError`.
- Panic-free classifiers and engines. (#39, #40, #31)
- Detectable errors now use proper error types instead of panics. Added lints to prevent adding more panics or undocumented errors.
- Bumped Criterion to 0.4.0.
- Removed usage of eyre from library code.
- Bump simple_logger to 4.0.0.
- Update clap to v4.
- Bump a bunch of minor versions.
- Removed `len_trait` dependency. (#46)
- Classify commas to prepare for the new wildcard selectors
- Non-ASCII characters like (¡) breaking SIMD classification.
- Include usage in README.md.
- Supported simd is now autodetected.
- Instead of relying on the target_feature compiler flag, the build script now autodetects whether AVX2 is supported and compiles the correct version.
- Update to use `criterion_decimal_throughput`.
- Equalise `aligners` versions (`0.0.9` across the project).
- Remove unnecessary dependencies.
- Removed `memchr` and `static_assertions`.
- Changelog, code of conduct, contributing. (#2)
- Badges for crates.io.
- Engine implementation for child and recursive selectors.