Zstandard v1.4.7

@Cyan4973 released this 17 Dec 03:32
645a297

Note: this version features a minor bug, which can be present on systems other than x64 and arm64. Updating to v1.4.8 is recommended for all other platforms.

v1.4.7 unleashes several months of improvements across many axes, from performance to various fixes, to new capabilities, of which a few are highlighted below. It’s a recommended upgrade.

(Note: if you ever wondered what happened to v1.4.6, it’s an internal release number reserved for synchronization with the Linux kernel.)

Improved --long mode

--long mode makes it possible to analyze vast quantities of data within a reasonable time and memory budget. The --long mode algorithm runs on top of the regular match finder, and both contribute to the final compressed outcome.
However, the fact that these two stages were working independently resulted in minor discrepancies at the highest compression levels, where the cost of each decision must be carefully monitored. For this reason, in situations where the input is not a good fit for --long mode (no large repetition at long distance), enabling it could slightly reduce the compression ratio at high compression levels, compared to not enabling it. This situation made it more difficult to "just always enable" the --long mode by default.
This is fixed in this version. For compression levels 16 and up, usage of --long will now never regress compared to compression without --long. This property made it possible to ramp up --long mode contribution to the compression mix, improving its effectiveness.

The compression ratio improvements are most notable when --long mode is actually useful. In particular, --patch-from (which implicitly relies on --long) shows excellent gains from the improvements. We present some brief results here (tested on MacBook Pro 16", i9).

[Figure: --long mode results, v1.4.5 vs v1.4.7]

Since --long mode is now always beneficial at high compression levels, it’s now automatically enabled for any window size of 128 MB and up.

Faster decompression of small blocks

This release includes optimizations that significantly speed up decompression of small blocks and small data. The decompression speed gains will vary based on the block size according to the table below:

Block Size   Decompression Speed Improvement
1 KB         ~+30%
2 KB         ~+30%
4 KB         ~+25%
8 KB         ~+15%
16 KB        ~+10%
32 KB        ~+5%

These optimizations come from improving the process of reading the block header, and building the Huffman and FSE decoding tables. zstd’s default block size is 128 KB, and at this block size the time spent decompressing the data dominates the time spent reading the block header and building the decoding tables. But, as blocks become smaller, the cost of reading the block header and building decoding tables becomes more prominent.

CLI improvements

The CLI received several noticeable upgrades with this version.
To begin with, zstd can accept a new parameter through an environment variable, ZSTD_NBTHREADS. It’s useful when zstd is called behind an application (tar, or a python script for example). Also, users who prefer multithreaded compression by default can now set a desired number of threads in their environment. This setting can still be overridden on demand via command line.
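The precedence described above (command line first, then ZSTD_NBTHREADS, then a built-in default) can be sketched as follows. This is only an illustration of the lookup order, not zstd's actual implementation: the function name `resolve_nb_threads`, the default of 1, and the choice to ignore non-numeric values are all assumptions made for the example.

```python
import os

# Assumption for this sketch: fall back to a single thread when nothing is set.
DEFAULT_NB_THREADS = 1


def resolve_nb_threads(cli_value=None, environ=None):
    """Pick a thread count: command line first, then ZSTD_NBTHREADS, then default.

    `cli_value` is the number given on the command line, or None if absent.
    `environ` is a mapping of environment variables (defaults to os.environ).
    """
    env = os.environ if environ is None else environ

    # An explicit command-line value always wins over the environment.
    if cli_value is not None:
        return cli_value

    # Otherwise use the environment variable if it holds a plain number.
    raw = env.get("ZSTD_NBTHREADS", "")
    if raw.isdecimal():
        return int(raw)

    # Hypothetical policy: unset or malformed values fall back to the default.
    return DEFAULT_NB_THREADS
```

With this ordering, a wrapper such as tar picks up the environment setting without any change to its own command line, while an interactive user can still override it for a single run.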
A new command --output-dir-mirror makes it possible to compress a directory containing subdirectories (typically with the -r command), producing one compressed file per source file, and to reproduce the source directory tree inside a selected destination directory.
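To make the mirroring idea concrete, here is a small sketch of how a source path could map to its destination. It is a model of the behavior, not zstd's code: the function name `mirrored_output_path` is hypothetical, and two details are assumptions based on a reading of the man page rather than on this release note, namely that paths containing ".." are skipped and that absolute paths are re-rooted under the destination.

```python
from pathlib import PurePosixPath


def mirrored_output_path(source, output_dir, suffix=".zst"):
    """Return where `source` would land under `output_dir`, or None if skipped."""
    src = PurePosixPath(source)

    # Assumption: sources that climb out of the tree with ".." are ignored.
    if ".." in src.parts:
        return None

    # Assumption: an absolute path is re-rooted under the output directory.
    if src.is_absolute():
        src = src.relative_to(src.anchor)

    # Keep every intermediate directory, then append the compressed suffix.
    dest = PurePosixPath(output_dir) / src
    return dest.with_name(dest.name + suffix)
```

For example, `mirrored_output_path("src/a/b.txt", "out")` keeps the `src/a/` hierarchy under `out`, in contrast with a flat layout where every output would sit directly in the destination directory.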
There are other various improvements, such as more accurate warning and error messages, full equivalence between conventions --long-command=FILE and --long-command FILE, fixed confusion risks between stdin and user prompt, or between console output and status message, as well as a new short execution summary when processing multiple files, cumulatively contributing to a nicer command line experience.

New experimental features

Shared Thread Pool

By default, each compression context can be set to use a maximum number of threads.
In complex scenarios, there might be multiple compression contexts working in parallel, each using some number of threads. In such cases, it might be desirable to control the total number of threads used by all these compression contexts together.

This is now possible, by making all these compression contexts share the same thread pool. This capability is exposed through a new advanced function, ZSTD_CCtx_refThreadPool(), contributed by @marxin. See its documentation for more details.
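The idea of bounding total thread usage by sharing one pool can be illustrated outside of zstd. The sketch below is an analogy only: it uses Python's `ThreadPoolExecutor` and `zlib` as stand-ins, and the `Context` class and `run_with_shared_pool` helper are hypothetical names invented for the example. It does not call ZSTD_CCtx_refThreadPool() or any other zstd API.

```python
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor


class Context:
    """Hypothetical stand-in for a compression context that borrows a pool."""

    def __init__(self, pool):
        self.pool = pool  # referenced, not owned: the caller manages its lifetime

    def compress_all(self, chunks, work):
        # Every context submits its jobs to the same shared pool.
        return [self.pool.submit(work, chunk) for chunk in chunks]


def run_with_shared_pool(nb_contexts, chunks_per_context, max_threads):
    """Run several contexts on one pool and report the peak number of busy workers."""
    lock = threading.Lock()
    active = 0
    peak = 0

    def work(chunk):
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        try:
            return zlib.compress(chunk)  # zlib stands in for real zstd work
        finally:
            with lock:
                active -= 1

    chunk = b"shared thread pool " * 512
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        contexts = [Context(pool) for _ in range(nb_contexts)]
        futures = []
        for ctx in contexts:
            futures.extend(ctx.compress_all([chunk] * chunks_per_context, work))
        results = [f.result() for f in futures]
    return peak, results
```

However many contexts are created, the peak number of simultaneously running jobs cannot exceed `max_threads`, which is the property the shared pool is meant to provide. With one pool per context, the total would instead grow with the number of contexts.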

Faster Dictionary Compression

This release introduces a new experimental dictionary compression algorithm, applicable to mid-range compression levels, employing strategies such as ZSTD_greedy, ZSTD_lazy, and ZSTD_lazy2. This new algorithm can be triggered by selecting the compression parameter ZSTD_c_enableDedicatedDictSearch during ZSTD_CDict creation (experimental section).

Benchmarks show the new algorithm providing significant compression speed gains :

Level   Hot Dict   Cold Dict
5       ~+17%      ~+30%
6       ~+12%      ~+45%
7       ~+13%      ~+40%
8       ~+16%      ~+50%
9       ~+19%      ~+65%
10      ~+24%      ~+70%

We hope it will help make mid-level compression more attractive for dictionary scenarios. See the documentation for more details. Feedback is welcome!

New Sequence Ingestion API

We introduce a new entry point, ZSTD_compressSequences(), which makes it possible for users to define their own sequences, by whatever mechanism they prefer, and present them to this new entry point, which will generate a single zstd-compressed frame, based on provided sequences.

So for example, users can now feed to the function an array of externally generated ZSTD_Sequence:
[(offset: 5, matchLength: 4, litLength: 10), (offset: 7, matchLength: 6, litLength: 3), ...] and the function will output a zstd compressed frame based on these sequences.
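For readers unfamiliar with sequences, the sketch below shows what such a list means when it is replayed: each sequence copies `litLength` literal bytes, then copies `matchLength` bytes starting `offset` bytes back in the output produced so far. This is a plain-Python model of the concept, not the zstd API: the `Sequence` tuple and `apply_sequences` function are hypothetical names, and details of the real entry point (such as how the source buffer and block boundaries are passed) are left out.

```python
from typing import NamedTuple


class Sequence(NamedTuple):
    """Illustrative mirror of the three fields shown above."""
    offset: int
    match_length: int
    lit_length: int


def apply_sequences(literals, sequences):
    """Rebuild the data described by `sequences` and their `literals`."""
    out = bytearray()
    lit_pos = 0
    for seq in sequences:
        # 1. Copy the literal run that precedes the match.
        if lit_pos + seq.lit_length > len(literals):
            raise ValueError("sequence asks for more literals than provided")
        out += literals[lit_pos:lit_pos + seq.lit_length]
        lit_pos += seq.lit_length

        # 2. Copy the match, byte by byte so overlapping matches work.
        if seq.match_length:
            if not 0 < seq.offset <= len(out):
                raise ValueError("offset points before the start of the output")
            start = len(out) - seq.offset
            for i in range(seq.match_length):
                out.append(out[start + i])

    # 3. Any literals left after the last sequence are appended as-is.
    out += literals[lit_pos:]
    return bytes(out)
```

Replaying the two sequences from the example above over 13 literal bytes yields 23 bytes of output: 13 literals plus 4 and 6 matched bytes. A user-side match finder would work in the opposite direction, producing such a list from the input.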

This experimental API currently has several limitations (and its relevant params exist in the “experimental” section). Notably, this API currently ignores any repeat offsets provided, instead always recalculating them on the fly. Additionally, there is no way to forcibly specify existence of certain zstd features, such as RLE or raw blocks.
If you are interested in this new entry point, please refer to zstd.h for more detailed usage instructions.

Changelog

There are many other features and improvements in this release, and since we can’t highlight them all, they are listed below:

  • perf: stronger --long mode at high compression levels, by @senhuang42
  • perf: stronger --patch-from at high compression levels, thanks to --long improvements
  • perf: faster decompression speed for small blocks, by @terrelln
  • perf: faster dictionary compression at medium compression levels, by @felixhandte
  • perf: small speed & memory usage improvements for ZSTD_compress2(), by @terrelln
  • perf: minor generic decompression speed improvements, by @helloguo
  • perf: improved fast compression speeds with Visual Studio, by @animalize
  • cli : Set nb of threads with environment variable ZSTD_NBTHREADS, by @senhuang42
  • cli : new --output-dir-mirror DIR command, by @xxie24 (#2219)
  • cli : accept decompressing files with *.zstd suffix
  • cli : --patch-from can compress stdin when used with --stream-size, by @bimbashrestha (#2206)
  • cli : provide a condensed summary by default when processing multiple files
  • cli : fix : stdin input can no longer be confused with user prompt
  • cli : fix : console output no longer mixes stdout and status messages
  • cli : improve accuracy of several error messages
  • api : new sequence ingestion API, by @senhuang42
  • api : shared thread pool: control total nb of threads used by multiple compression jobs, by @marxin
  • api : new ZSTD_getDictID_fromCDict(), by @LuAPi
  • api : zlibWrapper only uses public API, and is compatible with dynamic library, by @terrelln
  • api : fix : multithreaded compression has predictable output even in special cases (see #2327) (issue not present on cli)
  • api : fix : dictionary compression correctly respects dictionary compression level (see #2303) (issue not present on cli)
  • api : fix : return dstSize_tooSmall error whenever appropriate
  • api : fix : ZSTD_initCStream_advanced() with static allocation and no dictionary
  • build: fix cmake script when employing path including spaces, by @terrelln
  • build: new ZSTD_NO_INTRINSICS macro to avoid explicit intrinsics
  • build: new STATIC_BMI2 macro for compile time detection of BMI2 on MSVC, by @Niadb (#2258)
  • build: improved compile-time detection of aarch64/neon platforms, by @bsdimp
  • build: Fix building on AIX 5.1, by @likema
  • build: compile paramgrill with cmake on Windows, requested by @mirh
  • build: install pkg-config file with CMake and MinGW, by @tonytheodore (#2183)
  • build: Install DLL with CMake on Windows, by @BioDataAnalysis (#2221)
  • build: fix : cli compilation with uclibc
  • misc: Improve single file library and include dictBuilder, by @cwoffenden
  • misc: Fix single file library compilation with Emscripten, by @yoshihitoh (#2227)
  • misc: Add freestanding translation script in contrib/freestanding_lib, by @terrelln
  • doc : clarify repcode updates in format specification, by @felixhandte