Releases: pola-rs/polars

Python Polars 1.21.0

24 Jan 17:57
1993d59

🚀 Performance improvements

  • Use BitmapBuilder in yet more places (#20868)
  • Make an owned version of append (#20800)
  • Use BitmapBuilder in a lot more places (#20776)

✨ Enhancements

  • Stabilize methods/functions (#20850)
  • Add linear_space (#20678)
  • Improve string → temporal parsing in read_excel and read_ods (#20845)
  • Implement df.unique() on new-streaming engine (#20875)
  • Experimental credential provider support for Delta read/scan/write (#20842)
  • Allow column expressions in DataFrame unnest (#20846)
  • Auto-initialize Python credential providers in more cases (#20843)
  • Add unique operations for Decimal dtype (#20855)
  • Add NDJson sink for the new streaming engine (#20805)
  • Support nested keys in window functions (#20837)
  • Add CSV sink for the new streaming engine (#20804)
  • Periodically check Python signals ('CTRL-C' handling) (#20826)
  • Experimental unity catalog client (#20798)
  • Support cumulative aggregations for Decimal dtype (#20802)
  • Account for SurrealDB Python API updates (handle both SurrealDB and AsyncSurrealDB classes) in read_database (#20799)
  • Drop nest-asyncio in favor of custom logic (#20793)
  • Improve window function caching strategy (#20791)
  • Support lakefs:// URI for delta scanner (#20757)
  • Additional support for loading numpy.float16 values (as Float32) (#20769)
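
As a rough illustration of the linear_space entry above, the following pure-Python sketch produces evenly spaced samples over a closed interval. The helper name linear_space_sketch and its parameters are hypothetical and are not the Polars signature; the Polars API reference is the authority on the real function and its options.

```python
def linear_space_sketch(start: float, end: float, num_samples: int) -> list[float]:
    """Return num_samples evenly spaced values from start to end, both inclusive.

    Hypothetical helper for illustration only; not the Polars implementation.
    """
    if num_samples < 2:
        raise ValueError("this sketch needs at least two samples")
    # The interval is split into (num_samples - 1) equal steps.
    step = (end - start) / (num_samples - 1)
    return [start + i * step for i in range(num_samples)]


print(linear_space_sketch(0.0, 1.0, 5))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```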

🐞 Bug fixes

  • Warn if asof keys not sorted (#20887)
  • Ensure explicit values given to column_widths override autofit in write_excel (#20893)
  • Avoid name collisions and panicking in object conversion (#20890)
  • Incorrect scale used in log and exp for Decimal type (#20888)
  • Don't deep clone ManuallyDrop in GroupsPosition (#20886)
  • Fix DuplicateError when selecting columns after join_where or cross join + filter (#20865)
  • Incorrect Decimal value for fill_null(strategy="one") (#20844)
  • Fix one edge case (out of many) of int128 literals not working (#20830)
  • Add height check to frame-level row indexing when key is int (#20778)
  • Remove assert that panics on group_by followed by head(n), where n is larger than the frame height (#20819)
  • Selectors should raise on + between themselves (#20825)
  • Fix panic InvalidHeaderValue scanning from S3 on Windows (#20820)
  • Fix clip for Decimal returning wrong values (#20814)
  • Incorrect height from slicing after projecting only the file path column (#20817)
  • Shift mask when skipping Bitpacked values in Parquet (#20810)
  • Error instead of truncate if length mismatch for several str functions (#20781)
  • Support cumulative aggregations for Decimal dtype (#20802)
  • Allow is_in values to be given as custom Collection (#20801)
  • Propagate null instead of panicking in pl.repeat_by() (#20787)
  • Do not print sensitive information to output on POLARS_VERBOSE (#20797)
  • Ignore file cache allocation error if fallocate() is not permitted (#20796)
  • Incorrect logic in assert_series_equal for infinities (#20763)
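
The asof-key warning above (#20887) makes more sense with a small model of a backward asof match. The sketch below uses the standard-library bisect module, not Polars, and assumes a binary search over the right-hand keys. The function name asof_backward is hypothetical. With unsorted keys the search returns plausible-looking but wrong matches without raising an error, which is the kind of silent failure a warning helps surface.

```python
from bisect import bisect_right


def asof_backward(left_keys, right_keys):
    """For each left key, return the last right key <= it, or None if there is none.

    Assumes right_keys is sorted ascending; binary search is only valid then.
    """
    matches = []
    for key in left_keys:
        pos = bisect_right(right_keys, key)
        matches.append(right_keys[pos - 1] if pos > 0 else None)
    return matches


sorted_result = asof_backward([5, 10], [1, 4, 7, 9])    # [4, 9]
unsorted_result = asof_backward([5, 10], [9, 1, 7, 4])  # silently differs from [4, 9]
print(sorted_result, unsorted_result)
```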

📖 Documentation

  • Update source URL for legislators-historical.csv (#20858)
  • Update ML part of ecosystem user guide page (#20596)

🛠️ Other improvements

  • Disable 'catalog' in build (#20897)
  • Implement negative slice for new streaming IPC (#20866)
  • Debloat Series bitops (#20873)
  • Reduce python map bloat (#20871)
  • Remove todo and test restriction for new-streaming (#20861)
  • Dispatch to the in-mem engine for AExpr::Gather (#20862)
  • Dispatch to the in-memory engine for multifile sources (#20860)
  • Add tests for open issues (#20857)
  • Mark 'register_startup' as unsafe (#20841)
  • Reduce mode bloat (#20839)
  • Rename ContainsMany to ContainsAny (#20785)
  • Unpin NumPy in type checking workflow (#20792)
  • Add various tests (#20768)
  • Small drive-by's (#20772)
  • Touch the upload probe for the remote benchmark (#20767)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @arnabanimesh, @braaannigan, @burakemir, @coastalwhite, @etiennebacher, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego

Python Polars 1.20.0

16 Jan 18:41
725c960

⚠️ Deprecations

  • Make parameter of str.to_decimal keyword-only (#20570)

🚀 Performance improvements

  • Extend functionality on BitmapBuilder and use in Growables (#20754)
  • Specialize first/last agg for simple types in new-streaming engine (#20728)
  • Use PyO3 to convert between Python and Rust datetimes (#20660)
  • Improve state caching and parallelism of window functions (#20689)
  • Broadcast without materialization in concat_arr (#20681)
  • Cache rolling groups (#20675)
  • Use downcast_ref instead of dtype equality in <dyn SeriesTrait as AsRef<ChunkedArray<T>>> (#20664)
  • Fix performance regression for DataFrame serialization/pickling (#20641)
  • Make Parquet verify_dict_indices SIMD (#20623)
  • Move to zlib-rs by default and use zstd::with_buffer (#20614)
  • Skip filter expansion in eager (#20586)
  • Improve unique pred-pd (#20569)

✨ Enhancements

  • Allow different python versions for pickle (#20740)
  • Add SQL support for the NORMALIZE string function (#20705)
  • Add 'allow_exact_matches' parameter to join_asof (#20723)
  • Add new-streaming first/last aggregations (#20716)
  • Add Parquet Sink to new streaming engine (#20690)
  • Make automatic use of Azure storage account keys opt-in (#20652)
  • Reduce scan_csv() (and friends') memory usage when using BytesIO (#20649)
  • Improve GroupsProxy/GroupsPosition to be sliceable and cheaply cloneable (#20673)
  • Add str.normalize() (#20483)
  • Allow more group_by agg expressions in the new streaming engine (#20663)
  • Support loading Excel Table objects by name (#20654)
  • Support writing to file objects from write_excel (#20638)
  • Raise DuplicateError if given a pyarrow Table object with duplicate column names (#20624)
  • Support writing partitioned parquet to cloud (#20590)
  • Add hint to error message for extra struct field in JSON (#20612)
  • Add index_of() function to Series and Expr (#19894)
  • Update sqlparser-rs, enabling "LEFT" keyword to be optional for anti/semi joins in SQL queries (#20576)
  • Add cat.starts_with/cat.ends_with (#20257)
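
The str.normalize() and SQL NORMALIZE entries above concern Unicode normalization. Assuming they follow the standard Unicode normalization forms (NFC, NFD, NFKC, NFKD), the effect can be sketched with Python's standard unicodedata module rather than Polars itself. Strings that render identically can compare unequal until they are normalized to the same form.

```python
import unicodedata

composed = "\u00e9"     # "é" as a single code point
decomposed = "e\u0301"  # "e" followed by a combining acute accent

# The two strings look the same when rendered but are not equal as code points.
print(composed == decomposed)  # False

# NFC composes, NFD decomposes; after normalizing, the strings compare equal.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)  # True True

# The compatibility forms (NFKC/NFKD) additionally fold characters such as ligatures.
ligature_folded = unicodedata.normalize("NFKC", "\ufb01")  # "ﬁ" becomes "fi"
```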

🐞 Bug fixes

  • Avoid blocking on async runtime when resolving cloud scans (#20750)
  • Fix allow_invalid_certificates being ignored in storage_options (#20744)
  • Incorrect output type for map_groups returning all-NULL column (#20743)
  • Fix unique(maintain_order=True) raising InvalidOperationError for null array (#20737)
  • Don't collapse into a Nested Loop Join if the cross join maintains order (#20729)
  • Don't serialize credentials provider (#20741)
  • Fix Series.n_unique raising for list of struct (#20724)
  • Fix incorrect top-k by sorted column, fix head() returning extra rows (#20722)
  • Add outer validity to AnyValueBufferTrusted for structs (#20713)
  • Don't partition group-by with non-scalar literals in agg (#20704)
  • Fix xor operation of selector with Expr (#20702)
  • Incorrect view buffer dedup (#20691)
  • Only verify Parquet ConvertedType if no LogicalType is given (#20682)
  • Validate length of schema_overrides in read_csv (#20672)
  • Fix map_elements ignoring skip_nulls=True for struct dtype (#20668)
  • Check for MAP-GROUPS in cloud-eligible (#20662)
  • Fix empty output of to_arrow() on filtered unit height DataFrame (#20656)
  • Add .default to azure credential provider scope URL (#20651)
  • Fix join_asof panicking for invalid tolerance input (#20643)
  • Incorrect flag check on is_elementwise (#20646)
  • Don't panic but set null type if type is unknown (#20647)
  • Fix performance regression for DataFrame serialization/pickling (#20641)
  • Fix Int128 dtype serialization (#20629)
  • Ensure read_excel and read_ods support reading from raw bytes for all engines (#20636)
  • Ensure that SQL LIKE and ILIKE operators support multi-line matches (#20613)
  • Properly broadcast in sort_by (#20434)
  • Properly load nested Parquet Statistics (#20610)
  • AWS environment config was not loaded when credential provider was used (#20611)
  • Fix order observability of group-by-dyn (#20615)
  • Soundness when loading Parquet string statistics (#20585)
  • Fix error filtering after with_columns() on unit height LazyFrame (#20584)
  • Propagate tenant_id to CredentialProviderAzure if given (#20583)
  • Restore symbols on Apple by bumping nightly version (#20563)
  • Fix type annotation of str.strip_chars_* methods (#20565)
  • Fix variable name in error message for "unsupported data type" in rolling and upsampling operations (#20553)

📖 Documentation

  • Add more information for cross joins (#20753)
  • Fix typo in sql functions (cosinus -> cosine) (#20676)
  • Add links to read_excel "engine_options" and "read_options" docstring (#20661)
  • Fix small typo in plugins (polars-dt -> polars-st) (#20657)
  • Add polars-h3 and polars-st to plugin list (#20653)
  • Add docs reference for Field (#20625)
  • Update DataFrame join examples (#20587)
  • Miscellaneous minor updates/fixes (#20573)
  • Update "group_by_rolling" (deprecated) to "rolling" in user guide (#20548)

📦 Build system

  • Update to official release of PyO3 0.23.4 (#20683)
  • Officially support Python 3.13 (#20549)

🛠️ Other improvements

  • Fix remote benchmark script (#20755)
  • Fix tests (#20745)
  • Simplify hive predicate handling in NEW_MULTIFILE (#20730)
  • Add tests for various open issues (#20720)
  • Fixes an Excel test following new fastexcel release (#20703)
  • Add tests for various open issues that have been fixed (#20680)
  • Don't include debug symbols in benchmark run (#20571)
  • Implement CSV, IPC and NDJson in the MultiScanExec node (#20648)
  • Don't rely on argument order of optimization_toggle (#20622)
  • Fix Python deps installation in remote-benchmark workflow (#20619)
  • Fix flaky categorical test (#20591)
  • Bump multiversion from 0.7 to 0.8 (#20543)
  • Remove unused nested function in LazyFrame.fill_null (#20558)
  • Improve bin size info (#20551)

Thank you to all our contributors for making this release possible!
@Jesse-Bakker, @MarcoGorelli, @MoizesCBF, @SamuelAllain, @alexander-beedie, @bschoenmaeckers, @coastalwhite, @eitsupi, @etiennebacher, @itamarst, @jqnatividad, @lukemanley, @mcrumiller, @nameexhaustion, @orlp, @ritchie46 and @stinodego

Python Polars 1.19.0

03 Jan 21:11
841c387

🚀 Performance improvements

  • Collapse expanded filters in eager (#20493)
  • Remove predicate from IR::DataFrame (#20492)
  • Use different binview dedup strategy depending on chunks ratio (#20451)
  • Generalize the arg_sort fast path onto Column (#20437)
  • Dedup binviews up front (#20449)
  • Re-enable common subplan elim for new-streaming engine (#20443)
  • Don't collect all LHS arrays in gather (#20441)
  • Remove prepare_series for gather kernels (#20439)
  • Don't always take all data buffers when gathering views (#20435)

✨ Enhancements

  • Add Int128 IO support for csv & ipc (#20535)
  • Support arbitrary expressions in 'join_where' (#20525)
  • Allow use of Python types in cs.by_dtype and col (#20491)
  • Add an "include_file_paths" parameter to read_excel and read_ods (#20476)
  • Allow more join lossless casting (#20474)
  • Accept more generic Iterable[bool] in Series.filter (#20431)
  • Allow loading data from multiple Excel/ODS workbooks and worksheets (#20465)

🐞 Bug fixes

  • Output index type instead of u32 for sum_horizontal with boolean inputs (#20531)
  • Fix more global categorical issues (#20547)
  • Update eager join doctest on multiple columns (#20542)
  • Revert categorical unique code (#20540)
  • Add unique fast path for empty categoricals (#20536)
  • Fix various Int128 operations (#20515)
  • Fix global cat unique (#20524)
  • Fix union (#20523)
  • Fix rolling aggregations for various integer types (#20512)
  • Ensure ignore_nulls is respected in horizontal sum/mean (#20469)
  • Fix incorrectly added sorted flag after append for lexically ordered categorical series (#20414)
  • More Int128 testing and related fixes (#20494)
  • Validate column names in unique() for empty DataFrames (#20411)
  • Implement list.min and list.max for list[i128] (#20488)
  • Decimal from physical in horizontal min/max and shift (#20487)
  • Don't remove sort if first/last strategy is set in unique (#20481)
  • Fix join literal behavior (#20477)
  • Validate asof join by args in IR resolving phase (#20473)
  • Fix align_frames with single row panicking (#20466)
  • Allow multiple column sort for Decimal (#20452)
  • Fix mode panicking for String dtype (#20458)
  • Return correct schema for sum_horizontal with boolean dtype (#20459)
  • Fix return type for add_business_days, millennium, century and combine methods in Series.dt namespace (#20436)

📖 Documentation

  • Fix typo in DataFrame.cast (#20532)
  • Fix flaky doctests (#20516)
  • Add examples for bitwise expressions (#20503)
  • Clarify the join pre-condition of join_asof (#20509)
  • Fix Expr.all description of Kleene logic (#20409)
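
The Kleene logic mentioned in the Expr.all documentation fix above is a three-valued logic in which null means "unknown". The pure-Python sketch below shows an "all" reduction under those rules, with None standing in for null. The function name kleene_all is hypothetical, and how Polars exposes this behavior (for example, which parameter controls null handling) should be checked in its API reference.

```python
from typing import Iterable, Optional


def kleene_all(values: Iterable[Optional[bool]]) -> Optional[bool]:
    """Three-valued 'all': False wins, otherwise unknown (None) wins, otherwise True."""
    saw_null = False
    for value in values:
        if value is False:
            # A single False settles the result regardless of any unknowns.
            return False
        if value is None:
            saw_null = True
    # With no False present, any unknown keeps the result unknown.
    return None if saw_null else True


print(kleene_all([True, True]))   # True
print(kleene_all([True, None]))   # None
print(kleene_all([False, None]))  # False
```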

🛠️ Other improvements

  • Increase categorical test coverage (#20514)
  • Report wheel sizes (#20541)
  • Add tests for floor/ceil on integers (#20479)
  • Expose and rewrite 'can_pre_agg' (#20450)
  • Skip test on windows; kuzu import segfaults (#20463)
  • Add a TypeCheckRule to the optimizer (#20425)

Thank you to all our contributors for making this release possible!
@Biswas-N, @IndexSeek, @Prathamesh-Ghatole, @Terrigible, @alexander-beedie, @brifitz, @coastalwhite, @dependabot, @dependabot[bot], @jqnatividad, @lukemanley, @mcrumiller, @orlp, @ritchie46 and @siddharth-vi

Python Polars 1.18.0

24 Dec 08:51

🚀 Performance improvements

  • Order observability optimizations (#20396)
  • Purge ChunkedArray Metadata (#20371)
  • Explicit transpose in new-streaming equi-join finalize (#20363)
  • Cache dtype on ExprIR (#20331)
  • Lower overhead for BytecodeParser on introspection of incompatible UDFs (#20280)

✨ Enhancements

  • Always resolve dynamic types in schema (#20406)
  • Support loading data from multiple Excel/ODS workbooks (#20404)
  • Add "drop_empty_cols" parameter for read_excel and read_ods (#20430)
  • Order observability optimizations (#20396)
  • Add FirstArgLossless supertype (#20394)
  • Add dt.replace (#19708)
  • Polars build for Pyodide (#20383)
  • Add Azure credential provider using DefaultAzureCredential() (#20384)
  • Add env var to ignore file cache allocate error (#20356)
  • Enable joins between compatible differing numeric key columns (#20332)
  • Cache dtype on ExprIR (#20331)
  • Serialize DataFrame/Series using IPC in serde (#20266)
  • Improve error message on SchemaError (#20326)
  • Use better error messages when opening files (#20307)
  • Add 'skip_lines' for CSV (#20301)
  • Allow subtraction of time dtype columns (#20300)
  • Add bin.reinterpret (#20263)
  • Allow decoding of non-Polars arrow dictionaries in Arrow and Parquet (#20248)
  • Streamline creation of empty frame from Schema (#20267)
  • Add cat.len_chars and cat.len_bytes (#20211)
  • Expose AexprArena (#20230)

🐞 Bug fixes

  • Fix nullable object in map_elements (#20422)
  • Properly handle to_physical_repr of nested types (#20413)
  • Properly raise UDF errors (#20417)
  • Workaround for mmap crash under Emscripten (#20418)
  • Fix using new_columns in scan_csv with compressed file (#20412)
  • Fix return type of Series.dt.add_business_days (#20402)
  • Fix decimal series dispatch (#20400)
  • Fix decimal arithmetic schema (#20398)
  • Raise on categorical search_sorted (#20395)
  • Fix plotting f-strings and docstrings (#20399)
  • Don't try to load non-existent List/FSL statistics (#20388)
  • Propagate nulls for float methods on all numeric types (#20386)
  • Add env var to ignore file cache allocate error (#20356)
  • Flip order on right join (#20358)
  • Correctly parse special float values in from_repr (#20351)
  • Fix incorrect object store caching for ADLS URI (#20357)
  • Use the same encoding for nullable as non-nullable arrays (#20323)
  • Improve error message on SchemaError (#20326)
  • Boolean optional slice pushdown (#20315)
  • Properly handle from_physical for List/Array (#20311)
  • Ignore quotes in csv comments (#20306)
  • Ensure pl.datetime returns empty column when input columns are empty (#20278)
  • Ensure output height does not change on lazy projection pushdown with aggregations (#20223)
  • Fix error writing on Windows to locations outside of C drive (#20245)
  • Incorrect comparison in some cases with filtered list/array columns (#20243)
  • Ensure height is maintained in SQL SELECT 1 FROM (#20241)
  • Properly account for updated Categorical in .unique() kernel (#20235)

📖 Documentation

  • Improve docstring clarity (#20416)
  • Update GPU engine installation instructions to remove --extra-index-url from CUDA 12 packages (#20381)
  • Remove Plugins overview page without information (#20348)
  • Small fixes/clarifications in user guide (#20335)
  • Improve docs about NaN (#20310)
  • Fix substr function param definition (#19054)
  • Include parquet options in BigQuery I/O write sample (#20292)
  • Fix typo in fork warning (#20258)

📦 Build system

  • Add project.dynamic = ["version"] to pyproject.toml (#20345)
  • Update pyo3 and numpy crates to version 0.23 (#20111)
  • Build wheels for ARM Windows in Python release workflow (#20247)

🛠️ Other improvements

  • Enable masked out list, struct and array elements in parametric tests (#20365)
  • Move hive partitioning/multi-file handling outside of readers (#20203)
  • Purge ChunkedArray Metadata (#20371)
  • Correcting misspelled return value and unifying regional spelling (#20375)
  • Add test for select(len()) (#20343)
  • Make parametric tests include pl.List and pl.Array by default (#20319)
  • Use Column in Row Encoding (#20312)
  • Don't warn on fork hook (#20309)
  • Don't deconstruct CsvParseOptions (#20302)
  • Allow decoding of non-Polars arrow dictionaries in Arrow and Parquet (#20248)
  • Prepare test suite for Python 3.13 support (#20297)
  • Add FunctionCastOptions and conservative IR-level cast type-checking (#20286)
  • Add more descriptive error message for failure of vstack/extend (#20299)
  • Clean up some remnants of Python 3.8 support (#20293)
  • Add new Int128Type (#20232)
  • Add test for BytesIO overwritten after scan (#20240)
  • Expose AexprArena (#20230)

Thank you to all our contributors for making this release possible!
@Jesse-Bakker, @Terrigible, @ZemanOndrej, @alexander-beedie, @balbok0, @beckernick, @bschoenmaeckers, @coastalwhite, @georgestagg, @hamdanal, @haocheng6, @kszlim, @lukemanley, @mcrumiller, @nameexhaustion, @noexecstack, @orlp, @ptiza, @r-brink, @ritchie46, @rodrigogiraoserrao, @stijnherfst, @stinodego, @tswast and @zero-stroke

Python Polars 1.17.1

09 Dec 13:56
87feed7

🐞 Bug fixes

  • Fix incorrect lazy select(len()) with some select orderings (#20222)
  • Fix assertion panic on LazyFrame scratch.is_empty() (#20219)

Thank you to all our contributors for making this release possible!
@nameexhaustion and @ritchie46

Rust Polars 0.45.0

08 Dec 11:16
58a38af

💥 Breaking changes

  • Remove dedicated sink_(parquet/ipc)_cloud functions (#20164)
  • Experimental cloud write support (#20129)

🚀 Performance improvements

  • Add fast paths for series.arg_sort and dataframe.sort (#19872)
  • Utilize the RangedUniqueKernel for Enum/Categorical (#20150)
  • Reduce memory copy when scanning from Python objects (#20142)
  • Don't instantiate validity mask when unneeded in Parquet (#20149)
  • Expand more filters (#20022)
  • Cache the DataFrame schema in get_column_index (#20021)
  • Reduce the size of row encoding UTF-8 (#19911)
  • Memoize duplicates in rolling-gb-dyn (#19939)
  • More efficient row encoding for pl.List (#19907)
  • Halve the size of Booleans in row encoding (#19927)
  • Rolling 'iter_lookbehind' breeze through duplicates (#19922)
  • Initially trim leading and trailing filtered rows (#19850)
  • Increase default async thread count for low core count systems (#19829)
  • Move row group decode off async thread for local streaming parquet scan (#19828)
  • Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
  • Improve DataFrame.sort().limit/top_k performance (#19731)
  • Improve cloud scan performance (#19728)
  • Fix quadratic 'with_columns' behavior (#19701)
  • Improve hive partition pruning with datetime predicates from SQL (#19680)
  • Allow for arbitrary skips in Parquet Dictionary Decoding (#19649)
  • Reorder conditions in is_leap_year (#19602)
  • Rechunk in DataFrame.rows if needed (#19628)
  • Dispatch Parquet Primitive PLAIN decoding to faster kernels when possible (#19611)
  • Use faster iteration in 'starts_with'/'ends_with' (#19583)
  • Branchless Parquet Prefiltering (#19190)

✨ Enhancements

  • Retry with reloaded credentials on cloud error (#20185)
  • Support reading Enum dtype from csv (#20188)
  • Allow sorting of lists and arrays (#20169)
  • Add maintain_order parameter to joins (#20026)
  • Allow for to_datetime / strftime to automatically parse dates with single-digit hour/minute/second (#20144)
  • Experimental cloud write support (#20129)
  • Allow setting and reading custom schema-level IPC metadata (#20066)
  • Add optimized row encoding for Decimals (#20050)
  • Add drop_nans method to DataFrame and LazyFrame (#20029)
  • Catch use of 'polars' in to_string for non-Duration dtypes and raise an informative error (#19977)
  • Add AhoCorasick backed 'find_many' (#19952)
  • Speed up starts_with for small prefixes (#19904)
  • Auto-enable hive partitioning if hive_schema was given (#19902)
  • Add pl.concat_arr to concatenate columns into an Array column (#19881)
  • Support both "iso" and "iso:strict" format options for dt.to_string (#19840)
  • Add rounding for Decimal type (#19760)
  • Improved array arithmetic support (#19837)
  • Raise informative error on Unknown unnest (#19830)
  • Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
  • Allow specification of chunk_size on LazyCsvReader.read_options (#19819)
  • Add an is_literal method to expression meta namespace (#19773)
  • A different approach to warning users of fork() issues with Polars (#19197)
  • Add dylib (#19759)
  • Add IPC source node for new streaming engine (#19454)
  • Implement max/min methods for dtypes (#19494)
  • Improve hive partition pruning with datetime predicates from SQL (#19680)
  • Parallel IPC sink for the new streaming engine (#19622)
  • Add SQL support for RIGHT JOIN, fix an issue with wildcard aliasing (#19626)
  • Add show_graph to display a GraphViz plot for expressions (#19365)

🐞 Bug fixes

  • Don't trigger length check in array construction (#20205)
  • Allow row encoding for 32-bit architectures (e.g. WASM) (#20186)
  • Properly project unordered column in parquet prefiltered (#20189)
  • Stop CSV SIMD cache if EOL char is hit (#20199)
  • Estimated size for object (#20191)
  • Respect parallel argument in parquet (#20187)
  • Only validate UTF-8 for selected items when all below len 128 (#20183)
  • Serialize categories of Enum in arrow metadata (#20181)
  • Don't use RLE encoding for Parquet Boolean (#20172)
  • Invalid bitwise_xor for ScalarColumn (#20140)
  • Add temporal feature gate in is_elementwise_top_level (#20177)
  • Column name mismatch or not found in Parquet scan with filter (#20178)
  • Raise if apply returns different types (#20168)
  • Deal with masked out list elements (#20161)
  • Fix index out of bounds in uniform_hist_count (#20133)
  • Implement arg_sort for Null series (#20135)
  • Handle slice pushdown in PythonUDF GroupBy (#20132)
  • Check shape for *_horizontal functions (#20130)
  • Properly coerce types in lists (#20126)
  • Incorrect aggregation of empty groups after slice (#20127)
  • DataFrame .get_column after drop_in_place (#20120)
  • Subtraction with underflow on empty FixedSizeBinaryArray (#20109)
  • Materialize smallest dyn ints to use feature gate for i8/i16 (#20108)
  • Return null instead of 0. for rolling_std when window contains a single element and ddof=1 and there are nulls elsewhere in the Series (#20077)
  • Only slice after sort when slice is smaller than frame length (#20084)
  • Preserve Series name in __rpow__ operation (#20072)
  • Allow nested is_in() in when()/then() for full-streaming (#20052)
  • Fix datetime cast behavior for pre-epoch times (#19949)
  • Improve hist binning around breakpoints (#20054)
  • Fix invalid len due to projection pushdown selection of scalar (#20049)
  • Fix empty scalar agg type (#20051)
  • Improve binning in Series.hist with bin_count when all values are the same (#20034)
  • Less intrusive forking warnings (#20032)
  • Reading nullable sliced / masked Categoricals from Parquet (#20024)
  • Regression in hist panicking on out of bounds index (#20016)
  • Fix starts_with out of bounds (#20006)
  • Fix incorrect column order for parquet scan with hive columns in file (#19996)
  • Incorrectly gave list.len() for masked-out rows (#19999)
  • Bug fix in existing fast path for sorted series (#20004)
  • Incorrect collect_schema() for fill_null() after an aggregation expression in group-by context (#19993)
  • Fix Decimal type fill_null (#19981)
  • Fix panic on schema merge for prefiltering (#19972)
  • Fix lazy frame join expression (#19974)
  • Fix gather_every for Scalar (#19964)
  • Toggle 'fast_unique' on new_from_index (#19956)
  • Raise proper error message when too small interval is passed to datetime_range (#19955)
  • Fix scalar object (#19940)
  • Raise InvalidOperationError for invalid float to decimal casts (e.g. Inf, NaN) (#19938)
  • Fix panic with combination of hive and parquet prefiltering (#19905)
  • Fix panic when joining with empty frame (debug only) (#19896)
  • Fix incorrect result from inequality filter after join on LazyFrame (#19898)
  • Misleading ShapeError error message on dataframe creation (#19901)
  • Fix panic with empty delta scan, or empty parquet scan with a provided schema (#19884)
  • Ensure type object of inputs for cached any-value conversion functions are kept alive (#19866)
  • Fix panic using scan_parquet().with_row_index() with hive partitioning enabled (#19865)
  • Improve histogram bin logic (#18761)
  • Raise informative error instead of panicking for list arithmetic on some invalid dtypes (#19841)
  • Properly handle Zero-Field Structs in row encoding (#19846)
  • Incorrect explode schema for LazyFrame.explode() (#19860)
  • Ensure List element truncation ellipses respect ASCII* table formats (#19835)
  • Validate subnodes in validate IR (#19831)
  • Raise if merge non-global categoricals in unpivot (#19826)
  • Type hints for window_size incorrectly included timedelta in some rolling functions (#19827)
  • Don't panic if column not found (#19824)
  • Fix gather of Scalar null + idx w/ validity (#19823)
  • Fix object chunked gather (#19811)
  • Fix inconsistency between code and comment (#19810)
  • Fix filter scalar nulls (#19786)
  • Altair tooltip was being incorrectly applied to plots which did not accept it (#19789)
  • Fix scanning google cloud with service account credentials file (#19782)
  • Fix incorrect filter after right-join on LazyFrame (#19775)
  • Fix incorrect lazy schema for explode on array columns (#19776)
  • Fix incorrect lazy schema for aggregations (#19753)
  • Fix validation for inner and left join when join_nulls is unflagged (#19698)
  • SQL ELSE clause should be implicitly NULL when omitted (#19714)
  • In group_by_dynamic, period and every were getting applied in reverse order for the window upper boundary (#19706)
  • Only allow list.to_struct to be elementwise when width is fixed (#19688)
  • Make Array arithmetic ops fully elementwise (#19682)
  • Update line-splitting logic in batched CSV reader (#19508)
  • Fix incorrect lazy schema for explode() in agg() (#19629)
  • Fix filter incorrectly pushed past struct unnest when unnested column name matches upper column name (#19638)
  • Ensure mean_horizontal raises on non-numeric input (#19648)
  • Reorder conditions in is_leap_year (#19602)
  • Copy height in .vstack() for empty dataframes (#19641) (#19642)
  • Run join type coercion with correct schemas active (#19625)
  • Correct wildcard and input expansion for some more functions (#19588)
  • Allow .struct.with_fields inside list.eval (#19617)
  • Sortedness was incorrectly being preserved in dt.offset_by when offsetting by non-constant durations in the timezone-naive case (#19616)
  • Fix incorrect scan_parquet().with_row_index() with non-zero slice or with streaming collect (#19609)
  • Fix mask and validity confusion in Parquet String decoding (#19614)
  • Parquet decoding of nested dictionary values (#19605)
  • Do not attempt to load default credentials when credential_provider is given (#19589)
  • Fix gather len in group-by state (#19586)
  • Added input validation for explode operation in the array namespace (#19163)
  • Improve error message (#19546)
  • Fix predica...

Python Polars 1.17.0

08 Dec 10:26
5f6bc77

🚀 Performance improvements

  • Add fast paths for series.arg_sort and dataframe.sort (#19872)
  • Much faster Series construction from subclasses of standard Python types (#20166)
  • Utilize the RangedUniqueKernel for Enum/Categorical (#20150)
  • Reduce memory copy when scanning from Python objects (#20142)
  • Construct Series for bytes/binary data 10x faster when dtype not explicitly set (#20157)
  • Don't instantiate validity mask when unneeded in Parquet (#20149)

✨ Enhancements

  • Retry with reloaded credentials on cloud error (#20185)
  • Support reading Enum dtype from csv (#20188)
  • Improve dtype inference and load for DataFrame cols constructed from Python Enum values (#20180)
  • Allow sorting of lists and arrays (#20169)
  • Add maintain_order parameter to joins (#20026)
  • Allow for to_datetime / strftime to automatically parse dates with single-digit hour/minute/second (#20144)
  • Issue warning when using to_struct() without a list of field names (#20158)
  • Experimental cloud write support (#20129)
  • Add lazy support for pl.select (#20091)
  • Enable view arrow export in write_delta (#20092)

🐞 Bug fixes

  • Don't trigger length check in array construction (#20205)
  • Allow row encoding for 32-bit architectures (e.g. WASM) (#20186)
  • Properly project unordered column in parquet prefiltered (#20189)
  • Stop CSV SIMD cache if EOL char is hit (#20199)
  • Estimated size for object (#20191)
  • Respect parallel argument in parquet (#20187)
  • Only validate UTF-8 for selected items when all below len 128 (#20183)
  • Serialize categories of Enum in arrow metadata (#20181)
  • Don't use RLE encoding for Parquet Boolean (#20172)
  • Invalid bitwise_xor for ScalarColumn (#20140)
  • Series construct with large nested u64 (#20167)
  • Add temporal feature gate in is_elementwise_top_level (#20177)
  • Column name mismatch or not found in Parquet scan with filter (#20178)
  • Raise if apply returns different types (#20168)
  • Deal with masked out list elements (#20161)
  • Fix index out of bounds in uniform_hist_count (#20133)
  • Implement arg_sort for Null series (#20135)
  • Handle slice pushdown in PythonUDF GroupBy (#20132)
  • Check shape for *_horizontal functions (#20130)
  • Properly coerce types in lists (#20126)
  • Incorrect aggregation of empty groups after slice (#20127)
  • DataFrame .get_column after drop_in_place (#20120)
  • Subtraction with underflow on empty FixedSizeBinaryArray (#20109)
  • Materialize smallest dyn ints to use feature gate for i8/i16 (#20108)
  • Return null instead of 0. for rolling_std when window contains a single element and ddof=1 and there are nulls elsewhere in the Series (#20077)
  • Only slice after sort when slice is smaller than frame length (#20084)
  • Preserve Series name in __rpow__ operation (#20072)
  • Allow nested is_in() in when()/then() for full-streaming (#20052)

📖 Documentation

  • Add more Rust examples to User Guide (#20194)
  • Expand plotting docs (#19719)
  • Fix Rust examples in user guide (#20075)
  • Update by param description for rolling_*_by functions (#19715)
  • Correct supported compression formats (#20085)
  • Specify strictness in cast (#20067)

📦 Build system

  • Upgrade sqlparser-rs from version 0.49 to 0.52 (#20110)
  • Bump memmap2 to version 0.9 (#20105)
  • Bump object_store to version 0.11 (#20102)
  • Bump fs4 to version 0.12 (#20101)
  • Bump thiserror to version 2 (#20097)
  • Bump atoi_simd to version 0.16 (#20098)
  • Bump chrono-tz to 0.10 (#20094)
  • Update Rust dependency ndarray to 0.16 (#20093)
  • Bump Rust toolchain to nightly-2024-11-28 (#20064)

🛠️ Other improvements

  • Deprecate ddof parameter for correlation coefficient (#20197)
  • Move Bitwise aggregations to FunctionExpr (#20193)
  • Add ragged lines test (#20182)
  • Set delta version check higher (#20153)
  • Fix typo in assertion in datatype copy test (#20121)
  • Move horizontal methods to polars-ops (#20134)
  • Remove useless SeriesTrait::get implementations (#20136)
  • Add a bunch more automated row encoding sortedness tests (#20056)

Thank you to all our contributors for making this release possible!
@DzenanJupic, @MarcoGorelli, @YichiZhang0613, @alexander-beedie, @coastalwhite, @dependabot, @dependabot[bot], @flowlight0, @henryharbeck, @iharthi, @ion-elgreco, @jqnatividad, @lukapeschke, @lukemanley, @mcrumiller, @nameexhaustion, @ptiza, @ritchie46, @siddharth-vi, @stijnherfst, @stinodego and @wsyxbcl

Python Polars 1.16.0

29 Nov 11:22
44ddbc2

🚀 Performance improvements

  • Expand more filters (#20022)
  • Cache the DataFrame schema in get_column_index (#20021)

✨ Enhancements

  • Enable creation of independently reusable Config instances (#20053)
  • Improved error message on invalid Python Enum init (#20060)
  • Improve Polars Enum dtype init from standard Python enums (#19997)
  • Add optimized row encoding for Decimals (#20050)
  • Add drop_nans method to DataFrame and LazyFrame (#20029)

🐞 Bug fixes

  • Improve hist binning around breakpoints (#20054)
  • Fix invalid len due to projection pushdown selection of scalar (#20049)
  • Fix empty scalar agg type (#20051)
  • Improve binning in Series.hist with bin_count when all values are the same (#20034)
  • Less intrusive forking warnings (#20032)
  • Reading nullable sliced / masked Categoricals from Parquet (#20024)
  • Regression in hist panicking on out of bounds index (#20016)
  • Fix starts_with out of bounds (#20006)
  • Fix incorrect column order for parquet scan with hive columns in file (#19996)
  • Incorrectly gave list.len() for masked-out rows (#19999)
  • Bug fix in existing fast path for sorted series (#20004)
  • Incorrect collect_schema() for fill_null() after an aggregation expression in group-by context (#19993)
  • Fix rows_by_key typing (#19888)

📖 Documentation

  • Remove note about guaranteed left join order (#20048)
  • Fix broken links to user guide (#19989)

Thank you to all our contributors for making this release possible!
@alexander-beedie, @coastalwhite, @gab23r, @lukemanley, @mcrumiller, @nameexhaustion, @ritchie46, @siddharth-vi, @stijnherfst and @stinodego

Python Polars 1.15.0

25 Nov 21:55
f0d087d

🚀 Performance improvements

  • Reduce the size of row encoding UTF-8 (#19911)
  • Memoize duplicates in rolling-gb-dyn (#19939)
  • More efficient row encoding for pl.List (#19907)
  • Halve the size of Booleans in row encoding (#19927)
  • Rolling 'iter_lookbehind' breeze through duplicates (#19922)
  • Initially trim leading and trailing filtered rows (#19850)

✨ Enhancements

  • Catch use of 'polars' in to_string for non-Duration dtypes and raise an informative error (#19977)
  • Add AhoCorasick backed 'find_many' (#19952)
  • Allow Python Enums as dtype inputs (#19926)
  • Speed up starts_with for small prefixes (#19904)
  • Auto-enable hive partitioning if hive_schema was given (#19902)
  • Add pl.concat_arr to concatenate columns into an Array column (#19881)
  • Support both "iso" and "iso:strict" format options for dt.to_string (#19840)
  • Add rounding for Decimal type (#19760)
  • Improved array arithmetic support (#19837)

🐞 Bug fixes

  • Fix Decimal type fill_null (#19981)
  • Fix panic on schema merge for prefiltering (#19972)
  • Fix lazy frame join expression (#19974)
  • Fix gather_every for Scalar (#19964)
  • Toggle 'fast_unique' on new_from_index (#19956)
  • Parse uppercase config keys (#19852)
  • Raise proper error message when too small interval is passed to datetime_range (#19955)
  • Fix scalar object (#19940)
  • Raise InvalidOperationError for invalid float to decimal casts (e.g. Inf, NaN) (#19938)
  • Address indexing edge-case with numpy arrays (#19895)
  • Fix panic with combination of hive and parquet prefiltering (#19905)
  • Fix panic when joining with empty frame (debug only) (#19896)
  • Fix incorrect result from inequality filter after join on LazyFrame (#19898)
  • Misleading ShapeError error message on dataframe creation (#19901)
  • Fix panic with empty delta scan, or empty parquet scan with a provided schema (#19884)
  • Ensure type object of inputs for cached any-value conversion functions are kept alive (#19866)
  • Improve export from 2D Array dtype columns to PyTorch Tensors (to_torch) and Jax Arrays (to_jax) (#19862)
  • Fix panic using scan_parquet().with_row_index() with hive partitioning enabled (#19865)
  • Improve histogram bin logic (#18761)
  • Raise informative error instead of panicking for list arithmetic on some invalid dtypes (#19841)
  • Properly handle Zero-Field Structs in row encoding (#19846)
  • Incorrect explode schema for LazyFrame.explode() (#19860)
  • DataFrame rows_by_key returning key tuples with elements in wrong order (#19486)
  • Ensure List element truncation ellipses respect ASCII* table formats (#19835)

📖 Documentation

  • Remove duplicate sentence in Series.bottom_k docstring (#19947)
  • Complete parameters description and add an example for clip() (#19875)
  • Fix some warnings during docs build (#19848)

📦 Build system

  • Use public windows runners in python release (#19982)
  • Add windows-aarch64 to python binaries (#19966)

🛠️ Other improvements

  • Minor non-breaking space (&nbsp;) tweak for HTML rendering (#19864)
  • Implement nested row encoding / decoding (#19874)
  • Switch back to PyO3 0.22 (#19851)
  • Adjust flaky with_columns test (#19844)
  • Add proper tests for row encoding (#19843)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @barak1412, @coastalwhite, @etiennebacher, @ion-elgreco, @itamarst, @lukemanley, @mcrumiller, @mhogervo, @nameexhaustion, @orlp, @ritchie46, @stijnherfst and @stinodego

Python Polars 1.14.0

17 Nov 18:50
34ee4ee

🚀 Performance improvements

  • Increase default async thread count for low core count systems (#19829)
  • Move row group decode off async thread for local streaming parquet scan (#19828)
  • Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)

✨ Enhancements

  • Raise informative error on Unknown unnest (#19830)
  • Support DataFrame init from raw SQLAlchemy rows (#19820)
  • Support use of Duration in to_string, ergonomic/perf improvement, tz-aware Datetime bugfix (#19697)
  • Add an is_literal method to expression meta namespace (#19773)
  • A different approach to warning users of fork() issues with Polars (#19197)

🐞 Bug fixes

  • Fix read_database(…,iter_batches=True) type annotations (#19832)
  • Validate subnodes in validate IR (#19831)
  • Raise if merge non-global categoricals in unpivot (#19826)
  • Type hints for window_size incorrectly included timedelta in some rolling functions (#19827)
  • Don't panic if column not found (#19824)
  • Fix gather of Scalar null + idx w/ validity (#19823)
  • Replace _kwargs in collect method (#19618)
  • Fix object chunked gather (#19811)
  • Fix filter scalar nulls (#19786)
  • Replace spaces with &nbsp; to support showing multiple spaces in HTML repr (#19783)
  • Altair tooltip was being incorrectly applied to plots which did not accept it (#19789)
  • Respect schema_overrides in batched csv reader (#19755)
  • Fix scanning google cloud with service account credentials file (#19782)
  • Release the GIL in Python APIs, part 2 of 2 (#19762)
  • Fix incorrect filter after right-join on LazyFrame (#19775)
  • Fix incorrect lazy schema for explode on array columns (#19776)
  • Fixed typo in file lazy.py (#19769)

📖 Documentation

  • Update bokeh to use cdn to avoid Bokeh Error (#19788)
  • Change dprint config (#19747)
  • Mention rows_by_key in the to_dict documentation (#19767)
  • Fix link to Graphviz download (#19791)

🛠️ Other improvements

  • Add ToField context for common args (#19833)
  • Use polars parquet reader for delta scan (#19103)
  • Migrate polars-expr AggregationContext to use Column (#19736)

Thank you to all our contributors for making this release possible!
@MarcoGorelli, @TNieuwdorp, @YichiZhang0613, @alexander-beedie, @braaannigan, @coastalwhite, @engylemure, @gab23r, @iliya-malecki, @ion-elgreco, @itamarst, @jackxxu, @nameexhaustion, @orlp, @ritchie46, @rodrigogiraoserrao and @sn0rkmaiden