Releases: pola-rs/polars
Python Polars 0.16.1
🏆 Highlights
- Formalize list aggregation difference between groupbys, selection and window functions (#6487)
- automagically upconvert with_columns kwarg expressions with multiple output names to struct; extend **named_kwargs support to select (#6497)
⚠️ Breaking changes
- error on string <-> date cmp (#6498)
- Formalize list aggregation difference between groupbys, selection and window functions (#6487)
- show where error messages originated (#6482)
- Remove deprecated paths from Series.__getitem__ (#6048)
- change behaviour of named rows (#6302)
- Remove deprecated read/write_json arguments (#5990)
- make schema, schema_overrides, and orient consistent on all user-facing interfaces (#6387)
- Groupby iteration now returns tuples of (name, data) (#6350)
- Remove Groupby.pivot (#6016)
- Remove deprecated argument aliases (#5993)
- Change Series.shuffle default behaviour (#5991)
- Change Expr.is_between default behaviour (#5985)
- Restrict certain function parameters to be keyword-only (#6464)
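As a rough illustration of the new groupby iteration shape (#6350), the sketch below mimics iterating over groups as (name, data) tuples in plain Python. It does not call Polars; `iter_groups` and the sample rows are hypothetical names chosen only for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Iterator


def iter_groups(
    rows: list[dict[str, Any]], by: str
) -> Iterator[tuple[Any, list[dict[str, Any]]]]:
    """Yield (name, data) tuples, one per distinct value of the `by` key."""
    groups: dict[Any, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[by]].append(row)
    # dicts preserve insertion order, so groups come out in first-seen order
    yield from groups.items()


rows = [
    {"key": "a", "value": 1},
    {"key": "b", "value": 2},
    {"key": "a", "value": 3},
]

# Each iteration step unpacks into the group name and the group's data.
for name, data in iter_groups(rows, by="key"):
    print(name, [r["value"] for r in data])
```

Code that previously iterated over groups and expected only the data would, under this shape, need to unpack two values per step.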
✨ Enhancements
- let cast_time_zone accept None (#6539)
- automagically upconvert with_columns kwarg expressions with multiple output names to struct; extend **named_kwargs support to select (#6497)
- add some missing type annotation in series dispatch methods (#6523)
- better errors in get_ptr and a probability on a boolean… (#6522)
- add utc parameter to strptime (#6496)
- add meta 'has_multiple_outputs', 'is_regex_projec… (#6500)
- error on string <-> date cmp (#6498)
- ~30% faster iter_rows(named=True) and to_dicts(), if pyarrow available (#6493)
- show where error messages originated (#6482)
- Remove deprecated paths from Series.__getitem__ (#6048)
- change behaviour of named rows (#6302)
- Remove deprecated read/write_json arguments (#5990)
- Groupby iteration now returns tuples of (name, data) (#6350)
- Remove Groupby.pivot (#6016)
- Remove deprecated argument aliases (#5993)
- Change Series.shuffle default behaviour (#5991)
- Change Expr.is_between default behaviour (#5985)
- Restrict certain function parameters to be keyword-only (#6464)
🐞 Bug fixes
- implement ser/de for BinaryChunked (#6543)
- on frame-init from generator, initial chunk_size cannot be smaller than infer_schema_length (#6541)
- raise if tz_localize called on UTC-aware (#6526)
- make concat_list group aware (#6527)
- error on invalid expanding expression (#6521)
- create from dicts directly as struct categorical (#6520)
- fix oob in arr.get by expressions (#6519)
- fix cse schema (#6518)
- panic when max_len -1 is reached (#6494)
- Formalize list aggregation difference between groupbys, selection and window functions (#6487)
- fix(rust, python) validate tz in with_time_zone (#6417)
🛠️ Other improvements
- Remove verify_series_and_expr_api util (#6524)
- Disable some tests for Windows (#6532)
- Remove unnecessary brackets in doc examples (#6332)
- Enable some tests for Windows (#6511)
- Fix test issue with tmp directory (#6508)
- Fix some deprecation warnings (#6495)
- added all missing examples for temporal expressions (#6488)
- Utilize pytest-xdist for faster unittests (#6483)
- test(python) I/O test improvements (#6475)
- make schema, schema_overrides, and orient consistent on all user-facing interfaces (#6387)
- improved error message from Expr on incorrect usage in boolean context (#6473)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @gab23r, @papparapa, @ritchie46, @romanovacca, @stinodego and @zundertj
Python Polars 0.15.18
✨ Enhancements
- More precise pipe type annotation (#6457)
🐞 Bug fixes
🛠️ Other improvements
- Specify deltalake minimum version (#6363)
- deprecate iterrows in favour of iter_rows, add new @redirect class decorator (#6461)
- Improve IO test structure (#6453)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @josh, @ritchie46 and @stinodego
Python Polars 0.15.17
✨ Enhancements
- allow expr in str.contains (#6443)
- Deprecate with_column (#6128)
- expose efficient iterator over DataFrame slices (#6414)
- add float formatting option (#6432)
- 10% speedup for to_dicts method (#6415)
- add datetime/duration dtype selector groups covering the different timeunits (#6425)
- allow internal api to get pointer to values buffer (#6385)
- infer ISO8601 datetimes (#6357)
- minor improvement to auto-detection of ambiguous data orientation (#6376)
- allow expressions as arguments in str.ends_with (#6361)
- Make groupby rolling/dynamic iterable (#6372)
- accept expr in str.starts_with (#6355)
- Move explode to namespaces (#6351)
- Rename Series.struct.to_frame to .struct.unnest (#6352)
- auto-detect %+ as tz-aware (#6434)
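To give a feel for what inferring ISO 8601 datetimes (#6357) involves, here is a minimal sketch using only the Python standard library. It is not the Polars parser: `try_parse_iso8601` is a hypothetical helper, and it accepts only the subset of ISO 8601 that `datetime.fromisoformat` understands.

```python
from datetime import datetime


def try_parse_iso8601(values):
    """Return a list of datetimes if every string parses, otherwise None.

    Mirrors the all-or-nothing idea behind dtype inference: a column is
    only treated as datetime when every sampled value fits the format.
    """
    parsed = []
    for value in values:
        try:
            parsed.append(datetime.fromisoformat(value))
        except ValueError:
            # One non-conforming value means the column stays as strings.
            return None
    return parsed


print(try_parse_iso8601(["2023-01-15T10:30:00", "2023-01-16T00:00:00"]))
print(try_parse_iso8601(["2023-01-15T10:30:00", "not a date"]))
```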
🐞 Bug fixes
- fix projection pushdown on double semi join (#6440)
- ensure column-exclusion works with the new dtype groups, and improve some related typing (#6442)
- ensure from_dicts and DataFrame init from list of dicts behave consistently, update/improve related docstrings (#6431)
- cumulative_eval ensure output dtype is respected (#6435)
- allow from pandas null structs (#6430)
- fixed interaction of schema_overrides with frame-init from list of dicts (#6424)
- only use float simd on specific alignment (#6427)
- no early escape when window is equal to len in rolling_float (#6408)
- is_between typing with time in start and end (#6393)
- don't incorrectly infer Zulu time (#6378)
- raise error on invalid sort_by argument (#6382)
- take offset into account with str.explode (#6384)
- Return empty batch for pl.read_csv_batched().next_… (#6381)
- ensure pyarrow.compute module is loaded (#6353)
- implement ser/de for StructChunked (#6359)
- series of empty structs (#6347)
🛠️ Other improvements
- add explicit note about use of Config as a context manager (#6439)
- ensure from_dicts and DataFrame init from list of dicts behave consistently, update/improve related docstrings (#6431)
- Fix docstring of series.interpolate (#6399)
- Remove duplicate test (#6390)
- deprecate columns param for DataFrame init; transitioning to schema (#6366)
- Add docs and tests to Expr.flatten (#6370)
- Example of filtering partitioned delta tables (#6365)
- Uppercase project URL refs (#6362)
Thank you to all our contributors for making this release possible!
@ChayimFriedman2, @MarcoGorelli, @alexander-beedie, @c-peters, @flowlight0, @gab23r, @gam-phon, @ghuls, @jgmartin, @josh, @ritchie46, @romanovacca, @stinodego, @universalmind303 and @zundertj
Python Polars 0.15.16
🚀 Performance improvements
- Improve rechunk check (#6268)
- reuse allocated scratches in ipc writer (#6287)
- use dedicated writer thread for sink_parquet (#6285)
✨ Enhancements
- add strict parameter to decoding expressions (#6342)
- allow unordered struct creating from anyvalues (#6321)
- allow pass_name in aggregation apply (#6318)
- parse abbrev month name (#6314)
- Add warning for new behaviour of named rows (#6300)
- add dt.combine for combining date and time components (#6121)
- improvements to dtype-based column selection (#6295)
- add sink_ipc (#6286)
- additional schema_overrides param for more ergonomic DataFrame init (#6230)
🐞 Bug fixes
- don't cast nulls before trying normal cast (#6339)
- properly dispatch categorical string comparison (#6336)
- expand all nested wildcards in functions (#6334)
- fix groupby rolling by_key if groups are empty (#6333)
- Fix some type hints and bugs for groupby (#6329)
- Reject None input for head/tail (#6326)
- parse abbrev month name (#6314)
- default to pyarrow for writing parquet (#6313)
- disallow alias in inline join expressions (#6312)
- block proj-pd and pred-pd on swapping rename (#6303)
- convert nested dictionary with i64 keys (#6299)
- fix(python) Print instantiated dtypes in glimpse (#6298)
- infer y-m-d datetime even if single element (#6297)
- fix panic dynamic_groupby on empty dataframe (#6294)
- implement missing DataFrame __floordiv__ op (#6280)
- Allow low and high in date_range to be str (#6275)
- allow integer-compatible row indexes that are not strictly typed as int (#6266)
- Parse negative dates with polars parser (#6256)
🛠️ Other improvements
- run cse optimization only if joins and caches… (#6337)
- Fix wrong description for variable_name argument in melt (#6331)
- Fix random groupby test failure (#6327)
- fixup test names, adjust test_struct (#6317)
- simplify _from_pandas constructor (#6310)
- Ignore hash doctests (#6304)
- Fix docstring formatting for truncate (#6291)
- Move package metadata to pyproject.toml (#6271)
- Move io tests to the same folder (#6277)
- Enable Dependabot (#5036)
Thank you to all our contributors for making this release possible!
@MarcoGorelli, @alexander-beedie, @c-peters, @dependabot, @dependabot[bot], @ghuls, @n8henrie, @ritchie46, @stinodego and @universalmind303
Python Polars 0.15.15
✨ Enhancements
- ensure ooc sort works ooc with all-constant values (#6235)
- The 1 billion row sort (#6156)
- optionally treat missing UTF8 values as the empty string at CSV parse-time (#6203)
- check file target is not an existing directory (#6187)
- support -ve indexing for DataFrame head and tail methods (#6173)
- Implement DataFrame.unique(keep="none") (#6169)
- support use of explicit Struct dtypes on DataFrame/Series init (#6145)
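The negative-indexing support for head and tail (#6173) can be pictured with plain Python lists. This is a sketch under the assumption that a negative n means "everything except the last (for head) or first (for tail) abs(n) rows"; the `head` and `tail` functions below are hypothetical stand-ins, not the Polars methods.

```python
def head(rows, n=5):
    """First n rows; for negative n, all rows except the last abs(n)."""
    if n < 0:
        return rows[: max(0, len(rows) + n)]
    return rows[:n]


def tail(rows, n=5):
    """Last n rows; for negative n, all rows except the first abs(n)."""
    if n < 0:
        return rows[-n:]
    return rows[max(0, len(rows) - n):]


data = [0, 1, 2, 3, 4]
print(head(data, 2), head(data, -2))
print(tail(data, 2), tail(data, -2))
```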
🐞 Bug fixes
- Add list inner dtype when printing Series (#6233)
- strptime now respects pl.Datetime's time_unit (#6231)
- fix when then otherwise with arity and aggregation… (#6224)
- collect now uses the storage_options given to scan_parquet (#6223)
- set_sorted keep schema (#6222)
- pass name to value counts in aggregation (#6221)
- don't set fast_explode on list of structs (#6220)
- address a frame init/construction error, and expose infer_schema_length to frame init (#6210)
- explode of empty nullable list (#6190)
- fix oob arr.take (#6189)
- Make with_columns in with_columns_kwargs mode compatible with more data types (#6126)
- Update docstring with_columns to reflect a new dataframe is being returned (#6122)
- fix empty streaming joins (#6149)
- fix streaming joins where the join order has been … (#6143)
- write tz-aware datetimes to csv (#6135)
- add null behavior for oob indices (#6133)
🛠️ Other improvements
- Create DataFrame from schema (#6225)
- don't set aggregated flag on null propagated aggregation. (#6191)
- undo cargo.toml change (#6219)
- Improve drop_nulls docstrings (#6127)
- Clarify docstrings for closed argument (#6198)
- minor docs and typing updates (plus additional test coverage for related areas) (#6182)
- explain n_field_strategy (#6158)
Thank you to all our contributors for making this release possible!
@MarceColl, @MarcoGorelli, @alexander-beedie, @gab23r, @ghuls, @jvanbuel, @n8henrie, @rben01, @ritchie46, @ropoctl, @sorhawell, @stinodego, @winding-lines and @zundertj
Python Polars 0.15.14
🚀 Performance improvements
- first check rev-map on categorical equality check (#6085)
✨ Enhancements
- add arr.take expression (#6116)
- allow extend_constant to work with date literals (#6114)
- allow nested categorical cast (#6113)
- add a rounded_corners modifier to pl.Config.set_tbl_formatting (#6108)
- huge speedup of scalar-to-array expansion on frame init from dict (#6111)
- extend existing fast range->Series init to lists of ranges in a Series (#6099)
- additional (opt-in) options for assert_frame_equal (#6096)
- add search_sorted for arrays and utf8 dtype (#6083)
🐞 Bug fixes
- ensure multi-line type hints are parenthesised (#6100)
- fix invalid dtype in chunked array after struct cast (#6093)
- don't run cse cache_states if no projections found (#6087)
- Support all datatypes in glimpse and align with head/tail (#6091)
- Update read_csv error message (#6082)
- propagate nulls in binary arithmetic/aggregation (#6076)
🛠️ Other improvements
- Fix docstring with_context (#6118)
- Use Dataframe.item internally and in tests (#6109)
- Assert deprecation warning on check_column_names (#6110)
- enable unused import autofix via ruff (#6102)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @gitkwr, @huitseeker, @ritchie46, @stinodego and @zundertj
Python Polars 0.15.13
✨ Enhancements
- Improve iterating over GroupBy (#6051)
- much faster lazy type-checks (#6064)
- support array-expansion of scalars on frame init from dict (#6034)
- improve error message when writing nested data to… (#6040)
🐞 Bug fixes
- bound complex type from 3.8 to 3.11 (#6071)
- deal with unnest schema expansion in projection pd (#6063)
- correct output dtype for cummin/cumsum/cummax (#6062)
- block streaming on literal series/range (#6058)
- improve handling of dict-type "columns" param on frame-init (#6045)
- Fix typing for DataFrame.select (#6047)
- ndjson struct inference (#6049)
- fix stringcache. latest refactor introduced a hashing error (#6056)
- allow mixed field order and availability in apply that r… (#6041)
- deal with empty structs (#6039)
- fix aggregation that filters out all data (#6036)
- fix diff overflow (#6033)
- keep column names in is_null/is_not_null (#6032)
- keep name when sorting categorical in lexical order (#6029)
- tweaked property/accessor behaviour (#6021)
- properly set null anyvalue if categorical is neste… (#6025)
- Fix from_epoch function signature (#6024)
- Validate estimated_size parameter (#6018)
🛠️ Other improvements
- suggest forward fill in cumsum/cummax (#6061)
- Fix SIM105 issues. (#6042)
- Remove trailing spaces in glimpse output (#6037)
- Remove unnecessary noqa's (#6035)
- Fix flake8-pytest-style errors in tests. (#6031)
- update read_sql and row docstrings (#6028)
- Enable the isort-style import autofix via ruff (#6020)
- Update py-polars/Cargo.lock (#6013)
- Refactor pivot tests (#6012)
- Use ruff instead of isort, flake8 and pyupgrade (#5916)
- Properly deprecate groupby.pivot (#6000)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ghuls, @ritchie46, @stinodego and @universalmind303
Python Polars 0.15.11
🚀 Performance improvements
- ensure set_at_idx is O(1) (#5977)
✨ Enhancements
- allow eq,ne,lt etc (#5995)
- Improve Expr.is_between API (#5981)
- large speedup for df.iterrows (~200-400%) (#5979)
- updated default table format from "UTF8_FULL" to "UTF8_FULL_CONDENSED" (#5967)
- Access rows as namedtuples (#5966)
- Improve assert_frame_equal messages (#5962)
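The idea of accessing rows as namedtuples (#5966) can be sketched with the standard library alone. The helper `named_rows` and the sample columns are hypothetical and only illustrate the access pattern; they are not the Polars implementation.

```python
from collections import namedtuple


def named_rows(columns):
    """Turn a column-oriented dict into a list of namedtuple rows.

    Column names must be valid Python identifiers for this sketch to work.
    """
    Row = namedtuple("Row", list(columns))
    # zip(*values) transposes the columns into per-row tuples.
    return [Row(*values) for values in zip(*columns.values())]


data = {"id": [1, 2], "name": ["a", "b"]}
rows = named_rows(data)

# Fields are reachable by attribute as well as by position.
print(rows[0].id, rows[0].name)
print(rows[1][1])
```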
🐞 Bug fixes
- make weekday tz-aware (#5989)
- fix categorical in struct anyvalue issue (#5987)
- fix invalid boolean simplification (#5976)
- allow empty sort on any dtype (#5975)
- properly deal with categoricals in streaming queries (#5974)
Thank you to all our contributors for making this release possible!
@alexander-beedie, @ritchie46 and @stinodego
Python Polars 0.15.9
🚀 Performance improvements
- improve reducing window function performance ~33% (#5878)
✨ Enhancements
- str.strip with multiple chars (#5929)
- add iterrows (#5945)
- read decimal as f64 (#5938)
- improve query plan scan formatting (#5937)
- allow all null cast (#5933)
- allow objects in struct types (#5925)
- handle Series init from python sequence of numpy arrays (#5918)
- merge sorted dataframes (#5817)
- impl hex and base64 for binary (#5892)
- Add datatype hierarchy (#5901)
- Add .item() on DataFrame and Series (#5893)
- make get_any_value fallible (#5877)
- Add string representation for data types (#5861)
- directly push all operator result into sink, prev… (#5856)
🐞 Bug fixes
- don't panic on ignored context (#5958)
- don't allow named expression in arr.eval (#5957)
- error on invalid dtype (#5956)
- fix panic in join expressions (#5954)
- block ordered predicates before explode (#5951)
- adhere to schema in arr.eval of empty list (#5947)
- fix from_dict schema_inference=0 (#5948)
- fix arrow nested null conversion (#5946)
- allow None in arr.slice length (#5934)
- fix time to duration cast (#5932)
- error on addition with datetime/time (#5931)
- don't create categoricals in streaming (#5926)
- object filter should keep single chunk (#5913)
- csv, read escaped "" as missing (#5912)
- fix pivot of signed integers (#5909)
- don't allow duplicate columns in read_csv arg (#5908)
- fix latest oob in streaming conversion (#5902)
- adapt k to len in topk (#5888)
- fix lazy swapping rename (#5884)
- fix window function with nullable values; regression due… (#5874)
- improve equality consistency between types (#5873)
- evaluate whole branch expression to determine if r… (#5864)
- fix top_k on empty (#5865)
- fix slice in streaming (#5854)
- Fix type hint for IO *_options arguments (#5852)
🛠️ Other improvements
- Fix docs for sink_parquet (#5952)
- Fix misspelling in LazyFrame docstring (#5917)
- add bin, series.is_sorted and merge_sorted (#5914)
Thank you to all our contributors for making this release possible!
@AnatolyBuga, @alexander-beedie, @cannero, @chitralverma, @dannyvankooten, @johngunerli, @ozgrakkurt, @ritchie46, @stinodego, @winding-lines and @zundertj
Rust Polars 0.26.0
⚠️ Breaking changes
🚀 Performance improvements
- improve reducing window function performance ~33% (#5878)
- improve performance reducing window functions with numeric output ~-14% (#5841)
- set_sorted flag when creating from literal (#5728)
- use sorted fast path in streaming groupby (#5727)
- ensure fast_explode propagates (#5676)
- fix quadratic time complexity of groupby in stream… (#5614)
- Aggregate projection pushdown (#5556)
- improve streaming primitive groupby (#5575)
- vectorize integer vec-hash by using very simple, … (#5572)
- specialized utf8 groupby in streaming (#5535)
✨ Enhancements
- make get_any_value fallible (#5877)
- directly push all operator result into sink, prev… (#5856)
- add sink_parquet (#5480)
- Support parsing more float string representations. (#5824)
- implement mean aggregation for duration (#5807)
- implement sensible boolean aggregates (#5806)
- allow expression as quantile input (#5751)
- accept expression in str.extract_all (#5742)
- tz-aware strptime (#5736)
- Add "fmt_no_tty" feature for formatting support without r… (#5725)
- lazy diagonal concat. (#5647)
- to_struct add upper_bound (#5714)
- inversely scale chunk_size with thread count in s… (#5699)
- add streaming minmax (#5693)
- improve dynamic inference of anyvalues and structs (#5690)
- support is_in for boolean dtype (#5682)
- add a cache to strptime (#5628)
- add nearest interpolation strategy (#5626)
- make cast recursive (#5596)
- add arg_min/arg_max for series of dtype boolean (#5592)
- prefer streaming groupby if partitionable (#5580)
- make map_alias fallible (#5532)
- pl.min & pl.max accept wildcard similar to pl.sum (#5511)
- add predicate pushdown to anonymous_scan (#5467)
- make streaming work with multiple sinks in a sing… (#5474)
- add streaming slice operation (#5466)
- run partial streaming queries (#5464)
- streaming left joins (#5456)
- file statistics so we only (try to) keep smallest table in memory (#5454)
- streaming inner joins. (#5400)
- build_info() provides detailed information how polars was built (#5423)
- add missing width property to LazyFrame (#5431)
- allow regex and wildcard in groupby (#5425)
- Streaming joins architecture and Cross join implementation. (#5339)
- add support for am/pm notation in parse_dates read_csv (#5373)
- add reduce/cumreduce expression as an easier fold (#5364)
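The relationship between a fold and the reduce/cumreduce expressions (#5364) can be sketched in plain Python, with lists standing in for columns. This is an analogy under stated assumptions, not the Polars API: a fold takes an explicit initial accumulator, a reduce seeds the accumulator from the first input, and a cumulative reduce keeps every intermediate result. The helper `add_columns` is hypothetical.

```python
from functools import reduce
from itertools import accumulate


def add_columns(left, right):
    """Element-wise sum of two equally long columns."""
    return [a + b for a, b in zip(left, right)]


columns = [[1, 2], [10, 20], [100, 200]]

# fold: the caller supplies the starting accumulator explicitly
folded = reduce(add_columns, columns, [0, 0])

# reduce: the first column seeds the accumulator, so no initial value is needed
reduced = reduce(add_columns, columns)

# cumulative reduce: every intermediate accumulator is kept
cumulative = list(accumulate(columns, add_columns))

print(folded, reduced, cumulative)
```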
🐞 Bug fixes
- fix lazy swapping rename (#5884)
- improve equality consistency between types (#5873)
- evaluate whole branch expression to determine if r… (#5864)
- fix top_k on empty (#5865)
- fix slice in streaming (#5854)
- correct invalid type in struct anyvalue access (#5844)
- don't set fast_explode if null values in list (#5838)
- duration formatting (#5837)
- respect fetch in union (#5836)
- keep f32 dtype in fill_null by int (#5834)
- err on epoch on time dtype (#5831)
- fix panic in hmean (#5808)
- asof join by logical groups (#5805)
- fix parquet regression upstream in arrow2 (#5797)
- Fix lazy cumsum and cumprod result types (#5792)
- fix nested writer (#5777)
- fix(rust, python) Summation on empty series evaluates to Some(0) (#5773)
- empty concat utf8 (#5768)
- projection pushdown with union and asof join (#5763)
- check null values in asof_join + groupby (#5756)
- fix generic streaming groupby on logical types (#5752)
- fix date_range on expressions (#5750)
- fix dtypes in join_asof_by (#5746)
- fix group order in binary aggregation (#5744)
- implement min/max aggregation for utf8 in groupby (#5737)
- fix all_null/sorted into_groups panic (#5733)
- asof join 'by', 'forward' combination (#5720)
- fix pivot on floating point indexes (#5704)
- fix arange with column/literal input (#5703)
- fix double projection that leads to uneven union d… (#5700)
- Fix a bug in floating regex handling used in CSV type inference (#5695)
- fix asof join schema (#5686)
- fix owned arithmetic schema (#5685)
- take glob into account in scan_csv 'with_schema_mo… (#5683)
- fix boolean schema in agg_max/min (#5678)
- fix boolean arg-max if all equal (#5680)
- early error on duplicate names in streaming groupby (#5638)
- fix streaming groupby aggregate types (#5636)
- convert panic to err in concat_list (#5637)
- fix dot diagram of single nodes (#5624)
- fix dynamic struct inference (#5619)
- keep dtype when eval on empty list (#5597)
- fix ternary with list output on empty frame (#5595)
- fix tz-awareness of truncate (#5591)
- check chunks before doing chunked_id join optimiza… (#5589)
- invert cast_time_zone conversion (#5587)
- asof join ensure join column is not dropped when '… (#5585)
- fix ub due to invalid dtype on splitting dfs (#5579)
- fix(rust, python); fix projection pushdown in asof joins (#5542)
- streaming hstack allow duplicates (#5538)
- fix streaming empty join panic (#5534)
- fix duplicate caches in cse and prevent quadratic … (#5528)
- allow appending categoricals that are all null (#5526)
- tz-aware strftime (#5525)
- make 'truncate' tz-aware (#5522)
- fix coalesce expression expansion (#5521)
- fix nested aggregation in when then and window expr… (#5520)
- fix sort_by expression if groups already aggregated (#5518)
- fix bug in batched parquet reader that dropped dfs… (#5506)
- fix bugs in skew and kurtosis (#5484)
- compute correct offset for streaming join on multi… (#5479)
- return error on invalid sortby expression (#5478)
- add missing AnyValueBuffer specialisation for Duration dtype (#5436)
- fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
- properly handle json with unclosed strings (#5427)
- fix null poisoning in rank operation (#5417)
- correct expr::diff dtype for temporal columns (#5416)
- fix cse for nested caches (#5412)
- don't set sorted flag in argsort (#5410)
- explicit nan comparison in min/max agg (#5403)
- Correct CSV row indexing (#5385)
🛠️ Other improvements
- Update rustc and fix clippy (#5880)
- update arrow (#5862)
- move join dispatch to polars-ops (#5809)
- Remove dbg statement from union (#5791)
- Continue removing compilation warnings (#5778)
- shrink anyvalue size (#5770)
- update arrow (#5766)
- chore(rust,python) Change allow_streaming to streaming (#5747)
- remove rev-map from ChunkedArray (#5721)
- simplify fast projection by schema (#5716)
- Reindent df! docs code (#5698)
- remove Series::append_array (#5681)
- Remove unused symbols and unneeded mut qualifier (#5672)
- Include license files in Rust crates (#5675)
- Use NaiveTime::from_hms_opt instead of NaiveTime::from_hms (#5664)
- use xxhash3 for string types (#5617)
- iso weekday (#5598)
- Improve contributing guide (#5558)
- streaming improvements (#5541)
- Refer to DataFrame::unique instead of distinct (#5482)
- don't panic if part of query cannot run strea… (#5458)
- make generic join builder more dry (#5439)
- use IdHash for streaming groupby generic (#5435)
- fix freeze/stall when writing more than 2^31 string values to parquet (#5366)
Thank you to all our contributors for making this release possible!
@AnatolyBuga, @CalOmnie, @Kuhlwein, @MarcoGorelli, @OneRaynyDay, @YuRiTan, @alexander-beedie, @andrewpollack, @ankane, @braaannigan, @chitralverma, @dannyvankooten, @ghais, @ghuls, @jjerphan, @matteosantama, @messense, @owrior, @pickfire, @ritchie46, @s1ck, @sa-, @slonik-az, @sorhawell, @stinodego, @universalmind303 and @zundertj