5.0.0 (2023-03-15)
⚠ BREAKING CHANGES
- api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
- backend: Backends now raise `ibis.common.exceptions.UnsupportedOperationError` in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
- ux: `Table.info` now returns an expression
- ux: Passing a sequence of column names to `Table.drop` is removed. Replace `drop(cols)` with `drop(*cols)`.
- The `spark` plugin alias is removed. Use `pyspark` instead
- ir: removed `ibis.expr.scope` and `ibis.expr.timecontext` modules, access them under `ibis.backends.base.df.<module>`
- some methods have been removed from the top-level `ibis.<backend>` namespaces, access them on a connected backend instance instead.
- common: removed `ibis.common.geospatial`, import the functions from `ibis.backends.base.sql.registry.geospatial`
- datatypes: `JSON` is no longer a subtype of `String`
- datatype: `Category`, `CategoryValue`/`Column`/`Scalar` are removed. Use string types instead.
- ux: The `metric_name` argument to `value_counts` is removed. Use `Table.relabel` to change the metric column's name.
- deps: the minimum version of `parsy` is now 2.0
- ir/backends: removed the following symbols:
  - `ibis.backends.duckdb.parse_type()` function
  - `ibis.backends.impala.Backend.set_database()` method
  - `ibis.backends.pyspark.Backend.set_database()` method
  - `ibis.backends.impala.ImpalaConnection.ping()` method
  - `ibis.expr.operations.DatabaseTable.change_name()` method
  - `ibis.expr.operations.ParseURL` class
  - `ibis.expr.operations.Value.to_projection()` method
  - `ibis.expr.types.Table.get_column()` method
  - `ibis.expr.types.Table.get_columns()` method
  - `ibis.expr.types.StringValue.parse_url()` method
- schema: `Schema.from_dict()`, `.delete()` and `.append()` methods are removed
- datatype: `struct_type.pairs` is removed, use `struct_type.fields` instead
- datatype: `Struct(names, types)` is not supported anymore, pass a dictionary to `Struct` constructor instead
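
Two of the breaking changes above mostly come down to reshaping plain Python data before it reaches ibis: Snowflake column names may now arrive in upper case, and `Struct` now takes a dictionary instead of separate `names` and `types` sequences. The sketch below shows one way to prepare that data with the standard library only. The helper names `lowercase_renames` and `struct_fields` are hypothetical and not part of ibis, and the sample column names are made up for illustration.

```python
from __future__ import annotations

from typing import Iterable, Sequence


def lowercase_renames(columns: Iterable[str]) -> dict[str, str]:
    """Build an old-name -> new-name mapping for fully upper-case columns.

    Names that are not entirely upper-case are skipped, on the assumption
    that they were given mixed case on purpose.
    """
    return {name: name.lower() for name in columns if name.isupper()}


def struct_fields(names: Sequence[str], types: Sequence[str]) -> dict[str, str]:
    """Pair field names with their types as a single dictionary."""
    if len(names) != len(types):
        raise ValueError("names and types must have the same length")
    return dict(zip(names, types))


# Hypothetical column names as they might come back from Snowflake.
renames = lowercase_renames(["ORDER_ID", "CUSTOMER_NAME", "mixedCase"])

# Former positional arguments, combined into the dictionary form.
fields = struct_fields(["a", "b"], ["int64", "string"])
```

A mapping like `renames` could then be handed to `Table.relabel`, and `fields` to the `Struct` constructor. Whether lower-casing is appropriate depends on how the rest of your code refers to the columns.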
Features
- add `max_columns` option for table repr (a3aa236)
- add examples API (b62356e)
- api: add `map`/`array` accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
- api: add array to string join operation (74de349)
- api: add builtin support for relabeling columns to snake case (1157273)
- api: add support for passing a mapping to `ibis.map` (d365fd4)
- api: allow single argument set operations (bb0a6f0)
- api: implement `to_pandas()` API for ecosystem compatibility (cad316c)
- api: implement isin (ac31db2)
- api: make `cache` evaluate only once per session per expression (5a8ffe9)
- api: make create_table uniform (833c698)
- api: more selectors (5844304)
- api: upcast pandas DataFrames to memtables in `rlz.table` rule (8dcfb8d)
- backends: implement `ops.Time` for sqlalchemy backends (713cd33)
- bigquery: add `BIGNUMERIC` type support (5c98ea4)
- bigquery: add UUID literal support (ac47c62)
- bigquery: enable subqueries in select statements (ef4dc86)
- bigquery: implement create and drop table method (5f3c22c)
- bigquery: implement create_view and drop_view method (a586473)
- bigquery: support creating tables from in-memory tables (c3a25f1)
- bigquery: support in-memory tables (37e3279)
- change Rich repr of dtypes from blue to dim (008311f)
- clickhouse: implement `ArrayFilter` translation (f2144b6)
- clickhouse: implement `ops.ArrayMap` (45000e7)
- clickhouse: implement `ops.MapLength` (fc82eaa)
- clickhouse: implement ops.Capitalize (914c64c)
- clickhouse: implement ops.ExtractMillisecond (ee74e3a)
- clickhouse: implement ops.RandomScalar (104aeed)
- clickhouse: implement ops.StringAscii (a507d17)
- clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
- clickhouse: improve error message for invalid types in literal (e4d7799)
- clickhouse: support asof_join (7ed5143)
- common: add abstract mapping collection with support for set operations (7d4aa0f)
- common: add support for variadic positional and variadic keyword annotations (baea1fa)
- common: hold typehint in the annotation objects (b3601c6)
- common: support `Callable` arguments and return types in `Validator.from_annotable()` (ae57c36)
- common: support positional only and keyword only arguments in annotations (340dca1)
- dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
- datafusion: implement ops.Degrees, ops.Radians (7e61391)
- datafusion: implement ops.Exp (7cb3ade)
- datafusion: implement ops.Pi, ops.E (5a74cb4)
- datafusion: implement ops.RandomScalar (5d1cd0f)
- datafusion: implement ops.StartsWith (8099014)
- datafusion: implement ops.StringAscii (b1d7672)
- datafusion: implement ops.StrRight (016a082)
- datafusion: implement ops.Translate (2fe3fc4)
- datafusion: support substr without end (a19fd87)
- datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
- datatype: enable inference of `Decimal` type (8761732)
- datatype: implement `Mapping` abstract base class for `StructType` (5df2022)
- deps: add Python 3.11 support and tests (6f3f759)
- druid: add Apache Druid backend (c4cc2a6)
- druid: implement bitwise operations (3ac7447)
- druid: implement ops.Pi, ops.Modulus, ops.Power, ops.Log10 (090ff03)
- druid: implement ops.Sign (35f52cc)
- druid: implement ops.StringJoin (42cd9a3)
- duckdb: add support for reading tables from sqlite databases (9ba2211)
- duckdb: add UUID type support (5cd6d76)
- duckdb: implement `ArrayFilter` translation (5f35d5c)
- duckdb: implement `ops.ArrayMap` (063602d)
- duckdb: implement create_view and drop_view method (4f73953)
- duckdb: implement ops.Capitalize (b17116e)
- duckdb: implement ops.TimestampDiff, ops.IntervalAdd, ops.IntervalSubtract (a7fd8fb)
- duckdb: implement uuid result type (3150333)
- duckdb: support dt.MACADDR, dt.INET as string (c4739c7)
- duckdb: use `read_json_auto` when reading json (4193867)
- examples: add imdb dataset examples (3d63203)
- examples: add movielens small dataset (5f7c15c)
- examples: add wowah_data data to examples (bf9a7cc)
- examples: enable progressbar and faster hashing (4adfe29)
- impala: implement ops.Clip (279fd78)
- impala: implement ops.Radians, ops.Degrees (a794ace)
- impala: implement ops.RandomScalar (874f2ff)
- io: add to_parquet, to_csv to backends (fecca42)
- ir: add `ArrayFilter` operation (e719d60)
- ir: add `ArrayMap` operation (49e5f7a)
- mysql: support in-memory tables (4dfabbd)
- pandas/dask: implement bitwise operations (4994add)
- pandas/dask: implement ops.Pi, ops.E (091be3c)
- pandas: add basic unnest support (dd36b9d)
- pandas: implement ops.StartsWith, ops.EndsWith (2725423)
- pandas: support more pandas extension dtypes (54818ef)
- polars: implement `ops.Union` (17c6011)
- polars: implement ops.Pi, ops.E (6d8fc4a)
- postgres: allow connecting with an explicit `schema` (39c9ea8)
- postgres: fix interval literal (c0fa933)
- postgres: implement `argmin`/`argmax` (82668ec)
- postgres: parse tsvector columns as strings (fac8c47), closes #5402
- pyspark: add support for `ops.ArgMin` and `ops.ArgMax` (a3fa57c)
- pyspark: implement ops.Between (ed83465)
- return Table from create_table(), create_view() (e4ea597)
- schema: implement `Mapping` abstract base class for `Schema` (167d85a)
- selectors: support ranges (e10caf4)
- snowflake: add support for alias in snowflake (b1b947a)
- snowflake: add support for bulk upload for temp tables in snowflake (6cc174f)
- snowflake: add UUID literal support (436c781)
- snowflake: implement argmin/argmax (8b998a5)
- snowflake: implement ops.BitwiseAnd, ops.BitwiseNot, ops.BitwiseOr, ops.BitwiseXor (1acd4b7)
- snowflake: implement ops.GroupConcat (2219866)
- snowflake: implement remaining map functions (c48c9a6)
- snowflake: support binary variance reduction with filters (eeabdee)
- snowflake: support cross-database table access (79cb445)
- sqlalchemy: generalize unnest to work on backends that don't support it (5943ce7)
- sqlite: add sqlite type support (addd6a9)
- sqlite: support in-memory tables (1b24848)
- sql: support for creating temporary tables in sql based backends (466cf35)
- tables: cast table using schema (96ce109)
- tables: implement `pivot_longer` API (11c5736)
- trino: enable `MapLength` operation (a7ad1db)
- trino: implement `ArrayFilter` translation (50f6fcc)
- trino: implement `ops.ArrayMap` (657bf61)
- trino: implement `ops.Between` (d70b9c0)
- trino: support sqlalchemy 2 (0d078c1)
- ux: accept selectors in `Table.drop` (325140f)
- ux: allow creating unbound tables using annotated class definitions (d7bf6a2)
- ux: easy interactive setup (6850146)
- ux: expose `between`, `rows` and `range` keyword arguments in `value.over()` (5763063)
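
For readers unfamiliar with the wide-to-long reshape behind the new `pivot_longer` API listed above, here is a minimal pure-Python sketch of the idea on a list of dictionaries. It is not ibis's implementation and does not call ibis. The function signature, the `names_to`/`values_to` parameter names and their defaults, and the sample data are assumptions chosen for illustration.

```python
from __future__ import annotations

from typing import Any


def pivot_longer(
    rows: list[dict[str, Any]],
    columns: list[str],
    names_to: str = "name",
    values_to: str = "value",
) -> list[dict[str, Any]]:
    """Turn each selected column of each input row into its own output row."""
    long_rows = []
    for row in rows:
        # Columns that are not pivoted are repeated on every output row.
        kept = {key: val for key, val in row.items() if key not in columns}
        for column in columns:
            long_rows.append({**kept, names_to: column, values_to: row[column]})
    return long_rows


# Hypothetical wide data: one column per year.
wide = [
    {"city": "Oslo", "y2021": 10, "y2022": 12},
    {"city": "Lima", "y2021": 7, "y2022": 9},
]

# Two input rows times two pivoted columns gives four output rows.
long = pivot_longer(wide, ["y2021", "y2022"], names_to="year", values_to="sales")
```

Each output row keeps the identifying `city` column and gains one column holding the former column name and one holding its value.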
Bug Fixes
- analysis: extract `Limit` subqueries (62f6e14)
- api: add a `name` attribute to backend proxy modules (d6d8e7e)
- api: fix broken `__radd__` array concat operation (121d9a0)
- api: only include valid python identifiers in struct tab completion (8f33775)
- api: only include valid python identifiers in table tab completion (031a48c)
- backend: provide useful error if default backend is unavailable (1dbc682)
- backends: fix capitalize implementations across all backends (d4f0275)
- backends: fix null literal handling (7f46342)
- bigquery: ensure that memtables are translated correctly (d6e56c5)
- bigquery: fix decimal literals (4a04c9b)
- bigquery: regenerate negative string index sql snapshots (3f02c73)
- bigquery: regenerate sql for predicate pushdown fix (509806f)
- cache: remove bogus schema argument and validate database argument type (c4254f6)
- ci: fix invalid test id (f70de1d)
- clickhouse: fix decimal literal (4dcd2cb)
- clickhouse: fix set ops with table operands (86bcf32)
- clickhouse: raise OperationNotDefinedError if operation is not supported (71e2570)
- clickhouse: register in-memory tables in pyarrow-related calls (09a045c)
- clickhouse: use a bool type supported by `clickhouse_driver` (ab8f064)
- clickhouse: workaround sqlglot's insistence on uppercasing (6151f37)
- compiler: generate aliases in a less clever way (04a4aa5)
- datafusion: support sum aggregation on bool column (9421400)
- deps: bump duckdb to 0.7.0 (38d2276)
- deps: bump snowflake-connector-python upper bound (b368b04)
- deps: ensure that pyspark depends on sqlalchemy (60c7382)
- deps: update dependency pyarrow to v11 (2af5d8d)
- deps: update dependency sqlglot to v11 (e581e2f)
- don't expose backend methods on `ibis.<backend>` directly (5a16431)
- druid: remove invalid operations (19f214c)
- duckdb: add `null` to duckdb datatype parser (07d2a86)
- duckdb: ensure that `temp_directory` exists (00ba6cb)
- duckdb: explicitly set timezone to UTC on connection (6ae4a06)
- duckdb: fix blob type in literal (f66e8a1)
- duckdb: fix memtable `to_pyarrow`/`to_pyarrow_batches` (0e8b066)
- duckdb: in-memory objects registered with duckdb show up in list_tables (7772f79)
- duckdb: quote identifiers if necessary in `struct_pack` (6e598cc)
- duckdb: support casting to unsigned integer types (066c158)
- duckdb: treat `g` `re_replace` flag as literal text (aa3c31c)
- duckdb: workaround an ownership bug at the interaction of duckdb, pandas and pyarrow (2819cff)
- duckdb: workaround duckdb bug that prevents multiple substitutions (0e09220)
- imports: remove top-level import of sqlalchemy from base backend (b13cf25)
- io: add `read_parquet` and `read_csv` to base backend mixin (ce80d36), closes #5420
- ir: incorrect predicate pushdown (9a9204f)
- ir: make `find_subqueries` return in topological order (3587910)
- ir: properly raise error if literal cannot be coerced to a datatype (e16b91f)
- ir: reorder the right schema of set operations to align with the left schema (58e60ae)
- ir: use `rlz.map_to()` rule instead of `isin` to normalize temporal units (a1c46a2)
- ir: use static connection pooling to prevent dropping temporary state (6d2ae26)
- mssql: set sqlglot to tsql (1044573)
- mysql: remove invalid operations (8f34a2b)
- pandas/dask: handle non numpy scalar results in `wrap_case_result` (a3b82f7)
- pandas: don't try to dispatch on arrow dtype if not available (d22ae7b)
- pandas: handle casting to arrays with None elements (382b90f)
- pandas: handle NAs in array conversion (06bd15d)
- polars: back compat for `concat_str` separator argument (ced5a61)
- polars: back compat for the `reverse`/`descending` argument (f067d81)
- polars: polars execute respect limit kwargs (d962faf)
- polars: properly infer polars categorical dtype (5a4707a)
- polars: use metric name in aggregate output to dedupe columns (234d8c1)
- pyspark: fix incorrect `ops.EndsWith` translation rule (4c0a5a2)
- pyspark: fix isnan and isinf to work on bool (8dc623a)
- snowflake: allow loose casting of objects and arrays (1cf8df0)
- snowflake: ensure that memtables are translated correctly (b361e07)
- snowflake: ensure that null comparisons are correct (9b83699)
- snowflake: ensure that quoting matches snowflake behavior, not sqlalchemy (b6b67f9)
- snowflake: ensure that we do not try to use a None schema or database (03e0265)
- snowflake: handle the case where pyarrow isn't installed (b624fa3)
- snowflake: make `array_agg` preserve nulls (24b95bf)
- snowflake: quote column names on construction of `sa.Column` (af4db5c)
- snowflake: remove broken pyarrow fetch support (c440adb)
- snowflake: return `NULL` when trying to call map functions on non-object JSON (d85fb28)
- snowflake: use `_flatten` to avoid overriding unrelated function in other backends (8c31594)
- sqlalchemy: ensure that isin contains full column expression (9018eb6)
- sqlalchemy: get builtin dialects working; mysql/mssql/postgres/sqlite (d2356bc)
- sqlalchemy: make `strip` family of functions behave like Python (dd0a04c)
- sqlalchemy: reflect most recent schema when view is replaced (62c8dea)
- sqlalchemy: use `sa.true` instead of Python literal (8423eba)
- sqlalchemy: use indexed group by key references everywhere possible (9f1ddd8)
- sql: ensure that set operations generate valid sql in the presence of additional constructs such as sort keys (3e2c364)
- sqlite: explicitly disallow array in literal (de73b37)
- sqlite: fix random scalar range (26d0dde)
- support negative string indices (f84a54d)
- trino: workaround broken dialect (b502faf)
- types: fix argument types of Table.order_by() (6ed3a97)
- util: make convert_unit work with python types (cb3a90c)
- ux: give the `value_counts` aggregate column a better name (abab1d7)
- ux: make string range selectors inclusive (7071669)
- ux: make top level set operations work (f5976b2)
Performance
- duckdb: faster `to_parquet`/`to_csv` implementations (6071bb5)
- fix duckdb insert-from-dataframe performance (cd27b99)
- deps: bump minimum required version of parsy (22020cb)
- remove spark alias to pyspark and associated cruft (4b286bd)
Refactors
- analysis: slightly simplify `find_subqueries()` (ab3712f)
- backend: normalize exceptions (065b66d)
- clickhouse: clean up parsing rules (6731772)
- common: move `frozendict` and `DotDict` to `ibis.common.collections` (4451375)
- common: move the `geospatial` module to the base SQL backend (3e7bfa3)
- dask: remove unneeded create_table() (86885a6)
- datatype: clean up parsing rules (c15fb5f)
- datatype: remove `Category` type and related APIs (bb0ee78)
- datatype: remove `StructType.pairs` property in favor of identical `fields` attribute (6668122)
- datatypes: move sqlalchemy datatypes to specific backend (d7b49eb)
- datatypes: remove `String` parent type from `JSON` type (34f3898)
- datatype: use a dictionary to store `StructType` fields rather than `names` and `types` tuples (84455ac)
- datatype: use lazy dispatch when inferring pandas Timedelta objects (e5280ea)
- drop `limit` kwarg from `to_parquet`/`to_csv` (a54460c)
- duckdb: clean up parsing rules (30da8f9)
- duckdb: handle parsing timestamp scale (16c1443)
- duckdb: remove unused `list<...>` parsing rule (f040b86)
- duckdb: use a proper sqlalchemy construct for structs and reduce casting (8daa4a1)
- ir/api: introduce window frame operation and revamp the window API (2bc5e5e)
- ir/backends: remove various deprecated functions and methods (a8d3007)
- ir: reorganize the `scope` and `timecontext` utilities (80bd494)
- ir: update `ArrayMap` to use the new `callable_with` validation rule (560474e)
- move pretty repr tests back to their own file (4a75988)
- nix: clean up marker argument construction (12eb916)
- postgres: clean up datatype parsing (1f61661)
- postgres: clean up literal arrays (21b122d)
- pyspark: remove another private function (c5081cf)
- remove unnecessary top-level rich console (8083a6b)
- rules: remove unused `non_negative_integer` and `pair` rules (e00920a)
- schema: remove deprecated `Schema.from_dict()`, `.delete()` and `.append()` methods (8912b24)
- snowflake: remove the need for `parsy` (c53403a)
- sqlalchemy: set session parameters once per connection (ed4b476)
- sqlalchemy: use backend-specific startswith/endswith implementations (6101de2)
- test_sqlalchemy.py: move to snapshot testing (96998f0)
- tests: reorganize `rules` test file to the `ibis.expr` subpackage (47f0909)
- tests: reorganize `schema` test file to the `ibis.expr` subpackage (40033e1)
- tests: reorganize datatype test files to the datatypes subpackage (16199c6)
- trino: clean up datatype parsing (84c0e35)
- ux: return expression from `Table.info` (71cc0e0)
Deprecations
Documentation
- add a bunch of string expression examples (18d3112)
- add Apache Druid to backend matrix (764d9c3)
- add CNAME file to mkdocs source (6d19111)
- add druid to the backends index docs page (ad0b6a3)
- add missing DataFusion entry to the backends in the README (8ce025a)
- add redirects for common old pages (c9087f2)
- api: document deferred API and its pitfalls (8493604)
- api: improve `collect` method API documentation (b4fcef1)
- array expression examples (6812c17)
- backends: document default backend configuration (6d917d3)
- backends: link to configuration from the backends list (144044d)
- blog: blog on ibis + substrait + duckdb (5dc7a0a)
- blog: adds examples sneak peek blog + assets folder (fcbb3d5)
- blog: adds to file sneak peek blog (128194f)
- blog: specify parsy 2.0 in substrait blog article (c264477)
- bump query engine count in README and use project-preferred names (11169f7)
- don't sort backends by coverage percentage by default (68f73b1)
- drop docs versioning (d7140e7)
- duckdb: fix broken docstring examples (51084ad)
- enable light/dark mode toggle in docs (b9e812a)
- fill out table API with working examples (16fc8be)
- fix notebook logging example (04b75ef)
- how-to: fix sessionize.md to use ibis.read_parquet (ff9cbf7)
- improve Expr.substitute() docstring (b954edd)
- improve/update pandas walkthrough (80b05d8)
- io: doc/ux improvements for read_parquet and friends (2541556), closes #5420
- io: update README.md to recommend installing duckdb as default backend (0a72ec0), closes #5423 #5420
- move tutorial from docs to external ibis-examples repo (11b0237)
- parquet: add docstring examples for to_parquet incl. partitioning (8040164)
- point to `ibis-examples` repo in the README (1205636)
- README.md: clean up readme, fix typos, alter the example (383a3d3)
- remove duplicate "or" (b6ef3cc)
- remove duplicate spark backend in install docs (5954618)
- render `__dunder__` method API documentation (b532c63)
- rerender ci-analysis notebook with new table header colors (50507b6)
- streamlit: fix url for support matrix (594199b)
- tutorial: remove impala from sql tutorial (7627c13)
- use teal for primary & accent colors (24be961)