Release 5.0.0 · ibis-project/ibis

5.0.0 (2023-03-15)

⚠ BREAKING CHANGES

api: Snowflake identifiers are now kept as is from the database. Many table names and column names may now be in SHOUTING CASE. Adjust code accordingly.
backend: Backends now raise ibis.common.exceptions.UnsupportedOperationError in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
ux: Table.info now returns an expression
ux: Passing a sequence of column names to Table.drop is removed. Replace drop(cols) with drop(*cols).
The spark plugin alias is removed. Use pyspark instead.
ir: removed ibis.expr.scope and ibis.expr.timecontext modules; access them under ibis.backends.base.df.<module>
Some methods have been removed from the top-level ibis.<backend> namespaces; access them on a connected backend instance instead.
common: removed ibis.common.geospatial; import the functions from ibis.backends.base.sql.registry.geospatial
datatypes: JSON is no longer a subtype of String
datatype: Category, CategoryValue/Column/Scalar are removed. Use string types instead.
ux: The metric_name argument to value_counts is removed. Use Table.relabel to change the metric column's name.
deps: the minimum version of parsy is now 2.0
ir/backends: removed the following symbols:

ibis.backends.duckdb.parse_type() function
ibis.backends.impala.Backend.set_database() method
ibis.backends.pyspark.Backend.set_database() method
ibis.backends.impala.ImpalaConnection.ping() method
ibis.expr.operations.DatabaseTable.change_name() method
ibis.expr.operations.ParseURL class
ibis.expr.operations.Value.to_projection() method
ibis.expr.types.Table.get_column() method
ibis.expr.types.Table.get_columns() method
ibis.expr.types.StringValue.parse_url() method

schema: Schema.from_dict(), .delete() and .append() methods are removed
datatype: struct_type.pairs is removed; use struct_type.fields instead
datatype: Struct(names, types) is no longer supported; pass a dictionary to the Struct constructor instead
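
Several of the breaking changes above are mechanical call-site updates. The following is a minimal, stdlib-only sketch rather than ibis code: `resolve_column` and `struct_fields` are hypothetical helpers introduced here for illustration, the column names are made up, and the ibis calls appear only in comments. It shows one way to look up a Snowflake column regardless of case, how the old `Struct(names, types)` arguments map onto the dictionary form, and the `drop(*cols)` spelling.

```python
def resolve_column(columns, wanted):
    """Return the actual column name that matches `wanted`, ignoring case.

    Hypothetical helper (not part of ibis). Useful when a Snowflake table
    now reports ORDER_ID but existing code still asks for order_id.
    """
    matches = [c for c in columns if c.lower() == wanted.lower()]
    if len(matches) != 1:
        raise KeyError(f"expected one column matching {wanted!r}, found {matches}")
    return matches[0]


def struct_fields(names, types):
    """Pair field names with types as a dict (hypothetical helper)."""
    if len(names) != len(types):
        raise ValueError("names and types must have the same length")
    return dict(zip(names, types))


# Column names as Snowflake might now report them (illustrative values).
snowflake_columns = ["ORDER_ID", "CUSTOMER_NAME"]
order_col = resolve_column(snowflake_columns, "order_id")

# Old call shape: Struct(["a", "b"], ["int64", "string"])
fields = struct_fields(["a", "b"], ["int64", "string"])
# With ibis installed (import ibis.expr.datatypes as dt), pass the mapping:
#     dt.Struct(fields)

cols_to_drop = ["tmp_a", "tmp_b"]
# Table.drop no longer accepts a sequence; unpack it instead:
#     t.drop(*cols_to_drop)
```

A case-insensitive lookup is only one option for the Snowflake change; renaming columns once at load time may be simpler if the rest of the code expects lowercase names.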

Features

add max_columns option for table repr (a3aa236)
add examples API (b62356e)
api: add map/array accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
api: add array to string join operation (74de349)
api: add builtin support for relabeling columns to snake case (1157273)
api: add support for passing a mapping to ibis.map (d365fd4)
api: allow single argument set operations (bb0a6f0)
api: implement to_pandas() API for ecosystem compatibility (cad316c)
api: implement isin (ac31db2)
api: make cache evaluate only once per session per expression (5a8ffe9)
api: make create_table uniform (833c698)
api: more selectors (5844304)
api: upcast pandas DataFrames to memtables in rlz.table rule (8dcfb8d)
backends: implement ops.Time for sqlalchemy backends (713cd33)
bigquery: add BIGNUMERIC type support (5c98ea4)
bigquery: add UUID literal support (ac47c62)
bigquery: enable subqueries in select statements (ef4dc86)
bigquery: implement create and drop table method (5f3c22c)
bigquery: implement create_view and drop_view method (a586473)
bigquery: support creating tables from in-memory tables (c3a25f1)
bigquery: support in-memory tables (37e3279)
change Rich repr of dtypes from blue to dim (008311f)
clickhouse: implement ArrayFilter translation (f2144b6)
clickhouse: implement ops.ArrayMap (45000e7)
clickhouse: implement ops.MapLength (fc82eaa)
clickhouse: implement ops.Capitalize (914c64c)
clickhouse: implement ops.ExtractMillisecond (ee74e3a)
clickhouse: implement ops.RandomScalar (104aeed)
clickhouse: implement ops.StringAscii (a507d17)
clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
clickhouse: improve error message for invalid types in literal (e4d7799)
clickhouse: support asof_join (7ed5143)
common: add abstract mapping collection with support for set operations (7d4aa0f)
common: add support for variadic positional and variadic keyword annotations (baea1fa)
common: hold typehint in the annotation objects (b3601c6)
common: support Callable arguments and return types in Validator.from_annotable() (ae57c36)
common: support positional only and keyword only arguments in annotations (340dca1)
dask/pandas: raise OperationNotDefinedError exc for not defined operations (2833685)
datafusion: implement ops.Degrees, ops.Radians (7e61391)
datafusion: implement ops.Exp (7cb3ade)
datafusion: implement ops.Pi, ops.E (5a74cb4)
datafusion: implement ops.RandomScalar (5d1cd0f)
datafusion: implement ops.StartsWith (8099014)
datafusion: implement ops.StringAscii (b1d7672)
datafusion: implement ops.StrRight (016a082)
datafusion: implement ops.Translate (2fe3fc4)
datafusion: support substr without end (a19fd87)
datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
datatype: enable inference of Decimal type (8761732)
datatype: implement Mapping abstract base class for StructType (5df2022)
deps: add Python 3.11 support and tests (6f3f759)
druid: add Apache Druid backend (c4cc2a6)
druid: implement bitwise operations (3ac7447)
druid: implement ops.Pi, ops.Modulus, ops.Power, ops.Log10 (090ff03)
druid: implement ops.Sign (35f52cc)
druid: implement ops.StringJoin (42cd9a3)
duckdb: add support for reading tables from sqlite databases (9ba2211)
duckdb: add UUID type support (5cd6d76)
duckdb: implement ArrayFilter translation (5f35d5c)
duckdb: implement ops.ArrayMap (063602d)
duckdb: implement create_view and drop_view method (4f73953)
duckdb: implement ops.Capitalize (b17116e)
duckdb: implement ops.TimestampDiff, ops.IntervalAdd, ops.IntervalSubtract (a7fd8fb)
duckdb: implement uuid result type (3150333)
duckdb: support dt.MACADDR, dt.INET as string (c4739c7)
duckdb: use read_json_auto when reading json (4193867)
examples: add imdb dataset examples (3d63203)
examples: add movielens small dataset (5f7c15c)
examples: add wowah_data data to examples (bf9a7cc)
examples: enable progressbar and faster hashing (4adfe29)
impala: implement ops.Clip (279fd78)
impala: implement ops.Radians, ops.Degrees (a794ace)
impala: implement ops.RandomScalar (874f2ff)
io: add to_parquet, to_csv to backends (fecca42)
ir: add ArrayFilter operation (e719d60)
ir: add ArrayMap operation (49e5f7a)
mysql: support in-memory tables (4dfabbd)
pandas/dask: implement bitwise operations (4994add)
pandas/dask: implement ops.Pi, ops.E (091be3c)
pandas: add basic unnest support (dd36b9d)
pandas: implement ops.StartsWith, ops.EndsWith (2725423)
pandas: support more pandas extension dtypes (54818ef)
polars: implement ops.Union (17c6011)
polars: implement ops.Pi, ops.E (6d8fc4a)
postgres: allow connecting with an explicit schema (39c9ea8)
postgres: fix interval literal (c0fa933)
postgres: implement argmin/argmax (82668ec)
postgres: parse tsvector columns as strings (fac8c47), closes #5402
pyspark: add support for ops.ArgMin and ops.ArgMax (a3fa57c)
pyspark: implement ops.Between (ed83465)
return Table from create_table(), create_view() (e4ea597)
schema: implement Mapping abstract base class for Schema (167d85a)
selectors: support ranges (e10caf4)
snowflake: add support for alias in snowflake (b1b947a)
snowflake: add support for bulk upload for temp tables in snowflake (6cc174f)
snowflake: add UUID literal support (436c781)
snowflake: implement argmin/argmax (8b998a5)
snowflake: implement ops.BitwiseAnd, ops.BitwiseNot, ops.BitwiseOr, ops.BitwiseXor (1acd4b7)
snowflake: implement ops.GroupConcat (2219866)
snowflake: implement remaining map functions (c48c9a6)
snowflake: support binary variance reduction with filters (eeabdee)
snowflake: support cross-database table access (79cb445)
sqlalchemy: generalize unnest to work on backends that don't support it (5943ce7)
sqlite: add sqlite type support (addd6a9)
sqlite: support in-memory tables (1b24848)
sql: support for creating temporary tables in sql based backends (466cf35)
tables: cast table using schema (96ce109)
tables: implement pivot_longer API (11c5736)
trino: enable MapLength operation (a7ad1db)
trino: implement ArrayFilter translation (50f6fcc)
trino: implement ops.ArrayMap (657bf61)
trino: implement ops.Between (d70b9c0)
trino: support sqlalchemy 2 (0d078c1)
ux: accept selectors in Table.drop (325140f)
ux: allow creating unbound tables using annotated class definitions (d7bf6a2)
ux: easy interactive setup (6850146)
ux: expose between, rows and range keyword arguments in value.over() (5763063)
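
To give a sense of what the snake-case relabeling feature listed above (1157273) is for, here is a rough stdlib-only approximation. `to_snake_case` is a hypothetical helper written for this sketch, the column names are invented, and the exact rules ibis applies may differ. The builtin is assumed to be spelled `t.relabel("snake_case")`.

```python
import re


def to_snake_case(name: str) -> str:
    """Approximate snake_case conversion of a column name (hypothetical helper)."""
    # Put an underscore between a lowercase letter or digit and an uppercase letter.
    s = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse each run of non-alphanumeric characters into a single underscore.
    s = re.sub(r"[^0-9A-Za-z]+", "_", s)
    return s.strip("_").lower()


# Illustrative column names, not taken from any ibis dataset.
columns = ["Customer ID", "orderTotal", "SHIP-DATE"]
mapping = {c: to_snake_case(c) for c in columns}

# With ibis installed, a mapping like this could be passed to Table.relabel,
# or the builtin (assumed spelling) could be used directly:
#     t.relabel(mapping)
#     t.relabel("snake_case")
```

Building the mapping explicitly, as here, makes the renames easy to review before applying them; the builtin is shorter when the default rules are acceptable.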

Bug Fixes

analysis: extract Limit subqueries (62f6e14)
api: add a name attribute to backend proxy modules (d6d8e7e)
api: fix broken __radd__ array concat operation (121d9a0)
api: only include valid python identifiers in struct tab completion (8f33775)
api: only include valid python identifiers in table tab completion (031a48c)
backend: provide useful error if default backend is unavailable (1dbc682)
backends: fix capitalize implementations across all backends (d4f0275)
backends: fix null literal handling (7f46342)
bigquery: ensure that memtables are translated correctly (d6e56c5)
bigquery: fix decimal literals (4a04c9b)
bigquery: regenerate negative string index sql snapshots (3f02c73)
bigquery: regenerate sql for predicate pushdown fix (509806f)
cache: remove bogus schema argument and validate database argument type (c4254f6)
ci: fix invalid test id (f70de1d)
clickhouse: fix decimal literal (4dcd2cb)
clickhouse: fix set ops with table operands (86bcf32)
clickhouse: raise OperationNotDefinedError if operation is not supported (71e2570)
clickhouse: register in-memory tables in pyarrow-related calls (09a045c)
clickhouse: use a bool type supported by clickhouse_driver (ab8f064)
clickhouse: workaround sqlglot's insistence on uppercasing (6151f37)
compiler: generate aliases in a less clever way (04a4aa5)
datafusion: support sum aggregation on bool column (9421400)
deps: bump duckdb to 0.7.0 (38d2276)
deps: bump snowflake-connector-python upper bound (b368b04)
deps: ensure that pyspark depends on sqlalchemy (60c7382)
deps: update dependency pyarrow to v11 (2af5d8d)
deps: update dependency sqlglot to v11 (e581e2f)
don't expose backend methods on ibis.<backend> directly (5a16431)
druid: remove invalid operations (19f214c)
duckdb: add null to duckdb datatype parser (07d2a86)
duckdb: ensure that temp_directory exists (00ba6cb)
duckdb: explicitly set timezone to UTC on connection (6ae4a06)
duckdb: fix blob type in literal (f66e8a1)
duckdb: fix memtable to_pyarrow/to_pyarrow_batches (0e8b066)
duckdb: in-memory objects registered with duckdb show up in list_tables (7772f79)
duckdb: quote identifiers if necessary in struct_pack (6e598cc)
duckdb: support casting to unsigned integer types (066c158)
duckdb: treat g re_replace flag as literal text (aa3c31c)
duckdb: workaround an ownership bug at the interaction of duckdb, pandas and pyarrow (2819cff)
duckdb: workaround duckdb bug that prevents multiple substitutions (0e09220)
imports: remove top-level import of sqlalchemy from base backend (b13cf25)
io: add read_parquet and read_csv to base backend mixin (ce80d36), closes #5420
ir: fix incorrect predicate pushdown (9a9204f)
ir: make find_subqueries return in topological order (3587910)
ir: properly raise error if literal cannot be coerced to a datatype (e16b91f)
ir: reorder the right schema of set operations to align with the left schema (58e60ae)
ir: use rlz.map_to() rule instead of isin to normalize temporal units (a1c46a2)
ir: use static connection pooling to prevent dropping temporary state (6d2ae26)
mssql: set sqlglot to tsql (1044573)
mysql: remove invalid operations (8f34a2b)
pandas/dask: handle non numpy scalar results in wrap_case_result (a3b82f7)
pandas: don't try to dispatch on arrow dtype if not available (d22ae7b)
pandas: handle casting to arrays with None elements (382b90f)
pandas: handle NAs in array conversion (06bd15d)
polars: back compat for concat_str separator argument (ced5a61)
polars: back compat for the reverse/descending argument (f067d81)
polars: make execute respect limit kwargs (d962faf)
polars: properly infer polars categorical dtype (5a4707a)
polars: use metric name in aggregate output to dedupe columns (234d8c1)
pyspark: fix incorrect ops.EndsWith translation rule (4c0a5a2)
pyspark: fix isnan and isinf to work on bool (8dc623a)
snowflake: allow loose casting of objects and arrays (1cf8df0)
snowflake: ensure that memtables are translated correctly (b361e07)
snowflake: ensure that null comparisons are correct (9b83699)
snowflake: ensure that quoting matches snowflake behavior, not sqlalchemy (b6b67f9)
snowflake: ensure that we do not try to use a None schema or database (03e0265)
snowflake: handle the case where pyarrow isn't installed (b624fa3)
snowflake: make array_agg preserve nulls (24b95bf)
snowflake: quote column names on construction of sa.Column (af4db5c)
snowflake: remove broken pyarrow fetch support (c440adb)
snowflake: return NULL when trying to call map functions on non-object JSON (d85fb28)
snowflake: use _flatten to avoid overriding unrelated function in other backends (8c31594)
sqlalchemy: ensure that isin contains full column expression (9018eb6)
sqlalchemy: get builtin dialects working; mysql/mssql/postgres/sqlite (d2356bc)
sqlalchemy: make strip family of functions behave like Python (dd0a04c)
sqlalchemy: reflect most recent schema when view is replaced (62c8dea)
sqlalchemy: use sa.true instead of Python literal (8423eba)
sqlalchemy: use indexed group by key references everywhere possible (9f1ddd8)
sql: ensure that set operations generate valid sql in the presence of additional constructs such as sort keys (3e2c364)
sqlite: explicitly disallow array in literal (de73b37)
sqlite: fix random scalar range (26d0dde)
support negative string indices (f84a54d)
trino: workaround broken dialect (b502faf)
types: fix argument types of Table.order_by() (6ed3a97)
util: make convert_unit work with python types (cb3a90c)
ux: give the value_counts aggregate column a better name (abab1d7)
ux: make string range selectors inclusive (7071669)
ux: make top level set operations work (f5976b2)

Performance

duckdb: faster to_parquet/to_csv implementations (6071bb5)
fix duckdb insert-from-dataframe performance (cd27b99)
deps: bump minimum required version of parsy (22020cb)
remove spark alias to pyspark and associated cruft (4b286bd)

Refactors

analysis: slightly simplify find_subqueries() (ab3712f)
backend: normalize exceptions (065b66d)
clickhouse: clean up parsing rules (6731772)
common: move frozendict and DotDict to ibis.common.collections (4451375)
common: move the geospatial module to the base SQL backend (3e7bfa3)
dask: remove unneeded create_table() (86885a6)
datatype: clean up parsing rules (c15fb5f)
datatype: remove Category type and related APIs (bb0ee78)
datatype: remove StructType.pairs property in favor of identical fields attribute (6668122)
datatypes: move sqlalchemy datatypes to specific backend (d7b49eb)
datatypes: remove String parent type from JSON type (34f3898)
datatype: use a dictionary to store StructType fields rather than names and types tuples (84455ac)
datatype: use lazy dispatch when inferring pandas Timedelta objects (e5280ea)
drop limit kwarg from to_parquet/to_csv (a54460c)
duckdb: clean up parsing rules (30da8f9)
duckdb: handle parsing timestamp scale (16c1443)
duckdb: remove unused list<...> parsing rule (f040b86)
duckdb: use a proper sqlalchemy construct for structs and reduce casting (8daa4a1)
ir/api: introduce window frame operation and revamp the window API (2bc5e5e)
ir/backends: remove various deprecated functions and methods (a8d3007)
ir: reorganize the scope and timecontext utilities (80bd494)
ir: update ArrayMap to use the new callable_with validation rule (560474e)
move pretty repr tests back to their own file (4a75988)
nix: clean up marker argument construction (12eb916)
postgres: clean up datatype parsing (1f61661)
postgres: clean up literal arrays (21b122d)
pyspark: remove another private function (c5081cf)
remove unnecessary top-level rich console (8083a6b)
rules: remove unused non_negative_integer and pair rules (e00920a)
schema: remove deprecated Schema.from_dict(), .delete() and .append() methods (8912b24)
snowflake: remove the need for parsy (c53403a)
sqlalchemy: set session parameters once per connection (ed4b476)
sqlalchemy: use backend-specific startswith/endswith implementations (6101de2)
test_sqlalchemy.py: move to snapshot testing (96998f0)
tests: reorganize rules test file to the ibis.expr subpackage (47f0909)
tests: reorganize schema test file to the ibis.expr subpackage (40033e1)
tests: reorganize datatype test files to the datatypes subpackage (16199c6)
trino: clean up datatype parsing (84c0e35)
ux: return expression from Table.info (71cc0e0)

Deprecations

api: deprecate summary API (e449c07)
api: mark ibis.sequence() for removal (3589f80)

Documentation

add a bunch of string expression examples (18d3112)
add Apache Druid to backend matrix (764d9c3)
add CNAME file to mkdocs source (6d19111)
add druid to the backends index docs page (ad0b6a3)
add missing DataFusion entry to the backends in the README (8ce025a)
add redirects for common old pages (c9087f2)
api: document deferred API and its pitfalls (8493604)
api: improve collect method API documentation (b4fcef1)
array expression examples (6812c17)
backends: document default backend configuration (6d917d3)
backends: link to configuration from the backends list (144044d)
blog: blog on ibis + substrait + duckdb (5dc7a0a)
blog: adds examples sneak peek blog + assets folder (fcbb3d5)
blog: adds to file sneak peek blog (128194f)
blog: specify parsy 2.0 in substrait blog article (c264477)
bump query engine count in README and use project-preferred names (11169f7)
don't sort backends by coverage percentage by default (68f73b1)
drop docs versioning (d7140e7)
duckdb: fix broken docstring examples (51084ad)
enable light/dark mode toggle in docs (b9e812a)
fill out table API with working examples (16fc8be)
fix notebook logging example (04b75ef)
how-to: fix sessionize.md to use ibis.read_parquet (ff9cbf7)
improve Expr.substitute() docstring (b954edd)
improve/update pandas walkthrough (80b05d8)
io: doc/ux improvements for read_parquet and friends (2541556), closes #5420
io: update README.md to recommend installing duckdb as default backend (0a72ec0), closes #5423 #5420
move tutorial from docs to external ibis-examples repo (11b0237)
parquet: add docstring examples for to_parquet incl. partitioning (8040164)
point to ibis-examples repo in the README (1205636)
README.md: clean up readme, fix typos, alter the example (383a3d3)
remove duplicate "or" (b6ef3cc)
remove duplicate spark backend in install docs (5954618)
render __dunder__ method API documentation (b532c63)
rerender ci-analysis notebook with new table header colors (50507b6)
streamlit: fix url for support matrix (594199b)
tutorial: remove impala from sql tutorial (7627c13)
use teal for primary & accent colors (24be961)
