
5.0.0

@ibis-project-bot released this 15 Mar 22:36

5.0.0 (2023-03-15)

⚠ BREAKING CHANGES

  • api: Snowflake identifiers are now kept as-is from the database. Many table and column names may now be in SHOUTING CASE; adjust code accordingly.
  • backend: Backends now raise ibis.common.exceptions.UnsupportedOperationError in more places during compilation. You may need to catch this error type instead of the previous type, which differed between backends.
  • ux: Table.info now returns an expression
  • ux: Passing a sequence of column names to Table.drop is removed. Replace drop(cols) with drop(*cols).
  • The spark plugin alias is removed. Use pyspark instead
  • ir: removed ibis.expr.scope and ibis.expr.timecontext modules, access them under ibis.backends.base.df.<module>
  • some methods have been removed from the top-level ibis.<backend> namespaces, access them on a connected backend instance instead.
  • common: removed ibis.common.geospatial, import the functions from ibis.backends.base.sql.registry.geospatial
  • datatypes: JSON is no longer a subtype of String
  • datatype: Category, CategoryValue/Column/Scalar are removed. Use string types instead.
  • ux: The metric_name argument to value_counts is removed. Use Table.relabel to change the metric column's name.
  • deps: the minimum version of parsy is now 2.0
  • ir/backends: removed the following symbols:
      • ibis.backends.duckdb.parse_type() function
      • ibis.backends.impala.Backend.set_database() method
      • ibis.backends.pyspark.Backend.set_database() method
      • ibis.backends.impala.ImpalaConnection.ping() method
      • ibis.expr.operations.DatabaseTable.change_name() method
      • ibis.expr.operations.ParseURL class
      • ibis.expr.operations.Value.to_projection() method
      • ibis.expr.types.Table.get_column() method
      • ibis.expr.types.Table.get_columns() method
      • ibis.expr.types.StringValue.parse_url() method
  • schema: Schema.from_dict(), .delete() and .append() methods are removed
  • datatype: struct_type.pairs is removed, use struct_type.fields instead
  • datatype: Struct(names, types) is no longer supported; pass a dictionary to the Struct constructor instead

Features

  • add max_columns option for table repr (a3aa236)
  • add examples API (b62356e)
  • api: add map/array accessors for easy conversion of JSON to stronger-typed values (d1e9d11)
  • api: add array to string join operation (74de349)
  • api: add builtin support for relabeling columns to snake case (1157273)
  • api: add support for passing a mapping to ibis.map (d365fd4)
  • api: allow single argument set operations (bb0a6f0)
  • api: implement to_pandas() API for ecosystem compatibility (cad316c)
  • api: implement isin (ac31db2)
  • api: make cache evaluate only once per session per expression (5a8ffe9)
  • api: make create_table uniform (833c698)
  • api: more selectors (5844304)
  • api: upcast pandas DataFrames to memtables in rlz.table rule (8dcfb8d)
  • backends: implement ops.Time for sqlalchemy backends (713cd33)
  • bigquery: add BIGNUMERIC type support (5c98ea4)
  • bigquery: add UUID literal support (ac47c62)
  • bigquery: enable subqueries in select statements (ef4dc86)
  • bigquery: implement create_table and drop_table methods (5f3c22c)
  • bigquery: implement create_view and drop_view methods (a586473)
  • bigquery: support creating tables from in-memory tables (c3a25f1)
  • bigquery: support in-memory tables (37e3279)
  • change Rich repr of dtypes from blue to dim (008311f)
  • clickhouse: implement ArrayFilter translation (f2144b6)
  • clickhouse: implement ops.ArrayMap (45000e7)
  • clickhouse: implement ops.MapLength (fc82eaa)
  • clickhouse: implement ops.Capitalize (914c64c)
  • clickhouse: implement ops.ExtractMillisecond (ee74e3a)
  • clickhouse: implement ops.RandomScalar (104aeed)
  • clickhouse: implement ops.StringAscii (a507d17)
  • clickhouse: implement ops.TimestampFromYMDHMS, ops.DateFromYMD (05f5ae5)
  • clickhouse: improve error message for invalid types in literal (e4d7799)
  • clickhouse: support asof_join (7ed5143)
  • common: add abstract mapping collection with support for set operations (7d4aa0f)
  • common: add support for variadic positional and variadic keyword annotations (baea1fa)
  • common: hold typehint in the annotation objects (b3601c6)
  • common: support Callable arguments and return types in Validator.from_annotable() (ae57c36)
  • common: support positional only and keyword only arguments in annotations (340dca1)
  • dask/pandas: raise OperationNotDefinedError for operations that are not defined (2833685)
  • datafusion: implement ops.Degrees, ops.Radians (7e61391)
  • datafusion: implement ops.Exp (7cb3ade)
  • datafusion: implement ops.Pi, ops.E (5a74cb4)
  • datafusion: implement ops.RandomScalar (5d1cd0f)
  • datafusion: implement ops.StartsWith (8099014)
  • datafusion: implement ops.StringAscii (b1d7672)
  • datafusion: implement ops.StrRight (016a082)
  • datafusion: implement ops.Translate (2fe3fc4)
  • datafusion: support substr without end (a19fd87)
  • datatype/schema: support datatype and schema declaration using type annotated classes (6722c31)
  • datatype: enable inference of Decimal type (8761732)
  • datatype: implement Mapping abstract base class for StructType (5df2022)
  • deps: add Python 3.11 support and tests (6f3f759)
  • druid: add Apache Druid backend (c4cc2a6)
  • druid: implement bitwise operations (3ac7447)
  • druid: implement ops.Pi, ops.Modulus, ops.Power, ops.Log10 (090ff03)
  • druid: implement ops.Sign (35f52cc)
  • druid: implement ops.StringJoin (42cd9a3)
  • duckdb: add support for reading tables from sqlite databases (9ba2211)
  • duckdb: add UUID type support (5cd6d76)
  • duckdb: implement ArrayFilter translation (5f35d5c)
  • duckdb: implement ops.ArrayMap (063602d)
  • duckdb: implement create_view and drop_view methods (4f73953)
  • duckdb: implement ops.Capitalize (b17116e)
  • duckdb: implement ops.TimestampDiff, ops.IntervalAdd, ops.IntervalSubtract (a7fd8fb)
  • duckdb: implement uuid result type (3150333)
  • duckdb: support dt.MACADDR, dt.INET as string (c4739c7)
  • duckdb: use read_json_auto when reading json (4193867)
  • examples: add imdb dataset examples (3d63203)
  • examples: add movielens small dataset (5f7c15c)
  • examples: add wowah_data data to examples (bf9a7cc)
  • examples: enable progressbar and faster hashing (4adfe29)
  • impala: implement ops.Clip (279fd78)
  • impala: implement ops.Radians, ops.Degrees (a794ace)
  • impala: implement ops.RandomScalar (874f2ff)
  • io: add to_parquet, to_csv to backends (fecca42)
  • ir: add ArrayFilter operation (e719d60)
  • ir: add ArrayMap operation (49e5f7a)
  • mysql: support in-memory tables (4dfabbd)
  • pandas/dask: implement bitwise operations (4994add)
  • pandas/dask: implement ops.Pi, ops.E (091be3c)
  • pandas: add basic unnest support (dd36b9d)
  • pandas: implement ops.StartsWith, ops.EndsWith (2725423)
  • pandas: support more pandas extension dtypes (54818ef)
  • polars: implement ops.Union (17c6011)
  • polars: implement ops.Pi, ops.E (6d8fc4a)
  • postgres: allow connecting with an explicit schema (39c9ea8)
  • postgres: fix interval literal (c0fa933)
  • postgres: implement argmin/argmax (82668ec)
  • postgres: parse tsvector columns as strings (fac8c47), closes #5402
  • pyspark: add support for ops.ArgMin and ops.ArgMax (a3fa57c)
  • pyspark: implement ops.Between (ed83465)
  • return Table from create_table(), create_view() (e4ea597)
  • schema: implement Mapping abstract base class for Schema (167d85a)
  • selectors: support ranges (e10caf4)
  • snowflake: add support for alias in snowflake (b1b947a)
  • snowflake: add support for bulk upload for temp tables in snowflake (6cc174f)
  • snowflake: add UUID literal support (436c781)
  • snowflake: implement argmin/argmax (8b998a5)
  • snowflake: implement ops.BitwiseAnd, ops.BitwiseNot, ops.BitwiseOr, ops.BitwiseXor (1acd4b7)
  • snowflake: implement ops.GroupConcat (2219866)
  • snowflake: implement remaining map functions (c48c9a6)
  • snowflake: support binary variance reduction with filters (eeabdee)
  • snowflake: support cross-database table access (79cb445)
  • sqlalchemy: generalize unnest to work on backends that don't support it (5943ce7)
  • sqlite: add sqlite type support (addd6a9)
  • sqlite: support in-memory tables (1b24848)
  • sql: support for creating temporary tables in sql based backends (466cf35)
  • tables: cast table using schema (96ce109)
  • tables: implement pivot_longer API (11c5736)
  • trino: enable MapLength operation (a7ad1db)
  • trino: implement ArrayFilter translation (50f6fcc)
  • trino: implement ops.ArrayMap (657bf61)
  • trino: implement ops.Between (d70b9c0)
  • trino: support sqlalchemy 2 (0d078c1)
  • ux: accept selectors in Table.drop (325140f)
  • ux: allow creating unbound tables using annotated class definitions (d7bf6a2)
  • ux: easy interactive setup (6850146)
  • ux: expose between, rows and range keyword arguments in value.over() (5763063)

Bug Fixes

  • analysis: extract Limit subqueries (62f6e14)
  • api: add a name attribute to backend proxy modules (d6d8e7e)
  • api: fix broken __radd__ array concat operation (121d9a0)
  • api: only include valid python identifiers in struct tab completion (8f33775)
  • api: only include valid python identifiers in table tab completion (031a48c)
  • backend: provide useful error if default backend is unavailable (1dbc682)
  • backends: fix capitalize implementations across all backends (d4f0275)
  • backends: fix null literal handling (7f46342)
  • bigquery: ensure that memtables are translated correctly (d6e56c5)
  • bigquery: fix decimal literals (4a04c9b)
  • bigquery: regenerate negative string index sql snapshots (3f02c73)
  • bigquery: regenerate sql for predicate pushdown fix (509806f)
  • cache: remove bogus schema argument and validate database argument type (c4254f6)
  • ci: fix invalid test id (f70de1d)
  • clickhouse: fix decimal literal (4dcd2cb)
  • clickhouse: fix set ops with table operands (86bcf32)
  • clickhouse: raise OperationNotDefinedError if operation is not supported (71e2570)
  • clickhouse: register in-memory tables in pyarrow-related calls (09a045c)
  • clickhouse: use a bool type supported by clickhouse_driver (ab8f064)
  • clickhouse: workaround sqlglot's insistence on uppercasing (6151f37)
  • compiler: generate aliases in a less clever way (04a4aa5)
  • datafusion: support sum aggregation on bool column (9421400)
  • deps: bump duckdb to 0.7.0 (38d2276)
  • deps: bump snowflake-connector-python upper bound (b368b04)
  • deps: ensure that pyspark depends on sqlalchemy (60c7382)
  • deps: update dependency pyarrow to v11 (2af5d8d)
  • deps: update dependency sqlglot to v11 (e581e2f)
  • don't expose backend methods on ibis.<backend> directly (5a16431)
  • druid: remove invalid operations (19f214c)
  • duckdb: add null to duckdb datatype parser (07d2a86)
  • duckdb: ensure that temp_directory exists (00ba6cb)
  • duckdb: explicitly set timezone to UTC on connection (6ae4a06)
  • duckdb: fix blob type in literal (f66e8a1)
  • duckdb: fix memtable to_pyarrow/to_pyarrow_batches (0e8b066)
  • duckdb: in-memory objects registered with duckdb show up in list_tables (7772f79)
  • duckdb: quote identifiers if necessary in struct_pack (6e598cc)
  • duckdb: support casting to unsigned integer types (066c158)
  • duckdb: treat g re_replace flag as literal text (aa3c31c)
  • duckdb: workaround an ownership bug at the interaction of duckdb, pandas and pyarrow (2819cff)
  • duckdb: workaround duckdb bug that prevents multiple substitutions (0e09220)
  • imports: remove top-level import of sqlalchemy from base backend (b13cf25)
  • io: add read_parquet and read_csv to base backend mixin (ce80d36), closes #5420
  • ir: incorrect predicate pushdown (9a9204f)
  • ir: make find_subqueries return in topological order (3587910)
  • ir: properly raise error if literal cannot be coerced to a datatype (e16b91f)
  • ir: reorder the right schema of set operations to align with the left schema (58e60ae)
  • ir: use rlz.map_to() rule instead of isin to normalize temporal units (a1c46a2)
  • ir: use static connection pooling to prevent dropping temporary state (6d2ae26)
  • mssql: set sqlglot to tsql (1044573)
  • mysql: remove invalid operations (8f34a2b)
  • pandas/dask: handle non numpy scalar results in wrap_case_result (a3b82f7)
  • pandas: don't try to dispatch on arrow dtype if not available (d22ae7b)
  • pandas: handle casting to arrays with None elements (382b90f)
  • pandas: handle NAs in array conversion (06bd15d)
  • polars: back compat for concat_str separator argument (ced5a61)
  • polars: back compat for the reverse/descending argument (f067d81)
  • polars: polars execute respect limit kwargs (d962faf)
  • polars: properly infer polars categorical dtype (5a4707a)
  • polars: use metric name in aggregate output to dedupe columns (234d8c1)
  • pyspark: fix incorrect ops.EndsWith translation rule (4c0a5a2)
  • pyspark: fix isnan and isinf to work on bool (8dc623a)
  • snowflake: allow loose casting of objects and arrays (1cf8df0)
  • snowflake: ensure that memtables are translated correctly (b361e07)
  • snowflake: ensure that null comparisons are correct (9b83699)
  • snowflake: ensure that quoting matches snowflake behavior, not sqlalchemy (b6b67f9)
  • snowflake: ensure that we do not try to use a None schema or database (03e0265)
  • snowflake: handle the case where pyarrow isn't installed (b624fa3)
  • snowflake: make array_agg preserve nulls (24b95bf)
  • snowflake: quote column names on construction of sa.Column (af4db5c)
  • snowflake: remove broken pyarrow fetch support (c440adb)
  • snowflake: return NULL when trying to call map functions on non-object JSON (d85fb28)
  • snowflake: use _flatten to avoid overriding unrelated function in other backends (8c31594)
  • sqlalchemy: ensure that isin contains full column expression (9018eb6)
  • sqlalchemy: get builtin dialects working; mysql/mssql/postgres/sqlite (d2356bc)
  • sqlalchemy: make strip family of functions behave like Python (dd0a04c)
  • sqlalchemy: reflect most recent schema when view is replaced (62c8dea)
  • sqlalchemy: use sa.true instead of Python literal (8423eba)
  • sqlalchemy: use indexed group by key references everywhere possible (9f1ddd8)
  • sql: ensure that set operations generate valid sql in the presence of additional constructs such as sort keys (3e2c364)
  • sqlite: explicitly disallow arrays in literals (de73b37)
  • sqlite: fix random scalar range (26d0dde)
  • support negative string indices (f84a54d)
  • trino: workaround broken dialect (b502faf)
  • types: fix argument types of Table.order_by() (6ed3a97)
  • util: make convert_unit work with python types (cb3a90c)
  • ux: give the value_counts aggregate column a better name (abab1d7)
  • ux: make string range selectors inclusive (7071669)
  • ux: make top level set operations work (f5976b2)

Performance

  • duckdb: faster to_parquet/to_csv implementations (6071bb5)
  • fix duckdb insert-from-dataframe performance (cd27b99)
  • deps: bump minimum required version of parsy (22020cb)
  • remove spark alias to pyspark and associated cruft (4b286bd)

Refactors

  • analysis: slightly simplify find_subqueries() (ab3712f)
  • backend: normalize exceptions (065b66d)
  • clickhouse: clean up parsing rules (6731772)
  • common: move frozendict and DotDict to ibis.common.collections (4451375)
  • common: move the geospatial module to the base SQL backend (3e7bfa3)
  • dask: remove unneeded create_table() (86885a6)
  • datatype: clean up parsing rules (c15fb5f)
  • datatype: remove Category type and related APIs (bb0ee78)
  • datatype: remove StructType.pairs property in favor of identical fields attribute (6668122)
  • datatypes: move sqlalchemy datatypes to specific backends (d7b49eb)
  • datatypes: remove String parent type from JSON type (34f3898)
  • datatype: use a dictionary to store StructType fields rather than names and types tuples (84455ac)
  • datatype: use lazy dispatch when inferring pandas Timedelta objects (e5280ea)
  • drop limit kwarg from to_parquet/to_csv (a54460c)
  • duckdb: clean up parsing rules (30da8f9)
  • duckdb: handle parsing timestamp scale (16c1443)
  • duckdb: remove unused list<...> parsing rule (f040b86)
  • duckdb: use a proper sqlalchemy construct for structs and reduce casting (8daa4a1)
  • ir/api: introduce window frame operation and revamp the window API (2bc5e5e)
  • ir/backends: remove various deprecated functions and methods (a8d3007)
  • ir: reorganize the scope and timecontext utilities (80bd494)
  • ir: update ArrayMap to use the new callable_with validation rule (560474e)
  • move pretty repr tests back to their own file (4a75988)
  • nix: clean up marker argument construction (12eb916)
  • postgres: clean up datatype parsing (1f61661)
  • postgres: clean up literal arrays (21b122d)
  • pyspark: remove another private function (c5081cf)
  • remove unnecessary top-level rich console (8083a6b)
  • rules: remove unused non_negative_integer and pair rules (e00920a)
  • schema: remove deprecated Schema.from_dict(), .delete() and .append() methods (8912b24)
  • snowflake: remove the need for parsy (c53403a)
  • sqlalchemy: set session parameters once per connection (ed4b476)
  • sqlalchemy: use backend-specific startswith/endswith implementations (6101de2)
  • test_sqlalchemy.py: move to snapshot testing (96998f0)
  • tests: reorganize rules test file to the ibis.expr subpackage (47f0909)
  • tests: reorganize schema test file to the ibis.expr subpackage (40033e1)
  • tests: reorganize datatype test files to the datatypes subpackage (16199c6)
  • trino: clean up datatype parsing (84c0e35)
  • ux: return expression from Table.info (71cc0e0)

Deprecations

  • api: deprecate summary API (e449c07)
  • api: mark ibis.sequence() for removal (3589f80)

Documentation

  • add a bunch of string expression examples (18d3112)
  • add Apache Druid to backend matrix (764d9c3)
  • add CNAME file to mkdocs source (6d19111)
  • add druid to the backends index docs page (ad0b6a3)
  • add missing DataFusion entry to the backends in the README (8ce025a)
  • add redirects for common old pages (c9087f2)
  • api: document deferred API and its pitfalls (8493604)
  • api: improve collect method API documentation (b4fcef1)
  • array expression examples (6812c17)
  • backends: document default backend configuration (6d917d3)
  • backends: link to configuration from the backends list (144044d)
  • blog: ibis + substrait + duckdb (5dc7a0a)
  • blog: adds examples sneak peek blog + assets folder (fcbb3d5)
  • blog: adds to file sneak peek blog (128194f)
  • blog: specify parsy 2.0 in substrait blog article (c264477)
  • bump query engine count in README and use project-preferred names (11169f7)
  • don't sort backends by coverage percentage by default (68f73b1)
  • drop docs versioning (d7140e7)
  • duckdb: fix broken docstring examples (51084ad)
  • enable light/dark mode toggle in docs (b9e812a)
  • fill out table API with working examples (16fc8be)
  • fix notebook logging example (04b75ef)
  • how-to: fix sessionize.md to use ibis.read_parquet (ff9cbf7)
  • improve Expr.substitute() docstring (b954edd)
  • improve/update pandas walkthrough (80b05d8)
  • io: doc/ux improvements for read_parquet and friends (2541556), closes #5420
  • io: update README.md to recommend installing duckdb as default backend (0a72ec0), closes #5423 #5420
  • move tutorial from docs to external ibis-examples repo (11b0237)
  • parquet: add docstring examples for to_parquet incl. partitioning (8040164)
  • point to ibis-examples repo in the README (1205636)
  • README.md: clean up readme, fix typos, alter the example (383a3d3)
  • remove duplicate "or" (b6ef3cc)
  • remove duplicate spark backend in install docs (5954618)
  • render __dunder__ method API documentation (b532c63)
  • rerender ci-analysis notebook with new table header colors (50507b6)
  • streamlit: fix url for support matrix (594199b)
  • tutorial: remove impala from sql tutorial (7627c13)
  • use teal for primary & accent colors (24be961)