
[Bug]: {"message":"ERROR: portal snapshots (0) did not account for all active snapshots (1) (SQLSTATE XX000)"} when refreshing a continuous aggregate via a trigger #6533

Open
choec0130 opened this issue Jan 17, 2024 · 4 comments · May be fixed by #7566

@choec0130

What type of bug is this?

Unexpected error

What subsystems and features are affected?

Continuous aggregate

What happened?

I am trying to refresh my continuous aggregate via a trigger and get this error:

{"message":"ERROR: portal snapshots (0) did not account for all active snapshots (1) (SQLSTATE XX000)"}

Here's my trigger:

CREATE TRIGGER refresh_aggregates
AFTER UPDATE OR INSERT OR DELETE ON pl
EXECUTE FUNCTION refresh_pl_continuous_aggregates();

And my trigger function:

CREATE FUNCTION refresh_pl_continuous_aggregates() RETURNS TRIGGER AS $$
BEGIN
    CALL refresh_continuous_aggregate('aggregates_by_pl_by_day', '2024-01-01', INTERVAL '2 day');
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

And the continuous aggregate:

CREATE MATERIALIZED VIEW aggregates_by_pl_by_day WITH (timescaledb.continuous) AS
SELECT time_bucket('1 day', modified_date) AS day_bucket,
       sl.pl_id, resource_type, taxonomy
FROM activities
JOIN s_licenses sl ON sl.s_id = activities.s_id
GROUP BY 1, sl.pl_id, resource_type, taxonomy;

TimescaleDB version affected

2.13.0

PostgreSQL version used

15.5

What operating system did you use?

Ubuntu 15.5-1.pgdg22.04+1 on x86_64-pc-linux-gnu

What installation method did you use?

Not applicable

What platform did you run on?

Timescale Cloud

Relevant log output and stack trace

{"message":"ERROR: portal snapshots (0) did not account for all active snapshots (1) (SQLSTATE XX000)"}

How can we reproduce the bug?

1. Create an "after" trigger that manually refreshes a continuous aggregate using `refresh_continuous_aggregate`
2. Set off the trigger.
3. Notice that the continuous aggregate doesn't refresh and instead the error is produced.
@choec0130 choec0130 added the bug label Jan 17, 2024
@konskov konskov self-assigned this Jan 17, 2024

konskov commented Jan 17, 2024

Hello @choec0130 , thank you for reaching out.
Sharing a minimal example to reproduce; I will look further into the issue.

CREATE TABLE metrics (
    time timestamptz NOT NULL,
    device_id int,
    device_id_peer int,
    v0 int,
    v1 int,
    v2 float,
    v3 float
);

SELECT create_hypertable('metrics', 'time');

INSERT INTO metrics (time, device_id, device_id_peer, v0, v1, v2, v3)
SELECT time,
    device_id,
    0,
    device_id + 1,
    device_id + 2,
    0.5,
    NULL
FROM generate_series('2024-01-01 0:00:00+0'::timestamptz, '2024-01-04 23:55:00+0', '20m') gtime (time),
    generate_series(1, 2, 1) gdevice (device_id);

CREATE MATERIALIZED VIEW metrics_cagg WITH (timescaledb.continuous,
  timescaledb.materialized_only = true)
AS
SELECT time_bucket('1 day', time) as dayb, device_id,
       sum(v0), avg(v3)
FROM metrics
GROUP BY 1, 2
WITH NO DATA;

CREATE FUNCTION refresh_pl_continuous_aggregates() RETURNS TRIGGER AS $$
BEGIN
    CALL refresh_continuous_aggregate(
        'metrics_cagg',
        '2024-01-01',
        INTERVAL '2 day'
    );
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER refresh_aggregates AFTER UPDATE OR INSERT OR DELETE ON metrics
EXECUTE FUNCTION refresh_pl_continuous_aggregates();

As a side note, instead of using a trigger like this, you can enable realtime aggregation on your caggs (by specifying materialized_only = false) and adjust the schedule interval of a refresh policy to run as frequently as you need, so that queries against the continuous aggregate return up-to-date values.
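For illustration, here is a sketch of that alternative against the metrics_cagg example above; the offsets and schedule interval are placeholder values to tune for your freshness requirements.

```sql
-- Realtime aggregation: queries combine materialized buckets with
-- recent, not-yet-materialized raw rows.
CREATE MATERIALIZED VIEW metrics_cagg WITH (timescaledb.continuous,
  timescaledb.materialized_only = false)
AS
SELECT time_bucket('1 day', time) AS dayb, device_id,
       sum(v0), avg(v3)
FROM metrics
GROUP BY 1, 2
WITH NO DATA;

-- Background refresh policy instead of a trigger; the interval
-- values below are illustrative placeholders.
SELECT add_continuous_aggregate_policy('metrics_cagg',
  start_offset      => INTERVAL '3 days',
  end_offset        => INTERVAL '1 hour',
  schedule_interval => INTERVAL '15 minutes');
```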


choec0130 commented Jan 17, 2024

@konskov Thank you! I have realtime aggregation on but I'm actually using the trigger in combination with invalidating the continuous aggregate because I need to account for updates on joined tables and not the hypertable itself, which is why I'm setting up a trigger. Basically the solution talked about here: #6353

Thanks so much for looking into this!


aiman-saeed commented Nov 4, 2024

I faced this too on timescale/timescaledb-ha:pg15.8-ts2.17.1, whenever I prepare a dynamic statement in a procedure.

For instance, this works:
CALL refresh_continuous_aggregate('visual_ds', NULL, INTERVAL '1 week');

But, this doesn't:
EXECUTE format('CALL refresh_continuous_aggregate(%L, NULL, INTERVAL %L)', _cagg.user_view_name, buffer_period);

staticlibs added a commit to staticlibs/timescaledb that referenced this issue Jan 1, 2025
Procedures that use multiple transactions cannot be run in a transaction
block (from a function or from dynamic SQL) or in a subtransaction (from
a procedure block with an EXCEPTION clause). Such procedures use the
PreventInTransactionBlock function to check whether they can be run.

Currently, though, such checks are incomplete, because
PreventInTransactionBlock requires an isTopLevel argument to throw a
consistent error when the call originates from a function. This
isTopLevel flag (which is somewhat poorly named; see below) is not
readily available inside C procedures. Its source of truth, the
ProcessUtilityContext parameter, is passed to ProcessUtility hooks but
is not included with the function calls. There is an undocumented
SPI_inside_nonatomic_context function that would have been sufficient
for the isTopLevel flag, but it currently returns false when an SPI
connection is absent (a valid scenario when C procedures are called from
top-level SQL instead of PL/pgSQL procedures or DO blocks), so it cannot
be used.

To work around this, the value of the ProcessUtilityContext parameter
is saved when the TS ProcessUtility hook is entered and can be accessed
from C procedures using the new ts_process_utility_is_context_nonatomic
function. The result is called "non-atomic" instead of "top-level"
because the way the isTopLevel flag is determined from the
ProcessUtilityContext value in standard_ProcessUtility is insufficient
for C procedures: it excludes the PROCESS_UTILITY_QUERY_NONATOMIC value
(used when called from a PL/pgSQL procedure without an EXCEPTION
clause), which is a valid use case for C procedures with transactions.
See details in the description of the ExecuteCallStmt function.

It is expected that calls to C procedures are done with CALL and always
pass through the ProcessUtility hook. The ProcessUtilityContext
parameter is set to the PROCESS_UTILITY_TOPLEVEL value by default. In
the unlikely case that a C procedure is called without passing through
the ProcessUtility hook and the call is done in an atomic context, the
PreventInTransactionBlock checks will pass, but SPI_commit will fail
when checking that all current active snapshots are portal-owned
snapshots (the same behaviour that was observed before this change).
In an atomic context there will be an additional snapshot set in
_SPI_execute_plan; see the snapshot handling invariants description
in that function.

Closes timescale#6533.
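As a concrete illustration of the restriction described above, using the metrics_cagg repro from earlier in the thread:

```sql
-- Works: a top-level CALL runs in a non-atomic context, so the
-- procedure can manage its own transactions.
CALL refresh_continuous_aggregate('metrics_cagg', '2024-01-01', '2024-01-03');

-- Fails: inside an explicit transaction block the context is atomic,
-- and a procedure that commits internally cannot run.
BEGIN;
CALL refresh_continuous_aggregate('metrics_cagg', '2024-01-01', '2024-01-03');
COMMIT;
```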
@staticlibs staticlibs linked a pull request Jan 1, 2025 that will close this issue
@staticlibs

I think the current behaviour is correct: refresh_continuous_aggregate uses transactions internally, so it cannot be called from functions or from dynamic SQL. But the reported error message could be better. There is a transaction check in refresh_continuous_aggregate, but it seems to be ineffective. I've filed PR #7566 to improve this check and report one of the following error messages:

  • ERROR: refresh_continuous_aggregate() cannot run inside a transaction block
  • ERROR: refresh_continuous_aggregate() cannot be executed from a function

instead of ERROR: portal snapshots (N) did not account for all active snapshots (N+1).

fabriziomello pushed a commit to staticlibs/timescaledb that referenced this issue Jan 2, 2025
fabriziomello pushed a commit to staticlibs/timescaledb that referenced this issue Jan 7, 2025
fabriziomello pushed a commit to staticlibs/timescaledb that referenced this issue Jan 8, 2025
fabriziomello pushed a commit to staticlibs/timescaledb that referenced this issue Jan 9, 2025