cirrus-run thinks the build is running long after it's done on Cirrus CI side #8

sio · 2021-10-15T11:53:18Z

Fewer than 1% of cirrus-run invocations keep waiting long after Cirrus CI has finished the corresponding build, until CIRRUS_TIMEOUT is eventually reached and a job failure is reported.
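To illustrate the failure mode, here is a minimal sketch of the kind of polling loop involved. It assumes a hypothetical fetch_status() callable that returns the build's status string. This is not cirrus-run's actual code, and the set of terminal statuses is an assumption (only COMPLETED and EXECUTING appear in this issue). If fetch_status() keeps returning a stale "EXECUTING", the loop can only end by hitting the timeout, which matches the symptom described above.

```python
import time
from typing import Callable

# Assumption: statuses that mean the build is over. Only COMPLETED is confirmed here.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "ABORTED", "ERRORED"}


def wait_for_build(
    fetch_status: Callable[[], str],  # hypothetical: returns e.g. "EXECUTING"
    timeout: float,
    interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status() until it reports a terminal status or the deadline passes.

    Returns the terminal status. Raises TimeoutError otherwise, which is what a
    stale "EXECUTING" reply would eventually cause.
    """
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"build still reported as {status} after {timeout}s")
        sleep(interval)
```

Injecting clock and sleep keeps the loop testable without real waiting.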

This issue needs further investigation. Is the API server sometimes reporting an incorrect build status? Is this some kind of cache/CDN issue?

Troubleshooting is difficult because this failure is rare and because most invocations of cirrus-run happen non-interactively (via another CI service, e.g. GitLab).

Anyone who encounters a cirrus-run timeout needs to act quickly:

1. Confirm that the build has in fact finished on the Cirrus CI side. cirrus-run usually prints a link to the build to stdout.
2. Check the API response for that particular build's status:
   - Optional: make sure the CIRRUS_API_TOKEN environment variable is set to a valid token. Without a token only public repos are viewable, and API rate limits will probably be stricter.
   - Run make debug/build_status DEBUG_BUILD_ID=5735044040884224 from the repo's top-level directory (replace the number with your build ID).
3. Report the output here. If the script keeps repeating the same "EXECUTING" status, you can interrupt it with Ctrl+C or let it keep running to see when (or whether) it fails.
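The make target presumably wraps debug/build_status.py. A rough standalone equivalent is sketched here for reference, assuming the Cirrus CI GraphQL endpoint is https://api.cirrus-ci.com/graphql and that the token is sent as a Bearer token in the Authorization header. The requested field names are copied from the debug output posted in this thread; build_query(), build_request() and fetch_build() are hypothetical helpers, not the repo's actual script.

```python
import json
import os
import urllib.request
from typing import Optional

# Assumed endpoint of the Cirrus CI GraphQL API.
API_URL = "https://api.cirrus-ci.com/graphql"

# Field names copied from the debug output posted in this issue.
FIELDS = (
    "id",
    "status",
    "durationInSeconds",
    "clockDurationInSeconds",
    "buildCreatedTimestamp",
    "changeTimestamp",
)


def build_query(build_id: str) -> str:
    """GraphQL query for one build; json.dumps() quotes the ID as a string literal."""
    return "query { build(id: %s) { %s } }" % (json.dumps(build_id), " ".join(FIELDS))


def build_request(build_id: str, token: Optional[str] = None) -> urllib.request.Request:
    """POST request carrying the query; without a token only public repos are visible."""
    body = json.dumps({"query": build_query(build_id)}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if token:
        headers["Authorization"] = "Bearer " + token  # assumed auth scheme
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def fetch_build(build_id: str) -> dict:
    """Send the request and return the "data" part of the GraphQL reply."""
    token = os.environ.get("CIRRUS_API_TOKEN")  # optional
    with urllib.request.urlopen(build_request(build_id, token), timeout=30) as resp:
        return json.load(resp)["data"]
```

Usage: print(json.dumps(fetch_build("5735044040884224"), indent=2)), replacing the number with your build ID.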


sio · 2022-03-23T07:30:00Z

I caught another snowflake yesterday.

Looks like some kind of caching issue. I was running a lot of concurrent Cirrus CI jobs (30+), and most of them were delayed by the community cluster scheduler, so many cirrus-run instances were just sitting there, each querying the API every few seconds. It appears that after some number of repeated queries the reply got cached somewhere and was not updated when the job finished successfully. Running the same query from a different host (my workstation) produced the correct result immediately.

There is no caching built into cirrus-run, so the stale reply must be coming from the API itself or from some middleware in between (a CDN?). I'm not sure if or how this is fixable on our side.
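If the stale reply really does come from an intermediary cache, one untested idea is to ask caches to revalidate with the origin by sending the standard no-cache request headers. This is only a sketch: whether the Cirrus CI API or anything in front of it honors these headers is unknown, and with_no_cache() is a hypothetical helper.

```python
import urllib.request

# Standard HTTP request directives asking caches to revalidate with the origin.
# Whether the Cirrus CI API (or any middleware in front of it) honors them is unknown.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",  # HTTP/1.0 fallback for older intermediaries
}


def with_no_cache(req: urllib.request.Request) -> urllib.request.Request:
    """Add cache-bypass headers to an existing request and return it."""
    for name, value in NO_CACHE_HEADERS.items():
        req.add_header(name, value)
    return req
```

It can wrap any urllib.request.Request before the request is passed to urllib.request.urlopen().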

GitLab CI log: cirrus-run hangs indefinitely. Unfortunately, output verbosity was set to low.
Cirrus CI build
The debugging script returned the correct status immediately:

$ .venv/bin/python debug/build_status.py 6671363116105728
https://cirrus-ci.com/build/6671363116105728
2022-03-22 14:57:52+00:00 (UTC)
{
  "build": {
    "id": "6671363116105728",
    "durationInSeconds": 1193,
    "clockDurationInSeconds": 1264,
    "status": "COMPLETED",
    "buildCreatedTimestamp": 1647958124677,
    "changeTimestamp": 1647958124677
  }
}
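Decoding the reply above makes the timeline easier to read. A small sketch, assuming buildCreatedTimestamp is in milliseconds since the Unix epoch (the value lands on the same day as the query) and, more loosely, that clockDurationInSeconds is measured from build creation. summarize() is a hypothetical helper, not part of the debug script.

```python
import json
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def summarize(reply: str) -> dict:
    """Decode a build_status reply and turn its timestamps into UTC datetimes."""
    build = json.loads(reply)["build"]
    # Assumption: milliseconds since the Unix epoch. Integer ms keep the result exact.
    created = EPOCH + timedelta(milliseconds=build["buildCreatedTimestamp"])
    # Assumption: wall-clock duration counted from creation, so this is approximate.
    finished = created + timedelta(seconds=build["clockDurationInSeconds"])
    return {"status": build["status"], "created": created, "approx_finished": finished}
```

For this reply it gives a creation time of 14:08:44 UTC and an approximate finish of 14:29:48 UTC, roughly half an hour before the debug script's query at 14:57:52 UTC.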

sio added a commit that referenced this issue Oct 15, 2021

Helper script for issue #8

6071392

sio added the help wanted Extra attention is needed label Oct 15, 2021

sio added a commit that referenced this issue Jan 19, 2022

Helper script for issue #8

13ebc61
