Improve Reliability and Safeguards for RMS_Update Script #505
Background & Issues in the Original Script
- Risk of Partial Copy/Corruption: `cp` commands on `.config` and `mask.bmp`; a partial `cp` leaves files in an undefined state
- No Process Control
- No Copy Recovery: `cp` attempt with no retry on failure
- No Space Verification
- Interruption Handling: uses `UPDATEINPROGRESSFILE` but doesn't verify file states after interruption
- Parameter Handling Risk: relies on a `$# -eq 0` check for parameter detection
- Dependencies Not Updated in One Pass
How This PR Addresses These Issues
- Atomic Copy Operations: `retry_cp` function that:
  - copies to a `.tmp` file
  - runs `diff` against source
  - uses `mv` for atomic replacement
- Process Isolation: lock file `/tmp/update.lock` with current PID
- Copy Validation Cycle: `RETRY_LIMIT` for failed operations; `diff` before acceptance
- Space Pre-checks: free-space checks on the locations involved (`/tmp`, source, backup)
- State Tracking: `UPDATEINPROGRESSFILE`
- Single Update Path
- Dependencies Installed in the Same Run
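
The atomic copy and validation cycle listed above can be illustrated with a minimal sketch. This is not the PR's actual `retry_cp`; the default of 3 for `RETRY_LIMIT` and the `RETRY_DELAY` variable are assumptions made for the example.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; the real retry_cp in the PR may differ.
RETRY_LIMIT=${RETRY_LIMIT:-3}   # assumed default
RETRY_DELAY=${RETRY_DELAY:-1}   # hypothetical: seconds to wait between attempts

retry_cp() {
    local src=$1 dst=$2
    local tmp="${dst}.tmp"
    local attempt

    for (( attempt = 1; attempt <= RETRY_LIMIT; attempt++ )); do
        # Copy to a temporary file beside the destination, so the final
        # mv is a rename within one filesystem rather than a second copy.
        if cp -- "$src" "$tmp" && diff -q -- "$src" "$tmp" >/dev/null; then
            # The copy matches the source; replace the destination in one step.
            mv -f -- "$tmp" "$dst" && return 0
        fi
        echo "retry_cp: attempt $attempt of $RETRY_LIMIT failed for $src" >&2
        rm -f -- "$tmp"
        sleep "$RETRY_DELAY"
    done
    return 1
}
```

If every attempt fails, the existing destination file is never touched, since only a verified temporary copy is ever moved into place.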
The script now uses a single, verifiable update path. Every critical operation includes integrity checks and automatic recovery steps. The update process is atomic: it either completes fully or rolls back safely.
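
As a rough sketch of the process-isolation and space pre-check safeguards, assuming a PID lock file at `/tmp/update.lock` and a free-space test based on `df`: the function names `acquire_lock`, `release_lock`, and `has_free_kb` are hypothetical and not taken from the PR.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; function names are hypothetical.
LOCKFILE=${LOCKFILE:-/tmp/update.lock}

acquire_lock() {
    local owner
    # With noclobber set, the redirection fails if the file already exists,
    # so testing for the lock and creating it happen in a single step.
    if ( set -o noclobber; echo "$$" > "$LOCKFILE" ) 2>/dev/null; then
        return 0
    fi
    owner=$(cat "$LOCKFILE" 2>/dev/null) || owner=""
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    if [ -n "$owner" ] && kill -0 "$owner" 2>/dev/null; then
        echo "update already running (PID $owner)" >&2
        return 1
    fi
    # Stale lock: the recorded process is gone, so take the lock over.
    # (This takeover is not race-free; it is a simplification.)
    echo "$$" > "$LOCKFILE"
}

release_lock() {
    # Remove the lock only if this process still owns it.
    if [ "$(cat "$LOCKFILE" 2>/dev/null)" = "$$" ]; then
        rm -f "$LOCKFILE"
    fi
}

has_free_kb() {
    local dir=$1 need_kb=$2 avail_kb
    # df -Pk prints one POSIX-format line per filesystem in 1024-byte
    # blocks; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "${avail_kb:-0}" -ge "$need_kb" ]
}
```

A caller might run `acquire_lock || exit 1`, register `trap release_lock EXIT`, and then check each of `/tmp`, the source directory, and the backup directory with `has_free_kb` before copying anything.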
Remaining issue
If the update process fails, the script currently doesn’t provide an obvious signal for the operator (e.g., a prompt or notification). The system could continue running on the old RMS version without anyone realizing.
One solution is to modify First_Run so that if the update fails, it exits before launching start_capture, thereby stopping the station altogether. This makes the failure more evident (the station isn’t capturing), prompting the operator to investigate and fix the update issue. The tradeoff, of course, is that the station remains offline until the update is resolved.
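
The proposed change to First_Run could look roughly like the following. `run_station` is a hypothetical wrapper, and the two commands passed to it are placeholders for however First_Run actually invokes the update script and start_capture.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; run_station and its arguments are hypothetical.
run_station() {
    local update_cmd=$1 capture_cmd=$2

    # Run the update first and stop if it reports failure, so the
    # station never starts capturing on a failed update.
    if ! "$update_cmd"; then
        echo "RMS update failed; not starting capture." >&2
        return 1
    fi

    "$capture_cmd"
}
```

This relies on the update script returning a non-zero exit status on failure; if it currently exits with 0 in every case, that would need to change as well.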