fix(log_parser_bootstrap): guard Claude guardrail with safeOutputEntriesCount escape hatch#41886
Conversation
…iesCount escape hatch - [x] Fix Claude guardrail: gate `setFailed` on `safeOutputEntriesCount === 0`; downgrade to `core.warning` when safe outputs exist - [x] Suppress misleading 'parsed successfully' message for Claude on empty-logEntries fallback path - [x] Add 3 test cases: Claude + empty logEntries + safe outputs → warning (no setFailed); Claude + empty logEntries + no safe outputs → setFailed; Claude + empty logEntries → no 'parsed successfully' message Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR #41886 does not have the 'implementation' label and has 0 new lines of code in business logic directories. |
|
✅ Test Quality Sentinel completed test quality analysis. |
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ PR Code Quality Reviewer completed the code quality review. |
There was a problem hiding this comment.
Pull request overview
This PR adjusts the log_parser_bootstrap Claude guardrail so that an empty logEntries result does not fail the run when there is evidence the agent completed successfully (i.e., safe outputs were emitted). This prevents intermittent false-negative failures caused by post-completion infrastructure teardown races that can make the structured log unreadable.
Changes:
- Downgrade the Claude “empty structured log entries” guardrail from
core.setFailedtocore.warningwhensafeOutputEntriesCount > 0, while preserving failure when there are zero safe outputs. - Suppress the misleading
${parserName} log parsed successfullyinfo message on the Claude empty-logEntriesfallback path. - Add/extend unit tests covering the new warning vs failure behavior and the suppressed success-message behavior.
Show a summary per file
| File | Description |
|---|---|
| actions/setup/js/log_parser_bootstrap.cjs | Adds a safe-outputs-based escape hatch for the Claude empty-logEntries guardrail and avoids contradictory “parsed successfully” messaging on the fallback path. |
| actions/setup/js/log_parser_bootstrap.test.cjs | Adds test cases asserting the new non-fatal warning behavior (when safe outputs exist), preserved failure behavior (when safe outputs do not exist), and the absence of the “parsed successfully” log for Claude on the fallback path. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 2/2 changed files
- Comments generated: 0
- Review effort level: Low
🧪 Test Quality Sentinel Report✅ Test Quality Score: 90/100 — Excellent
📊 Metrics & Test Classification (3 tests analyzed)
Go: 0 (
|
There was a problem hiding this comment.
Non-blocking observations only — logic is sound
The escape-hatch design (safeOutputEntriesCount > 0 → warning instead of setFailed) correctly mirrors the existing MCP-failure pattern and addresses the described teardown-race root cause. No blocking issues found.
Findings summary
Test coverage gaps (both non-blocking):
- Third test (
"should not print...") only asserts the absence of the success message; it does not assertsetFailedwas not called orwarningwas called. Tests 1 and 2 provide that coverage, but the third test stands on weak ground as a standalone regression guard. - All three new tests use
logEntries: []exclusively; the!logEntries(null) branch of the guardrail is never exercised by the new suite. Both paths emit the same warning, so this is a coverage gap, not a correctness gap.
Design note (acknowledged):
The escape hatch assumes GH_AW_SAFE_OUTPUTS always reflects the current run. In the GitHub Actions environment this holds, and it is the same assumption made by the pre-existing MCP-failure escape hatch. Worth having a comment documenting that assumption explicitly so the intent is clear to future readers.
🔎 Code quality review by PR Code Quality Reviewer · 76.4 AIC · ⌖ 7.08 AIC · ⊞ 5.2K
| const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] }); | ||
| runLogParser({ parseLog: mockParseLog, parserName: "Claude" }); | ||
| const infoCalls = mockCore.info.mock.calls.map(c => c[0]); | ||
| expect(infoCalls.some(msg => msg.includes("Claude log parsed successfully"))).toBe(false); |
There was a problem hiding this comment.
Weak assertion — the escape-hatch outcome is unverified in this test, so a future regression in the safeOutputEntriesCount > 0 branch would not be caught here.
💡 Suggested improvement
The test currently only checks that "Claude log parsed successfully" is absent from core.info, which would remain true even if core.setFailed fired. Adding explicit positive/negative assertions ties this test to the behavior it is supposed to guard:
// already covered by test 1, but worth repeating here for completeness
expect(mockCore.setFailed).not.toHaveBeenCalled();
expect(mockCore.warning).toHaveBeenCalledWith(
expect.stringContaining("treating as non-fatal post-completion infrastructure failure")
);Test 1 does assert both of these for the same code path, so this is non-blocking — but the third test as written only validates the hygiene side-effect, not the escape-hatch logic it is co-located with.
| process.env.GH_AW_SAFE_OUTPUTS = safeOutputsFile; | ||
| // Parser returns markdown but no structured logEntries — simulates sandbox teardown | ||
| // race leaving agent-stdio.log unreadable after the agent completed successfully. | ||
| const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] }); |
There was a problem hiding this comment.
Missing coverage for logEntries: null — all three new test cases use logEntries: [], leaving the !logEntries branch of the guardrail untested.
💡 Detail
The condition at log_parser_bootstrap.cjs line 341 is:
if (parserName === "Claude" && (!logEntries || logEntries.length === 0))logEntries is initialized as null and stays null if the parser returns undefined/nothing for that field. The new tests only exercise the logEntries.length === 0 path. A test like:
const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false /* no logEntries key */ });...would exercise !logEntries (null) with safe outputs present. Both code paths lead to the same warning, so this is not a correctness gap — but the null path is darker and would be worth a targeted test.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd and /diagnose — approving with minor suggestions on test completeness and observability.
📋 Key Themes & Highlights
Key Themes
- Test assertion gaps: two of the three new tests are underspecified (missing complementary assertions that would catch regressions in the other branch of the new conditional)
- Test pollution:
process.envis mutated but not restored infinallyblocks — a pre-existing pattern, but worth correcting now that three more tests follow it - Observability: the new warning message is clear but missing the log file path, which would help correlate future occurrences with storage/teardown logs
Positive Highlights
- ✅ Correct root-cause analysis: the escape hatch targets the right signal (
safeOutputEntriesCount > 0means the agent genuinely completed) - ✅ Mirrors the existing MCP failure handling pattern — consistent with the codebase's established convention
- ✅
setFailedis preserved for the zero-safe-outputs case, maintaining the safety guarantee - ✅ Three new test cases covering all three new paths — good coverage of the main scenarios
- ✅
countSafeOutputEntriesalready handles empty/malformed JSONL correctly (returns 0), so the escape hatch is robust by construction
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 71 AIC · ⌖ 7.42 AIC · ⊞ 6.6K
| delete process.env.GH_AW_SAFE_OUTPUTS; | ||
| const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] }); | ||
| runLogParser({ parseLog: mockParseLog, parserName: "Claude" }); | ||
| expect(mockCore.setFailed).toHaveBeenCalledWith(`${ERR_CONFIG}: Claude execution failed: no structured log entries were produced. This usually indicates a startup or configuration error before tool execution.`); |
There was a problem hiding this comment.
[/tdd] The startup-failure test only asserts setFailed was called, but does not verify core.warning was NOT called — making it impossible to detect a regression that accidentally fires both.
💡 Suggested addition
Add this assertion alongside the existing one:
expect(mockCore.setFailed).toHaveBeenCalledWith(`${ERR_CONFIG}: Claude execution failed: ...`);
expect(mockCore.warning).not.toHaveBeenCalled(); // confirm non-fatal path was not takenWithout it, a bug that emits both warning and setFailed would pass this test.
@copilot please address this.
| const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] }); | ||
| runLogParser({ parseLog: mockParseLog, parserName: "Claude" }); | ||
| const infoCalls = mockCore.info.mock.calls.map(c => c[0]); | ||
| expect(infoCalls.some(msg => msg.includes("Claude log parsed successfully"))).toBe(false); |
There was a problem hiding this comment.
[/tdd] Test 3 only checks the absence of the misleading info message — it is silent about whether core.warning was actually emitted. This leaves the hygiene fix incompletely specified.
💡 Suggested addition
Add the companion positive assertion to complete the specification:
const infoCalls = mockCore.info.mock.calls.map(c => c[0]);
expect(infoCalls.some(msg => msg.includes("Claude log parsed successfully"))).toBe(false);
// Also verify warning WAS emitted on this path
expect(mockCore.warning).toHaveBeenCalledWith(expect.stringContaining("Claude produced no structured log entries"));Without this, the test passes even if the warning call were accidentally removed.
@copilot please address this.
| expect(mockCore.setFailed).not.toHaveBeenCalled(); | ||
| expect(mockCore.warning).toHaveBeenCalledWith("Claude produced no structured log entries, but agent completed with 1 safe output entry — treating as non-fatal post-completion infrastructure failure"); | ||
| } finally { | ||
| fs.unlinkSync(logFile); |
There was a problem hiding this comment.
[/tdd] The finally block removes the temp files but leaves process.env.GH_AW_AGENT_OUTPUT and process.env.GH_AW_SAFE_OUTPUTS set — this is a pre-existing pattern in the suite, but extending it to three more tests increases the likelihood of ordering-dependent failures.
💡 Suggested finally block
} finally {
fs.unlinkSync(logFile);
fs.unlinkSync(safeOutputsFile);
fs.rmdirSync(tmpDir);
delete process.env.GH_AW_AGENT_OUTPUT; // add
delete process.env.GH_AW_SAFE_OUTPUTS; // add
}Applying the same fix to all three new tests (and ideally the existing ones) would make the suite fully order-independent.
@copilot please address this.
| core.setFailed(`${ERR_CONFIG}: Claude execution failed: no structured log entries were produced. This usually indicates a startup or configuration error before tool execution.`); | ||
| if (safeOutputEntriesCount > 0) { | ||
| core.warning( | ||
| `Claude produced no structured log entries, but agent completed with ${safeOutputEntriesCount} safe output ${safeOutputEntriesCount === 1 ? "entry" : "entries"} — treating as non-fatal post-completion infrastructure failure` |
There was a problem hiding this comment.
[/diagnose] The warning is clear about what happened but omits the log file path, making it harder to correlate with storage/teardown logs in a post-incident investigation.
💡 Suggested enhancement
const logFilePath = process.env.GH_AW_AGENT_OUTPUT ?? "(unknown)";
core.warning(
`Claude produced no structured log entries (log: ${logFilePath}), but agent completed with ${safeOutputEntriesCount} safe output ${...} — treating as non-fatal post-completion infrastructure failure`
);The log path is already available in scope and has low overhead to include; it would meaningfully reduce triage time on future occurrences.
@copilot please address this.
|
@copilot please run the
|
…restore process.env - Third test: add setFailed not-called + warning called assertions - Tests 1 and 3: add delete process.env.GH_AW_SAFE_OUTPUTS in finally blocks - Two new tests cover logEntries: null path (no safe outputs -> setFailed, safe outputs -> warning) 28 tests pass (was 26) Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
|
Great work, The escape hatch logic ( This looks ready for review. ✅
|
The Claude
log_parser_bootstrapguardrail was callingsetFailedon emptylogEntrieswith no completion escape hatch — marking runs as failed even when the agent completed and emitted safe outputs. This was caused by a sandbox/firewall teardown race leavingagent-stdio.logunreadable for the parse step, intermittently hitting Avenger and Daily Rendering Scripts Verifier.Changes
log_parser_bootstrap.cjsGuardrail escape hatch: When
safeOutputEntriesCount > 0, downgradesetFailed→core.warningfor the Claude empty-logEntriesguard. Mirrors the existing MCP failure handling pattern.setFailedis preserved for the zero-safe-outputs case (true startup failure).Hygiene: Suppress the misleading
Claude log parsed successfullymessage on the empty-logEntriesfallback path — it was printing immediately beforesetFailed, creating a contradictory log sequence.log_parser_bootstrap.test.cjsThree new cases covering: empty
logEntries+ safe outputs → warning only; emptylogEntries+ no safe outputs →setFailed; emptylogEntries→ no "parsed successfully" incore.info.