Skip to content

fix(log_parser_bootstrap): guard Claude guardrail with safeOutputEntriesCount escape hatch#41886

Merged
pelikhan merged 3 commits into
mainfrom
copilot/aw-failures-fix-log-parser-bootstrap
Jun 27, 2026
Merged

fix(log_parser_bootstrap): guard Claude guardrail with safeOutputEntriesCount escape hatch#41886
pelikhan merged 3 commits into
mainfrom
copilot/aw-failures-fix-log-parser-bootstrap

Conversation

Copilot AI commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

The Claude log_parser_bootstrap guardrail was calling setFailed on empty logEntries with no completion escape hatch — marking runs as failed even when the agent completed and emitted safe outputs. This was caused by a sandbox/firewall teardown race leaving agent-stdio.log unreadable for the parse step, intermittently hitting Avenger and Daily Rendering Scripts Verifier.

Changes

log_parser_bootstrap.cjs

  • Guardrail escape hatch: When safeOutputEntriesCount > 0, downgrade setFailedcore.warning for the Claude empty-logEntries guard. Mirrors the existing MCP failure handling pattern. setFailed is preserved for the zero-safe-outputs case (true startup failure).

    if (parserName === "Claude" && (!logEntries || logEntries.length === 0)) {
      if (safeOutputEntriesCount > 0) {
        core.warning(`Claude produced no structured log entries, but agent completed with ${safeOutputEntriesCount} safe output … — treating as non-fatal post-completion infrastructure failure`);
      } else {
        core.setFailed(`${ERR_CONFIG}: Claude execution failed: no structured log entries …`);
      }
    }
  • Hygiene: Suppress the misleading Claude log parsed successfully message on the empty-logEntries fallback path — it was printing immediately before setFailed, creating a contradictory log sequence.

log_parser_bootstrap.test.cjs

Three new cases covering: empty logEntries + safe outputs → warning only; empty logEntries + no safe outputs → setFailed; empty logEntries → no "parsed successfully" in core.info.

…iesCount escape hatch

- [x] Fix Claude guardrail: gate `setFailed` on `safeOutputEntriesCount === 0`; downgrade to `core.warning` when safe outputs exist
- [x] Suppress misleading 'parsed successfully' message for Claude on empty-logEntries fallback path
- [x] Add 3 test cases: Claude + empty logEntries + safe outputs → warning (no setFailed); Claude + empty logEntries + no safe outputs → setFailed; Claude + empty logEntries → no 'parsed successfully' message

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Copilot AI changed the title [WIP] Fix log_parser_bootstrap to handle empty logEntries fix(log_parser_bootstrap): guard Claude guardrail with safeOutputEntriesCount escape hatch Jun 27, 2026
Copilot AI requested a review from pelikhan June 27, 2026 14:00
@pelikhan pelikhan marked this pull request as ready for review June 27, 2026 14:38
Copilot AI review requested due to automatic review settings June 27, 2026 14:38
@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Design Decision Gate 🏗️ completed the design decision gate check.

No ADR enforcement needed: PR #41886 does not have the 'implementation' label and has 0 new lines of code in business logic directories.

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

Test Quality Sentinel completed test quality analysis.

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅

@github-actions

github-actions Bot commented Jun 27, 2026

Copy link
Copy Markdown
Contributor

PR Code Quality Reviewer completed the code quality review.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adjusts the log_parser_bootstrap Claude guardrail so that an empty logEntries result does not fail the run when there is evidence the agent completed successfully (i.e., safe outputs were emitted). This prevents intermittent false-negative failures caused by post-completion infrastructure teardown races that can make the structured log unreadable.

Changes:

  • Downgrade the Claude “empty structured log entries” guardrail from core.setFailed to core.warning when safeOutputEntriesCount > 0, while preserving failure when there are zero safe outputs.
  • Suppress the misleading ${parserName} log parsed successfully info message on the Claude empty-logEntries fallback path.
  • Add/extend unit tests covering the new warning vs failure behavior and the suppressed success-message behavior.
Show a summary per file
File Description
actions/setup/js/log_parser_bootstrap.cjs Adds a safe-outputs-based escape hatch for the Claude empty-logEntries guardrail and avoids contradictory “parsed successfully” messaging on the fallback path.
actions/setup/js/log_parser_bootstrap.test.cjs Adds test cases asserting the new non-fatal warning behavior (when safe outputs exist), preserved failure behavior (when safe outputs do not exist), and the absence of the “parsed successfully” log for Claude on the fallback path.

Review details

Tip

Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

  • Files reviewed: 2/2 changed files
  • Comments generated: 0
  • Review effort level: Low

@github-actions github-actions Bot mentioned this pull request Jun 27, 2026
@github-actions

Copy link
Copy Markdown
Contributor

🧪 Test Quality Sentinel Report

Test Quality Score: 90/100 — Excellent

Analyzed 3 test(s): 3 design, 0 implementation, 0 guideline violation(s).

📊 Metrics & Test Classification (3 tests analyzed)
Metric Value
New/modified tests analyzed 3
✅ Design tests (behavioral contracts) 3 (100%)
⚠️ Implementation tests (low value) 0 (0%)
Tests with error/edge cases 3 (100%)
Duplicate test clusters 0
Test inflation detected YES — 3.24:1 (55 test lines / 17 prod lines)
🚨 Coding-guideline violations 0
Test File Classification Issues Detected
should warn (non-fatal) when Claude has empty logEntries but safe outputs exist log_parser_bootstrap.test.cjs:~206 ✅ Design
should fail when Claude has empty logEntries and no safe outputs (startup failure) log_parser_bootstrap.test.cjs:~225 ✅ Design
should not print parsed successfully for Claude when logEntries is empty log_parser_bootstrap.test.cjs:~240 ✅ Design

Go: 0 (*_test.go); JavaScript: 3 (*.test.cjs, *.test.js). No other languages detected.

⚠️ Flagged Tests — Inflation Note (1 issue)

Test file inflation (non-blocking): log_parser_bootstrap.test.cjs added 55 lines vs 17 production lines (3.24:1 ratio, threshold 2:1). This is typical for a fix introducing branching logic — the 3 tests each cover a distinct observable outcome of the new conditional (warn vs. fail vs. suppress-info-message), so the extra test lines reflect thorough behavioral coverage rather than padding. No action required.

Mock usage note: mockCore mocks global.core (GitHub Actions external I/O — legitimate). mockParseLog = vi.fn() is a dependency-injection parameter mock — not an internal business-logic bypass. All assertions are on observable outputs (setFailed, warning, info calls).

Verdict

Check passed. 0% implementation tests (threshold: 30%). All 3 new tests verify behavioral contracts of the safeOutputEntriesCount escape hatch: the non-fatal warning path, the fatal-failure path, and the suppressed success message — full branch coverage of the new conditional logic.

🧪 Test quality analysis by Test Quality Sentinel · 34 AIC · ⌖ 12.3 AIC · ⊞ 8.4K ·

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Test Quality Sentinel: 90/100. Test quality is acceptable — 0% of new tests are implementation tests (threshold: 30%).

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking observations only — logic is sound

The escape-hatch design (safeOutputEntriesCount > 0warning instead of setFailed) correctly mirrors the existing MCP-failure pattern and addresses the described teardown-race root cause. No blocking issues found.

Findings summary

Test coverage gaps (both non-blocking):

  • Third test ("should not print...") only asserts the absence of the success message; it does not assert setFailed was not called or warning was called. Tests 1 and 2 provide that coverage, but the third test stands on weak ground as a standalone regression guard.
  • All three new tests use logEntries: [] exclusively; the !logEntries (null) branch of the guardrail is never exercised by the new suite. Both paths emit the same warning, so this is a coverage gap, not a correctness gap.

Design note (acknowledged):
The escape hatch assumes GH_AW_SAFE_OUTPUTS always reflects the current run. In the GitHub Actions environment this holds, and it is the same assumption made by the pre-existing MCP-failure escape hatch. Worth having a comment documenting that assumption explicitly so the intent is clear to future readers.

🔎 Code quality review by PR Code Quality Reviewer · 76.4 AIC · ⌖ 7.08 AIC · ⊞ 5.2K

const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] });
runLogParser({ parseLog: mockParseLog, parserName: "Claude" });
const infoCalls = mockCore.info.mock.calls.map(c => c[0]);
expect(infoCalls.some(msg => msg.includes("Claude log parsed successfully"))).toBe(false);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weak assertion — the escape-hatch outcome is unverified in this test, so a future regression in the safeOutputEntriesCount > 0 branch would not be caught here.

💡 Suggested improvement

The test currently only checks that "Claude log parsed successfully" is absent from core.info, which would remain true even if core.setFailed fired. Adding explicit positive/negative assertions ties this test to the behavior it is supposed to guard:

// already covered by test 1, but worth repeating here for completeness
expect(mockCore.setFailed).not.toHaveBeenCalled();
expect(mockCore.warning).toHaveBeenCalledWith(
  expect.stringContaining("treating as non-fatal post-completion infrastructure failure")
);

Test 1 does assert both of these for the same code path, so this is non-blocking — but the third test as written only validates the hygiene side-effect, not the escape-hatch logic it is co-located with.

process.env.GH_AW_SAFE_OUTPUTS = safeOutputsFile;
// Parser returns markdown but no structured logEntries — simulates sandbox teardown
// race leaving agent-stdio.log unreadable after the agent completed successfully.
const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] });

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing coverage for logEntries: null — all three new test cases use logEntries: [], leaving the !logEntries branch of the guardrail untested.

💡 Detail

The condition at log_parser_bootstrap.cjs line 341 is:

if (parserName === "Claude" && (!logEntries || logEntries.length === 0))

logEntries is initialized as null and stays null if the parser returns undefined/nothing for that field. The new tests only exercise the logEntries.length === 0 path. A test like:

const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false /* no logEntries key */ });

...would exercise !logEntries (null) with safe outputs present. Both code paths lead to the same warning, so this is not a correctness gap — but the null path is darker and would be worth a targeted test.

@github-actions github-actions Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skills-Based Review 🧠

Applied /tdd and /diagnose — approving with minor suggestions on test completeness and observability.

📋 Key Themes & Highlights

Key Themes

  • Test assertion gaps: two of the three new tests are underspecified (missing complementary assertions that would catch regressions in the other branch of the new conditional)
  • Test pollution: process.env is mutated but not restored in finally blocks — a pre-existing pattern, but worth correcting now that three more tests follow it
  • Observability: the new warning message is clear but missing the log file path, which would help correlate future occurrences with storage/teardown logs

Positive Highlights

  • ✅ Correct root-cause analysis: the escape hatch targets the right signal (safeOutputEntriesCount > 0 means the agent genuinely completed)
  • ✅ Mirrors the existing MCP failure handling pattern — consistent with the codebase's established convention
  • setFailed is preserved for the zero-safe-outputs case, maintaining the safety guarantee
  • ✅ Three new test cases covering all three new paths — good coverage of the main scenarios
  • countSafeOutputEntries already handles empty/malformed JSONL correctly (returns 0), so the escape hatch is robust by construction

🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 71 AIC · ⌖ 7.42 AIC · ⊞ 6.6K

delete process.env.GH_AW_SAFE_OUTPUTS;
const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] });
runLogParser({ parseLog: mockParseLog, parserName: "Claude" });
expect(mockCore.setFailed).toHaveBeenCalledWith(`${ERR_CONFIG}: Claude execution failed: no structured log entries were produced. This usually indicates a startup or configuration error before tool execution.`);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The startup-failure test only asserts setFailed was called, but does not verify core.warning was NOT called — making it impossible to detect a regression that accidentally fires both.

💡 Suggested addition

Add this assertion alongside the existing one:

expect(mockCore.setFailed).toHaveBeenCalledWith(`${ERR_CONFIG}: Claude execution failed: ...`);
expect(mockCore.warning).not.toHaveBeenCalled(); // confirm non-fatal path was not taken

Without it, a bug that emits both warning and setFailed would pass this test.

@copilot please address this.

const mockParseLog = vi.fn().mockReturnValue({ markdown: "## Result\n", mcpFailures: [], maxTurnsHit: false, logEntries: [] });
runLogParser({ parseLog: mockParseLog, parserName: "Claude" });
const infoCalls = mockCore.info.mock.calls.map(c => c[0]);
expect(infoCalls.some(msg => msg.includes("Claude log parsed successfully"))).toBe(false);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] Test 3 only checks the absence of the misleading info message — it is silent about whether core.warning was actually emitted. This leaves the hygiene fix incompletely specified.

💡 Suggested addition

Add the companion positive assertion to complete the specification:

const infoCalls = mockCore.info.mock.calls.map(c => c[0]);
expect(infoCalls.some(msg => msg.includes("Claude log parsed successfully"))).toBe(false);
// Also verify warning WAS emitted on this path
expect(mockCore.warning).toHaveBeenCalledWith(expect.stringContaining("Claude produced no structured log entries"));

Without this, the test passes even if the warning call were accidentally removed.

@copilot please address this.

expect(mockCore.setFailed).not.toHaveBeenCalled();
expect(mockCore.warning).toHaveBeenCalledWith("Claude produced no structured log entries, but agent completed with 1 safe output entry — treating as non-fatal post-completion infrastructure failure");
} finally {
fs.unlinkSync(logFile);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/tdd] The finally block removes the temp files but leaves process.env.GH_AW_AGENT_OUTPUT and process.env.GH_AW_SAFE_OUTPUTS set — this is a pre-existing pattern in the suite, but extending it to three more tests increases the likelihood of ordering-dependent failures.

💡 Suggested finally block
} finally {
  fs.unlinkSync(logFile);
  fs.unlinkSync(safeOutputsFile);
  fs.rmdirSync(tmpDir);
  delete process.env.GH_AW_AGENT_OUTPUT;   // add
  delete process.env.GH_AW_SAFE_OUTPUTS;   // add
}

Applying the same fix to all three new tests (and ideally the existing ones) would make the suite fully order-independent.

@copilot please address this.

core.setFailed(`${ERR_CONFIG}: Claude execution failed: no structured log entries were produced. This usually indicates a startup or configuration error before tool execution.`);
if (safeOutputEntriesCount > 0) {
core.warning(
`Claude produced no structured log entries, but agent completed with ${safeOutputEntriesCount} safe output ${safeOutputEntriesCount === 1 ? "entry" : "entries"} — treating as non-fatal post-completion infrastructure failure`

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[/diagnose] The warning is clear about what happened but omits the log file path, making it harder to correlate with storage/teardown logs in a post-incident investigation.

💡 Suggested enhancement
const logFilePath = process.env.GH_AW_AGENT_OUTPUT ?? "(unknown)";
core.warning(
  `Claude produced no structured log entries (log: ${logFilePath}), but agent completed with ${safeOutputEntriesCount} safe output ${...}  treating as non-fatal post-completion infrastructure failure`
);

The log path is already available in scope and has low overhead to include; it would meaningfully reduce triage time on future occurrences.

@copilot please address this.

@gh-aw-bot

Copy link
Copy Markdown
Collaborator

@copilot please run the pr-finisher skill, tighten up the missing assertions in the third test, cover the logEntries: null path, and restore process.env in cleanup if that is the intended pattern.

Generated by 👨‍🍳 PR Sous Chef · 69.1 AIC · ⌖ 0.98 AIC · ⊞ 17.1K ·

…restore process.env

- Third test: add setFailed not-called + warning called assertions
- Tests 1 and 3: add delete process.env.GH_AW_SAFE_OUTPUTS in finally blocks
- Two new tests cover logEntries: null path (no safe outputs -> setFailed, safe outputs -> warning)

28 tests pass (was 26)

Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Copilot AI requested a review from gh-aw-bot June 27, 2026 15:50
@pelikhan pelikhan merged commit c7c642f into main Jun 27, 2026
14 checks passed
@pelikhan pelikhan deleted the copilot/aw-failures-fix-log-parser-bootstrap branch June 27, 2026 17:19
@github-actions

Copy link
Copy Markdown
Contributor

Great work, @app/copilot-swe-agent 🎉 — this is a clean, targeted fix for the Claude guardrail false-failure on sandbox/firewall teardown races.

The escape hatch logic (safeOutputEntriesCount > 0 → downgrade setFailedcore.warning) mirrors the existing MCP failure handling pattern, the suppression of the contradictory "parsed successfully" log message is a nice hygiene win, and the test suite covers all the meaningful branches: empty logEntries with and without safe outputs, null logEntries for both cases, and the no-spurious-success-message assertion. The PR is focused on exactly one bug and nothing else.

This looks ready for review. ✅

Generated by ✅ Contribution Check · 360.5 AIC · ⌖ 10.1 AIC · ⊞ 6K ·

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants