Code review · testing · debug pass #3

Closed
opened 2026-05-19 04:42:46 +00:00 by foravo_admin · 0 comments

Imported from GitHub issue M00C1FER/mesh-review#10.

Source: https://github.com/M00C1FER/mesh-review/issues/10
Original author: @M00C1FER
Original state: closed


Code review · testing · debug pass

mesh-review is a vendor-neutral, multi-LLM toolkit for PR review and summary. Its two
subcommands (review, summary) share one CLI registry. In the demo, the Sigma
falsification gate dropped false positives.

Scope of this pass

  1. Correctness review

    • The Sigma falsification gate (sigma_gate(falsifier=...)): trace the path from
      consensus cluster → falsifier call → decision-to-drop. Edge cases:
      • All falsifiers time out: does the gate default to keeping or dropping the finding?
      • Falsifier returns malformed JSON: does the gate fail gracefully?
      • confidence > 1.0 or < 0.0: is the value bounds-checked?
    • Consensus clustering on (file, severity, line ±2): what happens when severity
      differs between CLIs for the same line? Verify cluster merge semantics.
    • Summary aggregation: when one CLI fails, do the remaining CLIs still produce a
      usable summary, or does the whole run abort?
  2. Test coverage

    • Run pytest --cov and identify uncovered branches.
    • Add tests for: malformed CLI output (truncated JSON, invalid UTF-8), CLI
      binary-not-found errors, and the empty-registry (no CLIs listed) edge case.
    • Add a real-world golden test using examples/broken-repo/, if it exists: assert
      that the gate eliminates a known false positive without dropping a known
      true positive.
  3. Debug edge cases

    • The --list-clis flag: does it work when one CLI in the registry is missing?
    • GitHub Action invocation path: verify the action file in .github/workflows/ is
      well-formed and runs end-to-end on a sample PR.
    • Concurrent invocation: two mesh-review instances against the same PR — do they
      race or deduplicate?
  4. Documentation polish (low priority)

    • The README's status note says "v0.1 default falsifier is no-op stub". Confirm
      this is still accurate; if a real default has landed, update the note.

Deliverable

Open one PR per logical change, using Conventional Commits format. Bundle no-op-grade
test additions into a single PR; each correctness fix gets its own PR.

Style + scope constraints

  • Python 3.10/3.11/3.12 (matrix in CI). No 3.9 fallbacks.
  • Don't change the public sigma_gate / build_consensus signatures without flagging.
  • Keep the registry vendor-neutral: don't hard-code Anthropic/OpenAI/Ollama names in
    core code paths. Examples are fine in docs.
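
The vendor-neutrality constraint and the `--list-clis` question can be illustrated together. The following is a rough sketch under stated assumptions, not the project's registry: `CliEntry`, `load_registry`, `list_clis`, and the config shape (a mapping from label to executable name) are hypothetical. The point is that the core code only sees labels and executables supplied by configuration, and that listing degrades per entry instead of failing when one binary is absent from PATH.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class CliEntry:
    name: str        # label shown to the user, supplied by config
    executable: str  # binary looked up on PATH, supplied by config


def load_registry(config: dict[str, str]) -> list[CliEntry]:
    """Build the registry from config; no vendor names live in this code."""
    return [CliEntry(name, exe) for name, exe in sorted(config.items())]


def list_clis(
    registry: list[CliEntry],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Report every entry, marking binaries that cannot be found on PATH."""
    if not registry:
        return ["no CLIs registered"]
    lines = []
    for entry in registry:
        status = "ok" if which(entry.executable) else "missing"
        lines.append(f"{entry.name}\t{status}")
    return lines


if __name__ == "__main__":
    # Placeholder names only; real entries would come from user config.
    registry = load_registry({"reviewer-a": "reviewer-a-cli",
                              "reviewer-b": "reviewer-b-cli"})
    print("\n".join(list_clis(registry)))
```

Passing `which` as a parameter keeps the lookup injectable, so the binary-not-found and empty-registry cases can be tested without touching the real PATH.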

Project state: solo author, pre-1.0, MIT.

Reference
foravo/mesh-review-comment-proof-20260519044241#3