foravo/mesh-review-comment-proof-20260519044241


Python 91.7%
Shell 8.3%


Syth ec36975c66	docs: mark as portfolio demonstration (not active development)	2026-05-09 07:48:07 -07:00
Per portfolio strategy review 2026-05-09: empirical adoption signal across all M00C1FER repos is zero (forks=0, watchers=0, external issues=0). The active development focus shifts to mcp-citation-research (MCP Registry submission + deployment + blog). This repo remains accessible for historical reference but is no longer maintained. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI: some checks failed. All push jobs were cancelled: lint, test (ubuntu-22.04, ubuntu-24.04, ubuntu-latest, macos-latest × Python 3.10, 3.11, 3.12), test-debian, test-alpine.
.github	docs(.github): add copilot-instructions.md — coding rules + scope guards	2026-05-08 21:11:43 -07:00
examples/broken-repo	feat: confidence clamping, edge-case tests, golden test, broken-repo example (#11 )	2026-05-02 06:16:28 -07:00
scripts	Audit cycle: CI matrix expansion, Hypothesis property tests, WSL detection, Termux support (#15 )	2026-05-02 16:51:34 -07:00
src/mesh_review	Audit cycle: CI matrix expansion, Hypothesis property tests, WSL detection, Termux support (#15 )	2026-05-02 16:51:34 -07:00
tests	Audit cycle: CI matrix expansion, Hypothesis property tests, WSL detection, Termux support (#15 )	2026-05-02 16:51:34 -07:00
.gitattributes	feat(.gitattributes): force LF on shell+code files for cross-platform installers	2026-05-09 06:53:54 -07:00
.gitignore	chore(.gitignore): standardize security patterns	2026-05-09 06:54:09 -07:00
.pre-commit-hooks.yaml	Senior-dev review pass: reference-project analysis + 5 targeted improvements (#13 )	2026-05-02 13:05:57 -07:00
install.sh	fix: 7 findings from automated code-review audit (#1 )	2026-05-01 17:02:05 -07:00
LICENSE	feat: initial release v0.1.0 — merged from triple-review + pr-summary-mesh	2026-04-29 22:06:58 -07:00
pyproject.toml	Audit cycle: CI matrix expansion, Hypothesis property tests, WSL detection, Termux support (#15 )	2026-05-02 16:51:34 -07:00
README.md	docs: mark as portfolio demonstration (not active development)	2026-05-09 07:48:07 -07:00
REFERENCES.md	Senior-dev review pass: reference-project analysis + 5 targeted improvements (#13 )	2026-05-02 13:05:57 -07:00

README.md

⚠ Status: portfolio demonstration

This repo is a learning and demonstration project, not an actively maintained product. The code works for what it shows, but it isn't intended for production adoption and won't receive ongoing development. The active development focus is M00C1FER/mcp-citation-research (MCP research server with a hard confidence-gate refusal contract).

mesh-review

Modular multi-LLM PR review + summary toolkit. One CLI, one config, two subcommands: review (consensus + adversarial Sigma falsification gate) and summary (vendor-neutral PR-description summarizer). Register any command-line LLM once, get both capabilities. CI-ready GitHub Action.

Why this exists

Two niches that need to compose, not duplicate:

PR review — surface real issues, not noise. Multi-LLM consensus catches more bugs than any single model, but raw consensus amplifies shared training-data biases (every model says "MD5 is bad", which is true for password hashing and wrong for non-crypto checksums). The Sigma falsification gate runs each finding past every CLI as an adversary: if any reviewer can credibly defend the code above a confidence threshold, the finding gets dropped before it ships as a PR comment. Drops false positives without silencing real ones.
PR summary — make every PR description say what changed and why, in a structured shape reviewers actually read. Multi-LLM aggregation produces richer summaries than single-vendor tools (GitHub Copilot, CodeRabbit) without locking you to one provider.

Both share the same vendor-neutral mesh-review.yaml registry: register Claude, Gemini, Copilot, Ollama, Mistral, your own SDK shim, etc. once and get both. The vendor names you see in this README's examples are illustrations, not requirements; the orchestrator works with one or more CLIs.

Quick start

pip install git+https://github.com/M00C1FER/mesh-review.git

# Issue gate — catch real bugs, drop false positives
mesh-review review --falsify path/to/auth.py

# Narrative layer — generate a structured PR summary
mesh-review summary --pr owner/repo#42 --mode merge

# Inspect resolved registry before any run
mesh-review review --list-clis dummy.py
mesh-review summary --list-clis

Two subcommands, one config

mesh-review.yaml:

clis:                         # alias: summarizers (for the summary subcommand)
  - { name: claude,  cmd: [claude, -p, --output-format=text] }
  - { name: gemini,  cmd: [gemini, -p] }
  - { name: copilot, cmd: [copilot, -p] }
  - { name: ollama,  cmd: [ollama, run, qwen2.5-coder], timeout_s: 600 }
  - { name: my-rev,  cmd: [./scripts/review.sh] }

mesh-review review --config mesh-review.yaml --falsify file.py
mesh-review summary --config mesh-review.yaml --pr repo#42 --mode merge

The same registry powers both subcommands, so one configuration covers both PR surfaces: review comments and the PR description.
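To make the registry idea concrete, here is a minimal sketch of fanning one prompt out to every registered command. It is not the project's dispatcher: the `CliEntry` class, the `run_cli` and `fan_out` helpers, the 120-second default timeout, and the choice to pass the prompt on stdin are all assumptions made for illustration. A Python one-liner stands in for a real LLM CLI so the sketch runs anywhere.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class CliEntry:
    """One registry entry; field names mirror the YAML keys shown above."""
    name: str
    cmd: list[str]
    timeout_s: float = 120.0  # assumed default; the real default is not documented here


def run_cli(entry: CliEntry, prompt: str) -> str:
    """Send the prompt on stdin (an assumption) and return the CLI's stdout."""
    result = subprocess.run(
        entry.cmd,
        input=prompt,
        capture_output=True,
        text=True,
        timeout=entry.timeout_s,
        check=True,
    )
    return result.stdout


def fan_out(entries: list[CliEntry], prompt: str) -> dict[str, str]:
    """Run every registered CLI on the same prompt, skipping ones that fail."""
    replies: dict[str, str] = {}
    for entry in entries:
        try:
            replies[entry.name] = run_cli(entry, prompt)
        except (subprocess.SubprocessError, OSError):
            continue  # a missing, failing, or slow CLI should not sink the whole run
    return replies


# Stand-in "LLM": echoes stdin in upper case.
echo = CliEntry("echo", [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"])
missing = CliEntry("missing", ["definitely-not-a-real-cli-binary"])
replies = fan_out([echo, missing], "review this diff")
```

Only the `echo` entry ends up in `replies`; the entry whose binary does not exist is skipped rather than aborting the run.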

What is the Sigma falsification gate?

After consensus clusters group findings by (file, severity, line ±2), the gate asks each registered CLI to argue against each finding (falsified: bool, confidence: 0.0-1.0, rationale). If any CLI returns falsified: true with confidence ≥ threshold (default 0.7), the finding is dropped from PR comments.
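The drop rule in the paragraph above fits in a few lines. This is a sketch of the rule as described, not the library's code: `Verdict` and `survives_gate` are hypothetical names, and the example verdicts are invented.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    """One CLI's adversarial reply for a single finding (hypothetical shape)."""
    cli: str
    falsified: bool
    confidence: float
    rationale: str = ""


def survives_gate(verdicts: list[Verdict], threshold: float = 0.7) -> bool:
    """A finding survives unless some CLI falsifies it at or above the threshold."""
    return not any(v.falsified and v.confidence >= threshold for v in verdicts)


md5_checksum = [
    Verdict("claude", falsified=False, confidence=0.9),
    Verdict("ollama", falsified=True, confidence=0.8, rationale="MD5 used as a non-crypto checksum"),
]
md5_password = [
    Verdict("claude", falsified=False, confidence=0.9),
    Verdict("ollama", falsified=True, confidence=0.4, rationale="weak defence"),
]
print(survives_gate(md5_checksum))  # False: one credible defence drops the finding
print(survives_gate(md5_password))  # True: no defence reaches 0.7
```

A single confident defence is enough to drop a finding, so the gate favours fewer false positives over recall; raising the threshold shifts that balance back.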

Why this matters: multi-LLM consensus systems have a quiet failure mode — shared training-data bias. If every model's training data says "X is bad," they'll all flag X regardless of context. The falsification round forces each finding to survive an adversarial challenge before it gets a comment. Empirically on the demo (examples/broken-repo/auth.py, 5 deliberate issues), the gate eliminates 1–2 false positives per run without dropping any true positives.

Status of v0.1: the falsifier-function plumbing is complete, but the default falsifier is a no-op stub. Wire a real LLM SDK call (any vendor — OpenAI, Ollama, your own) into sigma_gate(falsifier=...) for production use. Programmatic API:

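from mesh_review import sigma_gate, build_consensus

def my_falsifier(cli_name: str, prompt: str) -> dict:
    # Call your SDK of choice and return a dict with the keys
    # falsified (bool), confidence (0.0-1.0), and rationale (str).
    ...

# findings: the output of run_review (see "Programmatic API")
consensus = build_consensus(findings)
gate = sigma_gate(consensus, falsifier=my_falsifier, threshold=0.7)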
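A falsifier body might look like the sketch below. It assumes the model is prompted to reply with a JSON object; the `parse_verdict` helper, the canned reply, and the "unparseable means not falsified" policy are illustrative choices, not documented behaviour of mesh-review. Only the `(cli_name, prompt) -> dict` signature and the three keys come from the text above.

```python
import json


def parse_verdict(raw_reply: str) -> dict:
    """Turn a model's JSON reply into {falsified, confidence, rationale}.

    Unparseable replies count as "not falsified", so a flaky model can
    never drop a finding by accident.
    """
    try:
        data = json.loads(raw_reply)
        confidence = float(data.get("confidence", 0.0))
    except (ValueError, TypeError, AttributeError):
        return {"falsified": False, "confidence": 0.0, "rationale": "unparseable reply"}
    # Clamp to the documented 0.0-1.0 range.
    confidence = min(1.0, max(0.0, confidence))
    return {
        "falsified": data.get("falsified") is True,
        "confidence": confidence,
        "rationale": str(data.get("rationale", "")),
    }


def my_falsifier(cli_name: str, prompt: str) -> dict:
    # Hypothetical: replace this canned reply with a real SDK or HTTP call.
    raw_reply = '{"falsified": true, "confidence": 1.4, "rationale": "checksum, not a password hash"}'
    return parse_verdict(raw_reply)


verdict = my_falsifier("ollama", "Argue that this finding is a false positive: ...")
```

Here the out-of-range confidence of 1.4 is clamped to 1.0 before the gate compares it with the threshold.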

Two summary modes

--mode merge (default): structural section-by-section concat with per-CLI attribution. Reviewers can trace any line back to the model that wrote it.
--mode vote: pick the single SummaryDoc with the most filled-in sections (length-based proxy for thoroughness). Useful when you want one voice instead of merged perspectives.
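The two modes can be contrasted with a small sketch. It models each summary as a plain dict from section name to text; the real SummaryDoc shape, the section names, and the `merge_by_section` and `vote_most_complete` helpers are assumptions for illustration only.

```python
# Each summary is modeled as {section name: text}; the real SummaryDoc may differ.
Summary = dict[str, str]


def merge_by_section(docs: dict[str, Summary]) -> dict[str, list[str]]:
    """Concatenate section by section, tagging every entry with the CLI that wrote it."""
    merged: dict[str, list[str]] = {}
    for cli, doc in docs.items():
        for section, text in doc.items():
            if text.strip():
                merged.setdefault(section, []).append(f"[{cli}] {text.strip()}")
    return merged


def vote_most_complete(docs: dict[str, Summary]) -> str:
    """Return the CLI whose summary has the most non-empty sections."""
    return max(docs, key=lambda cli: sum(1 for text in docs[cli].values() if text.strip()))


docs = {
    "claude": {"What changed": "Adds token refresh.", "Why": "Sessions expired early.", "Risks": ""},
    "gemini": {"What changed": "Refresh logic in auth.py.", "Why": "Fixes logout bug.", "Risks": "Clock skew."},
}
merged = merge_by_section(docs)
winner = vote_most_complete(docs)
```

In this example `merged` keeps both models' text under each section with attribution, while `winner` is `"gemini"` because it filled in three sections to claude's two.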

Comparison

Tool	Multi-LLM consensus	Adversarial gate	PR summarizer	GH Action	Vendor-neutral
GitHub Copilot built-in PR review/summary	❌	❌	✅	✅	❌ (Copilot only)
CodeRabbit (SaaS)	partial	❌	✅	✅	❌ (their model only)
Mozilla `Star Chamber`	✅	❌	❌	partial	partial
`mataanin/multi-llm`	✅	❌	❌	❌	✅
`multi-llm-consensus` (PyPI)	✅	❌	❌	❌	✅
`mesh-review`	✅	✅	✅	✅	✅

The differentiator is the combination of the Sigma falsification gate, the vendor-neutral registry, and the unified review-and-summary surface. Most competitors do consensus or summary; few do both with the same config; none ships an adversarial round on top.

Programmatic API

from pathlib import Path

from mesh_review import (
    ReviewConfig, run_review, build_consensus, sigma_gate,
    SummaryConfig, run_summary, merge_structural,
)

# Review: collect findings, cluster them, then run the falsification gate
review_cfgs = [ReviewConfig(cli="claude", cmd=["claude", "-p"])]
findings = run_review("file.py", configs=review_cfgs)
consensus = build_consensus(findings)
gate = sigma_gate(consensus)

# Summary: summarize a diff with each CLI, then merge section by section
summary_cfgs = [SummaryConfig(cli="claude", cmd=["claude", "-p"])]
docs = run_summary(Path("changes.patch").read_text(), configs=summary_cfgs)
merged = merge_structural(docs)
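The consensus step is described earlier as grouping findings by (file, severity, line ±2). A minimal sketch of that kind of clustering follows. It is not the `build_consensus` implementation: the `Finding` fields and the `cluster_findings` helper are hypothetical, and anchoring the ±2 window on the first line of each cluster is one possible reading of the rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """A single reviewer finding (hypothetical shape)."""
    cli: str
    file: str
    severity: str
    line: int


def cluster_findings(findings: list[Finding], window: int = 2) -> list[list[Finding]]:
    """Group findings that share file and severity and sit within `window` lines."""
    clusters: list[list[Finding]] = []
    for f in sorted(findings, key=lambda f: (f.file, f.severity, f.line)):
        last = clusters[-1] if clusters else None
        if (
            last is not None
            and last[0].file == f.file
            and last[0].severity == f.severity
            and f.line - last[0].line <= window  # measured from the cluster's first line
        ):
            last.append(f)
        else:
            clusters.append([f])
    return clusters


findings = [
    Finding("claude", "auth.py", "high", 10),
    Finding("gemini", "auth.py", "high", 12),
    Finding("ollama", "auth.py", "high", 40),
    Finding("gemini", "auth.py", "low", 11),
]
clusters = cluster_findings(findings)
```

The two high-severity findings at lines 10 and 12 land in one cluster; the finding at line 40 and the low-severity finding each form their own.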

Cross-platform

OS	Status
Debian 12/13 / Ubuntu 22.04+	✅ CI-tested (matrix job)
Alpine Linux (musl libc)	✅ CI-tested (container job)
macOS 13+	✅ CI-tested (macos-latest runner)
WSL2 (Ubuntu base)	✅ works; see WSL note below
Fedora / Arch / openSUSE	✅ pure-Python; expected to work
Windows native	⚠️ subprocess dispatch needs `*.exe` versions of CLIs on PATH; WSL2 recommended
Termux (Android)	✅ see Termux section below

WSL note

When running mesh-review locally inside WSL2 (not from a GitHub Actions runner), the CLI emits an advisory warning if it detects a Microsoft kernel (uname -r contains microsoft). This is informational only: everything works, but Windows .exe wrappers on the WSL PATH may behave differently than native Linux binaries. Silence it with:

export MESH_REVIEW_NO_WSL_WARN=1
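The advisory check described above can be approximated as follows. This is a sketch, not the project's detection code: `should_warn_about_wsl` is a hypothetical helper, `platform.release()` stands in for `uname -r`, and treating only the value `1` as the opt-out is an assumption. The GitHub Actions exclusion mentioned above is left out.

```python
import os
import platform


def should_warn_about_wsl(kernel_release: str, env: dict) -> bool:
    """True when the kernel string looks like WSL and the opt-out is not set."""
    looks_like_wsl = "microsoft" in kernel_release.lower()
    silenced = env.get("MESH_REVIEW_NO_WSL_WARN") == "1"
    return looks_like_wsl and not silenced


if should_warn_about_wsl(platform.release(), dict(os.environ)):
    print("warning: running under WSL; Windows .exe wrappers on PATH may behave differently")
```

Passing the kernel string and environment in as arguments keeps the check easy to test without a WSL machine.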

Termux

Install on Android via Termux (F-Droid build recommended):

curl -fsSL https://raw.githubusercontent.com/M00C1FER/mesh-review/main/scripts/install-termux.sh | bash

Or manually:

pkg install python git
pip install git+https://github.com/M00C1FER/mesh-review.git

Local falsification via Ollama on the same WiFi network:

from mesh_review.review.falsify_sdk import make_openai_falsifier
from mesh_review import sigma_gate, build_consensus

falsifier = make_openai_falsifier(
    model="qwen2.5-coder:7b",
    base_url="http://<ollama-host>:11434/v1",
    api_key="ollama",   # Ollama ignores the key
)
# consensus: the output of build_consensus(findings), as in the Programmatic API example
gate = sigma_gate(consensus, falsifier=falsifier)

The OpenAI-compatible adapter (falsify_sdk) accepts any base_url, so any Ollama instance — local or on LAN — works as a cost-free, privacy-preserving falsifier on Android.

Testing

pip install -e .[dev]
pytest

117 tests across config / consensus / summary / falsification / golden / property-based:

13 YAML + inline config parsing tests (review)
50 consensus-building, Sigma-gate, and falsification tests
35 summary-aggregation, diff-provider, and config tests
4 golden-test assertions on examples/broken-repo/auth.py
9 OpenAI-SDK falsifier adapter tests
6 Hypothesis property tests covering sigma_gate threshold edge cases

Roadmap

v0.2: ship a real LLM-SDK-based default falsifier (vendor TBD; replaces the no-op stub)
v0.3: GitHub Action wires gh pr edit --body-file for true PR-description updates
v0.4: per-file walkthrough comments for files with substantial diffs
v0.5: cross-rated vote mode (CLIs grade each other's summaries)

License

MIT.
