Deduping reviewer findings without losing signal
How we sort by severity, dedupe by (file, line, lowercased title), and only post inline comments on lines that map to a unified-diff position. Plus: what we threw out and why.
Three reviewers in parallel produce three lists of findings. Naively merging them gets you duplicates, inconsistent severities, and inline comments on lines GitHub will refuse to attach. The aggregator does the boring middle work that turns three model outputs into one usable review.
We dedupe by (file, line, lowercased title). We tried fancier things — embedding similarity, suggestion overlap, semantic fingerprints — and they all lost more than they gained. Two reviewers flagging the same line with similar phrasing is the single strongest cross-reviewer signal we have. We do not want to merge it away by accident.
function dedupKey(f: Finding): string {
  return [f.file, f.line, f.title.toLowerCase()].join('::');
}

When two findings collide on the key, we keep the one with the higher severity, breaking ties on confidence. We attribute the kept finding to whichever reviewer sent it; the dropped reviewer's id is logged but not posted. Maintainers do not need to know that two AIs agreed; they need to read one comment.
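A minimal sketch of the collision rule, with an assumed Finding shape (field names here are illustrative, not the real type):

```typescript
type Severity = 'critical' | 'high' | 'medium' | 'low';

// Assumed shape; the real Finding type likely carries more fields.
interface Finding {
  file: string;
  line: number;
  title: string;
  severity: Severity;
  confidence: number; // 0..1
  reviewerId: string;
}

const RANK: Record<Severity, number> = { critical: 4, high: 3, medium: 2, low: 1 };

// On a key collision, keep the higher severity, then the higher confidence.
// The losing reviewer's id would be logged here, never posted.
function mergeByKey(findings: Finding[]): Finding[] {
  const kept = new Map<string, Finding>();
  for (const f of findings) {
    const key = [f.file, f.line, f.title.toLowerCase()].join('::');
    const prev = kept.get(key);
    const wins =
      !prev ||
      RANK[f.severity] > RANK[prev.severity] ||
      (RANK[f.severity] === RANK[prev.severity] && f.confidence > prev.confidence);
    if (wins) kept.set(key, f);
  }
  return [...kept.values()];
}
```

Note that the lowercased title makes "SQL Injection" and "sql injection" collide, which is exactly the cross-reviewer agreement we want to collapse into one comment.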
Sorting by the severity string is a footgun: alphabetically, "critical" comes first but "low" sorts ahead of "medium", so neither ascending nor descending string order is correct. We map to integers (critical=4, high=3, medium=2, low=1), sort descending, and use confidence as the tiebreak. The first 10 findings after sort and dedup become the inline comments; the rest survive in the database as run history.
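The sort-and-split step can be sketched like this; the function name and the exact shape are assumptions, but the ordering matches the integers above:

```typescript
// Illustrative rank map matching the integers in the text.
const SEVERITY_RANK: Record<string, number> = { critical: 4, high: 3, medium: 2, low: 1 };

interface Ranked {
  severity: string;
  confidence: number;
}

// Sort descending by severity rank, break ties on confidence, then split:
// the first `limit` findings become inline comments, the rest are kept
// only as run history.
function rankAndSplit<T extends Ranked>(findings: T[], limit = 10): { inline: T[]; history: T[] } {
  const sorted = [...findings].sort(
    (a, b) =>
      (SEVERITY_RANK[b.severity] ?? 0) - (SEVERITY_RANK[a.severity] ?? 0) ||
      b.confidence - a.confidence,
  );
  return { inline: sorted.slice(0, limit), history: sorted.slice(limit) };
}
```

Unknown severities rank as 0, so a reviewer that invents a new label sinks to the bottom instead of crashing the run.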
GitHub will only attach an inline comment to a line that exists as a position in the unified diff. That is not "any line in the file" — it is specifically the line numbers GitHub assigns to + and context lines inside @@ hunks. A reviewer can confidently flag line 412 of a 600-line file, and if line 412 was not in the diff, the comment will fail to post.
We solved this by parsing the patch in mapPatchLineToPosition and walking the @@ hunks ourselves. Every finding gets the position it would map to; the ones that do not map are logged in review_findings but stripped from the inline post. The summary at the top of the review still mentions the count so nothing is silently dropped.
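A sketch of what mapPatchLineToPosition might look like, assuming GitHub's position convention: position 1 is the first line below the first @@ header, and the count keeps increasing through every subsequent patch line, including later @@ headers. The patch string is assumed to be the `patch` field GitHub returns per file, which starts at the first @@:

```typescript
// Map a new-file line number to a unified-diff position, or null if the
// line is not part of the diff. Added (+) and context ( ) lines carry
// new-file line numbers; deleted (-) lines advance position only.
function mapPatchLineToPosition(patch: string, targetLine: number): number | null {
  let position = -1; // first @@ header lands on 0, first body line on 1
  let newLine = 0;   // current line number in the new file
  for (const raw of patch.split('\n')) {
    position++;
    const hunk = raw.match(/^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@/);
    if (hunk) {
      newLine = parseInt(hunk[1], 10) - 1; // next +/context line is the hunk start
      continue;
    }
    if (raw.startsWith('+') || raw.startsWith(' ')) {
      newLine++;
      if (newLine === targetLine) return position;
    }
  }
  return null; // line 412 of a 600-line file that was not touched ends up here
}
```

A null here is exactly the "confidently flagged but unpostable" case from the previous paragraph: the finding stays in review_findings and is counted in the summary, but no inline comment is attempted.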
The aggregator is now under 200 lines and has been shipping for months without changes. That is mostly because the dedup key is dumb and the severity rank is an integer. The smartest version of this pipeline lives in the reviewers, not in the post-processing.