<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: nttylock</title><link>https://news.ycombinator.com/user?id=nttylock</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 02 Jul 2026 22:25:41 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=nttylock" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by nttylock in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"]]></title><description><![CDATA[
<p>The false positive rate you're describing matches what we see running similarity detection on generated text rather than code: cosine similarity alone flags a lot of same-topic pairs that aren't actually duplicates. What helped was combining the embedding score with a structural signal (AST edit distance for code, overlapping headings and citations for text) so that no single metric makes the call. It's also worth surfacing the raw similarity score in the CLI output instead of just a binary duplicate flag, since people will want to tune the threshold per codebase.</p>
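<p>A minimal sketch of the combined check in Python, under stated assumptions: the embeddings are taken as already computed by whatever model the tool uses (the vectors in the demo are toy values), both fragments must parse as Python, and difflib's SequenceMatcher over AST node-type sequences is swapped in as a cheap stand-in for true AST tree edit distance. The function names and the 0.85 / 0.8 thresholds are hypothetical, not taken from the tool. A pair is flagged only when both signals clear their thresholds, and the raw scores are always reported.</p>

```python
from __future__ import annotations

import ast
import difflib
import math
from dataclasses import dataclass


@dataclass
class PairScore:
    cosine: float      # raw embedding similarity, always reported
    structural: float  # AST node-type sequence similarity in [0, 1]
    duplicate: bool    # True only when BOTH signals clear their thresholds


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def node_types(source: str) -> list[str]:
    """Breadth-first sequence of AST node type names (via ast.walk).
    Identifiers and literal values are ignored, so renamed copies still match."""
    return [type(node).__name__ for node in ast.walk(ast.parse(source))]


def structural_similarity(src_a: str, src_b: str) -> float:
    """Hypothetical stand-in for AST edit distance: difflib ratio over
    node-type sequences (1.0 means identical structure)."""
    return difflib.SequenceMatcher(None, node_types(src_a), node_types(src_b)).ratio()


def score_pair(src_a: str, src_b: str, emb_a: list[float], emb_b: list[float],
               cos_threshold: float = 0.85, struct_threshold: float = 0.8) -> PairScore:
    """Combine both signals so that neither one alone decides 'duplicate'."""
    cos = cosine_similarity(emb_a, emb_b)
    struct = structural_similarity(src_a, src_b)
    return PairScore(cos, struct, cos >= cos_threshold and struct >= struct_threshold)


def report(s: PairScore) -> str:
    """One CLI output line with raw scores, so users can tune thresholds per codebase."""
    return (f"cos={s.cosine:.3f} struct={s.structural:.3f} "
            f"duplicate={'yes' if s.duplicate else 'no'}")


if __name__ == "__main__":
    loop_sum = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
    renamed = "def sum_all(items):\n    acc = 0\n    for i in items:\n        acc += i\n    return acc\n"
    builtin = "def total(xs):\n    return sum(xs)\n"
    # Toy embeddings: all three are "same topic", so cosine is high for both pairs.
    print(report(score_pair(loop_sum, renamed, [1.0, 0.0, 0.2], [0.9, 0.1, 0.2])))
    print(report(score_pair(loop_sum, builtin, [1.0, 0.0, 0.2], [0.95, 0.05, 0.25])))
```

<p>In the demo, the renamed loop has an identical node-type sequence and is flagged, while the one-line builtin version scores high on cosine but low on structure, so it is not flagged. That second case is exactly the same-topic false positive that cosine similarity alone would produce.</p>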
]]></description><pubDate>Thu, 02 Jul 2026 19:36:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=48766317</link><dc:creator>nttylock</dc:creator><comments>https://news.ycombinator.com/item?id=48766317</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48766317</guid></item></channel></rss>