<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: NYCHMPAI</title><link>https://news.ycombinator.com/user?id=NYCHMPAI</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 02 Jul 2026 21:36:14 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=NYCHMPAI" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by NYCHMPAI in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"]]></title><description><![CDATA[
<p>This is a great use case for embeddings. Code deduplication across distant modules is notoriously hard for traditional AST-based tools.</p><p>How do you handle chunking and parsing for different languages to make sure the embeddings capture semantic meaning effectively? For instance, do you chunk by functions/classes, or use a fixed token window? If a function is too long or too short, it can drastically skew the embedding similarity.</p>
]]></description><pubDate>Thu, 02 Jul 2026 15:26:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=48763060</link><dc:creator>NYCHMPAI</dc:creator><comments>https://news.ycombinator.com/item?id=48763060</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48763060</guid></item></channel></rss>