<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: metawake</title><link>https://news.ycombinator.com/user?id=metawake</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 05 Jul 2026 12:52:53 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=metawake" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Show HN: Ragprobe – measure RAG domain difficulty before deploying,no embeddings]]></title><description><![CDATA[
<p>Article URL: <a href="https://pypi.org/project/ragprobe/0.1.0/">https://pypi.org/project/ragprobe/0.1.0/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47501960">https://news.ycombinator.com/item?id=47501960</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 24 Mar 2026 13:01:29 +0000</pubDate><link>https://pypi.org/project/ragprobe/0.1.0/</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=47501960</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47501960</guid></item><item><title><![CDATA[PageIndex (19k stars) scored 44% on legal docs. Same as vector RAG]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/@TheWake/three-rag-architectures-one-legal-document-25-needles-none-found-more-than-half-cebdc7ab3a90">https://medium.com/@TheWake/three-rag-architectures-one-legal-document-25-needles-none-found-more-than-half-cebdc7ab3a90</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47251342">https://news.ycombinator.com/item?id=47251342</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 04 Mar 2026 18:03:11 +0000</pubDate><link>https://medium.com/@TheWake/three-rag-architectures-one-legal-document-25-needles-none-found-more-than-half-cebdc7ab3a90</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=47251342</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47251342</guid></item><item><title><![CDATA[New comment by metawake in "Show HN: RAG chunk size "best practices" failed on legal text – I benchmarked it"]]></title><description><![CDATA[
<p>Great suggestion!! this is exactly the right methodology for establishing confidence intervals.<p>I've added this to the roadmap as `--bootstrap N`:<p><pre><code>    ragtune simulate --queries queries.json --bootstrap 5
    
    # Output:
    # Recall@5:  0.664 ± 0.012 (n=5)
    # MRR:       0.533 ± 0.008 (n=5)
</code></pre>
The implementation would sample N random subsets from the query set (or corpus), run each independently, and report mean ± std.<p>This also enables detecting real regressions vs noise eg "Recall dropped 3% ± 0.8%" is actionable, "dropped 3%" alone isn't.<p>Will ship this during   next few weeks. Thanks for the push toward more rigorous methodology, this is exactly what's missing from most RAG benchmarks.</p>
]]></description><pubDate>Wed, 21 Jan 2026 18:01:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46709124</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=46709124</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46709124</guid></item><item><title><![CDATA[New comment by metawake in "Show HN: RAG chunk size "best practices" failed on legal text – I benchmarked it"]]></title><description><![CDATA[
<p>Author here. Built RagTune to stop guessing at RAG configs.<p>Surprising findings:<p>1. On legal text (CaseHOLD), 1024 chunks scored WORST (0.618). 
   The "small" 256 chunks won (0.664). 7% swing.<p>2. On Wikipedia text? All chunk sizes hit ~99%. No difference.<p>3. Plot twist: At 5K docs, optimal chunk size FLIPPED from 256→1024.
   Scale changes everything.<p>Code is MIT: github.com/metawake/ragtune<p>Happy to discuss methodology.</p>
]]></description><pubDate>Wed, 21 Jan 2026 14:17:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=46706031</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=46706031</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46706031</guid></item><item><title><![CDATA[Show HN: RAG chunk size "best practices" failed on legal text – I benchmarked it]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/@TheWake/i-built-a-rag-tuning-tool-and-discovered-intuition-fails-on-legal-text-9744be9a4bc5">https://medium.com/@TheWake/i-built-a-rag-tuning-tool-and-discovered-intuition-fails-on-legal-text-9744be9a4bc5</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46706025">https://news.ycombinator.com/item?id=46706025</a></p>
<p>Points: 2</p>
<p># Comments: 3</p>
]]></description><pubDate>Wed, 21 Jan 2026 14:17:35 +0000</pubDate><link>https://medium.com/@TheWake/i-built-a-rag-tuning-tool-and-discovered-intuition-fails-on-legal-text-9744be9a4bc5</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=46706025</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46706025</guid></item><item><title><![CDATA[New comment by metawake in "Show HN: RagTune – EXPLAIN ANALYZE for your RAG retrieval layer"]]></title><description><![CDATA[
<p>Thanks! To answer your questions:<p>*Backends:* Currently supports Qdrant, pgvector, Weaviate, Chroma, and Pinecone. Adding more is straightforward since it's just implementing a Store interface.
Let me know if I missed some good backend!<p>*Relevance scoring:* No LLM-as-judge — that's intentional. RagTune focuses on retrieval-layer metrics only:<p>- Vector similarity scores (what the DB returns)
- Recall@K, MRR against your golden set
- Score distribution diagnostics<p>The philosophy is: debug retrieval separately from generation. If your retrieval is broken, no amount of prompt engineering will fix it.<p>For chunk size/overlap optimization — exactly the use case! `ragtune compare --chunk-sizes 256,512,1024` lets you see the impact directly.<p>Happy to hear feedback if you try it!</p>
]]></description><pubDate>Thu, 15 Jan 2026 17:13:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=46635777</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=46635777</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46635777</guid></item><item><title><![CDATA[Show HN: RagTune – EXPLAIN ANALYZE for your RAG retrieval layer]]></title><description><![CDATA[
<p>CLI tool to debug and benchmark RAG retrieval without LLM calls.<p>- `ragtune explain "query"` → see what was retrieved with scores
- `ragtune simulate` → batch eval with recall/MRR metrics
- `ragtune compare` → compare embedders or chunk sizes
- CI/CD mode for quality gates<p>Works with Qdrant, pgvector, Weaviate, Chroma, Pinecone.<p>Built because I kept guessing why retrieval was bad. Now I can see exactly what's happening.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46635422">https://news.ycombinator.com/item?id=46635422</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 15 Jan 2026 16:53:31 +0000</pubDate><link>https://github.com/metawake/ragtune</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=46635422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46635422</guid></item><item><title><![CDATA[New comment by metawake in "Ask HN: How are you doing RAG locally?"]]></title><description><![CDATA[
<p>I am using a vector DB using Docker image.
And for debugging and benchmarking local RAG retrieval, I've been building 
a CLI tool that shows what's actually being retrieved:<p><pre><code>  ragtune explain "your query" --collection prod
</code></pre>
Shows scores, sources, and diagnostics. Helps catch when your chunking 
or embeddings are silently failing or you need numeric estimations to base your judgements on.<p>Open source: <a href="https://github.com/metawake/ragtune" rel="nofollow">https://github.com/metawake/ragtune</a></p>
]]></description><pubDate>Thu, 15 Jan 2026 12:56:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46631761</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=46631761</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46631761</guid></item><item><title><![CDATA[New comment by metawake in "The Policy Puppetry Attack: Novel bypass for major LLMs"]]></title><description><![CDATA[
<p>I made a small project (<a href="https://github.com/metawake/puppetry-detector">https://github.com/metawake/puppetry-detector</a>) to detect this type of LLM policy manipulation.
It's an early idea using a set of regexp patterns (for speed) and a couple of phases of text analysis.
I am curious if it's any useful, I created integration with Rebuff (loss security suite) just in case.</p>
]]></description><pubDate>Mon, 28 Apr 2025 14:58:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=43822218</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=43822218</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43822218</guid></item><item><title><![CDATA[New comment by metawake in "Exploring spaCy-based prompt compression for LLMs – thoughts welcome"]]></title><description><![CDATA[
<p>Hi HN,<p>I’ve been exploring whether prompt compression — done before sending input to LLMs — can help cut down on token usage and cost without losing key meaning.<p>Instead of using a neural model, I wrote a small open-source tool that uses handcrafted rules + spaCy NLP to reduce prompt verbosity while preserving named entities and domain terms. It’s mostly aimed at high-volume systems (e.g. support bots, moderation pipelines, embedding pipelines for vector DBs).<p>Tested it on 135 real prompts and got 22.4% average compression with high semantic fidelity.<p>GitHub: <a href="https://github.com/metawake/prompt_compressor">https://github.com/metawake/prompt_compressor</a><p>Would love feedback, use cases, or critiques!</p>
]]></description><pubDate>Thu, 17 Apr 2025 13:21:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43716433</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=43716433</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43716433</guid></item><item><title><![CDATA[Exploring spaCy-based prompt compression for LLMs – thoughts welcome]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/metawake/prompt_compressor">https://github.com/metawake/prompt_compressor</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43716432">https://news.ycombinator.com/item?id=43716432</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 17 Apr 2025 13:21:20 +0000</pubDate><link>https://github.com/metawake/prompt_compressor</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=43716432</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43716432</guid></item><item><title><![CDATA[Anti-fragile web development, preventing “Black Swans”]]></title><description><![CDATA[
<p>Article URL: <a href="http://metawake.tumblr.com/post/97571150482/anti-fragile-web-development">http://metawake.tumblr.com/post/97571150482/anti-fragile-web-development</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=8319629">https://news.ycombinator.com/item?id=8319629</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 15 Sep 2014 16:03:35 +0000</pubDate><link>http://metawake.tumblr.com/post/97571150482/anti-fragile-web-development</link><dc:creator>metawake</dc:creator><comments>https://news.ycombinator.com/item?id=8319629</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=8319629</guid></item></channel></rss>