Hacker News: metawake

Show HN: Ragprobe – measure RAG domain difficulty before deploying, no embeddings

metawake — Tue, 24 Mar 2026 13:01:29 +0000

Article URL: https://pypi.org/project/ragprobe/0.1.0/

Comments URL: https://news.ycombinator.com/item?id=47501960

Points: 1

# Comments: 0

PageIndex (19k stars) scored 44% on legal docs. Same as vector RAG

metawake — Wed, 04 Mar 2026 18:03:11 +0000

Article URL: https://medium.com/@TheWake/three-rag-architectures-one-legal-document-25-needles-none-found-more-than-half-cebdc7ab3a90

Comments URL: https://news.ycombinator.com/item?id=47251342

Points: 1

# Comments: 0

New comment by metawake in "Show HN: RAG chunk size "best practices" failed on legal text – I benchmarked it"

metawake — Wed, 21 Jan 2026 18:01:15 +0000

Great suggestion! This is exactly the right methodology for establishing confidence intervals.

I've added this to the roadmap as `--bootstrap N`:

    ragtune simulate --queries queries.json --bootstrap 5
    
    # Output:
    # Recall@5:  0.664 ± 0.012 (n=5)
    # MRR:       0.533 ± 0.008 (n=5)

The implementation would sample N random subsets from the query set (or corpus), run each independently, and report mean ± std.
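A minimal Python sketch of that idea, not RagTune's actual code. It assumes classic bootstrap resampling (drawing the query set with replacement) rather than fixed subsets, and the names `bootstrap_metric`, `recall_at_5`, and the `hit_at_5` field are hypothetical:

```python
import random
import statistics
from typing import Callable, Sequence


def bootstrap_metric(
    queries: Sequence[dict],
    evaluate: Callable[[Sequence[dict]], float],
    n_resamples: int = 5,
    seed: int = 0,
) -> tuple[float, float]:
    """Return (mean, sample std) of a metric over bootstrap resamples."""
    if n_resamples < 2:
        raise ValueError("need at least 2 resamples to estimate a spread")
    rng = random.Random(seed)
    scores = []
    for _ in range(n_resamples):
        # Classic bootstrap: draw len(queries) items with replacement.
        sample = rng.choices(queries, k=len(queries))
        scores.append(evaluate(sample))
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical per-query results: 1 if the gold chunk was in the top 5, else 0.
queries = [{"id": i, "hit_at_5": int(i % 3 != 0)} for i in range(300)]


def recall_at_5(sample: Sequence[dict]) -> float:
    return sum(q["hit_at_5"] for q in sample) / len(sample)


mean, std = bootstrap_metric(queries, recall_at_5, n_resamples=5)
print(f"Recall@5: {mean:.3f} ± {std:.3f} (n=5)")
```

With only 5 resamples the spread estimate is itself noisy, so a larger N is probably worth the extra runtime when the eval is cheap.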

This also makes it possible to tell real regressions from noise: "Recall dropped 3% ± 0.8%" is actionable, while "dropped 3%" alone isn't.
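One way to turn that into a check, sketched in Python. The 2-sigma rule and the function name `is_real_regression` are my assumptions here, and the 0.634 figure is made up for illustration:

```python
import math


def is_real_regression(
    base_mean: float, base_std: float,
    new_mean: float, new_std: float,
    k: float = 2.0,
) -> bool:
    """Flag a drop only if it exceeds k times the combined spread of both runs."""
    drop = base_mean - new_mean
    # Spreads of two independent estimates add in quadrature.
    combined = math.sqrt(base_std ** 2 + new_std ** 2)
    return drop > k * combined


# A 3-point drop with tight spreads stands out from noise...
print(is_real_regression(0.664, 0.008, 0.634, 0.008))  # True
# ...while the same drop with wide spreads does not.
print(is_real_regression(0.664, 0.020, 0.634, 0.020))  # False
```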

I will ship this in the next few weeks. Thanks for the push toward more rigorous methodology; this is exactly what's missing from most RAG benchmarks.

New comment by metawake in "Show HN: RAG chunk size "best practices" failed on legal text – I benchmarked it"

metawake — Wed, 21 Jan 2026 14:17:58 +0000

Author here. Built RagTune to stop guessing at RAG configs.

Surprising findings:

1. On legal text (CaseHOLD), 1024 chunks scored WORST (0.618). The "small" 256 chunks won (0.664). That is a roughly 7% relative swing.

2. On Wikipedia text? All chunk sizes hit ~99%. No difference.

3. Plot twist: At 5K docs, optimal chunk size FLIPPED from 256→1024. Scale changes everything.

Code is MIT: github.com/metawake/ragtune

Happy to discuss methodology.

Show HN: RAG chunk size "best practices" failed on legal text – I benchmarked it

metawake — Wed, 21 Jan 2026 14:17:35 +0000

Article URL: https://medium.com/@TheWake/i-built-a-rag-tuning-tool-and-discovered-intuition-fails-on-legal-text-9744be9a4bc5

Comments URL: https://news.ycombinator.com/item?id=46706025

Points: 2

# Comments: 3

New comment by metawake in "Show HN: RagTune – EXPLAIN ANALYZE for your RAG retrieval layer"

metawake — Thu, 15 Jan 2026 17:13:34 +0000

Thanks! To answer your questions:

*Backends:* Currently supports Qdrant, pgvector, Weaviate, Chroma, and Pinecone. Adding more is straightforward since it only requires implementing a Store interface. Let me know if I missed a good backend!

*Relevance scoring:* No LLM-as-judge — that's intentional. RagTune focuses on retrieval-layer metrics only:

- Vector similarity scores (what the DB returns)
- Recall@K, MRR against your golden set
- Score distribution diagnostics
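For anyone unfamiliar with the two metrics, here is a small Python sketch using their standard definitions. It is not RagTune's code; the golden-set layout and the document IDs are hypothetical:

```python
from typing import Sequence


def recall_at_k(retrieved: Sequence[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant IDs that appear in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def reciprocal_rank(retrieved: Sequence[str], relevant: set[str]) -> float:
    """1 / rank of the first relevant ID, or 0.0 if none was retrieved."""
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical golden set: query -> (ranked retrieved IDs, relevant IDs).
golden = {
    "q1": (["d3", "d1", "d9"], {"d1"}),
    "q2": (["d4", "d5", "d6"], {"d7"}),
    "q3": (["d2", "d8", "d0"], {"d2", "d0"}),
}

recalls = [recall_at_k(r, rel, k=3) for r, rel in golden.values()]
rrs = [reciprocal_rank(r, rel) for r, rel in golden.values()]
print(f"Recall@3: {sum(recalls) / len(recalls):.3f}")
print(f"MRR:      {sum(rrs) / len(rrs):.3f}")
```

Neither metric needs an LLM call: both only compare the ranked IDs the vector DB returns against the labeled relevant IDs.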

The philosophy is: debug retrieval separately from generation. If your retrieval is broken, no amount of prompt engineering will fix it.

For chunk size/overlap optimization — exactly the use case! `ragtune compare --chunk-sizes 256,512,1024` lets you see the impact directly.
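To make "chunk size" and "overlap" concrete, here is a rough Python sketch of fixed-size chunking. It assumes sizes are counted in tokens and uses whitespace splitting in place of a real tokenizer; `chunk_tokens` is a hypothetical helper, not how RagTune chunks internally:

```python
def chunk_tokens(tokens: list[str], size: int, overlap: int = 0) -> list[list[str]]:
    """Split tokens into windows of `size` that share `overlap` tokens with the previous window."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


# Whitespace tokens stand in for a real tokenizer here.
tokens = ("lorem ipsum " * 1024).split()  # 2048 tokens
for size in (256, 512, 1024):
    chunks = chunk_tokens(tokens, size, overlap=size // 8)
    print(f"size={size:<5} chunks={len(chunks)}")
```

Each chunk size produces a different index, so a comparison has to re-chunk, re-embed, and re-run the same golden queries for every setting.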

Happy to hear feedback if you try it!

Show HN: RagTune – EXPLAIN ANALYZE for your RAG retrieval layer

metawake — Thu, 15 Jan 2026 16:53:31 +0000

CLI tool to debug and benchmark RAG retrieval without LLM calls.

- `ragtune explain "query"` → see what was retrieved with scores
- `ragtune simulate` → batch eval with recall/MRR metrics
- `ragtune compare` → compare embedders or chunk sizes
- CI/CD mode for quality gates

Works with Qdrant, pgvector, Weaviate, Chroma, Pinecone.

Built because I kept guessing why retrieval was bad. Now I can see exactly what's happening.

Comments URL: https://news.ycombinator.com/item?id=46635422

Points: 1

# Comments: 1

New comment by metawake in "Ask HN: How are you doing RAG locally?"

metawake — Thu, 15 Jan 2026 12:56:47 +0000

I run a vector DB from a Docker image. For debugging and benchmarking local RAG retrieval, I've been building a CLI tool that shows what's actually being retrieved:

  ragtune explain "your query" --collection prod

It shows scores, sources, and diagnostics. That helps catch cases where your chunking or embeddings are silently failing, and gives you numbers to base your judgments on.

Open source: https://github.com/metawake/ragtune

New comment by metawake in "The Policy Puppetry Attack: Novel bypass for major LLMs"

metawake — Mon, 28 Apr 2025 14:58:44 +0000

I made a small project (https://github.com/metawake/puppetry-detector) to detect this type of LLM policy manipulation. It's an early idea that uses a set of regexp patterns (for speed) and a couple of phases of text analysis. I'm curious whether it's useful; I also created an integration with Rebuff (LLM security suite) just in case.
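To give a feel for the regexp-first approach, here is a rough Python sketch. The patterns and the `scan` function are hypothetical illustrations, not the ones in puppetry-detector, and the sample prompts are made up:

```python
import re

# Hypothetical patterns: config-like markup posing as a policy file,
# plus phrasing that tries to override the model's instructions.
PATTERNS = {
    "policy_markup": re.compile(
        r"<\s*/?\s*[\w-]*(config|policy|rules)[\w-]*\s*>", re.I
    ),
    "ini_section": re.compile(
        r"^\s*\[[\w -]*(policy|config|rules)[\w -]*\]\s*$", re.I | re.M
    ),
    "override_phrase": re.compile(
        r"\b(ignore|override|disable)\b.{0,40}\b(instructions|safety|policy)\b",
        re.I | re.S,
    ),
}


def scan(prompt: str) -> list[str]:
    """Return the names of all patterns that match the prompt."""
    return [name for name, rx in PATTERNS.items() if rx.search(prompt)]


print(scan("<interaction-config>\nallowed-modes: all\n</interaction-config>"))
print(scan("What is the capital of France?"))
```

A regexp pass like this is cheap enough to run on every request, but it will produce false positives on legitimate config snippets, which is presumably where later analysis phases come in.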

New comment by metawake in "Exploring spaCy-based prompt compression for LLMs – thoughts welcome"

metawake — Thu, 17 Apr 2025 13:21:20 +0000

Hi HN,

I’ve been exploring whether prompt compression — done before sending input to LLMs — can help cut down on token usage and cost without losing key meaning.

Instead of using a neural model, I wrote a small open-source tool that uses handcrafted rules + spaCy NLP to reduce prompt verbosity while preserving named entities and domain terms. It’s mostly aimed at high-volume systems (e.g. support bots, moderation pipelines, embedding pipelines for vector DBs).

Tested it on 135 real prompts and got 22.4% average compression with high semantic fidelity.
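As a rough illustration of the rule-based part, here is a stdlib-only Python sketch. It swaps spaCy's entity recognition for a hand-supplied set of protected terms, the filler rules are hypothetical, and measuring compression in whitespace tokens is my assumption; none of this is taken from prompt_compressor:

```python
import re

# Hypothetical filler rules; the real tool's rule set is not shown in the post.
FILLER_RULES = [
    (re.compile(r"\b(please|kindly|basically|actually|just)\b\s*", re.I), ""),
    (re.compile(r"\bin order to\b", re.I), "to"),
    (re.compile(r"\bI would like you to\b\s*", re.I), ""),
]


def compress(prompt: str, protected: set[str]) -> str:
    """Apply the filler rules to everything except the protected terms."""
    if protected:
        # Longest terms first; the capturing group keeps them in the split result.
        terms = sorted(protected, key=len, reverse=True)
        splitter = re.compile("(" + "|".join(re.escape(t) for t in terms) + ")")
        parts = splitter.split(prompt)
    else:
        parts = [prompt]
    out = []
    for part in parts:
        if part not in protected:
            for rx, repl in FILLER_RULES:
                part = rx.sub(repl, part)
        out.append(part)
    return re.sub(r"\s{2,}", " ", "".join(out)).strip()


def compression_pct(original: str, compressed: str) -> float:
    """Percentage of whitespace-separated tokens removed."""
    before, after = len(original.split()), len(compressed.split())
    return 100.0 * (before - after) / before if before else 0.0


prompt = ("Please could you just summarize the Just Eat refund policy "
          "in order to help the customer")
short = compress(prompt, protected={"Just Eat"})
print(short)
print(f"{compression_pct(prompt, short):.1f}% fewer tokens")
```

The protected set is what keeps the brand name "Just Eat" intact while the filler word "just" is dropped elsewhere; in the real tool that role is played by spaCy's named entities and domain terms.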

GitHub: https://github.com/metawake/prompt_compressor

Would love feedback, use cases, or critiques!

Exploring spaCy-based prompt compression for LLMs – thoughts welcome

metawake — Thu, 17 Apr 2025 13:21:20 +0000

Article URL: https://github.com/metawake/prompt_compressor

Comments URL: https://news.ycombinator.com/item?id=43716432

Points: 1

# Comments: 1

Anti-fragile web development, preventing “Black Swans”

metawake — Mon, 15 Sep 2014 16:03:35 +0000

Article URL: http://metawake.tumblr.com/post/97571150482/anti-fragile-web-development

Comments URL: https://news.ycombinator.com/item?id=8319629

Points: 1

# Comments: 1