Hacker News: gillesjacobs

New comment by gillesjacobs in "The disturbing white paper Red Hat is trying to erase from the internet"

gillesjacobs — Sat, 11 Apr 2026 18:15:42 +0000

https://web.archive.org/web/20260402155236/https://www.redha...

Archive URL to original paper

New comment by gillesjacobs in "Cursor Composer 2 is just Kimi K2.5 with RL"

gillesjacobs — Fri, 20 Mar 2026 13:52:26 +0000

I stand corrected, that is pretty scummy.

I bet Moonshot is going to make them open their wallets to avoid legal trouble.

New comment by gillesjacobs in "Cursor Composer 2 is just Kimi K2.5 with RL"

gillesjacobs — Fri, 20 Mar 2026 10:22:15 +0000

They probably licensed it. Still a bit deceptive not to mention it on the model card/blog post, but companies whitelabel all the time without mentioning it.

It goes against the ML community ethos to obscure it, but is common branding practice.

New comment by gillesjacobs in "Cursor Composer 2 is just Kimi K2.5 with RL"

gillesjacobs — Fri, 20 Mar 2026 10:17:50 +0000

Cursor is mostly an IDE / coding-agent harness company. So it probably makes sense for them not to train their own base model, but instead license something like Kimi and fine-tune it for their own harness and workflows.

Their moat looks pretty thin. A VSCode fork with an open-source LLM fork on top. In the fast-moving coding-agent market, it’s not obvious they keep their massive valuation forever.

New comment by gillesjacobs in "Scott Adams has died"

gillesjacobs — Wed, 14 Jan 2026 17:55:07 +0000

I liked his cartoons and he did no wrong.

New comment by gillesjacobs in "Show HN: Replacing my OS process scheduler with an LLM"

gillesjacobs — Sun, 04 Jan 2026 10:21:26 +0000

You're underselling this as a process manager. With some prompt changes it could also be a productivity tool: identify procrastination apps (games, non-professional chat, video streaming) and kill them.

New comment by gillesjacobs in "A Developer Accidentally Found CSAM in AI Data. Google Banned Him for It"

gillesjacobs — Thu, 11 Dec 2025 16:53:30 +0000

https://archive.ph/awvmJ

New comment by gillesjacobs in "How to sequence your DNA for <$2k"

gillesjacobs — Sun, 19 Oct 2025 08:26:20 +0000

They save money through cheap labour and by batching large quantities for analysis. For the consumer this means long wait times and potentially expired DNA samples.

I tried two samples with Nebula and waited 11 months in total. Both samples failed. I got a refund on the service but spent 50 USD on postage for the sample kit.

New comment by gillesjacobs in "A PM's Guide to AI Agent Architecture"

gillesjacobs — Thu, 04 Sep 2025 21:30:05 +0000

Nice framing for PMs, but technically it is way too rosy. MCP is real but still full of low utility services and security issues, so “skills as plug-ins” is not production ready. A2A protocols were only just announced this year (Google, etc.) and actual inter-agent interoperability is still research grade, with debugging across agents being a nightmare. Orchestration layers (skills, workflows, multi-agent) look clean in diagrams but turn into brittle state machines under load. LLM “confidence scores” are basically uncalibrated logits dressed up as probabilities.
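To make the calibration point concrete, here is a minimal sketch of expected calibration error (ECE), a common way to measure the gap between stated confidence and observed accuracy. The function name, the bin count and the sample numbers are made up for illustration; nothing here comes from the article.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between stated confidence and observed accuracy."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Clamp so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    total = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(1 for _, ok in bucket if ok) / len(bucket)
        # Each bin contributes in proportion to how many predictions it holds.
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece


# Made-up data: the model claims 90% confidence but is right half the time.
overconfident = expected_calibration_error(
    [0.9, 0.9, 0.9, 0.9], [True, False, True, False]
)
# Made-up data: 50% confidence, right half the time.
calibrated = expected_calibration_error([0.5, 0.5], [True, False])
```

A score near zero means the reported confidence can be read as a probability; a large score means it cannot, however clean it looks in a dashboard.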

In short: nice industry roadmap, but we are nowhere near robust, trustworthy multi-agent systems yet.

New comment by gillesjacobs in "Show HN: PageIndex – Vectorless RAG"

gillesjacobs — Fri, 29 Aug 2025 15:51:00 +0000

I am always on the lookout for new document extraction tools, but can't seem to find any benchmark results for PageIndex-OCR. There are several benchmarks to choose from, like OmniDocBench and readoc. So... Got benchmark?

New comment by gillesjacobs in "Show HN: PageIndex – Vectorless RAG"

gillesjacobs — Fri, 29 Aug 2025 15:13:00 +0000

Extracting structure and elements from HTML should be trivial, and there are probably multiple libraries for it in your programming language of choice. Be happy you have machine-readable semantic documents; that's the best-case scenario in NLP. I used to convert the chunks to Markdown as it was more token-efficient and LLMs are often heavily preference-trained on Markdown, but I am not sure that still matters with current input pricing and LLM performance gains.
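As a rough illustration of the HTML-to-Markdown chunking idea, here is a minimal sketch using only Python's standard `html.parser`. The class and function names are hypothetical, it only handles headings, paragraphs and list items, and it drops inline formatting; a real pipeline would more likely lean on a dedicated library.

```python
from html.parser import HTMLParser


class MarkdownChunker(HTMLParser):
    """Collect headings, paragraphs and list items as Markdown chunks."""

    PREFIX = {"h1": "# ", "h2": "## ", "h3": "### ", "p": "", "li": "- "}

    def __init__(self):
        super().__init__()
        self.chunks = []   # finished Markdown blocks, in document order
        self._tag = None   # block-level tag currently being collected
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIX:
            self._tag = tag
            self._text = []

    def handle_data(self, data):
        if self._tag is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self._text).split())
            if text:
                self.chunks.append(self.PREFIX[tag] + text)
            self._tag = None


def html_to_markdown_chunks(html):
    parser = MarkdownChunker()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = (
    "<h1>Report</h1><p>Revenue <b>grew</b> in Q2.</p>"
    "<ul><li>EMEA</li><li>APAC</li></ul>"
)
chunks = html_to_markdown_chunks(sample)
```

Each entry in `chunks` is one Markdown block, which keeps the heading hierarchy available as chunk metadata later on.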

If you have scanned documents, last I checked Gemini Flash was very good cost/performance-wise for document extraction. Mistral OCR claims better performance in their benchmarks, but people I know who used it, and other benchmarks, beg to differ. Personally I use Azure Document Intelligence a lot for the bounding boxes feature, but Gemini Flash apparently has this covered too.

https://getomni.ai/blog/ocr-benchmark

Sidenote: What you want for RAG is not OCR as in extracting text. The task for RAG preprocessing is typically called Document Layout Analysis or End-to-End Document Parsing/Extraction.

Good RAG is multimodal and aware of semantic document structure and layout, so your pipeline needs to extract and recognize text sections, footers/headers, images, and tables. When working with PDFs you want accurate bounding boxes in your metadata for referring your users to retrieved sources etc.
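One way to picture layout-aware chunk metadata is the small sketch below. All class names, field names and the coordinate convention are assumptions made for illustration, not the output format of any of the tools mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    page: int    # 1-based page number (assumed convention)
    x0: float    # left
    y0: float    # top
    x1: float    # right
    y1: float    # bottom


@dataclass
class Chunk:
    text: str
    kind: str    # e.g. "section", "table", "image", "header", "footer"
    source: str  # file the chunk was extracted from
    bbox: BoundingBox


def retrievable(chunks):
    """Drop page furniture so headers and footers never reach the index."""
    return [c for c in chunks if c.kind not in {"header", "footer"}]


def citation(chunk):
    """Human-readable pointer back to the exact region in the source PDF."""
    b = chunk.bbox
    return (
        f"{chunk.source}, page {b.page}, "
        f"box ({b.x0:g}, {b.y0:g}, {b.x1:g}, {b.y1:g})"
    )


# Made-up example data.
chunks = [
    Chunk("ACME Corp - Confidential", "header", "report.pdf",
          BoundingBox(3, 72, 20, 540, 40)),
    Chunk("Revenue by region ...", "table", "report.pdf",
          BoundingBox(3, 72, 100.5, 540, 220)),
]
kept = retrievable(chunks)
ref = citation(kept[0])
```

Carrying the box through retrieval is what lets a UI highlight the cited region instead of only naming the file.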

New comment by gillesjacobs in "Show HN: PageIndex – Vectorless RAG"

gillesjacobs — Fri, 29 Aug 2025 15:02:53 +0000

A suspicious lack of any performance metrics on the many standard RAG/QA benchmarks out there, except for their highly fine-tuned and dataset-specific MAFIN2.5 system. I would love to see this approach vs. a similarly well-tuned structured hybrid retriever (vector similarity + text matching), which is the common way of building domain-specific RAG. The FinanceBench GPT4o+Search system never mentions what the retrieval approach is [1,2], so I will have to assume it is the dumbest retriever possible to oversell the improvement.

PageIndex does not state to what degree the semantic structuring is rule-based (document structure) or also inferred by an ML model. In any case, structuring chunks using semantic document structure is nothing new and pretty common, as is adding generated titles and summaries to the chunk nodes. But I find it dubious that prompt-based retrieval on structured chunk metadata works robustly, and if it does perform well it is because of the extra prompt-engineering work done on chunk metadata generation and retrieval. This introduces two LLM-based components that can lead to highly variable output versus a traditional vector chunker and retriever. There are many more knobs to tune in a text prompt and an LLM-based chunker than in a sentence/paragraph chunker and a vector+text similarity hybrid retriever.

You will have to test retrieval and generation performance for your application regardless, but with so many LLM-based components this will lead to increased iteration time and cost vs. embeddings. An advantage of PageIndex is that you can probably make it really domain-specific. Claims of improved retrieval time are dubious: vector databases (even with hybrid search) are highly efficient, definitely more efficient than prompting an LLM to select relevant nodes.
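For reference, the hybrid retriever baseline (vector similarity + text matching) can be sketched in a few lines. This is a toy under loud assumptions: the term-count vector is only a stand-in for a real embedding model, the keyword-overlap score is a stand-in for something like BM25, and the function names and the `alpha` weight are made up for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keyword_overlap(query_tokens, doc_tokens):
    """Fraction of distinct query terms that appear in the document."""
    q = set(query_tokens)
    return len(q & set(doc_tokens)) / len(q) if q else 0.0


def hybrid_search(query, docs, alpha=0.5, top_k=3):
    """Rank docs by a weighted sum of the vector score and the text score."""
    q_tokens = tokenize(query)
    q_vec = Counter(q_tokens)  # stand-in for an embedding of the query
    scored = []
    for doc in docs:
        d_tokens = tokenize(doc)
        vec_score = cosine(q_vec, Counter(d_tokens))
        text_score = keyword_overlap(q_tokens, d_tokens)
        scored.append((alpha * vec_score + (1 - alpha) * text_score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up corpus.
docs = [
    "Net revenue increased in the second quarter.",
    "The cafeteria menu changes weekly.",
    "Quarterly revenue guidance was raised.",
]
results = hybrid_search("quarter revenue", docs)
```

The whole thing has one knob (`alpha`) and no LLM call at query time, which is the contrast with prompt-based node selection.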

1. https://pageindex.ai/blog/Mafin2.5
2. https://github.com/VectifyAI/Mafin2.5-FinanceBench

New comment by gillesjacobs in "Belgian CVD is deeply broken"

gillesjacobs — Tue, 15 Jul 2025 16:44:46 +0000

Had many a friend in the Belgian hacker scene who were threatened with legal action after responsible disclosure. To my knowledge, these threats always remained empty: if there is one thing more expensive than engineering a fix, it is starting a lawsuit in Belgium.

It is a sad state of affairs that the culture is like this. Ultimately it results in a less secure society, where vulns are anonymously disclosed and shared.

New comment by gillesjacobs in "MCP: An (Accidentally) Universal Plugin System"

gillesjacobs — Sun, 29 Jun 2025 09:59:05 +0000

It doesn't do it magically. The "tools" an LLM agent calls to create responses are typically REST APIs for these services.
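A minimal sketch of that mapping, assuming a model that emits tool calls as a dict with `name` and `arguments`. The registry, tool names and URLs are hypothetical and this is not the MCP wire format; the request is only built here, not sent.

```python
import json
import urllib.request

# Hypothetical tool registry: names, URLs and parameters are made up.
TOOLS = {
    "create_ticket": {
        "method": "POST",
        "url": "https://api.example.com/v1/tickets",
    },
    "get_ticket": {
        "method": "GET",
        "url": "https://api.example.com/v1/tickets/{ticket_id}",
    },
}


def build_request(tool_call):
    """Translate a model-emitted tool call into an HTTP request object."""
    spec = TOOLS[tool_call["name"]]   # unknown tool names raise KeyError
    args = tool_call["arguments"]
    url = spec["url"].format(**args)  # fill path parameters like {ticket_id}
    body = None
    headers = {}
    if spec["method"] == "POST":
        # Send the arguments as a JSON body for write operations.
        body = json.dumps(args).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        url, data=body, headers=headers, method=spec["method"]
    )


get_req = build_request({"name": "get_ticket", "arguments": {"ticket_id": 42}})
post_req = build_request(
    {"name": "create_ticket", "arguments": {"title": "Printer on fire"}}
)
```

Passing either request to `urllib.request.urlopen` would perform the actual call; the agent layer adds tool selection and argument filling on top of an API that already existed.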

Previously, many companies gated these APIs but with the MCP AI hype they are incentivized to expose what you can achieve with APIs through an agent service.

Incentives align here: user wants automations on data and actions on a service they are already using, company wants AI marketing, USP in automation features and still gets to control the output of the agent.

New comment by gillesjacobs in "Show HN: ChatToSTL – AI text-to-CAD for 3D printing"

gillesjacobs — Thu, 12 Jun 2025 22:39:37 +0000

Looks very cool!

I prototyped something like this with build123d for Python and Cursor + OCP VSCode plugin.

Build123d is too new, with too few examples out there, unlike OpenSCAD. I can only get it to generate good code with large reasoning models that access the latest docs. No fast iteration for build123d yet.

New comment by gillesjacobs in "Germany creates 'super–high-tech ministry' for research, technology, aerospace"

gillesjacobs — Fri, 11 Apr 2025 22:09:13 +0000

I need an LLM in my fax machine.

New comment by gillesjacobs in "OpenEuroLLM"

gillesjacobs — Thu, 20 Feb 2025 22:21:40 +0000

The team is currently skiing for two weeks so we'll have to get back to you on that.

New comment by gillesjacobs in "Open-Sourcing R1 1776"

gillesjacobs — Tue, 18 Feb 2025 22:17:12 +0000

The naming might be somewhat politically coloured, but post-training with quality data is the best case for uncensoring models: abliteration usually causes a substantial drop in performance.

Too bad the created dataset is not open source, as that would allow others to verify the objectivity of answers and make sure it is not just a different flavour of propaganda.

That dataset is strategically useful for Perplexity as many more CCP-censored Chinese models are sure to be released.

New comment by gillesjacobs in "If you believe in "Artificial Intelligence", take five minutes to ask it"

gillesjacobs — Sat, 15 Feb 2025 10:37:34 +0000

Using o3-mini-high + Search I get the right answer he was looking for:

  The species was first split at the subgeneric level by Gregory S. Paul in 1988—he proposed the name Brachiosaurus (Giraffatitan) brancai. Then in 1991 George Olshevsky raised the subgenus Giraffatitan to full generic status, so that B. brancai became Giraffatitan brancai. Later, a 2009 study by Michael P. Taylor provided detailed evidence supporting this separation.

I guess Mike Taylor will gracefully cede his point now?

It is very funny to me that someone would feel the need to complain about a niche factual error in pretrained LLMs without even enabling RAG. If you even know the basics about this field, you shouldn't be surprised.

Of course this was probably more about ego stroking his paleontological achievement than a thoughtful evaluation of the current state of LLMs.

New comment by gillesjacobs in "Show HN: FastGraphRAG – Better RAG using good old PageRank"

gillesjacobs — Tue, 19 Nov 2024 09:53:03 +0000

Do you have any retrieval and generation metric scores (e.g., on the KILT or NQ datasets)?

I know benchmark datasets are not the be-all and end-all, but a halfway decent score and inference time would really help sell your framework (or help engineers make the choice).

In any case, very cool work. I built a lot of RAG pipelines as a freelance NLP engineer and I will try this out.