<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jibuai</title><link>https://news.ycombinator.com/user?id=jibuai</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 04 Apr 2026 15:55:21 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jibuai" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jibuai in "Ingesting PDFs and why Gemini 2.0 changes everything"]]></title><description><![CDATA[
<p>I've been working on something similar for the past couple of months. A few thoughts:<p>- A lot of natural chunk boundaries span multiple pages, so you need some 'sliding window' mechanism for the best accuracy.<p>- Passing the entire document hurts throughput too much due to the quadratic complexity of attention. Outputs are also much worse when you use too much context.<p>- Bounding boxes can be solved by first generating boxes using traditional OCR / layout recognition, then passing that data to the LLM. The LLM can then link its outputs to the boxes. Unfortunately, getting this to work reliably required a custom sampler, so proprietary models like Gemini are out of the question.</p>
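<p>A minimal sketch of the 'sliding window' idea from the first point, in Python. The helper names (page_windows, merge_chunks), the default window of two pages, and the dedup-by-normalized-text merge step are assumptions for illustration, not a description of any particular pipeline. The per-window LLM call is left out (hypothetical) because it depends on the model you use.

```python
from collections.abc import Iterator


def page_windows(pages: list[str], window: int = 2, stride: int = 1) -> Iterator[tuple[int, str]]:
    """Yield (start_page, text) for overlapping windows of consecutive pages.

    With stride=1, any chunk spanning at most `window` pages lies entirely
    inside at least one window, so the model sees it whole at least once.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if not pages:
        return
    last_start = max(len(pages) - window, 0)
    starts = list(range(0, last_start + 1, stride))
    if starts[-1] != last_start:
        # Cover the final pages even when the stride does not divide evenly.
        starts.append(last_start)
    for start in starts:
        yield start, "\n".join(pages[start:start + window])


def merge_chunks(per_window: list[list[str]]) -> list[str]:
    """Merge chunk lists from overlapping windows, dropping repeats.

    Assumption: two chunks are the same if their text matches after
    collapsing whitespace. Real model outputs may need fuzzier matching.
    """
    seen: set[str] = set()
    merged: list[str] = []
    for chunks in per_window:
        for chunk in chunks:
            key = " ".join(chunk.split())
            if key not in seen:
                seen.add(key)
                merged.append(chunk)
    return merged
```

<p>In use, you would send each window's text to the model (hypothetical step), collect the chunks it returns per window, and pass those lists to merge_chunks. Keeping each window to a few pages also keeps the prompt short, which matters given the quadratic cost of attention mentioned above.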
]]></description><pubDate>Wed, 05 Feb 2025 20:40:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=42954782</link><dc:creator>jibuai</dc:creator><comments>https://news.ycombinator.com/item?id=42954782</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42954782</guid></item></channel></rss>