<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: kbyatnal</title><link>https://news.ycombinator.com/user?id=kbyatnal</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 07 Apr 2026 10:48:58 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=kbyatnal" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[PoliTax Split: PDF splitting benchmark from presidential tax returns]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.extend.ai/resources/document-splitting-benchmark">https://www.extend.ai/resources/document-splitting-benchmark</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47579121">https://news.ycombinator.com/item?id=47579121</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 30 Mar 2026 20:08:29 +0000</pubDate><link>https://www.extend.ai/resources/document-splitting-benchmark</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=47579121</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47579121</guid></item><item><title><![CDATA[How we built a prompt optimization agent]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.extend.ai/resources/how-we-built-composer">https://www.extend.ai/resources/how-we-built-composer</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47368189">https://news.ycombinator.com/item?id=47368189</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 13 Mar 2026 18:56:24 +0000</pubDate><link>https://www.extend.ai/resources/how-we-built-composer</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=47368189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47368189</guid></item><item><title><![CDATA[New comment by kbyatnal in "Rolling your own serverless OCR in 40 lines of code"]]></title><description><![CDATA[
<p>Deepseek OCR is no longer state of the art. There are much better open source OCR models available now.<p>ocrarena.ai maintains a leaderboard, and a number of other open source options like dots [1] or olmOCR [2] rank higher.<p>[1] <a href="https://www.ocrarena.ai/compare/dots-ocr/deepseek-ocr" rel="nofollow">https://www.ocrarena.ai/compare/dots-ocr/deepseek-ocr</a><p>[2] <a href="https://www.ocrarena.ai/compare/olmocr-2/deepseek-ocr" rel="nofollow">https://www.ocrarena.ai/compare/olmocr-2/deepseek-ocr</a></p>
]]></description><pubDate>Mon, 16 Feb 2026 13:30:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47034720</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=47034720</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47034720</guid></item><item><title><![CDATA[New comment by kbyatnal in "Show HN: OCR Arena – A playground for OCR models"]]></title><description><![CDATA[
<p>Ultimately, there’s some intersection of accuracy x cost x speed that’s ideal, which can be different per use case. We’ll surface all of those metrics shortly so that you can pick the best model for the job along those axes.</p>
]]></description><pubDate>Tue, 25 Nov 2025 01:05:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46041260</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=46041260</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46041260</guid></item><item><title><![CDATA[New comment by kbyatnal in "Show HN: OCR Arena – A playground for OCR models"]]></title><description><![CDATA[
<p>Claude coming shortly (in the next ~1 hour)</p>
]]></description><pubDate>Tue, 25 Nov 2025 00:55:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46041185</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=46041185</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46041185</guid></item><item><title><![CDATA[New comment by kbyatnal in "Show HN: OCR Arena – A playground for OCR models"]]></title><description><![CDATA[
<p>We wanted to keep the focus on (1) foundation VLMs and (2) open source OCR models.<p>We had Mistral previously but had to remove it because their hosted API for OCR was super unstable and returned a lot of garbage results unfortunately.<p>Paddle, Nanonets, and Chandra being added shortly!</p>
]]></description><pubDate>Tue, 25 Nov 2025 00:55:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=46041184</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=46041184</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46041184</guid></item><item><title><![CDATA[New comment by kbyatnal in "Show HN: OCR Arena – A playground for OCR models"]]></title><description><![CDATA[
<p>Sonnet/Opus is being added shortly!</p>
]]></description><pubDate>Tue, 25 Nov 2025 00:51:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46041162</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=46041162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46041162</guid></item><item><title><![CDATA[Show HN: OCR Arena – A playground for OCR models]]></title><description><![CDATA[
<p>I built OCR Arena as a free playground for the community to compare leading foundation VLMs and open-source OCR models side-by-side.<p>Upload any doc, measure accuracy, and (optionally) vote for the models on a public leaderboard.<p>It currently has Gemini 3, dots.ocr, DeepSeek, GPT5, olmOCR 2, Qwen, and a few others. If there are any others you'd like included, let me know!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46006104">https://news.ycombinator.com/item?id=46006104</a></p>
<p>Points: 216</p>
<p># Comments: 63</p>
]]></description><pubDate>Fri, 21 Nov 2025 16:44:45 +0000</pubDate><link>https://www.ocrarena.ai/battle</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=46006104</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46006104</guid></item><item><title><![CDATA[New comment by kbyatnal in "GPT-5o-mini hallucinates medical residency applicant grades"]]></title><description><![CDATA[
<p>Yeah, that can occasionally work and is something we've tested, but unfortunately it introduces a lot of noise and makes systematic evals difficult.</p>
]]></description><pubDate>Tue, 14 Oct 2025 17:22:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=45582578</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45582578</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45582578</guid></item><item><title><![CDATA[New comment by kbyatnal in "GPT-5o-mini hallucinates medical residency applicant grades"]]></title><description><![CDATA[
<p>School transcripts are, surprisingly, some of the hardest documents to parse. Two things make them tricky: (1) the multi-column tabular layouts and (2) the data ambiguity.<p>Transcript data is usually found in some sort of table, but these are some of the hardest tables for OCR or LLMs to interpret. There are all kinds of edge cases with tables split across pages, nested cells, side-by-side columns, etc. The tabular layout breaks every off-the-shelf OCR engine we've run across (and we've benchmarked all of them). To make it worse, there's no consistency at all (every school in the country basically has its own format).<p>What we've seen help in these cases:<p>1. VLM-based review and correction of OCR errors for tables. OCR is still critical for determinism, but VLMs really excel at visually interpreting the long tail.<p>2. Using both HTML and Markdown as LLM input formats. For some of the edge cases, markdown cannot represent certain structures (e.g. a table cell nested within a table cell). HTML is a much better representation for this, and models are trained on a lot of HTML data.<p>The data ambiguity is a whole set of problems on its own (e.g. how do you normalize what a "semester" is across all the different ways it can be written). Eval sets + automated prompt engineering can get you pretty far though.<p>Disclaimer: I started an LLM doc processing company to help companies solve problems in this space (<a href="https://extend.ai/">https://extend.ai/</a>).</p>
]]></description><pubDate>Tue, 14 Oct 2025 15:50:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=45581480</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45581480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45581480</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>thanks! Datalab is great, I've met Vik a few times and their team has done some impressive work. We can also support the conversion to markdown use case, and might be a better fit depending on your use case. Feel free to create an account to try it out!</p>
]]></description><pubDate>Thu, 09 Oct 2025 21:46:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=45533447</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45533447</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45533447</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>It's very dependent on the use case. That's why we offer a native evals experience in the product, so you can directly measure the % accuracy diffs between the two modes for your exact docs.<p>As a rule of thumb, light processing mode is great for (1) most classification tasks, (2) splitting on smaller docs, (3) extraction on simpler documents, or (4) latency-sensitive use cases.</p>
]]></description><pubDate>Thu, 09 Oct 2025 19:38:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=45532125</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45532125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45532125</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>Exactly correct! We've had users migrate over from other providers because our granular pricing enabled new use cases that weren't feasible before.<p>One interesting thing we've learned is that most production pipelines end up using a combination of the two (e.g. cheap classification and splitting, paired with performance extraction).</p>
]]></description><pubDate>Thu, 09 Oct 2025 19:35:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=45532096</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45532096</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45532096</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>Feedback heard. Pricing is hard, and we've iterated on this multiple times so far.<p>Our goal is to provide customers with as much transparency & flexibility as possible. Our pricing has 2 axes:<p>- the complexity of the task<p>- performance processing vs cost-optimized processing<p>Complexity matters because e.g. classification is much easier than extraction, and as such it should be cheaper. That unlocks a wide range of use cases, such as tagging and filtering pipelines.<p>Toggles for performance are also important because not all use cases are created equal. Just as it's important to have a choice between cheaper foundation models and the best ones, the same applies to document tasks.<p>For certain use cases, you might be willing to take a slight hit to accuracy in exchange for better costs and latency. To support this, we offer a "light" processing mode (with significantly lower prices) that uses smaller models, fewer VLMs, and more heuristics under the hood.<p>For other use cases, you simply want the highest accuracy possible. Our "performance" processing mode is a great fit for that, which enables layout models, signature detection, handwriting VLMs, and the most performant foundation models.<p>In fact, most pipelines we've seen in production end up combining the two (cheap classification and splitting, paired with performance extraction).<p>Without this level of granularity, we'd either be overcharging certain customers or undercharging others. I definitely understand how this is confusing though; we'll work on making our docs better!</p>
]]></description><pubDate>Thu, 09 Oct 2025 19:33:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=45532082</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45532082</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45532082</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>good question!<p>Our goal is to provide customers with as much flexibility as possible. For certain use cases, you might be willing to take a slight hit to accuracy in exchange for better costs and latency. To support this, we offer a "light" processing mode (with significantly lower prices) that uses smaller models, fewer VLMs, and more heuristics under the hood.<p>For other use cases, you simply want the highest accuracy possible. Our "performance" processing mode is a great fit for that, which enables layout models, signature detection, handwriting VLMs, and the most performant foundation models.<p>We back this up with a native evals experience in the product, so you can directly measure the % accuracy difference between the two modes for your exact use case.</p>
]]></description><pubDate>Thu, 09 Oct 2025 19:20:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=45531928</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45531928</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45531928</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>thanks!<p>A lot of customers choose us for our handwriting, checkbox, and table performance. To handle complex handwriting, we've built an agentic OCR correction layer which uses a VLM to review and correct low-confidence OCR output.<p>Tables are a tricky beast, and the long tail of edge cases here is immense. A few things we've found to be really impactful are (1) semantic chunking that detects table boundaries (so a table that spans multiple pages doesn't get chopped in half) and (2) table-to-HTML conversion (in addition to markdown). Markdown is great at representing most simple tables, but can't represent cases where you have e.g. nested cells.<p>You can see examples of both in our demo! <a href="https://dashboard.extend.ai/demo">https://dashboard.extend.ai/demo</a><p>Accuracy and data verification are challenging. We have a set of internal benchmarks we use, which gets us pretty far, but that's not always representative of specific customer situations. That's why one of the earliest things we built was an evaluation product, so that customers can easily measure performance on their exact docs and use cases. We recently added support for LLM-as-a-judge and semantic similarity checks, which have been really impactful for measuring accuracy before going live.</p>
]]></description><pubDate>Thu, 09 Oct 2025 19:11:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=45531797</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45531797</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45531797</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>There are certainly a lot of tools that focus on individual parts of the problem (e.g. the OCR layer, or workflows on top). But very few solve the problem end-to-end with enough flexibility for AI teams that want a lot of control over the experience.<p>For example, we expose options for AI teams to control how chunking works, whether to enable a bounding box citation model, and whether a VLM should correct handwriting errors.<p>For most customers we speak with, the evaluation is actually between Extend and building it in-house (and we have a pretty good win rate here).</p>
]]></description><pubDate>Thu, 09 Oct 2025 17:56:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=45530939</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45530939</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45530939</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>thanks! Yup, that's correct, we offer a set of APIs for handling documents: parsing, classification, splitting, and extraction.<p>We've seen customers integrate these in a few interesting ways so far:<p>1. Agents (exposing these APIs as tools in certain cases, or feeding the output into a vector DB for RAG)<p>2. Real-time experiences in their product (e.g. we power all of Brex's user-facing document upload flows)<p>3. Embedded in internal tooling for back-office automation<p>Our customers are already requesting new APIs and capabilities for all the other problems they run into with documents (e.g. fintech customers want fraud detection, healthcare users need form filling). Some of these we'll be rolling out soon!</p>
]]></description><pubDate>Thu, 09 Oct 2025 17:25:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=45530610</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45530610</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45530610</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>There's definitely no shortage of options. OCR has been around for decades at this point, and legacy IDP solutions really proliferated in the last ~10 years.<p>The world today is quite different though. In the last 24 months, the "TAM" for document processing has expanded by multiple orders of magnitude. In the next 10 years, trillions of pages of documents will be ingested across all verticals.<p>Previous generations of tools were always limited to the same set of structured/semi-structured documents (e.g. tax forms). Today, engineering teams are ingesting a true wild west of documents, from 500-page mortgage packages to extremely messy healthcare forms. All of those legacy providers fall apart when tackling these kinds of truly unstructured docs.<p>We work with hundreds of customers now, and I'd estimate 90% of the use cases we tackle weren't technically solvable until ~12 months ago. So it's nearly all greenfield work, and very rarely replacing an existing vendor or solution already in place.<p>All that to say, the market is absolutely huge. I do suspect we'll see a plateau in new entrants though (and probably some consolidation of current ones). With how fast the AI space moves, it's nearly impossible to compete if you enter a market just a few months too late.</p>
]]></description><pubDate>Thu, 09 Oct 2025 17:16:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45530481</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45530481</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45530481</guid></item><item><title><![CDATA[New comment by kbyatnal in "Launch HN: Extend (YC W23) – Turn your messiest documents into data"]]></title><description><![CDATA[
<p>thank you Fabio!</p>
]]></description><pubDate>Thu, 09 Oct 2025 17:08:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=45530377</link><dc:creator>kbyatnal</dc:creator><comments>https://news.ycombinator.com/item?id=45530377</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45530377</guid></item></channel></rss>