<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: TheNovaBomb</title><link>https://news.ycombinator.com/user?id=TheNovaBomb</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 05 Jun 2026 01:25:30 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=TheNovaBomb" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by TheNovaBomb in "OCR4all"]]></title><description><![CDATA[
<p>What kind of accuracy have you reached with this Tesseract+LLM pipeline? I imagine there would be a hard limit on how much the LLM could improve the OCR-extracted text from Tesseract, since it's far from perfect itself.<p>I haven't seen many people mention it, but I've just been using the PaddleOCR library on its own, and it has been very good for me. It often achieves better quality/accuracy than some of the best V-LLMs, and generally much better quality than other open-source OCR models I've tried, such as Tesseract.<p>That being said, my use case is focused primarily on digital text, so if you're working with handwritten text, take this with a grain of salt.<p><a href="https://github.com/PaddlePaddle/PaddleOCR/blob/main/README_en.md">https://github.com/PaddlePaddle/PaddleOCR/blob/main/README_e...</a><p><a href="https://huggingface.co/spaces/echo840/ocrbench-leaderboard" rel="nofollow">https://huggingface.co/spaces/echo840/ocrbench-leaderboard</a></p>
]]></description><pubDate>Fri, 14 Feb 2025 18:19:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43051342</link><dc:creator>TheNovaBomb</dc:creator><comments>https://news.ycombinator.com/item?id=43051342</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43051342</guid></item></channel></rss>