<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: sidmo</title><link>https://news.ycombinator.com/user?id=sidmo</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 21 Apr 2026 11:47:42 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=sidmo" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by sidmo in "Show HN: Documind – Open-source AI tool to turn documents into structured data"]]></title><description><![CDATA[
<p>I'd recommend checking out vision language models (VLMs). They generate embeddings of the images themselves (as a collection of patches), and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. I built a simple API over it if you want to try it out: <a href="https://github.com/DataFog/vlm-api">https://github.com/DataFog/vlm-api</a></p>
]]></description><pubDate>Wed, 20 Nov 2024 17:00:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=42195901</link><dc:creator>sidmo</dc:creator><comments>https://news.ycombinator.com/item?id=42195901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42195901</guid></item><item><title><![CDATA[New comment by sidmo in "Show HN: Documind – Open-source AI tool to turn documents into structured data"]]></title><description><![CDATA[
<p>VLMs are cool - they generate embeddings of the images themselves (as a collection of patches) and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. Here's an open-source API demo I built if you want to try it out: <a href="https://github.com/DataFog/vlm-api">https://github.com/DataFog/vlm-api</a></p>
]]></description><pubDate>Wed, 20 Nov 2024 16:59:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=42195886</link><dc:creator>sidmo</dc:creator><comments>https://news.ycombinator.com/item?id=42195886</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42195886</guid></item><item><title><![CDATA[New comment by sidmo in "Show HN: Documind – Open-source AI tool to turn documents into structured data"]]></title><description><![CDATA[
<p>If you are looking for the latest/greatest in file processing, I'd recommend checking out vision language models. They generate embeddings of the images themselves (as a collection of patches), and you can see query matching displayed as a heatmap over the document. Picks up text that OCR misses. My company DataFog has an open-source demo if you want to try it out: <a href="https://github.com/DataFog/vlm-api">https://github.com/DataFog/vlm-api</a></p><p>If you're looking for an all-in-one solution, little plug for our new platform, which does the above and also lets you create custom 'patterns' that get picked up via semantic search. It uses open-source models by default and can be deployed into your internal network. www.datafog.ai. In beta now and onboarding manually. Shoot me an email if you'd like to learn more!</p>
]]></description><pubDate>Wed, 20 Nov 2024 16:27:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=42195512</link><dc:creator>sidmo</dc:creator><comments>https://news.ycombinator.com/item?id=42195512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42195512</guid></item></channel></rss>