Hacker News: sidmo

New comment by sidmo in "Show HN: Documind – Open-source AI tool to turn documents into structured data"

sidmo — Wed, 20 Nov 2024 17:00:01 +0000

I'd recommend checking out vision language models (VLMs). They generate embeddings of the page images themselves (as a collection of patches), and you can see query matching displayed as a heatmap over the document. They pick up text that OCR misses. I built a simple API over it if you want to try it out: https://github.com/DataFog/vlm-api
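
To make the patch-embedding and heatmap idea concrete, here is a minimal sketch of how a query could be scored against a grid of patch embeddings. It assumes a late-interaction style of matching (each query token is compared with every patch); whether the linked API works exactly this way is an assumption, and the function names and toy vectors below are hypothetical, not taken from that repo.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def patch_heatmap(patch_grid, query_tokens):
    """For every patch, keep its best similarity to any query token.

    patch_grid is a rows x cols grid of patch embeddings (hypothetical layout);
    the result has the same rows x cols shape and can be drawn over the page.
    """
    return [
        [max(cosine(patch, q) for q in query_tokens) for patch in row]
        for row in patch_grid
    ]


def late_interaction_score(patch_grid, query_tokens):
    """Page-level score: sum over query tokens of the best-matching patch."""
    patches = [patch for row in patch_grid for patch in row]
    return sum(max(cosine(patch, q) for patch in patches) for q in query_tokens)


# Toy data: a 2x2 grid of 3-dimensional patch embeddings and one query token.
# Real models produce hundreds of patches with much larger vectors.
patch_grid = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
]
query_tokens = [[0.0, 1.0, 0.0]]

heatmap = patch_heatmap(patch_grid, query_tokens)
score = late_interaction_score(patch_grid, query_tokens)
```

In this toy case the top-right patch lights up most strongly because its embedding points the same way as the query token. No OCR step is involved: the match is made on image patches directly, which is why text that OCR fails to extract can still be found.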

New comment by sidmo in "Show HN: Documind – Open-source AI tool to turn documents into structured data"

sidmo — Wed, 20 Nov 2024 16:59:00 +0000

VLMs are cool - they generate embeddings of the page images themselves (as a collection of patches), and you can see query matching displayed as a heatmap over the document. They pick up text that OCR misses. Here's an open-source API demo I built if you want to try it out: https://github.com/DataFog/vlm-api

New comment by sidmo in "Show HN: Documind – Open-source AI tool to turn documents into structured data"

sidmo — Wed, 20 Nov 2024 16:27:02 +0000

If you are looking for the latest and greatest in file processing, I'd recommend checking out vision language models. They generate embeddings of the page images themselves (as a collection of patches), and you can see query matching displayed as a heatmap over the document. They pick up text that OCR misses. My company DataFog has an open-source demo if you want to try it out: https://github.com/DataFog/vlm-api

If you're looking for an all-in-one solution, here's a little plug for our new platform, which does the above and also lets you create custom 'patterns' that get picked up via semantic search. It uses open-source models by default and can be deployed into your internal network: www.datafog.ai. It's in beta now and we're onboarding manually. Shoot me an email if you'd like to learn more!