Show HN: Unsiloed – VLMs for Document Ingestion

ady9999 — Fri, 13 Jun 2025 21:36:10 +0000

I'm excited to introduce Unsiloed Chunker, an open-source Python library designed for efficient document chunking in retrieval-augmented generation (RAG) applications.

Key Features:

Multi-threaded Processing: Speeds up chunking operations by processing multiple documents simultaneously.
Supports Multiple File Types: Handles PDF, DOCX, and PPTX formats.
Flexible Chunking Strategies: Offers fixed-size and page-based chunking methods.
Zero Dependencies: Lightweight and easy to integrate into your projects.

Installation:

    pip install unsiloed-chunker

Usage Example:

    from unsiloed_chunker import Chunker

    chunker = Chunker(file_path="your_document.pdf")
    chunks = chunker.chunk(strategy="fixed_size", chunk_size=500)
    for chunk in chunks:
        print(chunk)

For more details, check out the documentation.
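To make the fixed-size strategy and the multi-document processing more concrete, here is a minimal sketch of the general idea in plain Python. It is not the library's implementation: the names fixed_size_chunks and chunk_many are hypothetical, the sketch assumes chunk_size counts characters (the library may count something else, such as tokens), and it works on text that has already been extracted from a file.

```python
from concurrent.futures import ThreadPoolExecutor


def fixed_size_chunks(text, chunk_size=500):
    """Split text into consecutive pieces of at most chunk_size characters.

    Assumption: chunk_size is measured in characters (hypothetical helper,
    not part of unsiloed_chunker).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the text in strides of chunk_size; the last piece may be shorter.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_many(texts, chunk_size=500, max_workers=4):
    """Chunk several already-extracted texts on a thread pool.

    pool.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: fixed_size_chunks(t, chunk_size), texts))


if __name__ == "__main__":
    for chunk in fixed_size_chunks("abcdefghij", chunk_size=4):
        print(chunk)
```

In a sketch like this, threads mainly pay off when the per-document work is dominated by I/O such as reading files; pure-Python string slicing on its own gains little from a thread pool.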

I'd love to hear your feedback and suggestions!

Comments URL: https://news.ycombinator.com/item?id=44272502

Points: 1

# Comments: 0

New comment by ady9999 in "Ingesting PDFs and why Gemini 2.0 changes everything"

ady9999 — Fri, 07 Feb 2025 02:04:17 +0000

We have been building smaller, more efficient VLMs for document extraction since long before this, and we are 10x faster than Unstructured and Reducto (the OCR vendors), with an accuracy of 90%.

P.S. - You can find us at unsiloed-ai.com, or reach me at adnan.abbas@unsiloed-ai.com

Hacker News: ady9999
