<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: diptanu</title><link>https://news.ycombinator.com/user?id=diptanu</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 12:23:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=diptanu" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by diptanu in "Show HN: Sub-millisecond VM sandboxes using CoW memory forking"]]></title><description><![CDATA[
<p>The tricky part of doing this in production is cloning sandboxes across nodes. You would have to snapshot the resident memory, file system (or a CoW layer on top of the rootfs), move the data across nodes, etc.</p>
]]></description><pubDate>Wed, 18 Mar 2026 01:45:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47420711</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=47420711</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47420711</guid></item><item><title><![CDATA[Kosong: Kimi AI's Agent SDK]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/MoonshotAI/kosong">https://github.com/MoonshotAI/kosong</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45863229">https://news.ycombinator.com/item?id=45863229</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 09 Nov 2025 05:44:42 +0000</pubDate><link>https://github.com/MoonshotAI/kosong</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=45863229</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45863229</guid></item><item><title><![CDATA[New comment by diptanu in "Benchmarking the Most Reliable Document Parsing API"]]></title><description><![CDATA[
<p>There was an unusual traffic spike around that time; if you try now, it should be a lot faster. We were scaling up, but there was not enough GPU capacity at the time.</p>
]]></description><pubDate>Thu, 06 Nov 2025 21:24:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=45840617</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=45840617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45840617</guid></item><item><title><![CDATA[New comment by diptanu in "Benchmarking the Most Reliable Document Parsing API"]]></title><description><![CDATA[
<p>We haven’t tested Chandra yet because it’s very new. Under the hood, Tensorlake is very similar to Marker: it’s a pipeline-based OCR API. We do layout detection, text recognition and detection, table structure understanding, etc., and then use VLMs to enrich the results. Our models are much bigger than Marker’s, so they take a little longer to parse documents; we optimized for accuracy. We will have a faster API soon.</p>
]]></description><pubDate>Thu, 06 Nov 2025 19:53:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839550</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=45839550</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839550</guid></item><item><title><![CDATA[New comment by diptanu in "Benchmarking the Most Reliable Document Parsing API"]]></title><description><![CDATA[
<p>It does; we have users in Europe and Asia using it with non-English languages. Can you please send me a message at diptanu at tensorlake dot ai? I would love to see why it didn’t work.</p>
]]></description><pubDate>Thu, 06 Nov 2025 19:50:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839504</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=45839504</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839504</guid></item><item><title><![CDATA[New comment by diptanu in "Benchmarking the Most Reliable Document Parsing API"]]></title><description><![CDATA[
<p>OP mentioned Gemini, not Google’s Vertex OCR API, which has very different performance and accuracy characteristics from Gemini.</p>
]]></description><pubDate>Thu, 06 Nov 2025 19:48:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839489</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=45839489</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839489</guid></item><item><title><![CDATA[New comment by diptanu in "Benchmarking the Most Reliable Document Parsing API"]]></title><description><![CDATA[
<p>Hey! I am the founder of Tensorlake. We benchmarked the models that our customers consider using in enterprises or regulated industries, where there is a big need for processing documents for various kinds of automation. Benchmarking takes a lot of time, so we focused on the models we get asked about.<p>On Gemini and other VLMs: we excluded these models because they don't do visual grounding, i.e. they don't provide page layouts or bounding boxes of elements on the pages. This is a table-stakes feature for the use cases customers are building with Tensorlake; it wouldn't be possible to build citations without bounding boxes.<p>On pricing: we are probably the only company offering pure on-demand pricing without any tiers. With Tensorlake, you can get back markdown from every page, summaries of figures, tables, and charts, structured data, page classification, etc., all in one API call. That means we are running a bunch of different models under the hood. If you add up the token count and the complexity of building a pipeline around Gemini plus separate OCR/layout-detection models, I bet the price you end up with won't be any cheaper than ours :) Doing this at scale is also very complex; it requires a lot of sophisticated infrastructure, which is another source of cost behind modern document ingestion services.</p>
]]></description><pubDate>Thu, 06 Nov 2025 19:20:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=45839148</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=45839148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45839148</guid></item><item><title><![CDATA[Roles and Intelligence for Individual Contributors]]></title><description><![CDATA[
<p>Article URL: <a href="https://raees.me/blog/role-and-intelligence/">https://raees.me/blog/role-and-intelligence/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45559572">https://news.ycombinator.com/item?id=45559572</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 12 Oct 2025 16:42:43 +0000</pubDate><link>https://raees.me/blog/role-and-intelligence/</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=45559572</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45559572</guid></item><item><title><![CDATA[RAG isn't dead, the bar has gone up]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.tensorlake.ai/blog/advanced-rag">https://www.tensorlake.ai/blog/advanced-rag</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44956119">https://news.ycombinator.com/item?id=44956119</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 19 Aug 2025 20:48:09 +0000</pubDate><link>https://www.tensorlake.ai/blog/advanced-rag</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=44956119</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44956119</guid></item><item><title><![CDATA[New comment by diptanu in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>We parse PDFs to convert them to text in a linearized fashion, so the content can be used downstream: search engines, structured extraction, etc.</p>
]]></description><pubDate>Mon, 04 Aug 2025 04:23:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44782112</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=44782112</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44782112</guid></item><item><title><![CDATA[New comment by diptanu in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>Yeah we don't handle this yet.</p>
]]></description><pubDate>Mon, 04 Aug 2025 04:19:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=44782093</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=44782093</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44782093</guid></item><item><title><![CDATA[New comment by diptanu in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>Yes, this! We trained it on a ton of diverse document images to learn the reading order and layouts of documents :)</p>
]]></description><pubDate>Mon, 04 Aug 2025 04:19:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=44782090</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=44782090</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44782090</guid></item><item><title><![CDATA[New comment by diptanu in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>There are many cases where images are exported as PDFs. Think invoices or financial statements that people send to financial services companies. Using layout understanding and OCR-based techniques leads to much better results than writing a parser that relies on the file's metadata.<p>The other reason is segmenting a document and linearizing it so that an LLM can understand the content better. Layout understanding helps with figuring out the natural reading order of the various blocks on a page.</p>
]]></description><pubDate>Mon, 04 Aug 2025 04:18:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=44782088</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=44782088</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44782088</guid></item><item><title><![CDATA[New comment by diptanu in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>Disclaimer: I'm the founder of Tensorlake; we built a document parsing API for developers.<p>This is exactly why computer vision approaches to parsing PDFs work so well in the real world. Relying on metadata in the files just doesn't scale across different sources of PDFs.<p>We convert PDFs to images, run a layout understanding model on them first, then apply specialized models (text recognition, table recognition) to the detected regions, and stitch the results back together. That gets acceptable results in domains where accuracy is table stakes.</p>
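<p>A minimal sketch of the stitching logic described above, with hypothetical stubs standing in for the layout, text-recognition, and table models (function and field names here are illustrative, not Tensorlake's actual API):</p>

```python
# Vision-based PDF parsing pipeline sketch: detect layout regions on a page
# image, route each region to a specialized recognizer by type, then stitch
# the results back together in reading order. All model calls are stubs.

from dataclasses import dataclass

@dataclass
class Region:
    kind: str           # "text" or "table"
    reading_order: int  # position assigned by the layout model
    content: str        # raw pixels in a real system; a string here

def detect_layout(page_image):
    # Stub: a real layout model returns typed, ordered bounding boxes.
    return sorted(page_image, key=lambda r: r.reading_order)

def recognize_text(region):
    return region.content                 # stub for a text-recognition model

def recognize_table(region):
    return f"[table]\n{region.content}"   # stub for a table-structure model

def parse_page(page_image):
    parts = []
    for region in detect_layout(page_image):
        if region.kind == "table":
            parts.append(recognize_table(region))
        else:
            parts.append(recognize_text(region))
    return "\n\n".join(parts)  # linearized page, ready for downstream use

page = [
    Region("table", 2, "Qty | Price"),
    Region("text", 1, "Invoice #42"),
]
print(parse_page(page))
```

<p>The key point the sketch captures is that linearization order comes from the layout model, not from whatever order the elements happen to appear in the file's metadata.</p>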
]]></description><pubDate>Mon, 04 Aug 2025 00:13:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=44780982</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=44780982</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44780982</guid></item><item><title><![CDATA[Show HN: Tensorlake-Ingest, Parse, and Orchestrate Documents for AI Workflows]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.tensorlake.ai/blog/announcing-tensorlake-cloud">https://www.tensorlake.ai/blog/announcing-tensorlake-cloud</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43997563">https://news.ycombinator.com/item?id=43997563</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 15 May 2025 17:57:21 +0000</pubDate><link>https://www.tensorlake.ai/blog/announcing-tensorlake-cloud</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=43997563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43997563</guid></item><item><title><![CDATA[New comment by diptanu in "Ingesting PDFs and why Gemini 2.0 changes everything"]]></title><description><![CDATA[
<p>We started with using LLMs for parsing at Tensorlake (<a href="https://docs.tensorlake.ai" rel="nofollow">https://docs.tensorlake.ai</a>) and tried Qwen, Gemini, OpenAI, pretty much everything under the sun. My thought was that we could skip the 5-6 years of development IDP companies have put into specialized models by going to LLMs.<p>On information-dense pages, LLMs hallucinate half of the time; they have trouble understanding empty cells in tables, don't understand checkboxes, etc.<p>We had to invest heavily in building a state-of-the-art layout understanding model and, finally, a table structure understanding model for reliability. LLMs will get there, but they have some way to go.<p>Where they do well is VQA-type use cases: ask a narrowly scoped question and they work much better than OCR+layout models, because they are much more generalizable and flexible to use.</p>
]]></description><pubDate>Thu, 06 Feb 2025 08:51:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=42960441</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=42960441</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42960441</guid></item><item><title><![CDATA[New comment by diptanu in "Durable execution should be lightweight"]]></title><description><![CDATA[
<p>I don’t think what you are describing as heavy is that big of a deal if an external orchestration system is required only for deployment, while the workflow can be developed and tested without a server on a laptop or in a notebook.<p>Bringing orchestration logic into the app layer means more code is bundled with the app, which has its own tradeoffs, like pulling in dependencies that might conflict with application code.<p>In 2025, I would be surprised if a good workflow engine didn’t have a completely serverless development mode :)</p>
]]></description><pubDate>Mon, 03 Feb 2025 08:36:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=42916166</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=42916166</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42916166</guid></item><item><title><![CDATA[New comment by diptanu in "Running Durable Workflows in Postgres Using DBOS"]]></title><description><![CDATA[
<p>Great points. Besides performance, a centralized control plane with a distributed data plane is better for operability of schedulers as well. Some examples: rolling out new scheduler features, tracing scheduling behavior and decisions, and deploying configuration changes.<p>Even with a centralized scheduler, it should be possible to create a DevEx that uses decorators to author workflows easily.<p>We are doing that with Indexify (<a href="https://github.com/tensorlakeai/indexify">https://github.com/tensorlakeai/indexify</a>) for authoring data-intensive workflows that process unstructured data (documents, videos, etc.); it's like Spark but uses Python instead of Scala/SQL/UDFs.
Indexify's scheduler is centralized and uses RocksDB under the hood for persistence. Long term, we are moving to a hybrid storage system: S3 for less frequently updated data, and SSD for the read cache and frequently updated data (ongoing tasks).<p>The scheduler's latency for scheduling new tasks is consistently under 800 microseconds on SSDs.<p>This is how schedulers with a proven production track record have traditionally been designed: Borg, HashiCorp Nomad, etc. There are many ways a centralized scheduler can scale beyond a single machine; one approach is parallel scheduling by sharding jobs across node pools, then linearizing and deduplicating conflicts during writes.<p>Love DBOS and Hatchet! Cheering for you @jedberg and @abelanger :-)<p>Disclaimer: I am the founder of Tensorlake, and worked on Nomad and Apache Mesos in the past.</p>
]]></description><pubDate>Wed, 11 Dec 2024 08:54:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=42386009</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=42386009</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42386009</guid></item><item><title><![CDATA[Show HN: rerank-ts – TypeScript Library for Re-Ranking Search Results with LLMs]]></title><description><![CDATA[
<p>Hi HN, we are announcing a TypeScript library for re-ranking search results from vector databases or full-text search indexes. Re-ranking is a very important step in retrieval for building RAG applications; it almost immediately improves the accuracy of the LLM's response synthesis, because you are able to feed in much more accurate and relevant context. Why? While semantic or full-text search systems are designed to be fast and fetch semantically or lexically close document chunks, they don't rank the chunks based on the intent of the user's query. This is where a re-ranker comes in.<p>Why did we build this?<p>We couldn't find a self-contained, framework-independent re-ranking library for TypeScript. We also wanted to swap models and algorithms easily, and to publish and track latency metrics closely.
We implemented two different re-ranking algorithms -
1. LLM-based re-ranking: it uses the algorithm presented in the paper "Is ChatGPT Good at Search?" (<a href="https://arxiv.org/abs/2304.09542" rel="nofollow">https://arxiv.org/abs/2304.09542</a>), which re-ranks with a sliding window so the result set can be larger than the context length of the LLM. We added support for Llama 3 and GPT-4. For Llama 3 we use Groq, but other model providers can be added easily.
2. Reciprocal Rank Fusion: a lightweight algorithm to merge search results from more than one index while preserving their relative importance.<p>We recently built a consumer application that indexes hundreds of thousands of images using Indexify (<a href="https://getindexfiy.ai" rel="nofollow">https://getindexfiy.ai</a>). It indexes various aspects of each image: the caption (using a text embedding model), visual descriptions (using a VLM), and CLIP embeddings. During retrieval we look up 40 images from each index based on the user's query, then use this library to re-rank the results. The results are frankly amazing compared to not re-ranking at all.<p>Latency -
Latency is a big deal for applications humans interact with, and Llama 3 8B on Groq is the fastest LLM re-ranker in our experience; it processes ~1000 tokens/s, and we are able to re-rank 100 images in roughly 1.4 seconds. We haven't tried GPT-4 in production, so we can't share latency numbers for it.<p>Choosing a model -
Pick the model with the best tradeoff between latency and accuracy. There are many smaller re-ranking models available: Jina AI, BGE, Sentence Transformers, etc. We have heard good things about Cohere's re-ranker as well. We hope to add support for more models in the future; the library has a clean model provider interface, so contributions are welcome!<p>We hope this helps developers building RAG or other search-based LLM applications in React or any other JavaScript framework. We'd love to hear your thoughts and feedback to improve the library!</p>
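<p>For reference, Reciprocal Rank Fusion (algorithm 2 above) is simple enough to sketch in a few lines; here is a language-agnostic sketch in Python rather than the library's TypeScript. Each document scores 1/(k + rank) per list it appears in, and the scores are summed (k=60 is the conventional constant):</p>

```python
# Reciprocal Rank Fusion: merge ranked result lists from multiple indexes.
# A document's fused score is the sum over lists of 1 / (k + rank), so items
# ranked highly in several lists rise to the top.

def rrf_merge(ranked_lists, k=60):
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort document ids by fused score, highest first.
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["a", "b", "c"]   # hits from a vector index
fulltext = ["b", "d", "a"]   # hits from a full-text index
print(rrf_merge([semantic, fulltext]))  # "b" wins: rank 2 + rank 1
```

<p>Note how "b" beats "a" even though "a" was ranked first in one list: appearing near the top of both lists outweighs a single first place.</p>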
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40646164">https://news.ycombinator.com/item?id=40646164</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 11 Jun 2024 13:50:04 +0000</pubDate><link>https://github.com/tensorlakeai/rerank-ts</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=40646164</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40646164</guid></item><item><title><![CDATA[New comment by diptanu in "Breaking up is hard to do: Chunking in RAG applications"]]></title><description><![CDATA[
<p>We use grid search to figure out the best chunking strategy. Create a bunch of different strategies (recursive chunking, semantic chunking, etc.), parameterize them, and see which one works best. The "best" chunking strategy depends on the nature of the documents and the questions being asked.</p>
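<p>A minimal sketch of that grid search, with a stand-in scoring function (a real harness would run retrieval end to end against known question/answer pairs; the fixed-size strategy and metric here are illustrative assumptions):</p>

```python
# Grid search over chunking parameters: generate chunks for each candidate
# setting, score them with an evaluation function, and keep the best.

def fixed_chunks(text, size):
    # One candidate strategy: fixed-size character chunks.
    return [text[i:i + size] for i in range(0, len(text), size)]

def evaluate(chunks, qa_pairs):
    # Stand-in metric: fraction of answers fully contained in some chunk.
    hits = sum(any(ans in c for c in chunks) for _, ans in qa_pairs)
    return hits / len(qa_pairs)

def grid_search(text, qa_pairs, sizes=(8, 16, 32)):
    # Returns (chunk_size, score) for the best-scoring setting;
    # ties go to the first candidate tried.
    return max(
        ((s, evaluate(fixed_chunks(text, s), qa_pairs)) for s in sizes),
        key=lambda t: t[1],
    )

doc = "alpha beta gamma delta epsilon"
qa = [("q1", "gamma"), ("q2", "delta")]
size, score = grid_search(doc, qa)
```

<p>The same loop extends to multiple strategies by putting (strategy, params) tuples in the grid instead of just sizes.</p>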
]]></description><pubDate>Sat, 08 Jun 2024 17:38:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=40619103</link><dc:creator>diptanu</dc:creator><comments>https://news.ycombinator.com/item?id=40619103</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40619103</guid></item></channel></rss>