<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: rapjul</title><link>https://news.ycombinator.com/user?id=rapjul</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 05:42:55 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=rapjul" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by rapjul in "Show HN: Kreuzberg – Modern async Python library for document text extraction"]]></title><description><![CDATA[
<p>Docling works quite well for me for converting a scanned book PDF to Markdown text.<p>On the command line, first install `uv` from <a href="https://github.com/astral-sh/uv?tab=readme-ov-file#installation">https://github.com/astral-sh/uv?tab=readme-ov-file#installat...</a>, then run `uv tool install -U "docling[tesserocr,ocrmac,vlm]"`. The extras in the brackets are tesserocr, ocrmac (macOS only), and vlm (for running a small image-to-text model to get descriptions of images).<p>See <a href="https://github.com/DS4SD/docling/blob/main/pyproject.toml#L124">https://github.com/DS4SD/docling/blob/main/pyproject.toml#L1...</a> for all the extra installation options.<p>For cached/offline use, run `docling-tools models download` to download their models ahead of time.</p>
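<p>Since ocrmac is macOS only, the extras list probably needs to change per platform. Here is a rough sketch of how I might script that. It only builds the two commands mentioned above and prints them; it does not run anything. The names `EXTRAS`, `docling_extras` and `setup_commands` are made up for illustration and are not part of Docling or uv.</p>

```python
import platform
import shlex

# Extras named in the comment above. The value is the platform.system()
# name an extra is restricted to, or None if it is not restricted.
# Assumption: only ocrmac is platform-specific (macOS reports "Darwin").
EXTRAS = {"tesserocr": None, "ocrmac": "Darwin", "vlm": None}

Command = list[str]


def docling_extras(system: str) -> list[str]:
    """Return the extras to request on the given platform.system() value."""
    return [name for name, only_on in EXTRAS.items() if only_on in (None, system)]


def setup_commands(system: str) -> list[Command]:
    """Build, but do not run, the install and model-download commands."""
    requirement = "docling[" + ",".join(docling_extras(system)) + "]"
    return [
        ["uv", "tool", "install", "-U", requirement],
        ["docling-tools", "models", "download"],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands for the current machine to copy and paste.
    for cmd in setup_commands(platform.system()):
        print(shlex.join(cmd))
```

<p>Printing instead of executing keeps this safe to try; if you want it to actually install, each list could be passed to `subprocess.run(cmd, check=True)`.</p>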
]]></description><pubDate>Sat, 22 Feb 2025 01:04:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43135014</link><dc:creator>rapjul</dc:creator><comments>https://news.ycombinator.com/item?id=43135014</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43135014</guid></item></channel></rss>