<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: pronoiac</title><link>https://news.ycombinator.com/user?id=pronoiac</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 07 Apr 2026 10:23:05 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=pronoiac" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by pronoiac in "1 Trillion Web Pages Archived"]]></title><description><![CDATA[
<p>I <i>think</i> SciOp is doing something in that area, with a catalog site and webseeds. <a href="https://sciop.net/" rel="nofollow">https://sciop.net/</a></p>
]]></description><pubDate>Mon, 06 Oct 2025 13:43:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=45491352</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=45491352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45491352</guid></item><item><title><![CDATA[New comment by pronoiac in "1 Trillion Web Pages Archived"]]></title><description><![CDATA[
<p>The Archive Team - not part of the Internet Archive - worked on a distributed backup of a portion of the Internet Archive - <a href="https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK" rel="nofollow">https://wiki.archiveteam.org/index.php/INTERNETARCHIVE.BAK</a><p>It's been dormant / on hiatus for a few years now.</p>
]]></description><pubDate>Mon, 06 Oct 2025 13:31:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=45491237</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=45491237</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45491237</guid></item><item><title><![CDATA[New comment by pronoiac in "Building the heap: racking 30 petabytes of hard drives for pretraining"]]></title><description><![CDATA[
<p>I wonder if they'll go with "toploaders" - like Backblaze Storage Pods - later. They have better density and faster setup, as they don't have to screw in every drive.<p>They got used drives. I wonder if they did any testing? I've gotten used drives that were DOA, which showed up in tests - SMART tests, short and long, then writing pseudorandom data to verify capacity.</p>
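<p>A minimal sketch of the "write pseudorandom data, then verify" step mentioned above, assuming Python 3.9+ and run against an ordinary file rather than a raw device. The function names, block size, and seed are hypothetical choices for illustration, not the commenter's actual procedure. Reading back through the OS page cache does not prove the data reached the disk, so a real capacity test would need to bypass or flush the cache.</p>

```python
import os
import random

BLOCK_SIZE = 1024 * 1024  # 1 MiB per block; an arbitrary choice for this sketch


def write_pattern(path, total_bytes, seed):
    """Fill `path` with `total_bytes` of seeded pseudorandom data."""
    rng = random.Random(seed)
    written = 0
    with open(path, "wb") as f:
        while written < total_bytes:
            chunk = rng.randbytes(min(BLOCK_SIZE, total_bytes - written))
            f.write(chunk)
            written += len(chunk)
        # Ask the OS to push the data to the device before we return.
        f.flush()
        os.fsync(f.fileno())
    return written


def verify_pattern(path, total_bytes, seed):
    """Regenerate the same pseudorandom stream and compare it to the file."""
    rng = random.Random(seed)
    checked = 0
    with open(path, "rb") as f:
        while checked < total_bytes:
            expected = rng.randbytes(min(BLOCK_SIZE, total_bytes - checked))
            actual = f.read(len(expected))
            if actual != expected:
                return False  # short read or corrupted block
            checked += len(expected)
    return True
```

<p>Because the data comes from a seeded generator, nothing has to be kept in memory for the comparison, and a drive that misreports its capacity should fail verification once the writes wrap around or get dropped.</p>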
]]></description><pubDate>Wed, 01 Oct 2025 17:10:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45440231</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=45440231</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45440231</guid></item><item><title><![CDATA[New comment by pronoiac in "Apple Notes Will Gain Markdown Export at WWDC, and, I Have Thoughts"]]></title><description><![CDATA[
<p>There's a flamewar detector, which triggers when there are far more comments than upvotes.</p>
]]></description><pubDate>Thu, 05 Jun 2025 14:37:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=44192125</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=44192125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44192125</guid></item><item><title><![CDATA[New comment by pronoiac in "Find the Odd Disk"]]></title><description><![CDATA[
<p>Kern Type, perhaps? <a href="https://type.method.ac/" rel="nofollow">https://type.method.ac/</a></p>
]]></description><pubDate>Mon, 21 Apr 2025 05:36:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=43748755</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43748755</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43748755</guid></item><item><title><![CDATA[New comment by pronoiac in "Microsoft’s original source code"]]></title><description><![CDATA[
<p>Feel free to run EasyOCR against it and submit a PR</p>
]]></description><pubDate>Fri, 04 Apr 2025 21:26:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43587741</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43587741</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43587741</guid></item><item><title><![CDATA[New comment by pronoiac in "Microsoft’s original source code"]]></title><description><![CDATA[
<p>I attempted OCR, and while it's not great, it's a start. I considered adding a reference to "software wants to be free!" or the Open Letter, but I'm winding down for the night. <a href="https://github.com/pronoiac/altair-basic-source-code">https://github.com/pronoiac/altair-basic-source-code</a></p>
]]></description><pubDate>Fri, 04 Apr 2025 07:26:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=43579267</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43579267</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43579267</guid></item><item><title><![CDATA[New comment by pronoiac in "Microsoft’s original source code"]]></title><description><![CDATA[
<p>I attempted OCR with OCRmyPDF / Tesseract. It's not great, but it's under 1% of the size, at least. <a href="https://github.com/pronoiac/altair-basic-source-code">https://github.com/pronoiac/altair-basic-source-code</a></p>
]]></description><pubDate>Fri, 04 Apr 2025 07:22:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43579242</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43579242</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43579242</guid></item><item><title><![CDATA[New comment by pronoiac in "Testing DVD-R and CD-R 25 years later: optical disks from Japan"]]></title><description><![CDATA[
<p>Checking diskprices.com - <a href="https://diskprices.com/?locale=us&condition=new,used&disk_types=bdrw,bdr,dvdrw,dvdr,cdrw,cdr" rel="nofollow">https://diskprices.com/?locale=us&condition=new,used&disk_ty...</a> - there's a cheaper outlier for DVD-R, then it's 25GB BD-Rs for a bit.<p>LTO tape can be cheaper, but the cost of the drives has long been an obstacle to dabbling.</p>
]]></description><pubDate>Wed, 02 Apr 2025 06:23:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43554127</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43554127</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43554127</guid></item><item><title><![CDATA[New comment by pronoiac in "Show HN: OCR Benchmark Focusing on Automation"]]></title><description><![CDATA[
<p>I've used ocrit, which uses those APIs. <a href="https://github.com/insidegui/ocrit" rel="nofollow">https://github.com/insidegui/ocrit</a><p>There are also:<p>* swiftocr - <a href="https://github.com/fny/swiftocr" rel="nofollow">https://github.com/fny/swiftocr</a><p>* macos-vision-ocr - <a href="https://github.com/bytefer/macos-vision-ocr" rel="nofollow">https://github.com/bytefer/macos-vision-ocr</a></p>
]]></description><pubDate>Sat, 15 Mar 2025 02:44:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43369469</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43369469</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43369469</guid></item><item><title><![CDATA[New comment by pronoiac in "Fediverse Donut Club"]]></title><description><![CDATA[
<p>They asked for something like Bluesky starter packs on <i>Mastodon,</i> not Bluesky starter packs on <i>Bluesky.</i></p>
]]></description><pubDate>Fri, 14 Mar 2025 22:12:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=43367817</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43367817</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43367817</guid></item><item><title><![CDATA[New comment by pronoiac in "Fediverse Donut Club"]]></title><description><![CDATA[
<p>I knew I'd seen something, but I just searched for it; Fedidevs have something like that - <a href="https://fedidevs.com/starter-packs/" rel="nofollow">https://fedidevs.com/starter-packs/</a></p>
]]></description><pubDate>Fri, 14 Mar 2025 16:32:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=43364250</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43364250</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43364250</guid></item><item><title><![CDATA[New comment by pronoiac in "Ask HN: Where are the good Markdown to PDF tools (that meet these requirements)?"]]></title><description><![CDATA[
<p>I think Pandoc and Calibre could work for you.<p>I've worked on PAIP, Paradigms of Artificial Intelligence Programming, and I might be able to help you a bit. It's around 1k pages long. I used Pandoc to generate an epub file, and then Calibre to turn that into a PDF file. I just tried using Pandoc to generate the PDF file directly, and it/LaTeX choked on some Unicode characters.<p>For internal ebook links, there's a Lua script. You'll have to keep anchors unique across the book for this:<p>* good: "chapter1#section1_1" and "chapter2#section2_1"<p>* bad: a "chapter1#section1" and a "chapter2#section1"<p>WIP: <a href="https://github.com/norvig/paip-lisp/pull/195">https://github.com/norvig/paip-lisp/pull/195</a><p>For line wrapping of code, there's CSS. I first used it over on "Writing an Operating System in 1,000 Lines"; here's the PR: <a href="https://github.com/nuta/operating-system-in-1000-lines/pull/52/files">https://github.com/nuta/operating-system-in-1000-lines/pull/...</a></p>
]]></description><pubDate>Sun, 02 Mar 2025 19:14:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=43233889</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43233889</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43233889</guid></item><item><title><![CDATA[New comment by pronoiac in "How to run GUI applications directly in containers"]]></title><description><![CDATA[
<p>I've run an X app from Docker, a Linux container on a macOS host. I was able to move the incantations to a Makefile: <a href="https://github.com/ryanfb/docker_scantailor">https://github.com/ryanfb/docker_scantailor</a></p>
]]></description><pubDate>Thu, 27 Feb 2025 18:24:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43196970</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43196970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43196970</guid></item><item><title><![CDATA[New comment by pronoiac in "These years in Common Lisp: 2023-2024 in review"]]></title><description><![CDATA[
<p>I've worked on PAIP, and I think the GitHub.com version - <a href="https://github.com/norvig/paip-lisp/">https://github.com/norvig/paip-lisp/</a> - gets more attention than the GitHub.io version linked here. The GitHub.io version automatically gets updates, I think, but I'm not verifying the Markdown works over there.</p>
]]></description><pubDate>Sat, 22 Feb 2025 15:17:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43139702</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43139702</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43139702</guid></item><item><title><![CDATA[New comment by pronoiac in "Ask HN: What is the best method for turning a scanned book as a PDF into text?"]]></title><description><![CDATA[
<p>It's still in progress! It's looong - about a thousand pages. There's an ebook, but the printed book got more editing.</p>
]]></description><pubDate>Sun, 16 Feb 2025 18:43:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=43070427</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43070427</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43070427</guid></item><item><title><![CDATA[New comment by pronoiac in "Ask HN: What is the best method for turning a scanned book as a PDF into text?"]]></title><description><![CDATA[
<p>I made a high-quality scan of PAIP (Paradigms of Artificial Intelligence Programming), and worked on OCR'ing and incorporating that into an admittedly imperfect git repo of Markdown files. I used Scantailor to deskew and do other adjustments before applying Tesseract, via OCRmyPDF. I wrote notes for some of my process over at <a href="https://github.com/norvig/paip-lisp/releases/tag/v1.2">https://github.com/norvig/paip-lisp/releases/tag/v1.2</a> .<p>I'd also tried ocrit, which uses Apple's Vision framework for OCR, with some success - <a href="https://github.com/insidegui/ocrit">https://github.com/insidegui/ocrit</a><p>It's an ongoing, iterative process. I'll watch this thread with interest.<p>Some recent threads that might be helpful:<p>* <a href="https://news.ycombinator.com/item?id=42443022">https://news.ycombinator.com/item?id=42443022</a> - Show HN: Adventures in OCR<p>* <a href="https://news.ycombinator.com/item?id=43045801">https://news.ycombinator.com/item?id=43045801</a> - Benchmarking vision-language models on OCR in dynamic video environments - driscoll42 posted some stats from research<p>* <a href="https://news.ycombinator.com/item?id=43043671">https://news.ycombinator.com/item?id=43043671</a> - OCR4all<p>(Meaning, I have these browser tabs open, I haven't fully digested them yet)</p>
]]></description><pubDate>Sun, 16 Feb 2025 18:08:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=43070126</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=43070126</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43070126</guid></item><item><title><![CDATA[New comment by pronoiac in "Douglas McIlroy responds to Unix spell article with new implementation details"]]></title><description><![CDATA[
<p>> The compression was trivial: store a suffix preceded by one byte that contained the length of the prefix that the word shared with its predecessor in dictionary order.<p>Oh, that looks familiar; the database for the locate command uses something similar - <a href="https://www.gnu.org/software/findutils/manual/html_node/find_html/LOCATE02-Database-Format.html" rel="nofollow">https://www.gnu.org/software/findutils/manual/html_node/find...</a></p>
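<p>A minimal sketch of the prefix compression described in the quote, assuming a sorted word list and storing each entry as a (shared prefix length, suffix) pair. The function names and sample words are hypothetical, and this is neither McIlroy's code nor the exact locate database format.</p>

```python
def front_encode(words):
    """Encode a sorted word list as (shared_prefix_length, suffix) pairs."""
    encoded = []
    previous = ""
    for word in words:
        # Count leading characters shared with the previous word.
        # Capped at 255 so the count would fit in one byte.
        shared = 0
        limit = min(len(word), len(previous), 255)
        while shared < limit and word[shared] == previous[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        previous = word
    return encoded


def front_decode(encoded):
    """Rebuild the word list by reusing the prefix of the previous word."""
    words = []
    previous = ""
    for shared, suffix in encoded:
        word = previous[:shared] + suffix
        words.append(word)
        previous = word
    return words


words = ["spell", "spelled", "speller", "spelling", "spend"]
encoded = front_encode(words)
# encoded == [(0, "spell"), (5, "ed"), (6, "r"), (5, "ing"), (3, "nd")]
```

<p>Sorted neighbours tend to share long prefixes, so most entries shrink to a count plus a short suffix.</p>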
]]></description><pubDate>Sun, 09 Feb 2025 23:37:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=42995298</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=42995298</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42995298</guid></item><item><title><![CDATA[New comment by pronoiac in "The Taylorator – All Your Frequencies Are Belong to Us"]]></title><description><![CDATA[
<p>Covering all frequencies? No Blank Space?</p>
]]></description><pubDate>Tue, 28 Jan 2025 02:07:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=42848131</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=42848131</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42848131</guid></item><item><title><![CDATA[New comment by pronoiac in "Migrating Away from Bcachefs"]]></title><description><![CDATA[
<p>> Maybe yours did much worse because you aren't splitting files into subdirectories but creating them all in one?<p>No, and also, I'd expect that to be awful. 1000 folders, each with 1000 folders, each with 1000 files.<p>Those Arxiv and Phoronix links are great!</p>
]]></description><pubDate>Fri, 24 Jan 2025 05:17:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=42810728</link><dc:creator>pronoiac</dc:creator><comments>https://news.ycombinator.com/item?id=42810728</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42810728</guid></item></channel></rss>