<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: UglyToad</title><link>https://news.ycombinator.com/user?id=UglyToad</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 16 Apr 2026 23:52:43 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=UglyToad" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by UglyToad in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>Yes, this is generally the fallback approach if locating objects via the index (xref) fails. It is slightly slower, but it's a one-time cost, though I imagine it was a lot slower on the machines of the time, back when PDFs were first used.</p>
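<p>For illustration, here is a minimal C# sketch of that fallback. It is a hypothetical helper, not PdfPig's actual code. It assumes .NET 5+ for <code>Encoding.Latin1</code>, and it assumes that a later header for the same object number and generation wins, as it would after an incremental update. It scans the raw bytes for <code>N G obj</code> headers and records their byte offsets:</p>

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical recovery helper: finds "<number> <generation> obj" headers in the raw file.
public static class BruteForceObjectScanner
{
    // Digits not preceded by another digit, whitespace, digits, whitespace, then "obj".
    private static readonly Regex ObjectHeader =
        new Regex(@"(?<![0-9])(\d+)\s+(\d+)\s+obj\b", RegexOptions.Compiled);

    public static Dictionary<(int Number, int Generation), long> Scan(byte[] file)
    {
        // Latin-1 maps each byte to exactly one char, so string indices equal byte offsets.
        var text = Encoding.Latin1.GetString(file);
        var offsets = new Dictionary<(int Number, int Generation), long>();

        foreach (Match match in ObjectHeader.Matches(text))
        {
            // Skip numbers too large to be real object numbers (e.g. junk in stream data).
            if (!int.TryParse(match.Groups[1].Value, out var number) ||
                !int.TryParse(match.Groups[2].Value, out var generation))
            {
                continue;
            }

            // Assumption: a later header for the same key (e.g. from an incremental
            // update appended to the file) supersedes an earlier one.
            offsets[(number, generation)] = match.Index;
        }

        return offsets;
    }
}
```

<p>A real scanner also has to cope with false matches inside stream data. It also has to cope with objects stored inside compressed object streams (PDF 1.5+), which have no top-level header for this kind of scan to find.</p>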
]]></description><pubDate>Mon, 04 Aug 2025 14:21:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=44786189</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=44786189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44786189</guid></item><item><title><![CDATA[New comment by UglyToad in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>If you don't have a known set of PDF producers, this is really the only way to safely consume PDF content. Type 3 fonts alone make pulling text content out unreliable or impossible, before even getting to PDFs containing images of scans.<p>I'd expect current LLMs to significantly improve on the previous ways of doing this, e.g. Tesseract, when given an image input. Is there any test you're aware of for model capabilities when it comes to ingesting PDFs?</p>
]]></description><pubDate>Mon, 04 Aug 2025 00:08:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44780954</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=44780954</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44780954</guid></item><item><title><![CDATA[New comment by UglyToad in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>You're right, this was a fairly common failure state in the sample set. The previous reference, or one further along the reference chain, would point to an offset of 0 or outside the bounds of the file, or just be plain wrong.<p>What prompted this post was trying to rewrite the initial parse logic for my project PdfPig[0]. I had originally ported the Java PDFBox code but felt it should be 'simple' to rewrite it to be more performant. The new logic falls back to a brute-force scan of the entire file if even a single xref table or stream is missed, and then relies only on the offsets found in that recovery path.<p>However, it is considerably slower than the code it replaces, and it's hard to have confidence in the changes. I'm currently running through a 10,000-file test set trying to identify edge cases.<p>[0]: <a href="https://github.com/UglyToad/PdfPig/pull/1102">https://github.com/UglyToad/PdfPig/pull/1102</a></p>
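<p>The following C# sketch is a hypothetical check for the failure modes above. The names are illustrative and are not PdfPig's API. It rejects offsets of 0 or past the end of the file. It then confirms that the bytes at the offset really are the expected <code>N G obj</code> header before trusting the xref entry. Tolerating leading whitespace is an assumption about how lenient to be:</p>

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical sanity check for an uncompressed (type 1) xref entry.
public static class XrefOffsetValidator
{
    private static readonly Regex HeaderAtStart =
        new Regex(@"^\s*(\d+)\s+(\d+)\s+obj\b", RegexOptions.Compiled);

    public static bool IsPlausible(byte[] file, long offset, int number, int generation)
    {
        // Offset 0 is normally the %PDF header, and anything at or past EOF is bogus.
        if (offset <= 0 || offset >= file.Length)
        {
            return false;
        }

        // Decode a short window; Latin-1 keeps one char per byte.
        var length = (int)Math.Min(64, file.Length - offset);
        var window = Encoding.Latin1.GetString(file, (int)offset, length);

        // Assumption: tolerate leading whitespace before the header.
        var match = HeaderAtStart.Match(window);
        return match.Success
            && int.TryParse(match.Groups[1].Value, out var foundNumber) && foundNumber == number
            && int.TryParse(match.Groups[2].Value, out var foundGeneration) && foundGeneration == generation;
    }
}
```

<p>If the check fails for any entry, that is the signal to drop into the brute-force recovery scan. Entries for objects inside object streams (type 2) carry no byte offset, so they would need a different check.</p>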
]]></description><pubDate>Mon, 04 Aug 2025 00:03:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=44780927</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=44780927</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44780927</guid></item><item><title><![CDATA[New comment by UglyToad in "So you want to parse a PDF?"]]></title><description><![CDATA[
<p>Yes, you're right: there are Linearized PDFs, which are organized to enable parsing and display of the first page(s) without downloading the full file. I left those out of the summary for now because they have a substantial appendix of their own.</p>
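<p>Detecting one is cheap, because the spec (ISO 32000, Annex F) requires the linearization parameter dictionary to sit entirely within the first 1024 bytes of the file. Here is a rough C# heuristic. The class name is hypothetical, and it assumes .NET 5+ for <code>Encoding.Latin1</code>:</p>

```csharp
using System;
using System.Text;

// Heuristic only: looks for the /Linearized key in the first 1024 bytes.
public static class LinearizationProbe
{
    private const int ProbeLength = 1024;

    public static bool LooksLinearized(byte[] file)
    {
        var length = Math.Min(ProbeLength, file.Length);
        // Latin-1 decodes any byte sequence without throwing.
        var head = Encoding.Latin1.GetString(file, 0, length);
        return head.Contains("/Linearized", StringComparison.Ordinal);
    }
}
```

<p>This is only a hint. If the file's actual length doesn't match the dictionary's <code>/L</code> value (e.g. after an incremental update), the linearization data should be ignored and the file parsed normally.</p>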
]]></description><pubDate>Sun, 03 Aug 2025 23:23:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=44780685</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=44780685</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44780685</guid></item><item><title><![CDATA[So you want to parse a PDF?]]></title><description><![CDATA[
<p>Article URL: <a href="https://eliot-jones.com/2025/8/pdf-parsing-xref">https://eliot-jones.com/2025/8/pdf-parsing-xref</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44780353">https://news.ycombinator.com/item?id=44780353</a></p>
<p>Points: 408</p>
<p># Comments: 230</p>
]]></description><pubDate>Sun, 03 Aug 2025 22:24:29 +0000</pubDate><link>https://eliot-jones.com/2025/8/pdf-parsing-xref</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=44780353</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44780353</guid></item><item><title><![CDATA[New comment by UglyToad in "Americans see their savings vanish in Synapse fintech crisis"]]></title><description><![CDATA[
<p>FWIW they are acting; these things just take a while. The current comment-gathering phase ends December 2nd: <a href="https://www.fdic.gov/news/press-releases/2024/fdic-proposes-deposit-insurance-recordkeeping-rule-banks-third-party" rel="nofollow">https://www.fdic.gov/news/press-releases/2024/fdic-proposes-...</a></p>
]]></description><pubDate>Sat, 23 Nov 2024 09:00:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=42219905</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=42219905</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42219905</guid></item><item><title><![CDATA[New comment by UglyToad in "GLP-1 for Everything"]]></title><description><![CDATA[
<p>The point, which seems to be routinely and massively downvoted on here, is that both things can be true at once:<p>- these drugs are good and a paradigm shift in the treatment of obesity (and have other benefits)<p>- we must not lose sight of the need to address a thoroughly sick food industry that leaves so many people needing them: junk food advertising, lack of subsidies for fresh vegetables, HFCS, food deserts, etc.<p>Chile is experimenting with banning junk food ads aimed at children and is seeing some early behaviour changes.<p>The point people seem to be wilfully missing is that we can have these drugs and also advocate for cracking down on a food system that deliberately poisons everyone in society. Putting everyone on this drug because we shrug and say "free market innit" while big corps continue to feed us crap is obviously not a solution.</p>
]]></description><pubDate>Tue, 29 Oct 2024 21:53:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=41989804</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=41989804</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41989804</guid></item><item><title><![CDATA[China's social credit score – untangling myth from reality (2022)]]></title><description><![CDATA[
<p>Article URL: <a href="https://merics.org/en/comment/chinas-social-credit-score-untangling-myth-reality">https://merics.org/en/comment/chinas-social-credit-score-untangling-myth-reality</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41147061">https://news.ycombinator.com/item?id=41147061</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 03 Aug 2024 15:15:31 +0000</pubDate><link>https://merics.org/en/comment/chinas-social-credit-score-untangling-myth-reality</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=41147061</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41147061</guid></item><item><title><![CDATA[New comment by UglyToad in "C# almost has implicit interfaces"]]></title><description><![CDATA[
<p>The idea with the named delegate is that if you need some way to get a user's email, you declare:<p><pre><code>    delegate Task<string> GetUserEmail(int userId);
</code></pre>
This provides more guidance than accepting a:<p><pre><code>    Func<int, Task<string>> getUserEmail
</code></pre>
If you can annotate implementations of the delegate, the tooling support becomes even nicer. In my ideal C#-like language, not all Funcs with the same shape have the same semantics.<p>Edit: I completely forgot the main reason, which is that a DI container can correctly inject the named delegate into the constructor for you, whereas with Func you can only register a single implementation per shape in each container.</p>
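<p>Here is a rough C# sketch of the DI point. It uses a toy container keyed by <code>Type</code> as a stand-in for a real one; <code>TinyContainer</code>, <code>GetUserName</code> and <code>WelcomeMailer</code> are hypothetical names. Two named delegates with identical shapes are distinct types, so both can be registered and resolved. Two registrations of the same <code>Func</code> shape would share a single key, and the second would overwrite the first:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Same shape, different meaning: these are two distinct delegate types.
public delegate Task<string> GetUserEmail(int userId);
public delegate Task<string> GetUserName(int userId);

// Toy container keyed by Type; a stand-in for a real DI container, not its API.
public sealed class TinyContainer
{
    private readonly Dictionary<Type, object> _registrations = new();

    public void Register<T>(T instance) where T : class => _registrations[typeof(T)] = instance;

    public T Resolve<T>() where T : class => (T)_registrations[typeof(T)];
}

public sealed class WelcomeMailer
{
    private readonly GetUserEmail _getUserEmail;
    private readonly GetUserName _getUserName;

    // The constructor names exactly which capabilities it needs.
    public WelcomeMailer(GetUserEmail getUserEmail, GetUserName getUserName)
    {
        _getUserEmail = getUserEmail;
        _getUserName = getUserName;
    }

    public async Task<string> ComposeAsync(int userId) =>
        $"To: {await _getUserEmail(userId)}, Hi {await _getUserName(userId)}";
}
```

<p>A real container would resolve the constructor parameters automatically. The point here is only that the delegate type itself carries the meaning.</p>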
]]></description><pubDate>Fri, 26 Jul 2024 21:03:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=41082260</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=41082260</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41082260</guid></item><item><title><![CDATA[New comment by UglyToad in "C# almost has implicit interfaces"]]></title><description><![CDATA[
<p>My aim is to use dependency injection to inject the minimal dependency and nothing more, as opposed to the grab bag every interface in a medium-complexity C# project eventually devolves into.<p>I've had this on my blogpost-to-write backlog for a year at this point, but in every project I've worked on, an interface eventually becomes a holding zone for related but disparate concepts. So when you inject the whole interface, it becomes unclear what the dependency actually <i>is</i>.<p>E.g. you have some service that does data access for users, then someone adds some Salesforce stuff, or a notification call, or whatever. Now any class consuming that service could be doing a bunch of different things.<p>The idea is basically single-method interfaces without the overhead of writing the interface: just being able to pass around free functions, but with the superior DevX most C# tooling offers.<p>I guess I want a more functional C# without having to learn F#, which I've tried a few times and bounced off.</p>
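<p>Here is a hedged C# sketch of the idea, with hypothetical names throughout. The consumer depends on one named delegate rather than the whole grab-bag interface, so its constructor says exactly what it uses:</p>

```csharp
using System.Threading.Tasks;

// The "grab bag": consumers can't tell which members they actually depend on.
public interface IUserService
{
    Task<string> LoadDisplayNameAsync(int userId);
    Task SyncToSalesforceAsync(int userId);
    Task SendNotificationAsync(int userId, string message);
}

// The minimal dependency, expressed as a named delegate.
public delegate Task<string> LoadDisplayName(int userId);

public sealed class InvoiceHeaderBuilder
{
    private readonly LoadDisplayName _loadDisplayName;

    // The constructor now states exactly one capability.
    public InvoiceHeaderBuilder(LoadDisplayName loadDisplayName) =>
        _loadDisplayName = loadDisplayName;

    public async Task<string> BuildAsync(int userId, int invoiceNumber) =>
        $"Invoice #{invoiceNumber} for {await _loadDisplayName(userId)}";
}
```

<p>In production, the existing service can still supply the behaviour via a method group, e.g. <code>new InvoiceHeaderBuilder(userService.LoadDisplayNameAsync)</code>. A test can just pass a lambda such as <code>id => Task.FromResult("Ada")</code>, with no mocking library.</p>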
]]></description><pubDate>Fri, 26 Jul 2024 21:00:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=41082228</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=41082228</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41082228</guid></item><item><title><![CDATA[Alfredo Moser: Bottle light inventor (2013)]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.bbc.co.uk/news/magazine-23536914">https://www.bbc.co.uk/news/magazine-23536914</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41082058">https://news.ycombinator.com/item?id=41082058</a></p>
<p>Points: 58</p>
<p># Comments: 14</p>
]]></description><pubDate>Fri, 26 Jul 2024 20:38:03 +0000</pubDate><link>https://www.bbc.co.uk/news/magazine-23536914</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=41082058</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41082058</guid></item><item><title><![CDATA[New comment by UglyToad in "C# almost has implicit interfaces"]]></title><description><![CDATA[
<p>I've been experimenting with this; it makes testing trivial and removes the coupling that inevitably creeps in with multi-method interfaces.<p>However, I think there's one missing enhancement, which the language will never get, that would turn it from esoteric and difficult to reason about into actually usable.<p>That is being able to indicate that a method implements a delegate, so that compilation errors and find-references work much more easily.<p>E.g. suppose you have:<p><pre><code>    delegate Task<string> GetEntityName(int id);

    public async Task<string> MyEntityNameImpl(int id)
</code></pre>
I'd love to be able to mark the method:<p><pre><code>    public async Task<string> MyEntityNameImpl(int id) : GetEntityName
</code></pre>
This could simply be stripped out at compile time, but in my view it would make the tooling experience much better when you control both the delegate definitions and their implementations.</p>
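<p>Until something like that exists, one workaround available today is to assign the method group to the delegate type somewhere. The compiler then errors if the signatures drift apart, and find-references on <code>GetEntityName</code> lands on that line. This is a minimal C# sketch rather than the proposed feature; the implementation is made static and non-async only for brevity:</p>

```csharp
using System.Threading.Tasks;

public delegate Task<string> GetEntityName(int id);

public static class EntityNames
{
    // Stand-in implementation (static and non-async only to keep the sketch short).
    public static Task<string> MyEntityNameImpl(int id) => Task.FromResult($"entity-{id}");

    // Workaround: this conversion only compiles while MyEntityNameImpl matches
    // GetEntityName, and find-references on the delegate now shows this line.
    public static readonly GetEntityName MyEntityNameImplAsDelegate = MyEntityNameImpl;
}
```
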
]]></description><pubDate>Fri, 26 Jul 2024 06:50:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=41076352</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=41076352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41076352</guid></item><item><title><![CDATA[New comment by UglyToad in "Database Design for Google Calendar: A Tutorial"]]></title><description><![CDATA[
<p>Having built recurring events in the past (date-based with no time component, luckily for me), I think you gain a lot of usability by generating a row for each occurrence of the event.<p>Inevitably the user will come back and say "oh, I want it monthly except this specific instance" or, if it's a time-based event, "this specific one should be half an hour later". You could store the exceptions to the rule as their own data structure, but then you need to correlate each exception with the scheduler 'tick', and if users can edit the schedule, well, you're S.O.O.L. either way. Still, I think having concrete occurrences is potentially easier to recover from.</p>
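<p>Here is a minimal C# sketch of the row-per-occurrence approach. It assumes a date-only monthly rule (.NET 6+ for <code>DateOnly</code>) and uses a hypothetical <code>Occurrence</code> record to stand in for a table row. "Monthly except this one" becomes an edit to one concrete row rather than a separate exception structure:</p>

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical table row: one per concrete occurrence of a schedule.
public sealed record Occurrence(int ScheduleId, DateOnly Date, bool Cancelled = false);

public static class RecurrenceExpander
{
    // Assumed rule: monthly on the start date's day; AddMonths clamps to short months.
    public static List<Occurrence> ExpandMonthly(int scheduleId, DateOnly start, int count)
    {
        var rows = new List<Occurrence>(count);
        for (var i = 0; i < count; i++)
        {
            // Offset from the original start so one short month doesn't shift the rest.
            rows.Add(new Occurrence(scheduleId, start.AddMonths(i)));
        }
        return rows;
    }

    // "Monthly except this one": edit the concrete row instead of storing a rule exception.
    public static List<Occurrence> Cancel(IEnumerable<Occurrence> rows, DateOnly date) =>
        rows.Select(r => r.Date == date ? r with { Cancelled = true } : r).ToList();
}
```

<p><code>AddMonths</code> clamps to the last day of shorter months. A schedule starting on 31 January 2024 therefore produces 29 February for the second row and 31 March for the third.</p>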
]]></description><pubDate>Tue, 23 Jul 2024 12:34:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=41045307</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=41045307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41045307</guid></item><item><title><![CDATA[New comment by UglyToad in "Double-entry bookkeeping as a directed graph"]]></title><description><![CDATA[
<p>But the problem is that the accounting jargon is counter (contra?) to the layman's gut understanding.<p>If I get credited or I use a credit card, money came from nowhere, woohoo. If I have a debit, well, that sounds like debt and my money decreased, boo.<p>I get that there's actually a good reason for the names, but a field that doggedly sticks to unintuitive jargon, running counter to every usage outsiders have encountered, could do with some different, non-overloaded terms.</p>
]]></description><pubDate>Wed, 10 Apr 2024 14:57:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=39991415</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=39991415</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39991415</guid></item><item><title><![CDATA[Supreme Commander Graphics Study (2015)]]></title><description><![CDATA[
<p>Article URL: <a href="http://www.adriancourreges.com/blog/2015/06/23/supreme-commander-graphics-study/">http://www.adriancourreges.com/blog/2015/06/23/supreme-commander-graphics-study/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=39979140">https://news.ycombinator.com/item?id=39979140</a></p>
<p>Points: 133</p>
<p># Comments: 62</p>
]]></description><pubDate>Tue, 09 Apr 2024 13:18:34 +0000</pubDate><link>http://www.adriancourreges.com/blog/2015/06/23/supreme-commander-graphics-study/</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=39979140</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39979140</guid></item><item><title><![CDATA[New comment by UglyToad in "BYD's new EV, starting under $10K, is stoking fear among rivals"]]></title><description><![CDATA[
<p>Sounds sort of like the Citroen Ami? Slightly below your ideal range and quite expensive too, but the same concept.</p>
]]></description><pubDate>Sat, 23 Mar 2024 11:24:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=39799104</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=39799104</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39799104</guid></item><item><title><![CDATA[New comment by UglyToad in "Ask HN: Habits to avoid small mistakes in PRs"]]></title><description><![CDATA[
<p>For sure, as others have mentioned, review it yourself in the web interface of whichever PR tool you use (generally after taking a break first).<p>This is something of a superpower for catching things before review, and in my experience it makes the actual third-party review pointless 95% of the time.</p>
]]></description><pubDate>Thu, 25 Jan 2024 17:38:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=39132193</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=39132193</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39132193</guid></item><item><title><![CDATA[New comment by UglyToad in "A TV Show Forced Britain's Devastating Post Office Scandal into the Light"]]></title><description><![CDATA[
<p>As the sibling comment says (max reply depth reached), the sub-postmasters in general were just about getting by. Part of why this scandal is so egregious is that these people were often just about managing under the terms of an incredibly unfair contract with the PO.<p>The amounts of money involved may be small to business owners in other domains, but, as you say, many SPMs were almost bankrupted trying to replace the sums out of their own earnings. Sums like that wouldn't have been a rounding error to their lifestyle, so any real theft should have been provable by any halfway decent investigator. And if not? Then they get away with it, and that's the price we pay for not living in tyranny.</p>
]]></description><pubDate>Thu, 11 Jan 2024 16:50:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=38954794</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=38954794</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38954794</guid></item><item><title><![CDATA[New comment by UglyToad in "A TV Show Forced Britain's Devastating Post Office Scandal into the Light"]]></title><description><![CDATA[
<p>I think any (financial crime) case built solely on computer evidence is too weak to be prosecuted, even if that means accepting some non-zero amount of financial crime. "It is better a hundred guilty persons should escape than one innocent person should suffer," as Benjamin Franklin once wrote.<p>In the case of the sub-postmasters, the Post Office, as far as I'm aware, never proved where these supposedly stolen sums went. The computer evidence was treated as thought-terminating and was the only thing (apart from false confessions under duress) used to secure these convictions, in place of proper investigative work.</p>
]]></description><pubDate>Thu, 11 Jan 2024 14:49:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=38952877</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=38952877</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38952877</guid></item><item><title><![CDATA[New comment by UglyToad in "A TV Show Forced Britain's Devastating Post Office Scandal into the Light"]]></title><description><![CDATA[
<p>The other complication was that the law was changed to presume computer systems were working correctly by default, because of problems with the existing law around breathalysers and speed cameras:<p>"In 1997 the Law Commission published a paper which went into some detail about the use of mechanical and computer evidence in court. It seemed a little too fixated with the effective workings of speedometers, traffic lights and breathalysing devices called ‘Intoximeters.’ It concluded that the present law is ‘unsatisfactory’ because of the necessity for prosecutors to ‘prove that the computer is reliable.’"[0]<p>The amended law shifted the burden of proof from the prosecution proving the system functioned correctly to the defence proving it didn't, without access to the systems being used to prosecute them.<p>[0]: The Great Post Office Scandal, Nick Wallis</p>
]]></description><pubDate>Thu, 11 Jan 2024 14:17:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=38952437</link><dc:creator>UglyToad</dc:creator><comments>https://news.ycombinator.com/item?id=38952437</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38952437</guid></item></channel></rss>