<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dperfect</title><link>https://news.ycombinator.com/user?id=dperfect</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 14:56:00 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dperfect" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dperfect in "Recreating Epstein PDFs from raw encoded attachments"]]></title><description><![CDATA[
<p>That's true if we're correcting OCR of actual output text. In this case, it's operating on the Base64 text, trying to produce chunks that form valid zlib streams and PDF syntax so the file can be intact enough to be opened. "Just accepting errors" would mean not seeing <i>any</i> content in the file because it cannot be read.<p>So yes, the "fixed" output has errors, but it's not hallucinating details like an LLM, nor is it trying to produce output that conforms to any linguistic or stylistic heuristics.<p>The phrase "correcting similar OCR'd PDFs" should have been "correcting similar OCR'd Base64 representations of PDFs".</p>
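<p>For context, a minimal sketch of the general approach (hypothetical code, not the actual pastebin script): try swapping visually confusable Base64 characters and accept a fix when the decoded bytes form a valid zlib stream. The zlib checksum makes false positives vanishingly unlikely.</p>

```python
import base64
import zlib

# Common OCR confusions within the Base64 alphabet (an illustrative subset)
CONFUSABLE = {"0": "O", "O": "0", "1": "l", "l": "1", "5": "S", "S": "5"}

def try_decompress(b64_text):
    """Return the decompressed bytes if b64_text decodes to a valid zlib stream."""
    try:
        return zlib.decompress(base64.b64decode(b64_text, validate=True))
    except Exception:
        return None

def repair(b64_text):
    """Brute-force single-character fixes using the confusion table."""
    out = try_decompress(b64_text)
    if out is not None:
        return out
    for i, ch in enumerate(b64_text):
        alt = CONFUSABLE.get(ch)
        if alt is None:
            continue
        candidate = b64_text[:i] + alt + b64_text[i + 1:]
        out = try_decompress(candidate)
        if out is not None:
            return out
    return None
```

<p>The real problem is much harder (multiple errors per stream, PDF syntax between streams), which is what the full script attempts to handle.</p>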
]]></description><pubDate>Fri, 06 Feb 2026 20:04:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46917473</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=46917473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46917473</guid></item><item><title><![CDATA[New comment by dperfect in "Recreating Epstein PDFs from raw encoded attachments"]]></title><description><![CDATA[
<p>Letting Claude work a little longer produced this behemoth of a script (which is supposed to be somewhat universal in correcting similar OCR'd PDFs - not yet tested on any others though):
<a href="https://pastebin.com/PsaFhSP1" rel="nofollow">https://pastebin.com/PsaFhSP1</a><p>which uses this Rust zlib stream fixer:
<a href="https://pastebin.com/iy69HWXC" rel="nofollow">https://pastebin.com/iy69HWXC</a><p>and gives the best output I've seen it produce:
<a href="https://imgur.com/itYWblh" rel="nofollow">https://imgur.com/itYWblh</a><p>This is using the same OCR'd text posted by commenter Joe.</p>
]]></description><pubDate>Fri, 06 Feb 2026 18:06:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46916065</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=46916065</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46916065</guid></item><item><title><![CDATA[New comment by dperfect in "Recreating Epstein PDFs from raw encoded attachments"]]></title><description><![CDATA[
<p>Screenshot: <a href="https://imgur.com/eWCfYYd" rel="nofollow">https://imgur.com/eWCfYYd</a></p>
]]></description><pubDate>Fri, 06 Feb 2026 04:25:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46909104</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=46909104</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46909104</guid></item><item><title><![CDATA[New comment by dperfect in "Recreating Epstein PDFs from raw encoded attachments"]]></title><description><![CDATA[
<p>Nerdsnipe confirmed :)<p>Claude Opus came up with this script:<p><a href="https://pastebin.com/ntE50PkZ" rel="nofollow">https://pastebin.com/ntE50PkZ</a><p>It produces a somewhat-readable PDF (first page at least) with this text output:<p><a href="https://pastebin.com/SADsJZHd" rel="nofollow">https://pastebin.com/SADsJZHd</a><p>(I used the cleaned output at <a href="https://pastebin.com/UXRAJdKJ" rel="nofollow">https://pastebin.com/UXRAJdKJ</a> mentioned in a comment by Joe on the blog page)</p>
]]></description><pubDate>Fri, 06 Feb 2026 01:26:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46907841</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=46907841</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46907841</guid></item><item><title><![CDATA[New comment by dperfect in "Show HN: We built an open source, zero webhooks payment processor"]]></title><description><![CDATA[
<p>It looks like this does make some things <i>easier</i>, but I'm not sure if it's actually <i>better</i>.<p>From what I can tell, any time you use this to check something like the customer's subscription state (or anything else payment-related) - either from the front end or the back end - it's going to perform an API request to Flowglad's servers. If you care about responsiveness, I'm not sure that's a good idea. Of course, you can cache that state if you need to access it frequently, but then it kind of defeats the purpose of this layer.<p>Stripe integration can be tricky, but if you don't want to store anything locally, you might as well just hit Stripe's APIs without the middleman. For the payment systems I've worked on, having cached state in the database is actually really nice, even if it's a bit more work. Want to do a complicated query on your customers based on payment/subscription state and a bunch of other criteria? It's just a DB query. With this, I think you'll be hoping they expose an API to query what you need and how you need it. Otherwise, you'll be stuck waiting for a thousand API requests to fetch the state of each of your customers.</p>
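<p>To make the cached-state point concrete, here's a toy sketch (hypothetical table and column names, sqlite for illustration): with subscription state mirrored locally, a cross-cutting customer query is one SQL statement instead of per-customer API round-trips.</p>

```python
import sqlite3

# Toy schema: subscription state cached locally alongside customer data
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE subscriptions (
    customer_id INTEGER REFERENCES customers(id),
    status TEXT,   -- mirrored from the payment provider, e.g. via webhooks
    plan TEXT
);
INSERT INTO customers VALUES (1, 'Ada', 'US'), (2, 'Grace', 'DE'), (3, 'Alan', 'US');
INSERT INTO subscriptions VALUES
    (1, 'active', 'pro'), (2, 'past_due', 'pro'), (3, 'active', 'free');
""")

# One local query instead of an API request per customer
rows = conn.execute("""
    SELECT c.name
    FROM customers c
    JOIN subscriptions s ON s.customer_id = c.id
    WHERE s.status = 'active' AND c.country = 'US'
    ORDER BY c.name
""").fetchall()
```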
]]></description><pubDate>Tue, 25 Nov 2025 18:29:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=46048972</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=46048972</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46048972</guid></item><item><title><![CDATA[New comment by dperfect in "Stripe Launches L1 Blockchain: Tempo"]]></title><description><![CDATA[
<p>It sounds great, but every time I see this argument, I end up going down the rabbit hole of actually studying how stablecoins operate. And every time, I come to the same conclusion: they <i>always</i> rely on trust in an off-chain oracle or custodian. At that point, a shared ledger implemented with traditional databases / protocols would be faster, easier, and more transparent.<p>Bitcoin (and possibly a few others) is one of the few uses of blockchain that actually makes sense. The blockchain serves the currency, and the currency serves the blockchain. The blockchain exists to provide consensus without needing to trust any off-chain entity, but the blockchain relies on computing infrastructure that has real-world costs. The scarcity of Bitcoin (the currency) and the arguably fictitious reward for participating in mining are the incentive for people in the real world to contribute the resources required for the blockchain to function.<p>Any real-world value given to Bitcoin is secondary and <i>only</i> a result of the fact that (1) mining infrastructure has a cost, and (2) people who understand the system have realized that, unlike fiat, stablecoins, or 1000 other crypto products, Bitcoin has no reliance on trusted, off-chain entities who could manipulate it.<p>You trust your stablecoin's issuer to hold enough fiat in reserve to match the coin? You might as well trust your bank, but while you're at it, remind them that they don't <i>have</i> to take days to process a transaction - they <i>could</i> process transactions as fast as (actually <i>faster</i> than) a blockchain. But I imagine most banks would point to regulation as a reason for the delays, and they might be right.<p>So what are stablecoins really trying to do? Circumvent regulation? Implement something the banks just aren't willing to do themselves?</p>
]]></description><pubDate>Thu, 04 Sep 2025 20:18:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=45131781</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=45131781</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45131781</guid></item><item><title><![CDATA[New comment by dperfect in "Stripe Launches L1 Blockchain: Tempo"]]></title><description><![CDATA[
<p>Exactly. The only way for this to deliver on its goals would be for it to <i>not</i> be public or permissionless. And if that's the case, then it should really just be a database and/or a shared protocol between financial institutions.<p>Once it's truly "open", you can't have any sensitive identifiers in there, so you need another protocol/system for correlating opaque identifiers with real-world entities (thus defeating the purpose).<p>And if financial institutions are involved, they'll want the ability to do what they do now: rewrite history whenever they feel the need (or are compelled by governments). Another strike against using blockchain.</p>
]]></description><pubDate>Thu, 04 Sep 2025 17:12:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=45129606</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=45129606</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45129606</guid></item><item><title><![CDATA[New comment by dperfect in "AV1@Scale: Film Grain Synthesis, The Awakening"]]></title><description><![CDATA[
<p>This is a really good point.<p>To illustrate the temporal aspect: consider a traditional film projector. Between every frame, we actually see complete darkness for a short time. We could call that darkness "noise", and if we were to linger on that moment, we'd see nothing of the original signal. But since our visual systems tend to temporally average things out to a degree, we barely even notice that flicker (<a href="https://en.wikipedia.org/wiki/Flicker_fusion_threshold" rel="nofollow">https://en.wikipedia.org/wiki/Flicker_fusion_threshold</a>). I suspect noise and grain are perceived in a similar way, where they become less pronounced compared to the stable parts of the signal/image.<p>Astrophotographers stack noisy exposures to obtain images with higher SNR. I think our brains do a bit of that too, and it doesn't mean we're hallucinating detail that isn't there; the recorded noise - <i>over time</i> - returns to the mean, and that mean is a clearer representation of the actual signal (though not entirely, due to systematic/non-random noise, but that's often less significant).<p>Denoising algorithms that operate on individual frames don't have that context, so they <i>will</i> lose detail (or will try to compensate by guessing). AV1 doesn't mandate a particular denoising algorithm, so I suppose in theory, a smart algorithm could use the temporal context to preserve some additional detail.</p>
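<p>The stacking intuition can be sketched numerically (a toy stdlib-only illustration with made-up numbers): averaging N independent noisy frames of the same scene shrinks the random noise by roughly a factor of sqrt(N).</p>

```python
import random
import statistics

def stack(frames):
    """Average equally sized 'frames' (lists of pixel values) element-wise."""
    n = len(frames)
    return [sum(px) / n for px in zip(*frames)]

def noisy_frame(signal, sigma, rng):
    """One exposure: the true signal plus independent Gaussian read noise."""
    return [s + rng.gauss(0, sigma) for s in signal]

rng = random.Random(42)
signal = [10.0] * 1000  # flat "true" image patch
frames = [noisy_frame(signal, sigma=2.0, rng=rng) for _ in range(64)]

single_noise = statistics.pstdev([p - s for p, s in zip(frames[0], signal)])
stacked_noise = statistics.pstdev([p - s for p, s in zip(stack(frames), signal)])
# With 64 frames, the residual noise drops by roughly a factor of 8 (sqrt(64))
```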
]]></description><pubDate>Fri, 04 Jul 2025 03:52:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=44461005</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=44461005</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44461005</guid></item><item><title><![CDATA[New comment by dperfect in "AV1@Scale: Film Grain Synthesis, The Awakening"]]></title><description><![CDATA[
<p>I agree. However, let's look at it practically. Let's assume someone is watching content streamed on a low bandwidth connection. As a content creator, what version of the compressed content would you rather your audience experience:<p>a) Compressed original with significant artifacts from the codec trying to represent original grain<p>b) A denoised version with fewer compression artifacts, but looks "smoothed" by the denoising<p>c) A denoised version with synthesized grain that looks almost as good as the original, though the grain doesn't exactly match<p>I personally think the FGS needs better grain simulation (to look more realistic), but even in its current state, I think I'd probably go with choice C. I'm all for showing the closest thing to the author's intent. We just need to remember that compression artifacts are not the author's intent.<p>In an ideal world where we can deliver full, uncompressed video to everyone, then obviously - don't mess with it at all!</p>
]]></description><pubDate>Thu, 03 Jul 2025 18:46:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=44458051</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=44458051</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44458051</guid></item><item><title><![CDATA[New comment by dperfect in "AV1@Scale: Film Grain Synthesis, The Awakening"]]></title><description><![CDATA[
<p>To the comments hating on grain: everything naturally has some amount of noise or grain - even the best digital sensors. Heck, even your eyes do. It's useful beyond just aesthetics. It tends to increase perceived sharpness and hides flaws like color banding and compression artifacts.<p>That's not to say that all noise and grain is <i>good</i>. It can be unavoidable, due to inferior technology, or a result of poor creative choices. It can even be distracting. But the alternative where <i>everything</i> undergoes denoising (which many of our cameras do by default now) is much worse in my opinion. To my eyes, the smoothing that happens with denoising often looks unrealistic and far more distracting.</p>
]]></description><pubDate>Thu, 03 Jul 2025 18:28:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=44457890</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=44457890</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44457890</guid></item><item><title><![CDATA[New comment by dperfect in "AV1@Scale: Film Grain Synthesis, The Awakening"]]></title><description><![CDATA[
<p>> both it and the synthesized grain image look noticeably less sharp than the source<p>That's true, but at a given bitrate (until you get to very high bitrates), the compressed original will usually look worse and less sharp because so many bits are spent trying to encode the original grain. As a result, that original grain tends to get "smeared" over larger areas, making it look muddy. You lose sharpness in areas of the actual scene because it's trying (and often failing) to encode sharp grains.<p>Film Grain Synthesis makes sense for streaming where bandwidth is limited, but I'll agree that in the examples, the synthesized grain doesn't look very grain-like. And, depending on the amount and method of denoising, it can definitely blur details from the scene.</p>
]]></description><pubDate>Thu, 03 Jul 2025 17:45:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=44457483</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=44457483</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44457483</guid></item><item><title><![CDATA[New comment by dperfect in "Debunking HDR [video]"]]></title><description><![CDATA[
<p>I should have worded it like this: I prefer to watch movies in UHD, even though they're usually also HDR.<p>I don't think anyone is saying HDR isn't really HDR. It obviously does support higher dynamic range, but it's kind of like giving someone directions in a language they don't speak. Technically, the spec does what it claims to do, but it adds unnecessary requirements on the display side that undermine a lot of the benefit. As a result, you have filmmakers forced to pick an arbitrary level of "scene white", which in turn means that displays aren't using their full range of brightness as effectively as they could. It also means that most TVs and projectors have to implement their own version of "tone mapping", some of which are pretty terrible to be honest. Not just terrible for HDR, but worse than SDR.</p>
]]></description><pubDate>Tue, 17 Jun 2025 14:41:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44299862</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=44299862</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44299862</guid></item><item><title><![CDATA[New comment by dperfect in "Debunking HDR [video]"]]></title><description><![CDATA[
<p>I assume you've watched the "Wider Gamut Misinformation" portion of the video. You're correct in saying that HDR normally uses Rec. 2020 (because Rec. 2100 points to Rec. 2020's color primaries), but Steve points out that Rec. 2020 is an SDR color space which doesn't technically require HDR.<p>It's true that, given the options available today, the two usually go hand-in-hand (wider color gamut and HDR). However, one of the arguments Steve makes is that a majority of content (and a <i>vast majority</i> of the pixels in that content) doesn't use colors outside Rec. 1886's gamut. Illuminated objects (natural or manmade) almost never go outside that, so you're usually only talking about a few pixels from intensely-saturated light sources in the shot (like LEDs) that might use those colors. Even then, not a lot of filmmakers feel the need to go there, so their movies will look the same in narrow and wide gamuts.<p>I don't think the video is arguing against wider gamut or even higher dynamic range as options; modern displays <i>are</i> more capable than older ones, so we need tools to allow content creators to use that capability if they desire. The problem is that all of these things (color space, bit depth, transfer function, absolute luminance values, etc) have been lumped together under one label, "HDR", and some of the implementation details are actually <i>worse</i> than what we had with SDR. If you skip to the "Checklist Recap" portion of the video, you'll see that there are actually quite a few downsides to HDR in its current form, but since most of the standards are tightly coupled, we're kind of stuck unless we move to something better.<p>I also personally choose HDR versions when watching movies, but that's because UHD content is usually also HDR. What I really want is the higher resolution. 
I've never felt like I'd be missing out if it didn't have HDR because I've compared the two a lot - they're really mostly the same for the movies I watch with a properly calibrated screen. To each their own :)</p>
]]></description><pubDate>Mon, 16 Jun 2025 16:41:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44291150</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=44291150</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44291150</guid></item><item><title><![CDATA[New comment by dperfect in "Debunking HDR [video]"]]></title><description><![CDATA[
<p>It's hard to boil it down to a simple thesis because the problem is complicated. He admits this in the presentation and points to it being part of the problem itself; there are so many technical details that have been met with marketing confusion and misunderstanding that it's almost impossible to adequately explain the problem in a concise way. Here's my takeaway:<p>- It was clearly a mistake to define HDR transfer functions using absolute luminance values. That mistake has created a cascade of additional problems<p>- HDR is not what it was marketed to be: it's not superior in many of the ways people think it is, and in some ways (like efficiency) it's actually worse than SDR<p>- The fundamental problems with HDR formats have resulted in more problems: proprietary formats like Dolby Vision attempting to patch over some of the issues (while being more closed and expensive, yet failing to fully solve the problem), consumer devices that are forced to render things worse than they might be in SDR due to the fact that it's literally impossible to implement the spec 100% (they have to make assumptions that can be very wrong), endless issues with format conversions leading to inaccurate color representation and/or color banding, and lower quality streaming at given bit rates due to HDR's reliance on higher bit depths to achieve the same tonal gradation as SDR<p>- Not only is this a problem for content delivery, but it's also challenging in the content creation phase as filmmakers and studios sometimes misunderstand the technology, changing their process for HDR in a way that makes the situation worse<p>Being somewhat of a film nerd myself and dealing with a lot of this first-hand, I completely agree with the overall sentiment and really hope it can get sorted out in the future with a more pragmatic solution that gives filmmakers the freedom to use modern displays more effectively, while not pretending that they should have control over things like the absolute 
brightness of a person's TV (when they have no idea what environment it might be in).</p>
]]></description><pubDate>Sun, 15 Jun 2025 04:20:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=44280501</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=44280501</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44280501</guid></item><item><title><![CDATA[New comment by dperfect in "Supabase Storage v3: Resumable Uploads with support for 50GB files"]]></title><description><![CDATA[
<p>Awesome to see Supabase constantly improving. Been using Supabase for the past few weeks and have really enjoyed it!<p>I was a bit surprised, however, that there's not currently a good way to reference storage objects from my postgres tables. I found that the recommended way is to store the object's path (as a string) in the database. While that works, it isn't optimal as I'd like to enforce consistency between the object and the table referencing it.<p>I've tried referencing the id of the corresponding row in the storage.objects table, but (1) apparently the schema Supabase uses to manage storage.objects may change, and (2) it still requires separate (non-atomic) operations - or additional triggers - for keeping things in sync. Using buckets (corresponding to tables) and folders with ids is another way to work around it, but still feels suboptimal.<p>Not 100% sure what the best solution would look like, but ideally the Supabase client could emulate storage operations for objects "attached" to a given table record, and Supabase (the backend piece) could implement them as atomic operations (e.g., uploading the actual storage asset, storing the necessary metadata, and updating my table row to reference the newly-created storage object; exposing a helper function to return the URLs for any storage objects attached to a record; etc).<p>Anyway, just a suggestion. Keep up the great work!</p>
]]></description><pubDate>Wed, 12 Apr 2023 17:58:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=35543946</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=35543946</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35543946</guid></item><item><title><![CDATA[New comment by dperfect in "“Terrascope”: The possibility of using the Earth as an atmospheric lens (2019)"]]></title><description><![CDATA[
<p>Does this mean there's a location in space where – on the side of the earth opposite to the sun – the sun's light is focused into a point of extremely high intensity due to the earth's atmosphere acting like a magnifying glass?<p>If so, that could make for either a very unfortunate surprise (i.e. a spacecraft passing through that point suddenly melting to a crisp) or an interesting source of energy if it could be harnessed.</p>
]]></description><pubDate>Thu, 30 Mar 2023 04:56:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=35368192</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=35368192</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35368192</guid></item><item><title><![CDATA[New comment by dperfect in "Apple Music Sing"]]></title><description><![CDATA[
<p>Maybe it's licensing? I can imagine copyright holders being squeamish about Apple processing, <i>permanently storing</i>, and <i>serving</i> heavily altered versions of their music. The difference is silly and pedantic, but by processing it in real-time during playback, one might argue it's just a filter effect like EQ.</p>
]]></description><pubDate>Tue, 06 Dec 2022 18:40:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=33884945</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=33884945</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33884945</guid></item><item><title><![CDATA[New comment by dperfect in "iPhone 14 Pro Camera Review: Scotland"]]></title><description><![CDATA[
<p>They are DNG files, but I don't think they're truly raw like one might expect from other cameras[1]. I just downloaded the DNG posted in the article and took a look in Photoshop with all post-processing turned off. Definitely looks a bit processed in some of the background details, but I could be wrong.<p>Specifically, the out-of-focus shadow details look like a smoothing or denoising algorithm has been applied. Maybe it's just something about the optics Apple is using (e.g. different lenses can have distinct characteristics in bokeh, softening, color, distortion, etc) vs other dedicated cameras, but it's something I see in almost all photos coming from an iPhone.<p>[1] <a href="https://kirkville.com/apples-new-proraw-photo-format-is-neither-pro-nor-raw/" rel="nofollow">https://kirkville.com/apples-new-proraw-photo-format-is-neit...</a></p>
]]></description><pubDate>Thu, 15 Sep 2022 03:01:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=32846461</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=32846461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32846461</guid></item><item><title><![CDATA[Heroku Down for Anyone Else?]]></title><description><![CDATA[
<p>Some of my Heroku apps are working, and others just stopped responding to requests. Nothing showing on https://status.heroku.com, but quite a few people on Twitter seem to be reporting issues.<p>Edit: apps appear to be coming back online now</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=32476230">https://news.ycombinator.com/item?id=32476230</a></p>
<p>Points: 15</p>
<p># Comments: 9</p>
]]></description><pubDate>Mon, 15 Aug 2022 22:20:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=32476230</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=32476230</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32476230</guid></item><item><title><![CDATA[New comment by dperfect in "This is the Way: Invisible Jenkins"]]></title><description><![CDATA[
<p>Yes, if you also need users to log into Jenkins from outside the private network (without a VPN), it sounds like OpenZiti would be a good option. In my case, Jenkins is only used from within the LAN. The SQS solution authenticates GitHub webhooks using the SHA-256 HMAC signature (not by IP), and no inbound ports need to be open.</p>
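<p>The signature check works roughly like this (a minimal sketch; the header name and "sha256=" prefix follow GitHub's documented webhook scheme):</p>

```python
import hashlib
import hmac

def verify_github_signature(payload, secret, signature_header):
    """Validate a webhook body against the X-Hub-Signature-256 header.

    GitHub sends the header as "sha256=" followed by the hex HMAC-SHA256
    of the raw request body. compare_digest avoids leaking timing info.
    """
    expected = "sha256=" + hmac.new(secret, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_header)
```

<p>Since the signature covers the body itself, the consumer can pull messages from SQS and verify them offline - no inbound connectivity or IP allowlisting required.</p>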
]]></description><pubDate>Wed, 25 May 2022 16:23:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=31506713</link><dc:creator>dperfect</dc:creator><comments>https://news.ycombinator.com/item?id=31506713</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31506713</guid></item></channel></rss>