<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: caffeinated_me</title><link>https://news.ycombinator.com/user?id=caffeinated_me</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 09 May 2026 03:12:56 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=caffeinated_me" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by caffeinated_me in "An Update on Heroku"]]></title><description><![CDATA[
<p>Looks like that still has downtime for a Postgres migration: you're suggesting going into maintenance mode and doing a dump/restore, and I've seen that take hours once you hit the terabyte scale, depending on hardware.<p>I've had pretty good luck setting up logical replication from Heroku to the new provider, then taking a 10-15 minute maintenance window to cut over once it's caught up. Might be worth considering.<p>You might also want to add a warning about Postgres versions. There are some old bugs around primary key hash functions that can cause corruption during a migration. I've seen it twice when migrating from Heroku to other vendors.</p>
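For anyone trying this, here's a rough sketch of the lag check I'd run before scheduling the cutover window. It assumes you're comparing pg_current_wal_lsn() on the source against the subscriber's received LSN (e.g. from pg_stat_subscription); the LSN strings below are made-up illustrative values, and only the LSN arithmetic is shown here:

```python
# Postgres LSNs are 64-bit WAL positions written as "hi/lo" in hex,
# e.g. "16/B374D848". Byte lag = source LSN minus subscriber LSN.

def lsn_to_int(lsn: str) -> int:
    """Convert a Postgres LSN string like '16/B374D848' to a byte offset."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

def replication_lag_bytes(source_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the subscriber still has to receive/apply."""
    return lsn_to_int(source_lsn) - lsn_to_int(replica_lsn)

# Hypothetical values you might see from pg_current_wal_lsn() on the
# source and pg_stat_subscription.received_lsn on the target:
lag = replication_lag_bytes("16/B374D848", "16/B3740000")
print(lag)  # 55368 bytes behind
```

Once that number stays near zero under normal write load, the 10-15 minute window is mostly just stopping writes, letting the last WAL drain, and repointing the app.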
]]></description><pubDate>Fri, 06 Feb 2026 21:40:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46918508</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=46918508</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46918508</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Show HN: Managed Postgres with native ClickHouse integration"]]></title><description><![CDATA[
<p>Thanks! Out of curiosity, does the NVMe have a big effect on replication throughput? I've been wondering how much of the trouble I've had with other solutions is due to parsing WAL and how much is just slow cloud disks.</p>
]]></description><pubDate>Mon, 26 Jan 2026 19:47:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46770567</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=46770567</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46770567</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Show HN: Managed Postgres with native ClickHouse integration"]]></title><description><![CDATA[
<p>It sounds like you're doing something similar to how Databricks works now that they've acquired Neon, or Snowflake now that they've got Crunchy. I'm guessing the local SSD is a big advantage, but what else is different about your approach?</p>
]]></description><pubDate>Mon, 26 Jan 2026 19:07:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46770075</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=46770075</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46770075</guid></item><item><title><![CDATA[New comment by caffeinated_me in "US Coast Guard Report on Titan Submersible"]]></title><description><![CDATA[
<p>I've linked this elsewhere in the thread, but here are test results from a US Navy pilot project for carbon fiber unmanned subs. It looks like they found it pretty viable.<p><a href="https://apps.dtic.mil/sti/pdfs/ADA270438.pdf" rel="nofollow">https://apps.dtic.mil/sti/pdfs/ADA270438.pdf</a></p>
]]></description><pubDate>Tue, 05 Aug 2025 16:56:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=44800656</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=44800656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44800656</guid></item><item><title><![CDATA[New comment by caffeinated_me in "US Coast Guard Report on Titan Submersible"]]></title><description><![CDATA[
<p>This isn't the first carbon fiber submarine, although it is the first manned one. The US Navy tried out an unmanned model in the 80s and got much better results: they were expecting at least 1000 successful dives before stress fatigue became an issue.<p>Here's a detailed report on it. Pages 32-33 have their take on the material analysis, probably the part most relevant to this failure.<p><a href="https://apps.dtic.mil/sti/pdfs/ADA270438.pdf" rel="nofollow">https://apps.dtic.mil/sti/pdfs/ADA270438.pdf</a><p>I'm personally more suspicious of OceanGate's manufacturing process than of the material, but I'm far from an expert here.</p>
]]></description><pubDate>Tue, 05 Aug 2025 16:55:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=44800629</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=44800629</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44800629</guid></item><item><title><![CDATA[New comment by caffeinated_me in "When Sigterm Does Nothing: A Postgres Mystery"]]></title><description><![CDATA[
<p>Good find! I've seen similar behavior before and wondered why it wasn't easy to stop.<p>This isn't the only place Postgres can act like this, though. I've seen similar behavior when a foreign data wrapper times out or loses its connection, and had to resort to either kill -9 or attaching a debugger to the process and closing the socket, which oddly enough also worked.<p>Might be worth generalizing this approach to handle that kind of failure as well.</p>
]]></description><pubDate>Tue, 15 Jul 2025 17:33:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44573732</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=44573732</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44573732</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Migrating to Postgres"]]></title><description><![CDATA[
<p>I'd argue that horizontally sharded databases can work well, but they do tend to have significant non-obvious tradeoffs that can be pretty painful.<p>There's a handful of companies that have scaled Citus past 1PB for production usage, but the examples I'm aware of all required more engineering to work around capability or architectural limitations than one might like. I'd love to see someone come back with a fresh approach that covered more use cases effectively.<p>Disclaimer: former Citus employee</p>
]]></description><pubDate>Thu, 15 May 2025 00:52:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=43990748</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=43990748</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43990748</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Making Postgres scale"]]></title><description><![CDATA[
<p>It can be great, depending on your schema and planned growth. Questions I'd be asking in your shoes:<p>1. Does the schema have an obvious column to use for distribution? You'll probably want to fit one of the two following cases, though they aren't mutually exclusive:<p><pre><code>    1a. A use case where most traffic is scoped to a subset of data (e.g. a multitenant system). This is the easiest case: just make sure most of your queries contain the column (most likely a tenant ID or equivalent), and partially denormalize to have it in tables where it's implicit, to make your life easier. Do not use a timestamp.

    1b. A rollup/analytics use case that needs heavy parallelism (e.g. a large IoT system where you want to do analytics across a fleet). For this, you're looking for a column that has high cardinality without too many major hot spots. In the IoT use case mentioned, this would probably be a device ID or similar.
</code></pre>
2. Are you sure you're going to grow to the scale where you need Citus? Depending on workload, it's not too hard to run a 20TB single-server PG database, and that's more than enough for a lot of companies these days.<p>3. When do you want to migrate? Logical replication into Citus should work these days (I haven't tested it myself), but the higher the update rate and the larger the database, the more painful this gets. There aren't many tools that help with the more difficult scenarios here, but the landscape has changed since I last had to do this.<p>4. Do you want to run this yourself? Azure offers a managed service, and Crunchy offers Citus on any cloud, so you have options.<p>5. If you're running this yourself, how are you managing HA? pg_auto_failover has some Citus support, but can be a bit tricky to get started with.<p>I did get my Citus cluster over 1 PB at my previous job, and that's not the biggest out there, so there's definitely room to scale, but the migration can be tricky.<p>Disclaimer: former Citus employee</p>
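To make 1a/1b concrete, here's a toy sketch of the kind of check I'd run on a candidate distribution column before committing to it: count distinct values and see what share of the data the hottest value owns. The sample tenant IDs below are made up for illustration:

```python
from collections import Counter

def distribution_column_stats(values):
    """Return (distinct_count, hottest_share) for a candidate shard key."""
    counts = Counter(values)
    hottest = max(counts.values())
    return len(counts), hottest / len(values)

# Hypothetical tenant_id sample: one tenant owns half the rows -- a hot spot.
tenant_ids = ["t1"] * 50 + ["t2"] * 25 + [f"t{i}" for i in range(3, 28)]
distinct, hottest_share = distribution_column_stats(tenant_ids)
print(distinct, hottest_share)  # 27 0.5
```

For case 1b you want distinct high (relative to shard count) and hottest_share low; for 1a some skew is usually fine as long as the biggest tenant still fits comfortably on one node.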
]]></description><pubDate>Fri, 14 Mar 2025 23:14:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=43368322</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=43368322</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43368322</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Making Postgres scale"]]></title><description><![CDATA[
<p>For a multi-tenant use case, yeah, pretty close to thinking about partitioning.<p>For other use cases, there can be big gains from cross-shard queries that you can't really match with partitioning, but that's super use case dependent and not a guaranteed result.</p>
]]></description><pubDate>Fri, 14 Mar 2025 20:58:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=43367238</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=43367238</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43367238</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Making Postgres scale"]]></title><description><![CDATA[
<p>Depends on your schema, really. The hard part is choosing a distribution key for sharding: if you've got something like a tenant ID that's in most of your queries and big tables, it's pretty easy, but it can be a pain otherwise.</p>
]]></description><pubDate>Fri, 14 Mar 2025 18:33:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=43365703</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=43365703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43365703</guid></item><item><title><![CDATA[New comment by caffeinated_me in "You Can Make Postgres Scale"]]></title><description><![CDATA[
<p>Seems like this is a similar philosophy, but it's missing a bunch of things the Citus coordinator provides. From the article, I'm guessing Citus is better at cross-shard queries, SQL support, central management of workers, keeping schemas in sync, and keeping small join tables in sync across the fleet, and it provides a single point of ingestion.<p>That said, this does seem to handle replicas better than Citus ever really did, and most of the features it's lacking aren't relevant for the sort of multitenant use case this blog is describing, so it's not a bad tradeoff. This also avoids the coordinator as a central point of failure for both outages and connection count limitations, though in practice we rarely saw those be a problem.</p>
]]></description><pubDate>Fri, 14 Mar 2025 18:09:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=43365418</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=43365418</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43365418</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Being overweight overtakes tobacco smoking as the leading disease risk factor"]]></title><description><![CDATA[
<p>Any recommendations on telehealth suppliers to contact for that compounded formulation? They're easy to find, but I'm not sure who is trustworthy on this topic.</p>
]]></description><pubDate>Wed, 11 Dec 2024 18:56:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=42391357</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=42391357</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42391357</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Reducing BigQuery Costs"]]></title><description><![CDATA[
<p>I've generally found something similar- lots of gotchas, but also some very useful products.<p>The best way I've found to approach it is to treat GCP as something that has to be evaluated at an individual service level. It's great if you're on one of their expected workflows/golden paths, and you can get lucky with a good fit if you aren't, but they seem to have a lot of unspoken assumptions and limits baked in that might or might not align with your use case.<p>Disclaimer: My use cases are pretty unusual from talking to our account rep, so this might be over-fitting to weird data.</p>
]]></description><pubDate>Tue, 06 Feb 2024 00:24:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=39269144</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=39269144</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39269144</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Too much serendipity"]]></title><description><![CDATA[
<p>I believe they're referring to the short story The Road Not Taken, by Turtledove.<p><a href="https://en.wikipedia.org/wiki/The_Road_Not_Taken_(short_story)" rel="nofollow">https://en.wikipedia.org/wiki/The_Road_Not_Taken_(short_stor...</a><p>PDF of story: <a href="https://www.eyeofmidas.com/scifi/Turtledove_RoadNotTaken.pdf" rel="nofollow">https://www.eyeofmidas.com/scifi/Turtledove_RoadNotTaken.pdf</a></p>
]]></description><pubDate>Mon, 22 Jan 2024 21:09:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=39095487</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=39095487</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39095487</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Benchmarking Postgres Replication: PeerDB vs. Airbyte"]]></title><description><![CDATA[
<p>The article mentions a need for primary keys for data sync. Does anyone know if compound primary keys are currently supported? That's been a huge pain for me with the existing stack, so it would be nice to have an alternative.</p>
]]></description><pubDate>Tue, 10 Oct 2023 21:02:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=37837422</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=37837422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37837422</guid></item><item><title><![CDATA[New comment by caffeinated_me in "S.F. apartment rents fell again"]]></title><description><![CDATA[
<p>Was that break on Market Street, out of curiosity? After a related incident, I discovered that I was the fifth of a friend's co-workers to have broken a wrist or leg there.</p>
]]></description><pubDate>Thu, 02 Feb 2023 07:30:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=34623354</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=34623354</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34623354</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Issues with upstream DNS provider"]]></title><description><![CDATA[
<p>We did some trials and ended up on Crunchy Data for the Postgres part of the equation. Wish we had done so earlier; logical replication was a big win.<p>Sadly, we're still using Heroku DNS, but this should accelerate finding an alternative.</p>
]]></description><pubDate>Wed, 24 Aug 2022 00:22:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=32573446</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=32573446</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32573446</guid></item><item><title><![CDATA[New comment by caffeinated_me in "FreeBSD has serious problems with focus, longevity and lifecycle (2012)"]]></title><description><![CDATA[
<p>I've seen ZFS used with Postgres in a few different environments. It seems to work fine for the most part: surprisingly good compression (~8X in one case, usually lower), with the major downside being increased CPU usage when taking advantage of said compression.<p>I think only one or two of those environments were heavily used production instances, so if there is a serious gotcha here, it might not have been apparent to me.</p>
]]></description><pubDate>Tue, 25 Jan 2022 00:19:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=30066123</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=30066123</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30066123</guid></item><item><title><![CDATA[New comment by caffeinated_me in "Error 404 (Not Found)"]]></title><description><![CDATA[
<p>My company was seeing GCP Airflow environments not responding, but they seem to have recovered in the past few minutes.</p>
]]></description><pubDate>Tue, 16 Nov 2021 18:11:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=29244162</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=29244162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=29244162</guid></item><item><title><![CDATA[New comment by caffeinated_me in "How I recorded user behaviour on my competitor’s websites"]]></title><description><![CDATA[
<p>This seems to have some fairly scary security implications if used maliciously, but I can't think of a good way to protect against this.<p>Does anyone know of a browser extension to limit access to the history API?</p>
]]></description><pubDate>Thu, 23 Aug 2018 02:54:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=17824175</link><dc:creator>caffeinated_me</dc:creator><comments>https://news.ycombinator.com/item?id=17824175</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=17824175</guid></item></channel></rss>