<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: cullenking</title><link>https://news.ycombinator.com/user?id=cullenking</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 23:52:03 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=cullenking" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by cullenking in "The bot situation on the internet is worse than you could imagine"]]></title><description><![CDATA[
<p>We started building out a set of spam/fraud/bot management tooling. If you already have decent infrastructure in place, this is a pretty manageable task with a mishmash of techniques: ASN-based blocking (IP lookup databases can be self-hosted and contain ASN data) for the obvious ones like Alibaba etc., and subnet blocking for the less obvious (see a pattern, block the subnet; it alleviates but doesn't solve the problem).<p>If you have a logging stack, you can easily find crawler/bot patterns, then flag candidate IP subnets for blocking.<p>It's definitely whack-a-mole though. We are experimenting with blocking based on risk databases, which run between $2k and $10k a year depending on the provider. These map IP ranges to booleans like is_vpn, is_tor, etc., and also contain ASN information. Slightly suspicious crawling behavior or keyword flagging combined with a hit in that DB, and you have a high-confidence block.<p>All this stuff is now easy to home-roll with Claude. Before, it would have been a major PITA.</p>
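A minimal sketch of the combined-signal check described above. The field names, the ASN value, and the 0.7 threshold are illustrative stand-ins, not any real provider's API:

```python
# Illustrative: combine a risk-database lookup with a behavior score to
# reach a high-confidence block decision. BLOCKED_ASNS and RISK_DB are
# made-up stand-ins for a self-hosted IP database and a paid risk feed.

BLOCKED_ASNS = {45102}          # e.g. a cloud/hosting ASN with no real users
RISK_DB = {                     # stand-in for a commercial IP-risk dataset
    "203.0.113.7":  {"is_vpn": True,  "is_tor": False, "asn": 45102},
    "198.51.100.2": {"is_vpn": True,  "is_tor": False, "asn": 64500},
}

def should_block(ip: str, behavior_score: float) -> bool:
    """Block on ASN alone, or on a risk-DB hit plus suspicious behavior."""
    risk = RISK_DB.get(ip)
    if risk is None:
        return False                      # unknown IP: let other layers decide
    if risk["asn"] in BLOCKED_ASNS:
        return True                       # obvious case: block the whole ASN
    risky = risk["is_vpn"] or risk["is_tor"]
    return risky and behavior_score > 0.7  # risky origin + odd crawling
```

In practice the risk database would be keyed by IP range rather than exact address, but the decision logic stays about this simple.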
]]></description><pubDate>Sun, 29 Mar 2026 17:22:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47565145</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=47565145</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47565145</guid></item><item><title><![CDATA[New comment by cullenking in "Making geo joins faster with H3 indexes"]]></title><description><![CDATA[
<p>Doesn’t meet all our product requirements, unfortunately. We used the returned hexes in certain queries, and we also hacked directionality of the line into the least significant 12 bits of the hex ID (we didn’t need that level of hex precision), and we are doing direction-oriented matching and counting. For simpler use cases it’s definitely a better option. Thanks for reminding me and other people reading my comment!</p>
]]></description><pubDate>Sat, 07 Feb 2026 15:01:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=46924399</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=46924399</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46924399</guid></item><item><title><![CDATA[New comment by cullenking in "Making geo joins faster with H3 indexes"]]></title><description><![CDATA[
<p>Not any differently than any other indexed text field.</p>
]]></description><pubDate>Sat, 07 Feb 2026 14:57:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=46924364</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=46924364</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46924364</guid></item><item><title><![CDATA[New comment by cullenking in "Making geo joins faster with H3 indexes"]]></title><description><![CDATA[
<p>We do something similar for some limited geospatial search using Elasticsearch. We make a set of H3 indexes for each of the hundreds of millions of GPS recordings on our service, and store them in Elasticsearch. Geospatial queries become full-text search queries, where a point is on the line if the set of H3 indexes contains the point. You can do queries on how many cells overlap, which lets you match geospatial tracks on the same paths, and with ES coverage queries, you can tune how much overlap you want.<p>Instead of using integer IDs for the hexes, we created an encoded version of the ID that has the property that removing a character gets you the containing parent of the cell. This means we can do basic containment queries by querying with a low-resolution hex (a short string) as a prefix query. If a GPS track goes through this larger parent cell, the track will have hexes with the same prefix. You don’t get perfect control of distances because hexes have varying diameters (or rather their approximation, since they aren’t circles, they’re hexagons), but in practice and at scale, for a product that doesn’t require high precision, it’s very effective.<p>I think at the end of this year we’ll have about 6TB of these hex sets in a four-node, 8-process ES cluster. Performance is pretty good. It also acts as our full-text search; half the time we want a geo search, we also want keyword search / filtering / etc. on the metadata of these trips.<p>Pretty fun system to build, and the concept works with a wide variety of data stores. It felt like a total hack job, but it has stood the test of time.<p>Thanks Uber, H3 is a great library!</p>
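The prefix trick above can be sketched like this. This mimics the *property* of the encoding (one character per resolution level, truncation gives the parent), not H3's real bit layout; the digit values are illustrative:

```python
# Sketch of a prefix-friendly cell ID: each resolution level appends one
# base-7 child digit (H3 cells have 7 children), so dropping the last
# character yields the containing parent cell. Not the real H3 encoding.

DIGITS = "0123456"

def encode_path(child_digits: list[int]) -> str:
    """child_digits[i] is the child index (0-6) chosen at resolution i+1."""
    return "".join(DIGITS[d] for d in child_digits)

def parent(encoded: str) -> str:
    """Dropping one character coarsens the cell by one resolution."""
    return encoded[:-1]

def contained_in(cell: str, ancestor: str) -> bool:
    """A prefix query on the encoded field expresses exactly this test."""
    return cell.startswith(ancestor)
```

An Elasticsearch `prefix` query on the encoded field then answers "does this track enter this parent cell" with plain string matching, no geo math at query time.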
]]></description><pubDate>Sat, 07 Feb 2026 05:51:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=46921666</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=46921666</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46921666</guid></item><item><title><![CDATA[New comment by cullenking in "A Broken Heart"]]></title><description><![CDATA[
<p>I do this all the time in a dumb but effective way: add logging statements to code paths that drop timing info. Another dumb but effective approach, instead of using a step-through debugger, is to drop "here, value is {val}" statements. Telling Claude to do this is trivial, it's quick, and it can read its own output and self-solve the problem, all with just the code itself.<p>IMHO git bisect is slower, especially depending on the reload/hot-reload/compile/whatever cycle your actual app uses.</p>
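The technique is roughly this tiny helper (the names here are made up for illustration):

```python
# "Dumb but effective": log a label, a value, and elapsed time at points
# of interest instead of stepping through a debugger. An agent (or you)
# can then read the output and reason about where time or state went wrong.

import time

_t0 = time.monotonic()  # process start reference point

def here(label: str, value=None) -> str:
    """Print and return a 'here' line with elapsed milliseconds."""
    ms = (time.monotonic() - _t0) * 1000
    line = f"[{ms:8.1f}ms] here: {label}, value is {value!r}"
    print(line)
    return line
```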
]]></description><pubDate>Thu, 05 Feb 2026 20:16:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46904582</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=46904582</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46904582</guid></item><item><title><![CDATA[New comment by cullenking in "Flameshot"]]></title><description><![CDATA[
<p>The above works on Wayland; I had to make the changes specifically when I moved over to Wayland and Hyprland.</p>
]]></description><pubDate>Mon, 02 Feb 2026 20:23:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46860921</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=46860921</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46860921</guid></item><item><title><![CDATA[New comment by cullenking in "Flameshot"]]></title><description><![CDATA[
<p>Flameshot is the best! I've been using it for 10+ years. I have it wired up to some hotkeys in my window manager, and have it dump to S3 so I can paste links to screenshots everywhere for work.<p><a href="https://github.com/kingcu/screendrop" rel="nofollow">https://github.com/kingcu/screendrop</a></p>
]]></description><pubDate>Thu, 29 Jan 2026 20:22:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46816027</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=46816027</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46816027</guid></item><item><title><![CDATA[New comment by cullenking in "FAA to restrict commercial rocket launches to overnight hours"]]></title><description><![CDATA[
<p>So like texting and driving, but in the air? Flying is hard, I don’t think an automated text based system would be safer than what we have now.</p>
]]></description><pubDate>Sat, 08 Nov 2025 04:45:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=45854206</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=45854206</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45854206</guid></item><item><title><![CDATA[New comment by cullenking in "Doing Rails Wrong"]]></title><description><![CDATA[
<p>It all depends on your philosophy on dependencies. If you maintain a small set of core dependencies that are there for good reasons and are actively maintained, then Rails upgrades are pretty easy. If you have a Gemfile with a bunch of third-party gems that you bring in for small problems here and there, you have to occasionally pay down that debt on version upgrades. We have an 18-year-old Rails codebase currently on 7.1 that hasn't proven to be a big pain to upgrade. The hardest upgrade we did was when a core dependency that had been dead for 5 years broke with a new version of Rails. But that was a story of letting technical debt ride for too long and having to pay it back.<p>This is a common problem in any complex codebase that has a culture of using third-party dependencies to solve small problems. You see this conversation all the time with modern frontend development and the resulting dependency tree you get with npm, etc.</p>
]]></description><pubDate>Tue, 07 Oct 2025 20:38:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=45508526</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=45508526</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45508526</guid></item><item><title><![CDATA[New comment by cullenking in "Samsung now owns Denon, Bowers and Wilkins, Marantz, Polk, and more audio brands"]]></title><description><![CDATA[
<p>MiniDSP Flex HT or HTX paired with a Buckeye 6-channel amp. About as cheap as premium sound quality gets. Not cheap, but you get the software control you actually want via the MiniDSP.</p>
]]></description><pubDate>Sat, 27 Sep 2025 17:35:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=45397792</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=45397792</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45397792</guid></item><item><title><![CDATA[New comment by cullenking in "Terence Tao: The role of small organizations in society has shrunk significantly"]]></title><description><![CDATA[
<p>Preschool is just daycare with structure, so it costs more. Optional, privately owned. Nice to do 2-3 days a week for young kids to give them more social and learning opportunities. But it’s not public school; it’s usually just a small, locally owned business.</p>
]]></description><pubDate>Thu, 25 Sep 2025 04:08:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=45369190</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=45369190</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45369190</guid></item><item><title><![CDATA[New comment by cullenking in "How AWS S3 serves 1 petabyte per second on top of slow HDDs"]]></title><description><![CDATA[
<p>We've been running a production Ceph cluster for 11 years now, with only one full scheduled downtime for a major upgrade in all those years, across three different hardware generations. I wouldn't call it easy, but I also wouldn't call it hard. I used to run it with SSDs for radosgw indexes as well as a fast pool for some VMs, and hard drives for bulk object storage. Since I was only running 5 nodes with 10 drives each and was tired of occasional IOPS issues under heavy recovery, on the last upgrade I just migrated to 100% NVMe drives. To mitigate the price I bought used enterprise Micron drives off eBay whenever I saw a good deal pop up. We haven't had any performance issues since then, no matter what we've tossed at it. I'd recommend it, though I don't have experience with the other options. On paper I think it's still the best option. Stay away from CephFS though; performance is truly atrocious and you'll footgun yourself in any production use.</p>
]]></description><pubDate>Wed, 24 Sep 2025 21:55:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=45366428</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=45366428</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45366428</guid></item><item><title><![CDATA[New comment by cullenking in "Ask HN: Who is hiring? (July 2025)"]]></title><description><![CDATA[
<p>Focusing on compatible timezones, but willing to branch out for just the right candidate</p>
]]></description><pubDate>Mon, 07 Jul 2025 14:36:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=44490823</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=44490823</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44490823</guid></item><item><title><![CDATA[New comment by cullenking in "Ask HN: Who is hiring? (July 2025)"]]></title><description><![CDATA[
<p>Ride with GPS | Full-Time Remote | <a href="https://ridewithgps.com/about" rel="nofollow">https://ridewithgps.com/about</a><p>We are the world's largest library of bike routes, and we enable cyclists to go on better rides, more often. We have a website and mobile apps that allow people to discover the best riding in their area and get turn-by-turn navigation using either our mobile apps or the bike computer of their choosing. Come join us in taking Ride with GPS to the next level! We have two openings right now, and are starting to build out the hiring plan for a third:<p>Senior Software Engineer - API & Product Development: We are looking for an experienced backend engineer to join our small and effective engineering team with a focus on supporting web and mobile app development using our APIs. The right candidate for this role brings extensive experience supporting modern product development in collaboration with frontend and mobile developers, product management, and design. This requires excellent communication and collaboration skills, both on the engineering side and from a product perspective. We use Rails, but prior Rails experience is not required.<p>Details and application process available here: <a href="https://ridewithgps.com/careers/2025-senior-software-engineer-product-and-api" rel="nofollow">https://ridewithgps.com/careers/2025-senior-software-enginee...</a><p>Senior Software Engineer - API Development: We are looking for an experienced backend engineer to join our small and effective team with a focus on our APIs and supporting our platform at scale. This doesn't mean you are isolated from product development &mdash; everything we do serves our users in some way, and being a small team we regularly share responsibilities. However, this role will spend more time on efficiency and system design than on delivering this quarter's new features.
The right candidate should have a depth of experience supporting a large API surface area with efficient, well-organized code, and should be excited about maintaining and improving performance over time. Experience with developer tooling, database design, query optimization, and DevOps workflows will serve you well in this role. We use Rails, but prior Rails experience is not required.<p>Details and application process available here: <a href="https://ridewithgps.com/careers/2025-senior-software-engineer-api" rel="nofollow">https://ridewithgps.com/careers/2025-senior-software-enginee...</a><p>Senior Software Engineer - iOS Development: In mid-July, we will officially start the hiring process for an iOS developer, and potentially another Android engineer. We are reviewing applications from qualified candidates at this time, and will officially post the job by July 15th. If you think you are an excellent fit, please apply now; however, there might be some delays in screening, interviewing, etc. while we finalize our hiring plan. We have a technically interesting, battery-efficient set of mobile apps that act as a companion to our website, and need another iOS or Android engineer to help us take our apps to the next level.<p>Draft job posting and application process available here: <a href="https://ridewithgps.com/careers/job_postings/2025-ios-engineer" rel="nofollow">https://ridewithgps.com/careers/job_postings/2025-ios-engine...</a></p>
]]></description><pubDate>Tue, 01 Jul 2025 15:17:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44434749</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=44434749</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44434749</guid></item><item><title><![CDATA[New comment by cullenking in "Show HN: I built a knife steel comparison tool"]]></title><description><![CDATA[
<p>CruWear! From my practical testing, it's been the best I've tried, and the tool represents it quite well.</p>
]]></description><pubDate>Sun, 18 May 2025 00:13:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=44017976</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=44017976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44017976</guid></item><item><title><![CDATA[New comment by cullenking in "It's five grand a day to miss our S3 exit"]]></title><description><![CDATA[
<p>Enterprise server gear is pretty reliable, and you build your infra to be fully redundant. In our setup, no single machine failure will take us offline. I have 13 machines in a rack running a >$10MM ARR business, and haven't had any significant hardware failures. We have had occasional drive failures, but everything is a RAID1 at a minimum, so they are a non-issue.<p>We just replaced our top-of-rack firewall/proxies that were 11 years old and working just fine. We did it for power and reliability concerns, not because there was a problem. App servers get upgraded more often, but that's because of density and performance improvements.<p>What does cause a service blip fairly regularly is our single upstream ISP. I will have a second ISP into our rack shortly, which means that whole class of short outage will go away. It's really the only weak spot we've observed. That being said, we are in a nice datacenter that is a critical hub in the Pacific Northwest. I'm sure a budget datacenter would have a different class of reliability problems that I am not familiar with.<p>But again, an occasional 15-minute outage is really not a big deal business-wise. Unless you are running a banking service or something, no one cares when something is down for 15 minutes. Heck, all my banks regularly have "maintenance" outages that are unpredictable. I promise, no one really cares about five nines of reliability for the vast majority of services.</p>
]]></description><pubDate>Sun, 30 Mar 2025 15:21:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43524866</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=43524866</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43524866</guid></item><item><title><![CDATA[New comment by cullenking in "Atop 2.11 heap problems"]]></title><description><![CDATA[
<p>I was bitten by atop a few years back and swore it off. I would get perfectly periodic 10-minute hangs on MySQL. Apparently they changed the default runtime options such that it used an expensive metric-gathering technique on a 10-minute cron job that would hang any large-memory process on the system. It was one of those “no freaking way” revelations after 3 days of troubleshooting everything.<p>It's interesting reading through the related submission comments and seeing other hard-to-troubleshoot bugs. I don’t think the atop devs are to blame; my guess is that what you have to do to make a tool like atop work means hooking into lots of places that have the potential for unintended consequences.</p>
]]></description><pubDate>Sun, 30 Mar 2025 03:58:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43521209</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=43521209</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43521209</guid></item><item><title><![CDATA[New comment by cullenking in "Ask HN: How much traffic do you serve and with which database engine?"]]></title><description><![CDATA[
<p>Yup, last time I priced this in RDS I got to maybe $20k a month for two reserved instances across AZs.<p>Our rack pays for itself outright every 3-4 months by comparison, from what I can tell. It still takes the same number of infra/ops/SRE people as well. We staff 2, but really have just 1.25 FTEs' worth of work; you just need more for redundancy.<p>Pretty nuts! This is also why I am so dismissive of performance optimization. Yeah, I'll just buy a new set of three machines with 2TB of RAM each in a few years and call it good, and still come out ahead.</p>
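Back-of-envelope on the claim above, using only the numbers in the comment (the rack's actual price isn't stated, so the up-front cost here is inferred from the payback period):

```python
# If RDS would cost ~$20k/month and the rack pays for itself every
# 3-4 months, the rack's outright cost is roughly 3-4x the monthly bill.
rds_monthly = 20_000                  # quoted RDS estimate, $/month
payback_months = (3, 4)               # "every 3-4 months"

rack_cost_range = tuple(rds_monthly * m for m in payback_months)
annual_rds = rds_monthly * 12         # recurring cost avoided each year
```

That puts the rack at roughly $60k-$80k up front against $240k/year of recurring RDS spend, which is the gap the comment is pointing at.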
]]></description><pubDate>Fri, 14 Mar 2025 23:16:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=43368336</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=43368336</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43368336</guid></item><item><title><![CDATA[New comment by cullenking in "Ask HN: How much traffic do you serve and with which database engine?"]]></title><description><![CDATA[
<p>I'll bite, just so you get a real answer instead of the very correct but annoying "don't worry about it right now" answers everyone else is going to provide!<p>We have a Rails monolith that sends our master database instance between 2,000 and 10,000 queries per second depending on the time of year. We have a seasonal bike business with more traffic in the summer. 5% of queries are insert/update/delete, the rest reads.<p>MariaDB (a MySQL flavor), all reads and writes sent just to the master. Two slaves: one for live failover, the other sitting on a ZFS volume for backup snapshotting, sending snapshots off to rsync.net (they are awesome BTW).<p>We run all our own hardware. The database machines have 512GB of RAM and dual EPYC 74F3 24-core processors, backed by a 4-drive RAID10 NVMe Linux software RAID volume on top of Micron 9300 drives. These machines also house a legacy MongoDB cluster (actually a really nice and easy-to-maintain key/value store, which is how we use it) on a separate RAID volume, an Elasticsearch cluster, and a Redis cluster. The Redis cluster is often doing 10,000 commands a second on a 20GB db, and the Elasticsearch cluster is a 3TB full-text search + geo search database that does about 150 queries a second.<p>In other words, MySQL isn't single-tenant here, though it is single-tenant on the drives that back our MySQL database.<p>We don't have any caching as it pertains to database queries. Yes, we shove some expensive-to-compute data in Redis and use that as a cache, but a cache miss wouldn't hit our database; it would instead recalculate on the fly from GPS data. I would expect to 3-5x our current traffic before considering caching more seriously, but I'll probably once again just upgrade machines instead.
I've been saying this for 15 years....<p>At the end of 2024 I went on a really fun quest to cut our DB size from 1.4TB down to about 500GB, along with a bunch of query performance improvements (removing unnecessary writes with small refactors, adding better indexes, dropping unneeded indexes, changing from strings to enums in places, etc). I spent about 1 week of very enjoyable, fast-paced work to accomplish this while everyone was out on Christmas break (my day job is now mostly management), and would probably need another 2 weeks to go after the other 30% of performance improvements I have in mind.<p>All this is to serve a daily average of 200-300 HTTP requests per second to our backend, with a mix of website visitors and users of our mobile apps. I saw a 1,000rps steady-state peak last year and wasn't worried about anything. I wouldn't be surprised if we could get up to 5,000rps to our API with the current setup and a little tuning.<p>The biggest table by storage and by row count has 300 million rows and, I think, 150GB including indexes, though I've had a few tables eclipse a billion rows before rearchitecting things. Basically, if you use the DB for analytics things get silly, but you can go a long way before thinking "maybe this should go in its own datastore like ClickHouse".<p>Also, it's not just queries per second, but also row operations per second. MySQL is really, really fast. We had some hidden performance issues; fixing them let me go from 10,000,000 row ops per second down to 200,000 right now. This didn't noticeably change query performance; MySQL was fine with some things just doing a ton of full table scans all over the place....</p>
]]></description><pubDate>Fri, 14 Mar 2025 21:41:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=43367606</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=43367606</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43367606</guid></item><item><title><![CDATA[New comment by cullenking in "We were wrong about GPUs"]]></title><description><![CDATA[
<p>Hey, fellow k8s+Ceph-on-bare-metaler! We only have a 13-machine rack and 350TB of raw storage. No major issues with Ceph after 16.x and all-NVMe storage, though.</p>
]]></description><pubDate>Sat, 15 Feb 2025 16:13:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43059597</link><dc:creator>cullenking</dc:creator><comments>https://news.ycombinator.com/item?id=43059597</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43059597</guid></item></channel></rss>