Hacker News: kwillets

New comment by kwillets in "The rise of South Korea’s weapons business"

kwillets — Sat, 20 Jun 2026 19:25:32 +0000

This trend has been obvious since at least the Poland deal. Korea gets much more return on its defense dollar by manufacturing exportable weapons systems than by relying on imports or domestic-only programs.

New comment by kwillets in "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search"

kwillets — Tue, 09 Jun 2026 16:13:11 +0000

I'm curious to see what patterns it's grepping.

New comment by kwillets in "What game engines know about data that databases forgot"

kwillets — Thu, 09 Apr 2026 19:21:58 +0000

This has a lot in common with a columnstore, but there are a few interesting differences.

The main one is subrow versioning. Column stores (in OLAP at least) have always used row-level versioning, which gets in the way of small updates. A single change to a row amplifies into deleting and re-inserting the whole thing, and operations that seem sensible like adding or dropping a column break previous versions. This scheme is the first I've seen that tries to fix that problem.
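A toy model may make the write amplification concrete. The sketch below is hypothetical and does not reproduce the article's design. It compares row-level versioning, where any change writes a whole new row version, with per-column ("subrow") version chains, where only the touched columns get a new version. The class names and the `cells_written` counter are illustrative inventions.

```python
class RowVersionedTable:
    """Row-level versioning: any change writes a full new row version."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.versions = {}  # row_id -> list of (commit_ts, tuple_of_values)
        self.cells_written = 0

    def upsert(self, row_id, ts, **values):
        history = self.versions.setdefault(row_id, [])
        # Copy the latest version, apply the change, re-insert the whole row.
        prev = dict(zip(self.columns, history[-1][1])) if history else {}
        prev.update(values)
        history.append((ts, tuple(prev.get(c) for c in self.columns)))
        self.cells_written += len(self.columns)  # every column rewritten

    def read(self, row_id, col, as_of):
        idx = self.columns.index(col)
        for ts, row in reversed(self.versions.get(row_id, [])):
            if ts <= as_of:
                return row[idx]
        return None


class SubrowVersionedTable:
    """Subrow versioning: each column keeps its own version chain."""

    def __init__(self, columns):
        self.chains = {c: {} for c in columns}  # col -> row_id -> [(ts, value)]
        self.cells_written = 0

    def upsert(self, row_id, ts, **values):
        for col, val in values.items():
            self.chains[col].setdefault(row_id, []).append((ts, val))
            self.cells_written += 1  # only the touched columns are written

    def read(self, row_id, col, as_of):
        for ts, val in reversed(self.chains[col].get(row_id, [])):
            if ts <= as_of:
                return val
        return None

    def add_column(self, col):
        # Existing version chains of other columns are left untouched.
        self.chains.setdefault(col, {})
```

In a four-column table, updating one column costs four cell writes under row versioning and one under subrow versioning. Adding a column in the subrow model just creates an empty chain, so older versions stay readable.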

Another difference is the lack of compression: because it's zero-copy, the performance gains of operating on compressed data are lost.
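For readers unfamiliar with "operating on compressed data", here is a minimal sketch using run-length encoding as an example encoding. The choice of encoding is an assumption for illustration; real column stores use several. Aggregates and predicates run once per run rather than once per row, which is the kind of gain a plain zero-copy layout gives up.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a column into (value, run_length) pairs."""
    return [(v, sum(1 for _ in grp)) for v, grp in groupby(values)]


def sum_plain(values):
    # Uncompressed: one addition per row.
    return sum(values)


def sum_rle(runs):
    # Compressed: one multiply-add per run, no decompression needed.
    return sum(v * n for v, n in runs)


def count_eq_rle(runs, target):
    # The predicate is evaluated once per run instead of once per row.
    return sum(n for v, n in runs if v == target)
```

With a sorted or clustered column of 1,750 rows in three runs, `sum_rle` does three multiply-adds where `sum_plain` does 1,750 additions, and both return the same total.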

New comment by kwillets in "Bombadil: Property-based testing for web UIs"

kwillets — Mon, 23 Mar 2026 16:39:26 +0000

One of the founders of Antithesis gave a talk about this problem last week; diversity in test cases is definitely an issue they're trying to tackle. The example he gave was Spanner tests failing to fill its cache because random inputs kept jittering near zero. Avoiding that appears to be a company goal.

https://github.com/papers-we-love/san-francisco/blob/master/...

New comment by kwillets in "Nobody ever got fired for using a struct"

kwillets — Fri, 06 Mar 2026 20:04:32 +0000

This site is underweighted on OLAP. Columnstores were invented for precisely this use case; nobody in the field wants to normalize everything.

Which brings me to the question: why a rowstore? Are Z-sets hard to manage otherwise?

Another aspect of wide tables is that they tend to have a lot of dependencies, i.e., different columns come from different aggregations, and the whole table gets held up if one of them is late. IVM seems like a good solution for that problem.
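As a rough illustration of why IVM helps here, the hypothetical sketch below keeps each wide-table column as its own incrementally maintained view fed by Z-set deltas. A Z-set is modeled as a dict from a row tuple to an integer weight: +1 for an insert, -1 for a delete, and an update is a delete plus an insert. This is not how any particular engine implements IVM, and all class names are invented. It only shows that a late source leaves its own column stale instead of blocking the whole table.

```python
from collections import defaultdict


class SumView:
    """SUM(value) GROUP BY key, maintained from Z-set deltas of (key, value) rows."""

    def __init__(self):
        self.totals = defaultdict(int)

    def apply(self, delta):
        for (key, value), weight in delta.items():
            self.totals[key] += weight * value

    def get(self, key):
        return self.totals.get(key, 0)


class CountView:
    """COUNT(*) GROUP BY key, maintained from Z-set deltas of (key, ...) rows."""

    def __init__(self):
        self.counts = defaultdict(int)

    def apply(self, delta):
        for row, weight in delta.items():
            self.counts[row[0]] += weight

    def get(self, key):
        return self.counts.get(key, 0)


class WideTable:
    """Each column is an independent view, updated as soon as its delta arrives."""

    def __init__(self, views):
        self.views = views  # column name -> view object

    def apply(self, column, delta):
        self.views[column].apply(delta)

    def row(self, key):
        return {col: view.get(key) for col, view in self.views.items()}
```

If the "orders" source is late, the "revenue" column still reflects every delta applied so far, and the row can be read at any time.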

New comment by kwillets in "Ask HN: Who is hiring? (March 2026)"

kwillets — Tue, 03 Mar 2026 05:21:17 +0000

Can you be more specific about what type of data mining you're doing?

New comment by kwillets in "Infrastructure decisions I endorse or regret after 4 years at a startup (2024)"

kwillets — Fri, 20 Feb 2026 17:24:24 +0000

He doesn't want to manage the database the way he manages the rest of his infrastructure. All of his bullet points apply to other components as well, but he's absorbed the cost of managing them and assigning responsibilities.

- Crud accumulates in the [infrastructure thingie], and it’s unclear if it can be deleted.

- When there are performance issues, infrastructure (without deep product knowledge) has to debug the [infrastructure thingie] and figure out who to redirect to

- [infrastructure thingie] users can push bad code that does bad things to the [infrastructure thingie]. These bad things may PagerDuty alert the infrastructure team (since they own the [infrastructure thingie]). It feels bad to wake up one team for another team’s issue. With application owned [infrastructure thingies], the application team is the first responder.

New comment by kwillets in "Readings in Database Systems (5th Edition) (2015)"

kwillets — Wed, 31 Dec 2025 19:03:41 +0000

The object storage stuff is new, but it's mostly confirmed that the older architecture works. MPP with shared (S3) storage and everything above that on local SSD and compute delivers the best performance. Even Snowflake finally came out with "interactive" warehouses with this architecture.

Parquet, Iceberg, and other open formats seem good, but they may hit a complexity wall. There's already some inconsistency between platforms, e.g., with delete vectors.

Incremental view maintenance interests me as well, and I would like to see it more available on different platforms. It's ironic that people use dbt etc. to test every little edit of their manually coded delta pipelines, but don't look at IVM.

New comment by kwillets in "Go ahead, self-host Postgres"

kwillets — Sun, 21 Dec 2025 06:54:47 +0000

Over time I've realized that the best abstraction for managing a computer is a computer.

New comment by kwillets in "Biscuit is a specialized PostgreSQL index for fast pattern matching LIKE queries"

kwillets — Sat, 20 Dec 2025 20:05:27 +0000

This is a fairly simple idea of indexing characters for each column/offset and compressing the bitmaps. Simple is good, as the overhead of more sophisticated ideas (e.g., suffix sorting) is often prohibitive.

One suggestion is to index the end-of-string as a character as well; then you don't need negative offsets. But that turns the suffix search into a wildcard type of thing where you have to try all offsets, which is what the '%pat%' searches do already, so maybe it's OK.
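A minimal sketch of the (offset, character) bitmap idea, with the end-of-string sentinel variant suggested above. It assumes `"\0"` never appears in the data. Bitmaps are plain Python ints and are left uncompressed. This is an illustration, not Biscuit's actual implementation. A prefix search ANDs one bitmap per pattern character at fixed offsets. A suffix search becomes a `'%pat%'`-style search for `pat` followed by the sentinel, ORed over every start offset.

```python
from collections import defaultdict

END = "\0"  # hypothetical end-of-string sentinel, assumed absent from the data


class PositionCharIndex:
    """Toy index: one bitmap (a Python int) per (offset, char) pair.
    A real index would compress these bitmaps; this sketch does not."""

    def __init__(self, rows):
        self.rows = rows
        self.bitmaps = defaultdict(int)  # (offset, char) -> bitmap of row ids
        self.max_len = 0
        for rid, s in enumerate(rows):
            t = s + END  # index end-of-string as a character
            self.max_len = max(self.max_len, len(t))
            for off, ch in enumerate(t):
                self.bitmaps[(off, ch)] |= 1 << rid

    def _match_at(self, pat, start):
        """Bitmap of rows where pat occurs starting at exactly this offset."""
        bm = (1 << len(self.rows)) - 1
        for i, ch in enumerate(pat):
            bm &= self.bitmaps.get((start + i, ch), 0)
            if not bm:
                break
        return bm

    def prefix(self, pat):  # LIKE 'pat%'
        return self._match_at(pat, 0)

    def contains(self, pat):  # LIKE '%pat%': try every start offset
        bm = 0
        for start in range(self.max_len - len(pat) + 1):
            bm |= self._match_at(pat, start)
        return bm

    def suffix(self, pat):  # LIKE '%pat': pat followed by the sentinel, any offset
        return self.contains(pat + END)

    def ids(self, bm):
        return [rid for rid in range(len(self.rows)) if (bm >> rid) & 1]
```

The trade-off is the one described above. With negative offsets, a suffix search would be a single AND chain. With the sentinel, it costs the same offset scan as a `'%pat%'` search.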

New comment by kwillets in "SQLite JSON at full index speed using generated columns"

kwillets — Fri, 12 Dec 2025 18:04:31 +0000

It's been around for quite a while, but DB people hate to explain where they got an idea. For all I know Vertica got it from somewhere else; I think Postgres got jsonb around the same time.

New comment by kwillets in "Weight-sparse transformers have interpretable circuits [pdf]"

kwillets — Sat, 22 Nov 2025 21:14:13 +0000

My last dive into matrix computations was years ago, but the need was the same back then. We could sparsify matrices pretty easily, but the infrastructure was lacking. Some things never change.

New comment by kwillets in "Ondol"

kwillets — Sun, 26 Oct 2025 05:41:40 +0000

I wrote part of that -- it needed a tight summary that covers all the main components, so I packed as much as I could into a sentence or two. I believe the second paragraph is mostly from me, but it was some time ago, back when Hanok revival was fairly new, and a lot of books were coming out on the subject.

A few years after that one of the Korean dailies wrote an English article and copied my wording. :(

New comment by kwillets in "A sharded DuckDB on 63 nodes runs 1T row aggregation challenge in 5 sec"

kwillets — Fri, 24 Oct 2025 16:49:25 +0000

This is fun, but I'm confused by the architecture. DuckDB is based on one-off queries that can scale momentarily and then disappear, but this seems to run on k8s and maintain a persistent distributed worker pool.

This pool lacks many of the features of a distributed cluster such as recovery, quorum, and storage state management, and queries run through a single server. What happens when a node goes down? Does it give up, replan, or just hang? How does it divide up resources between multiple requests? Can it distribute joins and other intermediate operators?

I have a soft spot in my heart for DuckDB, but its uniqueness is in avoiding the large-scale clustering that other engines already do reasonably well.

New comment by kwillets in "A sharded DuckDB on 63 nodes runs 1T row aggregation challenge in 5 sec"

kwillets — Fri, 24 Oct 2025 15:46:06 +0000

Don't forget the 2-hour Tableau Cloud runtime limit.

New comment by kwillets in "SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn Context Retrieval"

kwillets — Fri, 17 Oct 2025 17:29:57 +0000

I'm just learning about agentic search so I'm a bit adrift.

One of my side projects is a full text index for pattern search, and I'm trying to understand how it might fit with that. You mention tool call overhead, but is that a significant part of the latency in the multi-turn scenario, or is it the coding agent being forced into a serial processing pattern?

New comment by kwillets in "SWE-Grep and SWE-Grep-Mini: RL for Fast Multi-Turn Context Retrieval"

kwillets — Fri, 17 Oct 2025 04:38:50 +0000

Are you actually using grep here? How much data are you searching?

New comment by kwillets in "For centuries massive meals amazed visitors to Korea (2019)"

kwillets — Mon, 13 Oct 2025 14:53:56 +0000

This article seems somewhat fanciful. Medieval Koreans ate two meals a day, and famines were common. Only a small area of the country is suitable for rice cultivation. The photos seem to show only upper-class clothing and furniture.

I've been to that makgeolli place in Jeonju, and it sells drinks and food as a set; there is no free food.

New comment by kwillets in "South Korea's booming market for traditional (and novel) hangover remedies"

kwillets — Sun, 28 Sep 2025 06:20:13 +0000

Nam (Man Tea) and other brands have been in convenience stores for years; most have a full shelf of it. As I heard it, the male part of the name is due to its use in manly herbal mixtures, but it's not considered a male-specific substance on its own.

New comment by kwillets in "Samsung now owns Denon, Bowers and Wilkins, Marantz, Polk, and more audio brands"

kwillets — Sat, 27 Sep 2025 22:56:19 +0000

I built the ads audience system. Most of the effects of that are already known here: ads became a big trophy within the org, and everybody had to have ads and post-sale revenue, even the fridge people.

Sometime around when the CEO got out of prison, a bunch of weirdness occurred. Good managers left, bad managers got hired, and everything became top-down. The group head "retired" but last year un-retired in a different position; I didn't know you could do that.

Engineering-wise it went from technical free rein to "only use this suspiciously chummy cloud vendor" in a few months. I never got to the bottom of that deal, but costs exploded, and revenue flattened.