Hacker News: wenc

New comment by wenc in "Taking a walk may lead to more creativity than sitting, study finds (2014)"

wenc — Tue, 26 May 2026 04:25:44 +0000

I can attest to this. I work in Midtown Manhattan. You'd think walking around would mean getting so distracted by all the activity around you that you'd forget about the problem you're trying to solve.

But I've found that distraction is the catalyst. Creativity for me comes when I focus on something else for a while, not grinding on the same problem with unwavering focus.

New comment by wenc in "America's Most-Spoken Languages After English and Spanish"

wenc — Mon, 18 May 2026 12:58:30 +0000

People often say Mandarin and Cantonese are like Spanish and Portuguese, but that undersells how different they are.

Your example of Spanish and French is more accurate -- same language family, but different grammar and vocabulary.

I offer German and Dutch as another example pair -- same language family as well, but different enough that no one will say "oh they're just different dialects". Dutch is an example of what happens when a Germanic language (Low Franconian) gets its own state.

New comment by wenc in "Building ML framework with Rust and Category Theory"

wenc — Fri, 15 May 2026 14:58:21 +0000

Category theory is rarely useful by itself, but it can be a mental scaffold when designing things like query languages. Microsoft's LINQ DSL within C# used category theory ideas to ensure consistency. That said, the applicability surface area in practice is typically quite limited in my experience. It's like formal methods -- elegant in principle, but a good problem fit is rare. It's like writing a Lean proof for your web app -- rarely needed, but indispensable if your web app needs a high degree of correctness.

This is John D Cook's take:

Category theory can be very useful, but you don’t use it the same way you use other kinds of math. You can apply optimization theory, for example, by noticing that a problem has a certain form, and therefore a certain algorithm will converge to a solution. Applications of category theory are usually more subtle. You’re not likely to quote some theorem from category theory that finishes off a problem the way selecting an optimization algorithm does.

I had been skeptical of applications of category theory, and to some extent I still am. Many reported applications of category theory aren’t that applied, and they’re not so much applications as post hoc glosses. At the same time, I’ve seen real applications of categories, such as the design of LINQ mentioned above. I’ve been a part of projects where we used category theory to guide mathematical modeling and software development. Category theory can spot inconsistencies and errors similar to the way dimensional analysis does in engineering, or type checking in software development. It can help you ask the right questions. It can guide you to including the right things, and leaving the right things out. [1]

[1] https://www.johndcook.com/blog/applied-category-theory/

New comment by wenc in "Geography is four-dimensional"

wenc — Fri, 15 May 2026 13:30:54 +0000

If you've lived in a high place (Denver), you'll find it different from a flat lowland (Chicago).

Also in Rio, how high you live can be a marker, depending on which part of town you're in. Favelas are on hills, whereas wealthy people in Zona Sul live down the hill, closer to the beaches.

New comment by wenc in "Quack: The DuckDB Client-Server Protocol"

wenc — Wed, 13 May 2026 13:15:51 +0000

But why though? DuckDB can still be used as a local query engine — I still use it as that. I haven’t touched any of the DuckLake stuff and the duckdb cli and Python library are still my bread and butter. They can add new use cases, but it doesn’t affect the core engine.

Is the concern that the duckdb messaging is now diluted by it having all these extra features? That you can’t sell it to friends as “this thing” like you can a one use tool like curl? I get that, but I also feel that duckdb is so much bigger than a “do one thing and do it well” tool.

It’s an engine that drives the modern data tool stack. DuckDB’s team has been prescient in that it has made many tasteful bets on what users want -- the ability to interop with pandas and polars, the addition of geospatial, the plug-in infra. They’re all optional, but when you need these things, they’re so useful. They’ve also clued me into what the broader data world is thinking about (I didn’t know about sketches and Hilbert, but those are so useful in probabilistic large-scale queries and in geospatial queries). And they exist in larger database systems like Redshift too.

So far duckdb’s bets have been tasteful, and mostly ignorable if you don’t happen to use them.

New comment by wenc in "Quack: The DuckDB Client-Server Protocol"

wenc — Tue, 12 May 2026 21:58:25 +0000

DuckDB is both a standalone and a component. This effort is actually very coherent and brings it back into a familiar usage model — that of a traditional client server RDBMS.

RDBMSs have always been multi-user concurrent systems. DuckDB is a very fast local engine that has a multitude of use cases because it is embeddable in other systems.

It’s like saying what does SQLite wanna be? It’s in your phones, your browser, your desktop apps, IoT devices, and people have extended it in different directions. The only difference here is that this is first party, not third party. But to me it’s a very legible move.

New comment by wenc in "Amazon employees are "tokenmaxxing" due to pressure to use AI tools"

wenc — Tue, 12 May 2026 17:14:26 +0000

When did FT become Business Insider?

I have an FT subscription and they keep moving toward this kind of narrative-first reporting to get clicks. It’s no longer a believable paper.

New comment by wenc in "A polynomial autoencoder beats PCA on transformer embeddings"

wenc — Fri, 08 May 2026 15:22:46 +0000

It sounds like this replaces the PCA reconstruction function with a quadratic.

The normal PCA encoding:

1) Given a mean-center-scaled X matrix, get the latent variable matrix T with X = T * P’ + e, where P = loadings and e = residuals. The P is your model, so for a new vector xnew, you can calculate tnew = xnew * P (because P’ * P = I).

This is the encoder -- nothing changes here. The original matrix is dimensionally reduced with residuals e discarded. This is why PCA is lossy.

The decoder is where things diverge.

The usual PCA decoder reconstructs a given latent variable t_any by using the trained P loadings, like so: x_reconstructed = t_any * P’. This reconstructed data lies on a linear hyperplane, so if the original data did not lie on the hyperplane, reconstruction errors are potentially high.

In your proposal, instead of a linear decoder, you train a quadratic decoder (essentially a classic ridge regression using a quadratic) on the original X. So for your reconstruction, you have x_reconstructed = poly(t_new).

This achieves lower reconstruction error in-sample (naturally, because quadratic is higher order than linear), but your poly function is trained on a particular corpus. Which means that when you’re in-distribution within that corpus, you’re good but when you’re not, you can be very wrong in biased ways that PCA’s linear reconstruction is not.

So this is not a better technique than PCA in a general sense. It’s a better reconstruction machine when your data is mostly in-sample. It’s a kind of computationally cheap “specialization” on a particular distribution of data, which can be useful if you’re mostly in-distribution but introduces new risks when out-of-distribution.

Whereas PCA just drops the residual and makes modest claims, a quadratic decoder is trying to predict the residual and on out-of-sample data, it can be wrong in biased ways that PCA is not. In other words, it can hallucinate.

But if it’s trained on a large enough corpus, chances are we’re going to be in-distribution most of the time, so maybe this could generalize well.
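To make the comparison above concrete, here is a toy numpy sketch. Everything in it is my own assumption for illustration, not the article's code or data: the synthetic curved dataset, the choice of k = 1 (so the quadratic features are just [1, t, t^2]), and the ridge penalty lam. It fits the usual linear PCA decoder and a ridge-regressed quadratic decoder on the same latent variable, then compares both in-sample and on points that lie exactly on the PCA line but far from how the training data behaves.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic training data on a curved 1-D manifold in 3-D (illustrative only).
t_true = rng.uniform(-1.0, 1.0, size=(500, 1))
X_raw = np.hstack([t_true, t_true**2, 0.5 * t_true])
X_raw = X_raw + 0.01 * rng.normal(size=X_raw.shape)

# Mean-center and scale, as in the comment's setup.
X = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)

# PCA via SVD. P holds the loadings; its columns are orthonormal, so P' P = I.
k = 1
_, _, Vt = np.linalg.svd(X, full_matrices=False)
P = Vt[:k].T                      # shape (3, k)

T = X @ P                         # encoder: t = x P
X_lin = T @ P.T                   # linear decoder: x_hat = t P'

def quad_features(T):
    """[1, t, t^2] features; valid for k = 1 (no cross terms needed)."""
    return np.hstack([np.ones((T.shape[0], 1)), T, T**2])

# Quadratic decoder: ridge regression from the features back to X.
lam = 1e-3
F = quad_features(T)
W = np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ X)
X_quad = F @ W

mse_lin = np.mean((X - X_lin) ** 2)
mse_quad = np.mean((X - X_quad) ** 2)

# Out-of-distribution points that sit exactly on the PCA line.
T_ood = np.array([[4.0], [-4.0], [5.0]])
X_ood = T_ood @ P.T
T_enc = X_ood @ P                 # encoding recovers T_ood because P' P = I
ood_lin = np.mean((X_ood - T_enc @ P.T) ** 2)
ood_quad = np.mean((X_ood - quad_features(T_enc) @ W) ** 2)

print(f"in-sample  linear={mse_lin:.4f}  quadratic={mse_quad:.4f}")
print(f"off-sample linear={ood_lin:.2e}  quadratic={ood_quad:.4f}")
```

In this toy setup the quadratic decoder wins in-sample because it learns the curvature, while on the off-sample points the linear decoder is exact and the quadratic one confidently adds curvature that isn't there.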

New comment by wenc in "Apple accidentally left Claude.md files Apple Support app"

wenc — Fri, 01 May 2026 13:12:49 +0000

Right now Alexa+ and Gemini are objectively better.

The best is ChatGPT voice mode. It understands non-English words and accents amazingly well, and even though the underlying model isn’t the full-fledged one, I can have deep conversations with it for an hour without it missing a beat.

New comment by wenc in "Maladaptive Frugality"

wenc — Fri, 01 May 2026 04:37:12 +0000

I read this article 10 years ago by a guy named Ricky Yean who went to Stanford as an economically disadvantaged admit and couldn’t shake his poverty mindset and it cost him when he was running a startup.

Why “few successful startup founders grew up desperately poor”

https://rickyyean.com/2016/01/22/privilege-and-inequality-in...

Poverty mindset is maladaptive because it teaches you only money is worth anything, so you hoard it. But in truth time is also worth a lot and sometimes it’s wise to use money to buy time.

New comment by wenc in "EvanFlow – A TDD driven feedback loop for Claude Code"

wenc — Mon, 27 Apr 2026 04:16:59 +0000

I feel like 1 is a self correcting problem. If this goes nowhere it will soon be forgotten.

I can think of one example that did go somewhere: Linux.

New comment by wenc in "Gas Town: From Clown Show to v1.0"

wenc — Wed, 15 Apr 2026 00:57:59 +0000

I feel Gastown is an attempt at answering: what if I push the multi-agent paradigm to its chaotic end?

But I think the point that Yegge doesn't address and that I had to discover for myself is: getting many agents working in parallel doing different things -- while cool and exciting (in an anthropomorphic way) -- might not actually be solving the right problem. The bottleneck in development isn't workflow orchestration (what Gastown does) -- it's actually problem decomposition.

And Beads doesn't actually handle the decomposed problem well. I thought it did. But all it is is a task-graph system. Each bead is a task, and agents can just pick up tasks to work on. That looks a lot like an SDE picking up a JIRA ticket, right? But the problem is embedding just enough context in the task that the agent can do it right. Often the task doesn't carry enough, so the agent has to guess the missing context. And it often produces plausible code that is wrong.
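To show what I mean by "just a task graph", here is a minimal sketch using Python's stdlib graphlib. The task names and dependencies are hypothetical, and this is not Beads' actual data model or API, only the general shape of the idea: a graph can tell agents which tasks are unblocked, and nothing more.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
deps = {
    "design-schema": set(),
    "write-migration": {"design-schema"},
    "build-api": {"design-schema"},
    "integration-test": {"write-migration", "build-api"},
}

sorter = TopologicalSorter(deps)
sorter.prepare()

waves = []
while sorter.is_active():
    # Tasks whose dependencies are all done; any idle agent could grab one.
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    # Pretend the agents finished them, which unblocks the next wave.
    sorter.done(*ready)

for i, wave in enumerate(waves, start=1):
    print(f"wave {i}: {wave}")
```

Note what the graph does not contain: the intent, the interfaces, or the invariants each task has to respect. That's the context the agent ends up guessing.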

Decomposing a goal into smaller slices is really where a lot of the difficulty lies. You might say, oh, "I can just tell Claude to write Epics/Stories/Tasks, and it'll figure it out". Right? But without something grounding it like a spec, Claude doesn't do a good job. It won't know exactly how much context to provide to each independent agent.

What I have found useful is spec-driven development, especially of the opinionated variety that Kiro IDE offers. Kiro IDE is a middling Cursor, but an excellent spec generator -- in fact one of the best. It generates 3 specs at 3 levels of abstraction. It generates a Requirements doc in EARS/INCOSE (used at Rolls Royce and Boeing for reducing spec ambiguity), then generates a Design doc (commonly done at FAANG), and... then generates a Task list, which cross-references the sections of the requirements/design.

This kind of spec hugely limits the degrees of freedom. The Requirements part of the spec actually captures intent, which is key. The Design part mocks interfaces, embeds glossaries, and also embeds PBTs (property-based tests using Hypothesis -- maybe eventually Hegel?) as gating mechanisms to check invariants. The Task list is what Beads is supposed to do -- but Beads can't do a good job because it doesn't have the other two specs.
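For anyone who hasn't seen a property-based test used as a gate, here is a rough stdlib-only sketch of the idea. It is hand-rolled with `random` rather than Hypothesis (which would generate and shrink the inputs for you via its `@given` decorator), and the function under test and its invariants are hypothetical examples, not anything Kiro generated.

```python
import random

def dedupe_keep_order(items):
    """Hypothetical function under test: drop repeats, keep first occurrences."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out

def check_invariants(items):
    """Invariants a design doc might pin down for the function above."""
    result = dedupe_keep_order(items)
    assert len(result) == len(set(result))       # no duplicates remain
    assert set(result) == set(items)             # nothing lost or invented
    assert dedupe_keep_order(result) == result   # idempotent

def run_property_check(trials=200, seed=0):
    """Throw many random inputs at the invariants; raises on the first failure."""
    rng = random.Random(seed)
    for _ in range(trials):
        size = rng.randint(0, 20)
        items = [rng.randint(-5, 5) for _ in range(size)]
        check_invariants(items)
    return trials

if __name__ == "__main__":
    print(f"{run_property_check()} random cases passed")
```

The point is that the agent's code has to satisfy the stated invariants on inputs nobody hand-picked, which is a much tighter gate than a couple of example-based tests.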

I've deployed 4 products now using Kiro spec-driven dev (+ Simon Willison's tip "do red/green tdd") and they're running in prod and so far so good. They're pressure-tested using real data.

Spec-driven development isn't perfect but I feel its aim is the correct one -- to capture intentions, to reduce the degrees of freedom, and to constrain agents toward correctness. I tried using Claude Code's /plan mode but it's nowhere as rigorous, and there's still spec drift in the generated code. It doesn't pin down the problem sufficiently.

Gastown/Beads are solutions for the workflow orchestration problem (which is exciting for tech bros), but that's not the most important problem. Problem decomposition is.

Otherwise you're just solving the wrong problem, fast.

New comment by wenc in "NYC to open municipal grocery store in 2027"

wenc — Wed, 15 Apr 2026 00:01:22 +0000

You're thinking of a tax break, which is an unconditional subsidy. That relies on the business passing savings through, which folks are right to be skeptical about.

But that's not the only subsidy mechanism. The best ones are where pass-through is enforced, not assumed.

You already know of one that works: WIC. It lowers the effective price for the customer, and the store receives the difference as reimbursement.

It's not about trickle-down -- that's ideology. It's more about designing the right mechanism.

New comment by wenc in "Design and implementation of DuckDB internals"

wenc — Tue, 14 Apr 2026 16:32:57 +0000

I use DuckDB daily.

In short — It doesn’t crash often at all.

What you may be remembering were reports of exceptional cases where it didn’t handle out-of-memory errors well. I was one of the people affected. I was running complex analytic queries on 400 GB of Parquet files and I only had 128 GB of memory. It used jemalloc, which didn’t gracefully degrade. They fixed a lot of the OOM issues so it’s more robust now. I haven’t had a crash for a long time.

On normal sized datasets it never crashes.

New comment by wenc in "Distributed DuckDB Instance"

wenc — Tue, 14 Apr 2026 12:34:11 +0000

Try DuckLake. They just released a prod version.

You can do read/write of a parquet folder on your local drive, but managed by DuckLake. Supports schema evolution and versioning too.

Basically SQLite for parquet.

New comment by wenc in "S3 Files"

wenc — Wed, 08 Apr 2026 01:01:12 +0000

Maybe the OP is thinking of reading/writing to DuckDB native format files. Those require filesystem semantics for writing. Unfortunately, even NFS or SMB are not sufficiently FS-like for DuckDB.

Parquet is static append only, so DuckDB has no problems with those living on S3.

New comment by wenc in "Move Detroit"

wenc — Tue, 07 Apr 2026 22:50:27 +0000

I recommend visiting Detroit to update your priors. I first visited in 2000 and it was blighted. I visited again in 2025 and it’s actually nice (downtown Detroit and surroundings). There’s even a Microsoft office there.

New comment by wenc in "Jack Dorsey says Block employees now bring prototypes, not slides, to meetings"

wenc — Sat, 04 Apr 2026 16:12:52 +0000

We need to match the tool to the uncertainty we're facing.

The "just prototype it" thinking addresses "feasibility uncertainty". It surfaces blind spots and helps people tangibly reason about what the product looks like. It's a great exploratory tool for incremental ideas.

But it doesn't address the larger uncertainty that startups are faced with: "market uncertainty" (or PMF). It doesn't answer "should we be building this in the first place?" That's where writing as a tool of thought is most powerful -- it helps you crystallize what problem you're actually solving.

The "just prototype it" culture (which is being promoted these days because Claude Code makes it easy) risks answering the wrong question, or at least the right question but in the wrong order. You end up with organizations that are incredibly fast at building things that no one should have built.

Ironically sometimes you need to start from a lower resolution (i.e. writing a doc). Prototyping too early is premature optimization.

New comment by wenc in "What category theory teaches us about dataframes"

wenc — Sat, 04 Apr 2026 05:35:56 +0000

Polars is Ritchie Vink. Pandas is Wes McKinney.

New comment by wenc in "Why I Vibe in Go, Not Rust or Python"

wenc — Mon, 23 Mar 2026 01:14:48 +0000

It depends on the use case. Go seems like a dream... until you have to work with dataframes or do any kind of ML work. Then it's a nightmare.

Go's ecosystem is especially weak in ML, stats, and any kind of scientific computation. I mean, do you really want Claude to implement standard battle-tested ML algorithms in Go from scratch? You'd be burning tokens and still get a worse result than if you'd just used Python.

I use Go to write CLI tools, but for ML work I'd rather have Claude generate Python.

The suitability of a language hinges not only on its design, but on its ecosystem as well.