<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: thesz</title><link>https://news.ycombinator.com/user?id=thesz</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 18:10:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=thesz" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by thesz in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"]]></title><description><![CDATA[
<p><p><pre><code>  > I think over-thinking is only solved by thinking more, not less.
</code></pre>
Despite "thinking" tokens being determined by the preceding tokens, they are still drawn from some probability distribution, just a complex one. This means that at each token-selection step there is a probability P_e of an error, of selecting a wrong token.<p>These errors compound: the probability of selecting at least one wrong token over N steps is 1-(1-P_e)^N, which approaches 1 as N grows.<p>The shorter the "thinking", the lower the probability of it going astray.</p>
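A back-of-the-envelope sketch of that compounding, assuming (my idealization, not a measured value) a fixed, independent per-token error probability:

```haskell
-- Probability that a chain of n sampled tokens contains at least one
-- wrong token, assuming a fixed, independent per-token error
-- probability pe (an idealization; real error rates vary with context).
atLeastOneError :: Double -> Int -> Double
atLeastOneError pe n = 1 - (1 - pe) ^ n
```

With pe = 0.001, a 100-token chain goes astray with probability about 0.10, while a 1000-token chain already fails with probability about 0.63: longer thinking gives more chances to derail.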
]]></description><pubDate>Wed, 08 Apr 2026 06:20:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47686079</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47686079</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47686079</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p>It is important to question "how to judge," not "who is to judge."<p>My answer to the "how to judge?" question is another question: "how easy is it to implement new, unforeseen functionality with the code under scrutiny?"</p>
]]></description><pubDate>Tue, 07 Apr 2026 14:21:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675842</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675842</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675842</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  >  Most programmers suck at programming so badly that LLM/AI production IS better than 90+% (possibly 99%+).
</code></pre>
How do you know?</p>
]]></description><pubDate>Tue, 07 Apr 2026 14:15:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675756</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675756</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675756</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  > AI hasn't observably reduced the quality of software I use everyday in a way that is meaningfully separable from normal incidents in the past.
</code></pre>
Most probably, you are not looking closely enough.<p>The average duration of AWS outages [2] was 1.5 hours per outage, 38 hours in total. The most recent AWS outage, in 2026 [1], took AWS down for 13 hours, a third of the 38 hours accumulated over the preceding year, and was caused by an AWS LLM coding tool.<p>[1] <a href="https://www.theguardian.com/technology/2026/feb/20/amazon-cloud-outages-ai-tools-amazon-web-services-aws" rel="nofollow">https://www.theguardian.com/technology/2026/feb/20/amazon-cl...</a><p>[2] <a href="https://www.cherryservers.com/blog/cloud-outages" rel="nofollow">https://www.cherryservers.com/blog/cloud-outages</a></p>
]]></description><pubDate>Tue, 07 Apr 2026 14:13:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675718</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675718</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675718</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  > I have been writing a library called "assume" where you can specify a type signature, give it a prompt, and it generates a function on the fly in the background with Claude Code, so you still write some code, but whenever you need a function you "assume" that such a function exists.
</code></pre>
This is very much like the good old djinn [1], which would generate code from a Haskell type specification.<p>[1] <a href="https://mail.haskell.org/pipermail/haskell/2005-December/017098.html" rel="nofollow">https://mail.haskell.org/pipermail/haskell/2005-December/017...</a><p>And this is why I boldly compare the current LLM craze to the much less hyped craze for strong type systems. I was part of that strong-type-system discussion, advocating for them. ;)</p>
]]></description><pubDate>Tue, 07 Apr 2026 14:04:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675584</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675584</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675584</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  > If you make the prompts specific enough and provide tests that it has to run before it passes, then it should be fairly close to deterministic.
</code></pre>
Your prompt may work on a specific state of the code base and not before or after some changes. Your tests can check for specific behavior, but not for the absence of undesirable behaviors induced by the absence of some specific code or by the addition of other specific code.<p><pre><code>  > I assure you that's already happening. A lot.
</code></pre>
Thank you for the assurance. Can we have less of it? Thank you again.</p>
]]></description><pubDate>Tue, 07 Apr 2026 13:58:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675499</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675499</guid></item><item><title><![CDATA[New comment by thesz in "A cryptography engineer's perspective on quantum computing timelines"]]></title><description><![CDATA[
<p>Given that quantum computing (QC) can speed up training of neural networks (LLMs), it would be wise for Google to invest in QC as much as possible.<p>Google, with SoftBank, invested about $230M in QC last year. Microsoft, IBM and Google have spent $15B on QC combined, over all the time they have researched it: $15B in 20 years, less than $1B per year, across three companies.<p>Google spent upwards of $150B last year on datacenters alone.<p>This may tell us something about how close we are to working quantum computing.</p>
]]></description><pubDate>Tue, 07 Apr 2026 07:42:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47671928</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47671928</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47671928</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p>It is hard to find the market shares of Viaweb and Yahoo at the time of Viaweb's purchase; the best I've found is that Viaweb was bought for 1/4 of Yahoo's net revenue at the time ($49M price / $203M net revenue in 1998). Viaweb was not profitable at the time of purchase, but it had about four people and quite modest hardware costs.<p>While there are no companies with $1.5 trillion (4*$380B) of net revenue, the difference is that Anthropic is cash-flow negative, has far more than 4 people on staff (none of them hungry artists like PG), and its hardware spending, I think, is astronomical. They are cash-flow negative because of the hardware needed to train models.<p>There should be more than one company able to offer good purchase terms to Anthropic's owners.<p>I also think that Anthropic, just like OpenAI and most other LLM companies and company departments, rides "test set leakage," hoping the general public and investors do not understand. Their models do not generalize well, being unable to generate working code in Haskell [1], at the very least.<p>[1] <a href="https://haskellforall.com/2026/03/a-sufficiently-detailed-spec-is-code" rel="nofollow">https://haskellforall.com/2026/03/a-sufficiently-detailed-sp...</a><p>PG's Viaweb had awful code as a liability. Anthropic's Claude Code has an awful implementation (code) and produces awful code, with more liability than code written by a human.</p>
]]></description><pubDate>Mon, 06 Apr 2026 22:21:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668099</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47668099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668099</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p>This product rides a hype wave. That is why it is crazily popular and successful.<p>The situation is akin to Viaweb: Viaweb also rode a hype wave, and its code situation was awful as well (see PG's stories about fixing bugs live during customer issue reproduction).<p>What did Viaweb's buyer do? They rewrote the thing in C++.<p>If history rhymes, then the buyer of Anthropic would do something close to "rewrite it in C++" with the current Claude Code implementation.</p>
]]></description><pubDate>Mon, 06 Apr 2026 19:35:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47665820</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47665820</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47665820</guid></item><item><title><![CDATA[New comment by thesz in "Finnish sauna heat exposure induces stronger immune cell than cytokine responses"]]></title><description><![CDATA[
<p>It may originate from Roman thermae: <a href="https://en.wikipedia.org/wiki/Thermae" rel="nofollow">https://en.wikipedia.org/wiki/Thermae</a></p>
]]></description><pubDate>Sun, 05 Apr 2026 16:19:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47650939</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47650939</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47650939</guid></item><item><title><![CDATA[New comment by thesz in "Finnish sauna heat exposure induces stronger immune cell than cytokine responses"]]></title><description><![CDATA[
<p>A hammam is not as hot as a sauna and not as dry. Sauna air temperatures can reach above 100 degrees Celsius while humidity is usually relatively low (around 20%) [1].<p>[1] <a href="https://en.wikipedia.org/wiki/Sauna" rel="nofollow">https://en.wikipedia.org/wiki/Sauna</a><p>A hammam's temperatures are around 40-50 degrees Celsius with humidity close to 100%.<p>These are very different conditions, with very different body responses.</p>
]]></description><pubDate>Sun, 05 Apr 2026 16:17:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47650924</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47650924</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47650924</guid></item><item><title><![CDATA[New comment by thesz in "The revenge of the data scientist"]]></title><description><![CDATA[
<p><p><pre><code>  > You recognize that you haven't really needed strong mathematical (or coding) skills to create models for some time.
</code></pre>
And then along comes something like this [1], where researchers found a failure to control for multiple comparisons: "In this particular setting, emergent abilities claims are possibly infected by a failure to control for multiple comparisons. In BIG-Bench alone, there are ≥220 tasks, ∼40 metrics per task, ∼10 model families, for a total of ∼10^6 task-metric-model family triplets, meaning probability that no task-metric-model family triplet exhibits an emergent ability by random chance might be small."<p>[1] <a href="https://arxiv.org/abs/2304.15004" rel="nofollow">https://arxiv.org/abs/2304.15004</a></p>
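To make the arithmetic concrete, here is a small sketch (mine, not from the paper) of the standard Bonferroni correction for that many comparisons:

```haskell
-- Bonferroni correction: to keep the family-wise false-positive rate
-- at alphaFW across m comparisons, each individual test must clear
-- a per-test threshold of alphaFW / m.
bonferroni :: Double -> Int -> Double
bonferroni alphaFW m = alphaFW / fromIntegral m
```

At the paper's ∼10^6 triplets, a conventional per-test alpha of 0.05 shrinks to a per-triplet threshold of 5.0e-8; testing each triplet at 0.05 instead makes a spurious "emergent ability" somewhere virtually certain.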
]]></description><pubDate>Thu, 02 Apr 2026 19:12:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47618854</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47618854</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47618854</guid></item><item><title><![CDATA[New comment by thesz in "Slop is not necessarily the future"]]></title><description><![CDATA[
<p><p><pre><code>  > No one has ever made a purchasing decision based on how good your code is.
</code></pre>
People make purchasing decisions based on the availability of source code all the time, preferring products whose source is available and usable. It is safe to assume that they can also make purchasing decisions based on the quality of source code, all else being equal.</p>
]]></description><pubDate>Wed, 01 Apr 2026 17:58:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47604252</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47604252</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47604252</guid></item><item><title><![CDATA[New comment by thesz in "Google's 200M-parameter time-series foundation model with 16k context"]]></title><description><![CDATA[
<p><p><pre><code>  > How can the same model predict egg prices in Italy, and global inflation in a reliable way?
</code></pre>
For one, there's Benford's law: <a href="https://en.wikipedia.org/wiki/Benford%27s_law" rel="nofollow">https://en.wikipedia.org/wiki/Benford%27s_law</a><p>So: predict the sign (branch predictors in modern CPUs also use neural networks of sorts), then the exponent (most probably it changes slowly), and then predict the mantissa using Benford's law.</p>
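For reference, the Benford distribution itself is one line; this is my generic illustration, not anything from the linked model:

```haskell
-- Benford's law: in many naturally occurring datasets, the leading
-- digit d (1..9) appears with probability log10 (1 + 1/d).
benford :: Int -> Double
benford d = logBase 10 (1 + 1 / fromIntegral d)
```

The nine probabilities telescope to log10 10 = 1; a leading 1 appears about 30.1% of the time and a leading 9 only about 4.6%, which is what makes the mantissa predictable at all.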
]]></description><pubDate>Tue, 31 Mar 2026 11:54:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47586032</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47586032</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47586032</guid></item><item><title><![CDATA[New comment by thesz in "Reports of code's death are greatly exaggerated"]]></title><description><![CDATA[
<p><p><pre><code>  > A system that is not even Turing complete is extremely limited.
</code></pre>
Agda is not Turing-complete, yet it is very useful.</p>
]]></description><pubDate>Mon, 30 Mar 2026 18:54:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47578243</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47578243</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47578243</guid></item><item><title><![CDATA[New comment by thesz in "VHDL's Crown Jewel"]]></title><description><![CDATA[
<p><p><pre><code>  > The Delta Cycle logic is actually quite similar to functional reactive programming. It separates how a value changes from when a process responds to that change.
</code></pre>
This is what I use when I play with hardware simulation in Haskell:<p><pre><code>  type S a = [a]

  register :: a -> S a -> S a
  register a0 as = a0:as

  -- Combinational logic can be represented as ordinary pure
  -- functions and then glued into "circuits" with registers
  -- and map/zip/unzip functions.
</code></pre>
This also separates externally visible events, recorded in the (infinite) list of values, from externally unobservable pure (combinational) logic. As a bonus, one can test the combinational logic separately, with property-based testing, etc.</p>
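For illustration, a toy circuit built from these combinators (my example; the definitions are repeated so the snippet stands alone): an accumulator that registers a running sum of its input stream.

```haskell
type S a = [a]  -- a signal: the value observed at each clock cycle

register :: a -> S a -> S a
register a0 as = a0 : as  -- a register delays its input by one cycle

-- An accumulator circuit: a combinational adder (zipWith (+)) glued
-- into a registered feedback loop. Lazy lists make the recursion well
-- defined: cycle n's output depends only on inputs before cycle n.
accumulator :: S Int -> S Int
accumulator inputs = sums
  where
    sums = register 0 (zipWith (+) sums inputs)
```

`take 5 (accumulator [1,2,3,4,5])` yields `[0,1,3,6,10]`: the register's initial value first, then each running sum one cycle late.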
]]></description><pubDate>Mon, 30 Mar 2026 12:27:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47573408</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47573408</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47573408</guid></item><item><title><![CDATA[New comment by thesz in "Douglas Lenat's Automated Mathematician Source Code"]]></title><description><![CDATA[
<p>Automated Mathematician was what led to Eurisko: <a href="https://en.wikipedia.org/wiki/Eurisko" rel="nofollow">https://en.wikipedia.org/wiki/Eurisko</a><p>Eurisko demonstrated superhuman ability to play strategy games in the early 1980s, and even reused strategies from the VLSI place-and-route task when planning fleet placement in games. That is knowledge transfer between tasks.</p>
]]></description><pubDate>Mon, 30 Mar 2026 10:24:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47572541</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47572541</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47572541</guid></item><item><title><![CDATA[New comment by thesz in "Further human + AI + proof assistant work on Knuth's "Claude Cycles" problem"]]></title><description><![CDATA[
<p>It still is [1].<p>[1] <a href="https://www.vice.com/en/article/a-human-amateur-beat-a-top-go-playing-ai-using-a-simple-trick/" rel="nofollow">https://www.vice.com/en/article/a-human-amateur-beat-a-top-g...</a></p>
]]></description><pubDate>Sun, 29 Mar 2026 07:18:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47561046</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47561046</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47561046</guid></item><item><title><![CDATA[New comment by thesz in "Reports of code's death are greatly exaggerated"]]></title><description><![CDATA[
<p><p><pre><code>  > This is meaningless.
</code></pre>
Turing machines grew out of constructive mathematics [1], where proofs are constructions of the objects or, in other words, algorithms to compute them.<p><pre><code>  [1] https://en.wikipedia.org/wiki/Constructivism_(philosophy_of_mathematics)#Constructive_mathematics
</code></pre>
Saying that there is no difference between things that can be constructed (quantum oracles) and things that are given and cannot be constructed (Turing oracles, which are not machines of any sort) is a direct refutation of the very foundation of Turing machine theory.</p>
]]></description><pubDate>Sat, 28 Mar 2026 22:38:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47558705</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47558705</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47558705</guid></item><item><title><![CDATA[New comment by thesz in "CERN uses ultra-compact AI models on FPGAs for real-time LHC data filtering"]]></title><description><![CDATA[
<p>Thanks.<p>The paper [1] referenced in your link follows the legacy of the paper on the HIGGS dataset, and does not report quantities like accuracy and/or perplexity. The HIGGS dataset paper provided area under ROC, from which one had to approximate accuracy. I used the accuracy from the ADMM paper [2] to compare my results against. As I checked later, the area under ROC in [1] mostly agrees with the SGD training results on HIGGS in [2].<p><pre><code>  [1] https://arxiv.org/pdf/2505.19689
  [2] https://proceedings.mlr.press/v48/taylor16.pdf
</code></pre>
I think the perplexity measure is appropriate there in [1] because we need to discern between three outcomes. This calls for softmax, and for perplexity as a standard measure.<p>So, my questions are: 1) what perplexity should I target when dealing with the "mc-flavtag-ttbar-small" dataset? And 2) what is the train/validate/test split ratio there?</p>
]]></description><pubDate>Sat, 28 Mar 2026 22:30:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47558664</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47558664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47558664</guid></item></channel></rss>