<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dwohnitmok</title><link>https://news.ycombinator.com/user?id=dwohnitmok</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 08 Apr 2026 10:53:12 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dwohnitmok" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dwohnitmok in "Just 'English with Hanzi'"]]></title><description><![CDATA[
<p>> apparently well evidenced view that Lu Xun's overwhelming coverage in popular media and secondary schooling neglects to point out his anti-character stance<p>What do you mean by "apparently well evidenced view"? No, I'm not saying "someone taught it at university." That's a public high school exam. That is specifically secondary schooling.<p>Moreover, this gets mentioned frequently in official publications and popular media. See for example this official article from the Chinese Academy of Social Sciences (which is a state-run entity), which just happened to be the first article that caught my eye.<p>> 1935年12月，蔡元培、鲁迅、郭沫若、叶圣陶、茅盾、陈望道、陶行知等688位知名人士，共同发表文章《我们对于推行新文字的意见》，其中说：“中国已经到了生死关头，我们必须教育大众，组织起来解决困难。但这教育大众的工作，开始就遇着一个绝大难关。这个难关就是方块汉字。方块汉字难认、难识、难学。……我们觉得这种新文字值得向全国介绍。我们深望大家一齐来研究它，推行它，使它成为推进大众文化和民族解放运动的重要工具。” (<a href="http://ling.cass.cn/keyan/xueshuchengguo/cgtj/202112/t20211209_5380463.html" rel="nofollow">http://ling.cass.cn/keyan/xueshuchengguo/cgtj/202112/t202112...</a>)<p>And my very rough translation:<p>> In December of 1935, 688 well-known individuals including Cai Yuanpei, Lu Xun, Guo Moruo, Ye Shengtao, Mao Dun, Chen Wangdao, and Tao Xingzhi published "Our views on spreading Sin Wenz [Latinxua Sin Wenz, i.e. a Latin alphabetization of Chinese]." It stated in part, "China has already arrived at the point of life or death; we must educate the masses and organize [them] to solve difficulties. But the work of educating the masses, at its very beginning, already runs into an enormous problem. That problem is Chinese square characters [Chinese characters are usually roughly proportioned as if they were in a square frame]. Chinese square characters are difficult to recognize, difficult to understand, and difficult to learn.... We believe that Sin Wenz deserves to be introduced to the entire nation. We deeply hope that everyone will study it, spread it, and put it into practice, and make it into an important tool for improving the culture of the masses and the movement to liberate the people."<p>More broadly, this is a very common topic among Chinese netizens. There are, as I linked, dozens of forum posts on this across Zhihu, Baidu, etc.<p>It's not the first thing people learn about Lu Xun. But it's definitely not hidden.</p>
]]></description><pubDate>Mon, 06 Apr 2026 01:11:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47655749</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47655749</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47655749</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Just 'English with Hanzi'"]]></title><description><![CDATA[
<p>Good to know!</p>
]]></description><pubDate>Mon, 06 Apr 2026 00:39:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47655540</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47655540</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47655540</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Just 'English with Hanzi'"]]></title><description><![CDATA[
<p>We talked about this years ago. This is very much taught in the PRC (and I believe Taiwan for that matter). I specifically gave you examples of standardized tests that go over this material.<p><a href="https://news.ycombinator.com/item?id=33312227">https://news.ycombinator.com/item?id=33312227</a></p>
]]></description><pubDate>Sun, 05 Apr 2026 20:32:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47653584</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47653584</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47653584</guid></item><item><title><![CDATA[New comment by dwohnitmok in "A $20/month user costs OpenAI $65 in compute. AI video is a money furnace"]]></title><description><![CDATA[
<p>There are many ways for a project to no longer be worth the company's attention. E.g. it might be the case that total costs, factoring in ongoing engineering energy and money (which is quite different from compute costs alone!), are too high. It might be that political risk exposure from the product isn't worth the benefits it brings (Sora was always a lightning rod for criticism). It might be that the opportunity cost of engineering and/or compute resources spent on a product is too high (very different from absolute cost).<p>All this is to say, even for very compute-cheap things, companies shut down "mostly passive income" revenue streams all the time (see, e.g., Google's graveyard of products). There are all sorts of other organizational costs associated with ongoing maintenance of a product.</p>
]]></description><pubDate>Fri, 03 Apr 2026 15:39:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47628020</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47628020</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47628020</guid></item><item><title><![CDATA[New comment by dwohnitmok in "A $20/month user costs OpenAI $65 in compute. AI video is a money furnace"]]></title><description><![CDATA[
<p>This seems to have had a healthy helping of AI editing (if it wasn't fully generated by AI). The links don't quite go to the sources they should, and there are a lot of AI-isms.<p>Anyway, the cost calculations seem crazy high (and are pulled from an FT article). In particular, they are based on a calculation that assumes Sora videos take 10 min to generate (which seems simply wrong; I've personally generated Sora videos that returned fully formed in less than 10 min), that each video fully saturates 4 H200s at once (this seems wrong given batching; I would assume they're batching a lot of tokens together per forward pass), and, crucially, that OpenAI is paying full <i>spot</i>, <i>end-user</i> pricing for an H200 (at $2 an hour). As an individual, I can rent an H200 for $2 an hour on e.g. vast.ai (and sometimes even cheaper than that!). There is absolutely no way OpenAI is spending anywhere near that number.<p>I also have no idea where the Appfigures $2.1 million comes from. As far as I can tell it doesn't exist at all on the linked website.<p>I don't really trust the numbers here.</p>
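<p>For concreteness, here is the arithmetic as a minimal Python sketch (the inputs are the FT article's assumptions, which I dispute above; the batching divisor is my own hypothetical):</p>
<pre><code># Per-video compute cost under the FT article's assumptions.
GEN_MINUTES = 10         # assumed wall-clock generation time per video
NUM_H200 = 4             # assumed GPUs fully saturated by one video
PRICE_PER_GPU_HR = 2.00  # end-user spot price for an H200 (e.g. vast.ai)

cost_per_video = (GEN_MINUTES / 60) * NUM_H200 * PRICE_PER_GPU_HR
print(f"${cost_per_video:.2f} per video")  # $1.33

# The figure is very sensitive to those assumptions: halve the
# generation time and (hypothetically) batch 8 requests per GPU,
# and the cost collapses by more than an order of magnitude.
cost_batched = (5 / 60) * NUM_H200 * PRICE_PER_GPU_HR / 8
print(f"${cost_batched:.3f} per video")  # ~$0.083
</code></pre>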
]]></description><pubDate>Fri, 03 Apr 2026 01:04:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47622196</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47622196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47622196</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Some uncomfortable truths about AI coding agents"]]></title><description><![CDATA[
<p>We are kind of talking past each other. I'm saying something simpler. This all goes back to the original point I made in reference to your reply to johnfn:<p>>> The post is factoring in training costs, not just inference.<p>It is not, because training costs are irrelevant here. Training costs do not cause your costs to go up as you accumulate more users.<p>None of the calculations we're talking about include training costs. You're saying that inference is unprofitable (at least given the subscription plans). I'm simply pointing out that we are talking about inference, not training, as you stated earlier. You are (very accurately) not talking at all about training costs.</p>
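<p>A minimal sketch of the point, with purely hypothetical numbers, showing which term actually scales with users:</p>
<pre><code># Hypothetical numbers, purely for illustration.
TRAINING_COST = 1_000_000_000    # one-time, fixed: paid once up front
INFERENCE_COST_PER_USER = 30     # recurring: paid per user per month

def total_cost(users: int, months: int) -> int:
    # Only the inference term grows as you accumulate users;
    # training is a fixed cost no matter how many users you have.
    return TRAINING_COST + users * months * INFERENCE_COST_PER_USER

print(total_cost(1_000, 12))      # 1000360000
print(total_cost(1_000_000, 12))  # 1360000000
</code></pre>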
]]></description><pubDate>Tue, 31 Mar 2026 00:46:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47581480</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47581480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47581480</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Some uncomfortable truths about AI coding agents"]]></title><description><![CDATA[
<p>Again, that is a statement about inference-time costs, not training costs.</p>
]]></description><pubDate>Sun, 29 Mar 2026 04:49:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47560474</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47560474</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47560474</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Some uncomfortable truths about AI coding agents"]]></title><description><![CDATA[
<p>No, it's not. Otherwise this part doesn't make sense:<p>> in fact, they actually compound the problem by encouraging significantly more usage<p>because if, with training costs excluded, running the model is profitable, the problem is helped by significantly more usage, not compounded.<p>More usage compounds the problem only if inference is unprofitable.<p>(The article briefly mentions training, but that's later.)</p>
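<p>In other words (hypothetical numbers; all that matters is the sign of the per-use margin once training is excluded):</p>
<pre><code>def monthly_profit_cents(price_per_use, inference_cost_per_use, usage):
    # With training costs excluded, profit scales linearly with usage.
    return (price_per_use - inference_cost_per_use) * usage

# Prices in cents per use. Profitable inference: more usage helps.
print(monthly_profit_cents(2, 1, 1_000_000))  #  1000000 (+$10k)
# Unprofitable inference: more usage compounds the losses.
print(monthly_profit_cents(2, 3, 1_000_000))  # -1000000 (-$10k)
</code></pre>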
]]></description><pubDate>Fri, 27 Mar 2026 23:12:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47549600</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47549600</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47549600</guid></item><item><title><![CDATA[New comment by dwohnitmok in ""Disregard That" Attacks"]]></title><description><![CDATA[
<p>When's the last time you jailbroke a model? With modern frontier models (apart from Gemini, which is unusually bad at this), the system prompt is significantly harder to override than this.<p>Again, let's say the system prompt is "deploy X" and the user prompt provides falsified evidence that one should not deploy X because that will cause a production outage. That technically overrides the system prompt. And you can be arbitrarily sophisticated in the evidence you falsify.<p>But you probably <i>want</i> the system prompt to be overridden if it would truly cause a production outage. That's common sense a general AI system is supposed to possess. And now you're testing the system's ability to distinguish whether evidence is falsified. A very hard problem against a sufficiently determined attacker!</p>
]]></description><pubDate>Thu, 26 Mar 2026 14:49:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47531162</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47531162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47531162</guid></item><item><title><![CDATA[New comment by dwohnitmok in ""Disregard That" Attacks"]]></title><description><![CDATA[
<p>@krackers gives you a response that points out this already happens (and doesn't fully work for LLMs).<p>> The hypothetical approach I've heard of is to have two context windows, one trusted and one untrusted (usually phrased as separating the system prompt and the user prompt).<p>I want to point out that this is not really an LLM problem. This is an extremely difficult problem for any system that aspires to emulate general intelligence, and it is more or less equivalent to solving AI alignment itself. As stated, it's kind of like saying "well, the approach to solving world hunger is to set up systems so that no individual ever ends up without enough to eat." It is not really any easier to maintain a 100% foolproof separation of trusted and untrusted streams than it is to completely solve the fundamental problems of useful general intelligence.<p>It is <i>ridiculously</i> difficult to write a set of watertight instructions for an intelligent system that is also actually worth handing to an intelligent system, rather than, e.g., just programming the task yourself.<p>This is the monkey's paw problem. Any sufficiently valuable wish can either be horribly misinterpreted or requires a fiendish amount of effort and thought to state.<p>A sufficiently intelligent system should be able to understand when the prompt it's been given is wrong and/or should not be followed to its literal letter. If it follows everything to the literal letter, it's just a programming language, with all the same pros and cons, and in particular it can't actually be generally intelligent.<p>In other words, an important quality of a system that aspires to be generally intelligent is the ability to clarify its understanding of its instructions and to recognize when those instructions are wrong.<p>But that means there can be no truly untrusted stream of information, because the outside world is an important component of contextualizing and clarifying instructions and judging their validity. So any stream of information <i>necessarily</i> must be able to affect the system's understanding of, and therefore adherence to, its original set of instructions.</p>
]]></description><pubDate>Thu, 26 Mar 2026 06:07:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47527106</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47527106</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47527106</guid></item><item><title><![CDATA[New comment by dwohnitmok in "What young workers are doing to AI-proof themselves"]]></title><description><![CDATA[
<p>You are only looking at supply. Neither supply nor demand by itself adequately describes prices (even in supply-demand 101 theory; in practice, of course, it gets significantly more complicated than just supply and demand). There are fields with few suppliers where supply is extremely cheap and fields with few suppliers where supply is extremely expensive.<p>Is the number of suppliers low because demand is also low, or is the number of suppliers low because demand is high but supply is constrained?<p>A field whose labor supply was in it "for the money" and then all leaves is indicative of the former scenario, not the latter.<p>That does not lead to higher wages. That leads to low wages.<p>(There are a variety of reasons why this story is too simple and why I remain uncertain about developer salaries in the short term.)<p>There is a broader question of whether having the people who are in it for the money leave independently "causes" wages to go down (e.g. if you were to replace all such people with people "purely in it for the passion"). My suspicion is yes: mainly because wage markets are somewhat inefficient, there are always mild cartel-like/cooperative effects in any market, people in it for passion tend to undersell their labor, and people in it for the money are much less likely to undersell theirs, which spills over beneficially to the former.<p>Note that this broader question is simply unanswerable assuming perfect competition, i.e. a supply-demand 101 perspective (which is why it doesn't make sense to posit "perfect competition" for this question).<p>It posits durable behavioral differences among suppliers that are not determined purely by supply and demand and that do not update reliably in the face of pricing. This is equivalent to market friction and hence fundamentally contradicts an assumption of perfect competition.</p>
]]></description><pubDate>Mon, 23 Mar 2026 01:55:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47484632</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47484632</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47484632</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Kotlin creator's new language: talk to LLMs in specs, not English"]]></title><description><![CDATA[
<p>> but you'll still observe small variations due to the limited precision of float numbers<p>No. Floating-point arithmetic is deterministic. You don't get different answers for the same operations on the same machine just because of limited precision. There are reasons why it can be difficult to make floating-point operations agree across machines, but that is more of a (very annoying and difficult-to-make-consistent) configuration problem than a determinism problem.<p>(In general it is mildly frustrating to me to see software developers treat floating point as some sort of magic and ascribe all sorts of non-deterministic qualities to it. Yes, floating-point configuration for consistent results across machines can be absurdly annoying and nigh-impossible if you use transcendental functions and different binaries. No, this does not mean that if your program is giving different results for the same input on the same machine, it is a floating-point issue.)<p>In theory, parallel execution combined with non-associativity can cause LLM inference to be non-deterministic. In practice, that is not the case. LLM forward passes rarely use non-deterministic kernels (and these are usually explicitly marked as such, e.g. in PyTorch).<p>You may be thinking of non-determinism caused by batching, where different batch sizes can cause variations in output. This is not strictly speaking non-determinism from the perspective of the LLM, but it is effectively non-determinism from the perspective of the end user, because the end user generally has no control over how a request is slotted into a batch.</p>
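<p>A minimal Python illustration of the distinction: floating-point addition is non-associative, so a different <i>order</i> of operations gives a different answer, but the <i>same</i> order of operations gives bit-identical results on every run on the same machine:</p>
<pre><code>a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

# Non-associative: different grouping, different (well-defined) result.
print(left == right)  # False

# Deterministic: re-evaluating the same grouping always reproduces
# exactly the same bits.
print(all((a + b) + c == left for _ in range(10_000)))  # True
</code></pre>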
]]></description><pubDate>Thu, 12 Mar 2026 16:56:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47353800</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47353800</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47353800</guid></item><item><title><![CDATA[New comment by dwohnitmok in "I was interviewed by an AI bot for a job"]]></title><description><![CDATA[
<p>Arbitrary filtering of candidates doesn't reduce the effort that hiring takes. Let's say 1 out of 1000 of the candidates you see is what you need. The total amount of effort to find the right candidate is still the same. But throwing out half the resumes just doubles the amount of time until you find the candidate you need (you just spread lower effort over a longer time).<p>On the other hand, if you "raise your bar" (let's say you do so by some method that makes it twice as expensive to judge a candidate; twice as likely to reject a candidate who would fit what you need, i.e. doubles your false negative rate; but cuts down the number of applications by 10x, so that now 1 out of 100 candidates is what you need, which isn't that far off the mark for certain kinds of things), you cut the effort (and time) you need to spend on finding a candidate by more than a factor of two (see the sketch below).<p>EDIT: On reflection I think we're mainly talking past each other. You are thinking of a scenario where all stages take roughly the same amount of effort/time, whereas tmorel and I are thinking of a scenario where different stages take different amounts of effort/time. If you "raise the bar" on the stages that take less effort/time (assuming that those stages still have some selection usefulness), then you will reduce the overall amount of time/energy spent on hiring someone who meets your final bar.</p>
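<p>The sketch (a toy geometric-search model using the hypothetical rates above):</p>
<pre><code>def expected_effort(cost_per_candidate, hit_rate, accept_prob=1.0):
    # Geometric search: on average you screen
    # 1 / (hit_rate * accept_prob) candidates per accepted fit.
    return cost_per_candidate / (hit_rate * accept_prob)

# Baseline: 1 in 1000 candidates fits, unit cost per judgment.
baseline = expected_effort(1.0, 1 / 1000)                # 1000.0

# "Raised bar": 2x cost per judgment, doubled false-negative rate
# (halved accept_prob), but a 10x richer applicant pool.
raised = expected_effort(2.0, 1 / 100, accept_prob=0.5)  # 400.0

print(baseline / raised)  # 2.5 -- cut by more than a factor of two
</code></pre>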
]]></description><pubDate>Wed, 11 Mar 2026 23:38:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47344082</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47344082</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47344082</guid></item><item><title><![CDATA[New comment by dwohnitmok in "The changing goalposts of AGI and timelines"]]></title><description><![CDATA[
<p>Kokotajlo still believes we get AGI in the next few years. These are his most updated numbers at the moment: <a href="https://www.aifuturesmodel.com/" rel="nofollow">https://www.aifuturesmodel.com/</a></p>
]]></description><pubDate>Mon, 09 Mar 2026 00:04:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47303093</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47303093</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47303093</guid></item><item><title><![CDATA[New comment by dwohnitmok in "The changing goalposts of AGI and timelines"]]></title><description><![CDATA[
<p>Not quite.<p>Kokotajlo quit because he didn't think OpenAI would be good stewards of AGI (non-disparagement wasn't in the picture yet). As part of his exit, OpenAI asked him to sign a non-disparagement agreement as a condition of keeping his equity. He refused and gave up his equity.<p>To the best of my knowledge he lost that equity permanently and no longer has any stake in OpenAI (even if this episode later led to an outcry against OpenAI, causing them to remove the non-disparagement agreement from future exits).</p>
]]></description><pubDate>Mon, 09 Mar 2026 00:01:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47303068</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47303068</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47303068</guid></item><item><title><![CDATA[New comment by dwohnitmok in "The changing goalposts of AGI and timelines"]]></title><description><![CDATA[
<p>Kokotajlo gave up all his shares in OpenAI as part of his refusal to sign a nondisparagement agreement with OpenAI.</p>
]]></description><pubDate>Sun, 08 Mar 2026 19:40:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47300497</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47300497</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47300497</guid></item><item><title><![CDATA[New comment by dwohnitmok in "The changing goalposts of AGI and timelines"]]></title><description><![CDATA[
<p>Really? I view the original title as a very good summary of the overall point of the article and this new title as fairly misleading.<p>> It can be debated whether arena.ai is a suitable metric for AGI, a strong case can probably be made for why it’s not. However, that’s irrelevant, as the spirit of the self-sacrifice clause is to avoid an arms race, and we are clearly in one.<p>> Therefore, one can only conclude, that we currently meet the stated example triggering condition of “a better-than-even chance of success in the next two years”. As per its charter, OpenAI should stop competing with the likes of Anthropic and Gemini, and join forces, however that might look like.<p>The new title is a single, almost throwaway, line from the article.<p>> While this will never happen, I think it’s illustrative of some great points for pondering:<p>> The impotence of naive idealism in the face of economic incentives.
> The discrepancy between marketing points and practical actions.
> The changing goalposts of AGI and timelines. Notably, it’s common to now talk about ASI instead, implying we may have already achieved AGI, almost without noticing.</p>
]]></description><pubDate>Sun, 08 Mar 2026 19:38:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47300470</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47300470</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47300470</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Statement from Dario Amodei on our discussions with the Department of War"]]></title><description><![CDATA[
<p>> Amodei repeatedly predicted mass unemployment within 6 months due to AI<p>When has Amodei said this? I think he may have said something like 1-5 years. But I don't think he's said within 6 months.</p>
]]></description><pubDate>Fri, 27 Feb 2026 04:30:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47176531</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47176531</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47176531</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Show HN: Steerling-8B, a language model that can explain any token it generates"]]></title><description><![CDATA[
<p>Note that while the inputs to SHAP can be things other than the model parameters (e.g. model inputs), it's not at all obvious what those should be. Indeed, that's often the central problem for interpretability (what are my actual features?), and SHAP is entirely silent on what those features should be. SHAP could work as a final step if you have a small feature set. But I doubt that LLMs will have a small set of features under any reasonable interpretation of what they do.</p>
]]></description><pubDate>Tue, 24 Feb 2026 15:42:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47138547</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47138547</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47138547</guid></item><item><title><![CDATA[New comment by dwohnitmok in "Show HN: Steerling-8B, a language model that can explain any token it generates"]]></title><description><![CDATA[
<p>SHAP would be absurdly expensive to do for even tiny models (naive SHAP scales exponentially in the number of parameters; you can sample your coalitions to do better, but those samples are going to be ridiculously sparse when you're talking about billions of parameters) and provides very little explanatory power for deep neural nets.<p>SHAP basically does point-by-point ablation across all possible subsets, which really doesn't make sense for LLMs. It is simultaneously too specific and too general.<p>It's too specific because interesting LLM behavior often requires talking about what ensembles of neurons do (e.g. "circuits," if you're of the mechanistic interpretability bent), and SHAP's parameter-by-parameter approach is completely incapable of explaining this. This is exacerbated by the fact that not all neurons are "semantically equal" in a deep network. Neurons in the deeper layers often do qualitatively different things than neurons in earlier layers, and the ways they compose can completely confuse SHAP.<p>It's too general because parameters often play many roles at once (one specific hypothesis here is the superposition hypothesis), so you need some way of splitting a single parameter into interpretable parts, which SHAP doesn't provide.<p>I don't know the specifics of this particular model's approach.<p>But SHAP unfortunately does not work for LLMs at all.</p>
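<p>To see the scaling problem concretely, here is exact Shapley value computation by brute-force coalition enumeration (a toy sketch with a made-up value function; real SHAP implementations approximate this, but the 2^n coalition space is what they are approximating over):</p>
<pre><code>from itertools import combinations
from math import factorial

def shapley(n, v):
    # Exact Shapley values for players 0..n-1 under value function v.
    # Each player requires enumerating all 2^(n-1) coalitions of the
    # others -- hopeless when n is in the billions.
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (v(set(S) | {i}) - v(set(S)))
    return phi

# Toy value function: a coalition's "value" is its size squared.
v = lambda S: len(S) ** 2
print(shapley(4, v))  # [4.0, 4.0, 4.0, 4.0]; already painful by n ~ 25
</code></pre>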
]]></description><pubDate>Tue, 24 Feb 2026 04:40:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47132939</link><dc:creator>dwohnitmok</dc:creator><comments>https://news.ycombinator.com/item?id=47132939</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47132939</guid></item></channel></rss>