Hacker News: rstuart4133

New comment by rstuart4133 in "Bun has an open PR adding shared-memory threads to JavaScriptCore"

rstuart4133 — Sun, 21 Jun 2026 03:22:49 +0000

> I was mentored out of it, and while I still like my patches to be complete, I balance that with the available bandwidth of the team and what the team can reasonably actually process.

I struggle with the same issue. In my experience you can't reduce the total number of lines. If the feature took 10k or 15k loc to deliver, you aren't going to be able to reduce that meaningfully.

You can usually break it into a stacked series of commits. New code can be split up into standalone modules, which all compile and pass their unit tests. They could even be shipped, although they wouldn't be used because the changes to the UI are always the last piece of the puzzle. If you are refactoring, you can usually find a way to split the refactor into smaller steps, each building on the previous one. That is almost certainly possible in this case.

The issue with both approaches is that while you can review each step independently, what you miss by looking at just that commit is the motivation for doing it. You can only get that from the big picture, and to get the big picture you need the entire 10k or 15k loc available.

That means you have to push the entire series of commits. If you want to make it plain they are individually reviewable you push them as stacked PRs. Either way, it's a 15k loc push.

I don't see a way out of that for the same reason neither bottom up nor top down design works on its own. You have two edges - the upper (often the UI), and the lower (the OS, standard library - things you have to use to get anything done). You work from both edges simultaneously, each working towards the other, hopefully so that when they meet in the middle the two fit together nicely. The point is, you have to review like that too. You can't just look at how neatly the blocks are stacked on the foundations, you have to evaluate if they are taking the best route to the destination. The review can fail in both ways - the UI can be beautiful but it stands on a mess, or the code could have built up a beautiful series of abstractions that bubble through to the top level and ultimately confuse the end user. So you have to review the code from both perspectives, and to do that you ultimately need to get your head around all 15k loc.

This means a reviewer demanding they be spoon fed a few thousand lines of code at a time is being as unreasonable as the person delivering 15k loc in one commit. They are demanding a simple solution to their problem, and it is wrong. They should be demanding all 15k loc be delivered in the form the author intends to ship, but split into digestible commits that clearly explain the path and reasoning taken between the top and bottom edges, so both the top and bottom level designs are plainly visible.

What happens when I do that is I get into fights over forced pushes. Everyone hates them, and for good reason. They asked for a simple change in their review, and what they want to see is a small commit reflecting their request. Hiding that by doing a rebase is met with howls of pain: "no forced pushes!". So you insert your commit reflecting just the change they asked for into the stack of commits the large feature necessitated. Doing that rebases every commit that depends on it, of course. You push the result and are treated to a chorus of "NO FORCED PUSHES".
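
Why inserting one commit mid-stack rewrites everything above it can be seen in a toy model. This is a minimal sketch, assuming a commit's id is just a hash of its parent's id plus its own content; git's real object format is richer, and the helper `commit_ids` and the commit messages are made up for illustration.

```python
import hashlib


def commit_ids(contents):
    """Return one id per commit, each derived from its parent's id and its own content."""
    ids = []
    parent = ""  # the first commit has no parent
    for content in contents:
        digest = hashlib.sha1((parent + "\n" + content).encode()).hexdigest()
        ids.append(digest)
        parent = digest
    return ids


# A stack of four commits making up a large feature (hypothetical messages).
stack = ["add parser module", "add storage module", "wire modules together", "update UI"]
before = commit_ids(stack)

# The reviewer asks for a small fix to the storage module. Insert it right
# after the commit it belongs with, instead of appending it at the top.
fixed = stack[:2] + ["storage: fix review comment"] + stack[2:]
after = commit_ids(fixed)

# Commits below the insertion point keep their ids...
unchanged = before[:2] == after[:2]
# ...but every commit above it has a new parent, hence a new id.
rewritten = all(old != new for old, new in zip(before[2:], after[3:]))
print(unchanged, rewritten)
```

Because the ids of the upper commits change, the remote branch can no longer be fast-forwarded, which is why the push has to be forced.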

Forbidding all forced pushes makes about as much sense as forbidding a 15k loc change, even though it's well structured into commits. It makes me wonder if unis bother to teach modern software engineering practices.

New comment by rstuart4133 in "U.S. science is in chaos"

rstuart4133 — Wed, 17 Jun 2026 21:17:58 +0000

Those places grew out of an approach that had its origins in WW2. It was pretty obvious to the powers that be that the USA won because science and engineering produced more and better weapons, transport and logistics than the enemy could. They eventually squashed both Germany and Japan with sheer industrial might.

Post WW2, the USA continued the same approach by adopting the Vannevar Bush model, which boiled down to the USA pouring money into basic research, which is never profitable. That fed the companies like the ones you list, who were willing to make bets on medium-term things that might return a profit in a decade or so. If the USA's dominance of world science and engineering in the later 20th century is any indication, it worked a treat.

The Vannevar Bush model started to be wound back in the Reagan years, and Trump seems bent on excising it entirely. Other countries noticed its success. Most OECD countries put a few percent of public money into basic research now. The country that seems to have really taken the lesson to heart is China. They've gone way beyond what the Vannevar Bush model did even in its heyday. The end result is they dominate some areas of science and engineering and consequently manufacturing now (who here remembers Huawei was the brains behind 5G?), and now that the USA has thrown in the towel, that dominance will grow to cover most areas in time. The gap between the West and China on AI and semiconductors is at most a few years.

The USA is crying China is cheating with subsidies and yes that's true - for example it seems the AI models are mostly developed using public money, whereas the USA is relying on VC funding to do the same. The USA's funding of AI development will very likely slow down after the IPOs happen and the companies must become profitable. China's funding of AI won't slow down.

This is the result of a policy choice China made long ago in the Deng Xiaoping era, back in the 1980s. It's taken 40 years to bear fruit, but my, it is fruiting vigorously now. The USA position is also a consequence of policy choices it's been making over the last 40 or so years, starting in the Reagan era.

If you want to see how far this has gone, look up: https://www.aspistrategist.org.au/aspis-critical-technology-... It makes for sobering reading. Some key metrics they measured:

- Research Leadership: China leads the world in high-impact research in 69 out of 74 critical technologies.

- Recent Overtakings: China recently overtook the US in foundational AI and biotech fields, including Natural Language Processing (NLP) and genetic engineering.

- Monopoly Risk: ASPI tracks 41 technologies where China's research concentration is so high it poses a severe future monopoly risk, largely driven by massive hubs like the Chinese Academy of Sciences.

There is now little doubt how this will pan out over the next few decades. The USA and the rest of the West end up buying products made in China, using Chinese technology, and protected by a Chinese patent wall. They will wonder what happened. They will try and recover by going to Chinese universities, and adopting parts of Chinese culture. It won't be a big change for most of the West - the name on the label just changed from the USA to China. It will come as a hell of a shock to most of the USA.

The ironic thing will be - the change has very little to do with the things people will focus on, like who manufactures what, or patent walls, or political systems, or excellence in universities. China pulled that off by adopting some key USA policies, while the USA abandoned those same policies.

New comment by rstuart4133 in "Why AI hasn't replaced software engineers, and won't"

rstuart4133 — Tue, 16 Jun 2026 10:23:38 +0000

The feeling I get is LLMs are the new Excel. I've seen lots of people develop little web based apps that tickle their interest. Things like dashboards for data that didn't fit on their phone, table tennis scoring (really!), small account keeping apps, plotting a calculated GPS path on a map.

This is tiny single-use stuff. Exactly the sort of thing the company nerd would create spreadsheets for. The GUI is more advanced, but it is no more maintainable or scalable.

New comment by rstuart4133 in "Making espresso with ultrasound"

rstuart4133 — Tue, 16 Jun 2026 09:33:07 +0000

> having the coffee warm is kinda important.

Cold drip coffee is a thing; done well, a very nice thing.

New comment by rstuart4133 in "OpenAI mulls slashing prices as it competes with Anthropic for users"

rstuart4133 — Fri, 12 Jun 2026 00:58:19 +0000

GLM 5.1 gets close to 4.6. It can happily run for hours and achieve a result. I've given it bugs like a race condition that led to a count being out by 1 after millions of operations, somewhere in a hundred thousand lines of C code littered with locks and atomic swaps, and it found it (as did Opus). Most other models can't.

I'm using Fable now and GLM 5.1 doesn't really compare. But it's literally 1/20 the price. I can't use Fable for coding - it's too expensive. So now we have three levels of models - lightweight ones you dispatch en masse to find things; ones capable of agentic coding tasks that can run for hours, like Opus and GLM (and possibly other open source ones - I've only tried a few); and now Fable, which is a truly helpful "architecture buddy". Fable still makes many, many mistakes, so you have to review every word it writes.

New comment by rstuart4133 in "Thermodynamics rules future orbital data centers"

rstuart4133 — Thu, 11 Jun 2026 22:15:01 +0000

Why did the word "skynet" pop into my mind?

New comment by rstuart4133 in "I Saw the Future of Windows at Microsoft Build, and It's Unrecognizable"

rstuart4133 — Tue, 09 Jun 2026 23:45:09 +0000

> prioritizing security and hardware that's capable of running local AI models.

Just like the Googlebooks do. It seems everybody is heading in the same direction. Apple already does it. Their high end M4 chips with the 512 bit RAM busses are bigger than the 256 bit planned for the next generation of high end Intel and NVIDIA chips.

The light at the end of the tunnel is very bright now, but I still don't know if it's a train.

New comment by rstuart4133 in "What it feels like to work with Mythos"

rstuart4133 — Tue, 09 Jun 2026 23:10:54 +0000

> Humans are very expensive, so the equation almost always falls against them.

You underestimate what these models cost. Uber's budget is $1,500/dev/month. I gather that was put in place because the devs were going through $6,000/dev/month, which Uber decided could not be cost justified.

Fable costs at least twice as much, or $12,000/dev/month.

Fable can apparently work for hours without supervision, which means a skilled engineer can now have it working on many tasks concurrently. I would not be at all surprised if they can put a nought or two on that number. If you do that, you are well out of "what a human costs" territory.
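
The arithmetic above can be laid out explicitly. This is a rough sketch using only the figures quoted in this comment; reading "a nought or two" as multiplying by 10 or 100 is an assumption, and the variable names are illustrative.

```python
uber_budget = 1_500        # $/dev/month, the cap quoted above
uber_prior_spend = 6_000   # $/dev/month, the spend that prompted the cap

# "At least twice as much" is taken as exactly 2x, so this is a lower bound.
fable_spend = 2 * uber_prior_spend

# "A nought or two": assumed to mean scaling by 10x or 100x when one
# engineer runs many unsupervised tasks concurrently.
concurrent_low = fable_spend * 10
concurrent_high = fable_spend * 100

print(f"Fable, one task stream:  ${fable_spend:,}/dev/month")
print(f"Fable, many concurrent:  ${concurrent_low:,} to ${concurrent_high:,}/dev/month")
print(f"Multiple of Uber's cap:  {fable_spend // uber_budget}x")
```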

New comment by rstuart4133 in "Anthropic, please ship an official Claude Desktop for Linux"

rstuart4133 — Tue, 09 Jun 2026 22:06:20 +0000

> Some problems can't be solved without cooperation with the developer.

Those problems are the ones the package maintainer forwards upstream.

It's a well understood and common workflow, because lots of closed source packages are distributed by distros now. Debian distributes firmware; Ubuntu and Red Hat distribute closed source drivers.

New comment by rstuart4133 in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"

rstuart4133 — Tue, 09 Jun 2026 04:39:27 +0000

The Chinese economics: possibly the USA's experience.

It was pretty clear the USA won World War 2 because it out-produced and out-innovated everyone else. Probably with that in mind, after World War 2 the USA adopted the "Vannevar Bush" model, summarised in this picture: https://www.researchgate.net/figure/annevar-Bushs-Science-th... The idea is to jump start R&D through public funding. The hoped for outcome was that R&D would feed private enterprise, leading to a productivity boom.

The boom happened, and the USA did seem to out-compete everybody else in R&D, science, and the products they delivered for decades after that.

That way of doing things seems to have faded over time in the USA. The decline seemed to coincide with the rise of Neo-economics, and now of course it's been obliterated by Trump. He's very keen to fund Intel to produce chips in a year or two's time (which is something the stock market and banks do perfectly well), but funding basic science is getting drastic cuts.

Still, other countries noticed the rise of the USA, and some adopted similar funding models for basic R&D. China seems to have picked it up with gusto, both subsidising R&D and STEM training, leading to huge numbers of engineers and scientists. Whether it will lead to an economic boom remains unknown, but the acceleration of ideas and innovations coming out of China seems undeniable. More recently, Ukraine showered its local engineering garages with funds in the hopes of getting a similar outcome to the USA in WW2. It looks like it worked. If the Iran war continues, it's entirely possible the arms trade will reverse: the USA could well start buying drones off Ukraine.

New comment by rstuart4133 in "Anthropic, please ship an official Claude Desktop for Linux"

rstuart4133 — Sun, 07 Jun 2026 21:10:26 +0000

> Can confirm. At a past company we worked hard to release a Linux desktop client for our customers who wanted it ... you’re getting peppered with complaints from people using distributions you’ve never heard of ... but the problem is in upstream somewhere

You have taken on the work the distributions do in the open source world. No upstream open source developer takes that on. Instead of getting bug reports from users, upstream developers insist bug reports are filtered by the distro maintainer first. They fix problems on their side so you never see them, and the ones that do make it through have been triaged. It's a win for everyone.

So the solution is to handle it the way they do. Choose a couple of baselines: maybe Debian Stable and Fedora. Publish packages for them, and make it plain they are only certified for those platforms. Make the rest someone else's problem: if you want it working on distro X, you package it for distro X. You've done the bulk of the work for them anyway, as most of them are either Debian or RPM based.

New comment by rstuart4133 in "When AI Builds Itself: Our progress toward recursive self-improvement"

rstuart4133 — Fri, 05 Jun 2026 04:21:27 +0000

> no developer should be proposing branches over 1k loc

I've seen that reaction many times. It seems to work well enough when someone is maintaining existing code. However, greenfield projects can often require literally orders of magnitude more code to deliver something that can be integration tested.

The first step is to break it up into a stack of commits. Each one must compile and pass its unit tests, of course. Keeping it under 1k loc of released executable code is usually easy, but often becomes difficult to impossible if you want well commented code with excellent unit test coverage.

Assuming you have kept all your commits under 1k loc, there is still the problem of whether you present them in one PR, or as a stack of PRs. The issue with a stack is that why an API is designed a certain way often isn't evident until you see how it's used. Responses to PR comments are explanations that point to later PRs in the stack, which is irritating for both the reviewer and the author.

I haven't found a good solution. I'm not sure there is one.

New comment by rstuart4133 in "We Uncovered a Hidden Wealth Transfer in the SpaceX IPO. You're Holding the Bag [video]"

rstuart4133 — Thu, 04 Jun 2026 02:34:20 +0000

> I think the hate about this idea is unwarranted.

I doubt the hate is about passive investment funds owning SpaceX stock. If the rules weren't changed and the index funds ended up owning a lot of SpaceX anyway, no one would care.

The hate is because the index funds used those indexes because they avoided including partial IPOs like this, I guess because they could be prone to manipulation. The funds apparently never imagined the rules controlling the indexes would be changed. The rage was triggered not only because they were changed, but because of how the change came about.

To wit:

- NASDAQ owns both an index and an exchange.

- SpaceX said "we will list on your exchange, if you change the rules of the index".

- The rule change they wanted looks to allow the very manipulation the index funds were trying to avoid. The indexes aren't meant to be speculative, they are a way to follow the average of blue chip stocks.

- This is speculative, but it looks like SpaceX is hoping to manipulate the index funds into purchasing large quantities of their stock, in order to give the early investors a plump exit. If true, the stock would then crash, leaving the index funds holding the bag.

- The NASDAQ allowed themselves to be bribed.

It's possible of course the IPO won't go as planned, or the market will look at the 54 P/E and run away screaming in that initial 15 days, causing the price to plummet. Or it's possible the early investors will throw money at the stock until the 15 days is up to sustain the price. Who knows. Interesting times.

New comment by rstuart4133 in "Michael Burry says neither SpaceX nor Anthropic is worth $1T"

rstuart4133 — Tue, 02 Jun 2026 22:42:54 +0000

The rationale was: SpaceX will list with you, if you change that rule for us.

New comment by rstuart4133 in "US healthcare still stupidly expensive, with pathetic outcomes, study finds"

rstuart4133 — Mon, 01 Jun 2026 20:55:12 +0000

> with most the rest of western Europe dead last along with Canada and Australia.

Just for completeness: I presume that is referring to government owned hospitals in Australia, which are free. There are also privately owned hospitals. I've not had to wait at a private hospital.

Private hospitals are expensive, but I suspect not as expensive as USA hospitals. The price is held in check by the alternative of "wait a little longer, and it's free".

New comment by rstuart4133 in "Please Do Not Vibe Fuck Up This Software"

rstuart4133 — Sun, 31 May 2026 22:35:03 +0000

That makes the original complaint look, well, just plain wrong.

This wasn't "unwanted new features". Tridge was fixing a security issue, related to a bug report. I sympathise - we are all getting slammed with security issues. Fixing them isn't optional. I can't say I enjoy returning to decade old software to do it - so colour me impressed that tridge is putting in the effort.

I'm also guilty of using LLMs to help me get past this mess. I dunno what tridge is doing - but I check every line of code it spits out. Nonetheless, I have no doubt bugs slipping through is a real danger. I haven't looked at the code in a long while, and I'm not as familiar with it as I once was. So a bug slipping through is not a big surprise.

Which brings us to the one odd thing about the blow up. The original complainer seems very protective of his backup system - yet tridge's commit was only 2 weeks ago. I know tridge is good - but surely you treat this as alpha software. What was he thinking? Maybe he has a bit to learn about building reliable systems himself.

New comment by rstuart4133 in "Claude Opus 4.8"

rstuart4133 — Sun, 31 May 2026 21:59:43 +0000

Thanks for the link to the 3b1b video. I enjoyed the entire series, and some of those he linked to as well. The linked ones explained the history of how they got there - which was new to me and really helps cement some ideas.

However, I didn't learn much. Which means as far as I can tell, my mental model of how it works wasn't far off. So yes - I was already aware one interpretation of how these things work is that LLMs turn concepts into vectors in a high dimensional space, and high level abstractions are linear summations of these vectors.

Given that model, parts of your comment don't make much sense to me. For example "it's able to predict N different full paths WITHOUT actually exploring them fully" - why do you think that's so? And "GRAM allows it to look at the different hyper words and say" - no, GRAM does not look at different hyper words, or at least no more than a non-GRAM LLM does. Only the last word (the 4k vector or whatever dimension they are using) is fed back through the machine.

Regarding "Once decoded into a rigid real word, the multidimensional nuance of the continuous vector is lost!". Yes, and no. Yes, the decoded word doesn't mean much compared to the vector it was derived from. But the machine isn't operating on just the last word. It's operating on the information in the entire context window (which could be hundreds of thousands of tokens).

There is undeniably a lot of nuance encoded in one vector, and yes you're right - it can't be represented with just one word. But it can be represented using a string of words, and generating that representation by spitting out more words is partially what an LLM is doing when it generates text. It's only partially doing that because it's randomly mutating that last token as it goes, and pulling in information into the vector from the MLP layers.

Re "LLMs ... add so much more meaning to a word than a human ever could imagine try to put 10,000 dimensions on the word "the" .... OBVIOUSLY makes them enormously less intelligent!". No it doesn't obviously do that. A vector is maybe 16k bytes (depending on the number of dimensions). That corresponds to around 5000 words. Humans have no trouble connecting those 5000 words into a single concept - which would presumably spell out the concept represented by the vector. Same meaning - just encoded differently. Using computer science terminology - we could say the 16k vector is serialised into a sequence of words.

So - two representations of the same thing. What humans do that LLMs can't do right now is squeeze those 5000 words into something tiny. For example, the word "LLM" is a huge concept, squeezed into 3 letters. Human knowledge and thought seem to be based on that one trick - naming abstract concepts, and then using them as building blocks for more abstract concepts. LLMs meanwhile are stuck with their fixed size vectors. They cannot add new concepts to their vocabulary by modifying their weights. Where LLMs seem to win is their short-term memory (of the order of 200K tokens), and they are about 1 million times faster (cycle time of the order of 1 nanosecond vs 1 millisecond), which gives their ability to reason very different properties to human reasoning. Sometimes this means they are (dramatically) better, and sometimes they are worse.

I don't see how GRAM on its own is going to make LLMs 3 orders of magnitude faster than they are now. That 200k token context window is hideously expensive, and the cost of maintaining it grows as O(N^2). As you observed, they can already compress a 100,000 word book into the single token encoded in the last word (although beyond 100k words that compression starts to look increasingly lossy). To get the 3 orders of magnitude speed up, they are going to have to start taking advantage of that compression, and start throwing away the part of that 200k context they have already encoded. So far, no one has deployed something that does it well.
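
To make the O(N^2) point concrete, here is a minimal sketch that counts query-key comparisons, assuming plain causal self-attention where each token attends to itself and every earlier token. Real deployments layer optimisations on top that are not modelled here, and `causal_attention_pairs` is a made-up helper name.

```python
def causal_attention_pairs(n_tokens: int) -> int:
    """Count query-key score computations for one attention head over n_tokens.

    Token i (1-based) attends to tokens 1..i, so the total is 1 + 2 + ... + n.
    """
    return n_tokens * (n_tokens + 1) // 2


for n in (1_000, 10_000, 100_000, 200_000):
    print(f"{n:>7} tokens -> {causal_attention_pairs(n):,} comparisons")

# Doubling the context from 100k to 200k tokens roughly quadruples the work.
ratio = causal_attention_pairs(200_000) / causal_attention_pairs(100_000)
print(f"200k vs 100k: {ratio:.2f}x")
```

The count is per head and per layer, so the real cost is this figure multiplied by both.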

New comment by rstuart4133 in "Claude Opus 4.8"

rstuart4133 — Fri, 29 May 2026 04:22:27 +0000

omg. So is the TL;DR:

- Avoiding building something that turns the universe to paper clips in order to satisfy a prompt is a problem they are genuinely struggling with now.

- They do it by spying on the words generated during CoT. "I can do this quickly by turning the Universe into paper clips. Wait - they won't like that. But there is no need to mention it." - SMACK!

- But you can speed things up immensely (3 orders of magnitude!) by skipping the output layer (and I guess compressing the context window / KV cache, otherwise 3 orders of magnitude seems impossible) which would give someone who pulled it off a huge advantage.

- Downside is humans can't see the CoT anymore, so they can't see what the machine is planning. Keeping the final output layer to spy doesn't work because the model uses its hidden reasoning to sanitise it.

How can this possibly go wrong?

New comment by rstuart4133 in "Justice Dept. Is Said to Open Criminal Inquiry of E. Jean Carroll"

rstuart4133 — Fri, 29 May 2026 01:05:09 +0000

Good excuses work. But they have to be good:

- "travelling": why didn't you postal vote?

- "medical": we allow pre-voting; was it planned?

Just paying the fine on time without arguing gets you a 50% discount in state voting. And it's a token fine - $20 or so from memory for federal ballots. Besides, you don't have to vote. The requirement is you turn up, or give them a piece of paper if you post it. This is deemed so important that when we designed our own voting machines (which were never deployed), they had an explicit "I decline to vote" option.

The paper can be blank, but people are often more imaginative. I can't find the reference to it now, but one paper had penises of different lengths drawn beside each option. The Australian Electoral Commission is required by law to "save" votes, which means that even if a ballot wasn't marked strictly according to the rules, it counts if a reasonable person could infer the intention. This particular vote worked its way through the courts, where it was eventually struck down. Reason: it was impossible to know if a longer penis meant it was more or less favourable to the candidate.

About 8% of ballots can't be saved. Of those about 2% are deliberately spoilt - the rest are mistakes. If Vanessa Teague's voting machines (with the decline button) had been deployed, the remaining 6% would have gone away.

New comment by rstuart4133 in "Justice Dept. Is Said to Open Criminal Inquiry of E. Jean Carroll"

rstuart4133 — Thu, 28 May 2026 10:41:47 +0000

> This is true in the absolute, not just in this special case but in almost all national level elections of any democratic country.

Australia would like to remind everyone we have compulsory voting.