Hacker News: joefourier

New comment by joefourier in "Princeton mandates proctoring for in-person exams, upending 133 year precedent"

joefourier — Wed, 13 May 2026 21:53:01 +0000

It's incredibly common all over Europe, not just Switzerland. Not only the metros but the trams and even buses often rely on this system where there's no turnstile or barrier, you just walk in.

Not sure it's about being a high trust society or not, there's frequent inspections where they block the doors, and you get a hefty fine if you're caught without a valid ticket. I certainly wouldn't call Prague or Rome or Dublin high trust societies on par with a Swiss city.

New comment by joefourier in "I have seen the dystopian future of elderly care"

joefourier — Sun, 10 May 2026 17:35:20 +0000

Personally I feel like it would be less undignified and infantilising to have a machine take care of my basic bodily functions than a human being. There's no feeling of judgement or being shamed in front of someone else, and the machine could even restore a feeling of autonomy since it would feel like you're using a tool instead of being helplessly reliant on another person's help.

New comment by joefourier in "I returned to AWS and was reminded why I left"

joefourier — Sun, 10 May 2026 13:21:30 +0000

> I also used dedicated servers in the late ’90s (and they still offer great value today). But before AWS, provisioning new hardware typically took days, not minutes.

VPSes and non-custom configs for dedicated servers were pretty instant as far as I know, I think the advantage of AWS was more that you could scale up and down much more easily since you weren’t locked down in a monthly contract, and that you could automate server provisioning through an API.

New comment by joefourier in "I returned to AWS and was reminded why I left"

joefourier — Sun, 10 May 2026 12:24:10 +0000

> Cloud computing was an absolutely mind blowing revolution - suddenly your startup could run its own computer systems in minutes without need to install and run your own systems in a data center. This was an absolute game changer, and I really drank the AWS Kool Aid down to every last drop then I licked out the cup. I was all in on AWS in a big way.

Am I the only one who remembers that VPSes and dedicated hosting services were a thing before AWS came around? Yes you had to pay for a month at a time and scaling wasn’t as instant, but it wasn’t like the only option before cloud computing was having to drive to the datacentre and install your own server.

New comment by joefourier in "Singapore introduces caning for boys who bully others at school"

joefourier — Fri, 08 May 2026 22:24:26 +0000

> In every country, men commit almost all violent crimes. In school, boys physically bully other boys. Hence the physical punishment for them.

As I've said, and @echoangle repeated, caning is used for cyberbullying, which girls do too (at a rate relatively close to boys actually). If the law was caning in response to physical bullying, and it just so happened that the vast majority of offenders were boys, I would not object on the basic of sexism (I still would not approve of schools being allowed to physically punish students).

> Yes, for homo sapiens, the female is more fragile than the male. This is basic biology. I'm sure that in praying mantis society, females get harsher punishments.

There's no way the typical 16 year old girl is more fragile than the typical 9 year old boy, yet only the latter is subject to this punishment. Until children reach the age of 12 or so the strength difference is quite minor (and there's even a brief period where girls are taller and heavier).

Also it's absurd to punish demographics differently based on their statistical averages. Redheads are less sensitive to pain, should your hair colour determine how many strokes of the cane you get?

New comment by joefourier in "Singapore introduces caning for boys who bully others at school"

joefourier — Fri, 08 May 2026 16:34:55 +0000

Boys and girls being different does not mean one sex deserves corporal punishment and one does not. Girls are equally capable of cyberbullying (which is covered by this law), why should they only get detention while a 9 year old boy has to suffer physical violence? What does this teach girls - that they can get away with more? That they're more fragile than even a prepubescent boy?

If the law punishes one demographic less severely for the same actions, that's injustice. No different in principle from pre-modern practices where if a noble maimed a commoner, they'd just need to pay a fine, while if a commoner did the same, they'd be put to death.

New comment by joefourier in "The Disadvantages of an Elite Education (2008)"

joefourier — Wed, 06 May 2026 17:17:57 +0000

> There he was, a short, beefy guy with a goatee and a Red Sox cap and a thick Boston accent, and I suddenly learned that I didn’t have the slightest idea what to say to someone like him. So alien was his experience to me, so unguessable his values, so mysterious his very language, that I couldn’t succeed in engaging him in a few minutes of small talk before he got down to work.

I'm a self-taught software developer with no university education and I too am socially awkward in front of tradespeople in my house. I don't think this is about Ivy League degrees, just being a nerdy intellectual who's bad at small talk and doesn't have any topics in common with a blue collar worker.

New comment by joefourier in "Train Your Own LLM from Scratch"

joefourier — Tue, 05 May 2026 12:35:52 +0000

Calling anything "large" in computing is problematic since hardware keeps improving. GPT-1 was an LLM in 2017 and had 117M parameters, when did it stop being large?

GPT would have been a better term than LLM, but unfortunately became too associated with OpenAI. And then, what about non-transformer LLMs? And multimodal LLMs?

Maybe we should just give up, shrug and call it "AI".

New comment by joefourier in "Claude.ai unavailable and elevated errors on the API"

joefourier — Tue, 28 Apr 2026 21:02:09 +0000

> Local models sound great until you realize you dont get alot of the features that we implicitly expect from hosted models. Many things would require additional investment into the operations and setup to get to a comparable system. We ended up wanting things that would require us to roll our own memory system, harnesses for the model, compliance needs, and security.

That's not local models vs hosted models, that's using the enterprise services from Anthropic. Any local LLM inference engine such as VLLM gives you an OpenAI compatible API with the exact same features as a hosted model.

I'm not sure what your use case is, but I personally found Anthropic's offerings lacking and inferior to open source or custom-built solutions. I have yet to see any "memory" system that's better than markdown files or search, and harnesses for agentic AIs are dime a dozen.

New comment by joefourier in "Microsoft and OpenAI end their exclusive and revenue-sharing deal"

joefourier — Mon, 27 Apr 2026 21:35:19 +0000

AGI means artificial general intelligence, as opposed to artificial narrow intelligence. General intelligence means being able to generalise to many tasks beyond the single narrow one that an AI has been designed/trained on, and LLMs fit that description perfectly, being able to do anything from writing poetry, programming, summarising documents, translating, NLP, and if multi-modal, vision, audio, image generation... not all to human-level performance, but certainly to a useful one. As opposed to previous AI that was able to do only a single thing, like play chess or classify images, and had no way of being generalised to other tasks.

LLMs aren't artificial superintelligence and might not reach that point, but refusing to call them AGI is absolutely moving the goalposts.

New comment by joefourier in "Quirks of Human Anatomy"

joefourier — Sun, 26 Apr 2026 12:29:36 +0000

There's also a difference between having no immediate use, and having no reason to exist. From what I understand, sexual differentiation works by having the Y chromosome act as a switch, and both sexes have to share the same blueprint with hormones guided the development of their organs.

For males not to have nipples, they'd need to be actively destroyed, which poses a risk for females to also not have nipples, which is much worse than males having harmless, inactive nipples.

New comment by joefourier in "An update on recent Claude Code quality reports"

joefourier — Fri, 24 Apr 2026 03:29:21 +0000

They would honestly have been better off refusing customers if compute is so limited. Degrading the quality leads to customers leaving in the short term, and ruins their long term reputation.

But in either case, if compute is so limited, they’ll have to compete with local coding agents. Qwen3.6-27B is good enough to beat having to wait until 5PM for your Claude Code limit to reset.

New comment by joefourier in "5x5 Pixel font for tiny screens"

joefourier — Wed, 22 Apr 2026 19:14:22 +0000

1. Improving the colourisation algorithms has value, it might be that the available colourised photos of celebrities have inaccurate colours or are of poorer quality than say, one done with a diffusion model that can be instructed about the colours of certain objects

2. Don’t forget about B&W films! Getting automatic methods to be consistent over a long length is still not 100% solved. People are very interested in seeing films from WW1 and WW2 in colour, for instance.

3. Plenty of people (myself included) have relatives in their 80s or 90s. Or maybe someone wants to see their ancestors from the 19th century in colour for whatever reason?

New comment by joefourier in "5x5 Pixel font for tiny screens"

joefourier — Wed, 22 Apr 2026 17:57:08 +0000

> Too bad "tiny screens" pretty much do not exist anymore. Screens with hundreds of pixels on each side are very cheap already.

Find me a 0.66" OLED display for ~$1 that has hundreds of pixels on each side then.

> It reminds me people who research "colorizing grayscale photos", which do not exist anymore either (if you want a color photo of someone you met in your life, there probably exists a color photo of that person).

What train of thought led you to think people are primarily researching colorising new B&W photos? As opposed to historical ones, or those of relatives taken when they were young? You can take a colour photo of granddad today but most likely the photos of him in his 20s are all in black and white.

New comment by joefourier in "Migrating from DigitalOcean to Hetzner: From $1,432 to $233 With Zero Downtime"

joefourier — Sat, 18 Apr 2026 15:47:44 +0000

Hetzner also offers a VPS with superior specs to their old DO server for €374.99/month, or €0.6009/hour. They could just switch to a VPS temporarily while waiting for the hardware fix.

Although since they were running a LEMP server stack manually and did their migration by copying all files in /var/www/html via rsync and ad-hoc python scripts, even a DO droplet doesn't have the best guarantee. Their lowest-hanging fruit is probably switching to infrastructure as code, and dividing their stack across multiple cheaper servers instead of having a central point of failure for 34 applications.

New comment by joefourier in "Claude Opus 4.7"

joefourier — Thu, 16 Apr 2026 17:05:54 +0000

I used the $60/mo subscription and I bet most developers get access to AI agents via their company, and there was no difference. They should have reduced the rate limits, or offered a new model, anything except silently reduce the quality of their flagship product to reduce cost.

The cost of switching is too low for them to be able to get away with the standard enshittification playbook. It takes all of 5 minutes to get a Codex subscription and it works almost exactly the same, down to using the same commands for most actions.

New comment by joefourier in "ML promises to be profoundly weird"

joefourier — Mon, 13 Apr 2026 12:32:12 +0000

Not to be rude, but you're arguing with a machine learning engineer about the basics of neural network architectures :P

> The network has a fixed number of input neurons. You have to put something in all of them.

The way transformers work is that they apply the same "input neurons" to each individual token! It's not:

Token 1 -> Neuron 1 Token 2 -> Neuron 2 Token 3 -> Neuron 3... With excess neurons not being used, it's

Token 1 -> Vector of dimensions N -> ALL neurons Token 2 -> Vector of dimensions N -> ALL neurons Token 3- > Vector of dimensions N -> ALL neurons ...

Grossly oversimplified, in a typical transformer layer, you have 3 distinct such "networks" of neurons. You apply them each token, giving you, for each token, a "query", a "key", and a "value". You take the dot product of there query and key, apply softmax, then multiply it with the value, giving you the vector to input for the next layer.

A probability distribution obviously contains a probability for every possible next token. But the whole probability distribution (which adds up to one) only predicts the next ONE token. It predicts what is the probability of that one token being A, or B, or C, etc, giving a probability for each possible token. It's still predicting only one token. In anything but the last column, the numbers are junk. You can treat them as probability distributions all you want, but the system is only trained to get the outputs of the last column "correct". Not quite, the reason transformers train fast is because you can train on all columns at once.

For tokens 1, 2, 3, 4, ... you get predictions for tokens 2, 3, 4, 5... Typical autoregressive transformer training uses a causal mask, so that token 1 doesn't see token 2, enabling you to train on all the predictions at once.

New comment by joefourier in "ML promises to be profoundly weird"

joefourier — Fri, 10 Apr 2026 14:41:37 +0000

Sorry but that's false, you are confusing transformers as an architecture, and auto-regressive generation, and padding during training.

Standard transformers take in an arbitrary input size and run blocks (self and possibly cross attention, positional encoding, MLPs) that don't care about its length.

> They also have a fixed output of one probability distribution for the next one token.

No, in most implementations, they output a probability distribution for every token in the input. If you input 512 tokens, you get 512 probability distributions. You can input however many tokens you want - 1, 2048, one million, it's the same thing (although since standard self-attention scales quadratically you'll eventually run out of memory). Modern relative embeddings like RoPE can support infinite length although the quality will degrade if you extrapolate too far beyond what the model saw during training.

For typical auto-regressive generation, they are trained with causal masking/teacher forcing, which makes it calculate the probability for the next token. During inference, you throw away all but the last probability distribution and use that to sample the next token, and then repeat. You also do this with an RNN. An autoregressive CNN (e.g. WaveNet) would be closer to what you described in that it has a fixed window looking backwards.

But a transformer doesn't have to be used for auto-regressive generation, you can use it for diffusion, as a classifier model, for embedding text. It doesn't even see a sequence as spatially organised - unlike a CNN or an RNN it doesn't have architectural intrinsic biases about the position of elements, which is why it needs positional embeddings. This lets you have 2D, 3D, 4D, or disordered elements in a sequence. You can even have non-regularly sampled sequences. (Again this is for a classic transformer without sliding window attention or any other special modifications).

> (padding the unneeded context window with null tokens). To have efficient training, you pad all samples in a batch to have the same length (and maybe make it a power of two). But you are working with a single sequence, the length is arbitrary up to hardware limitations, and no padding is needed.

New comment by joefourier in "ML promises to be profoundly weird"

joefourier — Thu, 09 Apr 2026 02:23:51 +0000

The title of the article is “The Future of Everything is Lies, I Guess” and the first part is literally complaining about LLMs being bullshit machines, while the author proceeds to tell confabulations (or lies) of his own. Is there not a bit of irony in that?

If you’re a non-expert in a field, I don’t think it’s a good sign if you’re writing a 10 part article about that field’s impact on society and getting basic facts wrong. How can I trust that the conclusions will be any more credible?

New comment by joefourier in "ML promises to be profoundly weird"

joefourier — Thu, 09 Apr 2026 02:01:01 +0000

> That does not scale anywhere near as well as Transformers in compute spend. It's paper/research novelty. Nobody will be doing this for production.

What exactly makes you so confident?

The world is not just labs that can afford billion dollar datacentres and selling access to SOTA LLMs at $30/Mtokens. Transformers are highly unsuitable for many applications for a variety of reasons and non-linear RNNs trained via parallel methods are an extremely attractive value proposition and will likely feature in production in the next products I work on.

> I guess there's some misunderstanding here because Qwen is 100% a transformer, not a hybrid RNN/LSTM whatever.

See the Qwen3.5 Huggingface description: https://huggingface.co/Qwen/Qwen3.5-27B > Efficient Hybrid Architecture: Gated Delta Networks combined with sparse Mixture-of-Experts deliver high-throughput inference with minimal latency and cost overhead.