Hacker News: ma2rten

New comment by ma2rten in "Ask HN: What was your "oh shit" moment with GenAI?"

ma2rten — Sun, 07 Jun 2026 04:30:18 +0000

My personal "oh shit" moment was in 2015, when this paper came out: https://arxiv.org/abs/1506.05869

It showed me that a model trained only on movie subtitles data exhibited some (very primitive) reasoning. I have been working on Deep Learning and later LLMs ever since.

New comment by ma2rten in "Yann LeCun raises $1B to build AI that understands the physical world"

ma2rten — Tue, 10 Mar 2026 15:19:55 +0000

Erm, ... OpenAI was hyped when it started and it took 6 years to take off. It's way too early to declare that SSI and Thinking Machines have failed.

New comment by ma2rten in "OpenAI declares 'code red' as Google catches up in AI race"

ma2rten — Tue, 02 Dec 2025 16:17:18 +0000

Delaying doesn't necessarily mean they stop working on it. It might also be a question of compute resource allocation.

New comment by ma2rten in "Show HN:emma019 Real-Time AI-Powered Texas Hold'em in Python and Flask"

ma2rten — Sat, 22 Nov 2025 00:02:41 +0000

You can add Show HN to the title for your own projects. They will show up in the show tab.

New comment by ma2rten in "How Airbus took off"

ma2rten — Sun, 09 Nov 2025 05:14:28 +0000

Europe is quite conservative, in the sense that they would not invest billions into an unproven venture. It makes sense that it would excel at an industry that requires putting safety above everything.

New comment by ma2rten in "BERT is just a single text diffusion step"

ma2rten — Tue, 21 Oct 2025 02:30:29 +0000

It's actually true on many levels, if you think about what is needed for generating syntactically and grammatically correct sentences, coherent text and working code.

New comment by ma2rten in "BERT is just a single text diffusion step"

ma2rten — Mon, 20 Oct 2025 16:08:29 +0000

Interpretability research has found that autoregressive LLMs also plan ahead what they are going to say.

New comment by ma2rten in "Boeing has started working on a 737 MAX replacement"

ma2rten — Wed, 01 Oct 2025 03:55:53 +0000

Your use of the phrase makes no sense. It's the "no parking" that proves the rule and not the exception.

New comment by ma2rten in "Are OpenAI and Anthropic losing money on inference?"

ma2rten — Fri, 29 Aug 2025 02:47:50 +0000

You can also look at the prices of open-source models on OpenRouter, which are a fraction of the cost of closed-source models. This is a market that is heavily commoditized, so I would expect the price to reflect the true cost plus a small margin.

New comment by ma2rten in "Curious about the training data of OpenAI's new GPT-OSS models? I was too"

ma2rten — Sun, 10 Aug 2025 08:19:13 +0000

Presumably the model is trained in post-training to produce a response to a prompt, but not to reproduce the prompt itself. So if you prompt it with an empty prompt it's going to be out of distribution.

New comment by ma2rten in "MIT study explains why laws are written in an incomprehensible style"

ma2rten — Tue, 17 Dec 2024 05:53:20 +0000

The study did not seem very convincing to me, at least the way it was described in the article. To summarize: they asked crowdworkers to write a law, and the crowdworkers used legalese when doing so, but not when writing news stories about the law or when explaining it. From that the researchers concluded that people use legalese to convey authority.

But what if people just imitated the writing style of existing laws, not with the intention of making the text authoritative, but because that is what they understood their task to be?

New comment by ma2rten in "Ask HN: How does Alexa avoid interrupting itself when saying its own name?"

ma2rten — Sat, 29 Jun 2024 15:14:56 +0000

This is the same problem as echo cancellation on calls. This is something that is built into a lot of software and hardware.

New comment by ma2rten in "Maxtext: A simple, performant and scalable Jax LLM"

ma2rten — Wed, 24 Apr 2024 15:01:35 +0000

t5x was used to train PaLM 1.

New comment by ma2rten in "Travelling with Tailscale"

ma2rten — Mon, 15 Apr 2024 12:26:24 +0000

> I have an upcoming trip to Europe, which I am quite excited about. I wanted to set up a Tailscale exit node to ensure that critical apps I depend on, such as banking portals continue working from outside the country.

I've never had an issue accessing banking portals from Europe.

New comment by ma2rten in "Apple cuts off Beeper Mini's access"

ma2rten — Fri, 08 Dec 2023 23:08:05 +0000

Apple cares about the privacy and security of iPhones as a differentiator.

New comment by ma2rten in "Gemini AI"

ma2rten — Wed, 06 Dec 2023 18:43:21 +0000

Noam.

New comment by ma2rten in "Gemini AI"

ma2rten — Wed, 06 Dec 2023 18:06:40 +0000

No, this is not correct. Arguably OpenAI invented LLMs with GPT3 and the preceding scaling laws paper. I worked on LaMDA; it came after GPT3 and was not as capable. Google did invent the transformer, but all the authors of the paper have left since.

New comment by ma2rten in "OpenAI is exploring making its own AI chips"

ma2rten — Fri, 06 Oct 2023 15:09:06 +0000

Both Amazon and Google already do this, and there are reports that Microsoft does as well.

New comment by ma2rten in "How Transformers Work"

ma2rten — Fri, 06 Oct 2023 11:33:37 +0000

Yes, I think that is a reasonable way to think about it. However, with the language modeling objective the model predicts the next token, and because of the residual connections each intermediate layer is in the same space. So maybe it would be more accurate to say that each layer holds an increasingly accurate representation of the next token.
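
To make the "same space" point concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the three-word vocabulary, the identity unembedding matrix, and the fixed per-layer updates (in a real transformer each update is computed by attention and MLP blocks from the current state). Decoding intermediate states with the final unembedding is sometimes called the "logit lens"; that is my framing here, not something from the linked article.

```python
def add(a, b):
    """Element-wise sum of two vectors (the residual connection)."""
    return [x + y for x, y in zip(a, b)]

def run_residual_stream(embedding, layer_updates):
    """Return the state after the embedding and after every layer.

    Each layer only ADDS its update to the running state, so every
    intermediate state has the same shape and lives in the same space.
    """
    states = [embedding]
    for update in layer_updates:
        states.append(add(states[-1], update))
    return states

def decode(state, unembedding, vocab):
    """Project a state onto the vocabulary and return the top token."""
    logits = [sum(s * w for s, w in zip(state, row)) for row in unembedding]
    best = max(range(len(logits)), key=logits.__getitem__)
    return vocab[best]

# Hypothetical toy setup: 3 tokens, 3-dimensional states.
vocab = ["cat", "dog", "mat"]
unembedding = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
embedding = [1.0, 0.0, 0.0]            # the current token is "cat"
layer_updates = [[-0.5, 0.0, 0.6],     # made-up update from layer 1
                 [-0.5, 0.0, 0.9]]     # made-up update from layer 2

states = run_residual_stream(embedding, layer_updates)
predictions = [decode(s, unembedding, vocab) for s in states]
print(predictions)
```

The state starts out looking like the current token and drifts toward the next token as layers add their updates, which is the sense in which it becomes an increasingly accurate representation of the next token.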

New comment by ma2rten in "How Transformers Work"

ma2rten — Thu, 05 Oct 2023 10:27:32 +0000

Attention takes in all tokens in the sequence and outputs a new representation of the current token in context. Each layer of the transformer adds more context to the token.

I haven't read this explanation in detail, and although it has some nice animations, I wouldn't go to the FT for explanations of machine learning concepts. Here are two well-known explanations that might be better:

http://jalammar.github.io/illustrated-transformer/

http://nlp.seas.harvard.edu/annotated-transformer/
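
For a rough feel of the attention step described at the top of this comment, here is a minimal single-head sketch in plain Python. It makes simplifying assumptions that are mine, not from either link: the query, key and value projections are the identity, there is one head, and no layer norm or MLP. The `causal` flag is also my addition; decoder-style language models only let a token look at itself and earlier tokens.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def self_attention(tokens, causal=True):
    """One attention layer with identity Q/K/V projections (a simplification).

    tokens: list of equal-length vectors, one per position.
    Returns a new vector per position: the original token plus a
    weighted mix of the tokens it is allowed to see.
    """
    d = len(tokens[0])
    outputs = []
    for i, query in enumerate(tokens):
        visible = tokens[: i + 1] if causal else tokens
        # Scaled dot-product scores between this token and each visible token.
        scores = [dot(query, key) / math.sqrt(d) for key in visible]
        weights = softmax(scores)
        # Context vector: weighted average of the visible tokens.
        context = [sum(w * v[j] for w, v in zip(weights, visible))
                   for j in range(d)]
        # Residual connection: add the context to the token itself.
        outputs.append([x + c for x, c in zip(query, context)])
    return outputs

tokens = [[1.0, 0.0], [0.0, 1.0]]
out = self_attention(tokens)
print(out)
```

Stacking several such layers is what "each layer adds more context to the token" means: the output of one layer becomes the input tokens of the next.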