<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: joelburget</title><link>https://news.ycombinator.com/user?id=joelburget</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 25 Apr 2026 08:52:55 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=joelburget" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[On-Policy Distillation]]></title><description><![CDATA[
<p>Article URL: <a href="https://thinkingmachines.ai/blog/on-policy-distillation/">https://thinkingmachines.ai/blog/on-policy-distillation/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45761818">https://news.ycombinator.com/item?id=45761818</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 30 Oct 2025 16:26:30 +0000</pubDate><link>https://thinkingmachines.ai/blog/on-policy-distillation/</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=45761818</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45761818</guid></item><item><title><![CDATA[US AI Action Plan]]></title><description><![CDATA[
<p>PDF: <a href="https://www.whitehouse.gov/wp-content/uploads/2025/07/Americas-AI-Action-Plan.pdf" rel="nofollow">https://www.whitehouse.gov/wp-content/uploads/2025/07/Americ...</a></p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44660323">https://news.ycombinator.com/item?id=44660323</a></p>
<p>Points: 426</p>
<p># Comments: 618</p>
]]></description><pubDate>Wed, 23 Jul 2025 15:28:58 +0000</pubDate><link>https://www.ai.gov/action-plan</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=44660323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44660323</guid></item><item><title><![CDATA[Show HN: Microjax – JAX in two classes and six functions]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/joelburget/microjax">https://github.com/joelburget/microjax</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44490299">https://news.ycombinator.com/item?id=44490299</a></p>
<p>Points: 46</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 07 Jul 2025 13:41:55 +0000</pubDate><link>https://github.com/joelburget/microjax</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=44490299</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44490299</guid></item><item><title><![CDATA[ASI existential risk: reconsidering alignment as a goal]]></title><description><![CDATA[
<p>Article URL: <a href="https://michaelnotebook.com/xriskbrief/index.html">https://michaelnotebook.com/xriskbrief/index.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43698739">https://news.ycombinator.com/item?id=43698739</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 15 Apr 2025 21:41:19 +0000</pubDate><link>https://michaelnotebook.com/xriskbrief/index.html</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=43698739</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43698739</guid></item><item><title><![CDATA[New comment by joelburget in "“A calculator app? Anyone could make that”"]]></title><description><![CDATA[
<p>I wrote an OCaml implementation of this paper a few years ago, which I've now extracted into its own [repo](<a href="https://github.com/joelburget/constructive-reals/blob/main/Constructive_reals.ml">https://github.com/joelburget/constructive-reals/blob/main/C...</a>)<p>The link in the paper to their Java implementation is now broken: does anyone have a current link?</p>
]]></description><pubDate>Sun, 16 Feb 2025 17:46:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43069955</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=43069955</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43069955</guid></item><item><title><![CDATA[New comment by joelburget in "Pre-Trained Large Language Models Use Fourier Features for Addition (2024)"]]></title><description><![CDATA[
<p>And more recently, [Language Models Use Trigonometry to Do Addition](<a href="https://arxiv.org/abs/2502.00873" rel="nofollow">https://arxiv.org/abs/2502.00873</a>)</p>
]]></description><pubDate>Thu, 06 Feb 2025 21:41:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=42966781</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=42966781</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42966781</guid></item><item><title><![CDATA[Writing Einsum in Depth (In OCaml)]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.joelburget.com/writing-einsum-in-depth">https://www.joelburget.com/writing-einsum-in-depth</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42793323">https://news.ycombinator.com/item?id=42793323</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 22 Jan 2025 14:41:27 +0000</pubDate><link>https://www.joelburget.com/writing-einsum-in-depth</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=42793323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42793323</guid></item><item><title><![CDATA[New comment by joelburget in "Einsum in Depth"]]></title><description><![CDATA[
<p>This is a good idea, though one problem is that einsum notation (as realized in NumPy and PyTorch) doesn't support the distinction between covariant and contravariant indices, and the site is based on their einsum notation. I could potentially add the variances for the examples, though that would move away from how the site currently works (where the information about the reduction comes only from the einsum input).</p>
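A minimal sketch of the point above: in NumPy-style einsum, which contraction happens is determined entirely by the subscript string, with no place to mark an index as covariant or contravariant. This hand-rolled stand-in for np.einsum("ij,jk->ik", A, B) (pure Python, for illustration only) makes that explicit:

```python
# Hand-rolled matrix-product "einsum" for the spec "ij,jk->ik":
# the repeated index j is summed over purely because it appears in both
# inputs and not in the output -- nothing in the notation says whether
# any index is covariant or contravariant.
def einsum_ij_jk_ik(A, B):
    I, J, K = len(A), len(A[0]), len(B[0])
    out = [[0] * K for _ in range(I)]
    for i in range(I):
        for k in range(K):
            for j in range(J):
                out[i][k] += A[i][j] * B[j][k]
    return out

print(einsum_ij_jk_ik([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```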
]]></description><pubDate>Mon, 06 Jan 2025 19:39:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42614624</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=42614624</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42614624</guid></item><item><title><![CDATA[Einsum in Depth]]></title><description><![CDATA[
<p>Article URL: <a href="https://einsum.joelburget.com/">https://einsum.joelburget.com/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42587056">https://news.ycombinator.com/item?id=42587056</a></p>
<p>Points: 87</p>
<p># Comments: 31</p>
]]></description><pubDate>Fri, 03 Jan 2025 16:34:33 +0000</pubDate><link>https://einsum.joelburget.com/</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=42587056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42587056</guid></item><item><title><![CDATA[New comment by joelburget in "Underrated reasons to be thankful IV"]]></title><description><![CDATA[
<p>A couple of these I'd like references on if anyone happens to have them.<p>1. "current science suggests that the actual health impact from consuming most types of plastic might well be essentially zero"<p>2. "the (weak) evidence we have now suggests running strengthens your knees"</p>
]]></description><pubDate>Thu, 28 Nov 2024 17:05:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=42266854</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=42266854</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42266854</guid></item><item><title><![CDATA[Underrated reasons to be thankful IV]]></title><description><![CDATA[
<p>Article URL: <a href="https://dynomight.net/thanks-4/">https://dynomight.net/thanks-4/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42266848">https://news.ycombinator.com/item?id=42266848</a></p>
<p>Points: 2</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 28 Nov 2024 17:04:13 +0000</pubDate><link>https://dynomight.net/thanks-4/</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=42266848</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42266848</guid></item><item><title><![CDATA[DeepSeek-R1-Lite-Preview is now live]]></title><description><![CDATA[
<p>Article URL: <a href="https://api-docs.deepseek.com/news/news1120">https://api-docs.deepseek.com/news/news1120</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42206521">https://news.ycombinator.com/item?id=42206521</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 21 Nov 2024 17:25:11 +0000</pubDate><link>https://api-docs.deepseek.com/news/news1120</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=42206521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42206521</guid></item><item><title><![CDATA[Quantum error correction below the surface code threshold]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2408.13687">https://arxiv.org/abs/2408.13687</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41638516">https://news.ycombinator.com/item?id=41638516</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 24 Sep 2024 16:57:01 +0000</pubDate><link>https://arxiv.org/abs/2408.13687</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=41638516</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41638516</guid></item><item><title><![CDATA[New comment by joelburget in "Notes on OpenAI's new o1 chain-of-thought models"]]></title><description><![CDATA[
<p>o1 <i>is</i> an application of the Bitter Lesson. To quote Sutton: "The two methods that seem to scale arbitrarily in this way are <i>search</i> and learning." (emphasis mine -- in the original Sutton also emphasized <i>learning</i>).<p>OpenAI and others have previously pushed the learning side, while neglecting search. Now that gains from adding compute at training time have started to level off, they're adding compute at inference time.</p>
]]></description><pubDate>Fri, 13 Sep 2024 14:45:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41531719</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=41531719</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41531719</guid></item><item><title><![CDATA[New comment by joelburget in "Vision language models are blind"]]></title><description><![CDATA[
<p>Vision Transformers do a shocking amount of compression in the tokenizer. In the [Chameleon paper](<a href="https://arxiv.org/pdf/2405.09818" rel="nofollow">https://arxiv.org/pdf/2405.09818</a>) they say the tokenizer "encodes a 512 × 512 image into 1024 discrete tokens from a codebook of size 8192". That's 256 pixels per token (512 * 512 / 1024). If we assume that a pixel is 24 bits (3x 8 bit channels), this implies that they've compressed 256 * 24 = 6144 bits into 13 bits (log2(8192)). [An Image is Worth 32 Tokens for Reconstruction and Generation](<a href="https://yucornetto.github.io/projects/titok.html" rel="nofollow">https://yucornetto.github.io/projects/titok.html</a>) pushes this even further. If these models work similarly, it's no wonder they struggle with some vision tasks.</p>
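The arithmetic above can be checked directly (the 24-bit pixel depth is the comment's assumption, not a figure from the paper):

```python
import math

# Reproducing the compression arithmetic from the Chameleon paper's figures:
# a 512x512 image becomes 1024 discrete tokens from a codebook of size 8192.
pixels_per_token = 512 * 512 // 1024          # 256 pixels per token
bits_per_token_in = pixels_per_token * 24     # 6144 bits of raw pixel data (assumed 24-bit pixels)
bits_per_token_out = int(math.log2(8192))     # 13 bits per codebook index

print(pixels_per_token, bits_per_token_in, bits_per_token_out)
print(bits_per_token_in / bits_per_token_out)  # compression factor per token, ~472x
```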
]]></description><pubDate>Thu, 11 Jul 2024 01:04:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=40932934</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=40932934</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40932934</guid></item><item><title><![CDATA[New comment by joelburget in "How Does GPT-4o Encode Images?"]]></title><description><![CDATA[
<p>Vision transformers should be our default guess as to how GPT-4o works, yet this article never mentions them.</p>
]]></description><pubDate>Fri, 07 Jun 2024 14:04:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=40608852</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=40608852</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40608852</guid></item><item><title><![CDATA[Disrupting the Deepfake Supply Chain]]></title><description><![CDATA[
<p>Article URL: <a href="https://openletter.net/l/disrupting-deepfakes">https://openletter.net/l/disrupting-deepfakes</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=39460381">https://news.ycombinator.com/item?id=39460381</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 21 Feb 2024 22:13:39 +0000</pubDate><link>https://openletter.net/l/disrupting-deepfakes</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=39460381</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39460381</guid></item><item><title><![CDATA[New comment by joelburget in "Tiktoken: OpenAI’s Tokenizer"]]></title><description><![CDATA[
<p>It works on all human languages, just inefficiently. I ran it over a sample I found on Wikipedia:<p><pre><code>    import tiktoken
    enc = tiktoken.get_encoding("cl100k_base")
    sample = "ฟองมันฟันหนู, ฟันหนูฟองมัน, ฝนทองฟองมัน"
    len(sample), len(enc.encode(sample))
</code></pre>
This returns `39, 40`, so it's encoding roughly one token per character. It's probably like this for almost all non-English text.</p>
]]></description><pubDate>Fri, 16 Dec 2022 17:19:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=34017258</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=34017258</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34017258</guid></item><item><title><![CDATA[New comment by joelburget in "Tiktoken: OpenAI’s Tokenizer"]]></title><description><![CDATA[
<p>A few interesting findings:<p>* the cl100k_base tokenizer has ~100k tokens -- previous tokenizers had ~50k. (enc.n_vocab gives 100277, but some numbers in that range don't work, starting at 100256)<p>* it has exactly 1110 tokens which are just digits: 10 one-digit tokens, 100 two-digit tokens, and 1000 three-digit tokens (none with preceding spaces). This is a big improvement over GPT-2's tokenizer, which was a mess.<p>* there are <|fim_prefix|>, <|fim_middle|>, and <|fim_suffix|> tokens (see <i>Efficient Training of Language Models to Fill in the Middle</i>)<p>The biggest news to me is the improved handling of numbers. This could explain some improved performance on arithmetic. One disappointment is that it tokenizes from the <i>front</i>, e.g. "1000000" -> 100|000|0. This is one of those "so close!" moments -- I would work for free to fix this.</p>
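The front-biased grouping above can be illustrated with a toy sketch (this is plain string chunking, not tiktoken's actual BPE merge logic):

```python
# Toy illustration: grouping a digit run into threes from the left
# reproduces the "1000000" -> 100|000|0 split described above, while
# grouping from the right would give the place-value-aligned 1|000|000.
def chunk_from_front(s, width=3):
    return [s[i:i + width] for i in range(0, len(s), width)]

def chunk_from_back(s, width=3):
    rem = len(s) % width
    head = [s[:rem]] if rem else []
    return head + [s[i:i + width] for i in range(rem, len(s), width)]

print(chunk_from_front("1000000"))  # ['100', '000', '0']
print(chunk_from_back("1000000"))   # ['1', '000', '000']
```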
]]></description><pubDate>Fri, 16 Dec 2022 17:16:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=34017206</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=34017206</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34017206</guid></item><item><title><![CDATA[Kelvin Versioning]]></title><description><![CDATA[
<p>Article URL: <a href="https://jtobin.io/kelvin-versioning">https://jtobin.io/kelvin-versioning</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=30948831">https://news.ycombinator.com/item?id=30948831</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 07 Apr 2022 18:54:51 +0000</pubDate><link>https://jtobin.io/kelvin-versioning</link><dc:creator>joelburget</dc:creator><comments>https://news.ycombinator.com/item?id=30948831</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30948831</guid></item></channel></rss>