Hacker News: BarakWidawsky

New comment by BarakWidawsky in "DiffusionGemma: 4x Faster Text Generation"

BarakWidawsky — Wed, 10 Jun 2026 21:40:50 +0000

You’re mostly right, but you’re conflating attention with autoregressive/causal generation, which is the real issue that prevents you from using more compute.

You can use diffusion with attention, and this model does in fact use attention.
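To make the distinction concrete, here is a minimal sketch of single-head scaled dot-product attention over plain Python lists. The helper names `attention_mask` and `attend` are hypothetical, not code from DiffusionGemma. The attention computation is identical in both cases; only the mask changes. With `causal=True`, position i can only see positions up to i, which is what left-to-right autoregressive decoding requires. Diffusion language models typically use the non-causal mask, since every position is refined in parallel at each denoising step.

```python
import math


def attention_mask(n: int, causal: bool) -> list[list[bool]]:
    """mask[i][j] is True when position i may attend to position j."""
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def attend(q, k, v, mask):
    """Single-head scaled dot-product attention; q, k, v are lists of vectors."""
    d = len(q[0])
    out = []
    for i, qi in enumerate(q):
        # Dot-product scores for allowed positions; masked positions get -inf,
        # which becomes a weight of exactly 0 after the softmax below.
        scores = [
            sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) if mask[i][j] else -math.inf
            for j, kj in enumerate(k)
        ]
        # Numerically stable softmax (subtract the max score first).
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output row i is the attention-weighted average of the value vectors.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

For example, with a causal mask the first output row depends only on the first token, so it equals `v[0]` exactly; with the full mask it mixes in later tokens as well.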

Arcee AI Trinity Mini and Nano – US based open weight models

BarakWidawsky — Mon, 01 Dec 2025 21:56:05 +0000

Article URL: https://www.arcee.ai/blog/the-trinity-manifesto

Comments URL: https://news.ycombinator.com/item?id=46113841

Points: 4

# Comments: 3

New comment by BarakWidawsky in "GPT-5.1: A smarter, more conversational ChatGPT"

BarakWidawsky — Wed, 12 Nov 2025 19:43:33 +0000

I think it's extremely important to distinguish between being friendly (perhaps overly so) and agreeing with the user when they're wrong.

The first case is just a matter of preference; the second is materially damaging.

From my experience, ChatGPT does push back more than it used to.

New comment by BarakWidawsky in "Pico-Banana-400k"

BarakWidawsky — Sun, 26 Oct 2025 05:26:25 +0000

Looks like the dataset is distilled from Gemini nano-banana.

Definitely very useful, but I’m so curious how the original datasets for these image editing models were created. I’m guessing a lot of it is synthetic data, with scenes constructed programmatically in layers.

New comment by BarakWidawsky in "M5 MacBook Pro"

BarakWidawsky — Wed, 15 Oct 2025 15:56:30 +0000

I’m guessing this is so they can optimize processor yields as manufacturing improves.

Smaller chips mean more of a wafer is usable when a defect exists.
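The yield argument can be made concrete with the simple Poisson yield model, Y = exp(-D·A), where D is defect density and A is die area. This is a textbook approximation rather than how any particular fab models yield, and the wafer size, defect density, and die areas below are illustrative assumptions; the helper names are hypothetical.

```python
import math


def poisson_yield(defects_per_cm2: float, die_area_cm2: float) -> float:
    """Probability that a die has zero defects under the Poisson yield model."""
    return math.exp(-defects_per_cm2 * die_area_cm2)


def usable_fraction(wafer_area_cm2: float, defects_per_cm2: float, die_area_cm2: float) -> float:
    """Fraction of wafer area that ends up in defect-free dies.

    Simplification: ignores edge loss and scribe lines, so the gross die
    count is just wafer area // die area.
    """
    gross_dies = int(wafer_area_cm2 // die_area_cm2)
    good_dies = gross_dies * poisson_yield(defects_per_cm2, die_area_cm2)
    return good_dies * die_area_cm2 / wafer_area_cm2


if __name__ == "__main__":
    wafer = math.pi * 15.0 ** 2  # 300 mm wafer, area in cm^2
    for die in (1.0, 4.0):
        print(f"{die} cm^2 die: {usable_fraction(wafer, 0.1, die):.0%} of wafer usable")
```

With an assumed defect density of 0.1 per cm² on a 300 mm wafer, 1 cm² dies leave roughly 90% of the wafer area in good dies, while 4 cm² dies leave roughly 67%, because a single defect scraps a much larger piece of silicon.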

New comment by BarakWidawsky in "Diffusion language models are super data learners"

BarakWidawsky — Sun, 10 Aug 2025 18:44:56 +0000

I wonder how much of this is due to diffusion models having less capacity for memorization than autoregressive models.

The autoregressive models consistently show better loss for the same number of training tokens.

I find a lot of the conclusions compelling, but I would’ve loved to see more epochs of training on the 1B model with a 10B dataset, as that model was showing epoch-over-epoch improvements.

New comment by BarakWidawsky in "Smollm3: Smol, multilingual, long-context reasoner LLM"

BarakWidawsky — Tue, 08 Jul 2025 18:24:57 +0000

It’s interesting that they apparently didn’t apply their own RL to the model, and instead fine-tuned on reasoning traces from large datasets and on reasoning traces generated by larger models.

Red Star OS (North Korean OS)

BarakWidawsky — Wed, 25 Jun 2025 17:31:04 +0000

Article URL: https://en.wikipedia.org/wiki/Red_Star_OS

Comments URL: https://news.ycombinator.com/item?id=44379821

Points: 1

# Comments: 0

New comment by BarakWidawsky in "TPDE: A Fast Adaptable Compiler Back-End Framework"

BarakWidawsky — Mon, 02 Jun 2025 03:53:57 +0000

If this is a faster backend for LLVM, does it potentially obviate the niche Cranelift is optimizing for?

New comment by BarakWidawsky in "A comparison to Waymo’s auto liability insurance claims at 25M miles"

BarakWidawsky — Sat, 21 Dec 2024 02:02:16 +0000

I have taken a Waymo in the rain before. If they have stopped supporting that as part of the service, that’s new, but it’s definitely within the system’s capabilities. It worked great.

Why we write numbers in big endian

BarakWidawsky — Wed, 24 Jan 2024 00:07:21 +0000

Article URL: https://cesanta.com/blog/why-we-write-numbers-in-big-endian/

Comments URL: https://news.ycombinator.com/item?id=39111804

Points: 2

# Comments: 2