<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: red2awn</title><link>https://news.ycombinator.com/user?id=red2awn</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 28 Apr 2026 23:27:02 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=red2awn" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by red2awn in "Something is afoot in the land of Qwen"]]></title><description><![CDATA[
<p>LLM-as-a-judge is quite an effective method for RL'ing a model: similar to RLHF, but more objective and scalable. But yes, Anthropic is making it out to be more serious than it is. Plus, DeepSeek only did it for 125k requests, significantly fewer than the other labs, yet Anthropic still listed them first to create FUD.</p>
]]></description><pubDate>Wed, 04 Mar 2026 22:40:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47254991</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47254991</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47254991</guid></item><item><title><![CDATA[New comment by red2awn in "Something is afoot in the land of Qwen"]]></title><description><![CDATA[
<p>The "distillation attacks" are mostly using Claude as an LLM-as-a-judge. They are not training on the reasoning chains in an SFT fashion.</p>
]]></description><pubDate>Wed, 04 Mar 2026 19:37:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47252682</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47252682</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47252682</guid></item><item><title><![CDATA[New comment by red2awn in "Show HN: I built a sub-500ms latency voice agent from scratch"]]></title><description><![CDATA[
<p>What's the SOTA open-source or open-weight turn-taking model these days? I tried pipecat/smart-turn-v3 and the results are not good: it only works well when you say a short sentence in a clear voice; anything else causes it to wait indefinitely.
Closed-source API models are obviously a lot better, but they add network latency and the cost adds up.</p>
]]></description><pubDate>Tue, 03 Mar 2026 12:46:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47231515</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47231515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47231515</guid></item><item><title><![CDATA[New comment by red2awn in "Claude Sonnet 4.6"]]></title><description><![CDATA[
<p>Distilling from a teacher (Opus 4.5) and scaling RL more.</p>
]]></description><pubDate>Tue, 17 Feb 2026 22:26:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47054323</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47054323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47054323</guid></item><item><title><![CDATA[New comment by red2awn in "Claude Sonnet 4.6"]]></title><description><![CDATA[
<p>> i need to wash my helicopter at the helicopter wash. it is 50m away, should i walk or fly there with my helicopter.<p>Sonnet 4.6: Walk! Flying a helicopter 50 metres would be more trouble than it's worth — by the time you've done your pre-flight checks, spun up the rotors, lifted off, and then safely landed again, you'd have walked there and back twice. Just stroll over.</p>
]]></description><pubDate>Tue, 17 Feb 2026 22:24:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47054292</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47054292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47054292</guid></item><item><title><![CDATA[What nobody teaches you after you stop being broke]]></title><description><![CDATA[
<p>Article URL: <a href="https://twitter.com/_thomasip/status/2013287739634888882">https://twitter.com/_thomasip/status/2013287739634888882</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46681691">https://news.ycombinator.com/item?id=46681691</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 19 Jan 2026 17:18:55 +0000</pubDate><link>https://twitter.com/_thomasip/status/2013287739634888882</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46681691</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46681691</guid></item><item><title><![CDATA[New comment by red2awn in "Why DuckDB is my first choice for data processing"]]></title><description><![CDATA[
<p>It was mentioned that the performance of DuckDB is similar to that of Polars (among others). In that case, why would one choose DuckDB over Polars? The only differentiator seems to be that you do the querying in standard SQL instead of library-specific APIs.</p>
]]></description><pubDate>Fri, 16 Jan 2026 21:26:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46652426</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46652426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46652426</guid></item><item><title><![CDATA[New comment by red2awn in "Anthropic invests $1.5M in the Python Software Foundation"]]></title><description><![CDATA[
<p>They are heavily focused on code. Claude Code likely generates hundreds of millions of lines of Python a day, so making the language a little bit better for $1.5M is extremely high leverage.</p>
]]></description><pubDate>Tue, 13 Jan 2026 18:13:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46605179</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46605179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46605179</guid></item><item><title><![CDATA[New comment by red2awn in "The Illustrated Transformer"]]></title><description><![CDATA[
<p>Implement transformers yourself (e.g. in NumPy). You'll never truly understand them by just watching videos.</p>
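To give a flavour of the exercise: here is a minimal scaled dot-product attention sketch in NumPy (single head, no masking or batching; shapes are illustrative, not from any particular paper):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_q, seq_k)
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # (seq_q, d_v)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 query positions, dim 8
K = rng.standard_normal((6, 8))   # 6 key positions
V = rng.standard_normal((6, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Once this clicks, multi-head attention is just this function run over split projections of Q, K, V and concatenated back together.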
]]></description><pubDate>Mon, 22 Dec 2025 19:34:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46357915</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46357915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46357915</guid></item><item><title><![CDATA[New comment by red2awn in "Structured outputs create false confidence"]]></title><description><![CDATA[
<p>It does NOT. Response healing only fixes JSON syntax errors, not schema differences.<p>See <a href="https://news.ycombinator.com/item?id=46332119">https://news.ycombinator.com/item?id=46332119</a></p>
]]></description><pubDate>Sun, 21 Dec 2025 22:38:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46349312</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46349312</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46349312</guid></item><item><title><![CDATA[New comment by red2awn in "Response Healing: Reduce JSON defects by 80%+"]]></title><description><![CDATA[
<p>Tokens are sampled from logits using the constraints after a normal forward pass. The forward pass is the expensive part of LLM inference which isn't affected by structured output.</p>
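A toy sketch of that sampling step in plain Python (the token strings and logit values are hypothetical; the forward pass that produced the logits is untouched):

```python
import math

def constrained_sample(logits, allowed):
    # Mask out every token the grammar/schema forbids, then take a
    # softmax over the survivors and pick greedily. This happens
    # after the forward pass, so it adds almost no inference cost.
    masked = {tok: l for tok, l in logits.items() if tok in allowed}
    m = max(masked.values())
    probs = {tok: math.exp(l - m) for tok, l in masked.items()}
    z = sum(probs.values())
    probs = {tok: p / z for tok, p in probs.items()}
    return max(probs, key=probs.get)

# Hypothetical logits for four candidate tokens; suppose the JSON
# grammar only permits '{' or '"' at this position.
logits = {'{': 1.2, 'hello': 3.5, '"': 0.7, '}': -0.4}
token = constrained_sample(logits, allowed={'{', '"'})
print(token)  # prints: {
```

Note that 'hello' has the highest raw logit but is excluded by the constraint, which is exactly why guided decoding can guarantee well-formed output.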
]]></description><pubDate>Sat, 20 Dec 2025 17:10:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337667</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46337667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337667</guid></item><item><title><![CDATA[New comment by red2awn in "Response Healing: Reduce JSON defects by 80%+"]]></title><description><![CDATA[
<p>Cool stuff! I don't get how all the open-source inference frameworks have this down but the big labs don't...<p>Gemini [0] is falsely advertising this:<p>> This capability guarantees predictable and parsable results, ensures format and type-safety, enables the programmatic detection of refusals, and simplifies prompting.<p>[0]: <a href="https://ai.google.dev/gemini-api/docs/structured-output?example=recipe#:~:text=This%20capability%20guarantees%20predictable%20and%20parsable%20results%2C%20ensures%20format%20and%20type%2Dsafety%2C%20enables%20the%20programmatic%20detection%20of%20refusals%2C%20and%20simplifies%20prompting." rel="nofollow">https://ai.google.dev/gemini-api/docs/structured-output?exam...</a></p>
]]></description><pubDate>Sat, 20 Dec 2025 17:07:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337630</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46337630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337630</guid></item><item><title><![CDATA[New comment by red2awn in "Response Healing: Reduce JSON defects by 80%+"]]></title><description><![CDATA[
<p>Very confused. When you enable structured output, the response should adhere to the JSON schema EXACTLY (not best effort), by constraining the output via guided decoding. This is even documented in OpenRouter's structured output docs:<p>> The model will respond with a JSON object that strictly follows your schema<p>Gemini is listed as a model supporting structured output, and yet its failure rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output has a high performance cost, but advertising it as supported when in reality it's not is a massive red flag.<p>Worse yet, response healing only fixes JSON syntax errors, not schema adherence. This is only mentioned at the end of the article, which people are clearly not going to read.<p>WTF</p>
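The syntax/schema distinction is easy to demonstrate with the stdlib: a reply can be perfectly valid JSON yet still violate the requested schema, so a syntax-level fixer would wave it through (the schema here is hypothetical, for illustration):

```python
import json

# Suppose the requested schema is {"name": string, "age": integer}.
reply = '{"name": "Ada", "age": "thirty-six"}'

# Syntax check: this parses fine, so response healing has nothing to fix.
parsed = json.loads(reply)
syntax_ok = True

# Schema adherence is a separate check, and here it fails:
# "age" came back as a string, not an integer.
schema_ok = isinstance(parsed.get("name"), str) and isinstance(parsed.get("age"), int)

print(syntax_ok, schema_ok)  # True False
```

Guided decoding prevents the bad reply from being generated in the first place; post-hoc healing can only repair broken syntax.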
]]></description><pubDate>Fri, 19 Dec 2025 23:19:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46332119</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46332119</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46332119</guid></item><item><title><![CDATA[New comment by red2awn in "Nvidia Nemotron 3 Family of Models"]]></title><description><![CDATA[
<p>Very interesting release:<p>* Hybrid MoE: 2-3x faster than pure MoE transformers<p>* 1M context length<p>* Trained on NVFP4<p>* Open Source! Pretraining, mid-training, SFT and RL dataset released (SFT HF link is 404...)<p>* Open model training recipe (coming soon)<p>Really appreciate Nvidia being the most open lab but they really should make sure all the links/data are available on day 0.<p>Also interesting that the model is trained in NVFP4 but the inference weights are FP8.</p>
]]></description><pubDate>Mon, 15 Dec 2025 22:23:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46281632</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46281632</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46281632</guid></item><item><title><![CDATA[New comment by red2awn in "Ask HN: How can I get better at using AI for programming?"]]></title><description><![CDATA[
<p>Opus 4.5 produces the highest-quality code I've seen out of LLMs. It still has some way to go to match programmers who care, but it's much better than most people. I find it good enough to let it write the code and then manually polish it afterwards.</p>
]]></description><pubDate>Sat, 13 Dec 2025 19:34:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46257207</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46257207</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46257207</guid></item><item><title><![CDATA[New comment by red2awn in "Ask HN: How can I get better at using AI for programming?"]]></title><description><![CDATA[
<p>Claude Code does not support AGENTS.md; you can symlink it to CLAUDE.md to work around this. Anthropic: pls support!</p>
]]></description><pubDate>Sat, 13 Dec 2025 19:31:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46257181</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46257181</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46257181</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>Correct, it breaks the single-prompt, single-completion assumption baked into the frameworks. Conceptually it's still prompt/completion, but for low-latency responses you have to do streaming KV cache prefill with a websocket server.</p>
]]></description><pubDate>Thu, 11 Dec 2025 09:21:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=46229292</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46229292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46229292</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>Nice work. Are you working on streaming input/output?</p>
]]></description><pubDate>Wed, 10 Dec 2025 21:08:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46223885</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46223885</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46223885</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>Why would you use an Omni model for a text-only workload? There is Qwen3-30B-A3B.</p>
]]></description><pubDate>Wed, 10 Dec 2025 20:45:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46223561</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46223561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46223561</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>They had a Flash variant released alongside the original open-weight release. It is also mentioned in Section 5 of the paper: <a href="https://arxiv.org/pdf/2509.17765" rel="nofollow">https://arxiv.org/pdf/2509.17765</a><p>For the evals, it was probably just trained on a lot of benchmark-adjacent datasets compared to the 235B model. A similar thing happened with another model today: <a href="https://x.com/NousResearch/status/1998536543565127968" rel="nofollow">https://x.com/NousResearch/status/1998536543565127968</a> (a 30B model trained specifically to do well in maths gets near-SOTA scores)</p>
]]></description><pubDate>Wed, 10 Dec 2025 20:19:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46223169</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46223169</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46223169</guid></item></channel></rss>