<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: red2awn</title><link>https://news.ycombinator.com/user?id=red2awn</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 28 Apr 2026 23:27:02 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=red2awn" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by red2awn in "Something is afoot in the land of Qwen"]]></title><description><![CDATA[
<p>LLM-as-a-judge is quite an effective method for RL'ing a model: similar to RLHF, but more objective and scalable. But yes, Anthropic is making it out to be more serious than it is. Plus, DeepSeek only did it for 125k requests, significantly fewer than the other labs, yet Anthropic still listed them first to create FUD.</p>
]]></description><pubDate>Wed, 04 Mar 2026 22:40:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47254991</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47254991</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47254991</guid></item><item><title><![CDATA[New comment by red2awn in "Something is afoot in the land of Qwen"]]></title><description><![CDATA[
<p>The "distillation attacks" are mostly using Claude as an LLM-as-a-judge. They are not training on the reasoning chains in an SFT fashion.</p>
]]></description><pubDate>Wed, 04 Mar 2026 19:37:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47252682</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47252682</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47252682</guid></item><item><title><![CDATA[New comment by red2awn in "Show HN: I built a sub-500ms latency voice agent from scratch"]]></title><description><![CDATA[
<p>What's the SOTA open-source or open-weight turn-taking model these days? I tried pipecat/smart-turn-v3 and the results are not good: it only works well when you say a short sentence in a clear voice; anything else causes it to wait indefinitely.
Closed-source API models are obviously a lot better, but they add network latency and the cost adds up.</p>
]]></description><pubDate>Tue, 03 Mar 2026 12:46:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47231515</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47231515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47231515</guid></item><item><title><![CDATA[New comment by red2awn in "Claude Sonnet 4.6"]]></title><description><![CDATA[
<p>Distilling from a teacher (Opus 4.5) and scaling RL more.</p>
]]></description><pubDate>Tue, 17 Feb 2026 22:26:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47054323</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47054323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47054323</guid></item><item><title><![CDATA[New comment by red2awn in "Claude Sonnet 4.6"]]></title><description><![CDATA[
<p>> i need to wash my helicopter at the helicopter wash. it is 50m away, should i walk or fly there with my helicopter.<p>Sonnet 4.6: Walk! Flying a helicopter 50 metres would be more trouble than it's worth — by the time you've done your pre-flight checks, spun up the rotors, lifted off, and then safely landed again, you'd have walked there and back twice. Just stroll over.</p>
]]></description><pubDate>Tue, 17 Feb 2026 22:24:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47054292</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=47054292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47054292</guid></item><item><title><![CDATA[What nobody teaches you after you stop being broke]]></title><description><![CDATA[
<p>Article URL: <a href="https://twitter.com/_thomasip/status/2013287739634888882">https://twitter.com/_thomasip/status/2013287739634888882</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46681691">https://news.ycombinator.com/item?id=46681691</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 19 Jan 2026 17:18:55 +0000</pubDate><link>https://twitter.com/_thomasip/status/2013287739634888882</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46681691</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46681691</guid></item><item><title><![CDATA[New comment by red2awn in "Why DuckDB is my first choice for data processing"]]></title><description><![CDATA[
<p>It was mentioned that the performance of DuckDB is similar to that of Polars (among others). In that case, why would one choose DuckDB over Polars? The only differentiator seems to be that you do the querying in standard SQL instead of library-specific APIs.</p>
]]></description><pubDate>Fri, 16 Jan 2026 21:26:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46652426</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46652426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46652426</guid></item><item><title><![CDATA[New comment by red2awn in "Anthropic invests $1.5M in the Python Software Foundation"]]></title><description><![CDATA[
<p>They are heavily focused on code. Claude Code likely generates hundreds of millions of lines of Python a day, so making the language a little bit better for $1.5M is extremely high leverage.</p>
]]></description><pubDate>Tue, 13 Jan 2026 18:13:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46605179</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46605179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46605179</guid></item><item><title><![CDATA[New comment by red2awn in "The Illustrated Transformer"]]></title><description><![CDATA[
<p>Implement transformers yourself (e.g. in NumPy). You'll never truly understand them by just watching videos.</p>
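To give a flavour of the exercise: here is a minimal scaled dot-product attention sketch in NumPy (single head, no masking or batching; shapes are illustrative, not from any particular paper):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_q, seq_k)
    # Numerically stable softmax over the key axis
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ V                          # (seq_q, d_v)

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))   # 4 query positions, dim 8
K = rng.standard_normal((6, 8))   # 6 key positions
V = rng.standard_normal((6, 8))
out = attention(Q, K, V)
print(out.shape)  # (4, 8)
```

Once this clicks, multi-head attention is just this function run over split projections of Q, K, V and concatenated back together.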
]]></description><pubDate>Mon, 22 Dec 2025 19:34:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46357915</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46357915</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46357915</guid></item><item><title><![CDATA[New comment by red2awn in "Structured outputs create false confidence"]]></title><description><![CDATA[
<p>It does NOT. Response healing only fixes JSON syntax errors, not schema differences.<p>See <a href="https://news.ycombinator.com/item?id=46332119">https://news.ycombinator.com/item?id=46332119</a></p>
]]></description><pubDate>Sun, 21 Dec 2025 22:38:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46349312</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46349312</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46349312</guid></item><item><title><![CDATA[New comment by red2awn in "Response Healing: Reduce JSON defects by 80%+"]]></title><description><![CDATA[
<p>Tokens are sampled from logits using the constraints after a normal forward pass. The forward pass is the expensive part of LLM inference which isn't affected by structured output.</p>
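A toy sketch of that sampling step in plain Python (the token strings and logit values are hypothetical; the forward pass that produced the logits is untouched):

```python
import math

def constrained_sample(logits, allowed):
    # Mask out every token the grammar/schema forbids, then take a
    # softmax over the survivors and pick greedily. This happens
    # after the forward pass, so it adds almost no inference cost.
    masked = {tok: l for tok, l in logits.items() if tok in allowed}
    m = max(masked.values())
    probs = {tok: math.exp(l - m) for tok, l in masked.items()}
    z = sum(probs.values())
    probs = {tok: p / z for tok, p in probs.items()}
    return max(probs, key=probs.get)

# Hypothetical logits for four candidate tokens; suppose the JSON
# grammar only permits '{' or '"' at this position.
logits = {'{': 1.2, 'hello': 3.5, '"': 0.7, '}': -0.4}
token = constrained_sample(logits, allowed={'{', '"'})
print(token)  # prints: {
```

Note that 'hello' has the highest raw logit but is excluded by the constraint, which is exactly why guided decoding can guarantee well-formed output.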
]]></description><pubDate>Sat, 20 Dec 2025 17:10:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337667</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46337667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337667</guid></item><item><title><![CDATA[New comment by red2awn in "Response Healing: Reduce JSON defects by 80%+"]]></title><description><![CDATA[
<p>Cool stuff! I don't get how all the open-source inference frameworks have this down but the big labs don't...<p>Gemini [0] is falsely advertising this:<p>> This capability guarantees predictable and parsable results, ensures format and type-safety, enables the programmatic detection of refusals, and simplifies prompting.<p>[0]: <a href="https://ai.google.dev/gemini-api/docs/structured-output?example=recipe#:~:text=This%20capability%20guarantees%20predictable%20and%20parsable%20results%2C%20ensures%20format%20and%20type%2Dsafety%2C%20enables%20the%20programmatic%20detection%20of%20refusals%2C%20and%20simplifies%20prompting." rel="nofollow">https://ai.google.dev/gemini-api/docs/structured-output?exam...</a></p>
]]></description><pubDate>Sat, 20 Dec 2025 17:07:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337630</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46337630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337630</guid></item><item><title><![CDATA[New comment by red2awn in "Response Healing: Reduce JSON defects by 80%+"]]></title><description><![CDATA[
<p>Very confused. When you enable structured output, the response should adhere to the JSON schema EXACTLY (not best effort), by constraining the output via guided decoding. This is even documented in OpenRouter's structured output docs:<p>> The model will respond with a JSON object that strictly follows your schema<p>Gemini is listed as a model supporting structured output, and yet its failure rate is 0.39% (Gemini 2.0 Flash)!! I get that structured output has a high performance cost, but advertising it as supported when in reality it's not is a massive red flag.<p>Worse yet, response healing only fixes JSON syntax errors, not schema adherence. This is only mentioned at the end of the article, which people are clearly not going to read.<p>WTF</p>
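The syntax/schema distinction is easy to demonstrate with the stdlib: a reply can be perfectly valid JSON yet still violate the requested schema, so a syntax-level fixer would wave it through (the schema here is hypothetical, for illustration):

```python
import json

# Suppose the requested schema is {"name": string, "age": integer}.
reply = '{"name": "Ada", "age": "thirty-six"}'

# Syntax check: this parses fine, so response healing has nothing to fix.
parsed = json.loads(reply)
syntax_ok = True

# Schema adherence is a separate check, and here it fails:
# "age" came back as a string, not an integer.
schema_ok = isinstance(parsed.get("name"), str) and isinstance(parsed.get("age"), int)

print(syntax_ok, schema_ok)  # True False
```

Guided decoding prevents the bad reply from being generated in the first place; post-hoc healing can only repair broken syntax.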
]]></description><pubDate>Fri, 19 Dec 2025 23:19:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46332119</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46332119</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46332119</guid></item><item><title><![CDATA[New comment by red2awn in "Nvidia Nemotron 3 Family of Models"]]></title><description><![CDATA[
<p>Very interesting release:<p>* Hybrid MoE: 2-3x faster than pure MoE transformers<p>* 1M context length<p>* Trained on NVFP4<p>* Open Source! Pretraining, mid-training, SFT and RL dataset released (SFT HF link is 404...)<p>* Open model training recipe (coming soon)<p>Really appreciate Nvidia being the most open lab but they really should make sure all the links/data are available on day 0.<p>Also interesting that the model is trained in NVFP4 but the inference weights are FP8.</p>
]]></description><pubDate>Mon, 15 Dec 2025 22:23:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46281632</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46281632</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46281632</guid></item><item><title><![CDATA[New comment by red2awn in "Ask HN: How can I get better at using AI for programming?"]]></title><description><![CDATA[
<p>Opus 4.5 produces the highest-quality code I've seen out of LLMs. It still has some way to go to match programmers who care, but it's much better than most people. I find it good enough to let it write the code and then manually polish it afterwards.</p>
]]></description><pubDate>Sat, 13 Dec 2025 19:34:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46257207</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46257207</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46257207</guid></item><item><title><![CDATA[New comment by red2awn in "Ask HN: How can I get better at using AI for programming?"]]></title><description><![CDATA[
<p>Claude Code does not support AGENTS.md; you can symlink it to CLAUDE.md to work around this. Anthropic: pls support!</p>
]]></description><pubDate>Sat, 13 Dec 2025 19:31:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46257181</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46257181</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46257181</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>Correct, it breaks the single-prompt, single-completion assumption baked into the frameworks. Conceptually it's still prompt/completion, but for low-latency responses you have to do streaming KV cache prefill with a websocket server.</p>
]]></description><pubDate>Thu, 11 Dec 2025 09:21:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=46229292</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46229292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46229292</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>Nice work. Are you working on streaming input/output?</p>
]]></description><pubDate>Wed, 10 Dec 2025 21:08:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46223885</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46223885</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46223885</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>Why would you use an Omni model for a text-only workload? There is Qwen3-30B-A3B.</p>
]]></description><pubDate>Wed, 10 Dec 2025 20:45:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46223561</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46223561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46223561</guid></item><item><title><![CDATA[New comment by red2awn in "Qwen3-Omni-Flash-2025-12-01：a next-generation native multimodal large model"]]></title><description><![CDATA[
<p>They had a Flash variant released alongside the original open-weight release. It is also mentioned in Section 5 of the paper: <a href="https://arxiv.org/pdf/2509.17765" rel="nofollow">https://arxiv.org/pdf/2509.17765</a><p>For the evals, it was probably just trained on a lot of benchmark-adjacent datasets compared to the 235B model. A similar thing happened with another model today: <a href="https://x.com/NousResearch/status/1998536543565127968" rel="nofollow">https://x.com/NousResearch/status/1998536543565127968</a> (a 30B model trained specifically to do well in maths gets near-SOTA scores)</p>
]]></description><pubDate>Wed, 10 Dec 2025 20:19:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46223169</link><dc:creator>red2awn</dc:creator><comments>https://news.ycombinator.com/item?id=46223169</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46223169</guid></item></channel></rss>