<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: BarakWidawsky</title><link>https://news.ycombinator.com/user?id=BarakWidawsky</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 11 Jun 2026 09:20:25 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=BarakWidawsky" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by BarakWidawsky in "DiffusionGemma: 4x Faster Text Generation"]]></title><description><![CDATA[
<p>You’re mostly right, but you’re conflating attention with autoregressive/causal decoding, which is the real issue that prevents you from using more compute<p>You can use diffusion with attention, and this model does in fact use attention</p>
]]></description><pubDate>Wed, 10 Jun 2026 21:40:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48483116</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=48483116</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48483116</guid></item><item><title><![CDATA[Arcee AI Trinity Mini and Nano – US based open weight models]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.arcee.ai/blog/the-trinity-manifesto">https://www.arcee.ai/blog/the-trinity-manifesto</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46113841">https://news.ycombinator.com/item?id=46113841</a></p>
<p>Points: 4</p>
<p># Comments: 3</p>
]]></description><pubDate>Mon, 01 Dec 2025 21:56:05 +0000</pubDate><link>https://www.arcee.ai/blog/the-trinity-manifesto</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=46113841</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46113841</guid></item><item><title><![CDATA[New comment by BarakWidawsky in "GPT-5.1: A smarter, more conversational ChatGPT"]]></title><description><![CDATA[
<p>I think it's extremely important to distinguish between being friendly (perhaps overly so) and agreeing with the user when they're wrong<p>The first case is just a matter of preference; the second is materially damaging<p>From my experience, ChatGPT <i>does</i> push back more than it used to</p>
]]></description><pubDate>Wed, 12 Nov 2025 19:43:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=45905291</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=45905291</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45905291</guid></item><item><title><![CDATA[New comment by BarakWidawsky in "Pico-Banana-400k"]]></title><description><![CDATA[
<p>Looks like the dataset is distilled from Gemini nano-banana<p>Definitely very useful, but I’m so curious how the original datasets for these image editing models were created. I’m guessing a lot of it is synthetic data, with scenes constructed programmatically from layers</p>
]]></description><pubDate>Sun, 26 Oct 2025 05:26:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45709372</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=45709372</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45709372</guid></item><item><title><![CDATA[New comment by BarakWidawsky in "M5 MacBook Pro"]]></title><description><![CDATA[
<p>I’m guessing this is so they can optimize processor yields as manufacturing improves<p>Smaller chips mean more of a wafer is still usable when a defect exists</p>
]]></description><pubDate>Wed, 15 Oct 2025 15:56:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=45594531</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=45594531</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45594531</guid></item><item><title><![CDATA[New comment by BarakWidawsky in "Diffusion language models are super data learners"]]></title><description><![CDATA[
<p>I wonder how much of this is due to diffusion models having less capacity for memorization than autoregressive models<p>The autoregressive models consistently show better loss for the same number of training tokens<p>I find a lot of the conclusions compelling, but I would’ve loved to see more epochs of training on the 1B model with a 10B dataset, as that model <i>was</i> showing epoch-over-epoch improvements</p>
]]></description><pubDate>Sun, 10 Aug 2025 18:44:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=44857283</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=44857283</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44857283</guid></item><item><title><![CDATA[New comment by BarakWidawsky in "Smollm3: Smol, multilingual, long-context reasoner LLM"]]></title><description><![CDATA[
<p>It’s interesting that it looks like they didn’t apply their own RL to the model, and instead fine-tuned on reasoning traces from large datasets and on reasoning traces generated by larger models</p>
]]></description><pubDate>Tue, 08 Jul 2025 18:24:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44502634</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=44502634</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44502634</guid></item><item><title><![CDATA[Red Star OS (North Korean OS)]]></title><description><![CDATA[
<p>Article URL: <a href="https://en.wikipedia.org/wiki/Red_Star_OS">https://en.wikipedia.org/wiki/Red_Star_OS</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44379821">https://news.ycombinator.com/item?id=44379821</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 25 Jun 2025 17:31:04 +0000</pubDate><link>https://en.wikipedia.org/wiki/Red_Star_OS</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=44379821</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44379821</guid></item><item><title><![CDATA[New comment by BarakWidawsky in "TPDE: A Fast Adaptable Compiler Back-End Framework"]]></title><description><![CDATA[
<p>If this is a faster backend for LLVM, does it potentially obviate the niche Cranelift is optimizing for?</p>
]]></description><pubDate>Mon, 02 Jun 2025 03:53:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44155718</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=44155718</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44155718</guid></item><item><title><![CDATA[New comment by BarakWidawsky in "A comparison to Waymo’s auto liability insurance claims at 25M miles"]]></title><description><![CDATA[
<p>I have taken a Waymo in the rain before. If they have stopped supporting that as part of the service, that’s new, but it’s definitely within the system’s capabilities. It worked great</p>
]]></description><pubDate>Sat, 21 Dec 2024 02:02:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=42476927</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=42476927</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42476927</guid></item><item><title><![CDATA[Why we write numbers in big endian]]></title><description><![CDATA[
<p>Article URL: <a href="https://cesanta.com/blog/why-we-write-numbers-in-big-endian/">https://cesanta.com/blog/why-we-write-numbers-in-big-endian/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=39111804">https://news.ycombinator.com/item?id=39111804</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Wed, 24 Jan 2024 00:07:21 +0000</pubDate><link>https://cesanta.com/blog/why-we-write-numbers-in-big-endian/</link><dc:creator>BarakWidawsky</dc:creator><comments>https://news.ycombinator.com/item?id=39111804</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39111804</guid></item></channel></rss>