<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jwan584</title><link>https://news.ycombinator.com/user?id=jwan584</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 28 Apr 2026 15:37:38 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jwan584" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Mistral Flash Answers Run on Cerebras]]></title><description><![CDATA[
<p>Article URL: <a href="https://cerebras.ai/blog/mistral-le-chat">https://cerebras.ai/blog/mistral-le-chat</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42967522">https://news.ycombinator.com/item?id=42967522</a></p>
<p>Points: 5</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 06 Feb 2025 23:24:11 +0000</pubDate><link>https://cerebras.ai/blog/mistral-le-chat</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=42967522</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42967522</guid></item><item><title><![CDATA[New comment by jwan584 in "The impact of competition and DeepSeek on Nvidia"]]></title><description><![CDATA[
<p>The point about using FP32 for training is wrong. Mixed precision (FP16 multiplies, FP32 accumulates) has been used for years – the original paper came out in 2017.</p>
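<p>A minimal sketch of the mixed-precision idea (FP16 multiplies, FP32 accumulates) – illustrative only, not Nvidia- or Cerebras-specific code:</p>

```python
import numpy as np

def mixed_precision_dot(a, b):
    # Round inputs to FP16, multiply in FP16 precision, but keep the
    # running sum in FP32. The FP32 accumulator avoids the precision
    # loss a pure-FP16 reduction would suffer over long sums.
    a16 = a.astype(np.float16)
    b16 = b.astype(np.float16)
    acc = np.float32(0.0)
    for x, y in zip(a16, b16):
        acc += np.float32(x) * np.float32(y)  # FP16 operands, FP32 accumulate
    return acc

rng = np.random.default_rng(0)
a = rng.standard_normal(1000)
b = rng.standard_normal(1000)

exact = np.dot(a, b)              # FP64 reference
mixed = mixed_precision_dot(a, b)
print(exact, mixed)               # the two agree to within FP16 rounding noise
```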
]]></description><pubDate>Mon, 27 Jan 2025 08:10:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=42838614</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=42838614</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42838614</guid></item><item><title><![CDATA[New comment by jwan584 in "100x defect tolerance: How we solved the yield problem"]]></title><description><![CDATA[
<p>A good talk on how Cerebras does power & cooling (8 min):
<a href="https://www.youtube.com/watch?v=wSptSOcO6Vw&ab_channel=AppliedMachineLearningDays" rel="nofollow">https://www.youtube.com/watch?v=wSptSOcO6Vw&ab_channel=Appli...</a></p>
]]></description><pubDate>Wed, 15 Jan 2025 23:30:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=42718733</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=42718733</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42718733</guid></item><item><title><![CDATA[100x defect tolerance: How we solved the yield problem]]></title><description><![CDATA[
<p>Article URL: <a href="https://cerebras.ai/blog/100x-defect-tolerance-how-cerebras-solved-the-yield-problem">https://cerebras.ai/blog/100x-defect-tolerance-how-cerebras-solved-the-yield-problem</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42717165">https://news.ycombinator.com/item?id=42717165</a></p>
<p>Points: 331</p>
<p># Comments: 179</p>
]]></description><pubDate>Wed, 15 Jan 2025 21:19:15 +0000</pubDate><link>https://cerebras.ai/blog/100x-defect-tolerance-how-cerebras-solved-the-yield-problem</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=42717165</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42717165</guid></item><item><title><![CDATA[New comment by jwan584 in "Cerebras Inference: AI at Instant Speed"]]></title><description><![CDATA[
<p>Batch size by Q4 will be solidly in the double digits (per a Cerebras employee).</p>
]]></description><pubDate>Tue, 27 Aug 2024 20:19:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=41372504</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=41372504</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41372504</guid></item><item><title><![CDATA[Cerebras CS-3: the fastest and most scalable AI accelerator]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.cerebras.net/blog/cerebras-cs3">https://www.cerebras.net/blog/cerebras-cs3</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=39698217">https://news.ycombinator.com/item?id=39698217</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 13 Mar 2024 22:13:28 +0000</pubDate><link>https://www.cerebras.net/blog/cerebras-cs3</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=39698217</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39698217</guid></item><item><title><![CDATA[New comment by jwan584 in "GigaGPT: GPT-3 sized models in 565 lines of code"]]></title><description><![CDATA[
<p>When you go from 1B to 175B parameters, the model no longer fits in memory, so in practice you have to refactor the model using tensor/pipeline parallelism. That's why the code goes from 600 to 20K LOC.</p>
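<p>Rough back-of-the-envelope for why 175B stops fitting (illustrative numbers: FP16 weights/grads plus FP32 Adam state, the commonly cited ~16 bytes/param for mixed-precision training; 80 GiB is an assumed per-device capacity):</p>

```python
# 2 (fp16 weights) + 2 (fp16 grads) + 4+4+4 (fp32 master weights, Adam m, v)
BYTES_PER_PARAM_TRAINING = 16

def training_gib(params):
    # Total training state in GiB for a model of `params` parameters.
    return params * BYTES_PER_PARAM_TRAINING / 2**30

for params in (1e9, 175e9):
    print(f"{params/1e9:.0f}B params -> ~{training_gib(params):,.0f} GiB of state")

# A single 80 GiB device can't hold the 175B case, hence sharding the
# model across many devices via tensor/pipeline parallelism.
devices_needed = training_gib(175e9) / 80
print(f"~{devices_needed:.0f}x 80 GiB devices just for model state")
```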
]]></description><pubDate>Mon, 11 Dec 2023 19:52:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=38604621</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=38604621</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38604621</guid></item><item><title><![CDATA[New comment by jwan584 in "GigaGPT: GPT-3 sized models in 565 lines of code"]]></title><description><![CDATA[
<p>Everyone knows Cerebras for their wafer-scale chips. The less understood part is the 12TB of external memory. That's the real reason large models fit by default and you don't have to chop them up in software à la Megatron/DeepSpeed.</p>
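<p>A quick capacity check on that 12TB figure (illustrative arithmetic only, assuming 2 bytes per weight at FP16):</p>

```python
TB = 1e12  # terabyte, decimal

# How many FP16 parameters fit in 12 TB of external memory?
params = 12 * TB / 2
print(f"~{params/1e12:.0f}T parameters at FP16")

# GPT-3-scale weights (175B params) use only a small fraction of it,
# which is why such models fit without software-level sharding.
frac = 175e9 * 2 / (12 * TB)
print(f"GPT-3-sized weights occupy ~{frac:.1%} of 12 TB")
```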
]]></description><pubDate>Mon, 11 Dec 2023 19:50:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=38604592</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=38604592</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38604592</guid></item><item><title><![CDATA[New comment by jwan584 in "BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model"]]></title><description><![CDATA[
<p>A helpful paper with the full recipe Cerebras uses to train LLMs, including:
- Extensively deduplicated dataset (SlimPajama)
- Hyperparameter search using muP
- Variable sequence length training + ALiBi
- Aggressive LR decay</p>
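<p>A minimal sketch of the ALiBi piece of that recipe, following the formula from the ALiBi paper (head count assumed to be a power of two):</p>

```python
def alibi_slopes(n_heads):
    # Geometric sequence of per-head slopes from the ALiBi paper:
    # for n heads (power of 2), head i gets slope 2^(-8/n * (i+1)).
    start = 2 ** (-8.0 / n_heads)
    return [start ** (i + 1) for i in range(n_heads)]

def alibi_bias(seq_len, slope):
    # Linear penalty added to attention scores, growing with the
    # query-key distance; only the causal (k <= q) positions get it.
    return [[-slope * (q - k) if k <= q else 0.0
             for k in range(seq_len)]
            for q in range(seq_len)]

slopes = alibi_slopes(8)   # first head 0.5, last head 2**-8
bias = alibi_bias(4, slopes[0])
print(slopes)
print(bias)
```

<p>Because the bias is a function of relative distance only, it extrapolates to sequence lengths longer than those seen in training – which is what makes it a natural fit for variable sequence length training.</p>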
]]></description><pubDate>Fri, 22 Sep 2023 18:35:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=37615876</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=37615876</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37615876</guid></item><item><title><![CDATA[BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2309.11568">https://arxiv.org/abs/2309.11568</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37615875">https://news.ycombinator.com/item?id=37615875</a></p>
<p>Points: 3</p>
<p># Comments: 2</p>
]]></description><pubDate>Fri, 22 Sep 2023 18:35:13 +0000</pubDate><link>https://arxiv.org/abs/2309.11568</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=37615875</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37615875</guid></item><item><title><![CDATA[BTLM-3B-8K: 7B Performance in a 3B Parameter Model]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/">https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=36855750">https://news.ycombinator.com/item?id=36855750</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 24 Jul 2023 23:40:13 +0000</pubDate><link>https://www.cerebras.net/machine-learning/btlm-3b-8k-7b-performance-in-a-3-billion-parameter-model/</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=36855750</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36855750</guid></item><item><title><![CDATA[New comment by jwan584 in "Opentensor and Cerebras announce BTLM-3B-8K, a leading 3B param. language model"]]></title><description><![CDATA[
<p>Meta announced a partnership with Qualcomm to bring LLMs to mobile. But a 3B model is a lot more compact than LLaMA's 7B.</p>
]]></description><pubDate>Mon, 24 Jul 2023 19:59:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=36853427</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=36853427</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36853427</guid></item><item><title><![CDATA[NFTs Are Signatures That Come with Artworks]]></title><description><![CDATA[
<p>Article URL: <a href="https://draecomino.substack.com/p/nfts-are-signatures-that-come-with">https://draecomino.substack.com/p/nfts-are-signatures-that-come-with</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=26469207">https://news.ycombinator.com/item?id=26469207</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 15 Mar 2021 21:19:32 +0000</pubDate><link>https://draecomino.substack.com/p/nfts-are-signatures-that-come-with</link><dc:creator>jwan584</dc:creator><comments>https://news.ycombinator.com/item?id=26469207</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26469207</guid></item></channel></rss>