<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: RandyOrion</title><link>https://news.ycombinator.com/user?id=RandyOrion</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 10 Jun 2026 07:39:49 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=RandyOrion" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by RandyOrion in "Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency"]]></title><description><![CDATA[
<p>More rants about local inference; consider yourself warned.<p>Combined with the deliberate bf16-related hardware degradation on consumer-level nvidia gpus, i.e., the gtx 10 and rtx 20, 30, 40, and 50 series, things get sour really quickly.</p>
]]></description><pubDate>Sat, 06 Jun 2026 05:42:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48421767</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=48421767</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48421767</guid></item><item><title><![CDATA[New comment by RandyOrion in "Gemma 4 QAT models: Optimizing compression for mobile and laptop efficiency"]]></title><description><![CDATA[
<p>From the perspective of a local llm user, I think qat doesn't solve the major problem of the gemma models.<p>The gemma family (gen 1 to gen 4) consistently shows an extreme range of activations, i.e., values reaching 600000, essentially forcing people to use a bf16 kv cache and accept a short context window, e.g., 31b with iq4_xs quantization and a 100k context window on 32gb of memory. Or people use a q8 kv cache with a 200k context window and accept a large performance penalty.<p>In contrast, for the qwen 3.5 family, the largest activation is below 2000, making a q8 or even lower-precision kv cache essentially free. Together with linear attention, which doesn't require a kv cache, the full 262k context window can be easily reached.<p>Qat training with a w4a16 target, while improving inference performance with low-precision weights, doesn't solve the kv cache problem at all.<p>In the end, a qat is a qat, and there are unseen efforts behind qat checkpoints. Thank you gemma team for releasing qat checkpoints.</p>
]]></description><pubDate>Sat, 06 Jun 2026 05:41:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48421755</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=48421755</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48421755</guid></item><item><title><![CDATA[New comment by RandyOrion in "Gemma 4 12B: A unified, encoder-free multimodal model"]]></title><description><![CDATA[
<p>A small dense multimodal model with audio support, interesting.<p>Wait, *Excluding Chinese language.<p>This is ... curious.<p>P.S. Where is gemma 4 124b?</p>
]]></description><pubDate>Wed, 03 Jun 2026 18:05:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48387488</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=48387488</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48387488</guid></item><item><title><![CDATA[New comment by RandyOrion in "Show HN: Hallucinopedia"]]></title><description><![CDATA[
<p>This website brings me some good chuckles. Now I really know how powerful an on-demand bullsh*t generator is.</p>
]]></description><pubDate>Fri, 08 May 2026 09:17:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48060642</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=48060642</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48060642</guid></item><item><title><![CDATA[New comment by RandyOrion in "Google Chrome silently installs a 4 GB AI model on your device without consent"]]></title><description><![CDATA[
<p>Like the recent copilot silent signing incident, the without-consent part is a blatant foul move.<p>If you don't like being treated as anything but human, you should seriously consider replacing chrome with ungoogled chromium or other browsers.</p>
]]></description><pubDate>Tue, 05 May 2026 13:41:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48022450</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=48022450</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48022450</guid></item><item><title><![CDATA[New comment by RandyOrion in "VS Code inserting 'Co-Authored-by Copilot' into commits regardless of usage"]]></title><description><![CDATA[
<p>Yeah, this is part of the reason why vscodium exists.</p>
]]></description><pubDate>Sun, 03 May 2026 03:11:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47992963</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47992963</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47992963</guid></item><item><title><![CDATA[New comment by RandyOrion in "VS Code inserting 'Co-Authored-by Copilot' into commits regardless of usage"]]></title><description><![CDATA[
<p>Wow. Just like using ungoogled-chromium instead of chrome, lineage os instead of oem android, using vscodium instead of vscode is again justified. These decisions really are the ones that I'll never regret.<p>In addition, using the word microslop instead of microsoft is again justified, too.</p>
]]></description><pubDate>Sun, 03 May 2026 01:54:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47992554</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47992554</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47992554</guid></item><item><title><![CDATA[New comment by RandyOrion in "Shai-Hulud Themed Malware Found in the PyTorch Lightning AI Training Library"]]></title><description><![CDATA[
<p>One thing that makes me wonder is that 4 security issues were raised, and all of them were automatically commented on and closed by some bot called `pl-ghost` [1][2][3][4]. In the end, only one [4] was properly handled, and all the bot comments were deleted. You can see the bot comments in another report [5], which is more informative than the OP's.<p>[1] <a href="https://github.com/Lightning-AI/pytorch-lightning/issues/21689" rel="nofollow">https://github.com/Lightning-AI/pytorch-lightning/issues/216...</a><p>[2] <a href="https://github.com/Lightning-AI/pytorch-lightning/issues/21690" rel="nofollow">https://github.com/Lightning-AI/pytorch-lightning/issues/216...</a><p>[3] <a href="https://github.com/Lightning-AI/pytorch-lightning/issues/21692" rel="nofollow">https://github.com/Lightning-AI/pytorch-lightning/issues/216...</a><p>[4] <a href="https://github.com/Lightning-AI/pytorch-lightning/issues/21691" rel="nofollow">https://github.com/Lightning-AI/pytorch-lightning/issues/216...</a><p>[5] <a href="https://socket.dev/blog/lightning-pypi-package-compromised" rel="nofollow">https://socket.dev/blog/lightning-pypi-package-compromised</a></p>
]]></description><pubDate>Fri, 01 May 2026 03:18:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47970986</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47970986</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47970986</guid></item><item><title><![CDATA[New comment by RandyOrion in "Granite 4.1: IBM's 8B Model Matching 32B MoE"]]></title><description><![CDATA[
<p>Although the performance claim of 8b dense matching 32b moe is somewhat questionable, thank you granite team for releasing small dense LLMs.</p>
]]></description><pubDate>Thu, 30 Apr 2026 17:06:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47965405</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47965405</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47965405</guid></item><item><title><![CDATA[New comment by RandyOrion in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>Thank you Qwen team. Small DENSE LLMs shape the future of local LLM users.<p>When Qwen 3.5 27b was released, I didn't really understand why linear attention was used instead of full attention, given the performance degradation and the problems introduced by extra (linear) operators. After doing some tests, I found that with llama.cpp and the IQ4_XS quant, the model and the BF16 cache for the whole 262k context just fit in 32GB of vram, which is impossible with full attention. In contrast, with the gemma 4 31b IQ4_XS quant I have to use a Q8_0 cache to fit the 262k context in vram, which is a little annoying (no offense, thank you gemma team, too).<p>From benchmarks, the 3.5->3.6 upgrade is about agent things. I hope future upgrades fix some problems I found, e.g., output repetitiveness in long conversations and breadth of knowledge.</p>
]]></description><pubDate>Thu, 23 Apr 2026 03:13:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47871917</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47871917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47871917</guid></item><item><title><![CDATA[New comment by RandyOrion in "MegaTrain: Full Precision Training of 100B+ Parameter LLMs on a Single GPU"]]></title><description><![CDATA[
<p>Check out Fig. 6 in this paper; it shows a comparison between the proposed method and the pytorch native FSDP offload method.</p>
]]></description><pubDate>Thu, 09 Apr 2026 03:01:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47698833</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47698833</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47698833</guid></item><item><title><![CDATA[New comment by RandyOrion in "Muse Spark: Scaling towards personal superintelligence"]]></title><description><![CDATA[
<p>No open weights.<p>Besides, I'm old enough to recall that META has trained a version of LLAMA 4 specifically for LM arena elo benchmaxxing and PR things, and proceeded to release a different version of LLAMA 4.</p>
]]></description><pubDate>Thu, 09 Apr 2026 02:10:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47698571</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47698571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47698571</guid></item><item><title><![CDATA[New comment by RandyOrion in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>Thank you Gemma team for releasing small dense VLM(s).<p>The elo ranking [1] is too good to be true. I don't know why gemma-4-26b-a4b performs better than gemma-4-31b.<p>Also waiting for more bugfixes in llama.cpp, sglang and vllm to do proper evaluations.<p>[1] <a href="https://arena.ai/leaderboard/text/expert?license=open-source" rel="nofollow">https://arena.ai/leaderboard/text/expert?license=open-source</a></p>
]]></description><pubDate>Fri, 03 Apr 2026 07:53:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47624204</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47624204</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47624204</guid></item><item><title><![CDATA[New comment by RandyOrion in "Android Developer Verification"]]></title><description><![CDATA[
<p>Please no.<p>If you want to install APKs directly on Android phones sold in China, you'll face even more draconian restrictions imposed by both Chinese OEMs and the Chinese government, e.g., being unable to install telegram [1], being unable to install VPNs [2], getting called by the local police station after installing VPNs [3], and so on. And you do not have the freedom to even talk about these restrictions without getting sued or censored.<p>[1] <a href="https://xcancel.com/whyyoutouzhele/status/1689152388412616704" rel="nofollow">https://xcancel.com/whyyoutouzhele/status/168915238841261670...</a><p>[2] <a href="https://xcancel.com/whyyoutouzhele/status/1978430665562689716" rel="nofollow">https://xcancel.com/whyyoutouzhele/status/197843066556268971...</a><p>[3] <a href="https://xcancel.com/whyyoutouzhele/status/1702992057596276760" rel="nofollow">https://xcancel.com/whyyoutouzhele/status/170299205759627676...</a></p>
]]></description><pubDate>Tue, 31 Mar 2026 11:08:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47585613</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47585613</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47585613</guid></item><item><title><![CDATA[New comment by RandyOrion in "Android Developer Verification"]]></title><description><![CDATA[
<p>Yeah, let's hold Google accountable. Is there a way to invoke antitrust laws?</p>
]]></description><pubDate>Tue, 31 Mar 2026 10:47:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47585404</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47585404</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47585404</guid></item><item><title><![CDATA[New comment by RandyOrion in "Android Developer Verification"]]></title><description><![CDATA[
<p>Thank you for standing against the Android Developer Verification enforced by Google. Now, in addition to no longer using Youtube and replacing chrome with ungoogled chromium, I'm moving to de-googled AOSP builds, e.g., lineageOS, instead of stock OEM ROMs.</p>
]]></description><pubDate>Tue, 31 Mar 2026 10:35:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47585293</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47585293</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47585293</guid></item><item><title><![CDATA[New comment by RandyOrion in "Copilot edited an ad into my PR"]]></title><description><![CDATA[
<p>Wow, just wow.<p>1.5M records of PRs affected. Does Microsoft copilot ask users for permission to add ads inside their PRs before actually doing it? Do users give their consent on this matter?<p>Now EVERYONE can see ads disguised as PRs on GitHub. Does Microsoft ask everyone for permission to show ads before actually doing it? Do users give their consent on this matter?<p>Good taste Microslop.</p>
]]></description><pubDate>Tue, 31 Mar 2026 02:41:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47582147</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47582147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47582147</guid></item><item><title><![CDATA[New comment by RandyOrion in "Flash-MoE: Running a 397B Parameter Model on a Laptop"]]></title><description><![CDATA[
<p>This project shows an interesting automated search for solutions to engineering problems, which I'd like to see more of.<p>The experience of utilizing tiered storage (gpu vram, ram, and ssd) is generally poor in a lot of LLM inference engines out there, e.g., llama.cpp, sglang, vllm, etc.<p>My own experience shows that both weight and KV cache offload to ram on sglang and vllm are unavailable or unusable. Copying extra parameters from the docs and adding them to already working commands results in errors. Llama.cpp does support weight offload, but the experience is not pleasant: low pcie (gpu <-> ram) utilization, low gpu utilization, and really low tokens per second.</p>
]]></description><pubDate>Mon, 23 Mar 2026 02:26:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47484838</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47484838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47484838</guid></item><item><title><![CDATA[New comment by RandyOrion in "Something is afoot in the land of Qwen"]]></title><description><![CDATA[
<p>First, thank you Junyang and Qwen team for your incredible work. You deserve better.<p>This is sad for the local LLM community. First we lost wizardLM, Yi and others, then we lost Llama and others, now we lost Qwen...</p>
]]></description><pubDate>Thu, 05 Mar 2026 04:12:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47257471</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47257471</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47257471</guid></item><item><title><![CDATA[New comment by RandyOrion in "A CPU that runs entirely on GPU"]]></title><description><![CDATA[
<p>Well, I don't know enough about the boot process of the RPi. However, I do expect that most modern hardware, e.g., x86, does not work like the RPi, so your words do not hold in most realistic scenarios, at least for now. Besides, do current GPUs (not only the GPUs on the RPi) have the ability to self instruct in order to achieve what you said?</p>
]]></description><pubDate>Wed, 04 Mar 2026 19:19:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47252443</link><dc:creator>RandyOrion</dc:creator><comments>https://news.ycombinator.com/item?id=47252443</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47252443</guid></item></channel></rss>