<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: roosgit</title><link>https://news.ycombinator.com/user?id=roosgit</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 09:25:54 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=roosgit" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by roosgit in "DiffusionGemma: 4x Faster Text Generation"]]></title><description><![CDATA[
<p>Can LoRAs be used to increase the quality of these diffusion models? Nvidia mentions something about this <a href="https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-8B#inference-with-linear-self-speculation--lora-enhanced-drafter" rel="nofollow">https://huggingface.co/nvidia/Nemotron-Labs-Diffusion-8B#inf...</a></p>
]]></description><pubDate>Wed, 10 Jun 2026 17:39:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=48479848</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=48479848</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48479848</guid></item><item><title><![CDATA[New comment by roosgit in "Real-time LLM Inference on Standard GPUs: 3k tokens/s per request"]]></title><description><![CDATA[
<p>Yeah, it should have been "Datacenter GPUs" or "Nvidia and AMD GPUs".</p>
]]></description><pubDate>Fri, 29 May 2026 11:09:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=48321630</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=48321630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48321630</guid></item><item><title><![CDATA[New comment by roosgit in "The local LLM ecosystem doesn’t need Ollama"]]></title><description><![CDATA[
<p>I just hit that error a few minutes ago. I build my llama.cpp from source because I use CUDA on Linux. So I made the mistake of trying to run Gemma4 on an older version I had and I got the same error. It’s possible brew installs an older version which doesn’t support Gemma4 yet.</p>
]]></description><pubDate>Thu, 16 Apr 2026 08:40:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47790325</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=47790325</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47790325</guid></item><item><title><![CDATA[New comment by roosgit in "The $2k Laptop That Replaced My $200/Month AI Subscription"]]></title><description><![CDATA[
<p>Have you tried other local models?<p>The 14B Q4_K_M needs 9GB, while Q3_K_M is 7.3GB, and you also need some room for context. Still, maybe using `--override-tensor` in llama.cpp would get you a 50% improvement over "naively" offloading layers to the GPU. Or possibly GPT-OSS-20B. It's 12.1GB in MXFP4, but it’s a MoE model so only a part of it would need to be on the GPU. On my dedicated 12GB 3060 it runs at 85 t/s, with a smallish context. I've also read on Reddit some claims that Qwen3 4B 2507 might be better than 8B, because Qwen never released a "2507" update for 8B.</p>
]]></description><pubDate>Thu, 19 Feb 2026 15:43:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47075023</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=47075023</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47075023</guid></item><item><title><![CDATA[New comment by roosgit in "I now assume that all ads on Apple news are scams"]]></title><description><![CDATA[
<p>I wasn't sure where I'd seen that "retiring" spiel before, but then I remembered someone was (still is) selling a handmade jewelry website claiming $4.3M revenue and $1.3M profit.</p>
]]></description><pubDate>Fri, 06 Feb 2026 13:04:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46912338</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=46912338</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46912338</guid></item><item><title><![CDATA[New comment by roosgit in "How macOS has grown 2019-2025"]]></title><description><![CDATA[
<p>I use an even older MacBook and an even older macOS. Of course, the browsers no longer work with the latest JS, so occasionally when I need to use some webapp I boot up a Linux VM and do what I need to do. With limited RAM even that's a pain, but it works for now.</p>
]]></description><pubDate>Fri, 02 Jan 2026 09:24:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=46463028</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=46463028</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46463028</guid></item><item><title><![CDATA[New comment by roosgit in "Calendar"]]></title><description><![CDATA[
<p>While on the subject, you can make a calendar in as little as 3 lines of CSS: <a href="https://calendartricks.com/a-calendar-in-three-lines-of-css/" rel="nofollow">https://calendartricks.com/a-calendar-in-three-lines-of-css/</a></p>
]]></description><pubDate>Mon, 29 Dec 2025 08:47:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46418695</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=46418695</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46418695</guid></item><item><title><![CDATA[New comment by roosgit in "AWS multiple services outage in us-east-1"]]></title><description><![CDATA[
<p>Can confirm. I was trying to send the newsletter (with SES) and it didn't work. I was thinking my local boto3 was old, but I figured I should check HN just in case.</p>
]]></description><pubDate>Mon, 20 Oct 2025 07:37:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=45640907</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=45640907</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45640907</guid></item><item><title><![CDATA[New comment by roosgit in "Ask HN: What kind of local on-device AI do you find useful?"]]></title><description><![CDATA[
<p>I have an RTX 3060 with 12GB VRAM. For simpler questions like "how do I change the modified date of a file in Linux", I use Qwen 14B Q4_K_M. It fits entirely in VRAM. If 14B doesn't answer correctly, I switch to Qwen 32B Q3_K_S, which will be slower because it needs to use the RAM. I haven't yet tried the 30B-A3B, which I hear is faster and closer to 32B. BTW, I run these models with llama.cpp.<p>For image generation, Flux and Qwen Image work with ComfyUI. I also use Nunchaku, which improves speed considerably.</p>
]]></description><pubDate>Tue, 23 Sep 2025 18:30:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=45350982</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=45350982</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45350982</guid></item><item><title><![CDATA[Why would anybody start a website?]]></title><description><![CDATA[
<p>Article URL: <a href="https://daverupert.com/2025/09/why-would-anybody-start-a-website/">https://daverupert.com/2025/09/why-would-anybody-start-a-website/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45140138">https://news.ycombinator.com/item?id=45140138</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 05 Sep 2025 16:06:23 +0000</pubDate><link>https://daverupert.com/2025/09/why-would-anybody-start-a-website/</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=45140138</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45140138</guid></item><item><title><![CDATA[Effort-Outcome Asymmetry]]></title><description><![CDATA[
<p>Article URL: <a href="https://justinjackson.ca/effort-outcome">https://justinjackson.ca/effort-outcome</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45125545">https://news.ycombinator.com/item?id=45125545</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 04 Sep 2025 10:11:12 +0000</pubDate><link>https://justinjackson.ca/effort-outcome</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=45125545</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45125545</guid></item><item><title><![CDATA[To Infinity but Not Beyond]]></title><description><![CDATA[
<p>Article URL: <a href="https://meyerweb.com/eric/thoughts/2025/08/20/to-infinity-but-not-beyond/">https://meyerweb.com/eric/thoughts/2025/08/20/to-infinity-but-not-beyond/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44970751">https://news.ycombinator.com/item?id=44970751</a></p>
<p>Points: 45</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 21 Aug 2025 09:30:52 +0000</pubDate><link>https://meyerweb.com/eric/thoughts/2025/08/20/to-infinity-but-not-beyond/</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=44970751</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44970751</guid></item><item><title><![CDATA[New comment by roosgit in "Ask HN: What's in your crontab?"]]></title><description><![CDATA[
<p># Runs the DB backup script on Thu at 22:00 -- I download the database backup for a few websites that get new data every week. I do this in case my host bans my account.<p># Runs the IP change check on Mon - Sun at 09:00, 10:30, 12:00, 20:00 -- If the power goes out or the router reboots I get a new IP. On the server I use fail2ban and if I log into the admin panel I might get banned for making too many requests. So my IP needs to be "blessed".<p># Runs Let's Encrypt certificate expiry check on Sundays at 11:00 and 18:00 -- I still have a server where I update the certificates by hand.<p># Runs the "daily" backup -- Just rsync<p># Download GoDaddy auction data every day at 19:00 -- I don't actively do this anymore but I used to check, based on certain criteria, for domains that were about to expire.<p># Download the sellers.json on the 1st of every month at 19:00 -- I use this to collect data on websites that appear and disappear from the Mediavine and Adthrive sellers.json</p>
]]></description><pubDate>Sun, 10 Aug 2025 19:33:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44857606</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=44857606</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44857606</guid></item><item><title><![CDATA[New comment by roosgit in "Reality Check on Deep Research by Ben Evans"]]></title><description><![CDATA[
<p>I've known about this issue since Llama 1. Tried it with Llama 2 and Mistral when those models were released. LLMs are not databases.<p>The test I ran was to ask the LLM about an expired domain of a doctor (obstetrician). I no longer remember the exact domain, but it was similar to annasmithmd.com. One LLM would tell me it used to belong to a doctor named Megan Smith. Another got the name right, Anna Smith, but when I asked it what kind of a doctor, which specialty, it answered pediatrician.<p>So the LLM had no clue, but from the name of the domain it could infer (I guess that's why they call it inference) that the "md" part was associated with doctors.<p>By the way, newer LLMs are very good at making domains more human readable by splitting them into words.</p>
]]></description><pubDate>Wed, 19 Feb 2025 20:42:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=43107391</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=43107391</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43107391</guid></item><item><title><![CDATA[New comment by roosgit in "Ask HN: If building a computer, what will be good for possible local GenAI use?"]]></title><description><![CDATA[
<p>I can answer question 3. Prompt processing (how fast your input is parsed) is highly correlated with computing speed. Inference (how fast the LLM answers) is highly correlated with memory bandwidth. So a good CPU might read your question faster, but it will answer pretty much as slowly as a cheap CPU with the same RAM.<p>I have a Ryzen 3 4100. Just tested Qwen2.5-Coder-32B-Instruct-Q3_K_S.gguf with llama.cpp.<p>CPU-only:<p>54.08 t/s prompt eval<p>2.69 t/s inference<p>---<p>CPU + 52/65 layers offloaded to GPU (RTX 3060 12GB):<p>166.79 t/s prompt eval<p>6.62 t/s inference</p>
]]></description><pubDate>Sat, 08 Feb 2025 10:11:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=42981916</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=42981916</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42981916</guid></item><item><title><![CDATA[New comment by roosgit in "Ask HN: Anyone running AI locally? Mind to share your experience?"]]></title><description><![CDATA[
<p>Renting could be a good choice to get started. I used to rent a g4dn.xlarge instance from AWS (for Stable Diffusion, not LLMs). More affordable options are Runpod and Vast.ai.<p>I started with a local system using llama.cpp on CPU alone and for short questions and answers it was OK for me. Because (in 2023) I didn't know if LLMs would be any good, I chose cheap components <a href="https://news.ycombinator.com/item?id=40267208">https://news.ycombinator.com/item?id=40267208</a>.<p>Since AWS was getting pretty expensive, I also bought an RTX 3060 (12GB), an extra 16GB RAM (for a total of 32GB) and a superfast 1TB M.2 SSD. The total cost of the components was around €620.<p>Here are some basic LLM performance numbers for my system:<p><a href="https://news.ycombinator.com/item?id=41845936">https://news.ycombinator.com/item?id=41845936</a><p><a href="https://news.ycombinator.com/item?id=42843313">https://news.ycombinator.com/item?id=42843313</a></p>
]]></description><pubDate>Tue, 28 Jan 2025 14:32:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=42852717</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=42852717</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42852717</guid></item><item><title><![CDATA[New comment by roosgit in "Ask HN: Building a PC for AI Tasks"]]></title><description><![CDATA[
<p>Start with r/LocalLLama and r/StableDiffusion. Look for benchmarks for various GPUs.<p>I have an RTX 3060(12GB) and 32GB RAM. Just ran Qwen2.5-14B-Instruct-Q4_K_M.gguf in llama.cpp with flash attention enabled and 8K context. I get 845t/s for prompt processing and 25t/s for generation.<p>For a while I even ran llama.cpp without a GPU (don't recommend it for diffusion) and with the same model (Qwen2.5 14B) I would get 11t/s for processing and 4t/s for generation. Acceptable for chats with short questions/instructions and answers.</p>
]]></description><pubDate>Mon, 27 Jan 2025 17:14:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=42843313</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=42843313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42843313</guid></item><item><title><![CDATA[New comment by roosgit in "Ask HN: How do I become rich quickly?"]]></title><description><![CDATA[
<p>How rich?<p>You can get some inspiration from businesses for sale on Empire Flippers <a href="https://empireflippers.com/marketplace/" rel="nofollow">https://empireflippers.com/marketplace/</a>.<p>As a rule of thumb for choosing the niche, pick from one of these <a href="https://support.google.com/admob/answer/3150953?hl=en" rel="nofollow">https://support.google.com/admob/answer/3150953?hl=en</a></p>
]]></description><pubDate>Mon, 06 Jan 2025 12:36:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=42610060</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=42610060</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42610060</guid></item><item><title><![CDATA[New comment by roosgit in "Ask HN: What is your local LLM setup?"]]></title><description><![CDATA[
<p>I have a separate PC that I access through SSH. I recently bought a GPU for it; before that I was running it on CPU alone.<p>- B550MH motherboard<p>- Ryzen 3 4100 CPU<p>- 32GB (2x16) RAM cranked up to 3200MHz (token generation is memory bound)<p>- 256GB M.2 NVMe (helps with loading models faster)<p>- Nvidia 3060 12GB<p>Software-wise, I use llamafile because on the CPU it's faster by 10-20% for prompt processing than llama.cpp.<p>Performance "Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf":<p>CPU-only: 23.47 t/s (processing), 8.73 t/s (generation)<p>GPU: 941.5 t/s (processing), 29.4 t/s (generation)</p>
]]></description><pubDate>Tue, 15 Oct 2024 07:44:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=41845936</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=41845936</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41845936</guid></item><item><title><![CDATA[Sanding UI]]></title><description><![CDATA[
<p>Article URL: <a href="https://blog.jim-nielsen.com/2024/sanding-ui/">https://blog.jim-nielsen.com/2024/sanding-ui/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41612154">https://news.ycombinator.com/item?id=41612154</a></p>
<p>Points: 1300</p>
<p># Comments: 400</p>
]]></description><pubDate>Sat, 21 Sep 2024 19:36:20 +0000</pubDate><link>https://blog.jim-nielsen.com/2024/sanding-ui/</link><dc:creator>roosgit</dc:creator><comments>https://news.ycombinator.com/item?id=41612154</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41612154</guid></item></channel></rss>