<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Patrick_Devine</title><link>https://news.ycombinator.com/user?id=Patrick_Devine</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 04 Jun 2026 01:44:15 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Patrick_Devine" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 4 12B: A unified, encoder-free multimodal model"]]></title><description><![CDATA[
<p>Given the model was just republished by Google 15 minutes ago and we're going to have to redo everything (and everyone will have to re-download it on every platform, not just Ollama), I'll just say that sometimes things don't work out exactly the way you want them to. :-D<p>That said, I think the gemma4:12b-nvfp4 model is pretty solid. It's been tuned with NVIDIA's Model Optimizer. I've been waiting on the results for MMLU-Pro, but I'll have to retrigger that after reconverting.</p>
]]></description><pubDate>Wed, 03 Jun 2026 22:22:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48390951</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=48390951</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48390951</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 4 12B: A unified, encoder-free multimodal model"]]></title><description><![CDATA[
<p>I realize this is a little confusing; we're working w/ the MLX team to bring MLX to other platforms, but we're not quite there yet. The `gemma4:12b-nvfp4` model is specifically for the MLX engine.<p>For the GGUF 4-bit variant (i.e. non-Macs) you'll need `gemma4:12b-it-q4_K_M`, which I just pushed. You'll also need to upgrade to version 0.30.4, which we're just about to release (it's in prerelease and we're running through our last regression tests).</p>
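<p>Here is a minimal Go sketch of that tag choice. It assumes the rule is simply "an Apple Silicon Mac gets the MLX build, and everything else gets the GGUF build." `pickGemmaTag` is a hypothetical helper for illustration and is not part of Ollama. The two tag names come from this comment.</p>

```go
package main

import (
	"fmt"
	"runtime"
)

// pickGemmaTag is a hypothetical helper (not part of Ollama) that chooses
// between the two Gemma 4 12B tags mentioned above.
// Assumption: the MLX engine is only used on Apple Silicon (darwin/arm64);
// every other platform, including Intel Macs, gets the GGUF q4_K_M build.
func pickGemmaTag(goos, goarch string) string {
	if goos == "darwin" && goarch == "arm64" {
		return "gemma4:12b-nvfp4" // MLX engine
	}
	return "gemma4:12b-it-q4_K_M" // GGUF build
}

func main() {
	tag := pickGemmaTag(runtime.GOOS, runtime.GOARCH)
	fmt.Println("ollama run " + tag)
}
```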
]]></description><pubDate>Wed, 03 Jun 2026 20:30:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48389551</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=48389551</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48389551</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 4 12B: A unified, encoder-free multimodal model"]]></title><description><![CDATA[
<p>I haven't yet pushed the MTP-enabled gemma4 12b model for Ollama because in my testing I wasn't getting a performance bump. The other gemma4 MTP models should work OK right now, but there are some fixes we're just about to push. This is specifically for the MLX backend.</p>
]]></description><pubDate>Wed, 03 Jun 2026 20:19:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=48389411</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=48389411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48389411</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Accelerating Gemma 4: faster inference with multi-token prediction drafters"]]></title><description><![CDATA[
<p>In my testing, the Gemma 4 31b model had the biggest speed boost in Ollama w/ the MLX runner for coding tasks (about 2x). Unfortunately, you'll need a pretty beefy Mac to run it because quantization really hurts the acceptance rate. The three other, smaller models didn't perform as well because the draft validation time ate up most of the performance gains. I'm still trying to tune things to see if I can get better performance.<p>You can try it out with Ollama 0.23.1 by running `ollama run gemma4:31b-coding-mtp-bf16`.</p>
]]></description><pubDate>Tue, 05 May 2026 18:17:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48026404</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=48026404</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48026404</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "SFO Quiet Airport (2025)"]]></title><description><![CDATA[
<p>I wish they would do this when you're boarding the plane. I get that there is essential information that everyone needs to know, but if you're a frequent flier you've probably heard the "put your larger carry-on in the overhead bin and your smaller bag underneath the seat in front of you" announcement hundreds, if not thousands, of times.</p>
]]></description><pubDate>Fri, 24 Apr 2026 18:54:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47894370</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47894370</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47894370</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "All 12 moonwalkers had "lunar hay fever" from dust smelling like gunpowder (2018)"]]></title><description><![CDATA[
<p>Isn't this why NASA is developing the Electrodynamic Dust Shield [1] system?<p>[1] <a href="https://www.nasa.gov/image-article/nasas-dust-shield-successfully-repels-lunar-regolith-on-moon/" rel="nofollow">https://www.nasa.gov/image-article/nasas-dust-shield-success...</a></p>
]]></description><pubDate>Fri, 17 Apr 2026 19:59:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47809919</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47809919</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47809919</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Qwen3.6-35B-A3B: Agentic coding power, now open to all"]]></title><description><![CDATA[
<p>If you're on a Mac, use the MLX backend versions, which are considerably faster than the GGML-based versions (including llama.cpp), and you don't need to fiddle with the context size. The models are `qwen3.6:35b-a3b-nvfp4`, `qwen3.6:35b-a3b-mxfp8`, and `qwen3.6:35b-a3b-mlx-bf16`.</p>
]]></description><pubDate>Thu, 16 Apr 2026 17:47:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47796976</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47796976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47796976</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>They are NVIDIA FP4 (NVFP4) weights. CUDA support isn't _quite_ ready yet, but we've got that cooking.</p>
]]></description><pubDate>Tue, 31 Mar 2026 22:41:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47594425</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47594425</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47594425</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>The 35b-a3b-coding-nvfp4 model has the recommended hyperparameters set for coding, not chatting. If you want to use it for chat, you can pull the `35b-a3b-nvfp4` model instead. It uses the same weights, so nothing gets re-downloaded and the pull is quick. That one has the presence penalty turned on, which will stop it from thinking so much. You can also try `/set nothink` in the CLI, which turns off thinking entirely.</p>
]]></description><pubDate>Tue, 31 Mar 2026 22:39:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47594403</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47594403</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47594403</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>Try it with mxfp8 or bf16. It's a decent model for doing tool calling, but I wouldn't recommend using it with 4 bit quantization.</p>
]]></description><pubDate>Tue, 31 Mar 2026 22:31:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47594338</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47594338</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47594338</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Mac mini will be made at a new facility in Houston"]]></title><description><![CDATA[
<p>I noticed the same thing. I'm assuming they forgot to photoshop out the Chinese characters.</p>
]]></description><pubDate>Tue, 24 Feb 2026 23:29:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47145028</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47145028</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47145028</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "I converted 2D conventional flight tracking into 3D"]]></title><description><![CDATA[
<p>The departure/arrival airports plus a full track would be absolutely amazing.</p>
]]></description><pubDate>Wed, 18 Feb 2026 02:23:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47056361</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47056361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47056361</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "IBM CEO says there is 'no way' spending on AI data centers will pay off"]]></title><description><![CDATA[
<p>5 years is a normal-ish depreciation time frame. I know they are gaming GPUs, but the RTX 3090 came out ~4.5 years before the RTX 5090. The 5090 has double the performance and 1/3 more memory (32 GB vs. 24 GB). The 3090 is still a useful card even after 5 years.</p>
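<p>Here is the arithmetic as a quick Go sketch. It assumes straight-line depreciation to zero salvage value over 5 years. The $30,000 purchase price is a hypothetical placeholder, not a real list price. The memory figures are the cards' published capacities.</p>

```go
package main

import "fmt"

// straightLineBookValue returns the remaining book value of an asset after
// ageYears, depreciating linearly from cost down to salvage over lifeYears.
// Ages at or past the useful life are clamped to the salvage value.
func straightLineBookValue(cost, salvage, lifeYears, ageYears float64) float64 {
	if ageYears >= lifeYears {
		return salvage
	}
	return cost - (cost-salvage)*ageYears/lifeYears
}

// memoryGain returns the fractional increase in memory from oldGB to newGB.
func memoryGain(oldGB, newGB float64) float64 {
	return (newGB - oldGB) / oldGB
}

func main() {
	// RTX 3090: 24 GB, RTX 5090: 32 GB, i.e. one third more memory.
	fmt.Printf("memory gain: %.2f\n", memoryGain(24, 32))

	// Hypothetical $30,000 accelerator on a 5-year schedule, zero salvage.
	for year := 0.0; year <= 5; year++ {
		fmt.Printf("year %.0f: $%.0f\n", year, straightLineBookValue(30000, 0, 5, year))
	}
}
```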
]]></description><pubDate>Wed, 03 Dec 2025 00:38:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46128877</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=46128877</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46128877</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Mistral 3 family of models released"]]></title><description><![CDATA[
<p>The instruct models are available on Ollama (e.g. `ollama run ministral-3:8b`); however, the reasoning models are still a WIP. I was trying to get them working last night: they work for single-turn, but are still very flaky with multi-turn.</p>
]]></description><pubDate>Tue, 02 Dec 2025 21:59:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46127508</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=46127508</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46127508</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Claude Is Down"]]></title><description><![CDATA[
<p>The default weights on Ollama use MXFP4 for the feed-forward network and BF16 for the attention weights. The default weights for llama.cpp quantize the attention tensors as q8_0, which is why llama.cpp can eke out a little bit more performance at the cost of worse output. If you are using this for coding, you definitely want better output.<p>You can use the command `ollama show -v gpt-oss:120b` to see the datatype of each tensor.</p>
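<p>For reference, here is a minimal Go sketch of what an MXFP4 block holds, based on the OCP Microscaling (MX) format: 32 FP4 (E2M1) values that share one power-of-two E8M0 scale. The function is hypothetical and is not Ollama's or GGML's code. It takes already-unpacked 4-bit codes because the two-per-byte packing order differs between implementations.</p>

```go
package main

import (
	"fmt"
	"math"
)

// fp4E2M1 maps the 8 non-negative FP4 (E2M1) magnitudes; bit 3 of a code is the sign.
var fp4E2M1 = [8]float32{0, 0.5, 1, 1.5, 2, 3, 4, 6}

const mxBlockSize = 32 // MXFP4 shares one scale across 32 elements

// decodeMXFP4Block dequantizes one MXFP4 block. scaleE8M0 is the shared
// exponent (value 2^(scale-127); 0xFF encodes NaN). codes holds 32 unpacked
// 4-bit values; only the low 4 bits of each are used.
func decodeMXFP4Block(scaleE8M0 uint8, codes []uint8) ([]float32, error) {
	if len(codes) != mxBlockSize {
		return nil, fmt.Errorf("want %d codes, got %d", mxBlockSize, len(codes))
	}
	if scaleE8M0 == 0xFF {
		return nil, fmt.Errorf("scale is NaN")
	}
	scale := float32(math.Ldexp(1, int(scaleE8M0)-127))
	out := make([]float32, mxBlockSize)
	for i, c := range codes {
		v := fp4E2M1[c&0x7] * scale
		if c&0x8 != 0 {
			v = -v
		}
		out[i] = v
	}
	return out, nil
}

func main() {
	codes := make([]uint8, mxBlockSize)
	codes[0], codes[1], codes[2] = 0x7, 0xA, 0x1 // +6, -1, +0.5 before scaling
	vals, err := decodeMXFP4Block(128, codes)    // scale = 2^(128-127) = 2
	if err != nil {
		panic(err)
	}
	fmt.Println(vals[:3]) // [12 -2 1]
}
```

<p>Because the scale is a pure power of two, each block can only shift the fixed FP4 grid up or down. Keeping the attention tensors in BF16 avoids that coarseness where it matters most.</p>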
]]></description><pubDate>Sat, 08 Nov 2025 17:50:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=45858493</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=45858493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45858493</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 3 270M: Compact model for hyper-efficient AI"]]></title><description><![CDATA[
<p>Late last night we uploaded gemma3:270m-it-q8_0 and gemma3:270m-it-fp16, which give better results. The q4_0 is the QAT model, but we're still looking into it as there are some issues.</p>
]]></description><pubDate>Fri, 15 Aug 2025 17:05:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44914837</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44914837</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44914837</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama Turbo"]]></title><description><![CDATA[
<p>Ollama only uses llama.cpp for running legacy models. gpt-oss runs entirely in the Ollama engine.<p>You don't need to use Turbo mode; it's just there for people who don't have capable enough GPUs.</p>
]]></description><pubDate>Tue, 05 Aug 2025 19:54:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44803419</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44803419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44803419</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama's new engine for multimodal models"]]></title><description><![CDATA[
<p>I worked on the text portion of gemma3 (as well as gemma2) for the Ollama engine, and worked directly with the Gemma team at Google on the implementation. I didn't base it on the llama.cpp implementation, which was done in parallel. We did our implementation in Go, and llama.cpp did theirs in C++. There was no "copy-and-pasting" as you are implying, although I do think collaborating on these new models would help us get them out the door faster. I really appreciate Georgi catching a few things we got wrong in our implementation.</p>
]]></description><pubDate>Fri, 16 May 2025 06:40:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=44002410</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44002410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44002410</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama's new engine for multimodal models"]]></title><description><![CDATA[
<p>Wait, what hosted APIs is Ollama wrapping?</p>
]]></description><pubDate>Fri, 16 May 2025 05:21:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44002065</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44002065</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44002065</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 3 QAT Models: Bringing AI to Consumer GPUs"]]></title><description><![CDATA[
<p>The vision tower is 7GB, so I was wondering if you were loading it without vision?</p>
]]></description><pubDate>Mon, 21 Apr 2025 17:57:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=43754618</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=43754618</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43754618</guid></item></channel></rss>