<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Patrick_Devine</title><link>https://news.ycombinator.com/user?id=Patrick_Devine</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 16 Apr 2026 20:53:17 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Patrick_Devine" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Patrick_Devine in "Qwen3.6-35B-A3B: Agentic coding power, now open to all"]]></title><description><![CDATA[
<p>If you're on a Mac, use the MLX backend versions, which are considerably faster than the GGML-based versions (including llama.cpp), and you don't need to fiddle with the context size. The models are `qwen3.6:35b-a3b-nvfp4`, `qwen3.6:35b-a3b-mxfp8`, and `qwen3.6:35b-a3b-mlx-bf16`.</p>
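<p>As a sketch, the suggestion above comes down to running one of the listed tags (tag names are taken from the comment; they assume a local Ollama install with enough unified memory for the chosen quantization):</p>

```shell
# Pull and run one of the MLX-backed quantizations on Apple Silicon;
# Ollama selects the MLX backend automatically for these tags.
ollama run qwen3.6:35b-a3b-mxfp8

# The bf16 variant trades more memory for higher fidelity:
ollama pull qwen3.6:35b-a3b-mlx-bf16
```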
]]></description><pubDate>Thu, 16 Apr 2026 17:47:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47796976</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47796976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47796976</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>They are nvidia-fp4 weights. CUDA support isn't _quite_ ready yet, but we've got that cooking.</p>
]]></description><pubDate>Tue, 31 Mar 2026 22:41:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47594425</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47594425</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47594425</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>The 35b-a3b-coding-nvfp4 model has the recommended hyperparameters set for coding, not chatting. If you want to use it to chat, you can pull the `35b-a3b-nvfp4` model (it doesn't need to re-download the weights, so it will pull quickly), which has the presence penalty turned on to stop it from thinking so much. You can also try `/set nothink` in the CLI, which will turn off thinking entirely.</p>
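<p>A minimal sketch of the steps above (tag name and `/set nothink` are as given in the comment; this assumes a local Ollama install):</p>

```shell
# Switch to the chat-tuned variant; the underlying weights are shared
# with the coding tag, so the pull completes quickly.
ollama run 35b-a3b-nvfp4

# Then, at the interactive CLI prompt, thinking can be disabled entirely:
#   >>> /set nothink
```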
]]></description><pubDate>Tue, 31 Mar 2026 22:39:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47594403</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47594403</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47594403</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama is now powered by MLX on Apple Silicon in preview"]]></title><description><![CDATA[
<p>Try it with mxfp8 or bf16. It's a decent model for doing tool calling, but I wouldn't recommend using it with 4-bit quantization.</p>
]]></description><pubDate>Tue, 31 Mar 2026 22:31:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47594338</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47594338</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47594338</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Mac mini will be made at a new facility in Houston"]]></title><description><![CDATA[
<p>I noticed the same thing. I'm assuming they forgot to Photoshop out the Chinese characters.</p>
]]></description><pubDate>Tue, 24 Feb 2026 23:29:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47145028</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47145028</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47145028</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "I converted 2D conventional flight tracking into 3D"]]></title><description><![CDATA[
<p>Departure/arrival airports plus a full track would be absolutely amazing.</p>
]]></description><pubDate>Wed, 18 Feb 2026 02:23:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47056361</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=47056361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47056361</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "IBM CEO says there is 'no way' spending on AI data centers will pay off"]]></title><description><![CDATA[
<p>Five years is a normal-ish depreciation time frame. I know they are gaming GPUs, but the RTX 3090 came out ~4.5 years before the RTX 5090. The 5090 has double the performance and 1/3 more memory, and the 3090 is still a useful card even after 5 years.</p>
]]></description><pubDate>Wed, 03 Dec 2025 00:38:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46128877</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=46128877</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46128877</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Mistral 3 family of models released"]]></title><description><![CDATA[
<p>The instruct models are available on Ollama (e.g. `ollama run ministral-3:8b`); however, the reasoning models are still a WIP. I was trying to get them to work last night, and it works for single-turn but is still very flaky with multi-turn.</p>
]]></description><pubDate>Tue, 02 Dec 2025 21:59:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46127508</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=46127508</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46127508</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Claude Is Down"]]></title><description><![CDATA[
<p>The default ones on Ollama are MXFP4 for the feed-forward network and use BF16 for the attention weights. The default weights for llama.cpp quantize those tensors as q8_0, which is why llama.cpp can eke out a little bit more performance at the cost of worse output. If you are using this for coding, you definitely want better output.<p>You can use the command `ollama show -v gpt-oss:120b` to see the datatype of each tensor.</p>
]]></description><pubDate>Sat, 08 Nov 2025 17:50:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=45858493</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=45858493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45858493</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 3 270M: Compact model for hyper-efficient AI"]]></title><description><![CDATA[
<p>We uploaded gemma3:270m-it-q8_0 and gemma3:270m-it-fp16 late last night which have better results. The q4_0 is the QAT model, but we're still looking at it as there are some issues.</p>
]]></description><pubDate>Fri, 15 Aug 2025 17:05:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44914837</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44914837</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44914837</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama Turbo"]]></title><description><![CDATA[
<p>Ollama only uses llama.cpp for running legacy models; gpt-oss runs entirely in the Ollama engine.<p>You don't need to use Turbo mode; it's just there for people who don't have capable enough GPUs.</p>
]]></description><pubDate>Tue, 05 Aug 2025 19:54:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44803419</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44803419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44803419</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama's new engine for multimodal models"]]></title><description><![CDATA[
<p>I worked on the text portion of gemma3 (as well as gemma2) for the Ollama engine, and worked directly with the Gemma team at Google on the implementation. I didn't base the implementation off of the llama.cpp implementation which was done in parallel. We did our implementation in golang, and llama.cpp did theirs in C++. There was no "copy-and-pasting" as you are implying, although I do think collaborating together on these new models would help us get them out the door faster. I am really appreciative of Georgi catching a few things we got wrong in our implementation.</p>
]]></description><pubDate>Fri, 16 May 2025 06:40:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=44002410</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44002410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44002410</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Ollama's new engine for multimodal models"]]></title><description><![CDATA[
<p>Wait, what hosted APIs is Ollama wrapping?</p>
]]></description><pubDate>Fri, 16 May 2025 05:21:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44002065</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=44002065</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44002065</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 3 QAT Models: Bringing AI to Consumer GPUs"]]></title><description><![CDATA[
<p>The vision tower is 7GB, so I was wondering if you were loading it without vision?</p>
]]></description><pubDate>Mon, 21 Apr 2025 17:57:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=43754618</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=43754618</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43754618</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 3 QAT Models: Bringing AI to Consumer GPUs"]]></title><description><![CDATA[
<p>Ollama has had vision support for Gemma3 since it came out. The implementation is <i>not</i> based on llama.cpp's version.</p>
]]></description><pubDate>Mon, 21 Apr 2025 17:47:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43754518</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=43754518</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43754518</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 3 Technical Report [pdf]"]]></title><description><![CDATA[
<p>My point was about multi-image support and pan-and-scan. We haven't implemented those yet in Ollama, but soon!</p>
]]></description><pubDate>Wed, 12 Mar 2025 09:03:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=43341151</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=43341151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43341151</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma3 – The current strongest model that fits on a single GPU"]]></title><description><![CDATA[
<p>There are some fixes coming to uniformly speed up pulls. We've been testing that out, but there are a lot of moving pieces with the new engine, so it's not quite here yet.</p>
]]></description><pubDate>Wed, 12 Mar 2025 09:00:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=43341124</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=43341124</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43341124</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Gemma 3 Technical Report [pdf]"]]></title><description><![CDATA[
<p>Not quite yet on Ollama, but hopefully we'll add this soon. Also, we haven't added the pan-and-scan algorithm yet for getting better clarity from the original image.</p>
]]></description><pubDate>Wed, 12 Mar 2025 07:51:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43340800</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=43340800</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43340800</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Detrimental Decibels: A Study of Noise Levels in Vancouver's SkyTrain System (2022)"]]></title><description><![CDATA[
<p>When SkyTrain first came out, it was touted for how quiet it was vs. other metro systems. [1] The problem (as others have pointed out) is just that the Mark I trains are 40 years old and the maintenance hasn't kept up with the track and the wheels. Things wear out.<p>[1] <a href="https://youtu.be/pTsSXdSjU1I?t=455" rel="nofollow">https://youtu.be/pTsSXdSjU1I?t=455</a></p>
]]></description><pubDate>Tue, 14 Jan 2025 00:30:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42691694</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=42691694</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42691694</guid></item><item><title><![CDATA[New comment by Patrick_Devine in "Railroad Tycoon II"]]></title><description><![CDATA[
<p>I still have my boxed copy (along with everything Loki produced) in a big box in the garage.</p>
]]></description><pubDate>Mon, 13 Jan 2025 18:52:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=42687074</link><dc:creator>Patrick_Devine</dc:creator><comments>https://news.ycombinator.com/item?id=42687074</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42687074</guid></item></channel></rss>