<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jmorgan</title><link>https://news.ycombinator.com/user?id=jmorgan</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 20 Jun 2026 11:09:35 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jmorgan" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jmorgan in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"]]></title><description><![CDATA[
<p>The larger models are available on Ollama's cloud as most folks don't have the hardware to run 500B-1T parameter models.</p>
]]></description><pubDate>Mon, 15 Jun 2026 18:38:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=48545301</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=48545301</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48545301</guid></item><item><title><![CDATA[New comment by jmorgan in "Pi – A minimal terminal coding harness"]]></title><description><![CDATA[
<p>For local models I've been trying it with GLM-4.7-Flash and the new LFM2 24B model. I'm excited to try it with the new Qwen3.5 models that came out today as well.</p>
]]></description><pubDate>Wed, 25 Feb 2026 06:51:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47148207</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=47148207</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47148207</guid></item><item><title><![CDATA[New comment by jmorgan in "Pi – A minimal terminal coding harness"]]></title><description><![CDATA[
<p>I've been using Pi day to day recently for simple, smaller tasks. It's a great harness for smaller parameter size models, since its system prompt is quite a bit shorter than the ones used by Claude or Codex (and it uses a nice small set of tools by default).</p>
]]></description><pubDate>Tue, 24 Feb 2026 22:33:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47144284</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=47144284</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47144284</guid></item><item><title><![CDATA[New comment by jmorgan in "ollama launch"]]></title><description><![CDATA[
<p>That's not good, sorry. I work on Ollama - shoot me an email (jeff@ollama.com) and we can help debug</p>
]]></description><pubDate>Sun, 25 Jan 2026 21:05:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46758211</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=46758211</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46758211</guid></item><item><title><![CDATA[New comment by jmorgan in "GLM-4.7-Flash"]]></title><description><![CDATA[
<p>It's available (with tool parsing, etc.): <a href="https://ollama.com/library/glm-4.7-flash">https://ollama.com/library/glm-4.7-flash</a> but it requires 0.14.3, which is in pre-release (and available on Ollama's GitHub repo).</p>
]]></description><pubDate>Mon, 19 Jan 2026 22:28:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46685383</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=46685383</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46685383</guid></item><item><title><![CDATA[New comment by jmorgan in "A guide to local coding models"]]></title><description><![CDATA[
<p>The source is available here: <a href="https://github.com/ollama/ollama/tree/main/app" rel="nofollow">https://github.com/ollama/ollama/tree/main/app</a></p>
]]></description><pubDate>Mon, 22 Dec 2025 03:59:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46351173</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=46351173</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46351173</guid></item><item><title><![CDATA[New comment by jmorgan in "Claude Is Down"]]></title><description><![CDATA[
<p>The gpt-oss weights on Ollama are native mxfp4 (the same weights provided by OpenAI). No additional quantization is applied, so let me know if you're seeing any strange results with Ollama.<p>Most gpt-oss GGUF files online have parts of their weights quantized to q8_0, and we've seen folks get some strange results from these models. If you're importing these to Ollama to run, the output quality may decrease.</p>
]]></description><pubDate>Sat, 08 Nov 2025 04:13:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=45854067</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=45854067</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45854067</guid></item><item><title><![CDATA[New comment by jmorgan in "Ollama Web Search"]]></title><description><![CDATA[
<p>We did consider building functionality into Ollama that would fetch search results and website contents using a headless browser or similar. However, we had a lot of worries about result quality, and also about IP blocking caused by Ollama producing crawler-like behavior. Having a hosted API felt like a fast path to get results into users' context window, but we are still exploring the local option. Ideally you'd be able to stay fully local if you want to (even when using capabilities like search).</p>
]]></description><pubDate>Thu, 25 Sep 2025 20:18:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=45378422</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=45378422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45378422</guid></item><item><title><![CDATA[Ollama Web Search]]></title><description><![CDATA[
<p>Article URL: <a href="https://ollama.com/blog/web-search">https://ollama.com/blog/web-search</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45377641">https://news.ycombinator.com/item?id=45377641</a></p>
<p>Points: 348</p>
<p># Comments: 176</p>
]]></description><pubDate>Thu, 25 Sep 2025 19:21:52 +0000</pubDate><link>https://ollama.com/blog/web-search</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=45377641</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45377641</guid></item><item><title><![CDATA[New comment by jmorgan in "Gemma 3 270M: Compact model for hyper-efficient AI"]]></title><description><![CDATA[
<p>Amazing work. This model feels really good at one-off tasks like summarization and autocomplete. I really love that you released a quantization-aware training version on launch day as well, making it even smaller!</p>
]]></description><pubDate>Thu, 14 Aug 2025 18:32:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=44903942</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=44903942</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44903942</guid></item><item><title><![CDATA[New comment by jmorgan in "Ollama Turbo"]]></title><description><![CDATA[
<p>It should open ollama.com/connect – sorry about that. Feel free to message me at jeff@ollama.com if you keep seeing issues.</p>
]]></description><pubDate>Wed, 06 Aug 2025 05:42:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44808111</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=44808111</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44808111</guid></item><item><title><![CDATA[New comment by jmorgan in "Open models by OpenAI"]]></title><description><![CDATA[
<p>Sorry about this. Re-downloading Ollama should fix the error</p>
]]></description><pubDate>Tue, 05 Aug 2025 19:25:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=44803013</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=44803013</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44803013</guid></item><item><title><![CDATA[New comment by jmorgan in "Magistral — the first reasoning model by Mistral AI"]]></title><description><![CDATA[
<p>Working on adding tool calling support to Magistral in Ollama. It requires a tokenizer change and also uses a new tool calling format. Excited to see the results of combining thinking + tool calling!</p>
]]></description><pubDate>Wed, 11 Jun 2025 03:50:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44244026</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=44244026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44244026</guid></item><item><title><![CDATA[UI-TARS-1.5: open-source computer use vision-language model]]></title><description><![CDATA[
<p>Article URL: <a href="https://seed-tars.com/1.5/">https://seed-tars.com/1.5/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43719311">https://news.ycombinator.com/item?id=43719311</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 17 Apr 2025 16:44:40 +0000</pubDate><link>https://seed-tars.com/1.5/</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=43719311</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43719311</guid></item><item><title><![CDATA[New comment by jmorgan in "Deepseek R1 Distill 8B Q40 on 4 x Raspberry Pi 5"]]></title><description><![CDATA[
<p>This is a great point. apt-get would definitely be a better install and upgrade experience (that's what I would want too). Tailscale does this amazingly well: <a href="https://tailscale.com/download/linux" rel="nofollow">https://tailscale.com/download/linux</a><p>The main issue for the maintainer team would be the work of hosting and maintaining all the package repos for apt, yum, etc., and making sure we handle the case where nvidia/amd drivers aren't installed (quite common on cloud VMs). Mostly a matter of time and putting in the work.<p>For now every release of Ollama includes a minimal archive with the ollama binary and required dynamic libraries: <a href="https://github.com/ollama/ollama/blob/main/docs/linux.md#manual-install">https://github.com/ollama/ollama/blob/main/docs/linux.md#man...</a>. But we could definitely do better.</p>
]]></description><pubDate>Sun, 16 Feb 2025 22:38:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43072480</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=43072480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43072480</guid></item><item><title><![CDATA[New comment by jmorgan in "Qwen2.5-1M: Deploy your own Qwen with context length up to 1M tokens"]]></title><description><![CDATA[
<p>Sorry this isn't more obvious. Ideally VRAM usage for the context window (the KV cache) becomes dynamic, starting small and growing with token usage, whereas right now Ollama defaults to a context size of 2K tokens, which can be overridden at runtime. Great examples of the dynamic approach are vLLM's PagedAttention implementation [1] and Microsoft's vAttention [2], which is CUDA-specific (and there are quite a few others).<p>1M tokens will definitely require a lot of KV cache memory. One way to reduce the memory footprint is to use KV cache quantization, which has recently been added behind a flag [3] and will cut the memory footprint to roughly a quarter if 4-bit KV cache quantization is used (OLLAMA_KV_CACHE_TYPE=q4_0 ollama serve).<p>[1] <a href="https://arxiv.org/pdf/2309.06180" rel="nofollow">https://arxiv.org/pdf/2309.06180</a><p>[2] <a href="https://github.com/microsoft/vattention">https://github.com/microsoft/vattention</a><p>[3] <a href="https://smcleod.net/2024/12/bringing-k/v-context-quantisation-to-ollama/" rel="nofollow">https://smcleod.net/2024/12/bringing-k/v-context-quantisatio...</a></p>
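<p>To make the "roughly a quarter" figure concrete, here is a minimal back-of-the-envelope sketch of KV cache size. The model dimensions (28 layers, 4 KV heads, head dimension 128) are illustrative assumptions, not values taken from this thread, and the helper name <code>kv_cache_bytes</code> is hypothetical. The per-block sizes follow the GGML block layouts (32 elements per block, with q4_0 storing 16 bytes of 4-bit values plus a 2-byte scale), so this is an estimate and not Ollama's actual allocator.</p>

```python
# Bytes needed to store one block of 32 elements in each cache type.
# f16: 32 * 2 bytes; q8_0: 32 bytes + 2-byte scale; q4_0: 16 bytes + 2-byte scale.
BLOCK_SIZE = 32
BLOCK_BYTES = {"f16": 64, "q8_0": 34, "q4_0": 18}


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len, cache_type="f16"):
    """Estimate KV cache size in bytes for a given context length.

    All model dimensions are supplied by the caller; the values used in the
    example below are assumptions chosen for illustration only.
    """
    # One key vector and one value vector per layer, per KV head, per token.
    elements = 2 * n_layers * n_kv_heads * head_dim * context_len
    blocks = -(-elements // BLOCK_SIZE)  # ceiling division
    return blocks * BLOCK_BYTES[cache_type]


if __name__ == "__main__":
    dims = dict(n_layers=28, n_kv_heads=4, head_dim=128)  # assumed dimensions
    for ctx in (2048, 1_000_000):
        f16 = kv_cache_bytes(context_len=ctx, cache_type="f16", **dims)
        q4 = kv_cache_bytes(context_len=ctx, cache_type="q4_0", **dims)
        print(f"{ctx:>9} tokens: f16={f16 / 2**30:.2f} GiB, "
              f"q4_0={q4 / 2**30:.2f} GiB, ratio={q4 / f16:.3f}")
```

<p>Because q4_0 carries a scale per block, the ratio against f16 comes out slightly above one quarter (18/64) instead of exactly 1/4.</p>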
]]></description><pubDate>Sun, 26 Jan 2025 20:38:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=42833836</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=42833836</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42833836</guid></item><item><title><![CDATA[New comment by jmorgan in "Phi 4 available on Ollama"]]></title><description><![CDATA[
<p>Phi-4's architecture changed slightly from Phi-3.5 (it no longer uses a sliding window of 2,048 tokens [1]), causing a change in the hyperparameters (and ultimately an error at inference time for some published GGUF files on Hugging Face, since the same architecture name/identifier was re-used between the two models).<p>For the Phi-4 uploaded to Ollama, the hyperparameters were set to avoid the error. The error should stop occurring in the next version of Ollama [2] for imported GGUF files as well.<p>In retrospect, an entirely new architecture name should probably have been used instead of re-using "phi3".<p>[1] <a href="https://arxiv.org/html/2412.08905v1" rel="nofollow">https://arxiv.org/html/2412.08905v1</a><p>[2] <a href="https://github.com/ollama/ollama/releases/tag/v0.5.5">https://github.com/ollama/ollama/releases/tag/v0.5.5</a></p>
]]></description><pubDate>Sun, 12 Jan 2025 04:16:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=42671237</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=42671237</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42671237</guid></item><item><title><![CDATA[New comment by jmorgan in "Fast LLM Inference From Scratch (using CUDA)"]]></title><description><![CDATA[
<p>Thank you for writing this!</p>
]]></description><pubDate>Mon, 16 Dec 2024 04:17:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42427973</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=42427973</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42427973</guid></item><item><title><![CDATA[Diffusion models are real-time game engines]]></title><description><![CDATA[
<p>Article URL: <a href="https://gamengen.github.io">https://gamengen.github.io</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41375548">https://news.ycombinator.com/item?id=41375548</a></p>
<p>Points: 1149</p>
<p># Comments: 409</p>
]]></description><pubDate>Wed, 28 Aug 2024 02:59:40 +0000</pubDate><link>https://gamengen.github.io</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=41375548</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41375548</guid></item><item><title><![CDATA[New comment by jmorgan in "Gemma 2: Improving Open Language Models at a Practical Size [pdf]"]]></title><description><![CDATA[
<p>Currently when the context limit is hit, there's a halving of the context window (or a "context shift") to allow inference to continue – this is helpful for smaller (e.g. 1-2k) context windows.<p>However, not all models (especially newer ones) respond well to this, which makes sense. We're working on changing the behavior in Ollama's API to be more similar to OpenAI, Anthropic and similar APIs so that when the context limit is hit, the API returns a "limit" finish/done reason. Hope this is helpful!</p>
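<p>The two behaviors described above can be sketched roughly as follows. This is a simplified illustration and not Ollama's implementation: the function names, the <code>n_keep</code> parameter, and the exact way the older half is chosen are assumptions made for the example.</p>

```python
def context_shift(tokens, limit, n_keep=0):
    """Drop the older half of the shiftable tokens once the window is full.

    `n_keep` (hypothetical) is the number of leading tokens, such as a system
    prompt, that are always preserved.
    """
    if len(tokens) < limit:
        return list(tokens)
    n_discard = (len(tokens) - n_keep) // 2
    return list(tokens[:n_keep]) + list(tokens[n_keep + n_discard:])


def on_context_full(tokens, limit, shift=True):
    """Return (tokens, done_reason) for one generation step.

    With shift=True the window is halved and generation continues; with
    shift=False generation stops and a "limit" done reason is reported.
    """
    if len(tokens) < limit:
        return list(tokens), None
    if not shift:
        return list(tokens), "limit"
    return context_shift(tokens, limit), None


if __name__ == "__main__":
    window = list(range(8))
    print(on_context_full(window, limit=8, shift=True))
    print(on_context_full(window, limit=8, shift=False))
```

<p>Shifting lets generation continue but silently discards earlier context, which is why stopping with an explicit reason can be easier for callers to handle.</p>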
]]></description><pubDate>Fri, 28 Jun 2024 07:39:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=40818556</link><dc:creator>jmorgan</dc:creator><comments>https://news.ycombinator.com/item?id=40818556</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40818556</guid></item></channel></rss>