<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: hansonw</title><link>https://news.ycombinator.com/user?id=hansonw</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 04 May 2026 16:23:26 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=hansonw" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by hansonw in "GPT-5.4"]]></title><description><![CDATA[
<p>The skill source is here: <a href="https://github.com/openai/skills/blob/main/skills/.curated/playwright-interactive/SKILL.md" rel="nofollow">https://github.com/openai/skills/blob/main/skills/.curated/p...</a><p>$skill-installer playwright-interactive in Codex! The model writes normal JS Playwright code in a Node REPL.</p>
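<p>For a concrete picture, here is a minimal sketch of the kind of code it writes in that REPL (the URL and selector below are made up for illustration):</p>
<pre><code>// Hypothetical example of JS Playwright code typed into a Node REPL
// (top-level await works in the REPL; URL and selector are placeholders).
const { chromium } = require('playwright');
const browser = await chromium.launch();
const page = await browser.newPage();
await page.goto('https://example.com');
await page.click('text=More information');  // Playwright text selector
console.log(await page.title());
await browser.close();
</code></pre>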
]]></description><pubDate>Fri, 06 Mar 2026 03:59:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47270692</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=47270692</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47270692</guid></item><item><title><![CDATA[New comment by hansonw in "Building more with GPT-5.1-Codex-Max"]]></title><description><![CDATA[
<p>Rest assured that we are better at training models than naming them ;D<p>- New benchmark SOTAs with 77.9% on SWE-Bench-Verified, 79.9% on SWE-Lancer, and 58.1% on TerminalBench 2.0<p>- Natively trained to work for many hours across multiple context windows via compaction<p>- 30% more token-efficient at the same reasoning level across many tasks<p>Let us know what you think!</p>
]]></description><pubDate>Wed, 19 Nov 2025 18:19:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=45982917</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=45982917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45982917</guid></item><item><title><![CDATA[Building more with GPT-5.1-Codex-Max]]></title><description><![CDATA[
<p>Article URL: <a href="https://openai.com/index/gpt-5-1-codex-max/">https://openai.com/index/gpt-5-1-codex-max/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45982649">https://news.ycombinator.com/item?id=45982649</a></p>
<p>Points: 483</p>
<p># Comments: 319</p>
]]></description><pubDate>Wed, 19 Nov 2025 18:01:59 +0000</pubDate><link>https://openai.com/index/gpt-5-1-codex-max/</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=45982649</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45982649</guid></item><item><title><![CDATA[New comment by hansonw in "A Research Preview of Codex"]]></title><description><![CDATA[
<p>More about that here! <a href="https://platform.openai.com/docs/codex#advanced-configuration" rel="nofollow">https://platform.openai.com/docs/codex#advanced-configuratio...</a></p>
]]></description><pubDate>Fri, 16 May 2025 19:08:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44008833</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=44008833</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44008833</guid></item><item><title><![CDATA[New comment by hansonw in "An embarrassingly simple approach to recover unlearned knowledge for LLMs"]]></title><description><![CDATA[
<p>The ELI5 of the paper is that most "unlearning" methods can be regarded as adding some delta `w` to the parameters of the network, but most of `w` just gets "rounded away" during quantization (i.e. `quantize(X+w) ~= quantize(X)`). Pretty clever idea, as a lot of the cited methods explicitly optimize/regularize to keep `w` small to avoid degrading evaluation accuracy.<p>To your point, it does call into question whether these methods can actually be considered true "unlearning" from an information-theoretic perspective (or whether it is the equivalent of, e.g., just putting `if (false)` around the still-latent knowledge).</p>
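<p>A toy numerical sketch of the effect (made-up weights and a crude uniform quantizer, just to show the rounding):</p>
<pre><code>// Toy demo: a small "unlearning" delta w disappears under coarse quantization.
const scale = 0.1;                             // hypothetical quantization step
const quantize = (x) => Math.round(x / scale) * scale;

const X  = [0.83, -0.42, 0.17];                // original weights (made up)
const w  = [0.004, -0.003, 0.002];             // small "unlearning" delta
const Xw = X.map((x, i) => x + w[i]);

console.log(X.map(quantize));   // [ 0.8, -0.4, 0.2 ]
console.log(Xw.map(quantize));  // [ 0.8, -0.4, 0.2 ]  -- quantize(X+w) == quantize(X)
</code></pre>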
]]></description><pubDate>Mon, 04 Nov 2024 07:42:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=42039413</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=42039413</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42039413</guid></item><item><title><![CDATA[New comment by hansonw in "Fine-tuning now available for GPT-4o"]]></title><description><![CDATA[
<p>It looks like they didn't want to make a public submission in order to avoid disclosing the model internals: <a href="https://cosine.sh/blog/genie-technical-report#:~:text=SWE%2DBench%20has,SWE%2DBench%20tasks">https://cosine.sh/blog/genie-technical-report#:~:text=SWE%2D...</a>.</p>
]]></description><pubDate>Tue, 20 Aug 2024 19:14:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41303025</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=41303025</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41303025</guid></item><item><title><![CDATA[New comment by hansonw in "Prompt Caching"]]></title><description><![CDATA[
<p>It’s probably more. Pretty conservatively, if the KV embedding dimension for each token is ~10K across ~100 attention layers (roughly the scale of Llama 3.1 405B), that’s already 1M 16-bit floats per token = 2MB. They have likely needed to implement some kind of KV compression (like DeepSeek) to make this even feasible.</p>
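<p>The back-of-envelope math in code, with every number a rough assumption rather than a known architecture:</p>
<pre><code>// Rough KV-cache sizing per token (all figures are guesses for illustration).
const kvWidthPerLayer = 10_000;  // combined K+V values per token, per layer
const layers = 100;              // attention layers (~Llama 3.1 405B scale)
const bytesPerValue = 2;         // fp16/bf16

const bytesPerToken = kvWidthPerLayer * layers * bytesPerValue;
console.log(bytesPerToken / 1e6);        // 2   -- MB per token
console.log(bytesPerToken * 1e5 / 1e9);  // 200 -- GB for a 100k-token prompt
</code></pre>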
]]></description><pubDate>Sun, 18 Aug 2024 23:25:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=41286396</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=41286396</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41286396</guid></item><item><title><![CDATA[New comment by hansonw in "AI Search: The Bitter-Er Lesson"]]></title><description><![CDATA[
<p><a href="https://news.ycombinator.com/item?id=40675577">https://news.ycombinator.com/item?id=40675577</a></p>
]]></description><pubDate>Sat, 15 Jun 2024 05:10:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=40687651</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=40687651</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40687651</guid></item><item><title><![CDATA[New comment by hansonw in "What can LLMs never do?"]]></title><description><![CDATA[
<p>This is also a good paper on the subject:<p>What Algorithms can Transformers Learn? A Study in Length Generalization <a href="https://arxiv.org/abs/2310.16028" rel="nofollow">https://arxiv.org/abs/2310.16028</a></p>
]]></description><pubDate>Sat, 27 Apr 2024 16:09:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=40181032</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=40181032</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40181032</guid></item><item><title><![CDATA[New comment by hansonw in "Ask HN: How does deploying a fine-tuned model work"]]></title><description><![CDATA[
<p><a href="https://predibase.com" rel="nofollow">https://predibase.com</a></p>
]]></description><pubDate>Wed, 24 Apr 2024 03:07:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=40140048</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=40140048</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40140048</guid></item><item><title><![CDATA[New comment by hansonw in "GPT-4 Turbo with Vision Generally Available"]]></title><description><![CDATA[
<p>Yes. But also note that the new function calling is actually “tool calling” where the model is also fine-tuned to expect and react to the <i>output</i> of the function (and there are various other nuances like being able to call multiple functions in parallel and matching up the outputs to function calls precisely).<p>When used in multi-turn “call/response” mode it actually does start to unlock some new capabilities.</p>
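<p>For reference, a minimal sketch of that multi-turn call/response shape with the OpenAI Node SDK (the tool name, arguments, and output here are placeholder stubs):</p>
<pre><code>// Minimal sketch of multi-turn tool calling (placeholder tool, stubbed output).
import OpenAI from 'openai';
const client = new OpenAI();

const tools = [{
  type: 'function',
  function: {
    name: 'get_weather',  // hypothetical tool
    parameters: { type: 'object', properties: { city: { type: 'string' } } },
  },
}];

const messages = [{ role: 'user', content: 'Weather in Paris?' }];
const first = await client.chat.completions.create({ model: 'gpt-4-turbo', messages, tools });

// Append the assistant turn, then one tool message per call, matched by tool_call_id.
messages.push(first.choices[0].message);
for (const call of first.choices[0].message.tool_calls ?? []) {
  messages.push({ role: 'tool', tool_call_id: call.id, content: '{"temp_c": 18}' }); // stub
}
const second = await client.chat.completions.create({ model: 'gpt-4-turbo', messages, tools });
console.log(second.choices[0].message.content);
</code></pre>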
]]></description><pubDate>Tue, 09 Apr 2024 20:23:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=39983832</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=39983832</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39983832</guid></item><item><title><![CDATA[New comment by hansonw in "How we built Text-to-SQL at Pinterest"]]></title><description><![CDATA[
<p>Not the author, but really nice that they shared some real data points:<p>> Once our Text-to-SQL solution was in production, we were also able to observe how users interacted with the system. As our implementation improved and as users became more familiar with the feature, our first-shot acceptance rate for the generated SQL increased from 20% to above 40%. In practice, most queries that are generated require multiple iterations of human or AI generation before being finalized. In order to determine how Text-to-SQL affected data user productivity, the most reliable method would have been to experiment. Using such a method, previous research has found that AI assistance improved task completion speed by over 50%. In our real world data (which importantly does not control for differences in tasks), we find a 35% improvement in task completion speed for writing SQL queries using AI assistance.</p>
]]></description><pubDate>Mon, 08 Apr 2024 16:22:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=39971265</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=39971265</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39971265</guid></item><item><title><![CDATA[How we built Text-to-SQL at Pinterest]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff">https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=39971231">https://news.ycombinator.com/item?id=39971231</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 08 Apr 2024 16:19:09 +0000</pubDate><link>https://medium.com/pinterest-engineering/how-we-built-text-to-sql-at-pinterest-30bad30dabff</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=39971231</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39971231</guid></item><item><title><![CDATA[New comment by hansonw in "Big Post About Big Context"]]></title><description><![CDATA[
<p>If you think about it, RAG is a relatively primitive “first-pass attention layer” that is binary and semi-heuristic. I think it’s fairly safe to say that in the long term RAG will be integrated into the model architecture somehow, just a matter of when :)</p>
]]></description><pubDate>Fri, 01 Mar 2024 04:00:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=39558400</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=39558400</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39558400</guid></item><item><title><![CDATA[New comment by hansonw in "Big Post About Big Context"]]></title><description><![CDATA[
<p>If sub-quadratic architectures (e.g. Mamba) become a thing, it will become feasible to precompute most of the work for a fixed prefix (i.e. system prompt) and the latency can be pretty minimal. Even with current transformers, if you have a fixed system prompt, you can save the KV cache and it helps a lot (though the inference time of each incremental token is still linear in context length).</p>
]]></description><pubDate>Fri, 01 Mar 2024 03:49:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=39558336</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=39558336</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39558336</guid></item><item><title><![CDATA[New comment by hansonw in "Mamba: The Easy Way"]]></title><description><![CDATA[
<p>Indeed: <a href="https://arxiv.org/pdf/2402.01032.pdf" rel="nofollow">https://arxiv.org/pdf/2402.01032.pdf</a>
Perhaps future iterations of SSMs will accommodate dynamically sized (but still non-linearly-growing) hidden states / memories!</p>
]]></description><pubDate>Sat, 24 Feb 2024 08:51:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=39490189</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=39490189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39490189</guid></item><item><title><![CDATA[New comment by hansonw in "Mamba: The Easy Way"]]></title><description><![CDATA[
<p>“RNN-mode inference” is also extremely exciting because you can precompute the hidden state of any prompt prefix (i.e. a long system prompt, or statically retrieved context) and continued generations pay the same cost irrespective of the prefix length.</p>
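<p>A toy recurrence makes the point (made-up scalar parameters; real SSMs use structured state updates):</p>
<pre><code>// Toy linear recurrence h_t = A*h_{t-1} + B*x_t: the state after a fixed
// prefix can be computed once, cached, and reused for every continuation.
const A = 0.9, B = 0.1;                  // made-up scalar "SSM" parameters
const step = (h, x) => A * h + B * x;

const prefix = [1, 2, 3, 4];             // e.g. a long system prompt
const hPrefix = prefix.reduce(step, 0);  // precompute once, cache

const continuation = [5, 6];             // cost is independent of prefix length
const hFinal = continuation.reduce(step, hPrefix);
console.log(hPrefix.toFixed(3), hFinal.toFixed(3));
</code></pre>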
]]></description><pubDate>Fri, 23 Feb 2024 20:45:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=39485868</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=39485868</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39485868</guid></item><item><title><![CDATA[New comment by hansonw in "Grist is a modern, relational spreadsheet"]]></title><description><![CDATA[
<p>Our startup is building <a href="https://arcwise.app" rel="nofollow noreferrer">https://arcwise.app</a>, which allows you to embed full-fledged SQL tables inside Google Sheets! We’re in the process of building out support for joins & subqueries, would be curious what people think.</p>
]]></description><pubDate>Thu, 02 Nov 2023 16:31:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=38116045</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=38116045</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38116045</guid></item><item><title><![CDATA[New comment by hansonw in "I built Excel for Uber and they ditched it"]]></title><description><![CDATA[
<p>I’m building a solution that works like this - we directly connect spreadsheet models to company databases (even converting pivots/formulas to SQL). Would love to chat with anyone that might find this valuable: <a href="https://arcwise.app" rel="nofollow noreferrer">https://arcwise.app</a></p>
]]></description><pubDate>Sat, 16 Sep 2023 02:14:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=37531366</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=37531366</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37531366</guid></item><item><title><![CDATA[New comment by hansonw in "Persimmon-8B"]]></title><description><![CDATA[
<p>This is the best comparison I've found that benchmarks the current OSS inference solutions: <a href="https://hamel.dev/notes/llm/inference/03_inference.html" rel="nofollow noreferrer">https://hamel.dev/notes/llm/inference/03_inference.html</a><p>IME the streaming API in text-generation-inference works fine in production (though some of the other solutions may be better). I've used it with StarCoder (15B) and the time-to-first-token / tokens per second both seem quite reasonable out of the box.</p>
]]></description><pubDate>Fri, 08 Sep 2023 11:26:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=37432159</link><dc:creator>hansonw</dc:creator><comments>https://news.ycombinator.com/item?id=37432159</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37432159</guid></item></channel></rss>