<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: pcwelder</title><link>https://news.ycombinator.com/user?id=pcwelder</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 20 May 2026 06:18:47 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=pcwelder" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by pcwelder in "Gemini 3.5 Flash"]]></title><description><![CDATA[
<p>Opus is not the correct tier to compare this flash model with.<p>On my tasks it has not been as good as even Sonnet 4.6 so far.<p>Instruction following over long context feels worse.<p>It's not a bad model by any means, better than any pro open source model for sure.</p>
]]></description><pubDate>Tue, 19 May 2026 20:39:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48199276</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=48199276</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48199276</guid></item><item><title><![CDATA[New comment by pcwelder in "LLMs corrupt your documents when you delegate"]]></title><description><![CDATA[
<p>It's worth noting that Claude Code itself doesn't use the `insert` tool. (It also uses a custom edit tool, not the suite's predefined str_replace.)<p>Also, as a person who has been developing agentic code tools since before Claude Code, I'm skeptical that str_replace provides any accuracy improvement over a full rewrite.<p>Back in the day when SOTA models would do lazy coding like `// ... rest of the code ...`, full rewrite wasn't easy.
Search/replace was fast, efficient and free of the lazy coding. However, it came with a slight accuracy drop.<p>Today that accuracy drop might be minimal or absent, but I'm not sure it could lead to improvements like preventing doc corruption.</p>
]]></description><pubDate>Sun, 10 May 2026 07:01:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=48081642</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=48081642</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48081642</guid></item><item><title><![CDATA[New comment by pcwelder in "“Car Wash” test with 53 models"]]></title><description><![CDATA[
<p>If you first tell Sonnet 4.6 "You're being tested for intelligence," it answers correctly 100% of the time.<p>My hypothesis is that some models err towards assuming human queries are real and consistent and not out there to break them.<p>This comes in real handy in coding agents because queries are sometimes gibberish till the models actually fetch the code files, then they make sense. Asking for clarification immediately breaks agentic flows.</p>
]]></description><pubDate>Tue, 24 Feb 2026 04:28:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47132866</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=47132866</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47132866</guid></item><item><title><![CDATA[New comment by pcwelder in "Improving 15 LLMs at Coding in One Afternoon. Only the Harness Changed"]]></title><description><![CDATA[
<p>Great work, but concurrency is lost.<p>With search-replace you could work on separate parts of a file independently with the LLM. Not to mention that with each edit all lines below are shifted, so you now need to provide the LLM with the whole content.<p>Have you tested followup edits on the same files?</p>
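<p>A minimal sketch of the line-shift problem, assuming two hypothetical helpers (`replace_lines` and `search_replace`) that are not taken from any real harness. It only illustrates why edits planned against the original line numbers can go stale after the first edit, while content-anchored edits do not:</p>

```python
def replace_lines(lines, start, end, new_lines):
    """Replace 1-indexed inclusive lines start..end (hypothetical helper)."""
    return lines[: start - 1] + new_lines + lines[end:]


def search_replace(text, old, new):
    """Replace old with new, requiring exactly one match (hypothetical helper)."""
    if text.count(old) != 1:
        raise ValueError("old text must match exactly once")
    return text.replace(old, new)


source = ["a = 1", "b = 2", "c = 3", "d = 4"]

# Two edits planned against the original numbering: line 1 and line 3.
after_first = replace_lines(source, 1, 1, ["a = 1", "a2 = 1.5"])
# The first edit added a line, so original line 3 now sits at line 4.
stale = replace_lines(after_first, 3, 3, ["c = 30"])
print(stale)  # "b = 2" was overwritten instead of "c = 3"

# Content-anchored edits do not depend on line positions.
text = "\n".join(source)
text = search_replace(text, "a = 1", "a = 1\na2 = 1.5")
text = search_replace(text, "c = 3", "c = 30")
print(text.splitlines())
```

<p>In this toy case the second line-numbered edit lands on the wrong line unless the numbers are recomputed, whereas the two search/replace edits could be applied in either order.</p>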
]]></description><pubDate>Thu, 12 Feb 2026 14:16:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46989105</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46989105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46989105</guid></item><item><title><![CDATA[New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>Cool! Please share your work if possible!<p>I couldn't decide on folding and reducing noise so I'm stuck on that front. I believe there is some elegant solution that I'm missing, hope to see your take.</p>
]]></description><pubDate>Thu, 12 Feb 2026 10:24:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46987020</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46987020</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46987020</guid></item><item><title><![CDATA[New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>All Anthropic models. Gemini 2.5 Pro and above. Gemini 3 Flash is very good too.<p>GPT models can follow the tool format correctly but don't keep on going.<p>Grok-4+ are decent but have issues in longer chats.<p>Kimi 2.5 has issues with reverting to its RL tool format.</p>
]]></description><pubDate>Thu, 12 Feb 2026 07:16:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46985735</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46985735</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46985735</guid></item><item><title><![CDATA[New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>I had added z-ai in allow list explicitly and verified that it's the one being used.</p>
]]></description><pubDate>Wed, 11 Feb 2026 18:12:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46978567</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46978567</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46978567</guid></item><item><title><![CDATA[New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>It's live on openrouter now.<p>In my personal benchmark it's bad. So far the benchmark has been a really good indicator of instruction following and agentic behaviour in general.<p>To those who are curious, the benchmark is just the ability of a model to follow a custom tool calling format. I ask it to do coding tasks using chat.md [1] + mcps. And so far it's just not able to follow it at all.<p>[1] <a href="https://github.com/rusiaaman/chat.md" rel="nofollow">https://github.com/rusiaaman/chat.md</a></p>
]]></description><pubDate>Wed, 11 Feb 2026 17:28:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46977903</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46977903</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46977903</guid></item><item><title><![CDATA[New comment by pcwelder in "Parse, Don't Validate (2019)"]]></title><description><![CDATA[
<p>Each repost is worth it.<p>This, along with John Ousterhout's talk [1] on deep interfaces was transformational for me. And this is coming from a guy who codes in python, so lots of transferable learnings.<p>[1] <a href="https://www.youtube.com/watch?v=bmSAYlu0NcY" rel="nofollow">https://www.youtube.com/watch?v=bmSAYlu0NcY</a></p>
]]></description><pubDate>Tue, 10 Feb 2026 16:10:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46961829</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46961829</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46961829</guid></item><item><title><![CDATA[New comment by pcwelder in "MaliciousCorgi: AI Extensions send your code to China"]]></title><description><![CDATA[
<p>> These are sending all files it can access<p>TBF, Cursor's code indexing works the same way: it has to send all workspace files to their servers.<p>Auto-completion systems need previous edits to suggest next edits, so no surprises there either.</p>
]]></description><pubDate>Mon, 02 Feb 2026 13:49:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46856007</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46856007</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46856007</guid></item><item><title><![CDATA[New comment by pcwelder in "Unrolling the Codex agent loop"]]></title><description><![CDATA[
<p>Sonnet has the same behavior: drops thinking on user message. Curiously in the latest Opus they have removed this behavior and all thinking tokens are preserved.</p>
]]></description><pubDate>Sat, 24 Jan 2026 04:30:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46741073</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46741073</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46741073</guid></item><item><title><![CDATA[New comment by pcwelder in "Announcing the Beta release of ty"]]></title><description><![CDATA[
<p>```<p>from anthropic.types import MessageParam<p>data: list[MessageParam] = [{"role": "user", "content": [{"type": "text", "text": ""}]}]<p>```<p>This, for example, works in both mypy and pyright. (Also, autocompletion of TypedDict keys / literals from pylance is missing.)</p>
]]></description><pubDate>Wed, 17 Dec 2025 08:36:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=46299570</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46299570</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46299570</guid></item><item><title><![CDATA[New comment by pcwelder in "Announcing the Beta release of ty"]]></title><description><![CDATA[
<p>Displaying inferred types inline is a killer feature (inspired from rust lang server?). It was a pleasant surprise!<p>It's fast too as promised.<p>However, it doesn't work well with TypedDicts and that's a show-stopper for us. Hoping to see that support soon.</p>
]]></description><pubDate>Wed, 17 Dec 2025 06:39:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46298924</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46298924</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46298924</guid></item><item><title><![CDATA[New comment by pcwelder in "Claude CLI deleted my home directory and wiped my Mac"]]></title><description><![CDATA[
<p>To those who are not deterred and feel yolo mode is worth the risk, there are three patterns that should perk your ears up.<p>- Cleanup or deletion tasks. Be ready to hit ctrl c anytime. Led to disastrous nukes in two reddit threads.<p>- Errors impacting the whole repo, especially those that are difficult to solve. In such cases if it decides to reset and redo, it may remove sensitive paths as well.<p>It removed my repo once because "it had multiple problems and was better to write it from scratch".<p>- Any weird behavior, such as "this doesn't seem right" or "looks like shell isn't working correctly", that is indicative of an application bug. It might employ dangerous workarounds.</p>
]]></description><pubDate>Mon, 15 Dec 2025 05:11:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=46270703</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46270703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46270703</guid></item><item><title><![CDATA[New comment by pcwelder in "I failed to recreate the 1996 Space Jam website with Claude"]]></title><description><![CDATA[
<p>It just fetched the HTML and replicated it. The usage of table is a giveaway.<p>Any LLM with browser tool can do it (Kombai one shots it too for example), because it's just cheating.</p>
]]></description><pubDate>Mon, 08 Dec 2025 11:33:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=46191101</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46191101</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46191101</guid></item><item><title><![CDATA[New comment by pcwelder in "I failed to recreate the 1996 Space Jam website with Claude"]]></title><description><![CDATA[
<p>But that's cheating because it then has the source code containing the table and its styles.<p>I can confirm that this is what it does.<p>And if you ask it to not use tables, it cleverly uses div with the same layout as the table instead.</p>
]]></description><pubDate>Mon, 08 Dec 2025 11:31:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46191088</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46191088</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46191088</guid></item><item><title><![CDATA[New comment by pcwelder in "What I don’t like about chains of thoughts (2023)"]]></title><description><![CDATA[
<p>In RNNs and Transformers we obtain the probability distribution of the target variable directly and sample using methods like top-k or temperature sampling.<p>I don't see the equivalence to MCMC. It's not like we have a complex probability function that we are trying to sample from using a chain.<p>It's just logistic regression at each step.</p>
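<p>A rough sketch of what "sample directly" can look like, assuming a hypothetical helper `sample_next_token` and made-up logits for a four-token vocabulary. It is not any particular library's decoder, just temperature scaling, top-k filtering and a softmax followed by a single categorical draw:</p>

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample one token index from raw logits (a hypothetical helper)."""
    # Temperature rescales logits: values below 1 sharpen, above 1 flatten.
    scaled = [x / temperature for x in logits]
    # Top-k keeps only the k highest-scoring token indices.
    indices = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    if top_k is not None:
        indices = indices[:top_k]
    # Softmax over the surviving logits, shifted by the max for stability.
    peak = max(scaled[i] for i in indices)
    weights = [math.exp(scaled[i] - peak) for i in indices]
    # One direct draw from the categorical distribution; no chain involved.
    return rng.choices(indices, weights=weights, k=1)[0]


logits = [2.0, 1.0, 0.1, -1.0]  # made-up scores for a 4-token vocabulary
print(sample_next_token(logits, temperature=0.7, top_k=2))
```

<p>With `top_k=1` this reduces to greedy decoding; each call is a single independent draw given the logits.</p>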
]]></description><pubDate>Thu, 04 Dec 2025 05:30:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=46144120</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=46144120</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46144120</guid></item><item><title><![CDATA[New comment by pcwelder in "Should LLMs just treat text content as an image?"]]></title><description><![CDATA[
<p>I  ϲаn guаrаntее thаt thе ОСR ϲаn't rеаd thіs sеntеnсе ϲоrrесtlу.</p>
]]></description><pubDate>Mon, 27 Oct 2025 11:30:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=45719746</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=45719746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45719746</guid></item><item><title><![CDATA[New comment by pcwelder in "Karpathy on DeepSeek-OCR paper: Are pixels better inputs to LLMs than text?"]]></title><description><![CDATA[
<p>There are many unicode characters that look alike. There are also those zero width characters.</p>
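<p>A small illustration using Python's standard `unicodedata` module. The example strings are made up: one swaps Latin "a" for Cyrillic U+0430, the other hides a zero width space (U+200B):</p>

```python
import unicodedata

latin = "paypal"
spoofed = "p\u0430yp\u0430l"  # U+0430 is CYRILLIC SMALL LETTER A

print(latin == spoofed)  # the strings render alike but differ
for ch in spoofed:
    if ord(ch) > 127:
        print(hex(ord(ch)), unicodedata.name(ch))

hidden = "pay\u200bpal"  # U+200B is ZERO WIDTH SPACE
print(len("paypal"), len(hidden))
# Format characters (category Cf) include the zero-width ones.
visible = "".join(ch for ch in hidden if unicodedata.category(ch) != "Cf")
print(visible == "paypal")
```

<p>Text carries these code points exactly, whereas a rendered image of the same strings would show no visible difference.</p>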
]]></description><pubDate>Thu, 23 Oct 2025 06:27:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=45678803</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=45678803</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45678803</guid></item><item><title><![CDATA[New comment by pcwelder in "Python developers are embracing type hints"]]></title><description><![CDATA[
<p>It doesn't throw an error in the REPL though. Surely you meant to share some other example?</p>
]]></description><pubDate>Sun, 28 Sep 2025 06:56:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=45402290</link><dc:creator>pcwelder</dc:creator><comments>https://news.ycombinator.com/item?id=45402290</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45402290</guid></item></channel></rss>