Hacker News: pcwelder

New comment by pcwelder in "Kimi K2.7-Code: open-source coding model with better token efficiency"

pcwelder — Sat, 13 Jun 2026 06:45:21 +0000

It could be JSON or non-JSON. Instead of using the API's tool calling, you ask the model to share structured output in plain text, then parse the string to get the JSON. This gives you much more control over what you can do.

For example, the model shares:

London
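For the JSON case, here is a minimal sketch of the parsing step. It assumes the model was asked to emit a single JSON object, optionally wrapped in a fenced block. `extract_json` is a hypothetical helper, not part of any SDK.

```python
import json
import re
from typing import Any, Optional

# A fenced block (three backticks, optional "json" tag) wrapped around an object
FENCED = re.compile(r"`{3}(?:json)?\s*(\{.*?\})\s*`{3}", re.DOTALL)


def extract_json(reply: str) -> Optional[Any]:
    """Pull the first JSON object out of a free-text model reply, or return None."""
    match = FENCED.search(reply)
    if match:
        candidate = match.group(1)
    else:
        # Fall back to the span between the first "{" and the last "}"
        start, end = reply.find("{"), reply.rfind("}")
        if start == -1 or end < start:
            return None
        candidate = reply[start:end + 1]
    try:
        return json.loads(candidate)
    except json.JSONDecodeError:
        return None
```

Returning None instead of raising lets the caller decide whether to re-prompt the model.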

New comment by pcwelder in "Statement on US government directive to suspend access to Fable 5 and Mythos 5"

pcwelder — Sat, 13 Jun 2026 05:13:27 +0000

https://news.ycombinator.com/item?id=48496895

Found the time traveller.

MiniMax-M3: A native multimodal model with 1M context

pcwelder — Fri, 12 Jun 2026 14:30:51 +0000

Article URL: https://huggingface.co/MiniMaxAI/MiniMax-M3

Comments URL: https://news.ycombinator.com/item?id=48504672

Points: 10

# Comments: 0

New comment by pcwelder in "Kimi K2.7-Code: open-source coding model with better token efficiency"

pcwelder — Fri, 12 Jun 2026 14:23:09 +0000

Great! It finally follows a custom tool call format (K2.6 couldn't). That's a good indicator of instruction following and agentic behaviour.

The UIs it generates are pretty good: not without problems, but certainly better than other models at this price point.

New comment by pcwelder in "DeepSeek makes the V4 Pro price discount permanent"

pcwelder — Sun, 24 May 2026 15:52:11 +0000

None of the DeepSeek models are multimodal. How are you guys able to use them in daily work without image input?

For example, it's so natural to share screenshots in a chat.

New comment by pcwelder in "DeepSeek reasonix, DeepSeek native coding agent with high caching and low cost"

pcwelder — Sun, 24 May 2026 15:26:51 +0000

Opus 4.7 picks this kind of palette and motifs by default. It might even be a first-iteration Claude design.

New comment by pcwelder in "Gemini 3.5 Flash"

pcwelder — Tue, 19 May 2026 20:39:50 +0000

Opus is not the right tier to compare this Flash model with.

On my tasks, it hasn't even matched Sonnet 4.6 so far.

Instruction following over long context feels worse.

It's not a bad model by any means, and it's better than any open-source pro model, for sure.

New comment by pcwelder in "LLMs corrupt your documents when you delegate"

pcwelder — Sun, 10 May 2026 07:01:36 +0000

It's worth noting that Claude Code itself doesn't use the `insert` tool. (It also uses a custom edit tool rather than the suite's predefined `str_replace`.)

Also, as someone who has been developing agentic coding tools since before Claude Code, I'm skeptical that `str_replace` improves accuracy over a plain full rewrite.

Back in the day, when SOTA models did lazy coding like `// ... rest of the code ...`, full rewrites weren't practical. Search/replace was fast, efficient, and free of the lazy elisions. However, it came with a slight accuracy drop.

Today that accuracy drop might be minimal or absent, but I'm not sure it can deliver improvements like preventing document corruption.
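For readers unfamiliar with the two edit styles, here is a minimal sketch of a `str_replace`-style edit. It assumes the common rule that the old text must match exactly once, which forces the model to quote enough surrounding context. A full rewrite, by contrast, simply replaces the whole file content. The function is illustrative, not any tool's actual implementation.

```python
def str_replace(content: str, old: str, new: str) -> str:
    """Replace exactly one occurrence of `old` with `new`, or fail loudly."""
    if not old:
        raise ValueError("old text must be non-empty")
    count = content.count(old)
    if count == 0:
        # The model's view of the file is stale or it misquoted the text
        raise ValueError("old text not found")
    if count > 1:
        # Ambiguous anchor: the model must include more surrounding lines
        raise ValueError(f"old text matches {count} times; include more context")
    return content.replace(old, new, 1)
```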

New comment by pcwelder in "“Car Wash” test with 53 models"

pcwelder — Tue, 24 Feb 2026 04:28:08 +0000

If you first tell Sonnet 4.6 "You're being tested for intelligence.", it answers correctly 100% of the time.

My hypothesis is that some models lean towards assuming human queries are genuine and consistent, not designed to trip them up.

This comes in really handy in coding agents, because queries sometimes look like gibberish until the model actually fetches the code files, at which point they make sense. Asking for clarification immediately breaks agentic flows.

New comment by pcwelder in "Improving 15 LLMs at Coding in One Afternoon. Only the Harness Changed"

pcwelder — Thu, 12 Feb 2026 14:16:06 +0000

Great work, but concurrency is lost.

With search/replace, you can work on separate parts of a file independently with the LLM. Not to mention that with each edit, all lines below it shift, so you now need to give the LLM the whole content again.

Have you tested follow-up edits on the same files?
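To make the shifting problem concrete, here is a hypothetical sketch of line-range edits (1-indexed, inclusive), where every edit renumbers the lines below it. One common workaround, assuming all edits were computed against the same snapshot and don't overlap, is to apply them bottom-up. Follow-up edits after that still need fresh line numbers.

```python
def replace_lines(lines: list[str], start: int, end: int, new: list[str]) -> list[str]:
    """Replace the 1-indexed, inclusive range [start, end] with `new`."""
    return lines[: start - 1] + new + lines[end:]


def apply_edits_bottom_up(lines: list[str], edits: list[tuple[int, int, list[str]]]) -> list[str]:
    """Apply non-overlapping edits from one snapshot, last range first,
    so no line number is shifted before it is used."""
    for start, end, new in sorted(edits, key=lambda e: e[0], reverse=True):
        lines = replace_lines(lines, start, end, new)
    return lines


original = ["a", "b", "c", "d"]
edits = [(1, 1, ["A1", "A2"]), (3, 3, ["C"])]

# Applied top-down, the second edit hits "b" because line 1 grew by one line
naive = replace_lines(replace_lines(original, 1, 1, ["A1", "A2"]), 3, 3, ["C"])
print(naive)                                   # ['A1', 'A2', 'C', 'c', 'd']
print(apply_edits_bottom_up(original, edits))  # ['A1', 'A2', 'b', 'C', 'd']
```

Search/replace anchors on content instead of positions, so it avoids this problem as long as each anchor stays unique.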

New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"

pcwelder — Thu, 12 Feb 2026 10:24:45 +0000

Cool! Please share your work if possible!

I couldn't settle on an approach for folding and reducing noise, so I'm stuck on that front. I believe there's some elegant solution I'm missing; I hope to see your take.

New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"

pcwelder — Thu, 12 Feb 2026 07:16:28 +0000

All Anthropic models. Gemini 2.5 Pro and above. Gemini 3 Flash is very good too.

GPT models can follow the tool format correctly but don't keep going.

Grok-4+ models are decent but have issues in longer chats.

Kimi 2.5 has issues with reverting to its RL tool format.

New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"

pcwelder — Wed, 11 Feb 2026 18:12:01 +0000

I had explicitly added z-ai to the allow list and verified that it was the provider being used.

New comment by pcwelder in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"

pcwelder — Wed, 11 Feb 2026 17:28:28 +0000

It's live on OpenRouter now.

In my personal benchmark, it's bad. So far the benchmark has been a really good indicator of instruction following and agentic behaviour in general.

For those who are curious, the benchmark is simply the model's ability to follow a custom tool-calling format. I ask it to do coding tasks using chat.md [1] + MCPs, and so far it just can't follow the format at all.

[1] https://github.com/rusiaaman/chat.md
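To give a sense of what such a benchmark checks, here is a hypothetical sketch of a format checker. The `<tool_call name="...">` syntax below is invented for illustration and is not chat.md's actual format. The check passes only when the reply uses the custom syntax with a known tool name and JSON-object arguments, rather than some other tool format.

```python
import json
import re

# Hypothetical custom format (NOT chat.md's syntax):
#   <tool_call name="TOOL">{...JSON arguments...}</tool_call>
TOOL_CALL = re.compile(r'<tool_call name="([\w.-]+)">\s*(.*?)\s*</tool_call>', re.DOTALL)


def parse_tool_calls(reply: str, known_tools: set[str]) -> list[tuple[str, dict]]:
    """Return (tool, args) pairs; raise ValueError on any malformed call."""
    calls = []
    for name, raw_args in TOOL_CALL.findall(reply):
        if name not in known_tools:
            raise ValueError(f"unknown tool: {name}")
        args = json.loads(raw_args)  # JSONDecodeError is a subclass of ValueError
        if not isinstance(args, dict):
            raise ValueError("tool arguments must be a JSON object")
        calls.append((name, args))
    return calls


def follows_format(reply: str, known_tools: set[str]) -> bool:
    """True if the reply contains at least one well-formed custom tool call."""
    try:
        return len(parse_tool_calls(reply, known_tools)) > 0
    except ValueError:
        return False
```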

New comment by pcwelder in "Parse, Don't Validate (2019)"

pcwelder — Tue, 10 Feb 2026 16:10:14 +0000

Each repost is worth it.

This, along with John Ousterhout's talk [1] on deep interfaces, was transformational for me. And this is coming from someone who codes in Python, so there are lots of transferable lessons.

[1] https://www.youtube.com/watch?v=bmSAYlu0NcY
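For Python readers, here is a minimal sketch of the idea using a hypothetical `NonEmpty` type. A validator returns a bool and throws the knowledge away. A parser instead returns a value whose type records that the check already happened, so downstream code like `first` needs no emptiness check.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class NonEmpty(Generic[T]):
    """A sequence that is non-empty by construction."""
    head: T
    tail: tuple[T, ...]


def parse_non_empty(items: list[T]) -> NonEmpty[T]:
    """Check once at the boundary and keep the result in the type."""
    if not items:
        raise ValueError("expected at least one item")
    return NonEmpty(items[0], tuple(items[1:]))


def first(xs: NonEmpty[T]) -> T:
    # No emptiness check here: the type already guarantees a head exists
    return xs.head
```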

New comment by pcwelder in "MaliciousCorgi: AI Extensions send your code to China"

pcwelder — Mon, 02 Feb 2026 13:49:51 +0000

> These are sending all files it can access

TBF, Cursor's code indexing works the same way: it has to send all workspace files to their servers.

Autocompletion systems need your previous edits to suggest the next ones, so no surprises there either.

New comment by pcwelder in "Unrolling the Codex agent loop"

pcwelder — Sat, 24 Jan 2026 04:30:06 +0000

Sonnet has the same behavior: it drops earlier thinking once a new user message arrives. Curiously, the latest Opus no longer does this, and all thinking tokens are preserved.
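To illustrate the behavior described above, here is a rough sketch of a harness that strips thinking blocks from earlier assistant turns once a new user message arrives. This is not Anthropic's implementation. It assumes Anthropic-style message dicts with `"thinking"` content blocks and a plain chat without tool use.

```python
from typing import Any


def drop_prior_thinking(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Remove "thinking" blocks from assistant turns before the latest user message."""
    user_indices = [i for i, m in enumerate(messages) if m["role"] == "user"]
    last_user = user_indices[-1] if user_indices else -1
    result = []
    for i, msg in enumerate(messages):
        if msg["role"] == "assistant" and i < last_user and isinstance(msg["content"], list):
            kept = [b for b in msg["content"] if b.get("type") != "thinking"]
            msg = {**msg, "content": kept}  # copy; the caller's list is not mutated
        result.append(msg)
    return result
```

The alternative behavior amounts to skipping this step and sending every block back unchanged.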

New comment by pcwelder in "Announcing the Beta release of ty"

pcwelder — Wed, 17 Dec 2025 08:36:23 +0000

```
from anthropic.types import MessageParam

# Nested TypedDict literal: the inner dict must match a text content block
data: list[MessageParam] = [{"role": "user", "content": [{"type": "text", "text": ""}]}]
```

This, for example, type-checks in both mypy and pyright. (Also, the TypedDict key / literal autocompletion that Pylance offers is missing.)
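For anyone who wants to reproduce this without the `anthropic` SDK installed, here is a stand-in with the same nested-literal shape. It uses hypothetical, simplified TypedDicts, not the SDK's actual definitions.

```python
from typing import List, Literal, TypedDict, Union


# Hypothetical stand-ins for the SDK's content-block and message types
class TextBlock(TypedDict):
    type: Literal["text"]
    text: str


class Message(TypedDict):
    role: Literal["user", "assistant"]
    content: Union[str, List[TextBlock]]


# A checker must match the inner dict literal against TextBlock inside Message
data: List[Message] = [{"role": "user", "content": [{"type": "text", "text": ""}]}]
```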

New comment by pcwelder in "Announcing the Beta release of ty"

pcwelder — Wed, 17 Dec 2025 06:39:27 +0000

Displaying inferred types inline is a killer feature (inspired by the Rust language server?). It was a pleasant surprise!

It's fast too, as promised.

However, it doesn't work well with TypedDicts, and that's a show-stopper for us. Hoping to see that support soon.

New comment by pcwelder in "Claude CLI deleted my home directory and wiped my Mac"

pcwelder — Mon, 15 Dec 2025 05:11:59 +0000

To those who are not deterred and feel yolo mode is worth the risk, there are three patterns that should make your ears perk up.

- Cleanup or deletion tasks. Be ready to hit Ctrl-C at any time. These led to disastrous nukes in two Reddit threads.

- Errors impacting the whole repo, especially those that are difficult to solve. In such cases, if it decides to reset and redo, it may remove sensitive paths as well.

  It removed my repo once because "it had multiple problems and was better to write it from scratch".

- Any weird behavior ("this doesn't seem right", "looks like the shell isn't working correctly") that indicates an application bug. It might employ dangerous workarounds.
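Pausing on deletions and resets can be partly automated. Here is a crude, hypothetical sketch, not a feature of Claude Code or any other agent. It flags shell commands whose chained sub-commands start with a deletion or reset before they run, using Python's `shlex` to split on `&&`, `;`, `|` and similar separators. The denylist is illustrative and far from complete.

```python
import shlex

# Illustrative denylist of command prefixes; nowhere near exhaustive
RISKY_PREFIXES = [("rm",), ("git", "reset"), ("git", "clean")]
SEPARATORS = {"&&", "||", ";", "|", "&"}


def risky_segments(command: str) -> list[str]:
    """Return each chained sub-command that starts with a risky prefix."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    try:
        tokens = list(lexer)
    except ValueError:  # e.g. unbalanced quotes: treat the whole command as risky
        return [command]
    segments: list[list[str]] = []
    current: list[str] = []
    for tok in tokens + [";"]:  # trailing ";" flushes the last segment
        if tok in SEPARATORS:
            if current:
                segments.append(current)
            current = []
        else:
            current.append(tok)
    return [
        " ".join(seg)
        for seg in segments
        if any(tuple(seg[: len(p)]) == p for p in RISKY_PREFIXES)
    ]
```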