Hacker News: t1amat

New comment by t1amat in "OpenCode – Open source AI coding agent"

t1amat — Sat, 21 Mar 2026 11:54:44 +0000

Claude Code is closed source so this isn’t a concern they should have as Opus is great at Rust.

New comment by t1amat in "Show HN: Badge that shows how well your codebase fits in an LLM's context window"

t1amat — Fri, 27 Feb 2026 18:41:07 +0000

Interesting idea, but I think it might have made more sense to use something like repomix to generate the source bundle and then run tiktoken over that. Practically speaking, you don’t send many source files in raw text form: either they have some sort of file wrapper with metadata, or they are pulled in from a tool call where the tool call arguments act as the metadata.
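
A rough sketch of the difference, under loud assumptions: the `<file path=...>` wrapper is a made-up format (not repomix's actual output), and `rough_token_count` is a crude 4-characters-per-token stand-in for a real tokenizer. With tiktoken installed you could pass something like `lambda t: len(tiktoken.get_encoding("cl100k_base").encode(t))` as `count` instead.

```python
from typing import Callable, Dict


def rough_token_count(text: str) -> int:
    """Crude stand-in for a tokenizer: assume about 4 characters per token."""
    return (len(text) + 3) // 4


def bundle_files(files: Dict[str, str]) -> str:
    """Wrap each file in a small metadata header (hypothetical wrapper format)."""
    parts = []
    for path, body in sorted(files.items()):
        parts.append(f'<file path="{path}">\n{body}\n</file>')
    return "\n".join(parts)


def context_usage(
    files: Dict[str, str],
    window_tokens: int,
    count: Callable[[str], int] = rough_token_count,
) -> Dict[str, float]:
    """Compare the raw token total with the total for the wrapped bundle."""
    raw = sum(count(body) for body in files.values())
    bundled = count(bundle_files(files))
    return {
        "raw_tokens": raw,
        "bundled_tokens": bundled,
        "fraction_of_window": bundled / window_tokens,
    }


if __name__ == "__main__":
    demo = {"a.py": "print('hi')\n", "b.py": "x = 1\n"}
    print(context_usage(demo, window_tokens=1000))
```

The point is only that the wrapped bundle always costs more tokens than the raw files, so a badge based on raw source would understate what actually lands in context.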

New comment by t1amat in "Qwen3.5: Towards Native Multimodal Agents"

t1amat — Mon, 16 Feb 2026 19:23:58 +0000

When most people refer to “GLM” they refer to the mainline model. The difference in scale between GLM 5 and GLM 4.7 Flash is enormous: one runs acceptably on a phone, the other on $100k+ hardware minimum. While GLM 4.7 Flash is a gift to the local LLM crowd, it is nowhere near as capable as its bigger sibling in use cases beyond typical chat.

New comment by t1amat in "Hoot: Scheme on WebAssembly"

t1amat — Sat, 07 Feb 2026 20:08:02 +0000

Perhaps the opposite: a language small enough that its entirety can easily be stuffed in context.

New comment by t1amat in "TimeCapsuleLLM: LLM trained only on data from 1800-1875"

t1amat — Mon, 12 Jan 2026 16:29:35 +0000

Not a direct answer but it looks like v0.5 is a nanoGPT arch and v1 is a Phi 1.5 arch, which should be well supported by quanting utilities for any engine. They are small too, so quantizing them should be doable on a potato.

New comment by t1amat in "MiniMax M2.1: Built for Real-World Complex Tasks, Multi-Language Programming"

t1amat — Fri, 26 Dec 2025 16:02:56 +0000

With M2, yes - I’ve used it in Claude Code (i.e. native tool calling), Roo/Cline (i.e. custom tool parsing), etc. It’s quite good and was for some time the best model to self-host. At 4bit it can fit on 2x RTX 6000 Pro (i.e. ~200GB VRAM) with about 400k context at fp8 kv cache. It’s very fast due to low active params, stable at long context, and quite capable in any agent harness (its training specialty). M2.1 should be a good bump beyond M2, which was undertrained relative to even much smaller models.
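
Back-of-envelope version of the "fits on 2x cards" claim, as a sketch only. Assumptions not stated in the comment: M2 is taken to have about 230B total parameters, and each card is taken to have 96 GB. Quantization overhead (scales, unquantized layers) and runtime buffers are ignored, so the real headroom is smaller.

```python
def weight_memory_gb(total_params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), ignoring overhead."""
    return total_params_billion * 1e9 * bits_per_weight / 8 / 1e9


TOTAL_PARAMS_B = 230   # assumption: total parameter count in billions
VRAM_GB = 2 * 96       # assumption: two cards at 96 GB each

weights_gb = weight_memory_gb(TOTAL_PARAMS_B, bits_per_weight=4)
headroom_gb = VRAM_GB - weights_gb  # what is left for KV cache and activations

if __name__ == "__main__":
    print(f"weights: {weights_gb:.0f} GB, headroom: {headroom_gb:.0f} GB")
```

Whether the leftover really holds ~400k tokens of fp8 KV cache depends on the layer count and KV head layout, which this sketch does not model.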

New comment by t1amat in "Microsoft blocks Israel’s use of its tech in mass surveillance of Palestinians"

t1amat — Thu, 25 Sep 2025 17:12:06 +0000

You might have 1A rights as an American but it seems to me the manner in which this person protested would be grounds for termination in many jurisdictions.

New comment by t1amat in "I regret building this $3000 Pi AI cluster"

t1amat — Fri, 19 Sep 2025 18:13:36 +0000

This is the right take. You might be able to get decent (2-3x less than a GPU rig) token generation, which is adequate, but your prompt processing speeds are more like 50-100x slower. A hardware solution is needed to make long context actually usable on a Mac.
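
To make the asymmetry concrete, a small sketch. All throughput numbers are hypothetical placeholders picked to sit inside the ranges above (decode 2.5x slower, prompt processing 75x slower); they are not measurements of any real machine.

```python
def response_seconds(
    prompt_tokens: int,
    output_tokens: int,
    prefill_tps: float,
    decode_tps: float,
) -> float:
    """Total wall time: prompt processing (prefill) plus token generation (decode)."""
    return prompt_tokens / prefill_tps + output_tokens / decode_tps


# Hypothetical GPU rig throughput in tokens per second.
GPU_PREFILL_TPS = 5000.0
GPU_DECODE_TPS = 60.0

# Hypothetical slower machine, scaled by the ratios from the comment.
SLOW_PREFILL_TPS = GPU_PREFILL_TPS / 75
SLOW_DECODE_TPS = GPU_DECODE_TPS / 2.5

if __name__ == "__main__":
    for prompt in (1_000, 50_000):
        gpu = response_seconds(prompt, 500, GPU_PREFILL_TPS, GPU_DECODE_TPS)
        slow = response_seconds(prompt, 500, SLOW_PREFILL_TPS, SLOW_DECODE_TPS)
        print(f"{prompt:>6} prompt tokens: gpu {gpu:.1f}s, slow {slow:.1f}s")
```

With a short prompt the gap stays close to the decode ratio; with a long prompt the prefill term dominates and the gap grows by an order of magnitude.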

New comment by t1amat in "GPT-5 for Developers"

t1amat — Fri, 08 Aug 2025 05:40:52 +0000

The problem with OpenAI models is the lack of a Max-like subscription for a good agentic harness. Maybe OpenAI or Microsoft could fix this.

I just went through the agony of provisioning my team with new Claude Code 5x subs 2 weeks ago after reviewing all of the options available at that time. Since then, the major changes include a Cerebras sub for Qwen3 Coder 480B, and now GPT-5. I’m still not sure I made the right choice, but hey, I’m not married to it either.

If you plan on using this much at all then the primary thing to avoid is API-based pay per use. It’s prohibitively costly to use regularly. And even for less important changes it never feels appropriate to use a lower quality model when the product counts.

Claude Code won primarily because of the sub and that they have a top tier agentic harness and models that know how to use it. Opus and Sonnet are fantastic agents and very good at our use case, and were our preferred API-based models anyways. We can use Claude Code basically all day with at least Sonnet after using our Opus limits up. Worth noting that Cline built a Claude Code provider that the derivatives aped, which is great, but I’ve found Claude Code to be as good or better anyways. The CLI interface is actually a bonus for ease of sharing state via copy/paste.

I’ll probably change over to Gemini Code Assist next, as it’s half the price with more context length, but I’m waiting for a better Gemini 2.5 Pro and for the gemini-cli/Code Assist extensions to get first party planning support. You can get some form of third party planning through custom extensions with the cli, but as an agent harness they are incomplete without it.

The Cerebras + Qwen3 Coder 480B with qwen3-cli is seriously tempting. Crazy generation speed. There’s some question about how big the rate limit really is, but it’s half the cost of Claude Code 5x. I haven’t checked, but I know qwen3-cli, which was introduced alongside the model, is a fork of gemini-cli with Qwen-focused updates; wonder if they landed a planning tool?

I don’t really consider Cursor, Windsurf, Cline, Roo, Kilo et al as they can’t provide a flat rate service with the kind of rate limits you can get with the aforementioned.

GitHub Copilot could be a great offering if they were willing to really compete with a good unlimited premium plan, but so far their best offering has fewer premium requests than I make in a week, possibly even in a few days.

Would love to hear if I missed anything, or somehow missed some dynamic here worth considering. But as far as I can tell, given heavy use, you only have 3 options today: Claude Max, Gemini Code Assist, Cerebras Code.

New comment by t1amat in "GPT-5 for Developers"

t1amat — Fri, 08 Aug 2025 04:52:29 +0000

Is this actually true? Last I checked (a week ago?) the Codex agents were free at some tiers in a preview capacity (with future rate limits based on tier), but codex cli was not. With codex cli you can log in, but the purpose of that is to link it to an API key where you pay per use. The sub tiers give one-time credits you would burn through quickly.

New comment by t1amat in "Apple Intelligence Foundation Language Models Tech Report 2025"

t1amat — Fri, 18 Jul 2025 01:53:23 +0000

I doubt this is true anymore, if ever. Both require string escaping, which is the real hurdle. And they are heavily trained on JSON for tool calling.
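
A quick illustration of the escaping point, assuming the two formats being compared are JSON and XML (the comment only names JSON, so the XML side is a guess). Only standard library calls are used.

```python
import json
from xml.sax.saxutils import escape, unescape

# A code snippet with characters that need escaping in one format or the other:
# double quotes, a newline, a tab, a backslash, '<' and '&'.
snippet = 'if a < b && name == "x":\n\tprint("a\\b")'

# JSON: quotes, backslashes and control characters must be escaped.
as_json = json.dumps({"code": snippet})

# XML text content: '&', '<' and '>' must be escaped.
as_xml = f"<code>{escape(snippet)}</code>"

if __name__ == "__main__":
    print(as_json)
    print(as_xml)
    # Both encodings round-trip back to the original string.
    print(json.loads(as_json)["code"] == snippet)
    print(unescape(escape(snippet)) == snippet)
```

Either way the model has to emit a transformed version of the code rather than the code itself; the set of characters differs but the burden is the same kind.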

New comment by t1amat in "Kimi K2"

t1amat — Fri, 11 Jul 2025 17:13:21 +0000

With 32B active parameters it would be ridiculously slow at generation.
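
Rule-of-thumb sketch behind that, assuming decoding is memory-bandwidth bound so every generated token has to read all active weights once. The bandwidth figures are hypothetical examples, since the hardware under discussion is not named here, and the result is an upper bound that ignores KV cache reads and other overhead.

```python
def decode_tokens_per_second(
    active_params_billion: float,
    bits_per_weight: float,
    bandwidth_gb_s: float,
) -> float:
    """Upper bound on decode speed when each token reads all active weights once."""
    bytes_per_token = active_params_billion * 1e9 * bits_per_weight / 8
    return bandwidth_gb_s * 1e9 / bytes_per_token


if __name__ == "__main__":
    # Hypothetical memory bandwidths in GB/s, from a small desktop up to a big GPU.
    for bandwidth in (100, 300, 1000):
        tps = decode_tokens_per_second(32, 4, bandwidth)
        print(f"{bandwidth:>5} GB/s -> at most {tps:.1f} tok/s")
```

At 4 bits per weight, 32B active parameters means about 16 GB read per token, so anything short of GPU-class bandwidth lands in single-digit tokens per second.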

New comment by t1amat in "MCP-B: A Protocol for AI Browser Automation"

t1amat — Thu, 10 Jul 2025 01:48:04 +0000

The user should be able to enable/disable tools or an entire tab’s toolset. Some keep open hundreds of tabs and that’s simply too many potential tools to expose. Deduping doesn’t make sense for the reasons you say, and that one logical task could lead to a series of operations missequenced across a range of tabs.
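
One possible shape for that, as a sketch. Every name here (`TabTools`, `ToolRegistry`, the `tabN.tool` naming) is hypothetical and not part of MCP-B; the idea is just per-tab and per-tool switches, with tools kept namespaced by tab instead of deduped.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class TabTools:
    """Tools offered by one browser tab, plus the user's switches for them."""
    tab_id: int
    tools: List[str]
    enabled: bool = True
    disabled_tools: Set[str] = field(default_factory=set)


@dataclass
class ToolRegistry:
    tabs: Dict[int, TabTools] = field(default_factory=dict)

    def register(self, tab_id: int, tools: Iterable[str]) -> None:
        self.tabs[tab_id] = TabTools(tab_id, list(tools))

    def set_tab_enabled(self, tab_id: int, enabled: bool) -> None:
        """Switch an entire tab's toolset on or off."""
        self.tabs[tab_id].enabled = enabled

    def set_tool_enabled(self, tab_id: int, tool: str, enabled: bool) -> None:
        """Switch a single tool on or off within one tab."""
        disabled = self.tabs[tab_id].disabled_tools
        if enabled:
            disabled.discard(tool)
        else:
            disabled.add(tool)

    def exposed(self) -> List[str]:
        """Names visible to the model, namespaced by tab rather than deduped."""
        names = []
        for tab in self.tabs.values():
            if not tab.enabled:
                continue
            for tool in tab.tools:
                if tool not in tab.disabled_tools:
                    names.append(f"tab{tab.tab_id}.{tool}")
        return names


if __name__ == "__main__":
    registry = ToolRegistry()
    registry.register(1, ["search", "checkout"])
    registry.register(2, ["search"])
    registry.set_tab_enabled(2, False)
    registry.set_tool_enabled(1, "checkout", False)
    print(registry.exposed())
```

Keeping the tab id in the tool name means two tabs offering the same tool stay distinct, so a multi-step task cannot silently hop between tabs.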

New comment by t1amat in "Data on AI-related Show HN posts"

t1amat — Mon, 07 Jul 2025 01:57:42 +0000

Your filter doesn’t seem to be working properly right now.

New comment by t1amat in "LM Studio is now an MCP Host"

t1amat — Thu, 26 Jun 2025 00:22:35 +0000

(Replying to both siblings questioning this)

If the primary use case is input heavy, which is true of agentic tools, there’s a world where partial GPU offload with many channels of DDR5 system RAM leads to an overall better experience. A good GPU will process input many times faster, and with good RAM you might end up with decent output speed still. Seems like that would come in close to $12k?

And there would be no competition for models that do fit entirely inside that VRAM, for example Qwen3 32B.
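
For the "many channels of DDR5" part, a sketch of theoretical peak bandwidth: channels times transfer rate times 8 bytes per transfer (64-bit channel). The specific configurations below are illustrative picks, not anything named above, and sustained bandwidth in practice is lower than the peak.

```python
def ddr5_bandwidth_gb_s(channels: int, mega_transfers_per_s: int) -> float:
    """Theoretical peak bandwidth in GB/s: each channel moves 8 bytes per transfer."""
    return channels * mega_transfers_per_s * 1e6 * 8 / 1e9


if __name__ == "__main__":
    # Illustrative configurations: a dual-channel desktop and two server-style boards.
    for channels, rate in ((2, 5600), (8, 4800), (12, 4800)):
        peak = ddr5_bandwidth_gb_s(channels, rate)
        print(f"{channels:>2} channels of DDR5-{rate}: about {peak:.0f} GB/s peak")
```

Going from two channels to eight or twelve is what moves system RAM from roughly 90 GB/s into the several-hundred GB/s range, which is where offloaded layers stop being painfully slow at output.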

New comment by t1amat in "MCP in LM Studio"

t1amat — Thu, 26 Jun 2025 00:08:26 +0000

The UI is the product. If you just want the engine, use mlx-omni-server (for MLX) or llama-swap (for GGUF) and huggingface-cli (for model downloads).

New comment by t1amat in "MCP in LM Studio"

t1amat — Thu, 26 Jun 2025 00:04:44 +0000

I would recommend Qwen3 30B A3B for you. The MLX 4bit DWQ quants are fantastic.

New comment by t1amat in "MCP in LM Studio"

t1amat — Thu, 26 Jun 2025 00:03:47 +0000

Gemma3 models can follow instructions but were not trained to call tools, which is the backbone of MCP support. You would likely have a better experience with models from the Qwen3 family.

New comment by t1amat in "Gemini in Chrome"

t1amat — Tue, 03 Jun 2025 21:24:26 +0000

Not a lawyer but the timing of this seems poor when the govt is deciding on whether Google should spin off Chrome or not.

New comment by t1amat in "xAI's Grok 3 comes to Microsoft Azure"

t1amat — Mon, 19 May 2025 20:44:00 +0000

Llama is arguably the reason open weight LLMs are a thing, with the leak of Llama 1 and subsequent release of Llama 2. Llama 3 was a huge push for quality, size, context length, and multi-modality. Llama 4 Maverick is clearly better than it looks if a fine tune can put it at the top of the LMArena human preferences leaderboard.

Grok 3 mini is quite a decent agentic model and competitive with frontier models at a fraction of the cost; see livebench.ai.