Hacker News: jameswhitford

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 10:51:47 +0000

That is a great suggestion that I am definitely going to look into, thanks!

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 10:49:30 +0000

I hear you

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:41:39 +0000

Cool to hear. What kind of tasks have you been using GLM for? And what other models have you found useful through Ollama?

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:39:54 +0000

I see your point. Just the fact that one model does have vision and one does not might be an interesting point of comparison, however.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:38:17 +0000

This is excellent feedback, thank you! These LLMisms in writing are a challenge I am living with currently and trying to improve on. The technical writing industry is taking a huge knock right now, with companies demanding more work in less time and a big drop in quality as a result. Day to day, I get less and less time to work on the quality of the prose in my work. We are working at the frontier of this right now, so we are the most heavily affected, but we also get to experiment with the changes first, which can be both stimulating and very frustrating.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:30:29 +0000

Hi, author here. Can you link it? I would love to read about this.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:29:53 +0000

Yes, I agree 100%. My next guide would do better to use identical harnesses.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:27:51 +0000

GLM 5.2 is text-only, not multimodal, whereas Opus is multimodal.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:27:27 +0000

Hi, author here. I cannot give an exact number for how many tokens the verification step took, but the verification GLM 5.2 ran was very stupid and definitely a waste of time: it read the pixel color data to try to verify that the scene rendered properly, which is a really bad approach. Opus opened the game in a Playwright browser and took screenshots to verify the actual image, which helped a lot.

Pro tip: You could use a multimodal model to verify images as a subagent spawned by GLM 5.2, to get around this issue.
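
A minimal sketch of that subagent idea, under loud assumptions: `take_screenshot` and `ask_vision_model` are hypothetical callables you would supply yourself (for example, a wrapper around a Playwright screenshot and a wrapper around whatever multimodal model you have access to). Neither is a real library API, and this is not how either model's harness actually works.

```python
from typing import Callable, Tuple


def verify_render(
    take_screenshot: Callable[[], bytes],
    ask_vision_model: Callable[[bytes, str], str],
    expectation: str,
) -> Tuple[bool, str]:
    """Ask a vision-capable model whether a screenshot matches an expectation.

    Both callables are hypothetical hooks supplied by the caller:
    - take_screenshot() returns the rendered scene as image bytes.
    - ask_vision_model(image, prompt) returns the model's text answer.
    """
    image = take_screenshot()
    prompt = (
        f"Does this screenshot show: {expectation}? "
        "Answer YES or NO first, then explain briefly."
    )
    answer = ask_vision_model(image, prompt).strip()
    # Treat anything that does not clearly start with YES as a failure.
    passed = answer.upper().startswith("YES")
    return passed, answer


# Usage with stand-in stubs (no browser or model involved):
def fake_screenshot() -> bytes:
    return b"not-a-real-image"


def fake_vision_model(image: bytes, prompt: str) -> str:
    return "YES, the scene looks rendered."


passed, reason = verify_render(fake_screenshot, fake_vision_model, "a lit 3D scene")
```

The text-only model only ever sees the short YES/NO answer and explanation, so it never has to reason about raw pixel data.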

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:23:05 +0000

Yes, I 100% agree. Time taken can be improved (with harnesses, subagent workflows, etc.) and varies based on the task.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:20:33 +0000

Yes, part of the reason I chose the one-shot test was really to test long-running tasks. A lot of people seem to be experimenting with this format, for example in the now-trending loop-writing workflows. And really, I am interested in diving into the murky waters of these novel workflows.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 09:18:07 +0000

I appreciate the feedback!

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 07:52:54 +0000

Yes, this is true. This test was run on a $20 Claude Pro subscription. I would definitely love to try using both models on the highest plans for a whole month and compare the two. That would be a great format for a future head-to-head comparison.

New comment by jameswhitford in "GLM 5.2 vs. Opus"

jameswhitford — Mon, 22 Jun 2026 07:46:53 +0000

Hi, I am the author. I completely agree! I set out to run a vibe test on this one, not a benchmark; the real benchmarks are listed. My test shows what the models can do when both are given a long-running, technically difficult, one-shot task.

I think the test you describe (collaborative, task delegation, task completion, TTD, steerability) is a great format for a future test that I will definitely try out.

New comment by jameswhitford in "Claude is skeptical about OpenClaw"

jameswhitford — Sun, 19 Apr 2026 08:25:24 +0000

I asked Claude Code to research OpenClaw. It spawned a subagent, got back detailed results, and then flagged them as unreliable and/or hallucinated before I could read them.

TL;DR:

Claude isn't trained on OpenClaw data due to its knowledge cutoff, but this is the first time I have been asked to look at the research myself to verify that it isn't hallucinated or unreliable.

I am not making any claims about Anthropic training their models to perform worse when dealing with information about competitors...

But I am worried about this behaviour of flagging certain sources as unreliable for what seem like arbitrary reasons.

It could also be a case of prompt poisoning at one of the research URLs.

Claude is skeptical about OpenClaw

jameswhitford — Sun, 19 Apr 2026 08:25:24 +0000

Article URL: https://wecreatethis.com/blog/post?slug=claude-is-skeptical-about-openclaw

Comments URL: https://news.ycombinator.com/item?id=47822692

Points: 2

# Comments: 2

New comment by jameswhitford in "The Case That A.I. Is Thinking"

jameswhitford — Tue, 04 Nov 2025 07:09:08 +0000

Who would not want to say their product is the second coming of Christ, if they could?

New comment by jameswhitford in "The Case That A.I. Is Thinking"

jameswhitford — Tue, 04 Nov 2025 06:58:18 +0000

This submarine isn’t swimming, it’s us that are submarining!

I think I hear my master’s voice..

Or is that just a fly trapped in a bottle?

New comment by jameswhitford in "How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference"

jameswhitford — Tue, 22 Jul 2025 14:09:35 +0000

It's a demo project using the free-tier hardware from Cerebrium, demonstrating how to migrate from OpenAI with a few lines of code. The cost is never going to beat OpenAI on an A10, but there are more powerful options available.

New comment by jameswhitford in "How to Migrate from OpenAI to Cerebrium for Cost-Predictable AI Inference"

jameswhitford — Tue, 22 Jul 2025 10:05:32 +0000

Serverless setups (like Cerebrium) charge per second the model is running; it's not token-based.
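
A rough sketch of how the two billing models differ, with made-up numbers: every price below is a placeholder assumption for illustration only, not an actual Cerebrium or OpenAI rate.

```python
def per_second_cost(seconds_running: float, price_per_second: float) -> float:
    """Serverless-style billing: pay for the time the model is running."""
    return seconds_running * price_per_second


def per_token_cost(
    input_tokens: int,
    output_tokens: int,
    price_in_per_million: float,
    price_out_per_million: float,
) -> float:
    """Token-style billing: pay per token in and out, priced per million."""
    total = input_tokens * price_in_per_million + output_tokens * price_out_per_million
    return total / 1_000_000


# Hypothetical workload: 120 seconds of GPU time that processes
# 1,000,000 input tokens and produces 500,000 output tokens.
HYPOTHETICAL_PRICE_PER_SECOND = 0.0005   # placeholder, not a real rate
HYPOTHETICAL_PRICE_IN = 2.0              # placeholder, per million input tokens
HYPOTHETICAL_PRICE_OUT = 8.0             # placeholder, per million output tokens

serverless = per_second_cost(120, HYPOTHETICAL_PRICE_PER_SECOND)
token_based = per_token_cost(
    1_000_000, 500_000, HYPOTHETICAL_PRICE_IN, HYPOTHETICAL_PRICE_OUT
)
print(f"per-second billing: ${serverless:.4f}")
print(f"per-token billing:  ${token_based:.4f}")
```

The point is only that per-second cost depends on runtime, so it is predictable from how long the hardware is busy rather than from how many tokens pass through it.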