<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Jweb_Guru</title><link>https://news.ycombinator.com/user?id=Jweb_Guru</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 05 May 2026 08:25:45 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Jweb_Guru" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Jweb_Guru in "Roblox shares plummet 18% as child safety measures weigh on bookings"]]></title><description><![CDATA[
<p>God forbid people want to work on video game stuff instead of for an advertising company.</p>
]]></description><pubDate>Sat, 02 May 2026 19:06:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47989396</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47989396</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47989396</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Anonymous request-token comparisons from Opus 4.6 and Opus 4.7"]]></title><description><![CDATA[
<p>For some reason people are perfectly able to understand this in the context of, say, cursive, calculator use, etc., but when it comes to their own skillset somehow it's going to be really different.</p>
]]></description><pubDate>Sat, 18 Apr 2026 20:20:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47819208</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47819208</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47819208</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Anonymous request-token comparisons from Opus 4.6 and Opus 4.7"]]></title><description><![CDATA[
<p>No, it hasn't.  I did not have a problem before AI with people sending in gigantic pull requests that made absolutely no sense, justified with generated responses that they clearly did not understand.  This is not a thing that used to happen.  That's not to say people wouldn't have done it had it been this easy, but there was a barrier to submitting a pull request that no longer exists.</p>
]]></description><pubDate>Sat, 18 Apr 2026 20:18:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47819189</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47819189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47819189</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Measuring Claude 4.7's tokenizer costs"]]></title><description><![CDATA[
<p>I'm mostly surprised that people found the output quality of Opus 4.6 good enough... 4.7 so far is a pretty sizable improvement for the stuff I care about.  I don't really care how cheap 4.6 was per task when 90% of the tasks weren't actually being done correctly.  Or maybe it's that people like the LLM agreeing with them blindly while sneakily doing something else under the hood?  Did people enjoy Claude routinely disregarding their instructions?  Not really sure I understand; I truly found 4.6 immensely frustrating (from the get-go, not just the "pre-nerf" version, whatever that means).  4.7 is a buggy mess, it's slow, and it costs a lot per token.  It's also a huge breath of fresh air because it actually seems to make a good-faith effort at doing the thing you asked it to do, and doesn't waste your time with irrelevant nonsense just to look busy or because it thinks you want that nonsense (I mean, it still does all of these things to some extent, but so far it seems like it does them much less than 4.6 did).<p>Disclaimer: I'm always running on max and don't really have token limits, so I am in a position not to care about cost per token.  But I am not surprised by the improved benchmark results at all; 4.6 was really not nearly as strong a model as people seem to remember it being.</p>
]]></description><pubDate>Sat, 18 Apr 2026 03:43:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47812958</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47812958</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47812958</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"]]></title><description><![CDATA[
<p>Yup.  Every single time it's about to do the dumbest thing I've seen in my life.</p>
]]></description><pubDate>Tue, 07 Apr 2026 08:10:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47672112</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47672112</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47672112</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "LLMs work best when the user defines their acceptance criteria first"]]></title><description><![CDATA[
<p>You may have had <i>one</i>.  It clearly made a pretty negative impression on you because you are still complaining about them years later.  I find it pretty misanthropic when people ascribe this kind of antisocial behavior to <i>all</i> of their coworkers.</p>
]]></description><pubDate>Sat, 07 Mar 2026 17:30:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47289596</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47289596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47289596</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "LLMs work best when the user defines their acceptance criteria first"]]></title><description><![CDATA[
<p>In the long run, good code makes everyone much happier than code that is bad because people are being "nice" and letting things slide in code review to avoid confrontation.</p>
]]></description><pubDate>Sat, 07 Mar 2026 17:15:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47289480</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47289480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47289480</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "LLMs work best when the user defines their acceptance criteria first"]]></title><description><![CDATA[
<p>It's not reality.  I'm really not a fan of the way that people excuse the really terrible code LLMs write by claiming that people write code just as bad.  Even if that were true, it is <i>not</i> true that when you ask those people to do otherwise they simply pretend to have done it and forget you asked later.</p>
]]></description><pubDate>Sat, 07 Mar 2026 06:51:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47285135</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47285135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47285135</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Claude's Cycles [pdf]"]]></title><description><![CDATA[
<p>I assure you that LLM thinking also has a speed limit.</p>
]]></description><pubDate>Tue, 03 Mar 2026 16:04:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47234459</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47234459</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47234459</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "A16z partner says that the theory that we’ll vibe code everything is wrong"]]></title><description><![CDATA[
<p>My point is that a lot of people think it'd be really easy to build the next Salesforce until they actually try to compete with Salesforce in the market.  Like it or not, if you want to build a Salesforce competitor (or try to get your company to build its own) you're going to be compared to <i>actual</i> Salesforce, not the version of Salesforce that existed when the market was new.</p>
]]></description><pubDate>Wed, 25 Feb 2026 15:48:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47153133</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47153133</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47153133</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "A16z partner says that the theory that we’ll vibe code everything is wrong"]]></title><description><![CDATA[
<p>Salesforce literally has its own query optimizer, you are vastly underestimating the complexity of its software.</p>
]]></description><pubDate>Sun, 22 Feb 2026 16:05:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47112103</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47112103</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47112103</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "How I use Claude Code: Separation of planning and execution"]]></title><description><![CDATA[
<p>> But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms<p>I don't find that LLMs are any more likely than humans to remember to update all of the places where they wrote redundant functions.  Generally far less likely, actually.  So forgive me for treating this claim with a massive grain of salt.</p>
]]></description><pubDate>Sun, 22 Feb 2026 08:28:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47109351</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47109351</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47109351</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Asahi Linux Progress Report: Linux 6.19"]]></title><description><![CDATA[
<p>This comment expresses how it feels to work in a corporate environment better than anything I've ever seen on this site.</p>
]]></description><pubDate>Thu, 19 Feb 2026 12:34:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47073106</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47073106</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47073106</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "What every compiler writer should know about programmers (2015) [pdf]"]]></title><description><![CDATA[
<p>It's ironic that I have to tell you of all people this, but many users of C (or at least, backends of compilers targeted by C) do actually want the compiler to aggressively optimize around UB.</p>
]]></description><pubDate>Tue, 17 Feb 2026 06:56:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47044484</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47044484</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47044484</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"]]></title><description><![CDATA[
<p>Yeah, people are always like "these are just trick questions!" as though the <i>correct</i> mode of use for an LLM is quizzing it on things where the answer is already available.  Where LLMs have the greatest potential to steer you wrong is when you ask something where the answer is <i>not</i> obvious, the question might be ill-formed, or the user is incorrectly convinced that something should be possible (or easy) when it isn't.  Such cases have a lot more in common with these "nonsensical riddles" than they do with any possible frontier benchmark.<p>This is especially obvious when viewing the reasoning traces of models like Claude, which often spend a lot of time speculating about the user's "hints" and trying to parse out the intent behind the question.  Essentially, the mental model I use for LLMs these days is to treat them as very good "test takers" with limited open-book access to a large swathe of the internet.  They are trying to ace the test by any means necessary and love to take shortcuts that don't require actual "reasoning" (which burns tokens and grows the context window, decreasing accuracy overall).  For example, when asked to read a <i>full</i> paper, focusing on the implications for some particular problem, Claude agents will try to cheat by skimming until they get to a section that feels relevant, then searching directly for some words they read in that section.  They will do this even if told explicitly that they must read the whole paper.  I assume this is because, for the kinds of questions they are trained on, this behavior maximizes their reward function the vast majority of the time (though I'm sure I'm getting lots of details wrong about how frontier models are trained, I find it very unlikely that the prompts these agents get closely resemble data found in the wild on the internet pre-LLMs).</p>
]]></description><pubDate>Mon, 16 Feb 2026 18:56:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47038765</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=47038765</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47038765</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Claude Code's new hidden feature: Swarms"]]></title><description><![CDATA[
<p>It affects it very heavily IME.  People need to make sure they are getting a good mix of writing from other sources.</p>
]]></description><pubDate>Sat, 24 Jan 2026 18:16:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46746018</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=46746018</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46746018</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Letting Claude play text adventures"]]></title><description><![CDATA[
<p>Yeah, I do not find performances like this very impressive.</p>
]]></description><pubDate>Thu, 22 Jan 2026 03:38:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46714991</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=46714991</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46714991</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Lightpanda migrate DOM implementation to Zig"]]></title><description><![CDATA[
<p>Believe it or not, using arenas does not provide free memory safety.  You need to statically bound allocations to make sure they don't escape the arena (which is exactly how arenas work in Rust, but not in Zig).  There are also quite a lot of ways to generate memory-unsafe code beyond use-after-free or out-of-bounds array accesses in a language like Zig, especially in the context of things like DOM nodes, where one frequently needs to swap pointers between elements of different types.</p>
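The "statically bound so they can't escape" point can be sketched in Rust with a toy arena (hypothetical code, not from Lightpanda or any real crate): the reference returned by <code>alloc</code> borrows the arena itself, so the compiler rejects any attempt to use it after the arena is gone.

```rust
use std::cell::RefCell;

/// Toy arena: every allocated value lives exactly as long as the arena.
struct Arena {
    // Box gives each value a stable heap address, so pushing more
    // values (which may reallocate the Vec) never moves old ones.
    slots: RefCell<Vec<Box<u32>>>,
}

impl Arena {
    fn new() -> Arena {
        Arena { slots: RefCell::new(Vec::new()) }
    }

    // The returned reference borrows `self`, so the borrow checker
    // statically prevents it from outliving (escaping) the arena.
    fn alloc(&self, v: u32) -> &u32 {
        let boxed = Box::new(v);
        let ptr: *const u32 = &*boxed;
        self.slots.borrow_mut().push(boxed);
        // SAFETY: the Box stays alive inside `slots` for the arena's
        // whole lifetime, and its heap allocation never moves.
        unsafe { &*ptr }
    }
}

fn demo() -> u32 {
    let arena = Arena::new();
    let a = arena.alloc(1);
    let b = arena.alloc(2);
    a + b
    // Returning `a` instead would not compile: its lifetime is
    // statically bound to `arena`, which is dropped right here.
}

fn main() {
    assert_eq!(demo(), 3);
}
```

A Zig-style arena gives the same guarantee only dynamically: nothing stops a pointer from being stored somewhere that outlives the arena's <code>deinit</code>.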
]]></description><pubDate>Thu, 15 Jan 2026 14:54:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=46633379</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=46633379</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46633379</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "Lightpanda migrate DOM implementation to Zig"]]></title><description><![CDATA[
<p>Respectfully, for browser-based work, simplicity is absolutely not a good enough reason to use a memory-unsafe language.  Your claim that Zig is in some way safer than Rust for something like this is flat out untrue.</p>
]]></description><pubDate>Mon, 12 Jan 2026 14:37:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46589108</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=46589108</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46589108</guid></item><item><title><![CDATA[New comment by Jweb_Guru in "“Erdos problem #728 was solved more or less autonomously by AI”"]]></title><description><![CDATA[
<p>Yeah, people dramatically overestimate the difficulty of getting your definitions correct for most problems, especially when you are doing an end-to-end proof rather than just axiomatizing some system.  The definitions are still worth looking at carefully, especially for AI-generated proofs, where you don't get the immediate feedback a human does when something you expect to be hard goes through easily.  But contrary to what seems to be popular belief here, they are generally much easier to verify than the corresponding proof (in the case of formally verified software, the analogy is verifying that the spec is what you want vs. verifying that the program matches the spec; the former is generally <i>much</i> easier).</p>
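The spec-vs-proof asymmetry can be made concrete with a small hypothetical Lean 4 sketch (<code>SortSpec</code> and everything in it is illustrative, not from the Erdős formalization): the entire spec for a sorting function fits in a few readable lines, while a proof that any concrete implementation satisfies it would run far longer.

```lean
-- Hypothetical sketch: the *spec* of a sort is short and auditable.
-- `Pairwise (· ≤ ·)` says the output is ordered; the `count` clause
-- says the output is a permutation of the input.
def SortSpec (f : List Nat → List Nat) : Prop :=
  ∀ l : List Nat,
    (f l).Pairwise (· ≤ ·) ∧ ∀ x : Nat, (f l).count x = l.count x

-- Checking that `SortSpec` says what you meant takes a minute;
-- proving `SortSpec mySort` for a real implementation is the long part.
```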
]]></description><pubDate>Sat, 10 Jan 2026 17:03:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46567480</link><dc:creator>Jweb_Guru</dc:creator><comments>https://news.ycombinator.com/item?id=46567480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46567480</guid></item></channel></rss>