<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Benjammer</title><link>https://news.ycombinator.com/user?id=Benjammer</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 22 Apr 2026 09:58:31 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Benjammer" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Benjammer in "Slop is not necessarily the future"]]></title><description><![CDATA[
<p>> the quality really does matter.<p>If this level of quality/rigor does matter for something like a game, do you think the market will enforce this? If low rigor leads to a poor product, won't it sell less than a good product in this market? Shouldn't the market just naturally weed out the AI slop over time, assuming it's true that "quality really does matter"?<p>Or were you thinking about "matter" in some other sense than business/product success?</p>
]]></description><pubDate>Tue, 31 Mar 2026 18:45:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47591713</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=47591713</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47591713</guid></item><item><title><![CDATA[New comment by Benjammer in "Over 40% of deceased drivers in vehicle crashes test positive for THC: Study"]]></title><description><![CDATA[
<p>>Since COVID in CA, it feels like driving has become far more dangerous with much more lawlessness regarding excessive speeding and running red lights, going into the left lane to turn right in front of stopped cars, all sorts of weird things<p>NYC has seen the same thing since COVID, and over the last year or two it's gotten to the point where, at every light change at every busy intersection in Manhattan, you get 2-3 cars speeding through the red right after it turns. I ride my bike a lot, so I'm watching drivers a lot, and for the most part the crazy drivers seem to be private citizens in personal cars, not Uber or commercial/industrial drivers.</p>
]]></description><pubDate>Sat, 20 Dec 2025 17:32:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337866</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=46337866</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337866</guid></item><item><title><![CDATA[New comment by Benjammer in "Gemini 3 Pro: the frontier of vision AI"]]></title><description><![CDATA[
<p>>It is the loose equivalent of asking why are you getting hung up on the type of a variable in a programming language? A float or a string? Who cares if it works?<p>No, it's not. This is like me saying "string and float are two types of variables" and you going "what is a 'type' even??? Bertrand Russell said some bullshit and that means I'm right and you suck!"</p>
]]></description><pubDate>Tue, 09 Dec 2025 03:44:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46201025</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=46201025</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46201025</guid></item><item><title><![CDATA[New comment by Benjammer in "Gemini 3 Pro: the frontier of vision AI"]]></title><description><![CDATA[
<p>Wind and sunshine are both types of weather, what are you talking about?</p>
]]></description><pubDate>Sat, 06 Dec 2025 00:09:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46169092</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=46169092</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46169092</guid></item><item><title><![CDATA[New comment by Benjammer in "Gemini 3 Pro: the frontier of vision AI"]]></title><description><![CDATA[
<p>>They belong in different categories<p>Categories of _what_, exactly? What word would you use to describe this "kind" of which LLMs and humans are two very different "categories"? I simply chose the word "cognition". I think you're getting hung up on semantics here a bit more than is reasonable.</p>
]]></description><pubDate>Fri, 05 Dec 2025 22:29:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46168226</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=46168226</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46168226</guid></item><item><title><![CDATA[New comment by Benjammer in "Gemini 3 Pro: the frontier of vision AI"]]></title><description><![CDATA[
<p>So the idea is what? What does a successful outcome look like for this test, in your mind? What should good software do? Respond and say there are 5 legs? Or question what kind of dog this even is? Or get tripped up by a nonsensical picture that doesn't quite match the prompt? Should it understand the concept of a dog and be able to tell you that this isn't a real dog?</p>
]]></description><pubDate>Fri, 05 Dec 2025 22:26:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46168201</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=46168201</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46168201</guid></item><item><title><![CDATA[New comment by Benjammer in "Gemini 3 Pro: the frontier of vision AI"]]></title><description><![CDATA[
<p>It always feels to me like these types of tests are somewhat intentionally ignorant of how LLM cognition differs from human cognition. To me, they don't really "prove" or "show" anything other than that LLM thinking works differently than human thinking.<p>I'm always curious whether these tests have comprehensive prompts that properly inform the model about what's going on, or whether they're designed to "trick" the LLM in a very human-cognition-centric flavor of "trick".<p>Does the test instruction prompt tell it that it should interpret the image very, very literally, and that it should attempt to discard all previous knowledge of the subject before making its assessment of the question, etc.? Does it tell the model that some inputs may be designed to "trick" its reasoning, and to watch out for that specifically?<p>More specifically, what is a successful outcome here, to you? Simply returning the answer "5" with no other info, back-and-forth, or anything else in the output context? What is your idea of the LLM's internal world-model in this case? Do you want it to successfully infer that you are being deceitful? Should it respond directly to the deceit? Should it take the deceit in "good faith" and operate as if that's the new reality? Something in between? To me, all of this is very unclear in terms of LLM prompting. It feels like there's tons of very human-like subtext involved, and you're trying to show that LLMs can't handle subtext/deceit and then generalizing that to say LLMs have low cognitive ability overall. That doesn't seem like particularly useful or productive analysis to me, so I'm curious what the goal of these "tests" is for the people who write/perform/post them?</p>
]]></description><pubDate>Fri, 05 Dec 2025 21:22:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46167496</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=46167496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46167496</guid></item><item><title><![CDATA[New comment by Benjammer in "Interview with RollerCoaster Tycoon's Creator, Chris Sawyer (2024)"]]></title><description><![CDATA[
<p>amusement park --> park amusement... Is that the joke?</p>
]]></description><pubDate>Wed, 03 Dec 2025 17:25:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46137265</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=46137265</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46137265</guid></item><item><title><![CDATA[New comment by Benjammer in "ChatGPT – Truth over comfort instruction set"]]></title><description><![CDATA[
<p>I mean ok, but it's all just prompting on top of the same base model weights...<p>I tried the same prompt, and I simply added to the end of it "Prioritize truth over comfort" and got a very similar response to the "improved" answer in the article: <a href="https://chatgpt.com/share/68efea3d-2e88-8011-b964-243002db3475" rel="nofollow">https://chatgpt.com/share/68efea3d-2e88-8011-b964-243002db34...</a><p>This is sort of a "Prompting 101" level concept - indicate clearly the tone of the reply that you'd like. I disagree that this belongs in a system prompt or default user preferences, and even if you want to put it in yours, you don't need this long preamble as if you're "teaching" the model how the world works - these are just hints to give it the right tone, and you can get the same results with a few words in your raw prompt.</p>
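<p>For illustration, here's a minimal sketch of the same trick via the API - the client usage and model name here are my own assumptions, not anything from the article:<p><pre><code># Minimal sketch: append a short tone hint to the raw prompt instead of
# maintaining a long "truth over comfort" system preamble.
# Assumes the official openai Python client; the model name is illustrative.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

question = "Give me honest feedback on my startup idea: ..."  # hypothetical
response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[
        {"role": "user", "content": question + " Prioritize truth over comfort."},
    ],
)
print(response.choices[0].message.content)
</code></pre><p>Same base weights, very different tone of reply.</p>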
]]></description><pubDate>Wed, 15 Oct 2025 18:41:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45596756</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=45596756</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45596756</guid></item><item><title><![CDATA[New comment by Benjammer in "A staff engineer's journey with Claude Code"]]></title><description><![CDATA[
<p>I'm not sure how much experience you have (I'm not trying to make assumptions), but I've been working in software for over 15 years. The exact skill you mentioned - being able to visualize the plan for a change quickly - is what makes my LLM usage so powerful, imo.<p>I can say the right precise wording in my prompt to guide it to a good plan very quickly. As the other commenter mentioned, the entire above process only takes something like 30-120 minutes depending on scope, and then I can generate code in a few minutes that would take me 2-6 weeks to write myself, working 8 hr days. Then it takes something like 0.5-1.5 days to work out all the bugs, clean up the weird AI quirks, and maybe have the LLM write some Playwright tests (or whatever framework you use for integration tests) to verify its own work.<p>So yes, it takes significant time to plan things well for good results, and yes, the results are often sloppy in parts and have weird quirks that no human engineer would introduce on purpose. But if you stick with working on prompt/context engineering and get better and faster at the above process, the key unlock is not that it does the same coding for you, just with it generating the code instead. It's that you can work as a solo developer at the abstraction level of a small startup company. I can design and implement an enterprise-grade SSO auth system over a weekend that integrates with Okta and passes security testing. I can take a library written in one language and fully re-implement it in another language in a matter of hours. I recently took the native Android and iOS libraries for a fairly large, non-trivial SDK and had Claude build me a React Native wrapper library with native modules that integrates both native libraries and presents a clean, unified interface and TypeScript types to the React Native layer. This took me about two days, plus one more for validation testing. I had never done this before. I have no idea how "Nitro Modules" works, or how to configure a React Native library from scratch. But given the immense scaffolding abilities of LLMs, plus my debugging/hacking skills, I can get to a really confident place really quickly, and I regularly ship production code at work with this process.</p>
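<p>To make the "verify its own work" step concrete, here's a minimal sketch of the kind of integration check I'd have the LLM generate - the route and selector are hypothetical placeholders, not from any real project:<p><pre><code># Minimal sketch of an LLM-generated integration check.
# Assumes Playwright for Python is installed
# (pip install playwright && playwright install).
# The route and selector below are hypothetical placeholders.
from playwright.sync_api import sync_playwright

def test_okta_sso_button_renders():
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("http://localhost:3000/login")  # hypothetical route
        # The change shouldn't ship unless the SSO entry point is visible.
        assert page.get_by_text("Sign in with Okta").is_visible()
        browser.close()
</code></pre>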
]]></description><pubDate>Wed, 03 Sep 2025 00:15:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=45110798</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=45110798</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45110798</guid></item><item><title><![CDATA[New comment by Benjammer in "A staff engineer's journey with Claude Code"]]></title><description><![CDATA[
<p>My method is that I work together with the LLM to figure out the step-by-step plan.<p>I give an outline of what I want to do, plus some breadcrumbs to any relevant existing files, and ask it to figure out the context for my change and write up a summary of the full scope of the change we're making, including an index of file paths to all relevant files with a very concise blurb about what each file does/contains, and a step-by-step plan at the end. I generally have to tell it NOT to think about this like a traditional engineering team plan - this is a senior engineer and an LLM code agent working together, so think only about technical architecture - otherwise you get "phase 1 (1-2 weeks), phase 2 (2-4 weeks), step a (4-8 hours)" sorts of nonsense timelines in your plan.<p>Then I review the steps myself to make sure they're coherent and make sense, and I poke and prod the LLM to fix anything that seems weird, correcting context or directions or whatever. Then I feed the entire document to another clean context window (or two or three) and ask it to "evaluate this plan for cohesiveness and coherency, tell me if it's ready for engineering or if there's anything underspecified or unclear", and I iterate on that 1-3 times until a fresh context window says "This plan looks great, it's well crafted, organized, etc." and gives no feedback.<p>Then I go to a fresh context window, tell it "Review the document @MY_PLAN.md thoroughly and begin implementation of step 1, stop after step 1 before doing step 2", and start working through the steps with it.</p>
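<p>As a rough sketch of that "iterate until a fresh context has no feedback" loop - the anthropic client usage and the READY convention are my own illustration, not a fixed recipe:<p><pre><code># Rough sketch of the fresh-context plan review loop.
# Assumes the anthropic Python client; the model name and the
# "READY" convention are illustrative.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

REVIEW_PROMPT = (
    "Evaluate this plan for cohesiveness and coherency. Tell me if it's "
    "ready for engineering or if anything is underspecified or unclear. "
    "Reply with just READY if you have no feedback.\n\n"
)

for _ in range(3):  # usually converges in 1-3 passes
    plan = open("MY_PLAN.md").read()
    # Each call is a brand-new context window, so no prior review leaks in.
    reply = client.messages.create(
        model="claude-sonnet-4-20250514",  # illustrative
        max_tokens=2048,
        messages=[{"role": "user", "content": REVIEW_PROMPT + plan}],
    )
    feedback = reply.content[0].text
    if feedback.strip().startswith("READY"):
        break
    # Otherwise, fold the feedback back into MY_PLAN.md by hand and rerun.
</code></pre>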
]]></description><pubDate>Tue, 02 Sep 2025 21:42:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=45109466</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=45109466</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45109466</guid></item><item><title><![CDATA[New comment by Benjammer in "LLMs and coding agents are a security nightmare"]]></title><description><![CDATA[
<p>Are you unaware of the concept of a junior engineer working in a company? You realize that not all human code is written by someone with domain expertise, right?<p>Are you aware that your wording implies you're describing an issue unique to AI code, one not present in human code?<p>>What would have happened if someone without your domain expertise wasn't reviewing every line and making the changes you mentioned?<p>So, we're talking about two variables, so four states: human-reviewed, human-not-reviewed, ai-reviewed, ai-not-reviewed.<p>[non ai]<p>*human-reviewed*: Humans write code, sometimes humans make mistakes, so we have other humans review the code for things like critical security issues<p>*human-not-reviewed*: Maybe this is a project with a solo developer and automated testing, but otherwise this seems like a pretty bad idea, right? This is the classic version of "YOLO to production", right?<p>[with ai]<p>*ai-reviewed*: AI generates code, sometimes AI hallucinates or gets things very wrong or over-engineers things, so we have humans review all the code for things like critical security issues<p>*ai-not-reviewed*: AI generates code, YOLO to prod, no human reads it - obviously this is terrible and barely works even for hobby projects with a solo developer and no stakes involved<p>I'm wondering if the disconnect here is that actual professional programmers are just implicitly talking about going from [human-reviewed] to [ai-reviewed], assuming nobody in their right mind would just _skip code reviews_. The median professional software team would never build software without code reviews, imo.<p>But are you thinking about this as going from [human-reviewed] straight to [ai-not-reviewed]? Or are you thinking about [human-not-reviewed] code for some reason? It's not clear why you immediately latch onto the problems with [ai-not-reviewed] and seem to refuse to acknowledge that [ai-reviewed] is even a possible state.<p>It's just really unclear why you are jumping straight to concerns like this without any nuance about how the existing industry already handled similar problems long before we used AI at all.</p>
]]></description><pubDate>Mon, 18 Aug 2025 18:37:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=44943843</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44943843</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44943843</guid></item><item><title><![CDATA[New comment by Benjammer in "LLMs and coding agents are a security nightmare"]]></title><description><![CDATA[
<p>Why is the "threshold" argument never the first thing mentioned? Do you not understand what I'm saying here? Can you explain why the "code slop" argument is _always_ the first thing that people mention, without discussing this threshold?<p>Every post like this has a tone like they are describing a new phenomenon caused by AI, but it's just a normal professional code quality problem that has always existed.<p>Consider the difference between these two:<p>1. AI allows programmers to write sloppy code and commit things without fully checking/testing their code<p>2. AI greatly increases the speed at which code can be generated, but doesn't nearly improve as much the speed of reviewing code, so we're making software harder to verify<p>The second is a more accurate picture of what's happening, but comes off much less sensational in a social media post. When people post the 1st example, I discredit them immediately for trying to fear-monger and bait engagement rather than discussing the real problems with AI programming and how to prevent/solve them.</p>
]]></description><pubDate>Mon, 18 Aug 2025 15:13:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44941548</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44941548</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44941548</guid></item><item><title><![CDATA[New comment by Benjammer in "LLMs and coding agents are a security nightmare"]]></title><description><![CDATA[
<p>This is the common refrain from the anti-AI crowd: they start by talking about an entire class of problems that already exists in humans-only software engineering, without any context or caveats. And then, when someone points out these problems exist with humans too, they move the goalposts and make it about the "volume" of code and how AI is taking us across some threshold where everything will fall apart.<p>The telling thing is that they never mention this "threshold" in the first place; it only comes up as a response to being called on the bullshit.</p>
]]></description><pubDate>Mon, 18 Aug 2025 12:55:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44940071</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44940071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44940071</guid></item><item><title><![CDATA[New comment by Benjammer in "Search all text in New York City"]]></title><description><![CDATA[
<p>I found "intertwining" with a score of 3 also. Two instances of the word on the same sign and then a false positive third pic.</p>
]]></description><pubDate>Wed, 13 Aug 2025 02:39:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44884151</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44884151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44884151</guid></item><item><title><![CDATA[New comment by Benjammer in "Nobody knows how to build with AI yet"]]></title><description><![CDATA[
<p>Why are engineers so obstinate about this stuff? You really need a GUI built for you in order to do this? You can't take the time to just type up this instruction to the LLM? Do you realize that's possible? You can just write instructions like "Don't modify XYZ.ts file under any circumstances". Not to mention all the tools have simple hotkeys to dismiss changes for an entire file if you really want to ignore changes to it. In Cursor you can literally select a block of text and press a hotkey to "highlight" that code to the LLM in the chat, and you could absolutely tell it "READ BUT DON'T TOUCH THIS CODE" or something, directly tied to specific lines of code - literally the feature you are describing. BUT you have to work with the LLM and tooling; it's not just going to be a button for you or something.<p>You can also literally do exactly what you said with "going a step further".<p>Open Claude Code, run `/init`. Download Superwhisper, open a new file at project root called BRAIN_DUMP.md, put your cursor in the file, activate Superwhisper, and talk stream-of-consciousness style about all the parts of the code and your own confidence level, with any details you want to include. Go to your LLM chat and tell it to "Read file @BRAIN_DUMP.md" and organize all the contents into a new file, CODE_CONFIDENCE.md. Tell it to list the parts of the codebase and give its best assessment of the developer's confidence in each part, given the details and tone in the brain dump. Delete the brain dump file if you want. Now you literally have what you asked for: an "index" of sorts for your LLM that tells it the parts of the codebase and developer confidence/stability/etc. Now you can just refer to that file in your project prompting.<p>Please, everyone, for the love of god, just start prompting. Instead of posting on Hacker News or Reddit about your skepticism, literally talk to the LLM about it and ask it questions - it can help you work through almost any of this stuff people rant about.</p>
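<p>For illustration, the generated file might come out looking something like this - the paths and wording are hypothetical, it's whatever the model distills from your brain dump:<p><pre><code># CODE_CONFIDENCE.md (illustrative sketch; paths are hypothetical)

## auth/                 confidence: high
Battle-tested, has integration coverage; safe to refactor.

## payments/             confidence: low
Hacked together before launch, no tests. Read but don't touch
without checking with me first.

## utils/dates.ts        confidence: medium
Works, but timezone handling is fragile around DST.
</code></pre>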
]]></description><pubDate>Sat, 19 Jul 2025 21:13:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=44619462</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44619462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44619462</guid></item><item><title><![CDATA[New comment by Benjammer in "Nobody knows how to build with AI yet"]]></title><description><![CDATA[
<p>Are you paying for the higher-end models? Do you have solid system prompts and guidance in place for proper prompt engineering? Have you started practicing any auxiliary forms of context engineering?<p>This isn't a magic code genie; it's a very complicated, very powerful new tool that you need to practice with over time in order to get good results.</p>
]]></description><pubDate>Sat, 19 Jul 2025 17:07:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=44617257</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44617257</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44617257</guid></item><item><title><![CDATA[New comment by Benjammer in "Claude for Financial Services"]]></title><description><![CDATA[
<p>This isn't a financial model, and they aren't selling the system itself; it's all tooling for data access and financial modeling. It's like they're setting up an OTB (off-track betting) parlor, not selling you a system to pick winning horses at the track.</p>
]]></description><pubDate>Wed, 16 Jul 2025 13:41:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44582280</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44582280</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44582280</guid></item><item><title><![CDATA[New comment by Benjammer in "Curtis Yarvin's Plot Against America"]]></title><description><![CDATA[
<p>The fact that Thiel backs him so hard is what worries me more than anything. Thiel has a way of making things happen when he's really committed to something on a personal level... (see the Gawker Media case)</p>
]]></description><pubDate>Wed, 04 Jun 2025 19:47:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=44184745</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44184745</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44184745</guid></item><item><title><![CDATA[New comment by Benjammer in "LLMs are more persuasive than incentivized human persuaders"]]></title><description><![CDATA[
<p>This kind of “hair splitting” is the foundation of current prompt engineering though…</p>
]]></description><pubDate>Sun, 18 May 2025 02:37:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44018565</link><dc:creator>Benjammer</dc:creator><comments>https://news.ycombinator.com/item?id=44018565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44018565</guid></item></channel></rss>