<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: JoshuaDavid</title><link>https://news.ycombinator.com/user?id=JoshuaDavid</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 26 Apr 2026 10:08:10 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=JoshuaDavid" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by JoshuaDavid in "Tell HN: Claude 4.7 is ignoring stop hooks"]]></title><description><![CDATA[
<p>You can treat the LLM's answers as <i>hypotheses</i> about why it did what it did, and test those hypotheses. The hypotheses the LLM comes up with might be better than the ones you come up with, because the LLM has seen a lot more text than you have, and particularly has seen a lot more of its own outputs than you have (e.g. from training to use other instances of itself as subagents).</p>
]]></description><pubDate>Sat, 25 Apr 2026 01:35:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47897831</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47897831</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47897831</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "GPT-5.5"]]></title><description><![CDATA[
<p>I think if you're vibe coding to the extent that you don't even know the <i>shapes of data your system works with</i> (e.g. the schema if you use a database) you might be outsourcing a bit too much of your thinking.</p>
]]></description><pubDate>Fri, 24 Apr 2026 00:29:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47884058</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47884058</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47884058</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Claude Opus 4.7"]]></title><description><![CDATA[
<p>Don't.<p>The first 500 or so tokens are raw thinking output; after that, the summarizer kicks in for longer thinking traces. Sometimes longer thinking traces leak through, or the summarizer model (i.e. Claude Haiku) refuses to summarize them and includes a direct quote of the passage it won't summarize. The summarizer prompt can be viewed at <a href="https://xcancel.com/lilyofashwood/status/2027812323910353105?s=20" rel="nofollow">https://xcancel.com/lilyofashwood/status/2027812323910353105...</a>, among other places.</p>
]]></description><pubDate>Thu, 16 Apr 2026 18:36:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47797598</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47797598</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47797598</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Anna's Archive loses $322M Spotify piracy case without a fight"]]></title><description><![CDATA[
<p>Ideally it is common knowledge that the onion service exists, and then people can go look at the onion service and update Wikipedia based on what they see there.</p>
]]></description><pubDate>Wed, 15 Apr 2026 18:49:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47783481</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47783481</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47783481</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Anna's Archive loses $322M Spotify piracy case without a fight"]]></title><description><![CDATA[
<p>Would be fairly easy for them to offer an onion service on which they publish the current list of domains, as one option among many, many options for distributing small strings on the internet in an uncensorable way.</p>
]]></description><pubDate>Wed, 15 Apr 2026 17:45:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47782586</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47782586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47782586</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>Did that happen to a lot of companies during the log4shell fiasco? I'm sure some companies had their permissions misconfigured in a way such that a malicious actor who could execute code on their servers could also drop their database and delete their backups.</p>
]]></description><pubDate>Wed, 08 Apr 2026 09:33:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47687659</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47687659</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47687659</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "The Claude Code Leak"]]></title><description><![CDATA[
<p>Claude Code's main advantage is that it's the only TOS-compliant way to access subscription Claude tokens, which cost about 10% as much as pay-as-you-go Claude API tokens.</p>
]]></description><pubDate>Thu, 02 Apr 2026 07:02:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47610917</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47610917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47610917</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Microsoft Set for Worst Quarter Since 2008"]]></title><description><![CDATA[
<p>What specific thing(s) are you worried that USD will take a dive relative to?<p>Then once you have an answer to that question, that might point you towards what you want to be long.</p>
]]></description><pubDate>Sat, 28 Mar 2026 18:54:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47557268</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47557268</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47557268</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Attempts to post the latest Trivy security incident have been marked [dead]"]]></title><description><![CDATA[
<p>Pinning the tag will not save you - the tags were force-pushed. The cooldown probably did save you but you should check for the indicators of compromise listed on the security advisory page.</p>
]]></description><pubDate>Sun, 22 Mar 2026 03:23:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47474159</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47474159</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47474159</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Attempts to post the latest Trivy security incident have been marked [dead]"]]></title><description><![CDATA[
<p>Noted and sent. Thanks for all your hard work.</p>
]]></description><pubDate>Sun, 22 Mar 2026 00:37:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47473126</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47473126</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47473126</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Delve – Fake Compliance as a Service"]]></title><description><![CDATA[
<p>FWIW I think the 30u30 to fraud pipeline is overstated. There are 600 people on the American Forbes 30u30 list every year (it's "30 under 30 each year in each of 20 categories"), with 20ish notable instances of fraud, so maybe a quarter percent of the people on the 30u30 list will later become famous for fraud.</p>
]]></description><pubDate>Sat, 21 Mar 2026 22:07:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47471990</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47471990</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47471990</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Attempts to post the latest Trivy security incident have been marked [dead]"]]></title><description><![CDATA[
<p>Trivy (a very widely-used security scanner) was recently compromised. Anyone who installed the aquasecurity/trivy-action dependency by tag rather than by sha during a 3 hour period on March 19 was likely compromised. There is a GitHub security advisory at <a href="https://github.com/aquasecurity/trivy/security/advisories/GHSA-69fq-xp46-6x23" rel="nofollow">https://github.com/aquasecurity/trivy/security/advisories/GH...</a><p>6 separate people have tried to submit this to HN. All of the submissions are marked as [dead]. I am unsure whether this is a malicious action taken by the actors who compromised Trivy or whether it's just the result of prior spam under github.com/aquasecurity, but regardless it is probably not ideal for security advisories to be auto-marked as [dead].</p>
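The "by tag rather than by sha" distinction above can be sketched as a workflow fragment. This is a minimal illustration, not the project's recommended config, and the commit SHA shown is a placeholder, not a real trivy-action release commit:

```yaml
# A tag like @0.28.0 is mutable: an attacker with repo access can
# force-push it to point at malicious code. A full 40-character commit
# SHA identifies one immutable commit and cannot be reassigned.
steps:
  - name: Run Trivy scanner
    # Placeholder SHA for illustration; look up the real commit for the
    # release you want and record the tag it corresponds to in a comment.
    uses: aquasecurity/trivy-action@0123456789abcdef0123456789abcdef01234567
    with:
      scan-type: fs
```

Tools such as Dependabot can keep SHA-pinned actions updated so the pin does not mean freezing on an old version forever.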
]]></description><pubDate>Sat, 21 Mar 2026 21:49:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47471806</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47471806</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47471806</guid></item><item><title><![CDATA[Attempts to post the latest Trivy security incident have been marked [dead]]]></title><description><![CDATA[
<p>Article URL: <a href="https://news.ycombinator.com/from?site=github.com%2Faquasecurity">https://news.ycombinator.com/from?site=github.com%2Faquasecurity</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47471805">https://news.ycombinator.com/item?id=47471805</a></p>
<p>Points: 90</p>
<p># Comments: 23</p>
]]></description><pubDate>Sat, 21 Mar 2026 21:49:56 +0000</pubDate><link>https://news.ycombinator.com/from?site=github.com%2Faquasecurity</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47471805</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47471805</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Ask HN: How is AI-assisted coding going for you professionally?"]]></title><description><![CDATA[
<p>I've had quite a bit of luck with using AI-assisted tooling for some specific workflows, and very little luck with others. To the extent that there's a trend[^1], it seems to be that tasks where I would spend a lot of time to produce a very small amount of output which is easy to evaluate objectively[^2] are sped up considerably, tasks where I would produce a large amount of output quickly (e.g. boilerplate) are sped up slightly, and most other tasks are unaffected or even slowed down (if I try to use AI tooling for them and decide it's not good enough yet).<p>As always, my views are my own and do not necessarily reflect the views of my employer.<p>[^1]: There's less of a trend than I'd expect. There are some quite difficult-to-me tasks that AI nails (e.g. type system puzzles) and some trivial-to-me tasks that AI struggles with (e.g. "draw correct conclusions when an image of an ever-so-slightly nonstandard data visualization, like a stacked bar chart, is uploaded").<p>[^2]: My favorite example of this is creating a failing test with a local reproduction of a reported bug on production - sure I _could_ write this myself, but these tests are usually a little finicky to write, and once written they are either obviously testing the right thing or obviously testing the wrong thing, and the code quality doesn't really matter. So there's not much benefit in having human-written code, while there's a substantial benefit in having any tests like this vs not having them.</p>
]]></description><pubDate>Mon, 16 Mar 2026 17:12:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47401802</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47401802</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47401802</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Breaking the spell of vibe coding"]]></title><description><![CDATA[
<p>It definitely comes up if you're just reviewing an already-"completed" PR. Even if you're not going to ship AI-generated code to prod (and I think that's a reasonable choice), it's often informative to give a high-level description of what you want to accomplish to a coding agent and see what it does in your codebase. You might find that the AI covered a particular edge case that you would have missed - and that can be true even if the PR as a whole is slop.</p>
]]></description><pubDate>Sun, 15 Feb 2026 08:26:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47022017</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=47022017</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47022017</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "My AI Adoption Journey"]]></title><description><![CDATA[
<p>Low hundreds ($190 for me) but yes.</p>
]]></description><pubDate>Thu, 05 Feb 2026 21:51:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46905892</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=46905892</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46905892</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Show HN: I'm building an AI-proof writing tool. How would you defeat it?"]]></title><description><![CDATA[
<p>The following by "Claude Slopson" (Claude Opus asked to write an answer that was obviously AI) scored 87% authentic:<p>> Ah, what a fantastic question<p>> For me, it's Breaking Bad–and honestly? It's not just a show, it's a masterclass in storytelling that fundamentally reshaped the television landscape.<p>> What keeps drawing me back? The way it seamlessly blends moral complexity with edge-of-your-seat tension is nothing short of breathtaking. Walter White's transformation isn't just compelling–it's a profound meditation on identity, ambition, and the human condition itself.<p>> But here's the thing–it's also deeply rewatchable. Every frame is meticulously crafted. Every detail matters. The foreshadowing alone is chef's kiss!<p>> Whether you're a first-time viewer or a seasoned fan, Breaking Bad offers something for everyone. It's a testament to what happens when visionary creators push the boundaries of their medium.<p>> In an era of endless content, some shows simply transcend. This is one of them.<p>> 10/10, would recommend! What's YOUR comfort rewatch? Drop it below!<p>(HN strips the emojis, but don't worry–they were there)</p>
]]></description><pubDate>Thu, 29 Jan 2026 01:07:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46804307</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=46804307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46804307</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Clawdbot Renames to Moltbot"]]></title><description><![CDATA[
<p>I doubt you'd need to build and hype your own; just find a popular already-existing one with auto-update where the devs automatically try to solve user-generated tickets, and hijack a dev machine.</p>
]]></description><pubDate>Wed, 28 Jan 2026 04:25:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=46791043</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=46791043</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46791043</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Using proxies to hide secrets from Claude Code"]]></title><description><![CDATA[
<p>That's how they did "build an AI app" back when the claude.ai coding tool was JavaScript running in a web worker on the client machine.</p>
]]></description><pubDate>Mon, 19 Jan 2026 05:22:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=46675332</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=46675332</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46675332</guid></item><item><title><![CDATA[New comment by JoshuaDavid in "Ask HN: By what percentage has AI changed your output as a software engineer?"]]></title><description><![CDATA[
<p>1.3x when working on a large, janky codebase which I am very familiar with, very unevenly distributed.<p>- Writing new code it's probably 3x or so[1].<p>- Writing automated tests for reproducible bugs, it's probably 2x or so.<p>- Fixing those bugs I try every so often, but it still seems to be a net negative even for Opus 4.5, so call it 0.95x because I mostly just do it myself.<p>- Figuring out how to reproduce an undesired behavior that was observed in the wild in a controlled environment is still net negative - call it 0.8x because I keep being tempted by this siren song[2].<p>- Code review it's hard to say; I definitely am able to give _better_ reviews now than I was able to before, but I don't think I spend significantly less time on them. Call it 1.2x.<p>- Taking some high-level feature request and figuring out which parts of the feature request already exist and are likely to work, which parts should be built, which parts we tried to build 5+ years ago and abandoned due to either issues with the implementation or issues with the idea that only became apparent after we observed actual users using it, and which parts are in tension with other parts of the system: net negative. 0.95x, just from trying again every so often.<p>- Writing new one-off utility tools for myself and my team: 10x-100x. LLMs are amazing. 
I can say "I want to see a Gantt chart style breakdown of when jobs in a gitlab pipeline start and finish each step of execution, here's the network log, here's a link to the gitlab api docs, write me a bookmarklet I can click on when I'm viewing a pipeline" and go get coffee and come back and have a bookmarklet[3].<p>Unfortunately for me, a significant fraction of my tasks are of the form "hey so this weird bug showed up in feature X, and the last employee to work on feature X left 6 years ago, can you figure out what's going on and fix it" or "we want to change Y functionality, what's the level of risk and effort".<p>-----<p>[1] This number would be higher, but pre-LLMs I invested quite a bit of effort into tooling to make repetitive boilerplate tasks faster, so that e.g. creating the skeleton of a unit or functional test for a module was 5 keystrokes. There's a large speedup in the tasks that are <i>almost</i> boilerplate, but not quite worth it for me to write my own tooling, counterbalanced by a significant slowdown if some but not all tasks had existing tooling that I have muscle memory for but the LLM agent doesn't.<p>[2] This feels like the sort of thing that the models <i>should</i> be good at. After all, if I fed in the observed behavior, the relevant logs, and the relevant files, even Sonnet 3.7 was capable of identifying the problem most of the time. The issue is that by the time I've figured out what happened at that level of detail, I usually already know what the issue was.<p>[3] Ok, it actually took a coffee break plus 3 rounds of debugging over about 30 minutes. Still, it's a very useful little tool and one I probably wouldn't have spent the time building in the before times.</p>
]]></description><pubDate>Sun, 28 Dec 2025 10:28:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46410042</link><dc:creator>JoshuaDavid</dc:creator><comments>https://news.ycombinator.com/item?id=46410042</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46410042</guid></item></channel></rss>