Hacker News: generallyjosh

New comment by generallyjosh in "Things I've Done with AI"

generallyjosh — Tue, 10 Mar 2026 23:47:58 +0000

Did it make any mistakes on your taxes?

Personally, I know coding pretty well. So when I'm using it for coding, I can spot most of its mistakes / misunderstandings

I would not trust using it on a complex domain I'm not super familiar with, like doing taxes

A mistake here is pretty high cost (getting audited, and/or having to pay a bunch in penalties)

New comment by generallyjosh in "This time is different"

generallyjosh — Tue, 03 Mar 2026 03:32:31 +0000

Well, you said:

The potential of the current crop of LLM/AIs will stop at being a very powerful tool to search large volumes of text using free-form questions.

I do think that pretty clearly contradicts what a lot of people who make/use LLMs are saying haha

New comment by generallyjosh in "Cognitive Debt: When Velocity Exceeds Comprehension"

generallyjosh — Tue, 03 Mar 2026 03:21:31 +0000

Hey you can try it if you like. That's one of the beauties of the current moment, nobody REALLY knows what works best, just a whole lot of people trying stuff

And no, I wouldn't ever give it a year of changelog.md. I give it a short description of the current functionality, and a well-trimmed list of 'lessons-learned' (specific pitfalls/traps from previous work, so the AI doesn't have to repeat them)

If you think git logs are a good way to give context, try it and see how it works! My instinct's that it won't work as well as a short readme, but I could be wrong. It's so easy to prototype these days, no reason to not give it a shot

New comment by generallyjosh in "OpenAI raises $110B on $730B pre-money valuation"

generallyjosh — Sun, 01 Mar 2026 14:10:42 +0000

Larger models need more hardware resources to run

And, depending on effort settings, they do more 'thinking', i.e., use more rounds of inference to generate longer internal chains of thought

Both very good reasons to prefer a smaller model, if the small model is good enough for the task

New comment by generallyjosh in "Cognitive Debt: When Velocity Exceeds Comprehension"

generallyjosh — Sun, 01 Mar 2026 13:17:23 +0000

The problem isn't giving MORE context to an agent, it's giving the right context

These things are built for pattern matching, and if you keep their context focused on one pattern, they'll perform much better

You want to avoid dumping in a bunch of data (like a year's worth of git logs) and telling it to sort out what's relevant itself

Better to have pre-processing steps, that find (and maybe summarize) what's relevant, then only bring that into context

You can do that by running your git history through a cheap model, and asking it to extract the relevant bits for the current change. But, that can be overkill and error prone, compared to just maintaining markdown files as you make changes
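A minimal sketch of that kind of pre-processing step, assuming the commit history has already been parsed into simple records. The `Commit` type, `select_relevant_commits`, and the sample data are all hypothetical, and a plain file-path filter stands in for the cheap model here:

```python
from dataclasses import dataclass


@dataclass
class Commit:
    sha: str
    message: str
    files: list[str]


def select_relevant_commits(commits, changed_paths, limit=5):
    """Keep only commits that touched a file in the current change (input order kept)."""
    wanted = set(changed_paths)
    relevant = [c for c in commits if wanted.intersection(c.files)]
    return relevant[:limit]


def build_context(commits):
    """Render the selected commits as a short text block to hand to the agent."""
    return "\n".join(f"{c.sha[:7]} {c.message}" for c in commits)


# Hypothetical history, for illustration only
history = [
    Commit("a1b2c3d4", "Fix login redirect loop", ["auth/login.py"]),
    Commit("e5f6a7b8", "Bump dependencies", ["requirements.txt"]),
    Commit("c9d0e1f2", "Add rate limiting to login", ["auth/login.py", "auth/limits.py"]),
]

context = build_context(select_relevant_commits(history, ["auth/login.py"]))
print(context)
```

Only the two login-related commits reach the context; the dependency bump never gets loaded.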

New comment by generallyjosh in "We gave terabytes of CI logs to an LLM"

generallyjosh — Sat, 28 Feb 2026 00:33:45 +0000

I'd assume it probably depends how large and varied your logs are?

But, my guess, I could see an algorithm like that being very fast. It's basically just doing a form of compression, so I'm thinking ballpark, a similar amount of compute to just zipping the log

Can't be anything CLOSE to the compute cost of running any part of the file through an LLM haha
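A rough sketch of the compression idea. The algorithm under discussion isn't quoted here, so this assumes it means something like collapsing near-duplicate log lines into templates. `collapse_log` and the sample log are made up for illustration:

```python
import re
import zlib
from collections import Counter


def collapse_log(lines):
    """Mask runs of digits so lines that differ only in numbers share one template."""
    return Counter(re.sub(r"\d+", "<N>", line) for line in lines)


# Hypothetical CI log: lots of near-identical lines plus one interesting one
log = [f"step {i} finished in {i * 3} ms" for i in range(1000)]
log.append("ERROR: build failed with exit code 2")

templates = collapse_log(log)
raw = "\n".join(log).encode()
zipped = zlib.compress(raw)

print(len(log), "lines ->", len(templates), "templates")
print(len(raw), "bytes raw ->", len(zipped), "bytes zipped")
```

Both passes are a single linear scan over the text, which is why the cost should land in the same ballpark as zipping.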

New comment by generallyjosh in "This time is different"

generallyjosh — Fri, 27 Feb 2026 12:41:26 +0000

I think what most people are worried about is that, as you say, AGI won't necessarily have our biases/biological drives

That might also mean it has no drive for self-determination. It might just be perfectly happy to do whatever humans tell it to, even if it's far smarter than us (and, this is exactly the sort of AI people are trying to make)

So, superintelligence winds up doing whatever a very small group of controlling humans say. And, like you say, humans want to win

New comment by generallyjosh in "This time is different"

generallyjosh — Fri, 27 Feb 2026 12:34:30 +0000

You say this as though it's a pithy point.

Might as well say humans are just a better search tool - it's true in the exact same sense you're using.

All humans do is absorb information, then search through our memories and apply that information in relevant contexts to affect the world

New comment by generallyjosh in "This time is different"

generallyjosh — Fri, 27 Feb 2026 12:28:14 +0000

The whole point of an economy is to generate value. Very, very different than caring for people

Feudalism was the dominant economic system for millennia. The point is to extract value for the upper class. Peasants only matter as a source of labor, and they only get 'cared for' to the extent of keeping them alive and working.

Now think about what feudalism might look like if the peasants' labor could be automated

New comment by generallyjosh in "This time is different"

generallyjosh — Fri, 27 Feb 2026 12:12:38 +0000

There are lots of intelligent people looking at AI and imagining its potential

Are you just saying that you're more intelligent than them? You can see clearly, where all the steam engine technicians can't?

New comment by generallyjosh in "Claws are now a new layer on top of LLM agents"

generallyjosh — Sun, 22 Feb 2026 19:50:18 +0000

Openclaw isn't new (and the actual project never made itself out to be new)

It's a nice packaging, of a whole bunch of preexisting things. Agentic AI inside a nice sandbox container, running the model on a cron schedule, and with an ecosystem of ready made skills

Nothing new, but it made the tech easy for people to download and start using immediately. That's why you see so many people treating it as new - it's their first time hearing about such a setup

New comment by generallyjosh in "Why I don't think AGI is imminent"

generallyjosh — Sat, 21 Feb 2026 14:09:14 +0000

I do strongly agree on the framing, but I'd argue with the conclusion

Yeah, it really doesn't matter if AGI has happened, is going to happen, will never happen, whatever. No matter what sort of definition we make for it, someone's always going to disagree anyway. For a looong time, we thought the Turing test was the standard, and that only a truly intelligent computer could beat it. It's been blown out of the water for years now, and now we're all arguing about new definitions for AGI

At the end of the day, like you say, it doesn't matter a bit how we define terms. We can label it whatever we want, but the label doesn't change what it can DO

What it can DO is the important part. I think a lot of software devs are coming to terms with the idea that AI will be able to replace vast chunks of our jobs in the very near future.

If you use these things heavily, you can see the trajectory.

6 months ago I'd only trust them for boilerplate code generation and writing/reviewing short in-line documentation.

Today, with the latest models and tools, I'm trusting them with short/low impact tasks (go implement this UI fix, then redeploy the app locally, navigate to it, and verify the fix looks correct).

6 months from now, my best guess is that they'll continue to become more capable of handling longer + more complex tasks on their own.

5 years from now, I'm seeing a real possibility that they'll be handling all the code, end to end.

Doesn't matter if we call that AGI or not. It very much will matter whose jobs get cut, because one person with AI can do the work of 20 developers

New comment by generallyjosh in "Dario Amodei – "We are near the end of the exponential" [video]"

generallyjosh — Sun, 15 Feb 2026 16:25:04 +0000

I'm finding the latest models are pretty good at debugging, if you give them the tools to debug properly

If they can run a tool from the terminal, see all the output in text format, and have clear 'success' criteria, then they're usually able to figure out the issue and fix it (often with spaghetti code patching, but it does at least fix the bug)

I think the testing/verification part is going to keep getting better, as we figure out better tools the AI can use here (ex, parsing the accessibility tree in a web UI to click around in it and verify)
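A minimal sketch of what "run a tool, see all the output as text, clear success criteria" can look like, assuming the check is any terminal command whose exit code signals pass/fail. `run_check` is a hypothetical helper, and the two commands are stand-ins for a real test or build command:

```python
import subprocess
import sys


def run_check(cmd):
    """Run a command and return (passed, combined stdout + stderr as text)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


# Stand-in commands; a real setup would run the project's own test suite
passing = [sys.executable, "-c", "print('ok')"]
failing = [sys.executable, "-c", "import sys; print('boom', file=sys.stderr); sys.exit(1)"]

ok, output = run_check(passing)
bad, error_output = run_check(failing)
print(ok, bad)
```

The model gets one unambiguous boolean plus the full text output to read, so it can keep iterating until the boolean flips.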

New comment by generallyjosh in "Dario Amodei – "We are near the end of the exponential" [video]"

generallyjosh — Sun, 15 Feb 2026 16:14:52 +0000

One of the first skills I made for Claude was a research skill.

I give it a question (narrow or really broad), and the model does a bunch of web searches using subagents, to try and get a comprehensive answer using current results

The important part is, when the model answers, I have it cite its sources, using direct links. So, I can directly confirm the accuracy and quality of any info it finds

It's been super helpful. I can give it super broad questions like "Here's the architecture and environment details I'm planning for a new project. Can you see if there's any known issues with this setup". Then, it'll give me direct links + summaries to any relevant pages.

Saves a ton of time manually searching through the haystack, and so far, the latest models are pretty good about not missing important things (and catching plenty of things I missed)
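The checking described above is done by hand. As an optional extra, a small automatic sanity check could flag any cited link that never appeared in the search results. This is a sketch under that assumption, not part of the actual skill; `unverified_links` and the example URLs are hypothetical:

```python
import re

# Rough URL matcher; stops at whitespace and common closing punctuation
URL_PATTERN = re.compile(r"https?://[^\s)\]>\"']+")


def unverified_links(answer, fetched_urls):
    """Return cited URLs that were not among the pages the search actually fetched."""
    cited = [u.rstrip(".,") for u in URL_PATTERN.findall(answer)]
    return [u for u in cited if u not in fetched_urls]


answer = (
    "Known issue with this setup (https://example.com/issue-1). "
    "See also https://example.org/made-up."
)
fetched = {"https://example.com/issue-1"}

suspicious = unverified_links(answer, fetched)
print(suspicious)
```

It only catches links the model made up out of nowhere. Whether a real page actually says what the summary claims still needs a human click.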

New comment by generallyjosh in "Breaking the spell of vibe coding"

generallyjosh — Sun, 15 Feb 2026 10:44:05 +0000

I do think there's value in trying out fully vibe coding some toy projects today (probably nothing real or security sensitive haha).

The AI will get better at compensating, but I think some of its weaknesses are fundamental, and are going to be showing up in some form or another for a while yet

Ex, the AI doesn't know about what you don't tell it. There's a LOT of context we take for granted while programming (especially in a corporate environment). Recognizing what sort of context is useful to give the AI without distracting it (and under what conditions it should load/forget context), I think is going to be a very valuable skill over the next few years. That's a skill you can start building now

New comment by generallyjosh in "Claude’s C Compiler vs. GCC"

generallyjosh — Mon, 09 Feb 2026 19:54:48 +0000

That mirrors my experience so far. The AI is fantastic for prototyping, in languages/frameworks you might be totally unfamiliar with. You can make all sorts of cool little toy projects in a few hours, with just some minimal prompting

The danger is, it doesn't quite scale up. The more complex the project, the more likely the AI is to get confused and start writing spaghetti code. It may even work for a while, but eventually the spaghetti piles up to the point that not even more spaghetti will fix it

I'll bet that's going to get better over the next few years, with better tooling and better ways to get the AI to figure out/remember relevant parts of the code base, but that's just my guess

New comment by generallyjosh in "GPT-5.3-Codex"

generallyjosh — Sat, 07 Feb 2026 01:14:13 +0000

All we can do is try our best to look at the world with clear eyes, and think about where the industry's going over the next couple years

Not how we want things to be, but how they actually are and will be

I don't think AI for programming is a passing fad

New comment by generallyjosh in "Vibe coding kills open source"

generallyjosh — Tue, 27 Jan 2026 11:19:18 +0000

IMO, documentation becomes much more important if we're planning to hand off coding to the LLMs

You can ask it about the code, sure, and it'll try to tell you how it works. But, what if there's a bug in the code? Maybe the LLM will guess at how it was supposed to work, or maybe it'll start making stuff up to justify the bug's existence (it's actually a hidden feature!)

The docs say how the code should work. For an LLM that has to go relearn everything about your code base every time you invoke it, that's vitally important

New comment by generallyjosh in "Claude Code's new hidden feature: Swarms"

generallyjosh — Sun, 25 Jan 2026 19:21:48 +0000

I do think there is some actual value in telling an LLM "you are an expert code reviewer". You really do tend to get better results in the output

When you think about what an LLM is, it makes more sense. The prompt causes a strong activation for neurons related to "code review", and so the model's output sounds more like a code review.