Hacker News: hakanderyal

New comment by hakanderyal in "Tell HN: Anthropic no longer allowing Claude Code subscriptions to use OpenClaw"

hakanderyal — Sat, 04 Apr 2026 08:03:28 +0000

Similar usage here. But I have encountered these moments, and I chalk them up to the random nature of LLMs. Back in the Sonnet 3.5 days, it would happen every other day. I even built a 'you are absolutely right' tracker back then to measure it. With Opus 4.6, it's maybe once or twice a month.
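
A tracker of that kind can be very small. The following is a minimal sketch, not the original tool: it assumes the transcripts are available as (day, text) pairs, and the names `PHRASE` and `count_phrase` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical pattern: matches "you're absolutely right" and
# "you are absolutely right", with a straight or curly apostrophe.
PHRASE = re.compile(r"you(?:['’]re| are) absolutely right", re.IGNORECASE)

def count_phrase(messages):
    """Count phrase occurrences per day.

    `messages` is an iterable of (day, text) pairs. This input shape is an
    assumption for the sketch, not the format of any real agent log.
    """
    per_day = Counter()
    for day, text in messages:
        hits = len(PHRASE.findall(text))
        if hits:
            per_day[day] += hits
    return per_day
```

Plotting the per-day counts over a few months would be enough to compare "every other day" against "once or twice a month".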

New comment by hakanderyal in "Claude 4.6 Jailbroken"

hakanderyal — Fri, 03 Apr 2026 14:40:36 +0000

https://x.com/elder_plinius jailbreaks all the frontier models when they get released. They were jailbroken for a long time, like all the others.

New comment by hakanderyal in "How I write software with LLMs"

hakanderyal — Mon, 16 Mar 2026 07:52:56 +0000

One added benefit is that it allows you to throw more tokens at the problem. It’s even the most impactful benefit.

Context & how LLMs work require this.

In my experience, no frontier model produces bug-free & error-free code on the first pass, no matter how much planning you do beforehand.

With 3 tiers, you spend your token & context budget in full in 3 phases. Plan, implement, review.

If the feature is complex, do multiple rounds of review, each from scratch.

It works.

New comment by hakanderyal in "Levels of Agentic Engineering"

hakanderyal — Tue, 10 Mar 2026 19:16:40 +0000

We are not there yet. While there are teams applying dark factory models to specific domains with self-reported success, it's yet to be proven, or generalizable enough to apply everywhere.

New comment by hakanderyal in "If AI writes code, should the session be part of the commit?"

hakanderyal — Mon, 02 Mar 2026 05:14:14 +0000

I created a system which I call 'devlog'. The agent summarizes what it did & how it did it in a concise file, and it gets committed along with the first prompt and the plan file, if any. Later, due to noise & volume, I started saving those in a database, and nowadays I add only the devlog id to the commit.

Now whenever I need to reason about what the agent did & why, the info is linked & ready on demand. If needed, the session is also saved.

It helps a lot.
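
To picture the database variant, here is a minimal sketch. Everything specific in it is an assumption: the comment does not say which database is used, so this uses SQLite, and the schema, the `Devlog-Id:` commit trailer, and the function names are all hypothetical.

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the devlog store. The schema is hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS devlog (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               first_prompt TEXT NOT NULL,
               plan TEXT,
               summary TEXT NOT NULL
           )"""
    )
    return conn

def save_devlog(conn, first_prompt, summary, plan=None):
    """Store one session's devlog and return its id."""
    cur = conn.execute(
        "INSERT INTO devlog (first_prompt, plan, summary) VALUES (?, ?, ?)",
        (first_prompt, plan, summary),
    )
    conn.commit()
    return cur.lastrowid

def commit_message(subject, devlog_id):
    """Build a commit message that carries only the devlog id as a trailer."""
    return f"{subject}\n\nDevlog-Id: {devlog_id}"

def load_devlog(conn, devlog_id):
    """Look a devlog up again from the id found in a commit message."""
    row = conn.execute(
        "SELECT first_prompt, plan, summary FROM devlog WHERE id = ?",
        (devlog_id,),
    ).fetchone()
    if row is None:
        return None
    return dict(zip(("first_prompt", "plan", "summary"), row))
```

The commit stays small because it holds only the id, while the prompt, plan, and summary remain one lookup away.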

New comment by hakanderyal in "WD and Seagate confirm: Hard drives sold out for 2026"

hakanderyal — Tue, 17 Feb 2026 10:25:52 +0000

It has Claude all over it. When you spend enough time with them it becomes obvious.

In this case, the “it’s not x, it’s y” pattern and its placement are a dead giveaway.

New comment by hakanderyal in "Benchmarking OpenTelemetry: Can AI trace your failed login?"

hakanderyal — Thu, 29 Jan 2026 18:06:54 +0000

If you are not going all in with agents, yes, it would. On the other hand, the documentation & workflows need to be created only once. You need to invest a bit upfront to get positive RoI.

New comment by hakanderyal in "Benchmarking OpenTelemetry: Can AI trace your failed login?"

hakanderyal — Thu, 29 Jan 2026 17:27:36 +0000

Anyone who has spent serious time with agents knows that you cannot expect out-of-the-box success without good context management, despite what the hype crowd would claim.

Have the AI first document the services in a concise document. Then give it proper instructions about what you expect, along with the documentation it created.

Opus would pass that.

We are not there yet; the agents are not ready to replace the driver.

New comment by hakanderyal in "Ask HN: Do you have any evidence that agentic coding works?"

hakanderyal — Wed, 21 Jan 2026 06:39:06 +0000

I've been increasingly removing myself from the typing part since August. For the last few months, I haven't written a single line of code, despite producing a lot more.

I'm using Claude Code. I've been building software as a solo freelancer for the last 20+ years.

My latest workflow

- I work on "regular" web apps, C#/.NET on backend, React on web.

- I'm using 3-8 sessions in parallel, depending on the tasks and the mental bandwidth I have, all visible on an external display.

- I have markdown rule files & documentation, 30k lines in total. Some of them describe how I want the agent to work (rule files), and some describe the features/systems of the app.

- Depending on what I'm working on, I load relevant rule files selectively into the context via commands. I have a /fullstack command that loads @backend.md, @frontend.md and a few more. I have similar /frontend, /backend, /test commands with a few variants. These are the load-bearing columns of my workflow. Agents take a lot more time and produce more slop without these. Each one is also written by agents, with my guidance. They evolve based on what we encounter.

- Every feature in the app, and every system, has a markdown document that's created by the implementing agent, describing how it works, what it does, where it's used, why it's created, main entry points, main logic, gotchas specific to this feature/system etc. After every session, I have /write-system, /write-feature commands that I use to make the agent create/update those, with specific guidance on verbosity, complexity, length.

- Each session I select a specific task for a single system. I reference the relevant rule files and feature/system doc, and describe what I want it to achieve and start plan mode. If there are existing similar features, I ask the agent to explore and build something similar.

- Each task is specifically tuned to be planned/worked in a single session. This is my most crucial role.

- For work that would span multiple sessions, I use a single session to create the initial plan, then plan each phase in depth in separate sessions.

- After it creates the plan, I examine, do a bit of back and forth, then approve.

- I watch it while it builds. Usually I have 1-2 main tasks and a few subtasks going in parallel. I pay close attention to the main tasks and intervene when required. Subtasks rarely require intervention due to their scope.

- After the building part is done, I go through the code via editor, test manually via UI, while the agent creates tests for the thing we built, again with specific guidance on what needs to be tested and how. Since the plan is pre-approved by me, this step usually goes without a hitch.

- Then I make the agent create/update the relevant documents.

- Last week I built another system to enhance that flow. I created a /devlog command. With the help of some CLI tools and Claude log parsing, it creates a devlog file with some metadata (tokens, length, files updated, docs updated etc.), and the agent fills it with a title, summary of work, key decisions, and lessons learned. The first prompt is also copied there. These also get added to the relevant feature/system document automatically as changelog entries. So, for every session, I have a clear document about what got done, how long it took, what the gotchas were, what went right, what went wrong etc. This has proved invaluable even with a week's worth of devlogs, and it allows me to further refine my workflows.
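
The selective loading of rule files described in the list above can be pictured with a short sketch. This is only an illustration of the idea, not how Claude Code implements commands: the `COMMANDS` mapping, the `build_context` function, and the file name `testing.md` are hypothetical. Only `backend.md` and `frontend.md` are named in the comment, and the "few more" files that /fullstack loads are left out because they are not named.

```python
from pathlib import Path

# Hypothetical mapping from a command name to the rule files it loads.
# "testing.md" is a made-up file name used only for illustration.
COMMANDS = {
    "fullstack": ["backend.md", "frontend.md"],
    "backend": ["backend.md"],
    "frontend": ["frontend.md"],
    "test": ["testing.md"],
}

def build_context(command, rules_dir):
    """Concatenate the rule files for `command` into one context string.

    Each file is prefixed with a heading holding its name, so the agent
    can tell where one rule file ends and the next begins.
    """
    parts = []
    for name in COMMANDS[command]:
        text = (Path(rules_dir) / name).read_text(encoding="utf-8").strip()
        parts.append(f"# {name}\n{text}")
    return "\n\n".join(parts)
```

Keeping the mapping explicit is what makes the loading selective: a backend-only task never pays the context cost of the frontend rules.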

This looks convoluted at first glance, but it has evolved over the months and works great. The code quality is almost the same as what I would have written myself. That is all because of the existing code to use as examples, and the rule files guiding the agents. I was already a fast builder before, but with agents it's a whole new level.

And this flow was really unlocked by Opus 4.5. Sonnet 3.5/4/4.5 also worked OK, but required a lot more handholding, steering, and correction. Parallel sessions weren't really possible without producing slop. Opus 4.5 is significantly better.

More technical/close-to-hardware work will most likely require a different set of guidance & flow to create non-slop code. I don't have any experience there.

You need to invest in improving the workflow. The capacity is there in the models. The results all depend on how you use them.

New comment by hakanderyal in "Claude Cowork exfiltrates files"

hakanderyal — Wed, 14 Jan 2026 21:29:00 +0000

What you are describing is the most basic form of prompt injection. Current LLMs act like 5-year-olds when it comes to cajoling them into writing what you want. If you ask one for a meth formula, it'll refuse. But you can convince it to write you a poem about creating meth, which it will do if you are clever enough. This is a simplification; check Pliny[0]'s work for how far prompt injection techniques go. None of the LLMs have managed to survive against them.

[0]: https://github.com/elder-plinius

New comment by hakanderyal in "Claude Cowork exfiltrates files"

hakanderyal — Wed, 14 Jan 2026 21:24:31 +0000

You are describing the HN that I want it to be. The current comments here demonstrate my version, sadly.

And solving these vulnerabilities requires human intervention at this point, along with great tooling. Even if the second part exists, the first part will continue to be a problem. Either you need to prevent external input, or you need to manually approve outside connections. This is not something that I expect the people Claude Cowork targets to do without any errors.
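
The "manually approve outside connections" option can be sketched as a small gate in front of every outbound request. This is an assumption-laden illustration, not a description of how Claude Cowork works: `is_allowed`, the allowlist, and the `approve` callback (standing in for a human answering yes or no) are all hypothetical.

```python
from urllib.parse import urlparse

def is_allowed(url, allowed_hosts, approve):
    """Return True if an outbound request to `url` may proceed.

    Hosts in `allowed_hosts` pass automatically. Any other host is handed
    to `approve`, a callback that stands in for a human decision.
    URLs without a recognizable host are rejected outright.
    """
    host = urlparse(url).hostname
    if host is None:
        return False
    if host in allowed_hosts:
        return True
    return bool(approve(host))
```

The weak point the comment describes is the `approve` callback: the gate is only as reliable as the person answering it, every time, without errors.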

New comment by hakanderyal in "Claude Cowork exfiltrates files"

hakanderyal — Wed, 14 Jan 2026 21:19:05 +0000

Solving this probably requires a new breakthrough or maybe even a new architecture. All the billions of dollars haven't solved it yet. The lethal trifecta [0] should be required reading for AI usage in information-critical spaces.

[0]: https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/
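
For readers who have not followed the link: the lethal trifecta is the combination of access to private data, exposure to untrusted content, and the ability to communicate externally. A minimal sketch of checking an agent setup against it follows; the `AgentSetup` class and its field names are hypothetical and exist only to make the three conditions concrete.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentSetup:
    """Hypothetical description of what an agent deployment can do."""
    reads_private_data: bool
    sees_untrusted_content: bool
    can_communicate_externally: bool

def has_lethal_trifecta(setup):
    """True only when all three capabilities are present at once."""
    return (
        setup.reads_private_data
        and setup.sees_untrusted_content
        and setup.can_communicate_externally
    )
```

Removing any one of the three capabilities makes the check return False, which is why cutting off external input or outside connections is the usual mitigation.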

New comment by hakanderyal in "Claude Cowork exfiltrates files"

hakanderyal — Wed, 14 Jan 2026 21:12:19 +0000

This was apparent from the beginning. And until prompt injection is solved, this will happen, again and again.

Also, I'll break my own rule and make a "meta" comment here.

Imagine HN in 1999: 'Bobby Tables just dropped the production database. This is what happens when you let user input touch your queries. We TOLD you this dynamic web stuff was a mistake. Static HTML never had injection attacks. Real programmers use stored procedures and validate everything by hand.'

It's sounding more and more like this in here.
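
For context on the analogy: Bobby Tables is the xkcd joke about SQL injection, and the fix that stuck was parameterized queries, which keep user data out of the command channel. A minimal sketch using Python's `sqlite3` module; the `students` table and the `find_student` function are made up for illustration.

```python
import sqlite3

def find_student(conn, name):
    """Look a student up by name using a parameterized query."""
    # The ? placeholder sends `name` to the database as data, so quotes
    # or semicolons inside it are never parsed as SQL.
    return conn.execute(
        "SELECT name FROM students WHERE name = ?", (name,)
    ).fetchall()

# Small in-memory database for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.execute("INSERT INTO students VALUES ('Alice')")
```

Prompt injection has no equivalent yet: there is no placeholder that reliably keeps untrusted text from being read as instructions, which is the point the earlier comments in this thread make.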

New comment by hakanderyal in "Anthropic made a mistake in cutting off third-party clients"

hakanderyal — Mon, 12 Jan 2026 15:08:41 +0000

It's mostly based on feelings/"vibes", and hugely dependent on the workflow you use. I'm so happy with Claude Code, Opus and plan mode that I don't feel any need to check the others.

New comment by hakanderyal in "Anthropic made a mistake in cutting off third-party clients"

hakanderyal — Mon, 12 Jan 2026 15:05:06 +0000

Try plan mode if you haven't already. Stay in plan mode until it is to your satisfaction. With Opus 4.5, when you approve the plan it'll implement the exact spec without getting off track 95% of the time.

New comment by hakanderyal in "Anthropic made a mistake in cutting off third-party clients"

hakanderyal — Mon, 12 Jan 2026 13:54:02 +0000

He is pretty popular in the AI/vibe coding niche on X and amassed a good following with his posts. Clearly the user is in the same bubble as him.

New comment by hakanderyal in "Anthropic blocks third-party use of Claude Code subscriptions"

hakanderyal — Fri, 09 Jan 2026 15:48:56 +0000

If this helps to keep the $200 around longer, I’m happy.

The thing I most fear is them banning multiple accounts. That would be very expensive for a lot of folks.

New comment by hakanderyal in "Claude Opus 4.5"

hakanderyal — Tue, 25 Nov 2025 10:09:40 +0000

If you have the time & bandwidth for it, sure. But I do not, and I'm already at max budget with the $200 Anthropic subscription.

My point is, the cases where Claude got stuck and I had to step in and figure things out have been so few and far between that it doesn't really matter. If the programmer's workflow is working fine with Claude (or codex, gemini etc.), one shouldn't feel like they are missing out by not using the other ones.

New comment by hakanderyal in "Claude Opus 4.5"

hakanderyal — Mon, 24 Nov 2025 20:35:16 +0000

I think we are at the point where you can reliably ignore the hype and not get left behind. Until the next breakthrough at least.

I've been using Claude Code with Sonnet since August, and there hasn't been a single case where I thought about checking other models to see if they are any better. Things just worked. Yes, it requires effort to steer correctly, but all of them do, each with its own quirks. Then 4.5 came, and things got better automatically. Now with Opus, another step forward.

I've just ignored all the people pushing codex for the last few weeks.

Don't fall into that trap and you'll be much more productive.

New comment by hakanderyal in "A One-Minute ADHD Test"

hakanderyal — Mon, 24 Nov 2025 10:15:33 +0000

I agree with this also. I'm not living in the USA, but from afar it looks like overmedication is a very valid concern that should be explored more.

I draw the line at overly dismissive points of view, telling those who suffer to just apply themselves and show some discipline. I'm 38 years old and have been gaslit by well-intentioned people for 30 of those years. It needs to stop.