Hacker News: orange_puff

New comment by orange_puff in "Writing code is cheap now"

orange_puff — Tue, 24 Feb 2026 06:59:31 +0000

I basically fully agree with this. I am not sure how to handle the ramifications of this in my day to day work yet. But at least one habit I have been forming is sometimes I find that even though the cost of writing code is immensely cheap, reviewing and validating that it works in certain code bases (like the millions of line mono repo I work in at my job) is extremely high. I try to think through, and improve, our testability such that a few hundred line of code change that modifies the DB really can be a couple of hours of work.

Also, I do want to note that these little "Here is how I see the world of SWE given current model capabilities and tooling" posts are MUCH appreciated, given how much you follow the landscape. When a major hype wave is happening and I feel like I am getting drowned on twitter, I tend to wonder "What would Simon say about this?"

New comment by orange_puff in "Ask HN: What explains the recent surge in LLM coding capabilities?"

orange_puff — Thu, 19 Feb 2026 05:08:06 +0000

This is also my feeling. When people keep referring to big jumps or inflection points, I am left confused, because the models have felt good for a long time and feel like they are getting steadily better. This could be biased by what I use them for though.

Ask HN: What explains the recent surge in LLM coding capabilities?

orange_puff — Sun, 15 Feb 2026 00:13:09 +0000

It seems like we are in the midst of another AI hype cycle. Many people are calling the current coding models an "inflection point", where now the capabilities are so high that future model growth will be explosive. I have heard serious people, like economics writer Noah Smith, make this argument [0].

But it's not just the commentariat. I have seen very serious people in software engineering and tech talk about the ways in which their coding habits have change drastically.

Benchmarks [1] alone don't seem to capture everything, although there have been jumps in the agentic sections, so maybe they actually do.

My question is; what explains these big jumps in capabilities that many serious people seem to be noticing all at once? Is it simply that we have thrown enough data and compute at the models, or instead, are labs perhaps fine-tuning models to get really good at tool calls, which leads to this new, surprising behavior?

When I explain agents to people, I usually walk them through a manual task one might go through when debugging code. You copy some code into ChatGPT, it asks you for more context, you copy some more code in, it suggests and edit, you edit and run, there is an error, so you paste that in, and so on. An agent is just an LLM in that loop which can use tools to do those things automatically. It would not be shocking to me if we took weaker models like Claude Opus 4.0 and made it 10x better at tool calls, it would be a much stronger and more impressive model. But is that all that is happening, or am I missing something big?

[0] https://substack.com/@noahpinion/p-187818379

[1] https://www.anthropic.com/news/claude-opus-4-6

Comments URL: https://news.ycombinator.com/item?id=47019798

Points: 12

# Comments: 9

New comment by orange_puff in "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours"

orange_puff — Wed, 17 Dec 2025 22:39:53 +0000

Do you mind elaborating? By API design, do you mean how they structured their classes, methods, etc. or something else?

New comment by orange_puff in "I ported JustHTML from Python to JavaScript with Codex CLI and GPT-5.2 in hours"

orange_puff — Wed, 17 Dec 2025 05:12:12 +0000

This seems really impressive. I am too lazy to replicate this, but I do wonder how important the test suite is for a a port that likely uses straight forward, dependency free python code https://github.com/EmilStenstrom/justhtml/tree/main/src/just...

It is enormously useful for the author to know that the code works, but my intuition is if you asked an agent to port files slowly, forming its own plan, making commits every feature, it would still get reasonably close, if not there.

Basically, I am guessing that this impressive output could have been achieved based on how good models are these days with large amounts of input tokens, without running the code against tests.

New comment by orange_puff in "A Research Preview of Codex"

orange_puff — Fri, 16 May 2025 21:51:48 +0000

I used to think this way too. Here are a few ways I've tried to re frame things that has helped.

1. When I work on side projects and use AI, sometimes I wonder "what's the point if I am just copy / pasting code? I am not learning anything" but what I have come to realize is building apps with AI assistance is the skill that I am learning, rather than writing code per se as it was a few years ago.

2. I work in high scale distributed computing, so I am still presented with ample opportunities to get very low level, which I love. I am not sure how much I care about writing code per se anymore. Working with AI still is tinkering, it has not changed that much for me. It is quite different, but the underlying fun parts are still present.

New comment by orange_puff in "The unreasonable effectiveness of an LLM agent loop with tool use"

orange_puff — Fri, 16 May 2025 05:59:39 +0000

I have been trying to find such an article for so long, thank you! I think a common reaction to Agents is “well, it probably cannot solve a really complex problem very well”. But to me, that isn’t the point of an agent. LLMs function really well with a lot of context, and agent allows the LLM to discover more context and improve its ability to answer questions.

New comment by orange_puff in "Void: Open-source Cursor alternative"

orange_puff — Fri, 09 May 2025 15:23:57 +0000

As others have mentioned please add more docs / details to the README

I want to mention my current frustration with cursor recently and why I would love an OSS alternative that gives me control; I feel cursor has dumped agentic capabilities everywhere, regardless of whether the user wants it or not. When I use the Ask function as opposed to Agent, it seems to still be functioning in an agentic loop. It takes longer to have basic conversations about high level ideas and really kills my experience.

I hope void doesn’t become an agent dumping ground where this behavior is thrust upon the user as much as possible

Not to say I dislike agent mode, but I like to choose when I use it.

New comment by orange_puff in "Gemini 2.5 Pro vs. Claude 3.7 Sonnet: Coding Comparison"

orange_puff — Mon, 31 Mar 2025 18:12:14 +0000

When you say "vibe code" do you mean the true definition of that term, which is to blindly accept any code generated by the AI, see if it works (maybe agent mode does this) and move on to the next feature? Or do you mean prompt driven development, where although you are basically writing none of the code, you are still reading every line and maintain high involvement in the code base?

New comment by orange_puff in "I've been using Claude Code for a couple of days"

orange_puff — Mon, 10 Mar 2025 15:37:59 +0000

https://open.substack.com/pub/orangepuff/p/first-impressions... I used Claude code to get started on a pdf reader I wanted to build. This pdf reader has a built in LLM chat and when you ask a question about the pdf you’re reading, the page text will be automatically prepended to the question.

Nothing fancy or special. It was built with streamlit in about 150 lines and a single file. But I was impressed that Claude code 1 shot it

New comment by orange_puff in "ForeverVM: Run AI-generated code in stateful sandboxes that run forever"

orange_puff — Wed, 26 Feb 2025 18:24:02 +0000

May I ask how you got the opportunity to invest in this company? If you are a VC, makes sense, just wondering how normies can get access to invest in companies they believe in. Thanks

New comment by orange_puff in "Microsoft cancels leases for AI data centers, analyst says"

orange_puff — Wed, 26 Feb 2025 17:05:14 +0000

I have started using AI for all of my side projects, and am now building stuff almost everyday. I did this as a way to ease some of my anxiety related to AI progress and how fast it is moving. It has actually had the opposite effect; it's more amazing than I thought. I think the difficulty in reasoning about 2) is that given what interesting and difficult problems it can already solve, it's hard to reason about where it will be in 3-5 years.

But, I am also having more fun building things than perhaps the earliest days of my first code written, which is just over 7 years now. Insofar as 1) goes, yes, I never want to go back. I can learn faster and more deeply than I ever could. It's really exciting!

New comment by orange_puff in "Show HN: We're building a desktop app for browser-based AI agents"

orange_puff — Sun, 02 Feb 2025 04:42:07 +0000

When making requests, does your tool use the normal chrome user agent header or does it specify the request is coming from meha?

New comment by orange_puff in "Trae: An AI-powered IDE by ByteDance"

orange_puff — Fri, 24 Jan 2025 21:00:39 +0000

I am enjoying Trae so far, but was wondering if it is possible to point it at a local model. https://docs.trae.ai/docs/open-source-software-notice?_lang=... I am not seeing any mention of this in the docs

New comment by orange_puff in "30% drop in O1-preview accuracy when Putnam problems are slightly variated"

orange_puff — Wed, 01 Jan 2025 19:45:02 +0000

This is very interesting, but a couple of things to note; 1. o1 still achieves > 40% on the varied Putnam problems, which is still a feat most math students would not achieve. 2. o3 solved 25% of the Epoch AI dataset. - There was an interesting post which calls into question how difficult some of those problems actually are, but it still seems very impressive.

I think a fair conclusion here is reasoning models are still really good at solving very difficult math and competitive programming problems, but just better at ones they have seen before.

New comment by orange_puff in "Adventures in algorithmic trading on the Runescape Grand Exchange"

orange_puff — Sun, 27 Oct 2024 06:33:53 +0000

I really enjoyed the article, as I am also a nerd who has played runescape for what seems like forever now, and now most of my interaction with the game is via programming. I have two types of bots I use mostly, color bots, where a screenshot is taken and objects are detected by their surrounding pixel colors, and basic click bots. I've actually found that ~100 lines of python code using pyautogui is more than enough to automate tons of annoying aspects of the game.

I am curious, is your Java client one of the many open source bot clients that actually calls into client code? Or is it some type of click script which does some repetitive inputs?

I have had bad luck with the former in terms of getting banned.

New comment by orange_puff in "I Accidentally Deleted 7TB of Videos Before Going to Production"

orange_puff — Thu, 05 May 2022 14:32:49 +0000

As everyone else has already pointed out, better testing would have been very useful here. For instance, print(len(our_ids)) would have been a dead giveaway that that something was up

I am also a junior dev and completely empathize with being given a lot of responsibility and potentially messing up. I think for someone with < 1 year of experience, to solve the problems you created as fast as you did is really impressive. Thankfully your story ends well :)

New comment by orange_puff in "I changed my mind about advertising"

orange_puff — Thu, 10 Feb 2022 23:34:24 +0000

I don't like ads either, but I also couldn't possibly expect content on the internet to be free, so am happy with the exchange we have currently.

New comment by orange_puff in "Using a mild Twitter addiction to get things done"

orange_puff — Mon, 03 Jan 2022 15:25:37 +0000

Hello. I just recently built a website blocker Firefox extension so I wanted to check your productivity tool out. The UI is really good and I left it a 5 star.

If interesting in feedback; 1. It seems to only run the block/reroute to to-do list logic when a page is loaded. If the user is already on their site and then adds that site to the blocked list and has to-do list items, it won't be blocked immediately. I don't think this feature is vital but might align better with expectation. 2. Maybe show the to-do list and blocked sites button on the main extension popup, rather than having to go -> to-do list -> blocked sites 3. Maybe on/off button in case user has some sort of issue and needs to quickly access a site that is blocked.

New comment by orange_puff in "Advent of Code 2021"

orange_puff — Wed, 01 Dec 2021 18:13:03 +0000

This is my first year participating. I've solved about 20-30 questions from previous years and really loved them. I tried to compete last night for top 100 and was 7 seconds off (rank 123). I shouldn't have tested my code for the first problem because it was too trivial!!

But, that gets the competitive aspect of this challenge out of the way immediately so I can simply have fun with these problems :)

I am excited for them to become a bit more challenging because my friend and I are going to work on them together.