Hacker News: mg

New comment by mg in "Don't trust large context windows"

mg — Sun, 14 Jun 2026 09:21:27 +0000

Of the SOTA model providers, I only use OpenAI and Google. And between gpt-5.5 and gemini-3.1-pro-preview, gpt-5.5 is currently ahead.

New comment by mg in "Don't trust large context windows"

mg — Sun, 14 Jun 2026 08:48:07 +0000

Agreed that it is not linear.

I wrote my own agent, and it sends data to LLMs in this order: "General Prompts (How to write good code)" + "The Code" + "The Feature Request". This means the KV cache will be used even when the feature request changes.

And there are usually far fewer output tokens than input tokens.

So I think that my approach is very lightweight on token usage compared to an interactive session.

It would be interesting to measure it for the other agents out there. Sending a feature request two times vs an interactive session.
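
A minimal sketch of why the ordering described above helps, assuming a cache that reuses the longest shared prompt prefix. The function names and sample strings are hypothetical, and real providers match on tokens under their own rules, so this character-level comparison is only an illustration:

```python
def build_prompt(general_rules: str, code: str, feature_request: str) -> str:
    """Put the stable parts first and the part that changes last."""
    return "\n\n".join([general_rules, code, feature_request])


def shared_prefix_length(a: str, b: str) -> int:
    """Count the leading characters that two prompts have in common."""
    count = 0
    for char_a, char_b in zip(a, b):
        if char_a != char_b:
            break
        count += 1
    return count


# Hypothetical inputs, for illustration only.
rules = "Rules for good code: keep functions small."
code = "def plot(data):\n    ..."
first = build_prompt(rules, code, "Add a linewidth parameter to plot.")
second = build_prompt(rules, code, "Make the button red.")

# Everything before the feature request is identical in both prompts.
stable = len(rules) + len("\n\n") + len(code) + len("\n\n")
shared = shared_prefix_length(first, second)
print(f"{shared} of {len(first)} characters are shared")
```

If the feature request came first instead, the two prompts would diverge at the very first character and nothing could be reused.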

New comment by mg in "Don't trust large context windows"

mg — Sun, 14 Jun 2026 08:17:03 +0000

It is the other way round.

In an interactive session, adding "Fine, but make the button red" after the model generated a first solution more than doubles the tokens used, because the model now gets not only the original code and the feature request but also the updated code plus the change request as input tokens.

Sending a feature request to an LLM and then sending the feature request again with "The button shall be red" only doubles the tokens used.
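
The arithmetic can be sketched like this. All token counts are made-up numbers for illustration, only input tokens are counted, and the function names are hypothetical:

```python
def interactive_session_input(code, request, updated_code, change_request):
    """Input tokens when the change request is added to the same session."""
    first_turn = code + request
    # The second turn resends the whole history plus the model's answer.
    second_turn = code + request + updated_code + change_request
    return first_turn + second_turn


def resend_input(code, request, change_request):
    """Input tokens when the amended feature request is sent from scratch."""
    first_run = code + request
    second_run = code + request + change_request
    return first_run + second_run


# Hypothetical sizes: a 10,000 token codebase, a 100 token feature request,
# a 10,000 token updated codebase and a 10 token change request.
single = 10_000 + 100
interactive = interactive_session_input(10_000, 100, 10_000, 10)
resend = resend_input(10_000, 100, 10)

print(f"single request:      {single}")
print(f"interactive session: {interactive} ({interactive / single:.2f}x)")
print(f"resend from scratch: {resend} ({resend / single:.2f}x)")
```

With these assumed numbers the interactive session costs about three times the input tokens of a single request, while resending costs about two times.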

New comment by mg in "Don't trust large context windows"

mg — Sun, 14 Jun 2026 07:37:27 +0000

Considerations about what goes on in agents internally will probably not be part of software development for long.

Personally, I already see LLMs and agents as black boxes. I give each feature request to multiple LLMs and then compare the results. I don't manually use "sessions" at all. I just look at the outcome. When I dislike it, I "git reset --hard", change my prompts and restart the feature request.

To have an ongoing sense of which agents perform best, I keep a log and calculate an Elo score of which agents meet my demands best. This score is what is important to me, not so much how the agent achieves it.
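
For reference, a minimal sketch of how such a score could be computed from a log of pairwise comparisons. It uses the standard Elo update formula. The K-factor of 32, the starting rating of 1500, the agent names and the log format are all assumptions, not a description of the actual setup:

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B according to the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return new ratings; score_a is 1.0 (A better), 0.5 (tie) or 0.0."""
    expected_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - expected_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return new_a, new_b


# Hypothetical log: (agent A, agent B, score for A) per feature request.
log = [
    ("agent_a", "agent_b", 1.0),
    ("agent_a", "agent_b", 0.5),
    ("agent_b", "agent_a", 1.0),
]

ratings = {"agent_a": 1500.0, "agent_b": 1500.0}
for a, b, score in log:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score)

for name, rating in sorted(ratings.items(), key=lambda item: -item[1]):
    print(f"{name}: {rating:.1f}")
```

Each comparison moves points from one agent to the other, so the sum of all ratings stays constant.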

New comment by mg in "The iPhone's Last Stand?"

mg — Tue, 09 Jun 2026 14:48:52 +0000

In my current workflow, yes, I read all code.

In fact, I usually let multiple LLMs implement the same feature, and then I compare them. I even run my own arena in which I calculate Elo scores for LLMs from my perspective of which one implemented features better.

Having the ability to control code agents via voice would not take away my ability to do that. But I think in the future, that will become less and less necessary. If we look back at this conversation in five years, it will look very archaic, and we will be used to having superhuman AI do everything for us. In 10 years, it will sound like a strange idea that humans were once fiddling with code to improve the quality.

New comment by mg in "The iPhone's Last Stand?"

mg — Tue, 09 Jun 2026 14:12:53 +0000

    How would you detect the
    presence of bugs in this
    scenario?

I would ask AI: "Did the last commit introduce any bugs or unintended consequences?" In fact, I already use this prompt after every change I make manually.

    How would you make sure the LLM
    isn't adding yet another
    useless, redundant function to
    the code base?

By asking AI. In fact, I already run a long "Can you refactor anything in this codebase to reduce redundancy, improve readability, performance or maintainability" prompt pretty regularly.

New comment by mg in "The iPhone's Last Stand?"

mg — Tue, 09 Jun 2026 13:43:21 +0000

It would tell me about the changes like a human would.

"It changed the plot function so it takes another parameter called linewidth. It also added an input field in the stylecontrols section where the user can ...".

New comment by mg in "The iPhone's Last Stand?"

mg — Tue, 09 Jun 2026 11:40:51 +0000

    you will be surrounded by an ecosystem of
    devices, none of which stand alone, but are
    more like portals to interact with your agents

I would be really happy with my phone + headphones as the device I use most. But only if I could use Gemini (or ChatGPT or Grok or any other chat agent) in voice mode and say "SSH into my GitHub Codespace soandso and implement feature soandso.". And it replies "Did it. I told copilot (or codex or whatever coding agent lives on that VM) to implement the feature".

And then a minute later I could ask it "Is copilot done yet?" and it replies "No, looks like it is still working on it". And then a minute later I ask again. It replies "Yes, it finished. It changed chart.py and styles.css. Do you want me to tell you what specific changes it made to the files?".

But it looks like none of the chat agents with voice interface have such a connector at the moment? An SSH connector would be the most useful. But a "GitHub Codespace connector" or something like that would also do.

I wonder if that will be a missing piece for long. If so, I would build an agent with voice mode and ssh connector myself. But I guess it should come out from the big guys any moment now?

New comment by mg in "Ask HN: What is your (AI) dev tech stack / workflow?"

mg — Fri, 05 Jun 2026 15:52:41 +0000

I wrote my own tooling around the raw LLMs:

I can tick files in Vim, and those get concatenated into a prompt, along with a feature request, an instructions file that tells the LLM how to reply, my general "rules for good code" file, one "rules for good code" file per language involved, and a project-specific overview file. The LLM then answers with a list of changes it wants to make to the code. My tooling then applies those changes and I look at them via "git diff". If I like it, I commit. If not, I change one of the prompts and start the process again.

Instead of replying with code changes, the LLM can also decide to request more files. I wrote a little DSL for that.
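
A rough sketch of what the concatenation step could look like. This is not the actual tooling: the function name, the "===" section markers, the file names and the ordering are all assumptions for illustration:

```python
import tempfile
from pathlib import Path


def assemble_prompt(rule_files, code_files, feature_request):
    """Concatenate rule files, selected code files and the feature request."""
    sections = []
    for path in list(rule_files) + list(code_files):
        path = Path(path)
        content = path.read_text(encoding="utf-8")
        # Label every file so the model knows where each part came from.
        sections.append(f"=== {path.name} ===\n{content}")
    sections.append(f"=== Feature request ===\n{feature_request}")
    return "\n\n".join(sections)


# Hypothetical files, created in a temporary directory for the demo.
with tempfile.TemporaryDirectory() as tmp:
    rules = Path(tmp) / "rules.md"
    rules.write_text("Keep functions small.", encoding="utf-8")
    chart = Path(tmp) / "chart.py"
    chart.write_text("def plot(data):\n    pass\n", encoding="utf-8")
    prompt = assemble_prompt([rules], [chart], "Make the button red.")

print(prompt)
```

Applying the changes the LLM replies with, and the DSL for requesting more files, are left out here since their formats are not described.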

I described the beginnings of this workflow last July:

https://www.gibney.org/prompt_coding

Feels like an eternity ago. I think I will write a new blog post this July and describe how the workflow has evolved over the past year.

New comment by mg in "Wake up! 16b"

mg — Sun, 24 May 2026 10:58:17 +0000

Makes me wonder how many bytes the shortest possible Mandelbrot implementation would need.

ChatGPT Shopping

mg — Sat, 18 Apr 2026 15:06:31 +0000

Article URL: https://chatgpt.com/shopping/

Comments URL: https://news.ycombinator.com/item?id=47816487

Points: 1

# Comments: 1

New comment by mg in "Node.js needs a virtual file system"

mg — Tue, 17 Mar 2026 15:43:17 +0000

    You can’t import or require() a module
    that only exists in memory.

You can convert it into a data url and import that, can't you?

New comment by mg in "The engine of Germany's wealth is blocking its future"

mg — Sat, 14 Mar 2026 07:49:50 +0000

However you look at it, sitting at home doing nothing is not the right approach for engineers to get their company back on track.

If there is no money to pay them, they should get shares in the company. So if their R&D is successful, they participate in the outcome.

New comment by mg in "I'm going to build my own OpenClaw, with blackjack and bun"

mg — Wed, 11 Mar 2026 18:40:34 +0000

Why can't that "dedicated development environment" be a cloud VM with a web interface, a GitHub codespace for example?

You could put the example code on the filesystem of that VM too.

New comment by mg in "I'm going to build my own OpenClaw, with blackjack and bun"

mg — Wed, 11 Mar 2026 12:17:35 +0000

But does the agent have access to a whole computer to write those tools?

Couldn't it write them in a web based dev environment?

New comment by mg in "I'm going to build my own OpenClaw, with blackjack and bun"

mg — Wed, 11 Mar 2026 11:48:39 +0000

    how would claude code work from a browser environment?

If you want an agent (like OpenClaw) to write software, why have it use another agent (Claude Code) in the first place? Why not let it develop the software directly? As for how that works in a browser - there are countless web based solutions to write and run software in the cloud. GitHub Codespaces is an example.

New comment by mg in "I'm going to build my own OpenClaw, with blackjack and bun"

mg — Wed, 11 Mar 2026 10:58:45 +0000

I wonder if we really need agents to have control of a full computer.

Maybe a browser plugin that lets the agent use websites is enough?

What would be a task that an agent cannot do on the web?

New comment by mg in "The engine of Germany's wealth is blocking its future"

mg — Mon, 09 Mar 2026 15:45:42 +0000

    German chancellor Friedrich Merz ... 
    lashed out at German workers to
    “simply do a little more,”

Germany literally pays people to do nothing.

A friend of mine, an engineer who works in the German car industry, recently told me that nowadays he has a lot of free time, because the company he works for has so few orders that it is granted "Kurzarbeitergeld": the government pays 60% of the salary if the employees work less.

That blew my mind. If I had fewer orders, I would work more to increase the quality of my product and my efficiency. Working less as a reaction to losing market share seems completely counterproductive to me.

New comment by mg in "MacBook Air with M5"

mg — Tue, 03 Mar 2026 16:08:50 +0000

Which basic office tasks would those be?

The last time I was excited about the performance of local computers was in the 90s I think.

Modern laptops are so insanely fast. Not sure if they are 2x, 10x or 100x faster than I need them to be. But I never hear fans. I never have to wait for the machine these days.

New comment by mg in "MacBook Air with M5"

mg — Tue, 03 Mar 2026 15:12:10 +0000

The one thing that interests me most when it comes to laptops these days is weight. So I jumped right into the tech specs section and looked it up. Since this is the "Air" laptop from the company that is known for thin and lightweight devices, my hopes were high.

But ...

The 13-inch version is heavier than a ThinkPad X1 Carbon, which has a 14-inch screen and can run Linux.