Hacker News: mark_undoio

New comment by mark_undoio in "Ask HN: DDD was a great debugger – what would a modern equivalent look like?"

mark_undoio — Mon, 26 Jan 2026 15:49:51 +0000

I've recently been thinking how AI agents could affect this.

If you're lucky enough to be able to code significant amounts with a modern agent (someone's paying, your task is amenable to it, etc) then you may experience development shifting (further) from "type in the code" to "express the concepts". Maybe you still write some code - but not as much.

What does this look like for debugging / understanding? There's a potential outcome of "AI just solves all the bugs" but I think it's reasonable to imagine that AI will be a (preferably helpful!) partner to a human developer who needs to debug.

My best guess is:

* The entities you manage are "investigations" (mapping onto agents)

* You interact primarily through some kind of rich chat (including sensibly formatted code, data, etc.)

* The primary artefact(s) of this workflow are not code but something more like "clues" / "evidence".

Managing all the theories and snippets of evidence is already core to debugging the old-fashioned way. I think having agents in the loop gives us an opportunity to make that an explicit part of the process (and then be able to assign agents to follow up gaps in the evidence, or investigate them yourself or get someone else to...).
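One way to make that explicit is a small data model where each theory carries its supporting and contradicting evidence, and a "gap" is a theory with no evidence yet and nobody assigned to it: exactly the thing you'd hand to an agent or a colleague. This is a hypothetical sketch; the class names (`Investigation`, `Theory`, `Evidence`) are illustrative and not taken from any existing tool.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    """A single observation, e.g. a log line or a value seen in a debugger."""
    description: str
    source: str  # who produced it: "human", "agent-1", ...


@dataclass
class Theory:
    """A candidate explanation for the bug, with evidence for and against."""
    statement: str
    supporting: list[Evidence] = field(default_factory=list)
    contradicting: list[Evidence] = field(default_factory=list)

    @property
    def is_refuted(self) -> bool:
        return bool(self.contradicting)

    @property
    def has_gap(self) -> bool:
        # No evidence either way yet: a candidate follow-up task.
        return not self.supporting and not self.contradicting


@dataclass
class Investigation:
    question: str
    theories: list[Theory] = field(default_factory=list)
    # Theory statement -> whoever is following it up (agent or person).
    assignments: dict[str, str] = field(default_factory=dict)

    def gaps(self) -> list[Theory]:
        """Theories with no evidence yet and nobody assigned to them."""
        return [t for t in self.theories
                if t.has_gap and t.statement not in self.assignments]

    def assign(self, theory: Theory, assignee: str) -> None:
        self.assignments[theory.statement] = assignee
```

The interesting query is `gaps()`: it turns "what don't we know yet?" into a list you can dispatch.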

New comment by mark_undoio in "How London cracked mobile phone coverage on the Underground"

mark_undoio — Sun, 18 Jan 2026 16:38:00 +0000

Maybe this?

https://sites.google.com/site/616cellnet/ct2

New comment by mark_undoio in "How London cracked mobile phone coverage on the Underground"

mark_undoio — Sun, 18 Jan 2026 16:35:50 +0000

Whilst I have seen the Rabbit fittings others have mentioned, I do remember mention of a "zone phone" and suspect we're recalling the same thing...

New comment by mark_undoio in "Toad is a unified experience for AI in the terminal"

mark_undoio — Tue, 23 Dec 2025 22:04:12 +0000

Very excited to see this come out - though coding agents are impressive, their UIs are a bit of a mixed bag.

Textual offers incredibly impressive terminal experiences, so I'm very much looking forward to this.

I wonder how much agentic magic it'll be able to include, though - with Claude Code it often seems like a lot of the intelligence comes from the scaffolding, not just the LLM. I'm excited to find out!

New comment by mark_undoio in "Patagonian Welsh"

mark_undoio — Mon, 29 Sep 2025 10:25:58 +0000

I love the Patagonian Welsh. BBC Wales, which often has great comedy, has a sitcom based around the original emigration to Patagonia: https://www.bbc.co.uk/programmes/b060cd20

The whole thing feels very much like a Star Trek plot to me: a culture leaving on a ship for an unknown world to preserve their way of life, which the crew would later happen upon in some episode.

New comment by mark_undoio in "Vetinari's Clock (2011)"

mark_undoio — Fri, 05 Sep 2025 11:44:37 +0000

In Cambridge we've got a clock called the Chronophage which is intended to be a sinister "eater of time" - the designer has done a good job of making it feel uncomfortable to look at. There's some detail here: https://www.corpus.cam.ac.uk/articles/secrets-corpus-clock

My memories of what I've heard over time:

* The grasshopper escapement actually is the demonic insect that sits on the top, "walking" around the serrated ring.

* Although it's backlit electronically it's actually a fully mechanical design - including all of the weird things it does.

* The Chronophage itself blinks its eyes unnervingly.

* It sometimes pauses or ticks slightly backwards, then runs faster to catch up again.

* On certain special dates it does extra weird stuff.

* The "chime" is a metal chain dropping into a box.

Three were made in the series; this was the first one. I've always found it slightly unappealing aesthetically but also compelling - there's no arguing with the fact that there's always a crowd of fascinated observers looking at it.

New comment by mark_undoio in "BBC Micro, ancestor to ARM"

mark_undoio — Sun, 17 Aug 2025 18:10:03 +0000

Acorn was doing stuff in Cambridge UK until more recently than I'd realised - it effectively incubated a load of talent that went on to found other companies. Famously ARM span out of it, but many others also went on to do cool things - my current company was founded by Acorn people.

New comment by mark_undoio in "QNX: The Incredible 1.44M Demo"

mark_undoio — Wed, 13 Aug 2025 10:01:29 +0000

This demo was so cool. There were lots of alternative OSes out there back then that felt very impressive.

Linux at the time was cool too but less polished than now. Lots of people were on the Win 9x series, which wasn't amazing - and Mac OS X was not yet fully baked.

These other OSes (QNX, BeOS) felt polished, amazingly fast - and slightly alien. The main sad thing from my perspective was that I couldn't get them online (my machine had a winmodem and nobody had open source drivers for those for ages).

New comment by mark_undoio in "The History of Windows XP"

mark_undoio — Tue, 12 Aug 2025 14:09:56 +0000

Encarta 95 had it - I remember thinking how cool it was that it looked like Win 95.

New comment by mark_undoio in "Cursor CLI"

mark_undoio — Fri, 08 Aug 2025 08:16:14 +0000

I'm pretty comfortable with the agent scaffolding just restricting directory access, but I can see places it might not be enough...

If you were being really paranoid then I guess the agent could write a script in the local directory that, when run, accesses other parts of the filesystem.

I've not seen any evidence an agent would just do that randomly (though I suppose they are nondeterministic). In principle maybe a malicious or unlucky prompt found somewhere in the permitted directory could trigger it?
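For illustration, here is the kind of path check a scaffolding might apply to the file paths an agent passes to its own tools. It's a minimal sketch assuming paths arrive as strings, not how any particular agent implements it. It also shows the gap described above: the check only sees paths given to the tool, so a script the agent writes and then runs makes its own file accesses in a separate process that this function never inspects. Closing that gap needs OS-level sandboxing.

```python
import os


def is_within_allowed(path: str, allowed_root: str) -> bool:
    """Return True if `path` resolves to a location inside `allowed_root`.

    realpath() collapses '..' components and follows symlinks, so
    'project/../secret' or a symlink pointing outside the root are rejected.
    commonpath() compares whole path components, so a sibling directory like
    'project-evil' isn't accepted the way a naive startswith() check would.
    """
    root = os.path.realpath(allowed_root)
    target = os.path.realpath(path)
    return os.path.commonpath([root, target]) == root
```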

New comment by mark_undoio in "Claude Opus 4.1"

mark_undoio — Tue, 05 Aug 2025 22:57:41 +0000

Amp (ampcode.com) uses Sonnet as its main model and has OpenAI's o3 as a special-purpose tool / subagent. It can call into that when it needs particularly advanced reasoning.

Interestingly, I found that prompting it to ask the o3 submodel (which they call The Oracle) to check Sonnet's working on a debugging solution was helpful. More interesting still, Sonnet appeared to do a better job once I'd prompted that. As with chain-of-thought prompting, perhaps asking it to put forward an explanation to be checked actually triggered more effective thinking.

New comment by mark_undoio in "Comparing the Glove80 and Maltron Keyboards"

mark_undoio — Wed, 23 Jul 2025 08:17:29 +0000

I've heard various online stories, over the years, about how nobody nails the ergonomics quite like Maltron does, which is amazing given how long they've been about.

I have an old Maltron that I got cheap (many years ago, and it was old then!) and it's remarkable how unlike a modern consumer product it is: thin vacuum-formed plastic and a point-to-point soldered wire keyboard matrix. But that classic shape, keys with full-sized key caps and full travel, etc. are all present.

The Kinesis I also have is much more mainstream - it feels more solid and looks more like a consumer product. But I understand it's just not quite as good, ergonomically.

New comment by mark_undoio in "Roman Roads Research Association (UK)"

mark_undoio — Sun, 20 Jul 2025 10:35:56 +0000

It's quite weird having local footpaths and paved roads that turn out to have been constructed by the Romans originally - around here that also applies to canals, drainage ditches, etc. It just blends into modern reality.

I imagine some of the Roman stuff was built on even older roads and channels.

New comment by mark_undoio in "Mercury: Ultra-fast language models based on diffusion"

mark_undoio — Mon, 07 Jul 2025 14:27:15 +0000

> I don't understand this. Developer time is so much more expensive than machine time. Do companies not just double their CI workers after hearing people complain? It's just a throw-more-resources problem.

I'd personally agree. But this sounds like the kind of thing that, at many companies, could be a real challenge.

Ultimately, you can measure dollars spent on CI workers. It's much harder and less direct to quantify the cost of not having them (until, for instance, people start taking shortcuts with testing and a regression escapes to production).

That kind of asymmetry tends, unless somebody has a strong overriding vision of where the value really comes from, to result in penny pinching on the wrong things.

New comment by mark_undoio in "Building the Rust Compiler with GCC"

mark_undoio — Mon, 07 Jul 2025 09:51:47 +0000

Process recording with a time travel debugger seems like a good fit for this problem: you can capture 100% of process execution and then go back and investigate further.

We (Undo.io) came up with a technique for following a tree of processes and initiating process recording based on a glob of program name. It's the `--record-on` flag in https://docs.undo.io/UsingTheLiveRecorderTool.html. You can grab a free trial from our website.

For open source, with rr (https://rr-project.org/) I think you can just `rr record` the initial process and you'll end up capturing the whole process tree - then you can look at the one you're interested in.

As others have said, you could also do some smart things with GDB's follow-fork settings, but I think process recording is ideal for capturing complicated situations like this, as you can go and review what happened later on.
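The idea of choosing what to record by a glob on program name can be illustrated as a walk over a process tree that selects the matching processes. This is a toy Python sketch over an in-memory tree (the `Proc` type and `processes_to_record` are hypothetical); it is not how Undo's `--record-on` is implemented, since the real tool follows processes as they are spawned rather than inspecting a finished tree.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Proc:
    pid: int
    name: str
    children: list["Proc"] = field(default_factory=list)


def processes_to_record(root: Proc, pattern: str) -> list[int]:
    """Depth-first walk of the process tree, returning the PIDs whose
    program name matches the glob `pattern` (case-sensitive)."""
    matches = []
    stack = [root]
    while stack:
        proc = stack.pop()
        if fnmatchcase(proc.name, pattern):
            matches.append(proc.pid)
        # Push children in reverse so they are visited in their original order.
        stack.extend(reversed(proc.children))
    return matches
```

For example, in a `make` run that spawns several compilers, a pattern like `"gcc"` picks out just the compiler processes you care about, however deep in the tree they are.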

New comment by mark_undoio in "Can Large Language Models Play Text Games Well? (2023)"

mark_undoio — Fri, 04 Jul 2025 22:31:19 +0000

That's the thing though - they're using logs. My theory is that LLMs are intrinsically quite good at that because they're good at sifting text.

Getting them to drive something like a debugger interface seems harder from my experience (although the ChatDBG people showed some success - my experiments did too, but it took the tweaks I described).

My experiments are with Claude Opus 4, in Claude Code, primarily.

New comment by mark_undoio in "Can Large Language Models Play Text Games Well? (2023)"

mark_undoio — Fri, 04 Jul 2025 17:43:07 +0000

I'm fascinated by this paper because it feels like it could be a good analogue for "can LLMs handle a stateful, text-based tool". A debugger is my particular interest but there's no reason why it couldn't be something else.

To use a debugger, you need:

* Some memory of where you've already explored in the code (vs rooms in a dungeon)

* Some wider idea of your current goal / destination (vs a current quest or a treasure)

* A plan for how to get there - but the flexibility to adapt (vs expected path and potential monsters / dead ends)

* A way of managing information you've learned / state you've viewed (vs inventory)
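Those four requirements map fairly directly onto the state an agent harness would need to keep between debugger commands. A hypothetical Python sketch, with the text-adventure analogue noted on each field; the names are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class DebugSession:
    goal: str                                                 # the quest / treasure
    plan: list[str] = field(default_factory=list)             # expected path
    visited: set[str] = field(default_factory=set)            # rooms explored
    inventory: dict[str, str] = field(default_factory=dict)   # facts learned

    def visit(self, location: str) -> bool:
        """Record a code location; return False if we've already been there."""
        if location in self.visited:
            return False
        self.visited.add(location)
        return True

    def learn(self, name: str, value: str) -> None:
        """Keep a piece of state we've viewed (e.g. a variable's value)."""
        self.inventory[name] = value

    def replan(self, dead_end: str, new_steps: list[str]) -> None:
        """Replace a step that hit a dead end with new steps, in place;
        if the step isn't in the plan, append the new steps instead."""
        if dead_end in self.plan:
            i = self.plan.index(dead_end)
            self.plan[i:i + 1] = new_steps
        else:
            self.plan.extend(new_steps)
```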

Given text adventures are quite well-documented and there are many of them out there, I'd also like to take time out to experiment (at some point!) with whether presenting a command-line tool as a text adventure might be a useful "API".

e.g. an MCP server that exposes a tool but also provides a mapping of the tool's concepts into dungeon adventure concepts (and back). If nothing else, the LLM's reasoning should be pretty entertaining. Maybe playing "make believe" will even make it better at some things - that would be very cool.
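The core of such a server would be a two-way vocabulary: translate tool output into adventure terms before the LLM sees it, and translate the LLM's commands back. Below is a toy Python sketch; the term mapping is invented for illustration and isn't part of MCP or any existing server. It does a single regex pass (longest terms first) so a replaced word is never re-translated by a later rule.

```python
import re

# Hypothetical vocabulary: debugger concept -> adventure concept.
DEBUGGER_TO_DUNGEON = {
    "function": "room",
    "call stack": "the corridor behind you",
    "breakpoint": "marker stone",
    "variable": "item",
    "step back": "retrace your steps",
    "crash": "monster",
}
DUNGEON_TO_DEBUGGER = {v: k for k, v in DEBUGGER_TO_DUNGEON.items()}


def translate(text: str, mapping: dict[str, str]) -> str:
    """Replace every known term in one pass, preferring longer terms.

    The leading \\b stops matches inside other words (no 'room' in 'broom'),
    while still allowing simple plurals ('functions' -> 'rooms').
    """
    terms = sorted(mapping, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(t) for t in terms) + ")")
    return pattern.sub(lambda m: mapping[m.group(0)], text)
```

For instance, a debugger report of "crash in function parse" would reach the LLM as "monster in room parse", and its reply "retrace your steps to the previous room" would come back as "step back to the previous function".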

New comment by mark_undoio in "Undo × MCP: Time Traveling with Your AI Code Assistant"

mark_undoio — Fri, 04 Jul 2025 17:32:45 +0000

> It will be interesting to know what challenges came up in nudging the model to work better with time travel debug data, since this data is novel and the models today might not be well trained for making use of it.

This is actually quite interesting - it's something I'm planning to make a future post about.

But basically the LLM seems to be fairly good at using this interface effectively, so long as we tune the set of tools we provide quite carefully:

* Where we would want the LLM to use a tool sparingly, it was better not to provide it at all. When you have time travel debugging it's usually better to work backwards, since that tells you the causality of the bug. If we gave Claude the ability to step forward, it tended to use it for everything, even when inappropriate.

* LLMs weren't great at managing state they'd set up. Allowing the LLM to set breakpoints just confused it later when it forgot they were there.

* Open ended commands were a bad fit. For example, a time travel debugger can usually jump around in time according to an internal timebase. If the LLM was given access to that, unconstrained, it tended to just waste lots of effort guessing timebases and looking to see what was there.

* Sometimes the LLM just wants to hold something the wrong way and you have to let it. It was almost impossible to get the AI to understand that it could step back into a function on the previous line. It would always try going to the line, then stepping back, resulting in an overshoot. We had to just adapt the tool so that it could use it the way it thought it should work.

The overall result is actually quite satisfactory but it was a bit of a journey to understand how to give the LLM enough flexibility to generate insights without letting it get itself into trouble.
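To make those tool-curation lessons concrete, here is a hypothetical Python sketch over a toy recorded history (a list of `Event`s). The class and method names are invented for illustration and are not the actual tools in Undo's MCP server. Only backwards navigation is exposed, and the "step back into a function" tool takes the caller's line directly, so there is no separate "go to the line" step for the LLM to overshoot from.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One step in a (toy) recorded execution history."""
    func: str
    line: int
    depth: int  # call-stack depth; callees are deeper than callers


class BackwardsOnlyTools:
    """Tool surface offered to the LLM. Deliberately absent: step_forward,
    set_breakpoint and goto_time(t), per the lessons above."""

    def __init__(self, history: list[Event]):
        self.history = history
        self.cursor = len(history) - 1  # start at the end, e.g. the crash

    def where(self) -> Event:
        return self.history[self.cursor]

    def reverse_step_into(self, caller_line: int) -> Event:
        """Land on the last executed step of the call made from `caller_line`.

        The LLM names the line it cares about; the tool does the positioning
        itself instead of 'go to the line, then step back' (which overshoots).
        """
        here = self.where()
        # Most recent execution of caller_line in the current function at the
        # same depth (a stand-in for 'same frame' in this toy model).
        start = next(
            (i for i in range(self.cursor, -1, -1)
             if self.history[i].line == caller_line
             and self.history[i].func == here.func
             and self.history[i].depth == here.depth),
            None,
        )
        if start is None:
            raise ValueError(f"line {caller_line} not found in {here.func}")
        # The deeper events immediately after it are the callee's execution;
        # the last of them is where a reverse-step into the call should land.
        last_inside = None
        i = start + 1
        while i < self.cursor and self.history[i].depth > here.depth:
            last_inside = i
            i += 1
        if last_inside is None:
            raise ValueError(f"line {caller_line} made no call")
        self.cursor = last_inside
        return self.history[last_inside]
```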

New comment by mark_undoio in "Gremllm"

mark_undoio — Fri, 04 Jul 2025 16:51:00 +0000

I am appalled and delighted by this.

It feels like an AI cousin to the Python error steamroller (https://github.com/ajalt/fuckitpy).

Whenever I see this sort of thing I think that there might be a non-evil application for it. But then I think ... where's the fun in that?

Undo × MCP: Time Traveling with Your AI Code Assistant

mark_undoio — Fri, 04 Jul 2025 14:29:54 +0000

Article URL: https://undo.io/resources/time-travel-ai-code-assistant/

Comments URL: https://news.ycombinator.com/item?id=44464888

Points: 10

# Comments: 2