Hacker News: gnat

New comment by gnat in "Show HN: Azure DevOps TUI Management Style"

gnat — Wed, 17 Jun 2026 01:43:31 +0000

Nice! Claude and I made something for my use called 'azdo' that's the 'gh' CLI but for Azure DevOps. I figured agents know how to use gh, so give them what they know. I haven't yet tried aliasing so they don't (at first) know it's Azure DevOps but that's coming ...

New comment by gnat in "Beeper – All your chats in one app"

gnat — Sat, 13 Jun 2026 09:08:57 +0000

I like it a lot. AND it has MCP server so I can point my agent at it to search and research. Very impressive.

New comment by gnat in "Microsoft starts canceling Claude Code licenses"

gnat — Fri, 22 May 2026 19:22:52 +0000

Being able to manage context over long-running sessions is a function of the harness, not the model. Are you using Claude Code with GPT5.5? Codex? piclaw? They’ll all have different context management strategies to let you keep going when you would otherwise have filled up context and be forced to stop.

New comment by gnat in "DeepClaude – Claude Code agent loop with DeepSeek V4 Pro, 17x cheaper"

gnat — Sun, 03 May 2026 22:58:58 +0000

This repo's README explains how it works and you can do it yourself. claude looks for environment variables that say which API endpoint to talk to, which key to pass, which model name to use for haiku/sonnet/opus-level workloads, etc.
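A minimal sketch of that idea, not taken from the repo. The variable names (`ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, `ANTHROPIC_DEFAULT_*_MODEL`) are the ones I recall from Claude Code's settings documentation, so check them against the current docs; the endpoint URL, key, and model names are made-up placeholders.

```python
import os


def build_claude_env(base_url, api_key, haiku, sonnet, opus, base_env=None):
    """Return a copy of the environment with endpoint overrides applied.

    The variable names are assumptions recalled from Claude Code's docs;
    verify them before relying on this.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "ANTHROPIC_BASE_URL": base_url,          # which API endpoint to talk to
        "ANTHROPIC_AUTH_TOKEN": api_key,         # which key to pass
        "ANTHROPIC_DEFAULT_HAIKU_MODEL": haiku,  # model for haiku-level work
        "ANTHROPIC_DEFAULT_SONNET_MODEL": sonnet,
        "ANTHROPIC_DEFAULT_OPUS_MODEL": opus,
    })
    return env


# Hypothetical values throughout; nothing here is a real endpoint or model.
env = build_claude_env(
    base_url="https://api.example.invalid/anthropic",
    api_key="sk-placeholder",
    haiku="small-model",
    sonnet="medium-model",
    opus="large-model",
    base_env={"PATH": "/usr/bin"},
)
```

To launch with these overrides you would pass the dict to something like `subprocess.run(["claude"], env=env)`.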

New comment by gnat in "Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML"

gnat — Sun, 03 May 2026 08:02:40 +0000

Forgot to add: I get several benefits from doing this.

1. Specifications that live outside the code. We have a lot of code for which "what should this do?" is a subjective answer, because "what was this written to do?" is either oral legend or lost in time. As future Claude sessions add new features, this is how Claude can remember what was intentional in the existing code and what were accidents of implementation. And they're useful for documenters, support, etc.

2. Specifications that stay up to date as code is written. No spec survives first contact with the enemy (implementation in the real world). "Huh, there are TWO statuses for Missing orders, but we wrote this assuming just one. How do we display them? Which are we setting or is it configurable?" etc. Implementer finds things the specifier got wrong about reality, things the specifier missed that need to be specified/decided, and testing finds what they both missed.

I have a colleague working on saving architecture decisions, and his description of it feels like a higher-abstraction version of my saving and maintaining requirements.

New comment by gnat in "Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML"

gnat — Sun, 03 May 2026 07:52:14 +0000

Nice! Your spec-maxxing is very resonant. I've been working with explicit requirements: elicit them from conversation with me or introspecting another piece of software; one-shot from them; and keep them up-to-date as I do the "old man shouts at Claude" iterations after whatever one-shotting came up with.

Unlike you, I wish for the LLM to do as much of the work as possible -- but "as possible" is doing a lot of work in that sentence. I'm still trying to get clear on exactly where I am needed and where Opus and iterations will get there eventually.

It has really challenged me to get clearer on what a requirement is vs a constraint (e.g., "you don't get to reinvent the database schema, we're building part of a larger system"). And I still battle with when and how to specify UI behaviours: so much UI is implicit, and it seems quite daunting to have to specify so much to get it working. I have new respect for whoever wrote the undoubtedly bajillion tests for Flutter and other UI toolkits.

New comment by gnat in "Addicted to Claude Code–Help"

gnat — Sun, 08 Mar 2026 00:47:30 +0000

I'm not trying to convert you, just want to share process tips that I see working for me and others. We're using agents, not a chat, because they can do complex work in pursuit of a goal.

1. Make artifacts. If you're doing research into a tech, or a hypothesis, then fire off subagents to explore different parts of the problem space, each reporting back into a doc. Then another agent synthesizes the docs into a conclusion/report.

2. Require citations. "Use these trusted sources. Cite trusted sources for each claim. Cite with enough context that it's clear your citation supports the claim, and refuse to cite if the citation doesn't support the claim."

3. Review. This lets you then fire off a subagent to review the synthesis. It can have its own prompt: look for confirming and disconfirming evidence, don't trust uncited claims. If you find it making conflation mistakes, figure out at what stage and why, and adjust your process to get in front of them.

4. Manage your context. An LLM only has a fixed context size ("chat length") and facts & instructions at the front of that tend to be better hewn to than things at the end. Subagents are a way of managing that context to get more from a single run. Artifacts like notebooks or records of subagent output move content outside the context so you can pick up in a new session ("chat") and continue the work.

It's less fun than just having a chat with ChatGPT. I find that I get much better quality results using these techniques. Hope this helps! If you're not interested in doing this (too much like work, and you already have something that works), it's no skin off my nose. All the best!

New comment by gnat in "ai;dr"

gnat — Thu, 12 Feb 2026 23:40:42 +0000

We're in the brief window of time when AI's writing style is the weirdness. It's an artifact of the production process, like JPG blur, MP3 distortion, autotune's rigidity. And it didn't take long for those things to become normalized, in fact for them to become artifacts that people proudly adopted and embraced. DJs release tracks built from MP3 samples instead of waves. Autotune is famously a 'sound' that was once something to be subtly added and never confessed to, but which now genres and artists lean into rather than away from.

Long story short: I think emoji in headings and lists, em dashes, and the vile TED Talk paragraph structure of "long sentence with lots of words asking a question or introducing a possibility. followed by. short sentences. rebutting. or affirming." are here to stay. My money is that it gets normalized and embraced as "well of course that's how you best communicate because I see it everywhere."

New comment by gnat in "Genetic Data from over 20k U.S. Children Misused for 'Race Science'"

gnat — Sat, 24 Jan 2026 14:07:02 +0000

https://archive.is/2026.01.24-103304/https://www.nytimes.com...

New comment by gnat in "I know we're in an AI bubble because nobody wants me"

gnat — Sat, 29 Nov 2025 13:43:47 +0000

(Hi, Tom!) Reread the article and look for “CPU”. The whole article is about doing deep learning on CPUs not GPUs. Moonshine, the open source project and startup he talks about, shows speech recognition and realtime translation on the device rather than on a server. My understanding is that doing The Math in parallel is itself a performance hack, but Doing Less Math is also a performance hack.

New comment by gnat in "Agent design is still hard"

gnat — Sat, 22 Nov 2025 18:38:10 +0000

What have you done to make Claude stronger on brownfields work? This is very interesting to me.

New comment by gnat in "GPT-5.1: A smarter, more conversational ChatGPT"

gnat — Wed, 12 Nov 2025 22:22:14 +0000

I hate its acknowledgement of its personality prompt. Try having a series of back and forth and each response is like “got it, keeping it short and professional. Yes, there are only seven deadly sins.” You get more prompt performance than answer.

New comment by gnat in "A Global Web of Chinese Propaganda Leads to a U.S. Tech Mogul"

gnat — Mon, 10 Nov 2025 16:22:29 +0000

Tl;Dr: ThoughtWorks founder is spending his millions portraying Chinese government policies, including Xinjiang/Uighurs, in a positive light. His spending is heavily laundered but he’s now based in China, and working in the same offices as a propaganda company.

New comment by gnat in "Melvyn Bragg steps down from presenting In Our Time"

gnat — Thu, 04 Sep 2025 08:15:45 +0000

Calendar was brilliant. I think it was the first time I fully appreciated the misery of the human mind in the face of various orbit periods that aren't simple integer ratios of one another. https://www.bbc.co.uk/programmes/p00548m9

Great Fire of London too. Pepys burying his cheese! https://www.bbc.co.uk/programmes/b00ft63q

Politeness. Social barriers were coming down, you were interacting with people of different rank, how do you not get into a swordfight? Also, the letter from the wife complaining about her husband! https://www.bbc.co.uk/programmes/p004y29m

I think they did all the big interesting things in history and then struggled with a lot of minor events that were hard to find interesting angles on.

New comment by gnat in "Testosterone Didn't Rapidly Decline"

gnat — Sat, 16 Aug 2025 19:04:12 +0000

Thank you! Worth reading, if only for the phrase “global taint ruler”.

Tobin Tax

gnat — Tue, 12 Aug 2025 19:12:39 +0000

Article URL: https://en.wikipedia.org/wiki/Tobin_tax

Comments URL: https://news.ycombinator.com/item?id=44880613

Points: 3

# Comments: 0

New comment by gnat in "MCP overlooks hard-won lessons from distributed systems"

gnat — Sun, 10 Aug 2025 01:00:52 +0000

Are you two talking at cross-purposes because you don't have a shared understanding of control and data flow?

The pieces here are:

* Claude Code, a Node (Javascript) application that talks to MCP server(s) and the Claude API

* The MCP server, which exposes some tools through stdio (stdin/stdout) or HTTP

* The Claude API, which is more structured than "text in, text out".

* The Claude LLM behind the API, which generates a response to a given prompt

Claude Code is a Node application. CC is configured in JSON with a list of MCP servers. When CC starts up, CC's Javascript initialises each server and as part of that gets a list of callable functions.

When CC calls the LLM API with a user's request, it's not just "here is the user's words, do it". There are multiple slots in the request object, one of which is a "tools" block, a list of the tools that can be called. Inside the API, I imagine this is packaged into a prefix context string like "you have access to the following tools: tool(args) ...". The LLM API probably has a bunch of prompts it runs through (figure out what type of request the user has made, maybe using different prompts to make different types of plan, etc.) and somewhere along the way the LLM might respond with a request to call a tool.

The LLM API call then returns the tool call request to CC, in a structured "tool_use" block separate from the freetext "hey good news, you asked a question and got this response". The structured block means "the LLM wants to call this tool."
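To make the two kinds of block concrete, here is a minimal sketch of how a client might separate them. The block fields (`type`, `text`, `id`, `name`, `input`) follow the Claude Messages API as I understand it; the tool name `lookup_car` and its input are hypothetical, and this is not Claude Code's actual implementation.

```python
def split_response(content):
    """Separate free-text blocks from structured tool_use requests."""
    texts = [b["text"] for b in content if b.get("type") == "text"]
    tool_calls = [b for b in content if b.get("type") == "tool_use"]
    return texts, tool_calls


# Example response content: one text block, one tool call request.
response_content = [
    {"type": "text", "text": "Let me look that up."},
    {
        "type": "tool_use",
        "id": "abc123",
        "name": "lookup_car",            # hypothetical tool name
        "input": {"owner": "example"},   # hypothetical arguments
    },
]

texts, tool_calls = split_response(response_content)
# The client shows `texts` to the user and dispatches each entry in
# `tool_calls` to the matching MCP server.
```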

CC's JS then calls the server with the tool request and gets the response. It validates the response (e.g., JSON schemas) and then calls the LLM API again bundling up the success/failure of the tool call into a structured "tool_result" block. If it validated and was successful, the LLM gets to see the MCP server's response. If it failed to validate, the LLM gets to see that it failed and what the error message was (so the LLM can try again in a different way).

The idea is that if a tool call is supposed to return a CarMakeModel string ("Toyota Tercel") and instead returns an int (42), JSON Schemas can catch this. The client validates the server's response against the schema, and calls the LLM API with

  {
    "type": "tool_result",
    "tool_use_id": "abc123",
    "is_error": true,
    "content": [
      {
        "type": "text",
        "text": "Expected string, got integer."
      }
    ]
  }

So the LLM isn't choosing to call the validator, it's the deterministic Javascript that is Claude Code that chooses to call the validator.
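A minimal sketch of that client-side step, with a plain type check standing in for real JSON Schema validation. The function name `make_tool_result` and the type-name table are illustrative assumptions, not Claude Code's code.

```python
# Map Python types to JSON Schema type names for the error message.
JSON_TYPE_NAMES = {
    str: "string",
    int: "integer",
    float: "number",
    bool: "boolean",
    list: "array",
    dict: "object",
    type(None): "null",
}


def make_tool_result(tool_use_id, value, expected_type=str):
    """Check the server's output and build the tool_result block.

    Runs in deterministic client code; the LLM only sees the outcome.
    """
    if isinstance(value, expected_type):
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": [{"type": "text", "text": str(value)}],
        }
    expected = JSON_TYPE_NAMES.get(expected_type, expected_type.__name__)
    got = JSON_TYPE_NAMES.get(type(value), type(value).__name__)
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "is_error": True,
        # The message names only the types, not the offending value.
        "content": [{"type": "text", "text": f"Expected {expected}, got {got}."}],
    }


ok = make_tool_result("abc123", "Toyota Tercel")
bad = make_tool_result("abc123", 42)
```

Here `bad` has the same shape as the JSON above, and `ok` carries the server's string through to the LLM.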

There are plenty of ways for this to go wrong: the client (Claude Code) has to validate; int vs string isn't the same as "is a valid timestamp/CarMakeModel/etc"; if you helpfully put the thing that failed into the error message ("Expected string, got integer (42)") then the LLM gets 42 and might choose to interpret that as a CarMakeModel if it's having a particularly bad day; the LLM might say "well, that didn't work, but let's assume the answer was Toyota Tercel, a common car make and model", ... We're reaching here, yet these are possible.

But the basic flow has validation done in deterministic code and hiding the MCP server's invalid responses from the LLM. The LLM can't choose not to validate. You seemed to be saying that the LLM could choose not to validate, and your interlocutor was saying that was not the case.

I hope this helps!

New comment by gnat in "Virtual 6NF"

gnat — Sat, 09 Aug 2025 11:12:42 +0000

The newsletter this is from is full of very clear writing about SQL, practically applying theory without getting lost in a tangle of database theory jargon. If you need to read or write SQL then I think you’ll find it as interesting as I have.

New comment by gnat in "U.S. fires statistics chief after soft jobs report"

gnat — Sat, 02 Aug 2025 02:01:07 +0000

From the excellent "Why Nations Fail" by Daron Acemoglu and James A. Robinson:

> An example of what could happen if you took your job too seriously, rather than successfully second-guessing what the Communist Party wanted, is provided by the Soviet census of 1937. As the returns came in, it became clear that they would show a population of about 162 million, far less than the 180 million Stalin had anticipated and indeed below the figure of 168 million that Stalin himself announced in 1934. The 1937 census was the first conducted since 1926, and therefore the first one that followed the mass famines and purges of the early 1930s. The accurate population numbers reflected this. Stalin's response was to have those who organized the census arrested and sent to Siberia or shot. He ordered another census, which took place in 1939. This time the organizers got it right; they found that the population was actually 171 million.

New comment by gnat in "U.S. fires statistics chief after soft jobs report"

gnat — Sat, 02 Aug 2025 00:36:46 +0000

Thanks for that. https://archive.is/84YPk for folks without a Bloomberg subscription.

The jobs data comes from surveys of businesses and consumers. Fewer of each category are responding, continuing a long-term trend of declining response rates. Cuts affect their ability to collect data, with about 15% of the sample "suspended" -- i.e., not collected "to align survey workload with resource levels" in the words of the announcement linked from the Bloomberg article.

> "The more data you’re missing and comes in later, the higher the odds the revisions will be much larger," said Omair Sharif, president of Inflation Insights LLC. "Fifty percent is just not enough."