Hacker News: simonreiff

New comment by simonreiff in "Local privilege escalation via execve()"

simonreiff — Sun, 10 May 2026 04:39:43 +0000

I agree with explicit parentheses but please be careful about assuming associativity! The risk when handling floating-point arithmetic in particular is that associativity breaks, and suddenly a + (b + c) does NOT equal (a + b) + c. Not only can this lead to unexpected and hard-to-trace failure patterns, but depending on the details, it can also introduce memory overflow/underflow vulnerabilities.
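To make the associativity point concrete, here is a minimal sketch in Python using ordinary IEEE 754 doubles. The specific values are chosen only for illustration, and the use of math.fsum as a mitigation is a suggestion rather than something from the thread.

```python
import math

# Same three values, two groupings.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # the two groupings differ in the last bit

# Mixing large and small magnitudes makes the gap obvious.
big, small = 1e16, 1.0
grouped_first = (big + -big) + small  # cancellation happens first, so small survives
grouped_last = big + (-big + small)   # small is absorbed into -big before cancellation
print(grouped_first, grouped_last)

# math.fsum tracks the lost low-order bits and returns a correctly rounded sum.
exact = math.fsum([big, -big, small])
print(exact)
```

Explicit parentheses pin down which of these results you get, but they do not make the two groupings interchangeable.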

New comment by simonreiff in "The React2Shell Story"

simonreiff — Sat, 09 May 2026 01:42:43 +0000

What a great write-up. Thanks for sharing how you found this fascinating vulnerability and exploit.

New comment by simonreiff in "Canvas online again as ShinyHunters threatens to leak schools’ data"

simonreiff — Fri, 08 May 2026 09:52:00 +0000

I'm sure you're right. Across tens (hundreds?) of thousands of institutions worldwide, each one is exercising its well-written incident runbook that not only gets updated regularly but also is rehearsed constantly, just in case something like this happens. After all, what university IT department DOESN'T prepare obsessively for the moment when they need to restore all grades on all assignments for all courses from backup and fall over to the backup system for final exam administration in any required format specified by any professor, in the second week of May, on a non-negotiable schedule? There's absolutely nothing to worry about here.

New comment by simonreiff in "AI file editing is broken"

simonreiff — Wed, 06 May 2026 14:38:54 +0000

I'm Simon, an attorney and partner at a boutique law firm in New York City, where I have been representing clients in high-stakes commercial and real estate disputes for almost 20 years. I've also been building software for many years, long before AI assistants existed, though these days, like most of you, I use AI coding agents regularly to boost productivity.

Last year, I hit a wall: I simply could not get my AI assistants to follow my instructions to edit my files accurately and reliably. I found this to be true regardless of which model or client I chose, no matter how well I documented, and across all file types.

I began focusing on a specific scenario: The agent echoes back a perfectly reasonable plan to revise a file, calls its tools, and announces completion, but the file is corrupted. Even if not literally broken, the diff shows wholesale replacements, many unrelated to the underlying issue, when small, surgical modifications were warranted. I call this last-mile failure pattern "Execution Slop".

After investigation, I concluded that Execution Slop cannot be fixed through prompt engineering or by paying for more expensive tokens, because the AI file-editing tools themselves are broken. All major AI coding assistants use the same string-replacement strategy for editing under the hood. Agents can't visualize their changes before committing them to disk, get no warning that they're about to break something, can't roll back changes atomically if they realize they made a mistake, and often can't even insert or delete at a specific line or line range (let alone a particular column), without also echoing everything around it.
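As a rough illustration of why string replacement forces an agent to echo surrounding text, here is a minimal sketch in Python. The function replace_exact is hypothetical and is not the implementation of any particular assistant; it only assumes the common rule that the search string must match exactly once.

```python
def replace_exact(text: str, old: str, new: str) -> str:
    """Hypothetical string-replacement edit: `old` must match exactly once."""
    count = text.count(old)
    if count != 1:
        raise ValueError(f"expected exactly 1 match for {old!r}, found {count}")
    return text.replace(old, new, 1)


source = "x = 0\nprint(x)\nx = 0\n"

# The intended change is to the second assignment only, but the short
# search string is ambiguous, so the edit is rejected.
try:
    replace_exact(source, "x = 0", "x = 1")
    ambiguous_rejected = False
except ValueError:
    ambiguous_rejected = True

# To disambiguate, the caller has to echo an unrelated neighbouring line
# in both the search string and the replacement.
edited = replace_exact(source, "print(x)\nx = 0", "print(x)\nx = 1")
print(edited)
```

The larger the echoed context, the more opportunities there are for a single mistyped character to either fail the match or rewrite text that was never meant to change.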

So I spent nearly a year building something completely different. HIC Mouse gives AI agents line- and coordinate-based editing through a natural syntax that allows agents to edit concisely by declaring region boundaries instead of forcing them to use string replacement. All multi-operation and large operations are automatically staged in memory before touching disk, triggering a Dialog Box mode, in which the agent can save, cancel, inspect, or refine. If something goes wrong, the agent can roll back edits atomically. If most of a batch succeeds but one operation fails, the agent can fix just the failure without discarding the rest. And agents are given embedded contextual guidance at every tool call.
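The general idea of line-addressed edits that are staged in memory and can be rolled back might be sketched as follows. This is not HIC Mouse's actual API or syntax: the StagedEdit class and its replace_lines, rollback, and save methods are hypothetical names invented for this illustration.

```python
import os
import tempfile
from pathlib import Path


class StagedEdit:
    """Hypothetical staging buffer: edits touch memory only until save()."""

    def __init__(self, path):
        self.path = Path(path)
        self.original = self.path.read_text().splitlines(keepends=True)
        self.staged = list(self.original)

    def replace_lines(self, start, end, new_lines):
        """Replace 1-indexed, inclusive lines start..end in the staged copy."""
        if not (1 <= start <= end <= len(self.staged)):
            raise IndexError(f"line range {start}-{end} is out of bounds")
        self.staged[start - 1:end] = new_lines

    def rollback(self):
        """Discard every staged change at once."""
        self.staged = list(self.original)

    def save(self):
        """Write to a temp file in the same directory, then swap it into place."""
        fd, tmp_name = tempfile.mkstemp(dir=self.path.parent)
        with os.fdopen(fd, "w") as handle:
            handle.writelines(self.staged)
        os.replace(tmp_name, self.path)
        self.original = list(self.staged)


with tempfile.TemporaryDirectory() as tmpdir:
    target = Path(tmpdir) / "example.txt"
    target.write_text("alpha\nbeta\ngamma\n")

    edit = StagedEdit(target)
    edit.replace_lines(2, 2, ["BETA\n"])
    on_disk_before_save = target.read_text()  # disk is untouched while staged

    edit.rollback()                           # drop the staged change
    edit.replace_lines(3, 3, ["GAMMA\n"])
    edit.save()
    on_disk_after_save = target.read_text()
```

Addressing the edit by line range means the caller never has to reproduce the text being replaced, and keeping a pristine copy makes the rollback a single assignment.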

To validate rigorously that HIC Mouse genuinely improves outcomes, I ran three preregistered confirmatory studies (N=67 paired runs) comparing Mouse-enabled AI assistants running in isolated Docker containers performing timed, realistic file-editing tasks ranging in difficulty, against identically configured agents using built-in editing tools. I've uploaded the technical report and statistical analysis with all the details, but the bottom line is that Mouse dramatically improved performance (Cohen's h > 2 or "massive" effect size on multiple metrics), across every dimension that I studied -- capability, speed, cost, reliability, and most importantly, accuracy.
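For readers unfamiliar with Cohen's h, it is the difference between two proportions after an arcsine transformation, h = 2·asin(√p1) − 2·asin(√p2), with conventional benchmarks of 0.2 (small), 0.5 (medium), and 0.8 (large). The sketch below computes it in Python; the proportions used are made up for illustration and are not figures from the studies.

```python
import math


def cohens_h(p1: float, p2: float) -> float:
    """Cohen's h effect size for two proportions in the range [0, 1]."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


# Illustrative proportions only: a 95% success rate versus a 5% success rate.
h_example = cohens_h(0.95, 0.05)
print(round(h_example, 2))

# The statistic is bounded: a 100% versus 0% comparison gives h = pi.
h_max = cohens_h(1.0, 0.0)
print(h_max)
```

Because h cannot exceed pi (about 3.14), a value above 2 requires the two proportions to sit near opposite ends of the scale.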

We have now officially launched, and HIC Mouse is available for download through the VS Code Marketplace and Open VSX. Mouse works with VS Code, Cursor, and Kiro, and it's compatible with GitHub Copilot, Claude Code, and other MCP clients.

Please consider installing HIC Mouse, and let me know what you think! I really hope that it genuinely makes a positive difference for you.

Now let me ask you: How often do you encounter Execution Slop, and what are you doing to avoid it?

AI file editing is broken

simonreiff — Wed, 06 May 2026 14:38:54 +0000

Article URL: https://hic-ai.com

Comments URL: https://news.ycombinator.com/item?id=48036814

Points: 2

# Comments: 1

New comment by simonreiff in "Zugzwang"

simonreiff — Sat, 02 May 2026 20:10:05 +0000

The Wikipedia article goes on to say that other authors describe the second type as a "squeeze" -- I think Kemp uses that term -- and only the mutual or reciprocal kind as a true "zugzwang". I can't remember if it was GM Edmar Mednis or IM Rafael Klovsky who told me many years ago that it's only the mutual scenario that qualifies as a "true" zugzwang, but I'm pretty sure it was one or both of them. Either way, the subject has divided chess authors almost since the inception of the term. You can see the Wikipedia article on the Immortal Zugzwang Game, for instance, which is one of the earliest famous examples of "zugzwang" and is featured in Nimzowitsch's classic treatise "My System", and at the same time, other famous players like GM Andy Soltis disagreed with the use of the term for that game.

A great article with some really beautiful examples of zugzwang is: https://www.chesshistory.com/winter/extra/zugzwang.html. There's a very nice discussion at the end as well of a disagreement along just these lines as to what truly constitutes zugzwang, between Hooper and Myers.

New comment by simonreiff in "Zugzwang"

simonreiff — Sat, 02 May 2026 18:28:56 +0000

Interestingly, many people will refer to zugzwang when one player only has losing moves and would love to skip their turn altogether, but that's not zugzwang. As a non-example of zugzwang, consider the position where White has Kb6 and Rc6, and Black just has Kb8. When White plays 1. Rc5, killing a move, Black has no choice but to play 1...Ka8, followed by 2. Rc8#. However, Black is not in zugzwang, because the position is not mutually bad: only Black is hurt by having to move. As a true example of zugzwang, consider the position where White has Kf5 and a pawn on e4, and Black has Kd4 and a pawn on e5. Now this position is zugzwang because whichever player has to make the next move loses defense of their pawn and with it, the game. For instance, if it's White to move, the game could continue 1. Kf6 Kxe4 2. Kg5 Kf3 3. Kf5 e4 and Black will simply march his e-pawn to the 1st rank, promote to a Queen, and checkmate shortly after.

New comment by simonreiff in "Our principles"

simonreiff — Mon, 27 Apr 2026 20:01:08 +0000

I believe Groucho Marx once said: "I'm a man of principles. If you don't like them, I have others!"

New comment by simonreiff in "Show HN: Smol machines – subsecond coldstart, portable virtual machines"

simonreiff — Fri, 17 Apr 2026 21:33:21 +0000

Hey this is pretty neat! I definitely would try using this for benchmarks and other places where I need strong isolation as Docker is just too bloated and slow, but sadly I don't think I can run this natively on my Windows laptop. I hope you extend to WSL! Good luck and congrats on launch.

New comment by simonreiff in "Middle schooler finds coin from Troy in Berlin"

simonreiff — Fri, 17 Apr 2026 21:22:00 +0000

Antiquity slop

New comment by simonreiff in "Claude Design"

simonreiff — Fri, 17 Apr 2026 19:28:14 +0000

Tailwind is fantastic precisely because the biggest benefit (tree-shaking to minimize the CSS that ships) massively outweighs the fact that Tailwind syntax "looks like" an anti-pattern and makes your code "look" ugly. Also, you get used to bundling your styling and JS code in one place with any component-driven framework like Next.js/React, and Tailwind works seamlessly with all of them. I guess I just prefer the benefits to the alternative, and I feel like the collateral damage of the alternative is definitely not worth trying to make front-end design code look simpler.

New comment by simonreiff in "We reproduced Anthropic's Mythos findings with public models"

simonreiff — Fri, 17 Apr 2026 15:25:29 +0000

I respectfully disagree that Mythos was important because of its findings of zero-day vulnerabilities. The point is that Mythos apparently can fully EXPLOIT the vulnerabilities found by putting together the actual attack scripts and executing them, often by taking advantage of disparate issues spread across multiple libraries or files. Lots of tools can and do identify plausible attack vectors reliably, including SASTs and AI-assisted analysis. The whole challenge to replicate Mythos, in my view, should focus on determining whether, on the precise conditions of a particular code base and configuration, the alleged vulnerability actually is reachable and can be exploited; and then, not just to evaluate or answer that question of reachability in the abstract, but to build a concrete implementation of a proof of concept demonstrating the vulnerability from end to end. It is my understanding from the Project Glasswing post that the latter is what Mythos is exceptionally good at doing, and it is what distinguishes SASTs and asking AI from the work done up until now only by a handful of cybersecurity experts. Up to this point, generating an exploit PoC, and not just ascertaining that one might be possible, has generally been possible using existing tools but has not been easy or achievable without a lot of work and oversight by a programmer experienced in cybersecurity exploits. I don't have any reason to doubt the conclusion that GPT-5.4 and Opus 4.6 can spot lots of the same issues that Mythos found. What I think would be genuinely interesting is if GPT-5.4 or Opus 4.6 also could be tested for their ability to generate a proof of concept of the attack. Generally, my experience has been that portions of the attack can be generated by those agents, but putting the whole thing together runs into two hurdles: 1. Guardrails, and 2. Overall difficulty, lack of imagination, lack of capability to implement all the disparate parts, etc.

I don't know if Mythos is capable of what is being claimed, but I do think it's important to understand why their claims are so significant. It's definitely NOT the mere ability to find possible exploits.

New comment by simonreiff in "US v. Heppner (S.D.N.Y. 2026) no attorney-client privilege for AI chats [pdf]"

simonreiff — Wed, 15 Apr 2026 17:35:54 +0000

It's not a communication if only one human person participates in the conversation. That's just enhanced note-taking and generating. I don't agree with the notion that talking to an LLM is disclosure to a third party because an LLM is neither a natural person nor even an artificial person recognized at law like a corporation, trust, LLC, etc.

New comment by simonreiff in "US v. Heppner (S.D.N.Y. 2026) no attorney-client privilege for AI chats [pdf]"

simonreiff — Wed, 15 Apr 2026 17:32:12 +0000

Attorney admitted in NY here. It's fascinating that Judge Rakoff likely would have come to the opposite conclusion if the Claude chat had been at the attorney's request or suggestion. I am surprised the court placed so much reliance on the Terms of Service, which are probably not so different from those of Outlook, Gmail, etc., yet nobody disputes that attorney-client emails remain privileged notwithstanding the Terms of Service of those providers. At least I have never seen anyone argue in NY that privilege is waived by emailing. And unlike sending an email to another person, chatting with Claude is a solo conversation more like organizing one's notes, which if in contemplation of obtaining legal advice seems privileged to me. I think this is a very close question and am not sure it would come out the same way in other courts or on even slightly different facts. Very interesting legal question.

New comment by simonreiff in "Open Source Isn't Dead. Cal.com Just Learned the Wrong Lesson"

simonreiff — Wed, 15 Apr 2026 17:17:13 +0000

Is there any recent research on whether open or closed-source projects are more secure? I am genuinely curious if anyone has studied the question.

New comment by simonreiff in "One item purchased, ten emails"

simonreiff — Wed, 08 Apr 2026 19:41:02 +0000

Gmail (at least in Google Workspace) and Outlook 365 both do threaded emails.

New comment by simonreiff in "Eight years of wanting, three months of building with AI"

simonreiff — Sun, 05 Apr 2026 22:47:49 +0000

Just wanted to say thanks to @brlee for the nice write-up and congrats on the release

New comment by simonreiff in "Claude Code Unpacked : A visual guide"

simonreiff — Wed, 01 Apr 2026 06:08:34 +0000

Nice site. I might suggest moving SendMessage to the Hidden Features as they don't appear to have implemented a ReadMessage or ListMessages tool.

New comment by simonreiff in "OpenAI closes funding round at an $852B valuation"

simonreiff — Tue, 31 Mar 2026 21:45:06 +0000

It's the demonstration layer

New comment by simonreiff in "Deploytarot.com – tarot card reading for deployments"

simonreiff — Thu, 26 Mar 2026 22:25:42 +0000

I don't even understand this but it's very nice