Hacker News: summarity

New comment by summarity in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

summarity — Mon, 06 Apr 2026 16:11:49 +0000

Not Claude Code specific, but I've been noticing this on Opus 4.6 models through Copilot and others as well. Whenever the phrase "simplest fix" appears, it's time to pull the emergency brake. This has gotten much, much worse over the past few weeks. It will produce completely useless code, knowingly (because up to that phrase the reasoning was correct) breaking things.

Today another thing started happening: phrases like "I've been burning too many tokens" or "this has taken too many turns". Ironically, overriding that takes even more tokens of custom instructions.

Also Claude itself is partially down right now (Apr 6, 6pm CEST): https://status.claude.com/

New comment by summarity in "Herbie: Automatically improve imprecise floating point formulas"

summarity — Sat, 04 Apr 2026 18:43:51 +0000

This is somewhat in line with the approach taken by some softfloat libraries, e.g. https://bigfloat.org/architecture.html

New comment by summarity in "Claude Code Found a Linux Vulnerability Hidden for 23 Years"

summarity — Sat, 04 Apr 2026 11:14:43 +0000

Related work from our security lab:

Stream of vulnerabilities discovered using security agents (23 so far this year): https://securitylab.github.com/ai-agents/

Taskflow harness to run (on your own terms): https://github.blog/security/how-to-scan-for-vulnerabilities...

New comment by summarity in "Claude Code Found a Linux Vulnerability Hidden for 23 Years"

summarity — Sat, 04 Apr 2026 11:11:03 +0000

Already happened: https://arxiv.org/abs/2407.08708

New comment by summarity in "Herbie: Automatically improve imprecise floating point formulas"

summarity — Sat, 04 Apr 2026 10:58:13 +0000

I posted this and it picked up steam over night, so I thought I'd add how I'm using it:

I work on 3D/4D math in F#. As part of the testing strategy for algorithms, I've set up a custom agent with an F# script that instruments Roslyn to find FP and FP-in-loop hotspots across the codebase.

The agent then reasons through the implementation and writes core expressions into an FPCore file next to the existing tests, running several passes and refining the preconditions (:pre) based on realistic caller input. This logs Herbie's proposed improvements as output FPCore transformations. The agent then reasons through solutions (which is required, since Herbie doesn't know the algorithm's design intent; see e.g. this for a good case study: https://pavpanchekha.com/blog/herbie-rust.html), and once convinced of a gap, creates additional unit tests and property tests (FsCheck/QuickCheck) to prove impact. Then every once in a while I review a batch to see what's next.

Generally there are multiple types of issues that can be flagged:

a) Expression-level imprecision over realistic input ranges: this is Herbie's core strength. Usually this catches "just copied the textbook formula" instances of naive math: cancellation, Inf/NaN propagation, etc. The fixes are consistently using fma for accumulation, biggest-factor scaling to prevent Inf, hypot use, etc.

b) Ill-conditioned algorithms. Sometimes the textbooks lie to you, and the algorithms themselves are unfit for purpose, especially in boundary regions. If there are multiple expressions that have <60% precision and only a 1 to 2% improvement across seeds, it's a good sign the algo is bad - there's no form that adequately performs on target inputs.

c) Round-off, accumulation errors. This is more a consequence of agent reasoning, but often happens after an apparent "100% -> 100%" pass. The agent is able to, via failing tests, identify parts of an algorithm that can benefit from upgrading the context to e.g. double-word arithmetic for additional precision.
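To make (a) and (c) concrete, here is a minimal Python sketch (not the F# code from this project, and not Herbie output). It shows biggest-factor scaling for a 2D norm and an error-free `two_sum` step, one building block of double-word style arithmetic. All function names are made up for illustration.

```python
import math

def naive_norm(x, y):
    # Textbook formula: x * x overflows to inf long before the result would.
    return math.sqrt(x * x + y * y)

def scaled_norm(x, y):
    # Biggest-factor scaling: divide by the largest magnitude first so the
    # squares stay <= 1, then scale the result back up.
    m = max(abs(x), abs(y))
    if m == 0.0 or math.isinf(m):
        return m
    a, b = x / m, y / m
    return m * math.sqrt(a * a + b * b)

def two_sum(a, b):
    # Error-free transformation (Knuth): s is the rounded sum,
    # err is the exact rounding error, so a + b == s + err.
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def naive_sum(values):
    # Plain left-to-right accumulation in a single double.
    total = 0.0
    for v in values:
        total += v
    return total

def compensated_sum(values):
    # Carry the rounding errors in a second double and fold them in at the end.
    hi, lo = 0.0, 0.0
    for v in values:
        hi, err = two_sum(hi, v)
        lo += err
    return hi + lo

if __name__ == "__main__":
    print(naive_norm(3e200, 4e200), scaled_norm(3e200, 4e200))
    data = [1e16, 1.0, -1e16]
    print(naive_sum(data), compensated_sum(data))
```

In practice `math.hypot` already handles the scaling case; the hand-written version is only there to show the idea.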

Herbie: Automatically improve imprecise floating point formulas

summarity — Tue, 31 Mar 2026 10:53:52 +0000

Article URL: https://herbie.uwplse.org/doc/latest/tutorial.html

Comments URL: https://news.ycombinator.com/item?id=47585469

Points: 197

# Comments: 44

New comment by summarity in "It Took Me 30 Years to Solve This VFX Problem – Green Screen Problem [video]"

summarity — Tue, 17 Mar 2026 23:53:20 +0000

Well sort of, the industry tried to go way beyond that by capturing the entire light field: https://techcrunch.com/2016/04/11/lytro-cinema-is-giving-fil...

New comment by summarity in "100 Jumps"

summarity — Fri, 13 Mar 2026 12:49:03 +0000

Finally, Desert Golf and Flappy Bird merged into one

New comment by summarity in "Thermal Grizzly was scammed twice on raw materials worth €40k"

summarity — Mon, 09 Mar 2026 07:13:41 +0000

This guy's factory is just across the lake from where I live and this is painful to watch. Both Alibaba and the general local industry (metal fabs, train shops, etc.) have high degrees of expertise in supply chain verification. You can hire (heck even bribe) experts along the way to reduce fuck ups. The video contained no mention of any audits, any additional paperwork beyond some pictures.

I once had a company that procured very simple electronics (fingerprint readers) from Taiwan and due diligence included travelling there, meeting every single person in the engineering office in person, then touring the contract factory where this would be built, then negotiating shipping and even driver development details.

This took all of one week and the price of a few plane tickets. We didn’t have the cash for professional auditors. In the end we got a product that worked, and even at a lower price (negotiating at a distance is not effective).

New comment by summarity in "Turn Dependabot off"

summarity — Sat, 21 Feb 2026 11:11:20 +0000

No engine can be 100% perfect of course, but the original comment is broadly accurate. CodeQL builds a full semantic database including types and dataflow from source code, then runs queries against that. QL is fundamentally a logic programming language that is only concerned with the satisfiability of the given constraint.

If dataflow is not provably connected from source to sink, an alert is impossible. If a sanitization step interrupts the flow of potentially tainted data, the alert is similarly discarded.
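As a toy illustration of that source/sink/sanitizer rule: the sketch below is plain graph reachability in Python, not QL and nowhere near CodeQL's actual analysis. The graph, node names, and `has_tainted_path` are all hypothetical.

```python
from collections import deque

def has_tainted_path(edges, source, sink, sanitizers=frozenset()):
    """Return True if data can flow from source to sink without
    passing through any sanitizer node."""
    if source in sanitizers:
        return False
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            return True
        for nxt in edges.get(node, ()):
            # A sanitizer interrupts the flow: never traverse through it.
            if nxt in seen or nxt in sanitizers:
                continue
            seen.add(nxt)
            queue.append(nxt)
    return False

if __name__ == "__main__":
    # Hypothetical dataflow edges: user_input -> escaped -> sql_query
    flow = {"user_input": ["escaped"], "escaped": ["sql_query"]}
    print(has_tainted_path(flow, "user_input", "sql_query"))
    print(has_tainted_path(flow, "user_input", "sql_query", {"escaped"}))
```

With no sanitizer modeled, the path exists and an alert would fire; once `escaped` is recognized as a sanitizer, no path remains and nothing is reported.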

The end-to-end precision of the detection depends on the queries executed, the models of the libraries used in the code (to e.g., recognize the correct sanitizers), and other parameters. All of this is customizable by users.

All that can be overwhelming though, so we aim to provide sane defaults. On GitHub, you can choose between a "Default" and "Extended" suite. Those are tuned for different levels of potential FN/FP based on the precision of the query and severity of the alert.

Severities are calculated based on the weaknesses the query covers, and the real CVEs these have caused in prior disclosed vulnerabilities.

QL-language-focused resources for CodeQL: https://codeql.github.com/

New comment by summarity in "Turn Dependabot off"

summarity — Sat, 21 Feb 2026 10:55:39 +0000

Heyo, I'm the Product Director for detection & remediation engines, including CodeQL.

I would love to hear what kind of local experience you're looking for and where CodeQL isn't working well today.

As a general overview:

The CodeQL CLI is developed as an open-source project and can run CodeQL basically anywhere. The engine is free to use for all open-source projects, and free for all security researchers.

The CLI is available as release downloads, in homebrew, and as part of many deployment frameworks: https://github.com/advanced-security/awesome-codeql?tab=read...

Results are stored in standard formats and can be viewed and processed by any SARIF-compatible tool. We provide tools to run CodeQL against thousands of open-source repos for security research.

The repo linked above points to dozens of other useful projects (both from GitHub and the community around CodeQL).
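Because the results are SARIF (plain JSON), post-processing needs nothing beyond a JSON parser. A minimal sketch, assuming the SARIF 2.1.0 layout (`runs[].results[]` with `ruleId` and `locations[].physicalLocation`). The sample document, file path, and `summarize_sarif` are made up for illustration and are not real CodeQL output.

```python
import json

# Hand-written sample in SARIF 2.1.0 shape (illustrative only).
SAMPLE_SARIF = """
{
  "version": "2.1.0",
  "runs": [
    {
      "tool": {"driver": {"name": "CodeQL"}},
      "results": [
        {
          "ruleId": "py/sql-injection",
          "message": {"text": "Query built from user-controlled input."},
          "locations": [
            {
              "physicalLocation": {
                "artifactLocation": {"uri": "app/db.py"},
                "region": {"startLine": 42}
              }
            }
          ]
        }
      ]
    }
  ]
}
"""

def summarize_sarif(doc):
    """Flatten a parsed SARIF document into (tool, rule, file, line) tuples."""
    rows = []
    for run in doc.get("runs", []):
        tool = run.get("tool", {}).get("driver", {}).get("name", "unknown")
        for result in run.get("results", []):
            # Only the first location is used; results may omit locations.
            locations = result.get("locations") or [{}]
            physical = locations[0].get("physicalLocation", {})
            uri = physical.get("artifactLocation", {}).get("uri", "?")
            line = physical.get("region", {}).get("startLine", 0)
            rows.append((tool, result.get("ruleId", "?"), uri, line))
    return rows

if __name__ == "__main__":
    for row in summarize_sarif(json.loads(SAMPLE_SARIF)):
        print(row)
```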

Opus 4.6 Fast Mode – 2.5x better token throughput in Copilot

summarity — Mon, 09 Feb 2026 11:42:41 +0000

Article URL: https://github.blog/changelog/2026-02-07-claude-opus-4-6-fast-is-now-in-public-preview-for-github-copilot/

Comments URL: https://news.ycombinator.com/item?id=46944190

Points: 1

# Comments: 0

New comment by summarity in "Show HN: Ghidra MCP Server – 110 tools for AI-assisted reverse engineering"

summarity — Wed, 04 Feb 2026 11:29:12 +0000

That's why I put it in quotes. In no way am I equating anything. Making the inner workings visible is what I was referring to.

New comment by summarity in "Show HN: Ghidra MCP Server – 110 tools for AI-assisted reverse engineering"

summarity — Wed, 04 Feb 2026 09:45:49 +0000

I've been using it (the original 15 tool version) for months now. It’s amazing. Any app's inner workings are suddenly transparent. I can track down bugs, get a deeper understanding of any tool, and even write plug-ins or preload shims that mod any app. It’s like I finally actually _own_ the software I bought years ago.

For Objective-C-heavy code, I also use Hopper Disassembler (which now has a built-in MCP server).

Some related academic work (full recompilation with LLMs and Ghidra): https://dl.acm.org/doi/10.1145/3728958

TKO – Knockout.js Revived for 4.0

summarity — Sun, 01 Feb 2026 13:29:25 +0000

Article URL: https://www.tko.io/

Comments URL: https://news.ycombinator.com/item?id=46846079

Points: 2

# Comments: 1

New comment by summarity in "Heathrow scraps liquid container limit"

summarity — Tue, 27 Jan 2026 10:57:06 +0000

Not just the presence but the material itself: https://www.smithsdetection.com/products/sdx-10060-xdi/

It’s X-ray diffraction

New comment by summarity in "Heathrow scraps liquid container limit"

summarity — Tue, 27 Jan 2026 10:54:36 +0000

They’re multi-wavelength CT. Basically whenever you see a 4:3 box with a “smiths” logo over the belt it’s going to be a pretty painless process (take nothing out except analog film).

Adrian Kosmaczewski – Being a Developer After 40 [video]

summarity — Sat, 24 Jan 2026 20:48:09 +0000

Article URL: https://www.youtube.com/watch?v=GQx_beRMHVg

Comments URL: https://news.ycombinator.com/item?id=46747509

Points: 2

# Comments: 0

New comment by summarity in "A Year of 3D Printing"

summarity — Fri, 23 Jan 2026 00:10:22 +0000

Flashforge is releasing theirs in the next few weeks.

Aphros-WASM: Computational fluid dynamics solver in the browser

summarity — Thu, 01 Jan 2026 13:41:48 +0000

Article URL: https://cselab.github.io/aphros/wasm/aphros.html

Comments URL: https://news.ycombinator.com/item?id=46454032

Points: 1

# Comments: 0