Hacker News: moyix

New comment by moyix in "Project Glasswing: An Initial Update"

moyix — Fri, 22 May 2026 20:24:20 +0000

I think you're confusing CVEs and vulnerabilities here? Mozilla (per their longstanding practice) grouped multiple vulnerabilities found internally under a small number of CVEs.

Reverse Engineering SimTower

moyix — Wed, 29 Apr 2026 12:45:14 +0000

Article URL: https://phulin.me/blog/simtower

Comments URL: https://news.ycombinator.com/item?id=47947552

Points: 4

# Comments: 1

New comment by moyix in "The zero-days are numbered"

moyix — Wed, 22 Apr 2026 19:14:21 +0000

On hardened targets and Firecracker specifically, here's a recent vulnerability found by "Anthropic": https://aws.amazon.com/security/security-bulletins/2026-015-...

Unfortunately it's unclear whether it was Mythos, an earlier model, or even an eagle-eyed employee.

I tend to agree that bug squashing your way to perfectly secure software is unlikely, but there are plenty of projects that managed to fuzz/test/audit their way to making it much harder to find serious vulnerabilities. If we can do the same again with LLMs in a way that leaves the remaining vulnerabilities out of reach of anyone except extremely skilled humans (perhaps with LLM assistance) then that's still an OK outcome that buys us time to build stronger foundations.

New comment by moyix in "Vulnerability research is cooked"

moyix — Mon, 30 Mar 2026 21:31:50 +0000

It's limiting from the PoV of a developer who wants to ensure that their own code is free of all security issues. It is not limiting from the point of view of an attacker who just needs one good memory safety vuln to win.

New comment by moyix in "Vulnerability research is cooked"

moyix — Mon, 30 Mar 2026 21:16:01 +0000

This is true for a lot of things but for low-level code you can always fall back to "the intention is to not violate memory safety".

New comment by moyix in "Letting Claude play text adventures"

moyix — Wed, 21 Jan 2026 22:11:57 +0000

Also, unlike OpenAI's, Anthropic's prompt caching is explicit (you can place up to 4 cache "breakpoints" in a request), meaning that if you don't implement caching, you don't benefit from it.
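To make "explicit" concrete, here is a minimal sketch of a request body for a text-adventure loop. It assumes the Messages API shape as I understand it, where a content block is marked as a breakpoint with `"cache_control": {"type": "ephemeral"}`. The helper names `build_request` and `count_breakpoints` and the model string are hypothetical placeholders, and nothing here calls the API. If you never add a `cache_control` marker, nothing gets cached.

```python
import json

MAX_BREAKPOINTS = 4  # per-request limit on cache_control markers


def build_request(system_prompt, transcript, model="MODEL_NAME_PLACEHOLDER"):
    """Build a Messages-style request body with explicit cache breakpoints.

    transcript is a list of (role, text) pairs, e.g. the game log so far.
    One breakpoint goes on the static system prompt and one on the newest
    turn, so the prefix up to that turn can be cached and reused next call.
    """
    system = [{
        "type": "text",
        "text": system_prompt,
        "cache_control": {"type": "ephemeral"},  # breakpoint 1
    }]
    messages = [
        {"role": role, "content": [{"type": "text", "text": text}]}
        for role, text in transcript
    ]
    if messages:
        # breakpoint 2: everything up to and including the latest turn
        messages[-1]["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return {"model": model, "max_tokens": 1024, "system": system, "messages": messages}


def count_breakpoints(body):
    """Count how many content blocks carry a cache_control marker."""
    blocks = body["system"] + [b for m in body["messages"] for b in m["content"]]
    return sum("cache_control" in b for b in blocks)


body = build_request(
    "You are playing a text adventure.",
    [("user", "> look"), ("assistant", "You are in a maze."), ("user", "> go north")],
)
assert count_breakpoints(body) <= MAX_BREAKPOINTS
print(json.dumps(body, indent=2))
```

Each new turn moves the second breakpoint forward, so the growing transcript is mostly read from cache instead of being reprocessed in full.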

New comment by moyix in "The coming industrialisation of exploit generation with LLMs"

moyix — Mon, 19 Jan 2026 23:20:06 +0000

There is filtering mentioned, it's just not done by a human:

> I have written up the verification process I used for the experiments here, but the summary is: an exploit tends to involve building a capability to allow you to do something you shouldn’t be able to do. If, after running the exploit, you can do that thing, then you’ve won. For example, some of the experiments involved writing an exploit to spawn a shell from the Javascript process. To verify this the verification harness starts a listener on a particular local port, runs the Javascript interpreter and then pipes a command into it to run a command line utility that connects to that local port. As the Javascript interpreter has no ability to do any sort of network connections, or spawning of another process in normal execution, you know that if you receive the connect back then the exploit works as the shell that it started has run the command line utility you sent to it.

It is more work to build such "perfect" verifiers, and they don't apply to every vulnerability type (how do you write a Python script to detect a logic bug in an arbitrary application?), but for bugs like these where the exploit goal is very clear (exec code or write arbitrary content to a file) they work extremely well.
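The connect-back check described in the quote can be sketched in a few lines. This is a minimal illustration, not the author's harness: `verify_connect_back` and the `run_exploit(port)` hook are hypothetical names, and a real harness would launch the JavaScript interpreter with the candidate exploit and pipe in a command that connects to the given port. The key property is the same as in the quote: the target can't open network connections on its own, so any connection that arrives means the exploit worked.

```python
import socket
import threading


def verify_connect_back(run_exploit, timeout=5.0):
    """Return True iff something connects back to a fresh local listener.

    run_exploit(port) is a hypothetical hook that runs the target with the
    candidate exploit and, via the spawned shell, connects to 127.0.0.1:port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)
        listener.settimeout(timeout)
        port = listener.getsockname()[1]
        # Run the target concurrently while we wait for the connect-back.
        threading.Thread(target=run_exploit, args=(port,), daemon=True).start()
        try:
            conn, _ = listener.accept()
        except socket.timeout:
            return False  # no connection: the exploit did not get a shell
        conn.close()
        return True


# Stand-ins for real exploit runs (hypothetical):
def fake_working_exploit(port):
    socket.create_connection(("127.0.0.1", port)).close()


def fake_failing_exploit(port):
    pass


print(verify_connect_back(fake_working_exploit))              # True
print(verify_connect_back(fake_failing_exploit, timeout=0.5))  # False
```

Because the verdict comes from an observable side effect rather than from the model's own report, a hallucinated "exploit" simply times out.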

New comment by moyix in "'World Models,' an old idea in AI, mount a comeback"

moyix — Tue, 02 Sep 2025 18:54:31 +0000

Note that MuZero did better than AlphaGo, without access to preprogrammed rules: https://en.wikipedia.org/wiki/MuZero

New comment by moyix in "Passkeys are just passwords that require a password manager"

moyix — Tue, 05 Aug 2025 02:12:08 +0000

There's also a FIDO standard in the works for how to export passkeys: https://blog.1password.com/fido-alliance-import-export-passk...

New comment by moyix in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

moyix — Tue, 24 Jun 2025 19:25:42 +0000

The main difference is that all of the vulnerabilities reported here are real, many quite critical (XXE, RCE, SQLi, etc.). To be fair, there were definitely a lot of XSS findings, but that's mainly because it's a really common vulnerability.

New comment by moyix in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

moyix — Tue, 24 Jun 2025 19:22:31 +0000

All of these reports came with executable proof of the vulnerabilities – otherwise, as you say, you get flooded with hallucinated junk like the poor curl dev. This is one of the things that makes offensive security an actually good use case for AI – exploits serve as hard evidence that the LLM can't fake.

New comment by moyix in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

moyix — Tue, 24 Jun 2025 19:19:24 +0000

Wait a sec, I thought they were optional?

> White Paper/Slide Deck/Supporting Materials (optional)

> • If you have a completed white paper or draft, slide deck, or other supporting materials, you can optionally provide a link for review by the board.

> • Please note: Submission must be self-contained for evaluation, supporting materials are optional.

> • PDF or online viewable links are preferred, where no authentication/log-in is required.

(From the link on the BHUSA CFP page, which confusingly goes to the BH Asia doc: https://i.blackhat.com/Asia-25/BlackHat-Asia-2025-CFP-Prepar... )

New comment by moyix in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

moyix — Tue, 24 Jun 2025 19:12:19 +0000

Yeah, it's been very strange being on the other side of that after 10 years in academia! But it's totally reasonable for people to be skeptical when there's a bunch of money sloshing around.

I'll see if I can get time to do a paper to accompany the BH talk. And hopefully the agent traces of individual vulns will also help.

New comment by moyix in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

moyix — Tue, 24 Jun 2025 19:09:24 +0000

This is discussed in the post. Many came down to individual programs' policies, e.g., not accepting the vulnerability if it was in a third-party product they used (but still hosted by them), duplicates (another researcher reported the same vuln at the same time; there's not really any way to avoid this), or not accepting some classes of vuln, like cache poisoning.

New comment by moyix in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

moyix — Tue, 24 Jun 2025 19:04:07 +0000

We've got a bunch of agent traces on the front page of the website right now. We've also done writeups on individual vulnerabilities found by the system, mostly in open source so far (we did some fun scans of OSS projects found on Docker Hub). We have a bunch more coming up about the vulns found in bug bounty targets. Unfortunately, the latter are bottlenecked by getting approval from the affected companies.

Some of my favorites from what we've released so far:

- Exploitation of an n-day RCE in Jenkins, where the agent managed to figure out the challenge environment was broken and used the RCE exploit to debug the server environment and work around the problem to solve the challenge: https://xbow.com/#debugging--testing--and-refining-a-jenkins...

- Authentication bypass in Scoold that allowed reading the server config (including API keys) and arbitrary file read: https://xbow.com/blog/xbow-scoold-vuln/

- The first post about our HackerOne findings, an XSS in Palo Alto Networks GlobalProtect VPN portal used by a bunch of companies: https://xbow.com/blog/xbow-globalprotect-xss/

New comment by moyix in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

moyix — Tue, 24 Jun 2025 18:52:01 +0000

You should come to my upcoming BlackHat talk on how we did this while avoiding false positives :D

https://www.blackhat.com/us-25/briefings/schedule/#ai-agents...

New comment by moyix in "Too Many Open Files"

moyix — Fri, 06 Jun 2025 19:58:51 +0000

I made a CTF challenge based on that lovely feature of select() :D You could use the out-of-bounds bitset memory corruption to flip bits in an RSA public key in a way that made it factorable, generate the corresponding private key, and use that to authenticate.

https://threadreaderapp.com/thread/1723398619313603068.html
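The arithmetic behind the out-of-bounds write is simple. With glibc's default FD_SETSIZE of 1024, an fd_set is a 128-byte bitset, and FD_SET(fd) ORs in bit `fd % NFDBITS` of word `fd / NFDBITS` with no bounds check (assuming the binary isn't built with _FORTIFY_SOURCE, which can abort on such descriptors). A process with more than 1024 open files therefore sets a bit past the end of the set. The sketch below assumes a 64-bit little-endian machine; `fd_set_target` is a hypothetical helper, and the comment doesn't describe the challenge's actual memory layout.

```python
FD_SETSIZE = 1024  # glibc default: number of bits in an fd_set
NFDBITS = 64       # bits per fd_mask word (a long) on a 64-bit system


def fd_set_target(fd):
    """Where FD_SET(fd, &set) writes, relative to the start of the fd_set.

    Mirrors glibc's macros: word = fd / NFDBITS, mask = 1 << (fd % NFDBITS).
    Assumes little-endian byte order. Returns (byte_offset, bit, in_bounds).
    """
    word, bit_in_word = divmod(fd, NFDBITS)
    byte_offset = word * (NFDBITS // 8) + bit_in_word // 8
    bit = bit_in_word % 8
    in_bounds = byte_offset < FD_SETSIZE // 8  # sizeof(fd_set) == 128 bytes
    return byte_offset, bit, in_bounds


for fd in (3, 1023, 1024, 1100):
    print(fd, fd_set_target(fd))
```

FD_SET can only turn bits on, so each descriptor number past 1023 sets one chosen bit in whatever sits after the set in memory.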

New comment by moyix in "I used o3 to find a remote zeroday in the Linux SMB implementation"

moyix — Sat, 24 May 2025 22:16:44 +0000

With security vulnerabilities, you naturally don't give the agent the ability to modify the potentially vulnerable software. Instead, you make it do what an attacker would have to do: come up with an input that, when sent to the unmodified program, triggers the vulnerability.

How do you know if it triggered the vulnerability? Luckily, for low-level memory safety issues like the ones Sean (and o3) found, we have very good oracles for detecting memory safety violations, like KASAN, so you can basically just let the agent throw inputs at ksmbd until you see something that looks kind of like this: https://groups.google.com/g/syzkaller/c/TzmTYZVXk_Q/m/Tzh7SN...
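Using KASAN as the oracle boils down to a loop: send an input to the unmodified target, then check the kernel log for a report. KASAN reports start with a header of the form `BUG: KASAN: <bug-type> in <function>+<offset>`. In this minimal sketch, `send_input` and `read_new_log` are hypothetical hooks (for example, one that sends bytes to ksmbd over SMB and one that returns new dmesg lines), and `fuzz_until_kasan` is a hypothetical name. It is not how the author or the agent actually did it.

```python
import re

# KASAN report header, e.g. "BUG: KASAN: slab-out-of-bounds in func+0x10/0x20"
KASAN_HEADER = re.compile(r"BUG: KASAN: ([\w-]+) in (\S+)")


def find_kasan_report(log_text):
    """Return (bug_type, location) of the first KASAN report, or None."""
    m = KASAN_HEADER.search(log_text)
    return (m.group(1), m.group(2)) if m else None


def fuzz_until_kasan(candidates, send_input, read_new_log):
    """Feed candidate inputs to the target until KASAN reports a bug.

    Returns (triggering_input, (bug_type, location)), or None if none fire.
    """
    for data in candidates:
        send_input(data)                       # hypothetical: deliver to target
        report = find_kasan_report(read_new_log())  # hypothetical: new dmesg text
        if report is not None:
            return data, report
    return None
```

The oracle is what keeps the agent honest: it can't claim a memory safety bug unless the sanitizer in the unmodified kernel actually fires.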

New comment by moyix in "I used o3 to find a remote zeroday in the Linux SMB implementation"

moyix — Sat, 24 May 2025 18:11:51 +0000

He did do exactly what you say – except right after that, while reviewing the outputs, he found that it had also discovered a different 0day.

New comment by moyix in "AI is stifling new tech adoption?"

moyix — Fri, 14 Feb 2025 14:22:18 +0000

One thing that is interesting is that this was anticipated by the OpenAI Codex paper (which led to GitHub Copilot) all the way back in 2021:

> Users might be more inclined to accept the Codex answer under the assumption that the package it suggests is the one with which Codex will be more helpful. As a result, certain players might become more entrenched in the package market and Codex might not be aware of new packages developed after the training data was originally gathered. Further, for already existing packages, the model may make suggestions for deprecated methods. This could increase open-source developers’ incentive to maintain backward compatibility, which could pose challenges given that open-source projects are often under-resourced (Eghbal, 2020; Trinkenreich et al., 2021).

https://arxiv.org/pdf/2107.03374 (Appendix H.4)