Hacker News: mufeedvh

New comment by mufeedvh in "N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?"

mufeedvh — Mon, 13 Apr 2026 22:37:41 +0000

This is a good idea.

Will incorporate false-positive rates into the rubric from the next run onwards.

At winfunc, we spent a lot of research time taming these models to eradicate false positives (the rate is high!), so this does feel important enough to be documented. Thanks!

N-Day-Bench – Can LLMs find real vulnerabilities in real codebases?

mufeedvh — Mon, 13 Apr 2026 21:54:03 +0000

N-Day-Bench tests whether frontier LLMs can find known security vulnerabilities in real repository code. Each month it pulls fresh cases from GitHub security advisories, checks out the repo at the last commit before the patch, and gives models a sandboxed bash shell to explore the codebase.

Static vulnerability discovery benchmarks become outdated quickly. Cases leak into training data, and scores start measuring memorization. The monthly refresh keeps the test set ahead of contamination — or at least makes the contamination window honest.

Each case runs three agents: a Curator reads the advisory and builds an answer key, a Finder (the model under test) gets 24 shell steps to explore the code and write a structured report, and a Judge scores the blinded submission. The Finder never sees the patch. It starts from sink hints and must trace the bug through actual code.

Only repos with 10k+ stars qualify. A diversity pass prevents any single repo from dominating the set. Ambiguous advisories (merge commits, multi-repo references, unresolvable refs) are dropped.
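A rough sketch of how the selection rules above could be expressed. This is not the benchmark's actual code: the `Advisory` fields, the `select_cases` helper, and especially the `max_per_repo` cap are assumptions made for illustration, since the post does not say how the diversity pass is implemented.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Advisory:
    # All field names are hypothetical, not the benchmark's real schema.
    repo: str
    stars: int
    is_merge_commit: bool = False
    repo_count: int = 1         # number of repos the advisory references
    ref_resolved: bool = True   # whether the pre-patch commit could be resolved


def is_ambiguous(adv: Advisory) -> bool:
    """Merge commits, multi-repo references, and unresolvable refs are dropped."""
    return adv.is_merge_commit or adv.repo_count > 1 or not adv.ref_resolved


def select_cases(advisories, min_stars=10_000, max_per_repo=2):
    """Keep popular, unambiguous advisories, capping the cases taken per repo."""
    per_repo = Counter()
    selected = []
    for adv in advisories:
        if adv.stars < min_stars or is_ambiguous(adv):
            continue
        if per_repo[adv.repo] >= max_per_repo:
            continue  # diversity pass, assumed here to be a simple per-repo cap
        per_repo[adv.repo] += 1
        selected.append(adv)
    return selected
```

A per-repo cap is only one way to stop a single project from dominating; the real pass may weight by language, advisory type, or something else entirely.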

Currently evaluating GPT-5.4, Claude Opus 4.6, Gemini 3.1 Pro, GLM-5.1, and Kimi K2.5. All traces are public.

Methodology: https://ndaybench.winfunc.com/methodology

Live Leaderboard: https://ndaybench.winfunc.com/leaderboard

Live Traces: https://ndaybench.winfunc.com/traces

Comments URL: https://news.ycombinator.com/item?id=47758347

Points: 78

# Comments: 26

New comment by mufeedvh in "I made a programming language with M&Ms"

mufeedvh — Mon, 09 Mar 2026 06:00:39 +0000

thank you! :)

New comment by mufeedvh in "I made a programming language with M&Ms"

mufeedvh — Mon, 09 Mar 2026 05:16:04 +0000

Funny you mention that, because yes, a combinator-style encoding is probably a cleaner fit for the “only six colors” constraint than my stack machine. I hacked together a tiny SKI-flavored M&M reducer as a proof of concept: B=S, G=K, R=I, Y=(, O=), and N... is a free atom, so `B G G NNN` reduces to `a2`.

Gist: https://gist.github.com/mufeedvh/db930a423fdce8c1d8e495c7a3f...
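For readers who want to see the idea without opening the gist, here is a minimal sketch of an SKI reducer using the color mapping from the comment. It is not the code in the gist. In particular, it assumes that a run of k N's names the atom `a{k-1}`, which is a guess chosen only because it matches the `NNN` to `a2` example, and it assumes tokens are separated by whitespace.

```python
COLORS = {"B": "S", "G": "K", "R": "I", "Y": "(", "O": ")"}


def tokenize(src):
    """Map whitespace-separated color words to SKI tokens."""
    tokens = []
    for word in src.split():
        if word in COLORS:
            tokens.append(COLORS[word])
        elif set(word) == {"N"}:
            tokens.append(f"a{len(word) - 1}")  # assumed atom naming
        else:
            raise ValueError(f"unknown token: {word}")
    return tokens


def parse(tokens):
    """Build left-associative application trees: a str is a leaf, (f, x) applies f to x."""
    def parse_seq(pos):
        term = None
        while pos < len(tokens) and tokens[pos] != ")":
            if tokens[pos] == "(":
                item, pos = parse_seq(pos + 1)
                if pos >= len(tokens):
                    raise ValueError("unbalanced parentheses")
                pos += 1  # consume ")"
            else:
                item, pos = tokens[pos], pos + 1
            term = item if term is None else (term, item)
        if term is None:
            raise ValueError("empty expression")
        return term, pos

    term, pos = parse_seq(0)
    if pos != len(tokens):
        raise ValueError("unbalanced parentheses")
    return term


def step(t):
    """Perform one leftmost-outermost reduction; return (term, changed)."""
    if isinstance(t, str):
        return t, False
    f, x = t
    if f == "I":                                   # I x -> x
        return x, True
    if isinstance(f, tuple):
        g, y = f
        if g == "K":                               # K y x -> y
            return y, True
        if isinstance(g, tuple) and g[0] == "S":   # S a y x -> a x (y x)
            return ((g[1], x), (y, x)), True
    nf, changed = step(f)
    if changed:
        return (nf, x), True
    nx, changed = step(x)
    return (f, nx), changed


def run(src, max_steps=1000):
    """Reduce an M&M program to normal form, giving up after max_steps."""
    term = parse(tokenize(src))
    for _ in range(max_steps):
        term, changed = step(term)
        if not changed:
            return term
    raise RuntimeError("no normal form within step limit")


def show(t):
    return t if isinstance(t, str) else f"({show(t[0])} {show(t[1])})"
```

With this sketch, `show(run("B G G NNN"))` gives `a2`: S K K behaves as the identity, so the atom comes back unchanged.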

New comment by mufeedvh in "I made a programming language with M&Ms"

mufeedvh — Mon, 09 Mar 2026 04:35:18 +0000

Yes, for messy real-world photos a lightweight CNN would probably outperform the deterministic decoder, but I’d still use it in a hybrid pipeline with classic CV for blob detection and deterministic logic for reconstructing the actual program.

New comment by mufeedvh in "I made a programming language with M&Ms"

mufeedvh — Mon, 09 Mar 2026 01:08:51 +0000

Yes! Just make sure to take the photo on a plain white surface.

With:

  uv run mnm decompile path/to/photo.png --mode photo

New comment by mufeedvh in "I made a programming language with M&Ms"

mufeedvh — Sun, 08 Mar 2026 21:47:37 +0000

Author of this silly project here!

Sharing a bit of backstory on why I decided to work on this: partly “for fun”, but primarily because I felt like I was losing the childlike wonder/whimsy I once had with programming.

So I started this new hobby where I ask myself “can I hack on this?” upon getting/seeing something.

For instance, I got this new Aula F75 keyboard (really good keyboard for the price btw, it sounds good too!) and it only has dedicated control software for Windows. So I downloaded the driver files, software executable, and manual sheet and reverse engineered the full protocol/packets and rebuilt it for my Mac. Then played snake with the backlights. Fun.

Anywho, happy to see my blog on the front page. Would love to hear if anyone's going through something similar or working on silly little projects! :)

I made a programming language with M&Ms

mufeedvh — Sat, 07 Mar 2026 02:13:13 +0000

Article URL: https://mufeedvh.com/posts/i-made-a-programming-language-with-mnms/

Comments URL: https://news.ycombinator.com/item?id=47283723

Points: 5

# Comments: 0

How a single typo led to RCE in Firefox

mufeedvh — Fri, 13 Feb 2026 19:55:16 +0000

Article URL: https://kqx.io/post/firefox0day/

Comments URL: https://news.ycombinator.com/item?id=47006974

Points: 3

# Comments: 0

The Recent 0-Days in Node.js and React Were Found by an AI

mufeedvh — Tue, 03 Feb 2026 04:04:42 +0000

Article URL: https://winfunc.com/blog/recent-0-days-in-nodejs-and-react-were-found-by-an-ai

Comments URL: https://news.ycombinator.com/item?id=46866345

Points: 2

# Comments: 0

New Vulnerability in React Server Components – CVE-2026-23864

mufeedvh — Mon, 26 Jan 2026 21:11:39 +0000

Article URL: https://vercel.com/changelog/summary-of-cve-2026-23864

Comments URL: https://news.ycombinator.com/item?id=46771593

Points: 3

# Comments: 0

New comment by mufeedvh in "Show HN: Code2prompt – Generate LLM prompts from your codebase"

mufeedvh — Wed, 13 Mar 2024 00:39:52 +0000

Yes, this just depends on the model you're using. Small-to-medium-sized codebases would fit inside Claude's 200K context window, and Gemini 1.5 has a 1M context window, which would fit essentially 99% of codebases.

For reference:

- The Flask web framework for Python: 131880 tokens

- The Spring Framework for Java: 11070559 tokens
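As a quick sanity check on those numbers, the arithmetic can be sketched like this. The token counts and window sizes are the ones quoted above; the `fits` and `report` helpers and the `reserve` parameter (room left for the instructions and the model's reply) are illustrative assumptions.

```python
# Context window sizes quoted in the comment, in tokens.
CONTEXT_WINDOWS = {"Claude": 200_000, "Gemini 1.5": 1_000_000}

# Codebase token counts quoted in the comment.
CODEBASES = {"Flask": 131_880, "Spring Framework": 11_070_559}


def fits(tokens, window, reserve=0):
    """True if the codebase plus a reserved budget fits in the window."""
    return tokens + reserve <= window


def report(reserve=0):
    """Map (codebase, model) to whether the codebase fits that model's window."""
    return {
        (code, model): fits(tokens, window, reserve)
        for code, tokens in CODEBASES.items()
        for model, window in CONTEXT_WINDOWS.items()
    }
```

By these figures Flask fits in either window, while the Spring Framework is roughly 11 times larger than the 1M window, so it would have to be split or filtered first.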

New comment by mufeedvh in "Show HN: Code2prompt – Generate LLM prompts from your codebase"

mufeedvh — Tue, 12 Mar 2024 20:55:09 +0000

Merged, thank you! :)

New comment by mufeedvh in "Show HN: Code2prompt – Generate LLM prompts from your codebase"

mufeedvh — Tue, 12 Mar 2024 16:22:13 +0000

For small codebases, you can run this tool on the entire directory and it will generate a well-formatted Markdown prompt detailing the source tree structure and all the code. You can then upload this document to GPT or Claude models with larger context windows and ask them to:

- Rewrite the code to another language.

- Find bugs/security vulnerabilities.

- Document the code.

- Implement new features.

You can customize the prompt template to achieve any of the desired use cases. It essentially traverses a codebase and creates a prompt with all source files combined. In short, it automates copy-pasting multiple source files into your prompt and formatting them along with letting you know how many tokens your code consumes.
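To make the traversal concrete, here is a minimal sketch of the same idea. It is not code2prompt's implementation: `build_prompt`, `estimate_tokens`, the extension filter, and the output layout are all made up for illustration, and the token estimate is a crude characters-divided-by-four heuristic rather than a real tokenizer.

```python
import os

FENCE = "`" * 3  # built at runtime so this example nests cleanly in Markdown


def build_prompt(root, extensions=(".py", ".rs", ".js", ".md")):
    """Walk `root` and return a Markdown prompt with a file list and all sources."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git and keep the order stable.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            if name.endswith(extensions):
                full = os.path.join(dirpath, name)
                paths.append(os.path.relpath(full, root))

    lines = ["Source tree:", ""]
    lines += [f"- {p}" for p in paths]
    for p in paths:
        with open(os.path.join(root, p), encoding="utf-8", errors="replace") as fh:
            code = fh.read()
        lines += ["", f"`{p}`:", "", FENCE, code.rstrip("\n"), FENCE]
    return "\n".join(lines) + "\n"


def estimate_tokens(text):
    """Rough stand-in for a tokenizer: about four characters per token."""
    return max(1, len(text) // 4)
```

Calling `estimate_tokens(build_prompt("path/to/project"))` then gives a ballpark figure to compare against a model's context window before pasting the prompt in.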

Show HN: Code2prompt – Generate LLM prompts from your codebase

mufeedvh — Mon, 11 Mar 2024 20:32:28 +0000

I made code2prompt, a CLI tool to generate LLM prompts from your codebase with support for prompt templating and token counting.

I initially wrote this for personal use to utilize Claude 3.0's 200K context window and it has proven to be pretty useful so I decided to open-source it. Let me know what you think.

Comments URL: https://news.ycombinator.com/item?id=39672932

Points: 4

# Comments: 7

New comment by mufeedvh in "Krita fund has no corporate support"

mufeedvh — Thu, 05 Oct 2023 13:38:12 +0000

I have made a simple CLI utility[0] with this purpose in mind. It scans your entire filesystem for README.md and FUNDING.yml files, looks for donation/sponsor links in them, and tags each link with the associated repo (no HTTP calls, just the assumption that most repos link their support URL in either of these files). The output is a CSV sheet listing the open-source dependencies/libraries on your system that accept donations.

I have plans to expand/plug this into a donation aggregator platform like you mentioned if time permits. But if there is an existing effort for the same, I am happy to contribute. :)

[0] - https://github.com/mufeedvh/paydept
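A minimal sketch of the scanning approach described in this comment, for illustration only. It is not paydept's code: the function names, the list of donation hosts, and the CSV columns are assumptions, and it only picks up full URLs, so FUNDING.yml entries that give a bare username would be missed.

```python
import csv
import os
import re

TARGET_FILES = {"README.md", "FUNDING.yml"}

# Illustrative list of donation hosts; not taken from the actual tool.
DONATION_URL = re.compile(
    r"https?://(?:www\.)?"
    r"(?:github\.com/sponsors|opencollective\.com|patreon\.com|ko-fi\.com|liberapay\.com)"
    r"/[\w\-./]+"
)


def find_donation_links(root):
    """Return sorted (directory, url) pairs found under root, with no network calls."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(TARGET_FILES.intersection(filenames)):
            try:
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as fh:
                    text = fh.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            for url in DONATION_URL.findall(text):
                found.add((dirpath, url.rstrip(".")))  # drop sentence-ending dots
    return sorted(found)


def write_csv(rows, out_path):
    """Write the (directory, url) pairs to a CSV file with a header row."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["repo_dir", "donation_url"])
        writer.writerows(rows)
```

Here the directory containing the file stands in for "the associated repo"; a fuller version would probably walk up to the nearest `.git` directory or package manifest instead.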

Nobody’s on the Ball on AGI Alignment

mufeedvh — Wed, 29 Mar 2023 19:28:55 +0000

Article URL: https://www.forourposterity.com/nobodys-on-the-ball-on-agi-alignment/

Comments URL: https://news.ycombinator.com/item?id=35362080

Points: 3

# Comments: 2

Security in the age of LLMs

mufeedvh — Fri, 09 Dec 2022 05:53:53 +0000

Article URL: https://www.mufeedvh.com/llm-security/

Comments URL: https://news.ycombinator.com/item?id=33918026

Points: 42

# Comments: 6

New comment by mufeedvh in "Show HN: Binserve – Fast single-binary static web server"

mufeedvh — Mon, 13 Jun 2022 17:31:41 +0000

Thanks! :)

It currently does gzip compression by default. Compression modes for specific files sounds interesting, I will definitely get around to implementing that.

New comment by mufeedvh in "Show HN: Binserve – Fast single-binary static web server"

mufeedvh — Mon, 13 Jun 2022 16:43:09 +0000

Backstory: What started as a personal project to quickly host some web pages turned into a rabbit hole of yak shaving, and that is how I ended up making Binserve. I automated the steps I usually take to host static pages into this project, which is tweakable via the configuration file. And it's also pretty fast.

Let me know your feedback/suggestions!