Hacker News: chromaton

New comment by chromaton in "EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages"

chromaton — Fri, 20 Mar 2026 03:16:27 +0000

I did something very similar last year, but with programming languages that were REALLY out of distribution; they were generated specifically for the benchmark. I call it TiānshūBench (天书Bench): https://jeepytea.github.io/general/introduction/2025/05/29/t...

Some models were OK at solving very simple problems, but nearly all of them would, for example, hallucinate control structures that did not exist in the target language.

New comment by chromaton in "Verification debt: the hidden cost of AI-generated code"

chromaton — Sat, 07 Mar 2026 18:36:19 +0000

Historically, the cycle has been requirements -> code -> test, but with coding becoming much faster, the bottlenecks have changed. That's one of the reasons I've been working on Spark Runner to help automate testing for web apps: https://github.com/simonarthur/spark-runner

New comment by chromaton in "Spark Runner: Easily Automate Front End Tests"

chromaton — Sat, 07 Mar 2026 02:43:35 +0000

I've recently found that my ability to add new features and squash bugs has outpaced my ability to do full end-to-end tests. To help with this, I created Spark Runner for automated website testing. It will create and execute a plan for tasks you give it in plain text, like "add an item to the shopping cart", or you can just point it at your front-end code and have Spark Runner create the tests for you. It also makes nice reports telling you what's working and what's not.

New project, so feedback is welcome.

Spark Runner: Easily Automate Front End Tests

chromaton — Sat, 07 Mar 2026 02:43:35 +0000

Article URL: https://github.com/simonarthur/spark-runner/

Comments URL: https://news.ycombinator.com/item?id=47283917

Points: 1

# Comments: 1

New comment by chromaton in "When AI writes the software, who verifies it?"

chromaton — Tue, 03 Mar 2026 22:15:08 +0000

TFA seems to be big on mathematical proof of correctness, but how do you ever know you're proving the right thing?

New comment by chromaton in "The programmers who live in Flatland"

chromaton — Sun, 07 Dec 2025 17:12:45 +0000

Lisp has been around for 65 years (not 50, as the author believes), and is one of the very first high-level programming languages. If it was as great as its advocates say, surely it would have taken over the world by now. But it hasn't, and advocates like PG and this article's author don't understand why or take any lessons from that.

New comment by chromaton in "GPT-5: "How many times does the letter b appear in blueberry?""

chromaton — Sun, 10 Aug 2025 00:38:10 +0000

Moravec strikes again.

New comment by chromaton in "GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it"

chromaton — Sun, 10 Aug 2025 00:33:10 +0000

For my benchmarking suite, it turns out that it's about 1/5 the price of Claude Sonnet 4.1, with roughly comparable results.

New comment by chromaton in "How I code with AI on a budget/free"

chromaton — Sun, 10 Aug 2025 00:31:53 +0000

If you're looking for free API access, Google offers access to Gemini for free, including for gemini-2.5-pro with thinking turned on. The limit is... quite high, as I'm running some benchmarking and haven't hit the limit yet.

Open weight models like DeepSeek R1 and GPT-OSS are also made available with free API access from various inference providers and hardware manufacturers.

New comment by chromaton in "Open models by OpenAI"

chromaton — Tue, 05 Aug 2025 18:33:29 +0000

This has been available (20b version, I'm guessing) for the past couple of days as "Horizon Alpha" on OpenRouter. My benchmarking runs with TiānshūBench for coding and fluid intelligence were rate limited, but the initial results are worse than those of DeepSeek R1 and Kimi K2.

New comment by chromaton in "François Chollet: The Arc Prize and How We Get to AGI [video]"

chromaton — Mon, 07 Jul 2025 13:55:21 +0000

Current AI systems don't have a great ability to take instructions or information about the state of the world and produce new output based upon that. Benchmarks that emphasize this ability help greatly in progress toward AGI.

TiānshūBench Intermediate Release 0.0.X

chromaton — Sun, 08 Jun 2025 23:50:34 +0000

Article URL: https://jeepytea.github.io/general/update/2025/06/08/update00x.html

Comments URL: https://news.ycombinator.com/item?id=44220222

Points: 1

# Comments: 0

New comment by chromaton in "Introducing TiānshūBench (天书Bench)"

chromaton — Sun, 01 Jun 2025 22:11:29 +0000

Yes, it would be fantastic to have more languages to test off of. I picked the base language I did (Mamba) because it was easy to modify and integrate into Python.

New comment by chromaton in "Introducing TiānshūBench (天书Bench)"

chromaton — Sun, 01 Jun 2025 22:10:25 +0000

Generating the problems: I just thought up a few simple things that the computer might be able to do. In the future, I hope to expand to more complex problems, based upon common business situations: reading CSVs, parsing data, etc. I'll probably add new tests once I get multi-shot and reliability working correctly.

New base programming languages would be great, but what would be even better is some sort of meta-language where many features can be turned on or off, rather than just scrambling the keywords like I do now.

I did some vibe testing with a current frontier model, and it gets quite confused and keeps insisting that there's a control structure that definitely doesn't exist in the TiānshūBench language with seed=1.
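For anyone wondering what "scrambling the keywords" by seed might look like, here is a minimal Python sketch. It is only an illustration, not the actual TiānshūBench code: the keyword list, the function names (make_keyword_map, scramble_source), and the six-letter nonsense words are all assumptions made up for this example.

```python
import random
import re

# Hypothetical keyword list for illustration; the real base language may differ.
BASE_KEYWORDS = ["if", "else", "while", "for", "function", "return", "print"]


def make_keyword_map(seed, keywords=BASE_KEYWORDS, length=6):
    """Map each keyword to a random nonsense word, deterministically per seed."""
    rng = random.Random(seed)  # private generator, so the same seed gives the same language
    alphabet = "abcdefghijklmnopqrstuvwxyz"
    used = set(keywords)  # avoid collisions with real keywords and with each other
    mapping = {}
    for kw in keywords:
        while True:
            candidate = "".join(rng.choice(alphabet) for _ in range(length))
            if candidate not in used:
                break
        used.add(candidate)
        mapping[kw] = candidate
    return mapping


def scramble_source(source, mapping):
    """Replace whole-word keyword occurrences in source with their scrambled form."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], source)


if __name__ == "__main__":
    mapping = make_keyword_map(seed=1)
    print(scramble_source("if x { return x; } else { print x; }", mapping))
```

This naive version would also rewrite keywords that appear inside string literals or comments, so a real implementation would more likely change the keyword table in the interpreter itself. A meta-language with switchable features would need more than a renaming pass like this one.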

Introducing TiānshūBench (天书Bench)

chromaton — Sun, 01 Jun 2025 03:42:37 +0000

Article URL: https://jeepytea.github.io/general/introduction/2025/05/29/tianshubenchintro.html

Comments URL: https://news.ycombinator.com/item?id=44148522

Points: 4

# Comments: 4

New comment by chromaton in "Reading "Business" Books Is a Waste of Time"

chromaton — Mon, 19 May 2025 12:34:10 +0000

I find that these books have to be read by the right person at the right time. Think and Grow Rich by Napoleon Hill did nothing for me when I was first exposed to it, but later on it helped me greatly.

BTW, the business book that helped me the most is barely known: Making Money is Killing Your Business by Chuck Blakeman.

New comment by chromaton in "Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic)"

chromaton — Wed, 02 Apr 2025 16:52:08 +0000

The PDF conversions I've tried in Firefox and Chromium don't work that well.

New comment by chromaton in "A liar who always lies says "All my hats are green.""

chromaton — Mon, 09 Dec 2024 19:34:14 +0000

> Another one that really irritates me is the kind that presents a series of integers and asks which integer comes next. Any integer will do, you just have to fit the appropriate polynomial.

This one bugs me to no end because it's part of the standard elementary school curriculum, for example here: https://byjus.com/maths/patterns-questions/

But surely someone with a strong imagination could come up with a pattern to fit any number as the next in the sequence. I doubt most elementary educators even grasp the issue.
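To make the polynomial point concrete, here is a small Python sketch using Lagrange interpolation with exact fractions. Given the visible terms of a sequence and any "next" integer you like, it builds a polynomial that passes through all of them. The function names are made up for this example.

```python
from fractions import Fraction


def lagrange_eval(points, x):
    """Evaluate the unique interpolating polynomial through points at x, exactly."""
    total = Fraction(0)
    for i, (xi, yi) in enumerate(points):
        term = Fraction(yi)
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total


def polynomial_continuing(sequence, next_value):
    """Return p with p(1..n) equal to the sequence and p(n+1) equal to next_value."""
    points = list(enumerate(sequence + [next_value], start=1))
    return lambda x: lagrange_eval(points, x)


if __name__ == "__main__":
    # The "obvious" answer after 2, 4, 6, 8 is 10, but 42 fits a polynomial too.
    p = polynomial_continuing([2, 4, 6, 8], 42)
    print([int(p(n)) for n in range(1, 6)])  # [2, 4, 6, 8, 42]
```

With n visible terms, a polynomial of degree at most n can always be made to hit any chosen next value, so the question only has a single answer if "simplest pattern" is defined somewhere.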

New comment by chromaton in "Hofstadter on Lisp (1983)"

chromaton — Wed, 16 Oct 2024 17:27:02 +0000

AutoCAD automation?

New comment by chromaton in "Shapeways Files for Bankruptcy"

chromaton — Fri, 05 Jul 2024 10:44:20 +0000

Xometry, though it's also US and EU based.