<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: chromaton</title><link>https://news.ycombinator.com/user?id=chromaton</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 22 Apr 2026 11:51:57 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=chromaton" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by chromaton in "EsoLang-Bench: Evaluating Genuine Reasoning in LLMs via Esoteric Languages"]]></title><description><![CDATA[
<p>I did something very similar last year, but with programming languages that were REALLY out of distribution; they were generated specifically for the benchmark. I call it TiānshūBench (天书Bench): <a href="https://jeepytea.github.io/general/introduction/2025/05/29/tianshubenchintro.html" rel="nofollow">https://jeepytea.github.io/general/introduction/2025/05/29/t...</a><p>Some models were OK at solving very simple problems, but nearly all of them would, for example, hallucinate control structures that did not exist in the target language.</p>
]]></description><pubDate>Fri, 20 Mar 2026 03:16:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47450054</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=47450054</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47450054</guid></item><item><title><![CDATA[New comment by chromaton in "Verification debt: the hidden cost of AI-generated code"]]></title><description><![CDATA[
<p>Historically, the cycle has been requirements -> code -> test, but with coding becoming much faster, the bottlenecks have changed. That's one of the reasons I've been working on Spark Runner to help automate testing for web apps: <a href="https://github.com/simonarthur/spark-runner" rel="nofollow">https://github.com/simonarthur/spark-runner</a></p>
]]></description><pubDate>Sat, 07 Mar 2026 18:36:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47290240</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=47290240</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47290240</guid></item><item><title><![CDATA[New comment by chromaton in "Spark Runner: Easily Automate Front End Tests"]]></title><description><![CDATA[
<p>I've recently found that my ability to add new features and squash bugs has outpaced my ability to do full end-to-end tests. To help with this, I created Spark Runner for automated website testing. It will create and execute a plan for tasks you give it in plain text, like "add an item to the shopping cart", or you can just point it at your front end code and have Spark Runner create the tests for you. It also makes nice reports telling you what's working and what's not.<p>New project, so feedback is welcome.</p>
]]></description><pubDate>Sat, 07 Mar 2026 02:43:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47283918</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=47283918</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47283918</guid></item><item><title><![CDATA[Spark Runner: Easily Automate Front End Tests]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/simonarthur/spark-runner/">https://github.com/simonarthur/spark-runner/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47283917">https://news.ycombinator.com/item?id=47283917</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Sat, 07 Mar 2026 02:43:35 +0000</pubDate><link>https://github.com/simonarthur/spark-runner/</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=47283917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47283917</guid></item><item><title><![CDATA[New comment by chromaton in "When AI writes the software, who verifies it?"]]></title><description><![CDATA[
<p>TFA seems to be big on mathematical proof of correctness, but how do you ever know you're proving the right thing?</p>
]]></description><pubDate>Tue, 03 Mar 2026 22:15:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47239843</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=47239843</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47239843</guid></item><item><title><![CDATA[New comment by chromaton in "The programmers who live in Flatland"]]></title><description><![CDATA[
<p>Lisp has been around for 65 years (not 50, as the author believes), and is one of the very first high-level programming languages. If it were as great as its advocates say, surely it would have taken over the world by now. But it hasn't, and advocates like PG and this article's author don't understand why or take any lessons from that.</p>
]]></description><pubDate>Sun, 07 Dec 2025 17:12:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46183237</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=46183237</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46183237</guid></item><item><title><![CDATA[New comment by chromaton in "GPT-5: "How many times does the letter b appear in blueberry?""]]></title><description><![CDATA[
<p>Moravec strikes again.</p>
]]></description><pubDate>Sun, 10 Aug 2025 00:38:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=44851751</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44851751</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44851751</guid></item><item><title><![CDATA[New comment by chromaton in "GPT-5: Overdue, overhyped and underwhelming. And that's not the worst of it"]]></title><description><![CDATA[
<p>For my benchmarking suite, it turns out that it's about 1/5 the price of Claude Sonnet 4.1, with roughly comparable results.</p>
]]></description><pubDate>Sun, 10 Aug 2025 00:33:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=44851711</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44851711</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44851711</guid></item><item><title><![CDATA[New comment by chromaton in "How I code with AI on a budget/free"]]></title><description><![CDATA[
<p>If you're looking for free API access, Google offers access to Gemini for free, including for gemini-2.5-pro with thinking turned on. The limit is... quite high, as I'm running some benchmarking and haven't hit the limit yet.<p>Open weight models like DeepSeek R1 and GPT-OSS are also made available with free API access from various inference providers and hardware manufacturers.</p>
]]></description><pubDate>Sun, 10 Aug 2025 00:31:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=44851704</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44851704</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44851704</guid></item><item><title><![CDATA[New comment by chromaton in "Open models by OpenAI"]]></title><description><![CDATA[
<p>This has been available (20b version, I'm guessing) for the past couple of days as "Horizon Alpha" on Openrouter. My benchmarking runs with TianshuBench for coding and fluid intelligence were rate limited, but the initial results are worse than those of DeepSeek R1 and Kimi K2.</p>
]]></description><pubDate>Tue, 05 Aug 2025 18:33:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=44802247</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44802247</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44802247</guid></item><item><title><![CDATA[New comment by chromaton in "François Chollet: The Arc Prize and How We Get to AGI [video]"]]></title><description><![CDATA[
<p>Current AI systems don't have a great ability to take instructions or information about the state of the world and produce new output based upon that. Benchmarks that emphasize this ability help greatly in progress toward AGI.</p>
]]></description><pubDate>Mon, 07 Jul 2025 13:55:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=44490410</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44490410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44490410</guid></item><item><title><![CDATA[TiānshūBench Intermediate Release 0.0.X]]></title><description><![CDATA[
<p>Article URL: <a href="https://jeepytea.github.io/general/update/2025/06/08/update00x.html">https://jeepytea.github.io/general/update/2025/06/08/update00x.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44220222">https://news.ycombinator.com/item?id=44220222</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 08 Jun 2025 23:50:34 +0000</pubDate><link>https://jeepytea.github.io/general/update/2025/06/08/update00x.html</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44220222</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44220222</guid></item><item><title><![CDATA[New comment by chromaton in "Introducing TiānshūBench (天书Bench)"]]></title><description><![CDATA[
<p>Yes, it would be fantastic to have more languages to test off of. I picked the base language I did (Mamba) because it was easy to modify and integrate into Python.</p>
]]></description><pubDate>Sun, 01 Jun 2025 22:11:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=44154119</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44154119</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44154119</guid></item><item><title><![CDATA[New comment by chromaton in "Introducing TiānshūBench (天书Bench)"]]></title><description><![CDATA[
<p>Generating the problems: I just thought up a few simple things that the computer might be able to do. In the future, I hope to expand to more complex problems, based upon common business situations: reading CSVs, parsing data, etc. I'll probably add new tests once I get multi-shot and reliability working correctly.<p>New base programming languages would be great, but what would be even better is some sort of meta-language where many features can be turned on or off, rather than just scrambling the keywords like I do now.<p>I did some vibe testing with a current frontier model, and it gets quite confused and keeps insisting that there's a control structure that definitely doesn't exist in the TiānshūBench language with seed=1.</p>
]]></description><pubDate>Sun, 01 Jun 2025 22:10:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44154111</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44154111</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44154111</guid></item><item><title><![CDATA[Introducing TiānshūBench (天书Bench)]]></title><description><![CDATA[
<p>Article URL: <a href="https://jeepytea.github.io/general/introduction/2025/05/29/tianshubenchintro.html">https://jeepytea.github.io/general/introduction/2025/05/29/tianshubenchintro.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44148522">https://news.ycombinator.com/item?id=44148522</a></p>
<p>Points: 4</p>
<p># Comments: 4</p>
]]></description><pubDate>Sun, 01 Jun 2025 03:42:37 +0000</pubDate><link>https://jeepytea.github.io/general/introduction/2025/05/29/tianshubenchintro.html</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44148522</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44148522</guid></item><item><title><![CDATA[New comment by chromaton in "Reading "Business" Books Is a Waste of Time"]]></title><description><![CDATA[
<p>I find that these books have to be read by the right person at the right time. Think and Grow Rich by Napoleon Hill did nothing for me when I first was exposed to it, but later on, helped me greatly.<p>BTW, the business book that helped me the most is barely known: Making Money is Killing Your Business by Chuck Blakeman.</p>
]]></description><pubDate>Mon, 19 May 2025 12:34:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=44029164</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=44029164</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44029164</guid></item><item><title><![CDATA[New comment by chromaton in "Circuit Tracing: Revealing Computational Graphs in Language Models (Anthropic)"]]></title><description><![CDATA[
<p>The PDF conversions I've tried in Firefox and Chromium don't work that well.</p>
]]></description><pubDate>Wed, 02 Apr 2025 16:52:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=43558697</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=43558697</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43558697</guid></item><item><title><![CDATA[New comment by chromaton in "A liar who always lies says "All my hats are green.""]]></title><description><![CDATA[
<p>> Another one that really irritates me is the kind that presents a series of integers and asks which integer comes next. Any integer will do, you just have to fit the appropriate polynomial.<p>This one bugs me to no end because it's part of the standard elementary school curriculum, for example here:
<a href="https://byjus.com/maths/patterns-questions/" rel="nofollow">https://byjus.com/maths/patterns-questions/</a><p>But surely someone with a strong imagination could come up with a pattern to fit any number as the next in the sequence. I doubt most elementary educators even grasp the issue.</p>
]]></description><pubDate>Mon, 09 Dec 2024 19:34:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=42369625</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=42369625</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42369625</guid></item><item><title><![CDATA[New comment by chromaton in "Hofstadter on Lisp (1983)"]]></title><description><![CDATA[
<p>AutoCAD automation?</p>
]]></description><pubDate>Wed, 16 Oct 2024 17:27:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41861541</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=41861541</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41861541</guid></item><item><title><![CDATA[New comment by chromaton in "Shapeways Files for Bankruptcy"]]></title><description><![CDATA[
<p>Xometry, though it's also US and EU based.</p>
]]></description><pubDate>Fri, 05 Jul 2024 10:44:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=40881649</link><dc:creator>chromaton</dc:creator><comments>https://news.ycombinator.com/item?id=40881649</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40881649</guid></item></channel></rss>