<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: 4corners4sides</title><link>https://news.ycombinator.com/user?id=4corners4sides</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 02:53:18 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=4corners4sides" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by 4corners4sides in "MuJoCo – Advanced Physics Simulation"]]></title><description><![CDATA[
<p>People have made cool racing education simulators with this too: <a href="https://github.com/FT-Autonomous/ft_grandprix" rel="nofollow">https://github.com/FT-Autonomous/ft_grandprix</a>.</p>
]]></description><pubDate>Wed, 22 Apr 2026 09:36:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47861175</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=47861175</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47861175</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Tracking when Trump chickens out"]]></title><description><![CDATA[
<p>Ah okay, thanks. We've come a long way since the purple gradients and glowy buttons but I suppose there is still a tell.</p>
]]></description><pubDate>Mon, 20 Apr 2026 11:15:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47832699</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=47832699</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47832699</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Tracking when Trump chickens out"]]></title><description><![CDATA[
<p>This is a pretty unique aesthetic. Does the author have other works / a blog? Can v0 / Claude Design do this these days?</p>
]]></description><pubDate>Mon, 20 Apr 2026 09:37:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47832076</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=47832076</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47832076</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Epic Games to cut more than 1k jobs as Fortnite usage falls"]]></title><description><![CDATA[
<p>Roblox is the elephant in the room here which fills the niche for freemium, fun 3D experiences that run on basically any platform or device.</p>
]]></description><pubDate>Tue, 24 Mar 2026 15:46:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47504419</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=47504419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47504419</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Experts Have World Models. LLMs Have Word Models"]]></title><description><![CDATA[
<p>This article is a really good summary of current thinking on the “world model” conundrum that a lot of people are talking about, directly or indirectly, with respect to current-day deployments of LLMs.<p>It synthesizes comments on “RL Environments” (<a href="https://ankitmaloo.com/rl-env/" rel="nofollow">https://ankitmaloo.com/rl-env/</a>), “World Models” (<a href="https://ankitmaloo.com/world-models/" rel="nofollow">https://ankitmaloo.com/world-models/</a>) and the real reason that the “Google Game Arena” (<a href="https://blog.google/innovation-and-ai/models-and-research/google-deepmind/kaggle-game-arena-updates/" rel="nofollow">https://blog.google/innovation-and-ai/models-and-research/go...</a>) is so important to powering LLMs. In a sense it also relates to the notion of “taste” (<a href="https://wangcong.org/2026-01-13-personal-taste-is-the-moat.html" rel="nofollow">https://wangcong.org/2026-01-13-personal-taste-is-the-moat.h...</a>) and how, or whether, its moat-worthiness can be eliminated by models.</p>
]]></description><pubDate>Mon, 09 Feb 2026 19:13:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46949546</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46949546</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46949546</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Advancing finance with Claude Opus 4.6"]]></title><description><![CDATA[
<p>The benchmarks look good. Slide decks and spreadsheets look better. People must use Claude Cowork, have their Claude Code moment and figure out the consequences. It will be really interesting to see articles like this one (<a href="https://mitchellh.com/writing/my-ai-adoption-journey" rel="nofollow">https://mitchellh.com/writing/my-ai-adoption-journey</a>) written by people who actually care about accuracy in places like KPMG, to get their perspective on things.<p>I remember overhearing some normal people on the bus: one of them had essentially orchestrated an agent scraper to pull and summarise news from 40 different sites he had identified as important, which put him quite ahead of his peers. These were non-technical people orchestrating an agent workflow to make them better at work.<p>There’s not much that tickles my software brain here, but the agents are coming for us all.</p>
]]></description><pubDate>Sat, 07 Feb 2026 13:12:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46923577</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46923577</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46923577</guid></item><item><title><![CDATA[New comment by 4corners4sides in "OpenAI Frontier"]]></title><description><![CDATA[
<p>This is OpenAI taking the concept of AI coworkers seriously, down to the level of “identity” for these agents.<p>This reminded me of Kairos, which came up a few days ago (<a href="https://www.kairos.computer/" rel="nofollow">https://www.kairos.computer/</a>), however I actually feel much better about, and more inspired by, the angle OpenAI took than the angle Kairos took. OpenAI’s genuinely feels like a platform for a coworker while Kairos is yet another cool landing page, yet another agent platform with X amount of data integrations. The use cases in OpenAI’s article also felt more concrete and impressive, to be honest.<p>The claim that “as agents have gotten more capable, the opportunity gap between what models can do and what teams can actually deploy has grown” is definitely true. An analogy whose source I have forgotten observed that we have F1 cars driving at 60 km/h, so a lot of enterprises are not even at the deployment limit where improving benchmarks matters. They are still at the level of not being able to provide the right info, not having the right evaluation and improvement frameworks, etc.<p>Using “Opening the AI Frontier” as a heading would have been in really poor taste before OpenAI released their OSS models (earning their ClosedAI moniker) but I guess it’s a bit less offensive now. I think this product combined with OpenAI FDEs is going to make a lot of large industries inaccessible to startups, but there may still be value in companies like Kairos watching what OpenAI does in this space and copying them.</p>
]]></description><pubDate>Sat, 07 Feb 2026 13:11:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=46923571</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46923571</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46923571</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Stay Away from My Trash"]]></title><description><![CDATA[
<p>Large-scale AI deployment has led to a complete change in what signal code actually conveys and what it means for maintainers. Code is no longer a yardstick for effort, care or expertise. If anything, a large amount of it can signal the opposite.<p>I read an article a while ago about how “taste is a moat” (<a href="https://wangcong.org/2026-01-13-personal-taste-is-the-moat.html" rel="nofollow">https://wangcong.org/2026-01-13-personal-taste-is-the-moat.h...</a>) and it applies here. In that article a technically correct kernel patch was rejected since it just re-implemented functionality that was available elsewhere. In the tldraw repo, users seem to clone the repo, spin up Claude and then make a PR without any kind of “taste” involved.<p>What confuses me is that tldraw is actually very good at getting the best out of models, and indeed, internally at tldraw, models are expected to be used and the author gets value out of them. And yet, people leave sloppy, unvetted PRs. This is a social issue that we didn’t really have before, since producing code was the difficult part. Now that producing code and PRs is easy, the signal-to-noise ratio has collapsed completely and it’s just not worth it for people to actually review this stuff.<p>It would be better for people to leave one-line issues with video demonstrations and allow the internal team to /fix them: “In a world of AI coding assistants, is code from external contributors actually valuable at all? If writing the code is the easy part, why would I want someone else to write it?”. Is code really needed to convey problems with open source repos, or is it something unnecessary that we are now unshackled from? In the case of tldraw a lot of the PRs are just the result of people running Claude on issues, and therefore they add absolutely zero value.</p>
]]></description><pubDate>Sat, 07 Feb 2026 13:10:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46923564</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46923564</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46923564</guid></item><item><title><![CDATA[New comment by 4corners4sides in "We tasked Opus 4.6 using agent teams to build a C Compiler"]]></title><description><![CDATA[
<p>A compiler is another thing whose honor and pride the models have taken from the nerds. In the past, people would debate for hours about the “dragon book” vs. “writing interpreters” and present their cool bespoke compilers in Show HN posts. Now models can produce 100,000 lines of code over two weeks with no human intervention that actually work and can compile significant projects. Which way now, nerd? The models are getting better, are you?<p>The article has some really odd low-level descriptions of bash orchestration, which I suppose are important to illustrate how barebones it was. However, I always find it odd when we’re talking about agents that are lauded as borderline superintelligence and there is still low-level bash being slung around – it feels like we’re talking about things at the wrong level.<p>The point about writing extremely high quality tests reminds me a bit of the “hot mess theory of AI” (<a href="https://alignment.anthropic.com/2026/hot-mess-of-ai/" rel="nofollow">https://alignment.anthropic.com/2026/hot-mess-of-ai/</a>), also from Anthropic, where they essentially say that long-horizon tasks are more likely to fall to incoherency than to a model purposefully pursuing incorrect results. This is phrased in the article as “Claude will work autonomously to solve whatever problem I give it. So it’s important that the task verifier is nearly perfect, otherwise Claude will solve the wrong problem”.<p>The author also observes something that I’ve realised since the initial joy of seeing an agent one-shot a task wore off – for a 30-minute agent task, 25 minutes may be spent exploring the environment. While it would be an offence to give a human unvetted model-generated documentation and runbooks (I’m looking at you, emoji-ridden README.md files becoming more common across Show HN), models should commit things like this to memory for themselves to avoid repeatedly paying the “discovery tax” on every new action. Errors, hallucinations or changes that cause the generated docs to fail create more busywork for the agent, but agent time is less valuable than finite human life.</p>
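<p>To make the “nearly perfect verifier” idea concrete, here is a minimal differential-testing sketch in Python. This is entirely my own illustration – the RunResult shape and the compiler-runner hooks are assumptions, not anything from the article: each test program must behave identically under a trusted reference compiler and the candidate.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class RunResult:
    exit_code: int   # observed process exit status
    stdout: str      # observed program output

def verify(programs: Iterable[str],
           reference: Callable[[str], RunResult],
           candidate: Callable[[str], RunResult]) -> list[str]:
    """Differential test: return every program whose behaviour under the
    candidate compiler diverges from the trusted reference."""
    return [src for src in programs
            if reference(src) != candidate(src)]
```

A verifier like this is only as good as its program corpus – a divergence-free run proves nothing about inputs you never generated, which is presumably why the author stresses test quality so much.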
]]></description><pubDate>Sat, 07 Feb 2026 13:08:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=46923552</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46923552</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46923552</guid></item><item><title><![CDATA[New comment by 4corners4sides in "My AI Adoption Journey"]]></title><description><![CDATA[
<p>The author makes the point that you should redo every manual commit with AI, to align your mental model of actions with how models work. This is something that I’m going to need to try. It’s related to my desire to reduce things like the “discovery tax” (the phenomenon whereby a 5-minute agent task is 4 minutes of environment exploration and 1 minute of execution) and to make sure that models get things right the first time around; however, my AI improvement plan didn’t really account for how to improve the model in cases where I ended up manually resolving issues or implementing features.<p>Some arguments are made about retaining focus and single-mindedness while working with AI. I think these points are important. They relate to the article on cutting out over-eager orchestration and focusing on validation work (<a href="https://sibylline.dev/articles/2026-01-27-stop-orchestrating-and-start-validating/" rel="nofollow">https://sibylline.dev/articles/2026-01-27-stop-orchestrating...</a>). There are a few sides to this covered in the article. You should always have a high-value task to switch to when the agent is working (instead of scrolling TikTok, Instagram, X, YouTube, Facebook, Hacker News, etc.). In my case I might start reading some books that I have on the backburner, like Ghost in the Wires. You should disable agent notifications and take control of when you return to check the model context, to be less ADHD-ridden when programming with agents and actually make meaningful progress on the side task, since you only context switch when you are satisfied. The final one is to always have at least one agent, and preferably only one agent, running in the background. The idea is that always having an agent results in a slow burn of productivity improvements and a process where you can slowly improve the background agent’s performance. Generally, always having some agent running is a good way to stay on top of what current model capabilities are.<p>I also really liked the idea of overnight agents for library research, redevelopment of projects to test out new skills, tests and AGENTS.md modifications.</p>
]]></description><pubDate>Sat, 07 Feb 2026 13:06:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46923536</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46923536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46923536</guid></item><item><title><![CDATA[New comment by 4corners4sides in "GPT-5.3-Codex"]]></title><description><![CDATA[
<p>A 77% score on Terminal-Bench 2 is really impressive. I remember reading the article about the pi coding agent (<a href="https://mariozechner.at/posts/2025-11-30-pi-coding-agent/" rel="nofollow">https://mariozechner.at/posts/2025-11-30-pi-coding-agent/</a>) getting into the top ten percent of agents on that benchmark. It got about 50%. While it may still be in the top ten, that category just turned into one champion and a long tail of inferior offerings.<p>I was shocked to see that the prompt for one of the landing pages included the text “lavender to blue gradient”, as if that’s something that anybody actually wants. It’s like going to the barber and saying “just make me look awful”.<p>This was my first time actually seeing what the GDPval benchmark looks like. Essentially they benchmark all the artifacts that HR/finance might make or work on (onboarding documents, accounting spreadsheets, PowerPoint presentations, etc.). I think it’s good that models are trained to generate things like this well, since people are going to use AI to do so anyway. If the middlemen passing AI outputs around are going to be lazy, I’m grateful that at least OpenAI researchers are cooking something behind the scenes.</p>
]]></description><pubDate>Sat, 07 Feb 2026 13:05:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=46923533</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46923533</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46923533</guid></item><item><title><![CDATA[New comment by 4corners4sides in "A sane but bull case on Clawdbot / OpenClaw"]]></title><description><![CDATA[
<p>This article convinced me to try to set up OpenClaw locally on my Raspberry Pi, but I realised that it had no micro SD card installed AND it used micro HDMI instead of regular HDMI for display, which I didn't have…<p>Some of the takes in this article relate to “Agent-Native Architectures” (<a href="https://every.to/guides/agent-native" rel="nofollow">https://every.to/guides/agent-native</a>), an article that I critiqued quite heavily for being AI-generated. This article presents many of the concepts explored there through a real-world, pragmatic lens. In this case, the author brings up how initially they wanted their agent to invoke specific pre-made scripts, but ultimately found that letting go of the process is where the inner model intelligence was really able to shine. Here, parity – the property whereby anything a human can do, an agent can do – was achieved most powerfully by simply giving the agent a browser-use agent, which cracked open the whole web for it to navigate.<p>The gradual improvement property of agent-native architectures was also directly mentioned: the author commented that giving the model more and more context allowed him to “feel the AGI”.<p>ClawdBot is often reduced to “just AI and cron”, but that might be overly reductive, in the same way that one could call it a “GPT wrapper”, or call a laptop an “electricity wrapper”. It seems like the scheduler is a significant part of what makes ClawdBot so powerful. For example, instead of looking for sophisticated scraper apps online to monitor prices of certain items, the author will simply ask ClawdBot something like “Hey, monitor hotel prices”, and ClawdBot will handle the rest asynchronously and communicate back with the author over Slack. Any performance issues due to repeated agent invocations are ameliorated by problem context and runbooks that are automatically generated, and probably cost less time than maintaining pipelines written in plain code for a single individual who wants a hands-off agent solution.<p>Also, the article actually explains the obsession with Mac Minis, which I thought was some kind of convoluted scam (though Apple doesn’t need scams to sell Macs…). Essentially you need one to run a browser, or multiple browsers, for your agents. Unfortunately that’s the state of the modern web.<p>I actually have my own note-taking system and a pipeline to give me an overview of all of the concepts, blogs and daily events from the past week for me to look at. But it is much more rigid than ClawdBot: 1) I can only access it from my laptop, 2) it only supports text at the moment, 3) the actions that I can take are hard coded as opposed to agent-refined and naturally occurring (e.g. tweet pipeline, lessons pipeline, youtube video pipeline), 4) there’s no intelligent scheduler logic or agent at all, so I manually run the script every evening. Something like ClawdBot could replace this whole pipeline.<p>Long story short, I need to try this out at some point.</p>
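<p>For anyone curious what the “just AI and cron” core might look like stripped to its bones, here is a hypothetical Python sketch. The Task fields and the run_agent/notify hooks are all my own assumptions, not ClawdBot internals: tasks are natural-language prompts with an interval, and a tick runs whatever is due and reports back.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Task:
    prompt: str            # e.g. "Hey, monitor hotel prices"
    interval_s: float      # how often to re-run the prompt
    next_run: float = 0.0  # timestamp of the next invocation

def tick(tasks: list[Task],
         run_agent: Callable[[str], str],   # stub for the LLM agent call
         notify: Callable[[str], None],     # stub for e.g. posting to Slack
         now: float) -> int:
    """Run every due task, reschedule it, and return how many ran."""
    ran = 0
    for t in tasks:
        if now >= t.next_run:
            notify(run_agent(t.prompt))      # agent result goes to the user
            t.next_run = now + t.interval_s  # simple fixed-interval reschedule
            ran += 1
    return ran
```

Calling tick from an actual cron job (or a while/sleep loop) is the whole scheduler; everything interesting lives inside the agent call.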
]]></description><pubDate>Wed, 04 Feb 2026 21:30:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46892078</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46892078</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46892078</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Claude is a space to think"]]></title><description><![CDATA[
<p>This is one of those “don’t be evil”-style articles that companies remove when the going gets tough, but I guess we should be thankful that things are looking rosy enough for Anthropic at the moment that they would release a blog like this.<p>The point about filtering signal vs. noise in search engines can’t really be stated enough. At this point, using a search engine and the conventional internet in general is an exercise in frustration. It’s simply a user-hostile place – infinite cookie banners for sites that shouldn’t collect data at all, auto-play advertisements, engagement farming, sites generated by AI to shill and produce a word count. You could argue that AI exacerbates this situation, but you also have to agree that it is much more pleasant to ask Perplexity, ChatGPT or Claude a question than to put yourself through the torture of conventional search. Introducing ads into this would completely deprive the user of a way of navigating the web that actually respects their dignity.<p>I also agree in the sense that the current crop of AIs do feel like a space to think, as opposed to a place where I am being manipulated, controlled or treated like some sheep in a flock to be sheared for cash.</p>
]]></description><pubDate>Wed, 04 Feb 2026 21:24:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46892018</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46892018</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46892018</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Stop screwing around with agent orchestration, your bottleneck is validation"]]></title><description><![CDATA[
<p>Despite having read articles on when to delegate to AI – discussing agent completion time, agent success probability and human verification time – the thought of genuinely systematising and solving the problem of verification and QA never occurred to me. My mind is still in the mode where “building” and “shipping” are noble goals to be sought after, even though that era is dead due to how low the difficulty bar has dropped (the bar is six feet deep). We should build and we should ship faster, but considering only those aspects is irresponsible and childish. With these new automated reasoning systems we ought to validate as much as possible before presenting anything to the user.<p>Possibly the most salient point in the article is the following: “for the love of god, put [...] whatever tool du jour you're using to blow up your codebase, and make sure every claim in your README, every claim in your docs (you have docs, right?), every claim on your website is 100% tested and validated. Run actual rigorous benchmarks. Set up E2E tests driven by behavioral specs. Take your users seriously enough to deliver a good experience out of the box rather than trying to use hype to drive uptake then hoping they'll provide you with free QA”.<p>Personally this really resonates with the absolute fatigue I feel inside when I see a new “Show HN” linking to a GitHub repository in the year of our lord 2026. I’ve been burned by “slop” repos so much that I already feel the Claude emoji drivel coming, and sure enough, a lot of the time that’s all a repo is: the abandoned and uncared-for orphan child born of a passionate one-night stand with Claude Code. Not a single screenshot or demo video in sight, just plausible promises dumped into a file for end users to figure out.</p>
]]></description><pubDate>Wed, 04 Feb 2026 21:22:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46891986</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46891986</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46891986</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Kiki – Accountability monster for people who are easily distracted"]]></title><description><![CDATA[
<p>Cool addiction blocker product. Also ties into a strange concept whereby something has value because you pay for it as opposed to the other way around. This idea probably generalises to any kind of self help app that people normally get for free and ignore. If you can provide enough polish to justify a luxury price tag people may invest!</p>
]]></description><pubDate>Tue, 03 Feb 2026 20:07:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46876539</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46876539</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46876539</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Guix System First Impressions as a Nix User"]]></title><description><![CDATA[
<p>From my perspective he seemed to go through a lot of trauma to end up with an otherwise regular Linux install (NVIDIA driver failure, KDE Plasma failure, extremely slow downloads and resolution speeds). Though at some point I should trial one of these declarative OSes for development, be it Nix or Guix.<p>As a recovering emacs user I had a pretty visceral reaction to seeing a “cons” cell in code. In all my ramblings about a post-syntax world and chasing higher and higher abstractions, seeing “cons” cells – a linked-list implementation detail – mingled in with the orchestration of a whole operating system just feels like a despicable and cruel joke. It’s like studying system design and hearing some peanut brain in the other room say something like “let’s use a for loop” – not the level that one should be thinking at.</p>
]]></description><pubDate>Tue, 03 Feb 2026 20:06:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46876521</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46876521</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46876521</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Automatic Programming"]]></title><description><![CDATA[
<p>This one touches on the metaphysical union of all prior minds and the implicit “natural” element of “artificial” intelligence. Yes, indeed, sometimes reading LLM-written text you cannot help but wince and cringe at the “contrastive framing” writing style (it’s not X, it’s Y), but these models did not arise from some kind of man-hating void in space. They arose by learning from everything that humans have made digitally available that can possibly be learned from. Antirez says that “Pre-training is, actually, our collective gift that allows many individuals to do things they could otherwise never do, like if we are now linked in a collective mind, in a certain way”. This is borderline cultish, but I can’t help but agree with it.</p>
]]></description><pubDate>Tue, 03 Feb 2026 20:03:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46876472</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46876472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46876472</guid></item><item><title><![CDATA[New comment by 4corners4sides in "What I learned building an opinionated and minimal coding agent"]]></title><description><![CDATA[
<p>Mario’s enthusiasm for his coding agent here, and the sheer surface area of the pi project, remind me of how in a bygone era people used to fawn over their own text editors: the features they implemented, their macros and speed of editing, lightweightness, etc. Coding agents may be the new “text editor”, and the “emacs” vs. “vim” vs. “vscode” wars can finally be put to rest. The ubiquity of vscode already made the “emacs” vs. “vim” debate more of a shouting match between a bundle of contrarian geriatrics. The power of coding agents and the new era of “automatic programming” may be the final nail in the coffin for these lumbering piles of legacy inertia, which we now have all the power to replace but no real need to. For why should we use the means of mass production to support the rickety home shops of the past?<p>Anyway, more on the actual article: what he’s done is really cool and features a lot of stuff that has proven to work at the forefront of automatic programming – he has a massive test suite against all major model providers, and he runs his agent against known eval suites as well.</p>
]]></description><pubDate>Tue, 03 Feb 2026 20:02:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46876441</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46876441</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46876441</guid></item><item><title><![CDATA[New comment by 4corners4sides in "Agent-Native Architectures"]]></title><description><![CDATA[
<p>The number one principle of the author’s philosophy is that anything a user can do through a UI, an agent should be able to do through a sequence of tool calls. I agree with this, and I think if every old and bespoke UI exposed such an atomic API the world would be a better place. Maybe the AI gold rush can force companies to make their apps more programmable.<p>The optimisation hierarchy in the article is also cool, going from:<p>Agent composes atomic tools to achieve actions<p>Agent invokes domain-specific skills<p>Agent flow is translated into code<p>Code is optimised in a lower-level language<p>…<p>Assembly?<p>I like how it extends our normal notion of optimisation with new “agentic programs”, which are nothing but model intelligence, simple but powerful tools and user intent.<p>Ultimately I couldn’t finish the article. “This isn't about a one-to-one mapping of UI buttons to tools—it's about achieving the same outcome”. Every time I read contrast framing I take psychological damage, and it really, really hurts. Sometimes I feel like contrast framing is left in as a conspiracy by AI companies to easily mark out people who are so lazy that they can’t even be bothered asking their agents to omit it. Like a watermark hidden in plain sight…<p>The article, in fairness, says that it was authored by Claude. And you can tell. When I first saw that the article was co-written by Claude I actually thought that it would be better thought out and structured. Surely, if somebody is brave enough to admit that they used AI for a blog, where the author’s voice is valued so highly, it must mean that they have cracked the code and produced something truly magnificent that they want to show off! How wrong I was.</p>
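<p>As a toy illustration of that parity principle, here is my own Python sketch – the registry and the archive_note action are hypothetical, not from the article: UI buttons and agent tool calls dispatch through the same action registry, so there is nothing a button can do that a tool call cannot.

```python
from typing import Any, Callable

# Single registry of actions; both the UI layer and the agent tool-call
# layer dispatch through it, giving UI/agent parity by construction.
ACTIONS: dict[str, Callable[..., Any]] = {}

def action(name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Register a function as both a UI handler and an agent-callable tool."""
    def wrap(fn: Callable[..., Any]) -> Callable[..., Any]:
        ACTIONS[name] = fn
        return fn
    return wrap

@action("archive_note")
def archive_note(note_id: str) -> str:
    # Real app logic would live here; the point is the single entry point.
    return f"archived {note_id}"

def agent_tool_call(name: str, **kwargs: Any) -> Any:
    """What the agent invokes – the same registry the UI buttons go through."""
    return ACTIONS[name](**kwargs)
```

A UI button handler would call ACTIONS["archive_note"] directly; the agent reaches the identical code path through agent_tool_call, so the two can never drift apart.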
]]></description><pubDate>Tue, 03 Feb 2026 19:57:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=46876376</link><dc:creator>4corners4sides</dc:creator><comments>https://news.ycombinator.com/item?id=46876376</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46876376</guid></item></channel></rss>