Hacker News: ceroxylon

New comment by ceroxylon in "System Card: Claude Mythos Preview [pdf]"

ceroxylon — Wed, 08 Apr 2026 01:19:28 +0000

I have been thinking that these SWE benchmarks will continue to improve: these companies hire very intelligent software engineers, can task a multitude of them with solving problems, and can then train the model on those answers.

Data has always been the core of it all; onward to the next abstraction, I suppose.

New comment by ceroxylon in "Google releases Gemma 4 open models"

ceroxylon — Thu, 02 Apr 2026 16:39:16 +0000

Even with search grounding, it scored a 2.5/5 on a basic botanical benchmark. It would take much longer for the average human to do a similar write-up, but they would likely do better than 50% hallucination if they had access to a search engine.

New comment by ceroxylon in "I'm glad the Anthropic fight is happening now"

ceroxylon — Wed, 11 Mar 2026 21:41:24 +0000

I think a lot of Dwarkesh's mentality about AI being inevitable / ubiquitous comes from the same part of him that thinks that artificial things are "good enough", e.g. the way he allows his production team to use fake plastic plants on set. Is he correct? I'm not sure, but I know there are at least a few people who notice the difference.

New comment by ceroxylon in "I put my whole life into a single database"

ceroxylon — Tue, 10 Mar 2026 16:34:17 +0000

I stopped reading at "San Francisco was always scary to walk"...

New comment by ceroxylon in "We might all be AI engineers now"

ceroxylon — Sat, 07 Mar 2026 02:20:15 +0000

> Honestly?

oh no... this is one of my "uncanny valley" AI tropes

New comment by ceroxylon in "Chaos and Dystopian news for the dead internet survivors"

ceroxylon — Thu, 05 Mar 2026 06:00:52 +0000

engagement bot on overdrive

New comment by ceroxylon in "Chaos and Dystopian news for the dead internet survivors"

ceroxylon — Thu, 05 Mar 2026 03:56:22 +0000

Hallucinations galore, the 'daily digest' provided me with this gem: "Apple's supposedly revolutionary $1,199 MacBook Neo is getting schooled by $500 Windows machines that do basically the same thing without the premium"

There is no way to build a MacBook Neo for $1,199 and this is obviously snarky, auto-generated slop.

New comment by ceroxylon in "Google Workspace CLI"

ceroxylon — Thu, 05 Mar 2026 03:15:04 +0000

The readme is AI generated, so I am assuming the lack of effort and hand-off to the bots extends to the rest of this repository.

The contributors are a Google DRE, 5 bots / automation services, and a dev in Canada.

New comment by ceroxylon in "We Will Not Be Divided"

ceroxylon — Sat, 28 Feb 2026 02:19:21 +0000

That's what taking a stand looks like... if any of these employees lose their job, they are welcome to come crash at my place for as long as they would like; they will have a roof over their head and I will cook them 3 meals a day.

New comment by ceroxylon in "New accounts on HN more likely to use em-dashes"

ceroxylon — Wed, 25 Feb 2026 21:36:20 +0000

That was my reaction when LLMs first started getting "good"

I turned to my friend and said "They've co-opted the structure of effective language!"

New comment by ceroxylon in "OpenAI, the US government and Persona built an identity surveillance machine"

ceroxylon — Wed, 25 Feb 2026 00:38:19 +0000

There is a play/pause button in the lower right corner.

New comment by ceroxylon in "Anthropic announces proof of distillation at scale by MiniMax, DeepSeek,Moonshot"

ceroxylon — Mon, 23 Feb 2026 19:28:07 +0000

I personally have stopped publishing publicly, since my research is still on the fuzzy boundary of AI's current knowledge, my website gets scraped daily, and I don't want to contribute to paid models for zero acknowledgement or compensation.

New comment by ceroxylon in "Claws are now a new layer on top of LLM agents"

ceroxylon — Sun, 22 Feb 2026 15:00:43 +0000

All of this, plus you can plug in an OpenRouter API key and test a plethora of models for all use cases. You can assign different models to different sub-agents, you can put it in /auto mode, and you can test the latest SOTA models the minute they're released...

It can also edit its own config files, monitor system processes, and even... check and harden its own system security. I still don't have it connected to my personal accounts, but as a standalone system it is very fun.

People ask me "what would I even do with it?", when I think of dozens of things every day. I've been working on modding an open source software synth; the patch files are XML, so it was trivial to set up a workflow where I can add new knobs that combine multiple effects, add new ones, etc., just by sending it a message when I get inspired in the middle of the day.

A cron job scans my favorite sites twice a day and curates links based on my preferences, and creates a different list for things that are out of my normal interests to explore new areas.
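A minimal sketch of what a curation step like this might look like, assuming links arrive as (title, url) pairs and preferences are a plain keyword set. The function name `curate`, the keyword set, and the sample data are all hypothetical; nothing here describes how the actual job works.

```python
# Hypothetical preference keywords; the comment above does not list any.
PREFERRED = {"synth", "llm", "botany"}

def curate(links, preferred=PREFERRED):
    """Split (title, url) pairs into a 'familiar' list (title shares a word
    with the preferred keywords) and an 'explore' list (everything else)."""
    familiar, explore = [], []
    for title, url in links:
        # Normalize the title into a set of lowercase words without punctuation.
        words = {w.strip(".,:;!?").lower() for w in title.split()}
        if words & preferred:
            familiar.append((title, url))
        else:
            explore.append((title, url))
    return familiar, explore
```

Run from cron, a schedule such as `0 8,20 * * *` would trigger it twice a day; the fetching and messaging steps are left out because the comment does not describe them.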

I am amazed at how stubborn and un-creative people can be when presented with something like this... I thought we were hackers...?

New comment by ceroxylon in "I verified my LinkedIn identity. Here's what I handed over"

ceroxylon — Sat, 21 Feb 2026 16:37:32 +0000

I also find AI trope-ification articles exhausting to read; there's a reason I've fine-tuned my system prompts to wipe all of it away. This reads like "Hey Gemini, I verified my passport on LinkedIn, write an impassioned exposé on Persona's privacy policy".

When people leave in things like staccato language and Blogspot-era emphasis, I feel like I might as well copy the Persona privacy policy, prompt my own AI(s) on the topic, and read that instead.

New comment by ceroxylon in "Gemini 3.1 Pro"

ceroxylon — Fri, 20 Feb 2026 13:38:16 +0000

It also has some strange bugs between versions. There was an update a month or two ago that caused the app to be unable to quit normally, and I would have to 'force quit' it. Thankfully it was resolved, but it was unnerving to not be able to close the app normally.

New comment by ceroxylon in "Gemini 3.1 Pro"

ceroxylon — Thu, 19 Feb 2026 23:26:53 +0000

I once saw "now that I've slept on it" in Gemini's CoT... baffling.

New comment by ceroxylon in "Show HN: Rebrain.gg – Doom learn, don't doom scroll"

ceroxylon — Wed, 18 Feb 2026 23:10:08 +0000

This was a couple of years ago, but I remember using ChatGPT to try and study for a certification by generating quiz questions.

It would always start to make every correct answer option "C" over time, no matter what I tried. Eventually I was so focused on whether or not it was stuck in a "C" loop that I started overthinking all of the questions and wasting time.

Flash forward to recently testing Sonnet 4.6 to see if it could effectively teach me something new: I got about 5 prompts in before I had to point out an oversight, and it gave me the classic "you're absolutely right, ignore that suggestion".

This is anecdotal of course, but at least LLMs are helping to build my skills of fact verification and citation checking!

New comment by ceroxylon in "Garment Notation Language: Formal descriptive language for clothing construction"

ceroxylon — Wed, 18 Feb 2026 19:38:22 +0000

It is not working on Firefox 147.0.4 either.

New comment by ceroxylon in "Claude Sonnet 4.6"

ceroxylon — Tue, 17 Feb 2026 18:57:40 +0000

Strangely enough, my first test with Sonnet 4.6 via the API for a relatively simple request was more expensive ($0.11) than my average request to Opus 4.6 (~$0.07), because it used way more tokens than what I would consider necessary for the prompt.

New comment by ceroxylon in "A sane but bull case on Clawdbot / OpenClaw"

ceroxylon — Wed, 04 Feb 2026 19:03:54 +0000

Reminds me of Dan Harumi

> Tech people are always talking about dinner reservations . . . We're worried about the price of lunch, meanwhile tech people are building things that tell you the price of lunch. This is why real problems don't get solved.