Hacker News: cube2222

New comment by cube2222 in "Anthropic's Safety Superpower"

cube2222 — Mon, 15 Jun 2026 11:07:28 +0000

Relatedly, I think it's worth noting that Anthropic models have consistently been top-scoring in BullshitBench[0], in a league of their own, really.

Not affiliated with the bench in any way, but I think it surfaces important differences between the behavior of the models from different labs.

TLDR: The benchmark measures pushback in response to nonsensical requests and questions, as opposed to going along with them and hallucinating a nonsensical answer.

[0]: https://petergpt.github.io/bullshit-benchmark/viewer/index.v...

New comment by cube2222 in "Ask HN: Who is hiring? (June 2026)"

cube2222 — Mon, 01 Jun 2026 22:47:38 +0000

Hey, I responded to you two months ago[0]: the job post has been up for probably 4-5 years by now (in a continuously evolving form).

We’re doing well, are steadily growing, and have been actively hiring backend engineers ever since!

Is there any specific concern you have?

[0]: https://news.ycombinator.com/item?id=47797034

New comment by cube2222 in "Ask HN: Who is hiring? (June 2026)"

cube2222 — Mon, 01 Jun 2026 18:16:27 +0000

Spacelift | Remote (Europe) | Full-time | Senior Software Engineer | $80k-$110k+ (can go higher)

We're a VC-funded startup (recently raised $51M Series C) building an infrastructure orchestrator and collaborative management platform for Infrastructure-as-Code – from OpenTofu, Terraform, Terragrunt, CloudFormation, Pulumi, Kubernetes, to Ansible.

On the backend we're using 100% Go with AWS primitives. We're looking for backend developers who like doing DevOps'y stuff sometimes (because in a way it's the spirit of our company), or have experience with the cloud native ecosystem. Ideally you'd have experience working with an IaC tool, e.g. Terraform, Pulumi, Ansible, CloudFormation, Kubernetes, or SaltStack.

Overall, we have a deeply technical product, we're trying to build something customers love to use, and we have a lot of happy, satisfied customers. We promise interesting work, the ability to open source parts of the project that don't give us a business advantage, and healthy working hours.

If that sounds like fun to you, please apply at https://careers.spacelift.io/jobs/3006934-software-engineer-...

You can find out more about the product we're building at https://spacelift.io and also see our engineering blog for a few technical blog posts of ours: https://spacelift.io/blog/engineering

New comment by cube2222 in "Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML"

cube2222 — Sun, 03 May 2026 10:44:08 +0000

Love the writing style!

> Nothing beats an organic, pasture-raised, hand-written spec.

Hah, I strongly empathize with the wording. I’ve been starting my design docs for fellow humans with “100% hand-written, organic content”; I might steal part of yours.

Overall, cool idea. I don’t see myself using your SaaS, but the approach of tagging the requirements and constraints to make them easier to find sounds good.

One project you didn’t mention that I think offers a cool perspective on this is codespeak.dev, but I haven’t given it a go yet.

All in all, I feel like maintaining specs and having agents translate spec diffs into code diffs is a promising area for the future. Good thing I enjoy writing!
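A minimal sketch of what a "spec diff" over tagged requirements could look like, assuming the spec has already been parsed into a map from requirement ID to text. The `REQ-n` IDs and the `diffSpecs` helper are hypothetical illustrations, not the article's format or its SaaS. The resulting list of added/removed/changed IDs is the kind of change set you'd hand to an agent to turn into a code diff.

```go
package main

import (
	"fmt"
	"sort"
)

// SpecDiff lists which tagged requirements changed between two spec versions.
type SpecDiff struct {
	Added, Removed, Changed []string
}

// diffSpecs compares two specs keyed by requirement ID (e.g. "REQ-2").
// The ID scheme is a hypothetical example; any stable tag would work.
func diffSpecs(oldSpec, newSpec map[string]string) SpecDiff {
	var d SpecDiff
	for id, text := range newSpec {
		prev, ok := oldSpec[id]
		switch {
		case !ok:
			d.Added = append(d.Added, id)
		case prev != text:
			d.Changed = append(d.Changed, id)
		}
	}
	for id := range oldSpec {
		if _, ok := newSpec[id]; !ok {
			d.Removed = append(d.Removed, id)
		}
	}
	// Go map iteration order is random; sort for stable output.
	sort.Strings(d.Added)
	sort.Strings(d.Removed)
	sort.Strings(d.Changed)
	return d
}

func main() {
	oldSpec := map[string]string{
		"REQ-1": "Users can log in with email and password.",
		"REQ-2": "Sessions expire after 24 hours.",
		"REQ-3": "Passwords are at least 8 characters.",
	}
	newSpec := map[string]string{
		"REQ-1": "Users can log in with email and password.",
		"REQ-2": "Sessions expire after 1 hour of inactivity.",
		"REQ-4": "Users can log in with a passkey.",
	}
	d := diffSpecs(oldSpec, newSpec)
	fmt.Println("added:", d.Added)     // added: [REQ-4]
	fmt.Println("removed:", d.Removed) // removed: [REQ-3]
	fmt.Println("changed:", d.Changed) // changed: [REQ-2]
}
```

Keying on stable tags rather than line positions means rewording one requirement shows up as a single "changed" entry instead of a noisy text diff.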

New comment by cube2222 in "Ask HN: Who is hiring? (May 2026)"

cube2222 — Fri, 01 May 2026 16:33:41 +0000

Spacelift | Remote (Europe) | Full-time | Senior Software Engineer | $80k-$110k+ (can go higher)

If that sounds like fun to you, please apply at https://careers.spacelift.io/jobs/3006934-software-engineer-...

You can find out more about the product we're building at https://spacelift.io and also see our engineering blog for a few technical blog posts of ours: https://spacelift.io/blog/engineering

New comment by cube2222 in "GnuPG – post-quantum crypto landing in mainline"

cube2222 — Sun, 26 Apr 2026 10:02:22 +0000

It’s worth noting that e.g. the Go stdlib has this hybrid construction built-in via crypto/hpke.
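The hybrid idea in a nutshell: the final key depends on both a classical X25519 exchange and a post-quantum ML-KEM-768 encapsulation, so an attacker has to break both. The sketch below builds that by hand from `crypto/ecdh` and `crypto/mlkem` (Go 1.24+) rather than `crypto/hpke`, whose API it doesn't try to reproduce. The SHA-256 `combine` step is a simplified, hypothetical combiner, not the one HPKE or GnuPG actually specify, so treat this as illustration only, not something to ship.

```go
package main

import (
	"crypto/ecdh"
	"crypto/mlkem"
	"crypto/rand"
	"crypto/sha256"
	"fmt"
)

// recipient holds the receiver's key pairs for both schemes.
type recipient struct {
	x   *ecdh.PrivateKey           // classical X25519 key
	kem *mlkem.DecapsulationKey768 // post-quantum ML-KEM-768 key
}

func newRecipient() (*recipient, error) {
	x, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return nil, err
	}
	kem, err := mlkem.GenerateKey768()
	if err != nil {
		return nil, err
	}
	return &recipient{x: x, kem: kem}, nil
}

// combine hashes both shared secrets together with the public values.
// Hypothetical, simplified combiner: every input here has a fixed length,
// so plain concatenation is unambiguous. Not the HPKE/GnuPG combiner.
func combine(ssKEM, ssX, ctKEM, ephX, recvX []byte) [32]byte {
	h := sha256.New()
	for _, part := range [][]byte{ssKEM, ssX, ctKEM, ephX, recvX} {
		h.Write(part)
	}
	var key [32]byte
	copy(key[:], h.Sum(nil))
	return key
}

// encapsulate runs on the sender side with only the recipient's public keys.
// It returns the derived key plus the ephemeral X25519 public key and the
// ML-KEM ciphertext, both of which are sent to the recipient.
func encapsulate(recvX *ecdh.PublicKey, recvKEM *mlkem.EncapsulationKey768) ([32]byte, []byte, []byte, error) {
	var zero [32]byte
	eph, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return zero, nil, nil, err
	}
	ssX, err := eph.ECDH(recvX)
	if err != nil {
		return zero, nil, nil, err
	}
	ssKEM, ctKEM := recvKEM.Encapsulate()
	ephX := eph.PublicKey().Bytes()
	return combine(ssKEM, ssX, ctKEM, ephX, recvX.Bytes()), ephX, ctKEM, nil
}

// decapsulate runs on the recipient side and recomputes the same key.
func (r *recipient) decapsulate(ephX, ctKEM []byte) ([32]byte, error) {
	var zero [32]byte
	ephPub, err := ecdh.X25519().NewPublicKey(ephX)
	if err != nil {
		return zero, err
	}
	ssX, err := r.x.ECDH(ephPub)
	if err != nil {
		return zero, err
	}
	ssKEM, err := r.kem.Decapsulate(ctKEM)
	if err != nil {
		return zero, err
	}
	return combine(ssKEM, ssX, ctKEM, ephX, r.x.PublicKey().Bytes()), nil
}

func main() {
	r, err := newRecipient()
	if err != nil {
		panic(err)
	}
	k1, ephX, ct, err := encapsulate(r.x.PublicKey(), r.kem.EncapsulationKey())
	if err != nil {
		panic(err)
	}
	k2, err := r.decapsulate(ephX, ct)
	if err != nil {
		panic(err)
	}
	fmt.Println("keys match:", k1 == k2)
}
```

A tampered ML-KEM ciphertext doesn't return an error here (ML-KEM uses implicit rejection); it just yields a different key, so the sender and recipient end up disagreeing.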

New comment by cube2222 in "GPT-5.5"

cube2222 — Thu, 23 Apr 2026 19:35:04 +0000

Small tip: at least for now, you can switch back to Opus 4.6, both in the UI and in Claude Code.

New comment by cube2222 in "An update on recent Claude Code quality reports"

cube2222 — Thu, 23 Apr 2026 18:57:51 +0000

I’ve never been one to complain about new models, and I also didn’t experience most of the issues folks have been citing about Claude Code over the last couple of months. I’ve been using it since release and have been happy with almost every update.

Until Opus 4.7: this is the first time I’ve rolled back to a previous model.

Personality-wise, it’s the worst of AI: “it’s not x, it’s y”, strong short sentences, a generally bullshitty vibe, and it also gaslit me that it had fixed something even though it hadn’t actually checked.

I’m not sure what’s up. Maybe it’s tuned for harnesses like Claude Design (which is great, btw) where there’s an independent judge checking its work, but for now, Opus 4.6 it is.

New comment by cube2222 in "Claude Opus 4.7"

cube2222 — Thu, 16 Apr 2026 18:27:06 +0000

I'd hazard a guess that people have the wrong intuition about long-context pricing and are complaining because of that.

Yeah, the per-token price stays the same, even with large context. But that still means that you're spending 4x more cache-read tokens in a 400k context conversation, on each turn, than you would be in a 100k context conversation.
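To make the arithmetic concrete, here's a tiny sketch with a made-up flat cache-read price (`hypotheticalCacheReadPrice` is a placeholder, not Anthropic's actual rate). The per-token price is identical in both cases, but every turn re-reads the whole cached context, so per-turn cost scales linearly with context size.

```go
package main

import "fmt"

// hypotheticalCacheReadPrice is a made-up flat price in dollars per million
// cache-read tokens, used only for illustration; it is not a real rate.
const hypotheticalCacheReadPrice = 0.50

// perTurnCacheReadCost is what one turn costs just to re-read the cached
// context: the entire context is read again on every turn.
func perTurnCacheReadCost(contextTokens int) float64 {
	return float64(contextTokens) / 1_000_000 * hypotheticalCacheReadPrice
}

func main() {
	small := perTurnCacheReadCost(100_000)
	large := perTurnCacheReadCost(400_000)
	fmt.Printf("100k context: $%.2f per turn\n", small)
	fmt.Printf("400k context: $%.2f per turn\n", large)
	fmt.Printf("ratio: %.1fx\n", large/small)
}
```

Whatever the actual price, the ratio stays 4x, and it compounds over every turn of a long conversation.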

New comment by cube2222 in "Ask HN: Who is hiring? (April 2026)"

cube2222 — Thu, 16 Apr 2026 17:51:54 +0000

Hey! I believe I've been posting this (not verbatim, it's been evolving) almost every month since 2020: we've been actively hiring ever since, and we still are.

New comment by cube2222 in "Claude Opus 4.7"

cube2222 — Thu, 16 Apr 2026 14:52:39 +0000

I've been using it with `/effort max` all the time, and it's been working better than ever.

I think that's part of the problem: it's hard to measure this, and you also don't know which A/B test cohorts you're currently in and how they affect results.

New comment by cube2222 in "Claude Opus 4.7"

cube2222 — Thu, 16 Apr 2026 14:50:12 +0000

Seems like it's not in Claude Code natively yet, but you can do an explicit `/model claude-opus-4-7` and it works.

New comment by cube2222 in "Ask HN: Who is hiring? (April 2026)"

cube2222 — Wed, 01 Apr 2026 20:29:50 +0000

Spacelift | Remote (Europe) | Full-time | Senior Software Engineer | $80k-$110k+ (can go higher)

If that sounds like fun to you, please apply at https://careers.spacelift.io/jobs/3006934-software-engineer-...

You can find out more about the product we're building at https://spacelift.io and also see our engineering blog for a few technical blog posts of ours: https://spacelift.io/blog/engineering

New comment by cube2222 in "Astral to Join OpenAI"

cube2222 — Thu, 19 Mar 2026 14:19:35 +0000

Honestly, for now they seem to be buying companies built around Open Source projects that otherwise didn't have a good long-term story for funding their development anyway. And the primary reason seems to be expertise and tooling for building their CLI tools.

As long as they keep the original projects maintained and those aren't just acqui-hires, I think this is almost as good as we can hope for.

(thinking mainly about Bun here as the other one)

New comment by cube2222 in "Reviewing Large Changes with Jujutsu"

cube2222 — Mon, 16 Mar 2026 13:09:50 +0000

A reasonably cool part of this approach (duplicating the commits, though I suppose you could also just add your own bookmark to the existing commit) is that you can easily diff the current PR state against what you last reviewed, even across rebases, squashes, fixups, etc. Will have to give that a go.

Unfortunately, GitHub still doesn't make that easy, and force-pushes (`push --force`) to a branch make it really hard to see what changed. It would be amazing if they ever fixed that.

In general, with the rise of agentic coding and the extra review work that comes with it, I hope we see some innovation in the "code review tooling" space. Not AI reviewers (those are useful too, but already work well enough)! I want tools that help humans review code faster, more effectively, and more pleasantly.

Of course, I can't end the comment without the obligatory "jj is great, big recommend, not affiliated, check out the blog post I wrote a year ago on getting started with it[0]", ha! I'm still very happy with it, no going back.

[0]: https://kubamartin.com/posts/introduction-to-the-jujutsu-vcs...

New comment by cube2222 in "Kotlin creator's new language: talk to LLMs in specs, not English"

cube2222 — Thu, 12 Mar 2026 19:56:04 +0000

This is actually... pretty cool?

Definitely won't use it for prod, ofc, but I may try it out for a side project.

It seems that this is more or less:

  - instead of modules, write specs for your modules
  - on the first go it generates the code (which you review)
  - later, diffs in the spec are translated into diffs in the code (the code is *not* fully regenerated)

This actually sounds pretty usable, esp. if you like writing. And wherever you want to dive deep, you can drop down into the code and do "micro-optimizations" by rolling something of your own (via what seems to be called "mixed projects" here).

That said, I'm not sure I need a separate tool for this, tbh, as opposed to just keeping markdown files and telling Claude to look at the md diff and adjust the code accordingly.
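For the plain-markdown route, the "md diff" is just a line diff between two versions of the spec file (in practice you'd get it from `git diff`). A minimal LCS-based sketch in Go, with a hypothetical `lineDiff` helper, showing what the agent would be handed:

```go
package main

import (
	"fmt"
	"strings"
)

// lineDiff returns a line diff between two texts based on a
// longest-common-subsequence table. Each output line is prefixed with
// "  " (unchanged), "- " (removed) or "+ " (added).
func lineDiff(oldText, newText string) []string {
	a := strings.Split(oldText, "\n")
	b := strings.Split(newText, "\n")
	// lcs[i][j] is the LCS length of a[i:] and b[j:].
	lcs := make([][]int, len(a)+1)
	for i := range lcs {
		lcs[i] = make([]int, len(b)+1)
	}
	for i := len(a) - 1; i >= 0; i-- {
		for j := len(b) - 1; j >= 0; j-- {
			if a[i] == b[j] {
				lcs[i][j] = lcs[i+1][j+1] + 1
			} else {
				lcs[i][j] = max(lcs[i+1][j], lcs[i][j+1])
			}
		}
	}
	// Walk the table from the start, preferring to keep common lines.
	var out []string
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] == b[j]:
			out = append(out, "  "+a[i])
			i++
			j++
		case lcs[i+1][j] >= lcs[i][j+1]:
			out = append(out, "- "+a[i])
			i++
		default:
			out = append(out, "+ "+b[j])
			j++
		}
	}
	for ; i < len(a); i++ {
		out = append(out, "- "+a[i])
	}
	for ; j < len(b); j++ {
		out = append(out, "+ "+b[j])
	}
	return out
}

func main() {
	oldSpec := "# Sessions\nSessions expire after 24 hours.\nTokens are stored server-side."
	newSpec := "# Sessions\nSessions expire after 1 hour of inactivity.\nTokens are stored server-side."
	for _, line := range lineDiff(oldSpec, newSpec) {
		fmt.Println(line)
	}
}
```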

New comment by cube2222 in "Ask HN: Who is hiring? (March 2026)"

cube2222 — Mon, 02 Mar 2026 18:46:13 +0000

Spacelift | Remote (Europe) | Full-time | Senior Software Engineer | $80k-$110k+ (can go higher)

If that sounds like fun to you, please apply at https://careers.spacelift.io/jobs/3006934-software-engineer-...

You can find out more about the product we're building at https://spacelift.io and also see our engineering blog for a few technical blog posts of ours: https://spacelift.io/blog/engineering

New comment by cube2222 in "Claude Sonnet 4.6"

cube2222 — Wed, 18 Feb 2026 11:35:23 +0000

This seems to agree with my own previous tests of Sonnet vs Opus (not on this version). If I give them a task with a large list of constraints ("do this, don't do this, make sure of this"), like 20-40, Sonnet will forget half of it, while Opus correctly applies all directives.

My intuition is that this is just related to model size / its "working memory", and will likely be fixed neither by training Sonnet with Opus nor by steadily optimizing its agentic capabilities.

New comment by cube2222 in "Claude Sonnet 4.6"

cube2222 — Tue, 17 Feb 2026 18:40:23 +0000

Attention is, at its core, quadratic wrt context length. So I'd believe that to be the case, yeah.
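Here's where the quadratic comes from. In standard scaled dot-product attention, each of the n tokens computes a score against each of the n tokens, so doubling the context quadruples the score computations. The Go sketch below is a naive, unmasked, single-head version with a hypothetical `filled` helper for dummy inputs. Real models add causal masking (roughly halving the count), multiple heads, and fused kernels such as FlashAttention, but the n² scaling of standard attention remains.

```go
package main

import (
	"fmt"
	"math"
)

// attention computes naive single-head softmax(Q K^T / sqrt(d)) V for n
// vectors of dimension d, and returns how many query-key scores it computed
// (n*n: every token attends to every token).
func attention(q, k, v [][]float64) ([][]float64, int) {
	n, d := len(q), len(q[0])
	scale := 1 / math.Sqrt(float64(d))
	out := make([][]float64, n)
	scores := 0
	for i := 0; i < n; i++ {
		// Scores of query i against every key.
		w := make([]float64, n)
		maxW := math.Inf(-1)
		for j := 0; j < n; j++ {
			dot := 0.0
			for t := 0; t < d; t++ {
				dot += q[i][t] * k[j][t]
			}
			w[j] = dot * scale
			maxW = math.Max(maxW, w[j])
			scores++
		}
		// Numerically stable softmax over the n scores.
		sum := 0.0
		for j := range w {
			w[j] = math.Exp(w[j] - maxW)
			sum += w[j]
		}
		// Output i is the softmax-weighted sum of the value vectors.
		out[i] = make([]float64, len(v[0]))
		for j := range w {
			for t := range v[j] {
				out[i][t] += w[j] / sum * v[j][t]
			}
		}
	}
	return out, scores
}

// filled builds an n x d matrix of constant values, just as dummy input.
func filled(n, d int, x float64) [][]float64 {
	m := make([][]float64, n)
	for i := range m {
		m[i] = make([]float64, d)
		for j := range m[i] {
			m[i][j] = x
		}
	}
	return m
}

func main() {
	for _, n := range []int{128, 256, 512} {
		x := filled(n, 8, 0.1)
		_, scores := attention(x, x, x)
		fmt.Printf("context %d tokens -> %d scores\n", n, scores)
	}
}
```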

New comment by cube2222 in "Claude Sonnet 4.6"

cube2222 — Tue, 17 Feb 2026 18:14:07 +0000

So, TLDR, it seems like it's:

- a reasonable improvement over Sonnet 4.5, esp. with agentic tool use

- generally worse than Opus 4.6

Probably not worth it for coding, but a win for anybody building agentic AI assistants of any sort with Sonnet.