Hacker News: noxa

New comment by noxa in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

noxa — Mon, 06 Apr 2026 17:40:36 +0000

I'm the author of the report in there. The stop-phrase-guard didn't get attached but here it is: https://gist.github.com/benvanik/ee00bd1b6c9154d6545c63e06a3...

You can watch for these yourself - they are strong indicators of shallow thinking. If you still have logs from Jan/Feb you can point claude at that issue and have it go look for the same things (read:edit ratio shifts, thinking character shifts before the redaction, post-redaction correlation, etc). Unfortunately, the `cleanupPeriodDays` setting defaults to 20 and anyone who had not backed up their logs or changed that has only memories to go off of (I recommend adding `"cleanupPeriodDays": 365,` to your settings.json). Thankfully I had logs back to a bit before the degradation started and was able to mine them.
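If you'd rather script the retention change than hand-edit the file, here is a minimal sketch. It assumes the settings file is plain JSON; the function names are hypothetical, and the path is whatever your install uses. Only the `cleanupPeriodDays` key comes from the comment above.

```python
import json
from pathlib import Path


def with_cleanup_period(settings, days=365):
    """Return a copy of the settings dict with cleanupPeriodDays set."""
    updated = dict(settings)
    updated["cleanupPeriodDays"] = days
    return updated


def update_settings_file(path, days=365):
    """Rewrite a JSON settings file in place, preserving all other keys.

    Assumption: the file is strict JSON (no comments or trailing commas).
    """
    path = Path(path)
    text = path.read_text() if path.exists() else ""
    settings = json.loads(text) if text.strip() else {}
    path.write_text(json.dumps(with_cleanup_period(settings, days), indent=2) + "\n")
```

Back up the file first; a rewrite like this will normalize whitespace and key formatting.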

The frustrating part is that it's not a workflow _or_ model issue, but a silently-introduced limitation of the subscription plan. They switched thinking to be variable by load, redacted the thinking so no one could notice, and then have been running it at ~1/10th the thinking depth nearly 24/7 for a month. That's with max effort on, adaptive thinking disabled, high max thinking tokens, etc. Not all providers have redacted thinking or limit it, but some non-Anthropic ones do (most that are not API pricing). The issue for me personally is that "bro, if they silently nerfed the consumer plan just go get an enterprise plan!" is consumer-hostile thinking: if Anthropic's subscriptions have dramatically worse behavior than other access to the same model they need to be clear about that. Today there is zero indication from Anthropic that the limitation exists, the redaction was a deliberate feature intended to hide it from the impacted customers, and the community is gaslighting itself with "write a better prompt" or "break everything into tiny tasks and watch it like a hawk the same way you would a local 27B model" or "works for me" - sucks :/

New comment by noxa in "Ask HN: Has Claude Code quality level degraded lately?"

noxa — Fri, 27 Mar 2026 15:46:23 +0000

Yeah, I went through a week or two of configuration changes trying to figure out what I could have done to make it behave that way, and it wasn't until it repaired itself and then the next morning went back to idiot-mode mid-response that I finally knew it was not me. Same task, same session, same cc version, same prompts, same context, so I'm confident it was a configuration change on their end.

In case anyone can correlate, the recovery happened on March 24th and then re-regressed at approximately 3:09 PM PST (23:09 UTC) on March 25. Flipped right back into "simplest" solutions, and "You're right, I'm sorry" mode:

> "You're right. That was lazy and wrong. I was trying to dodge a code generator issue instead of fixing it."

> "You're right — I rushed this and it shows. Let me be deliberate about the structure before writing."

> "You're right, and I was being sloppy. The CPU slab provider's prefault is real work."

New comment by noxa in "Ask HN: Has Claude Code quality level degraded lately?"

noxa — Fri, 27 Mar 2026 01:43:40 +0000

It's gotten very bad. It had been degrading since late Feb and since March 8th has become unusable. "Simplest fix" and "You're right, I'm sorry" are strong indicators. It went from senior engineer to entitled intern, and I went from having a team of peers to a lazy jerk who only tries to cut corners. I've got quantitative analytics of it, too. Briefly the other day for about 24 hours it returned to normal, and then someone flipped the switch again mid-session. I was a massive proponent of Claude/Opus, and for the last several weeks have felt rug-pulled. It's such an obvious degradation that even non-technical friends have noticed it. It's optimizing for minimum effort instead of correct and clean solutions. It sucks, because had I experienced it like this from the start I'd have bounced from agentic coding and never looked back - unfortunately, I thought it'd only get better and adjusted my workflow around it. When my Qwen3.5 27B local model gets into fewer reasoning loops than Opus does, it makes me wonder if anyone there cares or if they are just chasing IPO energy from scaling.

I had to build a stop hook to catch its garbage, and even then it's not enough. I had 30min-1hr uninterrupted sessions (some slipstreamed comments), and now I can't get a single diff that I can accept without comment. Half of the work it does is more destructive than helpful (removing comments from existing code, ignoring directives and wandering off into nowhere, etc).

From 2 weeks after installing the stop hook (around March 8th):
```
Breakdown of the 173 violations:

    73x ownership dodging (caught saying variants of "not caused by my changes")
    40x unnecessary permission-seeking ("should I continue?", "want me to keep going?")
    18x premature stopping ("good stopping point", "natural checkpoint")
    14x "known limitation" dodging
    14x "future work" / "known issue" labeling
    Various: "next session", "pause here", etc.

Peak day: March 18 with 43 violations in a single day.
```

Other one is loops in reasoning, which are something I'm familiar with on small local models, not frontier ones:
```
Sessions containing 5+ instances of reasoning-loop phrases
("oh wait", "actually,", "let me reconsider", "I was wrong"):

    Period            Sessions with 5+ loops
    Before March 8    0
    After March 8     7 (up to 23 instances in one session)
```
(I've even had it write code where it has "Wait, actually, we should do X" in comments in the code!)
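For anyone who wants to run the same kind of count on their own transcripts, a rough sketch follows. It is not the stop hook from the gist. The phrase list is the one quoted above; the session shape (a dict of session id to transcript text) and the function names are assumptions made for illustration.

```python
import re

# Phrases quoted in the comment above; matching is case-insensitive.
LOOP_PHRASES = ("oh wait", "actually,", "let me reconsider", "I was wrong")
_PATTERN = re.compile("|".join(re.escape(p) for p in LOOP_PHRASES), re.IGNORECASE)


def count_loop_phrases(text):
    """Count non-overlapping occurrences of any reasoning-loop phrase."""
    return len(_PATTERN.findall(text))


def sessions_with_loops(sessions, threshold=5):
    """Return {session_id: count} for sessions at or above the threshold.

    `sessions` maps a session id to its transcript text (hypothetical shape;
    adapt the loading step to however your logs are actually stored).
    """
    counts = {sid: count_loop_phrases(text) for sid, text in sessions.items()}
    return {sid: n for sid, n in counts.items() if n >= threshold}
```

Plain substring matching like this will over-count ("actually," shows up in normal prose too), so treat the numbers as a trend indicator rather than a precise measure.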

The worst is the dodging; it said, literally, "not my code, not my problem" to a build failure it created 5 messages ago in the same session.
```
I had to tell Claude "there's no such thing as [an issue that existed before your changes]" on average:

    Once per week in January
    2-3 times per week in February
    Nearly daily from March 8 onward
```

Honestly, just venting, because I'm extremely depressed. I had the equivalent of a team of engineers I could trust, and overnight someone at Anthropic flicked a switch and killed them. I'm getting better results from random models on OpenRouter now (and OmniCoder 9B! 9B!). They aren't _good_ results, mind you, but they aren't idiotic.

Sad. Very sad.

New comment by noxa in "Windows 11 will ask consent before sharing personal files with AI after outrage"

noxa — Wed, 17 Dec 2025 02:02:14 +0000

No joke - I've used Windows (and a bit of OS X) my entire life and am old enough now that I didn't think I'd ever be able to switch. A few weeks back I hit the point where I had to upgrade from Windows 10 to 11 and just could not stomach the UX so in frustration I set up Kubuntu w/ Plasma... and it's been amazing. I've tried switching before without the same luck and I think agents like Claude/Codex/etc are the only reason it has stuck this time. Something that's always been unique to Linux is that if there's something I want to change I can generally do that, but now when I want something customized I can _actually_ do it instead of just slotting it into the infinite "if only I had time" bucket. There are quirks for sure (I'm looking at you, PipeWire) but the tinkery-ness of Linux on the desktop went from being friction to a superpower for me just this month - maybe others will catch on next year.

New comment by noxa in "Arcee Trinity Mini: US-Trained Moe Model"

noxa — Tue, 02 Dec 2025 01:40:55 +0000

I hate that I laughed at this. Thanks ;)

New comment by noxa in "Serflings is a remake of The Settlers 1"

noxa — Mon, 24 Nov 2025 17:37:05 +0000

As a Settlers 1/2 fan I spent quite a bit of time in The Colonists - can recommend it if you liked the road building/flag mechanics and the chill gameplay.

New comment by noxa in "Google Antigravity"

noxa — Tue, 18 Nov 2025 18:23:28 +0000

+1 - it also doesn't support remote ssh (the open vsx variant), so is probably only focused on local web design development vibe coding ;(

Should have just been an extension with a paid plan.

New comment by noxa in "Kvcached: Virtualized, elastic KV cache for LLM serving on shared GPUs"

noxa — Tue, 21 Oct 2025 20:27:13 +0000

Neat! As someone working in this space and feeling like I've been taking crazy pills from how these "duh, CPU solved this 30 years ago" things keep slipping by, it's great to see more people bridging the gap! Unfortunately CUDA/HIP (and the entire stack beneath them) virtual memory management ops are very expensive host APIs (remapping a big block of pages can be O(n^2) in page count and fully synchronize host/device (forced wait idle), take kernel locks, etc) so it hasn't been viable in all cases. If your workloads are submit/wait with host in the loop the VM tricks are ok but if you are trying to never block the GPU (pipeline depth > 0) you really want to avoid anything that does a page table modification (until we get GPUs that can pipeline those). vkQueueBindSparse is one of the few async APIs I've seen, and CUDA has cuMemMapArrayAsync but I haven't yet used it (because arrays are annoying and without being able to inspect the driver I'm sure it's probably doing the wrong thing).

I've had good luck with indirection tables used during lookup inside of the kernels consuming/producing the kvcache data - it's essentially user-mode remapping like they do here: you can publish a buffer offset table and threads are uniform, have coalesced reads to the table, and cache the offsets no problem. You have the same memory locality issues as VM (contiguous virtual but potentially random physical) but are not limited to device page sizes and since you can update while work is in-flight you can be much more aggressive about reuse and offload (enqueue DMA to cold storage to evict from VRAM, enqueue DMA to copy from cold memory into reused VRAM, enqueue offset table update, enqueue work using them, repeat - all without host synchronization). You can also defrag in-flight if you do want to try to restore the physical locality. It's nothing crazy and fairly normal in CPU land (or even classic virtual texturing), but in ML GPU land I could write a big paper on it and call it SuperDuperFancyAttention4 and publish press releases...
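To make the indirection idea concrete, here is a toy CPU-side model in Python. It is only a sketch of the bookkeeping: the class and method names are made up, the `pool` list stands in for VRAM and `cold` for host/cold storage, and on a real GPU the evict/restore/table-update steps would be enqueued DMA and buffer writes on a stream rather than synchronous calls.

```python
class BlockTable:
    """Logical KV-cache blocks mapped to physical slots via an offset table."""

    def __init__(self, num_slots, block_size):
        self.block_size = block_size
        self.pool = [None] * (num_slots * block_size)  # stands in for VRAM
        self.free_slots = list(range(num_slots))
        self.table = {}  # logical block id -> physical slot index
        self.cold = {}   # logical block id -> evicted block contents

    def allocate(self, logical):
        """Bind a logical block to any free physical slot."""
        slot = self.free_slots.pop()
        self.table[logical] = slot
        return slot

    def write(self, logical, index, value):
        self.pool[self.table[logical] * self.block_size + index] = value

    def read(self, logical, index):
        # Consumers only know logical ids; the table supplies the offset.
        return self.pool[self.table[logical] * self.block_size + index]

    def evict(self, logical):
        """Copy a block to cold storage and release its slot for reuse."""
        slot = self.table.pop(logical)
        start = slot * self.block_size
        self.cold[logical] = self.pool[start:start + self.block_size]
        self.free_slots.append(slot)

    def restore(self, logical):
        """Bring an evicted block back into whichever slot is free."""
        slot = self.allocate(logical)
        start = slot * self.block_size
        self.pool[start:start + self.block_size] = self.cold.pop(logical)
```

A restored block can land in a different slot than it started in, and readers never notice because they go through the table. That is the same property that allows defragmentation while work is in flight.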

New comment by noxa in "Metropolis 1998 lets you design every building in an isometric, pixel-art city (2024)"

noxa — Fri, 17 Oct 2025 15:17:23 +0000

As an old school TT/TTD fan this gives me so many good vibes :) Been fun watching the progress and I do recommend people check out the demos on Steam if you just want to have a good nostalgia break even if the game isn't fully there yet.

New comment by noxa in "Float Exposed"

noxa — Fri, 12 Sep 2025 16:39:03 +0000

Would be cool if this supported the various fp8 formats that have been shipped on GPUs recently!

New comment by noxa in "Show HN: Hyvector – A fast and modern SVG editor"

noxa — Sat, 10 May 2025 02:47:27 +0000

Agreed! Looks great, but I did immediately click the pencil to doodle and was disappointed nothing happened. When I created a new document and tried to use the pencil nothing happened. I never figured out how to use it. I tried the Bezier tool and was able to add some nodes but was not able to manipulate them with any of the tools. Maybe dragging is entirely broken on Chrome/Windows?

New comment by noxa in "Silicon Valley crosswalk buttons apparently hacked to imitate Musk, Zuck voices"

noxa — Sun, 20 Apr 2025 00:59:33 +0000

I love that it played the Bo Burnham "jeff bezos" song - such incredible art.

New comment by noxa in "Show HN: Game Bub – open-source FPGA retro emulation handheld"

noxa — Wed, 12 Feb 2025 21:10:10 +0000

Fantastic project and great writeup! The screen tradeoff with needing triple buffering but getting integer scaling was interesting to hear about - any feeling as to whether it adds human-noticeable latency vs. original hardware?

New comment by noxa in "Using Libc for GPUs"

noxa — Sun, 15 Dec 2024 13:57:50 +0000

Just wanted to say thanks for pushing on this front! I'm not using the libc portion but the improvements to clang/llvm that allow this to work have been incredible. When I was looking a few months back the only options that felt practical for writing large amounts of device code were cuda/hip or opencl and a friend suggested I just try C _and it worked_. Definitely made my "most practical/coolest that it actually works" list for 2024 :)

New comment by noxa in "Eric Schmidt Walks Back Claim Google Is Behind on AI Because of Remote Work"

noxa — Thu, 15 Aug 2024 17:36:43 +0000

Just seconding this as someone who'd been there for over 10yrs and left this last year for those precise reasons. Many of my peers left at the same time and we're quite frequently getting incoming interviews from others who still remain there. Very disappointing :(

New comment by noxa in "SimCity in the web browser using WebAssembly and OpenGL"

noxa — Sun, 16 Jun 2024 18:44:20 +0000

my favorite map too :)

New comment by noxa in "BlockTube, a YouTube Content Blocker"

noxa — Tue, 30 Jan 2024 15:44:04 +0000

Yeah! Seeing blackhead popping thumbnails for this stuff in my crafting/hacking/cooking feed of videos is absolutely terrifying, especially as someone with mild trypophobia. I cannot believe that these are surfaced for me given the amount of data they have or that they are surfaced for anyone without explicit searches given the content - even if considered medical/educational in nature I wouldn't expect surgery or animal dissection videos to be surfaced likewise.

New comment by noxa in "Fantasy Map Brushes"

noxa — Thu, 21 Dec 2023 20:49:36 +0000

beautiful - also recommend his novels: https://kmalexander.com/home/

New comment by noxa in "Azure announces new AI optimized VM series featuring AMD's flagship MI300X GPU"

noxa — Wed, 15 Nov 2023 20:29:08 +0000

challenge accepted ;)

(hello!)

New comment by noxa in "Tracy: A hybrid frame and sampling profiler for games and other applications"

noxa — Sat, 28 Jan 2023 21:12:59 +0000

Having written similar tools like this in the past I pretty much exclusively use Tracy nowadays and let all my old stuff rot. The UI is snappy/fluid even with a tremendous amount of events, instrumentation is rich (zones/plots/memory tracking with timing, GPU events, logging, etc), sampling is robust across all major platforms (Windows/Android/Linux/Mac), and the disassembly/analysis is as solid as most vendor tools. If you want to integrate/use any tool like this I strongly recommend Tracy, occasionally supplemented by vendor tooling where it is better (but platform/hardware limited). Having a single Swiss Army knife perf tool makes it much easier for most of the team to learn and use it and leave the specialized knowledge of vendor tooling to those who need it.

There's also Superluminal (https://superluminal.eu/), but it's closed source/paid - wolfpld (the Tracy author) is responsive and there's a decent set of contributors so switching to a black box on a part of the engineering process that is designed to avoid black boxes isn't very enticing. It is also sketchy/misleading in saying things like "Superluminal is the only sampling profiler that displays the profiling data in a visual UI." which is a big turn-off for someone who's been using/building tools like that for decades :)