Hacker News: Majromax

New comment by Majromax in "Six (and a half) intuitions for KL divergence"

Majromax — Thu, 09 Apr 2026 16:46:18 +0000

The math doesn't need a 'true' or 'false' distribution; that just falls out of the use of a model ('false') to approximate reality ('true'). When the bard says "there are more things in heaven and earth, Horatio, than are dreamt of in your philosophy," he's also saying that the KL Divergence between Horatio's beliefs and reality is infinite.

We can also apply the concept between two subjective distributions. If I'm indifferent to sports teams (very broad distribution) and you're a rabid fan of A (sharp, narrow distribution), then it might take you a long time to express a point in a way I'll understand – but conversely I might be able to express "team B is good actually" in a way that just does not compute for you.
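
As a rough illustration of both points (the infinity and the asymmetry), here is a minimal sketch for discrete distributions. The three belief vectors are made-up numbers over hypothetical teams A, B, C, and `kl_divergence` is a helper written for this example rather than a library function.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits, for discrete distributions given as equal-length lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # a term with p = 0 contributes nothing, by convention
        if qi == 0.0:
            return math.inf  # p allows an outcome that q rules out entirely
        total += pi * math.log2(pi / qi)
    return total

# Hypothetical beliefs about which of teams A, B, C is best (illustrative only).
indifferent = [1 / 3, 1 / 3, 1 / 3]  # broad distribution
fan_of_a = [0.98, 0.01, 0.01]        # sharp, narrow distribution
horatio = [1.0, 0.0, 0.0]            # assigns zero probability to B and C

print(kl_divergence(fan_of_a, indifferent))  # fan's view encoded with the indifferent code
print(kl_divergence(indifferent, fan_of_a))  # the reverse direction costs more
print(kl_divergence(indifferent, horatio))   # inf: outcomes "not dreamt of"
```

The two finite values differ, which is the asymmetry between the fan and the indifferent observer; the last one is infinite because the second distribution gives zero weight to something the first considers possible.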

New comment by Majromax in "Škoda DuoBell: A bicycle bell that penetrates noise-cancelling headphones"

Majromax — Wed, 08 Apr 2026 19:52:38 +0000

There's also another difference: beeps can reflect coherently off of surfaces, causing directionality confusion in a dense environment. White noise is much less likely to have odd interference patterns, maximizing our ability to localize the sound.

New comment by Majromax in "We found an undocumented bug in the Apollo 11 guidance computer code"

Majromax — Tue, 07 Apr 2026 14:46:21 +0000

Since the advantage of standards is that there are so many to choose from, one lesser-used but still regionally acceptable approach (e.g. https://www.alberta.ca/web-writing-style-guide-punctuation#j...) is to use en-dashes offset with spaces.

New comment by Majromax in "Germany Power Prices Turn Deeply Negative on Renewables Surge"

Majromax — Tue, 07 Apr 2026 12:00:05 +0000

If you're buying and selling the power, then you don't care much about the absolute price, but rather the delta between the purchase and sale prices.

Below -100€/MWh, you don't need to sell the power to profit; you'd make at least some money just using it to heat up a big resistor.

New comment by Majromax in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

Majromax — Mon, 06 Apr 2026 19:02:00 +0000

I'm curious about your subscription/API comparison with respect to thinking. Do you have a benchmark for this, where the same set of prompts under a Claude Code subscription result in significantly different levels of effective thinking effort compared to a Claude Code+API call?

Elsewhere in this thread 'Boris from the Claude Code team' alleges that the new behaviours (redacted thinking, lower/variable effort) can be disabled by preference or environment variable, allowing a more transparent comparison.

New comment by Majromax in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

Majromax — Mon, 06 Apr 2026 18:56:21 +0000

If I'm reading that page correctly, then the benchmark results don't cover the interesting "mid February" inflection point noted in the article/report. The numbers appear to start after the quality drop began. Moreover, the daily confidence interval seems to be stupidly wide, spanning 42% to 69%?

The "Other metrics" graphs extend for a longer period, and those do seem to correlate with the report. Notably, the 'input tokens' (and consequently API cost) roughly halve (from 120M to 60M) between the beginning of February and mid-March, while the number of output tokens remains similar. That's consistent with the report's observation that new!Opus is more eager to edit code and skips reading/research steps.

New comment by Majromax in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"

Majromax — Mon, 06 Apr 2026 18:49:03 +0000

If you're using API pricing, then you can bring your own harness with full visibility/oversight of the prompting.

New comment by Majromax in "Costco sued for seeking refunds on tariffs customers paid"

Majromax — Sun, 05 Apr 2026 12:56:03 +0000

> this is a wild fantasy when dealing with any commercial entity with a fiduciary responsibility to shareholders.

"Fiduciary duty" is less strict than you'd expect. Courts generally recognize a "business judgment rule," where executives are offered broad discretion in strategy subject to some basic reasonability tests.

This would allow Costco to say "in order to cultivate goodwill and maintain our reputation, after we receive refunds we will distribute them to our customers based on purchased goods with refunded tariffs." It would also allow the directors to book the refund as profits, or use it for later incentives or marketing, or a variety of other actions.

The 'fiduciary duty' aspect here is mostly a myth. Directors do indeed have a fiduciary duty, but that duty is towards the corporation as a whole – including its long-term interests – rather than strictly towards short-term profit maximization. The fiduciary duty doctrine exists more to prevent graft and self-dealing, where managers and directors 'loot' the company by smuggling out profits in ways that benefit themselves personally rather than the company as a whole.

New comment by Majromax in "Data centers are transitioning from AC to DC"

Majromax — Wed, 25 Mar 2026 02:05:47 +0000

With modern technologies, that's power over ethernet or USB-C. Other comments in this thread point out that the telephone service also routinely used 48V for the ring signal.

However, higher DC voltage is riskier, and it's not at all standard for electrical and building code reasons. In particular, breaking DC circuits is more difficult because there's no zero-crossing point to naturally extinguish an arc, and 170V (US/120VAC) or 340V (Europe/240VAC) is enough to start a substantial arc under the right circumstances.

Unfortunately for your lighting, it's also both simple and efficient to stack enough LEDs together such that their forward voltage drop is approximately the rectified peak (i.e. targeting that 170/340V peak). That means that the bulb needs only one serial string of LEDs without parallel balancing, making the rest of the circuitry (including voltage regulation, which would still be necessary in DC world) simpler.
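
The 170/340V figures are just the RMS mains voltage times the square root of two. A minimal sketch of that arithmetic follows, with a rough series-string count; the 3.0 V forward drop per LED is an assumed round number for illustration, not a datasheet value, and a real design would leave headroom below the peak.

```python
import math

def rectified_peak(v_rms):
    """Peak of a sine wave with the given RMS voltage."""
    return v_rms * math.sqrt(2)

def leds_in_series(v_rms, led_forward_v=3.0):
    """Largest whole number of LEDs whose summed forward drop fits under the peak.

    led_forward_v is an assumed per-LED forward voltage (hypothetical value).
    """
    return math.floor(rectified_peak(v_rms) / led_forward_v)

for v_rms in (120, 240):
    print(v_rms, round(rectified_peak(v_rms), 1), leds_in_series(v_rms))
```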

New comment by Majromax in "Arm AGI CPU"

Majromax — Tue, 24 Mar 2026 22:07:14 +0000

> My definition of AGI hasn't changed - it's something that can perform, or learn to perform, any intellectual task that a human can.

Wait, could you make your qualifiers specific here? Is your definition of AGI that it be able to perform/learn any intellectual task that is achievable by every human, or by any human?

Those are almost incomparably different standards. For the first, a nascent AGI would only need to perform a bit better than a "profound intellectual disability" level. For the second, AGI would need to be a real "Renaissance AGI," capable of advancing the frontiers of thought in every discipline, but at the same time every human would likely fail that bar.

New comment by Majromax in "Cursor Composer 2 is just Kimi K2.5 with RL"

Majromax — Fri, 20 Mar 2026 13:23:48 +0000

> this may also be in fact enough evidence for Anthropic to ban Cursor from using their models, just like they are doing to OpenCode.

The Anthropic ban on OpenCode isn't an Anthropic ban on OpenCode, it's a ban on using a Claude Code subscription with OpenCode. That's justified (or not) under various ToS arguments, but one can still use OpenCode with the more expensive API access.

Anthropic's complaint about distillation attacks is a distinct prong, one not levied against OpenCode. Additionally, the distillation activities described in your link don't describe Cursor's routine use of Anthropic's models. There, the model outputs are a primary product (e.g. the autocompleted code), and any learning signals provided are incidental.

New comment by Majromax in "Cursor Composer 2 is just Kimi K2.5 with RL"

Majromax — Fri, 20 Mar 2026 13:18:49 +0000

> The worthwhile question AIUI is whether AI weights are even protected by human copyright.

I'm also deeply curious about this legal question.

As I see it, model weights are the result of a mechanistic and lossy translation between training data and the final output weights. There is some human creativity involved, but that creativity is found exclusively in the model's code and training data, which are independently covered by copyright. Training is like a very expensive compilation process, and we have long established that compiled artifacts are not distinct acts of creation.

In the case of a proprietary model like Kimi, copyright might survive based on 'special sauce' training like reinforcement learning – although that competes against the argument that pretraining on copyrighted data is 'fair use' transformation. However, I can't see a good argument that a model trained on a fully public domain dataset (with a genuinely open-source architecture) could support a copyright claim.

New comment by Majromax in "No, it doesn't cost Anthropic $5k per Claude Code user"

Majromax — Tue, 10 Mar 2026 13:56:45 +0000

> 1) Anthropic doesn't want their subsidised plans being used outside of CC, which would imply that the money their making off it isn't enough, a

Claude Code use-cases also differ somewhat from general API use, where the former is engineered for high cache utilization. We know from overall API costs (both Anthropic and OpenRouter) that cached inputs cost an order of magnitude less than uncached inputs, but OpenCode/pi/OpenClaw don't necessarily have the same kind of aggressive cache-use optimizations.

Vertically integrated stacks might also be able to have a first layer of globally shared KV cache for the system prompts, if the preamble is not user specific and changes rarely.

> 2) last time I checked, API spending is capped at $5000 a month

Per https://platform.claude.com/docs/en/api/rate-limits, that seems to only be true for general credit-funded accounts. If you contact Anthropic's sales team and set up monthly invoicing, there's evidently no fixed spending limit.

New comment by Majromax in "The surprising whimsy of the Time Zone Database"

Majromax — Sun, 08 Mar 2026 13:31:57 +0000

As far as the timezone file is concerned, it's two changes but one shift. This is covered more fully in the complete news blob rather than the snippet shown at the top. Today, British Columbia moved from Pacific Standard Time (UTC-8) to Pacific Daylight Time (UTC-7); tomorrow the timezone is renamed to Pacific Time.

Unfortunately, the "PT" abbreviation is too short for the timezone database, so while they decide on another form they will temporarily use a bare -7 offset.

New comment by Majromax in "Relicensing with AI-Assisted Rewrite"

Majromax — Thu, 05 Mar 2026 17:50:37 +0000

> “Changing the equation” by boldly breaking the law.

Is it? I think the law is truly undeveloped when it comes to language models and their output.

As a purely human example, suppose I once long ago read through the source code of GCC. Does this mean that every compiler I write henceforth must be GPL-licensed, even if the code looks nothing like GCC code?

There's obviously some sliding scale. If I happen to commit lines that exactly replicate GCC then the presumption will be that I copied the work, even if the copying was unconscious. On the other hand, if I've learned from GCC and code with that knowledge, then there's no copyright-attaching copy going on.

We could analogize this to LLMs: instructions to copy a work would certainly be a copy, but an ostensibly independent replication would be a copy only if the work product had significant similarities to the original beyond the minimum necessary for function.

However, this is intuitively uncomfortable. Mechanical translation of a training corpus to model weights doesn't really feel like "learning," and an LLM can't even pinky-promise to not copy. It might still be the most reasonable legal outcome nonetheless.

New comment by Majromax in "GPT‑5.3 Instant"

Majromax — Wed, 04 Mar 2026 03:02:14 +0000

En-dashes, set off with spaces, are an acceptable substitute for unspaced em-dashes in some style guides. See for example this Canadian government guide: https://nos-langues.canada.ca/en/writing-tips-plus/en-dash.

The use seems to be more common in British than in American English.

New comment by Majromax in "AI Made Writing Code Easier. It Made Being an Engineer Harder"

Majromax — Sun, 01 Mar 2026 18:34:01 +0000

Prompting all the way down? Have the AI create tests that document existing, known-good behaviours, then refactor while ensuring those tests pass.
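
A minimal sketch of that workflow, assuming a toy function: `legacy_slug` stands in for the existing known-good code and `refactored_slug` for the rewrite. Both names and the input cases are hypothetical, chosen only to show the shape of a characterization ("golden") test.

```python
import re
import string

ALLOWED = set(string.ascii_lowercase + string.digits)

def legacy_slug(title):
    # Stand-in for existing, known-good code: clunky, but relied upon.
    out = ""
    for ch in title.lower():
        if ch in ALLOWED:
            out += ch
        elif out and out[-1] != "-":
            out += "-"
    return out.strip("-")

def refactored_slug(title):
    # Candidate rewrite that must preserve the recorded behaviour.
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")

CASES = ["Hello, World!", "  spaces  everywhere ", "already-a-slug", "", "C++ & Rust"]

def record_golden(fn, cases):
    """Step 1: document current behaviour by recording its outputs."""
    return {case: fn(case) for case in cases}

def check_against_golden(fn, golden):
    """Step 2: return the inputs where the new code disagrees with the record."""
    return [case for case, expected in golden.items() if fn(case) != expected]

golden = record_golden(legacy_slug, CASES)
mismatches = check_against_golden(refactored_slug, golden)
print(mismatches)  # an empty list means the refactor kept the documented behaviour
```

The tests only pin down behaviour for the cases that were recorded, so the quality of the case list matters more than the test code itself.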

New comment by Majromax in "Pi – A minimal terminal coding harness"

Majromax — Wed, 25 Feb 2026 15:21:43 +0000

For all of the recent talk about how Anthropic relies on heavy cache optimization for claude-code, it certainly seems like session-specific information (the exact datestamp, the pid-specific temporary directory for memory storage) enters awfully early in the system prompt.
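
To show why placement matters, here is a toy sketch assuming a cache that can only reuse the longest common token prefix between two requests. The prompt contents, the session strings, and the word-level "tokens" are all made up for illustration; this is not a description of how Anthropic's caching actually works.

```python
def shared_prefix_len(a, b):
    """Number of leading tokens two prompts have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Hypothetical static instructions shared by every session (1000 toy tokens).
STATIC = ["You", "are", "a", "coding", "assistant."] * 200

def build_prompt(session_info, early):
    """Place the session-specific tokens before or after the static text."""
    return session_info + STATIC if early else STATIC + session_info

session_a = ["now=15:21:43", "tmp=/tmp/mem-101"]
session_b = ["now=15:22:10", "tmp=/tmp/mem-202"]

early = shared_prefix_len(build_prompt(session_a, True), build_prompt(session_b, True))
late = shared_prefix_len(build_prompt(session_a, False), build_prompt(session_b, False))
print(early, late)
```

Under this assumption, nothing after the first differing token can be shared across sessions, so session-specific data near the top leaves very little reusable prefix.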

New comment by Majromax in "Writing code is cheap now"

Majromax — Tue, 24 Feb 2026 15:01:44 +0000

> Have we not learned anything about technical debt and how it bites back hard?

I think LLMs are changing the nature of technical debt in weird ways, with trends that are hard to predict.

I've found LLMs surprisingly useful in 'research mode', taking an old and badly-documented codebase and answering questions like "where does this variable come from, and what are its ultimate consumers?" Their answers won't be as natural as a true expert's, but they are nonetheless useful. Poor documentation is a classic example of technical debt, and LLMs make it easier to manage.

They're also useful at making quick-and-dirty code more robust. I'm as guilty as anyone else of writing personal-use bash scripts that make all kinds of unjustified assumptions and accrete features haphazardly, but even in "chat mode" LLMs are capable of reasonable rewrites for these small problems.

More systematically, we also see now-routine examples of LLMs being useful at code de-obfuscation and even decompilation. These forward processes maximize technical debt compared to the original systems, yet LLMs can still extract meaning.

Of course, we're not now immune to technical debt. Vibe coding will have its own hard-to-manage technical debt, but I'm not quite sure that we have the contours well defined. Anecdotally, LLMs seem to have their biggest problem in the design space, missing the forest of architecture for the trees of implementation such that they don't make the conceptual cuts between units in the best place. I would not be so confident as to call this problem inherent or structural rather than transitory.

New comment by Majromax in "Evaluating AGENTS.md: are they helpful for coding agents?"

Majromax — Tue, 17 Feb 2026 14:34:36 +0000

Thinking happens in latent space, but the thinking trace is then the projection of that thinking onto tokens. Since autoregressive generation involves sampling a specific token and continuing the process, that sampling step is lossy.

However, it is a genuine question whether the literal meanings of thinking blocks are important over their less-observable latent meanings. The ultimate latent state attributable to the last-generated thinking token is some combination of the actual token (literal meaning) and recurrent thinking thus far. The latter does have some value; a 2024 paper (https://arxiv.org/abs/2404.15758) noted that simply adding dots to the output allowed some models to perform more latent computation resulting in higher-skill answers. However, since this is not a routine practice today I suspect that genuine "thinking" steps have higher value.

Ultimately, your thesis can be tested. Take the output of a reasoning model inclusive of thinking tokens, then re-generate answers with:

1. Different but semantically similar thinking steps (i.e. synonyms, summarization). That will test whether the model is encoding detailed information inside token latent space.

2. Meaningless thinking steps (dots or word salad), testing whether the model is performing detailed but latent computation, effectively ignoring the semantic context of

3. A semantically meaningful distraction (e.g. a thinking trace from a different question)

Look for where performance drops off the most. If between 0 (control) and 1, then the thinking step is really just a trace of some latent magic spell, so it's not meaningful. If between 1 and 2, then thinking traces serve a role approximately like a human's verbalized train of thought. If between 2 and 3 then the role is mixed, leading back to the 'magic spell' theory but without the 'verbal' component being important.
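
The decision rule above can be sketched as a small analysis step. It assumes the four conditions have already been run and scored; the accuracy numbers in the usage line are placeholders to exercise the function, not measured results, and the names are hypothetical.

```python
CONDITIONS = [
    "control (original thinking)",   # 0
    "paraphrased thinking",          # 1
    "meaningless thinking",          # 2
    "distractor thinking",           # 3
]

INTERPRETATION = {
    (0, 1): "exact tokens matter: the trace is a latent 'magic spell'",
    (1, 2): "meaning matters: the trace acts like a verbalized train of thought",
    (2, 3): "mixed: back toward the 'magic spell' theory, verbal content unimportant",
}

def largest_drop(accuracy):
    """Return ((i, i + 1), drop) for the biggest fall between consecutive conditions."""
    drops = [(accuracy[i] - accuracy[i + 1], i) for i in range(len(accuracy) - 1)]
    drop, i = max(drops)
    return (i, i + 1), drop

# Placeholder accuracies for conditions 0..3, for illustration only.
step, drop = largest_drop([0.80, 0.78, 0.55, 0.50])
print(CONDITIONS[step[0]], "->", CONDITIONS[step[1]])
print(INTERPRETATION[step])
```

A real run would also want several seeds per condition, since a single noisy accuracy could move the largest drop from one step to another.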