Hacker News: hibikir

New comment by hibikir in "Small models also found the vulnerabilities that Mythos found"

hibikir — Sat, 11 Apr 2026 19:49:26 +0000

People often undervalue scaffolding. I was looking at a bug yesterday, reported by a tester. He has access to Opus, but he's looking through a single repo, and Amazon Q. It provided some useful information, but the scaffolding wasn't good enough.

I took its preliminary findings into Claude Code with the same model. But in mine it knows where every adjacent system is, the entire git history, deployment history, and state of the feature flags. So instead of pointing at a vague problem, it knew which flag had been flipped in a different service, saw how it changed behavior, and how, if the flag was flipped in prod, it'd make the service under testing cry, and which code change to make to make sure it works both ways.
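To make "works both ways" concrete, here is a minimal sketch of one shape such a fix could take. The comment doesn't say what the actual change was, so the flag, the payload fields, and the function names below are all hypothetical: the consuming service accepts whichever format the other service emits, and the check runs against both flag states.

```python
# Hypothetical names throughout: the flag, payload fields, and parser
# are illustrative, not taken from the services in the comment.

def parse_amount(payload: dict) -> int:
    """Return the amount in cents, whichever format the upstream service sends."""
    if "amount_cents" in payload:
        # Format emitted when the upstream flag is on.
        return int(payload["amount_cents"])
    # Legacy format, emitted when the flag is off.
    return round(float(payload["amount"]) * 100)

def upstream_payload(flag_on: bool) -> dict:
    """Stand-in for the other service, whose output depends on its flag."""
    return {"amount_cents": 1250} if flag_on else {"amount": "12.50"}

# Exercise the consumer under both flag states.
results = {flag: parse_amount(upstream_payload(flag)) for flag in (False, True)}
print(results)
```

The point is only that the test loop covers both flag states, so flipping the flag in prod is no longer a surprise.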

It's not as if a modern Opus is a small model: Just a stronger scaffold, along with more CLI tools available in the context.

The issue here in the security testing is to know exactly what was visible, and how much it failed, because it makes a huge difference. A middling chess player can find amazing combinations at a good speed when playing puzzle rush: You are handed a position where you know a decisive combination exists, and that it works. The same combination, however, might be really hard to find over the board, because in a typical chess game it's rare for those combinations to exist, and it takes a lot of energy to thoroughly check for them and calculate all the way through every possible line. This is why chess grandmasters would consider just being able to see the computer score for a position to be massive cheating: Just knowing when the last move was a blunder would be a decisive advantage.

When we ask a cheap model to look for a vulnerability with the right context to actually find it, we are already priming it, vs asking it to find one when there's nothing.

New comment by hibikir in "Session is shutting down in 90 days"

hibikir — Thu, 09 Apr 2026 13:07:23 +0000

Within the US, it's far more common than you think. That's typical senior dev money in a large company in cities like St Louis or KC. What is rare outside of the biggest markets is the whole "enough RSUs to double your salary" thing.

New comment by hibikir in "Assessing Claude Mythos Preview's cybersecurity capabilities"

hibikir — Tue, 07 Apr 2026 20:06:20 +0000

On a topic like cybersecurity, we never win by not looking: One needs top of the line knowledge of how to break a system to be able to protect it. We have that dilemma dealing with human experts: The same government sponsored unit that tells you that you need to update your encryption can hold on to the information and use it to exploit it at their leisure.

Given that it's absolutely impossible to stop people not aligned with us (for any definition of us) from doing AI research, the most reasonable way forward is to dedicate compute resources to the frontier, and to automatically send reasonable disclosures to major projects. It could in itself be a pretty reasonable product. Just like you pay for dubious security scans and publish that you are making them, an LLM company could offer actually expensive security reviews with a preview model, and charge accordingly.

New comment by hibikir in "System Card: Claude Mythos Preview [pdf]"

hibikir — Tue, 07 Apr 2026 19:34:53 +0000

When we go with any other good in the economy, price is always relevant: After all, the price is a key part of any offering. There are $80-100k workstations out there, but most of us don't buy them, because the extra capabilities just aren't worth it vs, say, a $3000 computer, or even a $500 one. Do I need a top specialist to consult for a stomachache, at $1000 a visit? Definitely not at first.

There's a practical difference to how much better certain kinds of results can be. We already see coding harnesses offloading simple things to simpler models because they are accurate enough. Other things are dropped straight to normal programs, because they are that much more efficient than letting the LLM do all the things.

There will always be problems where money is basically irrelevant, and a model that costs tens of thousands of dollars of compute per answer is seen as a great investment, but as long as there's a big price difference, in most questions, price and time to results are key features that cannot be ignored.

New comment by hibikir in "The cult of vibe coding is dogfooding run amok"

hibikir — Mon, 06 Apr 2026 19:49:37 +0000

My favorite use of Claude Code is to do code quality improvements that would be seen as a total waste of time if I was doing them by hand, but are perfectly fine when they are done mostly for free. Looking for repetitive patterns in unit tests/functional tests. Making sure that all json serialization is done in similar patterns unless there's a particularly good reason. Looking for functions that are way too complicated, or large chunks of duplication.

The PRs that it comes up with are rarely even remotely controversial, shrink the codebase, and are likely saving tokens in the end when working on a real feature, because there's less to read, and it's more boring. Some patterns are so common you can just write them down, and throw them at different repos/sections of a monorepo. It's the equivalent of linting, but at a larger scale. Make the language hesitant enough, and it won't just be a steamroller either, and will mostly fix egregious things.

But again, this is the opposite of the "vibe coding" idea, where a feature appears from thin air. Vibe Linting, I guess.

New comment by hibikir in "The Free Market Lie: Why Switzerland Has 25 Gbit Internet and America Doesn't"

hibikir — Mon, 06 Apr 2026 00:29:01 +0000

And even with a free market, there's areas where making up the investment would be difficult, because the amount of effort divided by the number of likely subscribers, still wouldn't pencil out for 20+ years. A lot of Americans live in suburbs that are just low density enough that updating the infra to get fiber anywhere near the house is expensive, and then you might have quite a bit of fiber specific to that one subscriber. The difference in how much infrastructure you need vs a city is substantial.

New comment by hibikir in "Why Switzerland has 25 Gbit internet and America doesn't"

hibikir — Mon, 06 Apr 2026 00:19:53 +0000

The difficulties of American internet speeds have little to do with the total size of the country, but how far individual families are from each other. Spain is roughly the size of Texas, and Spain has a higher population, but you need a lot less fiber to each home, because metro areas are so much denser, and therefore it's so much easier to lay the fiber.

As usual, blame the suburbs, which make all kinds of infrastructure quite a bit more expensive per capita.

New comment by hibikir in "The house is a work of art: Frank Lloyd Wright"

hibikir — Sat, 04 Apr 2026 01:40:19 +0000

That house is still extremely small for what most people in the US would put in a full sized suburban lot: Nowadays a median build is 2300 square feet (213 square meters). It makes that 1600 square feet look very small. The hallways, the large space dedicated to a great room and just 2 bedrooms won't help.

You will find new houses that small, but typically when it's extremely high value land, so typically infill. And then chances are it's a multi story house that fits the lot to the limit.

New comment by hibikir in "SpaceX files to go public"

hibikir — Thu, 02 Apr 2026 05:04:23 +0000

Valuing anything by its expected, long term value is just accurate. You'd consider the longevity of, say, a garment when you purchase it. The fact that a car has a lot of miles in it, and therefore will need replacing earlier, is something that any reasonable person will consider with its valuation. We spend money educating children not because of the value of the knowledge that second, but the expected value in the future, including how it'll be useful to learn other things.

So of course we price businesses based on the expected long term value of the shares, as best as we can guess it. But the fact that a company degrades in value as it "overgrows", and engorges itself to become an entity that can't innovate or do anything efficiently in itself goes into the price too. It's not as if a place like IBM doesn't want to grow: We just know they won't.

As for speculation rather than dividends, I suspect the real reason why this happens isn't just need for infinite growth: Again, as growth expectations slow down, price moderates: See Paypal vs Stripe. The issue is more of a principal-agent situation, as it's very difficult for the median shareholder to, say, force Zuck to stop spending money on the metaverse. And it's not just at the top level: We have a lot of incentives in organizations for people to push for more hires, even when there's very little value to be had. Anyone with a long career can see how much less tense a growing company is than one that has decided its headcount is stuck for a long time, or possibly shrinking.

Principal-agent problems are just much more annoying to put a blame on, because instead of being able to blame some exec all on their own, we get to look at ourselves too, and how what is good for us differs so much from what is good for employers. The blame is spread thinly, and the behaviors that would lead to more efficient companies are also worse for workers. Then it's suddenly about people who are easier to like, and we don't like where "try to be profitable at the most optimal size" takes us.

New comment by hibikir in "The first 40 months of the AI era"

hibikir — Sun, 29 Mar 2026 02:34:49 +0000

It's a matter of whether you are just writing more regular quality things, or whether you are improving the quality of what you write. There's many things that increase quality, but are time consuming, which Claude Code can do for you.

One thing I recently did was run a pass over some unit test and functional test suites, asking for standardization on initialization, and creating reasonable helper methods to minimize boilerplate. Any dev can do that, if they have a week, and it'll make future code changes more pleasant. For Claude, an hour was a -8000 line PR that kept all the tests, with all the assertions.

It's what people need to figure out about a codebase. Our normal quality practices have an embedded max safe speed for changes without losing stability. If you use LLMs to try to change things faster, the quality practices have to improve if one wants to keep the number of issues per week constant. Whether it's improving testing, or sending the LLM to look at logs and find the bugs faster, one needs to increase the quality budget.

New comment by hibikir in "South Korea Mandates Solar Panels for Public Parking Lots"

hibikir — Sun, 29 Mar 2026 01:06:43 +0000

The lot is always cheaper, as long as the land is cheap. And in most of the US, even land that isn't all that cheap is often best left as a parking lot, economically: You can easily speculate with a parking lot with minimal investment, as the taxes for the empty lot are often low. See all the midwestern cities whose downtowns are 30-40% surface parking.

There are all kinds of bad externalities caused by seas of asphalt that are unused 95% of the time, but few countries are all that interested in using any mechanism to make the property owner pay for them.

New comment by hibikir in "I'm OK being left behind, thanks"

hibikir — Fri, 20 Mar 2026 14:57:59 +0000

It's using a bad tool to try to aim at something reasonable-ish: Developers not taking advantage of the tools in places where it's very easy to get use out of them. I have coworkers like that: One spent 3 days researching a bug that Claude found in 10 minutes by pointing it at the logs in the time window and the codebase. And he didn't even find the bug, when Claude nailed it in one.

But is this something that is best done top to bottom, with a big report, counting tokens? Hell no. This is something that is better found, and tackled, at the team level. But execs in many places like easy, visible metrics, whether they are actually helping or not. And that's how you find people playing JIRA games and such. My worst example was a VP who decided that looking at the burndown charts from each team under them, and using their shape as a metric, was a good idea.

It's all natural signs of a total lack of trust, and thinking you can solve all of this from the top.

New comment by hibikir in "The American Healthcare Conundrum"

hibikir — Tue, 17 Mar 2026 02:29:39 +0000

It's a difficult fix, because the real issue here isn't who pays, but how much it's paid, total. If the cost of care in the US was the same as the cost in, say, Spain, the vast majority of people would have little problem paying out of pocket, and having just high deductible insurance for really big ticket items. At the same time, it'd be easy to have the government pay for it all. The US system is just very expensive in general, so it's a problem regardless of who pays for it.

Most of the costs are ultimately salaries to Americans, and money handed to American companies, so most savings would come from someone's livelihood. That's why we cannot reform: The party that actually cuts costs will build resentment for decades, and create a blip of unemployment. Nobody wants to do that, and therefore there isn't going to be a serious, relentless attempt at cutting costs. We've seen how the attempts that the ACA made were counteracted by consolidation at all levels.

Serious cuts have to have no mother. Say, if we ever did have an AI that worked well enough at this, and outcompeted primary care physicians. Foreign pharmacies bypassing all controls and being able to hand you much discounted drugs the day after. Telemedicine and cheap travel put together to make surgery that didn't involve an ER visit just as easy and much cheaper than using the US system. Straight out disruption, because the incentives are such we sure aren't getting improvements in regulation.

New comment by hibikir in "The 49MB web page"

hibikir — Mon, 16 Mar 2026 02:38:04 +0000

You don't even need video for this: I once worked for a company that put a carousel with everything in the product line, and every element was just pointing to the high resolution photography assets: The one that maybe would be useful for full page print media ads. 6000x4000 pngs. It worked fine in the office, they said. Add another nice background that size, a few more to have on the sides as you scroll down...

I was asked to look at the site when it was already live, and some VP of the parent company decided to visit the site from their phone at home.
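For a rough sense of scale, here is a back-of-the-envelope sketch of what those assets cost once decoded. It assumes 8-bit RGBA (4 bytes per pixel), which is a common in-memory layout but not something the comment states, and the carousel size is a made-up number. The compressed PNG on the wire will be smaller than this; the decoded bitmap is what a phone has to hold to draw it.

```python
# Assumption: 8-bit RGBA, i.e. 4 bytes per pixel once decoded.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 6000, 4000, 4

def decoded_bytes(width: int, height: int, bytes_per_pixel: int = BYTES_PER_PIXEL) -> int:
    """Size of the uncompressed bitmap for one image, in bytes."""
    return width * height * bytes_per_pixel

per_image = decoded_bytes(WIDTH, HEIGHT)

# Hypothetical carousel size; the comment only says "everything in the product line".
carousel_items = 10
total = per_image * carousel_items

print(f"{per_image / 1e6:.0f} MB per image, {total / 1e6:.0f} MB for {carousel_items} images")
```

Under those assumptions a single 6000x4000 image is 24 million pixels, or about 96 MB decoded, before the background and side images are counted.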

New comment by hibikir in "Codegen is not productivity"

hibikir — Sun, 15 Mar 2026 17:05:53 +0000

I look at it backwards: A few humans improve a project. But once you get to sufficient sizes, principal-agent problems dominate. What is good for a division and what is good for the company disagree. What is good for a developer that needs a big project for their promotion package is not what the company needs. A company with a headcount of 700 is more limber and better aligned than when it's 3,000 or 30,000. It's amazing how little alignment there ever is when you get to the 300k range.

AI, if anything, is amazing at collaborating. It's not perfectly aligned, but you sure can get it to tell you when your idea is unsound, all while having lessened principal-agent issues. Anything we can do to minimize the number of people that need to align towards a goal, the more effectively we can build, precisely due to the difficulties of marshalling large numbers of people. If a team of 4 can do the same as a team of 10, you should always pick the team of 4, even if they are more expensive put together than the 10.

New comment by hibikir in "Suburban school district uses license plate readers to verify student residency"

hibikir — Thu, 12 Mar 2026 14:57:19 +0000

I don't know how fixable that one is via just spending: There's a significant component of just selecting for student quality, interest in studying and parentally funded support when a student is struggling. It's a non-trivial part of the US' love of sprawl: Fewer kids of different levels of means will live near you. So when parents say they buy a house for "good schools" they aren't just saying funding per student. And yes, we have this too even in areas without a significant racial component. Making sure only very expensive houses are around you, and keeping housing prices up, has an effect on schooling, even if just by selecting for kids of parents that can afford the big houses.

Ultimately the American parent is paying for the kids' education either way: Either by buying a more expensive house near said "good schools", or by paying a private school, which is allowed to be selective in their admissions and match students. Making all schools actually be about the same is not just a matter of funding them equally, but you'd have to end the student segregation (even when it's in legal ways), which is quite the challenge.

For instance, around me, there's some really bad school districts that end up grabbing very large mansions. But what happens there is that none of the kids of people living in those mansions actually go to public school. So while it might not be economically difficult to up the funding of the schools near poorer neighborhoods, I don't even necessarily think that they will get the same outcomes for the same funding: The selection component is going to change performance.

New comment by hibikir in "Lego's 0.002mm specification and its implications for manufacturing (2025)"

hibikir — Wed, 11 Mar 2026 15:53:00 +0000

If you want advancements in engineering and plastics for much better prices, see the wonders that Bandai has made with modern Gundam models. A Gundam Aerial HG is under $20, and you end up with a large multicolor model that assembles easily, has minimal mold lines, and needs no glue. And that's one of the intro models.

New comment by hibikir in "After outages, Amazon to make senior engineers sign off on AI-assisted changes"

hibikir — Tue, 10 Mar 2026 22:50:52 +0000

Yes, but at that point it's an all-hands presentation, and you are basically doing a very careful presentation, thinking about every minute, because of how many hours the "meeting" is costing you.

Very different from the typical weekly/monthly outage meeting, where discussion is actually expected, instead of being a ritual.

New comment by hibikir in "Workers who love ‘synergizing paradigms’ might be bad at their jobs"

hibikir — Fri, 06 Mar 2026 14:51:40 +0000

It's a bit better: They are forms of obfuscation and lowering information in a channel. They are designed for environments where being clear is very risky. In certain organizations, you are better off being unclear than asking for approval or consensus on a tricky decision: You produce an incomprehensible, vague mess of a message, and avoid argument, as argument in those places leads to paralysis.

Now, does this mean it's the right way to talk everywhere? Of course not. And since it's often seen as safe, it's overused. But it doesn't just arise as a bug. Plain language that means what it says creates more conflict, and isn't always better.

New comment by hibikir in "Workers who love ‘synergizing paradigms’ might be bad at their jobs"

hibikir — Fri, 06 Mar 2026 14:44:48 +0000

OOP patterns were useful for people stuck in a pure OOP language (say Java 1.4) who needed to make something understandable. Today, when many languages, including Java, have reasonable functional programming support, a large percentage of the patterns are overcomplicated. Just look at the list, and see how many can be replaced with less boilerplate by passing a function, doing some currying, or both.
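As an illustration of "passing a function", here is a minimal sketch of what the Strategy pattern can shrink to. It is in Python rather than Java for brevity, every name and number in it is hypothetical, and it uses partial application via `functools.partial`, a close cousin of currying, where a classic implementation would have an interface plus one configured class per strategy.

```python
from functools import partial
from typing import Callable

# Hypothetical example: names and rates are made up for illustration.
PriceRule = Callable[[float], float]

def percent_off(percent: float, price: float) -> float:
    """Return the price reduced by the given percentage."""
    return price * (1 - percent / 100)

def checkout(price: float, rule: PriceRule) -> float:
    """Apply whatever pricing rule the caller passes in."""
    return round(rule(price), 2)

# Partial application stands in for a configured Strategy object.
ten_percent_off = partial(percent_off, 10)

full_price = checkout(50.0, lambda p: p)      # the "no discount" strategy
discounted = checkout(50.0, ten_percent_off)  # the "10% off" strategy

print(full_price, discounted)
```

The whole "strategy hierarchy" here is a type alias and two functions; adding a new rule means writing one more function, not one more class.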