Hacker News: eis

New comment by eis in "Arena AI Model ELO History"

eis — Thu, 14 May 2026 08:21:27 +0000

Advantage for what exactly, though? I'm not saying Elo ranking doesn't give any information. It just doesn't give the information that the OP's project claims to give: that models get nerfed over time. You could extract that kind of information from the raw results of each evaluation round between two models, ignoring any new model entries, and compare those results over time, but not from the resulting Elo scores computed over an ever-changing list of models.
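
A minimal sketch of that pair-wise approach. It assumes a hypothetical export of raw battle records as (period, model_a, model_b, winner) tuples. The record layout and the `head_to_head` helper are illustrative, not any leaderboard's real data format. Only battles between the same two models are counted, so new entrants cannot move the numbers.

```python
from collections import defaultdict

# Hypothetical record: (period, model_a, model_b, winner), where winner is
# model_a, model_b, or "tie". This layout is an assumption for illustration.
Battle = tuple[str, str, str, str]


def head_to_head(battles: list[Battle], model: str, opponent: str) -> dict[str, float]:
    """Win rate of `model` against one fixed `opponent`, per period.

    Battles involving any other model are ignored, so models that join the
    leaderboard later cannot shift the result. Ties count as half a win.
    """
    wins: dict[str, float] = defaultdict(float)
    games: dict[str, int] = defaultdict(int)
    for period, a, b, winner in battles:
        if {a, b} != {model, opponent}:
            continue  # skip every other pairing, including new entrants
        games[period] += 1
        if winner == model:
            wins[period] += 1.0
        elif winner == "tie":
            wins[period] += 0.5
    return {p: wins[p] / games[p] for p in sorted(games)}


battles = [
    ("2026-01", "m1", "m2", "m1"),
    ("2026-01", "m2", "m1", "m1"),
    ("2026-01", "m1", "m3", "m3"),  # m3 is a newer model: ignored
    ("2026-02", "m1", "m2", "tie"),
    ("2026-02", "m1", "m2", "m2"),
]
print(head_to_head(battles, "m1", "m2"))  # {'2026-01': 1.0, '2026-02': 0.25}
```

With real data you would also want enough battles per period before reading anything into a change between periods.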

New models are on average better than older models, so the average skill of the population of models increases over time, and you are mathematically guaranteed that any existing model's Elo score will decline over time even though the model itself didn't change in any way.

It's like benchmarking a model against a list of challenges that are made more and more difficult over time, and then claiming the model got nerfed because its score declined.
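
To make the drift concrete, here is a toy simulation. It is a sketch under heavy simplifying assumptions: one model whose true strength never changes, five newcomers each 50 points truly stronger than the last, everyone starting at 1000, K=16, and each game scored with the true win probability as a fractional result instead of a random win or loss. The function names and parameters are illustrative only and do not describe how any real leaderboard computes its ratings.

```python
def expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score of a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def simulate(entrants: int = 5, step: float = 50.0,
             rounds: int = 1000, k: float = 16.0) -> list[float]:
    """Rating of a model that never changes as stronger models join the pool."""
    true_old = 1000.0                  # the old model's true strength, fixed forever
    rating = {"old": 1000.0}           # current Elo ratings
    true_new: dict[str, float] = {}    # true strengths of the newcomers
    history = []
    for i in range(1, entrants + 1):
        name = f"new{i}"
        true_new[name] = true_old + i * step  # each entrant is stronger than the last
        rating[name] = 1000.0                 # but starts at the default rating
        for _ in range(rounds):
            for opp, strength in true_new.items():
                # Fractional "result": the old model's true win probability.
                score = expected(true_old, strength)
                delta = k * (score - expected(rating["old"], rating[opp]))
                rating["old"] += delta        # zero-sum: what one side gains,
                rating[opp] -= delta          # the other side loses
        history.append(round(rating["old"], 1))  # checkpoint after each entrant
    return history


print(simulate())
```

Because each Elo update is zero-sum, the ratings always sum to 1000 times the number of models. Once every rating gap matches the true strength gap, the old model in this toy setup settles near 1000 - 25i after the i-th entrant (975, 950, ..., 875), although its true strength never moved.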

Elo is good at establishing an overall ranking order across models, but that's not what this is about.

To detect nerfing of a model, projects like https://marginlab.ai/trackers/claude-code/ are much, much better (I'm not affiliated in any way).

New comment by eis in "Arena AI Model ELO History"

eis — Thu, 14 May 2026 05:38:15 +0000

The Elo rating system measures performance relative to the other models. As the other models improve, or rather as newer, better models enter the list, the Elo score of a given existing model will tend to decrease even though there might be no changes whatsoever to the model or its system prompt.

You can't use Elo scores to measure the decay of a model's performance in absolute terms. For that you need a fixed harness running over a fixed set of tests.
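
One way such a fixed harness could compare two runs is sketched below. It assumes each run yields one pass/fail result per test ID. The function name and data layout are hypothetical. Because the tests are identical, a paired test fits naturally, so this uses an exact McNemar (sign) test on the tests whose outcome flipped.

```python
from math import comb


def regression_check(before: dict[str, bool],
                     after: dict[str, bool]) -> tuple[int, int, float]:
    """Paired comparison of two runs of the same fixed test suite.

    Only tests whose outcome flipped carry information:
      b = passed before, fails now (regressions)
      c = failed before, passes now (improvements)
    If nothing changed, each flip is equally likely in either direction, so
    b ~ Binomial(b + c, 0.5). Returns (b, c, one-sided p-value for
    "more regressions than chance would explain").
    """
    if before.keys() != after.keys():
        raise ValueError("both runs must cover exactly the same tests")
    b = sum(1 for t in before if before[t] and not after[t])
    c = sum(1 for t in before if not before[t] and after[t])
    n = b + c
    if n == 0:
        return 0, 0, 1.0
    p_value = sum(comb(n, i) for i in range(b, n + 1)) / 2 ** n
    return b, c, p_value


before = {f"t{i}": True for i in range(20)}
after = {f"t{i}": i >= 8 for i in range(20)}  # t0..t7 now fail
print(regression_check(before, after))  # (8, 0, 0.00390625)
```

Model outputs are nondeterministic, so a real harness would likely run each test several times. This sketch assumes a single deterministic result per test.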

New comment by eis in "Cloudflare to cut about 20% workforce"

eis — Fri, 08 May 2026 09:37:45 +0000

Interestingly, NET is down roughly 15% in extended-hours trading and was even down 20% at some point. Often a stock makes a positive move when layoffs are announced.

Cloudflare is a growing company by most metrics, so if AI-driven efficiencies were the reason for the layoffs, they'd just take the boost and grow even faster.

None of it checks out. I think the real reason for the layoffs, and for the market's negative reaction to the news, is that their revenue growth wasn't keeping pace with their expenses and they realized they had overhired. Leadership doesn't want to dive too deep into the red, even if that would mean bigger growth down the line. They are now beholden to near- and mid-term stock performance.

I've had the chance to talk off the record to some SWEs working at Cloudflare in recent months, and the one consensus I heard was that there was often tension between the boots on the ground and the decisions from senior management. Of course there was nothing they could do about it, and especially after this, those who remain will make sure to stay quiet. There seemed to be a lot of pressure to deliver features and new products while quality was left behind, which meant the SWEs felt pressure to deliver while also having to resolve the ensuing issues.

Either way, I wish everyone affected the best and a speedy job hunt. There'll be quite a few really good people on the market now through no fault of their own.

New comment by eis in "The Prompt API"

eis — Mon, 27 Apr 2026 13:28:26 +0000

Google will soon release Gemini Nano 4, based on Gemma 4: a "Fast" version based on Gemma 4 E2B and a "Full" version based on E4B.

https://android-developers.googleblog.com/2026/04/AI-Core-De...

New comment by eis in "Zero-Copy GPU Inference from WebAssembly on Apple Silicon"

eis — Sun, 19 Apr 2026 14:19:27 +0000

I don't think people are crediting Apple with inventing unified memory; I certainly did not. There have been similar systems for decades. What Apple did is popularize it with widely available hardware: GPUs that don't totally suck for inference, combined with reasonably fast RAM, at an affordable price. Before that you either had iGPUs, which were slow (and paired with not exactly the fastest DDR memory) but at least sat on the same die, or you had fast dGPUs with their own limited amount of VRAM. So the choice was between direct memory access without much power, or plenty of power strangled by having to go through the PCIe subsystem to access RAM.

The article is talking about one particular optimization that one can implement with Apple Silicon, and I at least wasn't aware that it is now possible to do so from WebAssembly. So dismissing it completely as if it had nothing to do with Apple Silicon is, imho, not fair.

New comment by eis in "Zero-Copy GPU Inference from WebAssembly on Apple Silicon"

eis — Sun, 19 Apr 2026 07:54:57 +0000

Apple Silicon uses unified memory, where the CPU and GPU use the exact same memory and no copies from RAM to VRAM are needed. The article opens by mentioning just that, and indeed it is the whole point of the article.

New comment by eis in "Cloudflare's AI Platform: an inference layer designed for agents"

eis — Fri, 17 Apr 2026 07:33:59 +0000

Yes, but that is just a tiny part of the whole CF Workers ecosystem. The other services are not open source, so the lock-in is very, very real. There are no API-compatible alternatives that cover a good chunk of the services. If you build your application around Workers and make use of the integrated services and APIs, there is no way for you to switch to another provider because, well, there is none.

New comment by eis in "Cloudflare's AI Platform: an inference layer designed for agents"

eis — Fri, 17 Apr 2026 07:30:11 +0000

And now you've put everything on the equivalent of a single Node.js process running on a tiny VM. Next step: spread it out over multiple Durable Objects, but that means implementing sharding logic. Complexity escalates very fast once you leave toy-project territory.
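
Here is a minimal sketch of the kind of routing logic that sharding forces on you. It assumes each shard is one Durable Object addressed by a name derived from the key. This is generic Python for illustration, not Cloudflare's API. `shard_for` and the "shard-N" naming scheme are hypothetical.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> str:
    """Map a key (e.g. a user or tenant ID) to one of `num_shards` shard names.

    Uses SHA-256 instead of Python's built-in hash(), which is salted per
    process for strings and so would route the same key differently after a
    restart. The "shard-N" naming scheme is hypothetical.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return f"shard-{int.from_bytes(digest[:8], 'big') % num_shards}"


# The catch: with plain modulo hashing, growing from 4 to 5 shards
# re-routes most keys, so their state would have to be migrated.
keys = [f"user-{n}" for n in range(1000)]
moved = sum(shard_for(k, 4) != shard_for(k, 5) for k in keys)
print(f"{moved / len(keys):.0%} of keys change shard")
```

With modulo hashing you should expect roughly 80% of keys to move here. Consistent or rendezvous hashing would cut that to about one fifth, at the price of more code and the migration logic you still need on top.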

New comment by eis in "Cloudflare's AI Platform: an inference layer designed for agents"

eis — Fri, 17 Apr 2026 07:14:01 +0000

> How did you work around this problem? As in, how do you monitor for hung queries and cancel them?

You just wrap your DB queries in your own timeout logic. You can then continue with your business logic, but you can't truly cancel the query because, well, the communication layer for it is stuck and you can't kill it via a new connection. Your only choice is to abandon that query. Sometimes we could retry and it would immediately succeed, suggesting that the original query probably hit something like packet loss that wasn't handled properly by CF. That's easy when it's a read, but with writes it gets complicated fast and you have to ensure your writes are idempotent. And since D1 doesn't support transactions, it's even more complex.
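
Below is a minimal sketch of that pattern, written in Python's asyncio rather than a Workers runtime. It shows three things: our own deadline around each query, abandoning (not cancelling) a hung call, and a retry that is safe only because the write carries a unique request ID. `FakeDB` and `insert_or_ignore` are hypothetical stand-ins. In SQLite, which D1 is built on, the same dedupe could be an `INSERT OR IGNORE` into a table with a UNIQUE request-ID column.

```python
import asyncio
import uuid


class FakeDB:
    """Hypothetical stand-in for a remote database.

    The first call applies the write but its reply never arrives (like a
    lost packet); later calls reply immediately.
    """

    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}
        self.calls = 0

    async def insert_or_ignore(self, request_id: str, payload: dict) -> bool:
        self.calls += 1
        inserted = request_id not in self.rows
        if inserted:
            self.rows[request_id] = payload  # dedupe on the unique request ID
        if self.calls == 1:
            await asyncio.sleep(3600)        # hangs after the write landed
        return inserted


async def idempotent_write(db: FakeDB, payload: dict,
                           attempts: int = 3, timeout_s: float = 0.05) -> str:
    request_id = str(uuid.uuid4())  # reused on every retry of this logical write
    for _ in range(attempts - 1):
        try:
            # wait_for only cancels our local await; the remote query itself
            # is abandoned, not killed, and may or may not have been applied.
            await asyncio.wait_for(db.insert_or_ignore(request_id, payload), timeout_s)
            return request_id
        except asyncio.TimeoutError:
            continue  # abandon the hung call and retry with the same ID
    # Final attempt: let a timeout propagate to the caller.
    await asyncio.wait_for(db.insert_or_ignore(request_id, payload), timeout_s)
    return request_id


async def main() -> None:
    db = FakeDB()
    await idempotent_write(db, {"user": 42, "action": "like"})
    print(db.calls, len(db.rows))  # 2 1: retried once, written once


asyncio.run(main())
```

Without the request ID, the retry would have inserted a second row, because the first "failed" write had in fact been applied.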

Aphyr would have a field day with D1 I'd imagine.

> What about reads? We use D1 in prod & our traffic pattern may not be similar to yours (our workload is async queue-driven & so retries last in order of weeks), nor have we really observed D1 erroring out for extended periods or frequently.

We have reads and writes, which most of the time are latency-sensitive (direct user feedback). A user interaction usually involves 3-5 queries, and they might need to run in sequence. When queries take 500ms+, the system starts to feel sluggish. When they take 2-3s, it's very frustrating. The high latencies happened for both reads and writes: you could run a simple "SELECT 123" and it would hang. You could even reproduce that from the Cloudflare dashboard when it was in this degraded state.

From the comments of others who had similar issues, I think it heavily depends on the CF locations or D1 hosts. Most people are probably lucky and don't get one of the faulty D1 servers. But there are a few dozen people who were not so lucky: you can find them complaining on GitHub, on the CF forum, etc., but they are simply not heard. And you can find these complaints going back years.

This long timeframe without fixes to their network stack (networking is CF's bread and butter!), the refusal to implement transactions, the silence in their forum in response to cries for help, the absurdly low 10 GB limit for databases... it all adds up. We made the decision not to build any new product on D1 and to continue using proper databases. It's a shame, because Workers plus a close-by read replica could be absolutely great for latency. Paradoxically, we got the opposite outcome.

New comment by eis in "Cloudflare's AI Platform: an inference layer designed for agents"

eis — Thu, 16 Apr 2026 18:48:16 +0000

D1 reliability has been bad in our experience. We've had queries hanging on their internal network layer for several seconds, sometimes double-digit seconds, over extended periods (on the order of weeks). Recently I've also seen plain network exceptions a few times; again, these are internal, between their Worker and the D1 hosts. Many of the hung queries wouldn't even show up in the traces in their observability dashboard, so unless you have your own timeout detection you wouldn't even know things are not working. It was hard to get someone on their side to take a look and actually acknowledge and understand the problem.

But even without the network issues that have plagued it, I would hesitate to build anything for production on it, because it can't even do transactions and the product manager for D1 openly stated they won't implement them [0]. Your only way to ensure data consistency is to use a Durable Object, which comes with its own costs and tradeoffs.

[0] https://github.com/cloudflare/workers-sdk/issues/2733#issuec...

The basic idea of D1 is great. I just don't trust the implementation.

For a hobby project it's a neat product for sure.

New comment by eis in "GLM-5.1: Towards Long-Horizon Tasks"

eis — Tue, 07 Apr 2026 18:17:45 +0000

The blog post has a benchmark comparison table with these two in it.

New comment by eis in "Qwen3.6-Plus: Towards real world agents"

eis — Thu, 02 Apr 2026 15:01:57 +0000

Quite strong results in the benchmarks, but why Gemini 3 Pro instead of 3.1? Why only for a few of the benchmarks? Why is OpenAI missing from the coding benchmarks? Why Opus 4.5 and not 4.6? It just strikes me as a bit strange.

As always, we'll have to try it and see how it performs in the real world, but Qwen's open-weight models were pretty decent for some tasks, so I'm still excited to see what this brings.

New comment by eis in "What Gödel Discovered (2020)"

eis — Thu, 02 Apr 2026 10:11:28 +0000

> These theorems apply to any system of axioms that are rich enough to state the liar's paradox.

Isn't that circular reasoning, or a tautology, though? Rephrased: any system that can state something these theorems apply to can have the theorems applied to it.

I think the word "rich" is too imprecise in this context. It is not clear why there couldn't be a "richer" system that does not suffer from this issue and can't state the liar's paradox.

New comment by eis in "EmDash – A spiritual successor to WordPress that solves plugin security"

eis — Wed, 01 Apr 2026 19:37:23 +0000

After all the AI slop from Cloudflare in recent months and the embarrassment that came with it, they dare to launch this vibe-coded project with THAT name on April 1st? I'm really not sure what to think anymore. Reality has become too absurd.

New comment by eis in "EmDash – a spiritual successor to WordPress that solves plugin security"

eis — Wed, 01 Apr 2026 19:33:20 +0000

After https://news.ycombinator.com/item?id=46781516 ? Yes, one can unfortunately put that into the realm of reality.

New comment by eis in "Tell HN: Chrome says "suspicious download" when trying to download yt-dlp"

eis — Tue, 31 Mar 2026 15:43:23 +0000

Which link exactly did you try to use? Or which specific version on the GitHub releases page? I checked both the latest Windows and macOS versions against Google Safe Browsing, and all were fine.

New comment by eis in "Britain today generating 90%+ of electricity from renewables"

eis — Sat, 28 Mar 2026 14:44:15 +0000

Why would you classify nuclear as renewable? You can say it's clean energy but it's not renewable.

New comment by eis in "A better streams API is possible for JavaScript"

eis — Fri, 27 Feb 2026 15:49:58 +0000

People are understandably a bit sensitized and sceptical after the last AI-generated blog post (and code slop!) by Cloudflare blew up. Personally I'm fine with using AI to help write stuff as long as everything is proofread and actually represents the author's thoughts. Still, if I were working at Cloudflare, I would have been a bit more careful and not used AI for a few blog posts after the last incident...

US Supreme Court rejects Trump's global tariffs

eis — Fri, 20 Feb 2026 15:18:34 +0000

Article URL: https://www.reuters.com/legal/government/us-supreme-court-rejects-trumps-global-tariffs-2026-02-20/

Comments URL: https://news.ycombinator.com/item?id=47089070

Points: 24

# Comments: 1

New comment by eis in "Deno Sandbox"

eis — Wed, 04 Feb 2026 03:07:10 +0000

What's with the pricing of these sandbox offerings recently? I assume they're just trying to milk the AI trend.

It's about 10x what a normal VM would cost at a more affordable hosting provider. So you'd better have it running only 10% of the time, or you're just paying more for something more constrained.

A full month of runtime would be about $50 for a 2 vCPU, 1 GB RAM, 10 GB SSD mini-VM that you can easily get for $5 elsewhere.
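
The break-even point is simple arithmetic. This sketch uses the figures from the comment: $50 for a full month of sandbox runtime versus a $5 VM. It assumes billing scales linearly with runtime and that an average month has about 730 hours. Both are simplifying assumptions, not a description of any provider's actual billing.

```python
HOURS_PER_MONTH = 730  # average month length in hours, an approximation


def sandbox_cost(hours_used: float, full_month_price: float = 50.0) -> float:
    """Usage-billed cost, assuming the price scales linearly with runtime."""
    return full_month_price * hours_used / HOURS_PER_MONTH


def break_even_hours(vm_price: float = 5.0, full_month_price: float = 50.0) -> float:
    """Monthly runtime at which the sandbox costs as much as an always-on VM."""
    return HOURS_PER_MONTH * vm_price / full_month_price


print(break_even_hours())  # 73.0 hours, i.e. 10% of the month
print(sandbox_cost(73))    # 5.0
```

Anything beyond about 73 hours of runtime per month makes the sandbox the more expensive option, which matches the 10% figure above.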