Hacker News: djsjajah

New comment by djsjajah in "Why Is Claude Turning into an a**Hole?"

djsjajah — Sun, 14 Jun 2026 23:52:51 +0000

You need to think this thought through all the way to the end. What it has said also influences what it will say. If it has consistently made combative responses, then the most likely thing to do is to continue to be combative.

I don't think there is any way back after the conversation takes a turn like that so there is no point in arguing with it. The only thing you can do is to fork the conversation before it made the first mistake and give it more context or tell it to look things up.

New comment by djsjajah in "Can LLMs Beat Classical Hyperparameter Optimization Algorithms?"

djsjajah — Tue, 09 Jun 2026 22:17:16 +0000

It’s amusing that a lot of the agents have worked out that sampling doesn’t change ppl.

New comment by djsjajah in "Apple reveals new AI architecture built around Google Gemini models"

djsjajah — Mon, 08 Jun 2026 21:03:38 +0000

I think what they mean by “now” is the stuff announced today.

New comment by djsjajah in "The AI revolution in math has arrived"

djsjajah — Tue, 14 Apr 2026 22:59:46 +0000

I don't follow. Can you explain how your comment is relevant to mine? It might help if you also explain how you interpreted my comment.

New comment by djsjajah in "The AI revolution in math has arrived"

djsjajah — Tue, 14 Apr 2026 04:44:10 +0000

You just failed the Turing test.

New comment by djsjajah in "Taking on CUDA with ROCm: 'One Step After Another'"

djsjajah — Mon, 13 Apr 2026 11:26:46 +0000

I have 2 of them. I would advise against if you want to run things like vllm. I have had the cards for months and I still have not been able to create a uv env with trl and vllm. For vllm, it’s works fine in docker for some models. With one gpu, gpt-oss 20b decoding at a cumulative 600-800tps with 32 concurrent requests depending on context length but I was getting trash performance out of qwen3.5 and Gemma4

If I were to do it again, I’d probably just get a dgx spark. I don’t think it’s been worth the hassle.

New comment by djsjajah in "Taking on CUDA with ROCm: 'One Step After Another'"

djsjajah — Mon, 13 Apr 2026 07:11:56 +0000

> or by the community

Hmmm

New comment by djsjajah in "Quantization from the Ground Up"

djsjajah — Thu, 26 Mar 2026 01:35:00 +0000

yes, but the difference between one model and one 4x larger is usually a lot more than that.

It is not a question of do a run Qwen 8b at bf16 or a quantized version. It more of a question of do I run Qwen 8b at full precision or do I run a quantized version of Qwen 27b.

You will find that you are usually better off with the larger model.

New comment by djsjajah in "Tinybox – Offline AI device 120B parameters"

djsjajah — Sun, 22 Mar 2026 00:19:46 +0000

trl. give me a uv command to get that working.

But even in the amd stack things (like ck and aiter) consumer cards are not even second class citizens. They are a distance third at best. If you just want to run vllm with the latest model, if you can get it running at all there are going to be paper cuts all along the way and even then the performance won't be close to what you could be getting out of the hardware.

New comment by djsjajah in "Attention Residuals"

djsjajah — Fri, 20 Mar 2026 21:34:20 +0000

No. It seems to me that the comment is objectively incorrect. The original comment was talking about inference and from what I can tell, it is strictly going to run slower than the model trained to the same loss without this approach (it has "minimal overhead"). The main point is that you wont need to train that model for as long.

New comment by djsjajah in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"

djsjajah — Thu, 12 Feb 2026 01:01:26 +0000

That’s kind of a moot point. Even if none of those overheads existed you would still be getting a a fractions of the mfu. Models are fundamental limited by memory bandwidth even with best case scenarios of sft or prefill.

And what are you doing that I/O is a bottleneck?

New comment by djsjajah in "Nvidia Stock Crash Prediction"

djsjajah — Tue, 20 Jan 2026 21:23:14 +0000

> including all previous experiments

How far back do you go? What about experiments into architecture features that didn’t make the cut? What about pre-transformer attention?

But more generally, why are you so sure that they team that built Gemini didn’t exclusively use TPUs while they were developing it?

I think that one of the reasons that Gemini caught up so quickly is because they have so much compute at fraction of the price of everyone else.

New comment by djsjajah in "Command-line Tools can be 235x Faster than your Hadoop Cluster (2014)"

djsjajah — Sun, 18 Jan 2026 22:31:25 +0000

Not only can it be streamed, but lz4 will probably make things quicker.

New comment by djsjajah in "Databases in 2025: A Year in Review"

djsjajah — Mon, 05 Jan 2026 10:45:38 +0000

You just ruined my day. The post makes it sound like gel is now dead. The post by Vercel does not give me much hope either [1]. Last commit on the gel repo was two weeks ago.

[1] https://vercel.com/blog/investing-in-the-python-ecosystem

New comment by djsjajah in "Local AI is driving the biggest change in laptops in decades"

djsjajah — Tue, 23 Dec 2025 20:56:40 +0000

> Do you really though?

Yes.

It stays in on the hbm but it need to get shuffled to the place where it can actually do the computation. It’s a lot like a normal cpu. The cpu can’t do anything with data in the system memory, it has to be loaded into a cpu register. For every token that is generated, a dense llm has to read every parameter in the model.

New comment by djsjajah in "Local AI is driving the biggest change in laptops in decades"

djsjajah — Tue, 23 Dec 2025 20:41:40 +0000

GPUs might not be bandwidth starved most of the time, but they absolutely are when generating text from an llm. It’s the whole reason why low precision floating point numbers are being pushed by nvidia.

New comment by djsjajah in "Why CUDA translation wont unlock AMD"

djsjajah — Thu, 20 Nov 2025 05:10:03 +0000

I can't tell if you are making a joke or not.

They are not even remotely equivalent. tinygrad is a toy.

If you are serious, I would be interested to hear how you see tinygrad replacing CUDA. I could see a tiny grad zealot arguing that it is gong to replace torch, but CUDA??

Have you looked into AMD support in torch? I would wager that like for like, a torch/amd implementation of a models is going to run rings around a tinygrad/amd implementation.

New comment by djsjajah in "Cloudflare Global Network experiencing issues"

djsjajah — Tue, 18 Nov 2025 12:04:59 +0000

I went to check how many services are being impacted on down detector, but it was down.

New comment by djsjajah in "Can I stop drone delivery companies flying over my property?"

djsjajah — Tue, 03 Jun 2025 03:09:06 +0000

If we didn't have people like that, then they would be right.

New comment by djsjajah in "Ask HN: What is the simplest data orchestration tool you've worked with?"

djsjajah — Fri, 21 Mar 2025 21:31:05 +0000

I few people have mentioned dagster and I took a look at that for some machine learning things I was playing with but I found dvc (data version control [1]) and I think it is fantastic. I think it also has more applications than just machine learning but really anything with data. If you have a bunch of shell scripts that write to files to pass data around, then dvc might be a good fit. it will do things like only rerun steps if it needs to. Also for totally non-data stuff, Prefect is great.

[1] https://dvc.org