<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: eoerl</title><link>https://news.ycombinator.com/user?id=eoerl</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 03 May 2026 03:44:41 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=eoerl" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by eoerl in "Lossless LLM compression for efficient GPU inference via dynamic-length float"]]></title><description><![CDATA[
<p>Optimization is post hoc here: you have to train first to be able to Huffman-encode, so it's not a pure format question</p>
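<p>(To make the post hoc point concrete: a Huffman code is derived from symbol frequencies, and for model weights those frequencies only exist once training has produced the weights. A minimal sketch in plain Python, over a made-up exponent histogram; names and numbers are illustrative, not from the paper:)</p>

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a prefix-free code table (symbol -> bitstring) from frequencies."""
    # Heap entries: (total weight, tiebreaker, [(symbol, code-so-far), ...])
    heap = [(w, i, [(sym, "")]) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Prepend one bit: the merge closest to the root contributes the first bit.
        merged = [(s, "0" + c) for s, c in left] + [(s, "1" + c) for s, c in right]
        heapq.heappush(heap, (w1 + w2, tiebreak, merged))
        tiebreak += 1
    return dict(heap[0][2])

# Hypothetical histogram of weight exponents, only known *after* training:
freqs = Counter({0: 50, 1: 30, 2: 15, 3: 5})
codes = huffman_code(freqs)
# Frequent symbols get shorter codes than rare ones.
assert len(codes[0]) < len(codes[3])
```

<p>(The code table depends entirely on the measured frequencies, which is why the compression can't be fixed ahead of time as a pure format.)</p>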
]]></description><pubDate>Fri, 25 Apr 2025 22:53:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43799248</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=43799248</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43799248</guid></item><item><title><![CDATA[New comment by eoerl in "Romanian court annuls result of presidential election first round"]]></title><description><![CDATA[
<p>In a lot of countries there are rules, for instance spending limits or equal airtime for all candidates. I don't know whether that's the case in Romania, but it is entirely possible to annul an election even if people voted "freely". I know that typically doesn't apply to the US, but there's a world outside of it</p>
]]></description><pubDate>Sat, 07 Dec 2024 18:48:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=42351838</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=42351838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42351838</guid></item><item><title><![CDATA[New comment by eoerl in "Tenstorrent unveils Grayskull, its RISC-V answer to GPUs"]]></title><description><![CDATA[
<p>These can actually also be used for machine learning (see DALI for data loading, for instance)</p>
]]></description><pubDate>Mon, 11 Mar 2024 22:04:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=39673773</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=39673773</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39673773</guid></item><item><title><![CDATA[New comment by eoerl in "If you're interested in eye-tracking, I'm interested in funding you"]]></title><description><![CDATA[
<p>We (The Eye Tribe folks) sold one at $99 years ago. The $1k-3k range is mostly lack of competition, I believe.</p>
]]></description><pubDate>Sun, 27 Aug 2023 20:18:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=37286268</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=37286268</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37286268</guid></item><item><title><![CDATA[New comment by eoerl in "Non-determinism in GPT-4 is caused by Sparse MoE"]]></title><description><![CDATA[
<p>There are flags[1] for that indeed. It feels like half of the people commenting here don't know all that much about the topic they're commenting upon<p>1: <a href="https://pytorch.org/docs/stable/generated/torch.use_deterministic_algorithms.html" rel="nofollow noreferrer">https://pytorch.org/docs/stable/generated/torch.use_determin...</a></p>
]]></description><pubDate>Sun, 06 Aug 2023 05:54:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=37019348</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=37019348</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37019348</guid></item><item><title><![CDATA[New comment by eoerl in "Google doesn’t want employees working remotely anymore"]]></title><description><![CDATA[
<p>> It's not perfect but a group of aligned people in the same physical working space will just dominate a similar group spread apart that has to use chats & zoom to communicate. Management has got to be seeing this, in various forms, across multiple business segments.<p>There's no data on this; at the very least you could mention that it's only your personal impression?<p>IMO (and this is clearly a personal take) there are two competing effects:
- higher bandwidth, easier to align face to face
- more distractions and interruptions, harder to get things done<p>If you're in a business or position with no IP and nothing hard to do per se, you'll see the first one dominate. If you're somewhere with IP and competitive advantages built on smarts, then I'd say (again, personally) the second effect can come to dominate.<p>Google pulling a "no remote" move says to me that their engineering edge is not a priority, plus they're using the fact that the market has swung back towards employers vs. employees. But a blanket "this take is obviously so much better" is just intellectual laziness, I believe</p>
]]></description><pubDate>Thu, 08 Jun 2023 12:28:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=36240745</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=36240745</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36240745</guid></item><item><title><![CDATA[New comment by eoerl in "Stable Diffusion 2.0"]]></title><description><![CDATA[
<p>It is not typically possible to blend models like that: the training process is insensitive to the (lateral) ordering of units, so two independently trained models don't share a common weight layout.</p>
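<p>(A toy illustration of that order insensitivity, in plain Python with made-up numbers: permuting the hidden units of a two-layer net leaves its outputs unchanged, so two trained models can represent the same function in incompatible layouts, and naively averaging their weights mixes unrelated units:)</p>

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def relu(v):
    return [max(0.0, x) for x in v]

def mlp(W1, W2, x):
    """Tiny two-layer network: W2 @ relu(W1 @ x)."""
    return matvec(W2, relu(matvec(W1, x)))

W1 = [[0.5, -1.0], [2.0, 0.3], [-0.7, 1.2]]  # 3 hidden units, made-up weights
W2 = [[1.0, -0.5, 0.25]]
x = [0.8, -0.4]

# Permute the hidden units (rows of W1, columns of W2) identically:
perm = [2, 0, 1]
W1p = [W1[i] for i in perm]
W2p = [[row[i] for i in perm] for row in W2]

# Same function, different weight layout:
assert mlp(W1, W2, x) == mlp(W1p, W2p, x)

# Naively averaging the two layouts gives a *different* function:
W1avg = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(W1, W1p)]
W2avg = [[(a + b) / 2 for a, b in zip(r1, r2)] for r1, r2 in zip(W2, W2p)]
assert mlp(W1avg, W2avg, x) != mlp(W1, W2, x)
```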
]]></description><pubDate>Thu, 24 Nov 2022 06:16:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=33728280</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=33728280</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33728280</guid></item><item><title><![CDATA[New comment by eoerl in "Fast-stable-diffusion colabs, +25% speed increase and memory efficient"]]></title><description><![CDATA[
<p>Identical outputs, up to float computation shenanigans (strictly speaking, not computed in the same order)</p>
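<p>(Concretely, the "shenanigans": floating-point addition is not associative, so evaluating the same sum in a different order can change the result slightly; fused or reordered kernels do exactly that. A quick stdlib demo:)</p>

```python
import math
import random

# Floating-point addition is not associative:
a, b, c = 1e16, -1e16, 1.0
print((a + b) + c)  # 1.0
print(a + (b + c))  # 0.0: adding 1.0 to -1e16 falls below the rounding step

# The same numbers can be summed in a different order:
vals = [1e16, 1.0, -1e16, 1e-3] * 1000
shuffled = vals[:]
random.seed(0)
random.shuffle(shuffled)
# math.fsum is exactly rounded, hence order-independent:
assert math.fsum(vals) == math.fsum(shuffled)
```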
]]></description><pubDate>Tue, 27 Sep 2022 07:09:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=32992329</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=32992329</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32992329</guid></item><item><title><![CDATA[New comment by eoerl in "Fast-stable-diffusion colabs, +25% speed increase and memory efficient"]]></title><description><![CDATA[
<p>yep, same approach, but it arrived 3 days later and there's no mention of the original PR (<a href="https://github.com/huggingface/diffusers/pull/532#issuecomment-1250134751" rel="nofollow">https://github.com/huggingface/diffusers/pull/532#issuecomme...</a>), nice. Otherwise, FYI: the kernels used in that case -upstream flash attention- are not compatible with all nvidia GPU generations (xformers' kernels cover a wider range and are generally faster, or just pull in Flash's)</p>
]]></description><pubDate>Tue, 27 Sep 2022 07:05:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=32992308</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=32992308</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32992308</guid></item><item><title><![CDATA[New comment by eoerl in "Fast-stable-diffusion colabs, +25% speed increase and memory efficient"]]></title><description><![CDATA[
<p>Did you even peek at the link? There's a PR on diffusers, and it's mentioned on the front page: <a href="https://github.com/huggingface/diffusers/pull/532#issuecomment-1250134751" rel="nofollow">https://github.com/huggingface/diffusers/pull/532#issuecomme...</a></p>
]]></description><pubDate>Tue, 27 Sep 2022 07:00:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=32992275</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=32992275</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32992275</guid></item><item><title><![CDATA[New comment by eoerl in "Make Stable Diffusion up to 100% faster with Memory Efficient Attention"]]></title><description><![CDATA[
<p>That's the explanation behind <a href="https://news.ycombinator.com/item?id=32985716" rel="nofollow">https://news.ycombinator.com/item?id=32985716</a>, nice work</p>
]]></description><pubDate>Tue, 27 Sep 2022 06:58:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=32992259</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=32992259</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32992259</guid></item><item><title><![CDATA[New comment by eoerl in "Knots 3D – Learn how to tie over 150 useful knots"]]></title><description><![CDATA[
<p>I'm utterly fascinated: I'm a big knot fan and I didn't know this one, but it's a combination of parts I knew.<p>It's really strange to me though, and it's probably a culture thing, because for this use case literally everybody I know would use a bowline knot (<a href="https://www.animatedknots.com/bowling-knot" rel="nofollow">https://www.animatedknots.com/bowling-knot</a>), possibly with an extra lock to make sure it does not untie when there's no tension. This observation comes from a French sailing background, hence the probable cultural bias.<p>Could you educate me as to where the trucker's hitch is better? (to me the bowline feels simpler: no possible slippage, and easy to untie after a load)</p>
]]></description><pubDate>Sat, 28 May 2022 01:08:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=31536682</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=31536682</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31536682</guid></item><item><title><![CDATA[New comment by eoerl in "Bolt announces layoffs"]]></title><description><![CDATA[
<p>they don't need to lay off, just freeze hiring and wait for the churn</p>
]]></description><pubDate>Fri, 27 May 2022 06:30:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=31526604</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=31526604</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31526604</guid></item><item><title><![CDATA[New comment by eoerl in "Accelerated PyTorch Training on M1 Mac"]]></title><description><![CDATA[
<p>actually it seems that it was because a lot of other well-known models are not yet supported: missing ops in the Metal backend</p>
]]></description><pubDate>Thu, 19 May 2022 05:00:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=31430971</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=31430971</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31430971</guid></item><item><title><![CDATA[New comment by eoerl in "Accelerated PyTorch Training on M1 Mac"]]></title><description><![CDATA[
<p>I'm not sure what you meant with the link, but the parent is right, so adding an explanation here: the M1 Ultra has about 400GB/s of theoretical bandwidth, but Anandtech shows that none of the SoC blocks can actually reach that, pretty far from it. It seems that Apple summed the bandwidth to all the blocks to get there, which does mean something, but not that the GPU has access to all of it (the GPU memory controllers seem to be the bottleneck).<p>On the contrary, a laptop 3080 does reach 400GB/s; I'm personally seeing this routinely on AI workloads. So that's part of the explanation for the subpar perf here (the others probably being matrix math and mixed precision)</p>
]]></description><pubDate>Thu, 19 May 2022 04:59:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=31430962</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=31430962</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31430962</guid></item><item><title><![CDATA[New comment by eoerl in "Shaving is an example of how consumer products extract more money"]]></title><description><![CDATA[
<p>How much of that have you actually benchmarked, time-wise? I think that's the point of the article: most of our “benchmarking” is actually marketing. To take an example, I’ve been using a cast iron pan for years after having used Teflon all my life, and cast iron doesn’t take any more time to use (initial seasoning vs. timely replacement, you can even factor that out)</p>
]]></description><pubDate>Sat, 07 May 2022 20:26:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=31298232</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=31298232</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31298232</guid></item><item><title><![CDATA[New comment by eoerl in "Ask HN: What’s a good laptop for software development at around $2k?"]]></title><description><![CDATA[
<p>I’m using a similar setup (G15) for “AI” dev on Linux, it works very well with the asus-laptop utils.<p>To complement other points:<p>- you can put the nvidia card in “compute” or “hybrid” mode, which removes the need for X restarts. Compute is really nice: the computer runs on the iGPU (which is very capable) and all CUDA workloads seamlessly wake up the nvidia card, no questions asked<p>- the above means that the PC is nearly silent, maybe helped by the AMD CPU, while being pretty capable with its 8 real Zen 3 cores<p>- no real issues on Linux, and the asus-laptop tools let you switch off the LEDs or cap the battery charge. The wifi card was an issue initially, quickly fixed with a newer kernel<p>- the screen is 120Hz, and this is really appreciated actually<p>I would buy a newer version in a heartbeat</p>
]]></description><pubDate>Tue, 26 Apr 2022 14:44:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=31168099</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=31168099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31168099</guid></item><item><title><![CDATA[New comment by eoerl in "Apple M1 Ultra"]]></title><description><![CDATA[
<p>pushing all this to an "un-optimization tax" is giving Apple an easy pass.
- Nvidia really is a software company, it's the running joke in the industry: when you buy an Nvidia GPU, you pay for the drivers & the frameworks (CUDA, DLSS, OptiX, ..). Apple does close to nothing there, they support Metal and CoreML and call it a day, so you can decently lay some of the blame at their feet<p>- the workloads in games can vary a lot: vertex/fragment shader imbalance, parallel compute pipelines, mixed precision (which the M1 GPU does not do), .. So another explanation is that you can get 3070 parity on a cherry-picked game, like a broken clock is right twice a day, but that does not make it generally true. Objective benchmarks have put the M1 GPUs way slower than a 3070 on average, and software support seems like an easy but false distraction given the Proton tax on Linux (which is not 30-50%)<p>- the M1 GPUs are missing a ton of hardware: matrix mul, fp16 again, ray tracing, probably VRR (not sure about this last one). These are used by modern games and applications; you may find a benchmark which skips them, but in the grand scheme of things it's something the M1 GPU will have to emulate more often than not, and that has a cost<p>Waving all that away as "the GPU is about the same speed" is technically wrong, or at the very least not really backed by facts</p>
]]></description><pubDate>Wed, 09 Mar 2022 08:04:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=30612041</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=30612041</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30612041</guid></item><item><title><![CDATA[New comment by eoerl in "Ampere Altra 80-core ARM CPU"]]></title><description><![CDATA[
<p>Amazon owns nothing here: not the ARM IP nor the manufacturing chain (TSMC or Samsung, most probably); in that field it's not a big player. It owns what it does with Graviton, and that's pretty much it.<p>Otherwise, yield obviously counts: it's what stands in the way of this CPU having more cache or 160 cores, for what it's worth. The multiple tiers in every CPU manufacturer's lineup are also a consequence of yield, so it's very much not a minor element of the equation</p>
]]></description><pubDate>Tue, 03 Mar 2020 17:51:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=22476886</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=22476886</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=22476886</guid></item><item><title><![CDATA[New comment by eoerl in "Ampere Altra 80-core ARM CPU"]]></title><description><![CDATA[
<p>the key issue when compared to Epyc is that this is mono-die, and not much faster (even with metrics straight from Ampere). Mono-die means the die is huge, the yield is low, and it's probably pretty expensive to produce (likely the reason why they went for 32MB of cache, well below Arm's recommendations: core count is a bigger seller than cache, it seems).
Unless they get massively better performance (they don't), this has no chance against a multi-die solution with a much better yield. Intel is cornered in a similar situation right now. The same applies to Graviton; it stands absolutely no chance in the long run.<p>Not saying that the future has to be multi-die, but if it is not, then it has to be way faster than the cheaper-to-manufacture competition.</p>
]]></description><pubDate>Tue, 03 Mar 2020 17:02:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=22476314</link><dc:creator>eoerl</dc:creator><comments>https://news.ycombinator.com/item?id=22476314</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=22476314</guid></item></channel></rss>