Hacker News: LegNeato

New comment by LegNeato in "Nanopass Framework: Clean Compiler Creation Language"

LegNeato — Mon, 20 Apr 2026 03:16:33 +0000

There is no reason why compiler passes cannot be in an egraph, they are more general than optimizations. When you think about it, traditional compiler concerns like instructions selection are sort of a optimization problem if you squint.

New comment by LegNeato in "Nanopass Framework: Clean Compiler Creation Language"

LegNeato — Sun, 19 Apr 2026 17:57:30 +0000

Why do passes anymore when we have invented egraphs?

New comment by LegNeato in "Rust Threads on the GPU"

LegNeato — Wed, 15 Apr 2026 02:13:27 +0000

We are the maintainers of https://github.com/rust-gpu/rust-gpu and https://github.com/Rust-GPU/Rust-CUDA FWIW. We haven't upstreamed the VectorWare work yet as it is still being cleaned up and iterated on.

New comment by LegNeato in "Rust Threads on the GPU"

LegNeato — Wed, 15 Apr 2026 02:11:17 +0000

Agreed, and thank you.

New comment by LegNeato in "Rust Threads on the GPU"

LegNeato — Tue, 14 Apr 2026 05:09:26 +0000

It is not, we just haven't yet upstreamed everything.

New comment by LegNeato in "Rust Threads on the GPU"

LegNeato — Tue, 14 Apr 2026 05:04:37 +0000

It depends. At VecorWare are a bit of an extreme case in that we are inverting the relationship and making the GPU the main loop that calls out to the CPU sparingly. So in that model, yes. If your code is run in a more traditional model (CPU driving and using GPU as a coprocessor), probably not. Going across the bus dominates most workloads. That being said, the traditional wisdom is becoming less relevant as integrated memory is popping up everywhere and tech like GPUDirect exists with the right datacenter hardware.

These are the details we intend to insulate people from so they can just write code and have it run fast. There is a reason why abstractions were invented on the CPU and we think we are at that point for the GPU.

(for the datacenter folks I know hardware topology has a HUGE impact that software cannot overcome on its own in many situations)

New comment by LegNeato in "Rust Threads on the GPU"

LegNeato — Tue, 14 Apr 2026 03:44:49 +0000

Founder here.

1. Programming GPUs is a problem. The ratio of CPUs to CPU programmers and GPUs to GPU programmers is massively out of whack. Not because GPU programming is less valuable or lucrative, because GPUs are weird and the tools are weird.

2. We are more interested in leveraging existing libraries than running existing binaries wholesale (mostly within a warp). But, running GPU-unaware code leaves a lot of space for the compiler to move stuff around and optimize things.

3. The compiler changes are not our product, the GPU apps we are building with them are. So it is in our interest to make the apps very fast.

Anyway, skepticism is understandable and we are well aware code wins arguments.

New comment by LegNeato in "The economics of software teams: Why most engineering orgs are flying blind"

LegNeato — Mon, 13 Apr 2026 07:51:31 +0000

No, it means not being able to see what is going on. Which is literally what the word blind means. You can be blinded by many things (blindfold, clouds/fog, bright lights, darkness, accidents, genetics, etc), permanently and temporarily. Non-humans can be blind and blinded. YOU are making it about a specific situation and projecting value judgements on it.

The author specifically says FLYING blind. Not "stumbling around like a blind person" or some such. If you are offended, that is on you. It's your right to be offended of course, but don't expect people to join in your delusion.

New comment by LegNeato in "Taking on CUDA with ROCm: 'One Step After Another'"

LegNeato — Mon, 13 Apr 2026 02:49:24 +0000

One of the rust-gpu maintainers here. Haven't officially heard from anyone at AMD but we've had chats with many others. Happy to talk with whomever! I would imagine AMD is focusing on ROCm over Vulkan for compute right now as their pure datacenter play, which makes sense.

We've started a company around Rust on the GPU btw (https://www.vectorware.com/), both CUDA and Vulkan (and ROCm eventually I guess?).

Note that most platform developers in the GPU space are C++ folks (lots of LLVM!) and there isn't as much demand from customers for Rust on the GPU vs something like Python or Typescript. So Rust naturally gets less attention and is lower on the list...for now.

Rust Threads on the GPU

LegNeato — Thu, 09 Apr 2026 03:30:10 +0000

Article URL: https://www.vectorware.com/blog/threads-on-gpu/

Comments URL: https://news.ycombinator.com/item?id=47698975

Points: 2

# Comments: 0

New comment by LegNeato in "Async/Await on the GPU"

LegNeato — Tue, 17 Feb 2026 21:29:05 +0000

Thank you! We're small so have to focus. If anyone from AMD wants to reach out, happy to chat.

New comment by LegNeato in "Async/Await on the GPU"

LegNeato — Tue, 17 Feb 2026 20:17:08 +0000

Doing things at compile time / AOT is almost always better for perf. We believe async/await and futures enables more complex programs and doing things you couldn't easily do on the GPU before. Less about performance and more about capability (though we believe async/await perf will be better in some cases, time will tell).

New comment by LegNeato in "Async/Await on the GPU"

LegNeato — Tue, 17 Feb 2026 18:10:55 +0000

Currently NVIDIA-only, we're cooking up some Vulkan stuff in rust-gpu though.

New comment by LegNeato in "Async/Await on the GPU"

LegNeato — Tue, 17 Feb 2026 18:10:27 +0000

We aren't focused on performance yet (it is often workload and executor dependent, and as the post says we currently do some inefficient polling) but Rust futures compile down to state machines so they are a zero-cost abstraction.

The anticipated benefits are similar to the benefits of async/await on CPU: better ergonomics for the developer writing concurrent code, better utilization of shared/limited resources, fewer concurrency bugs.

New comment by LegNeato in "Async/Await on the GPU"

LegNeato — Tue, 17 Feb 2026 17:57:48 +0000

Yes, that's the idea.

GPU-wide memory is not quite as scarce on datacenter cards or systems with unified memory. One could also have local executors with local futures that are `!Send` and place in a faster address space.

New comment by LegNeato in "Rust’s Standard Library on the GPU"

LegNeato — Wed, 28 Jan 2026 03:46:40 +0000

We use the cuda device allocator for allocations on the GPU via Rust's default allocator.

New comment by LegNeato in "Rust’s Standard Library on the GPU"

LegNeato — Wed, 28 Jan 2026 03:42:30 +0000

Flip on the pedantic switch. We have std::fs, std::time, some of std::io, and std::net(!). While the `libc` calls go to the host, all the `std` code in-between runs on the GPU.

New comment by LegNeato in "Rust’s Standard Library on the GPU"

LegNeato — Wed, 28 Jan 2026 03:41:01 +0000

Author here! Flip on the pedantic switch, we agree ;-)

New comment by LegNeato in "VectorWare – from creators of `rust-GPU` and `rust-CUDA`"

LegNeato — Fri, 24 Oct 2025 11:43:01 +0000

1. The GPU owns the control loop And the only sparingly kicks to the CPU when it can't do something.

2. Yes

3. We're still investigating the limitations. A lot of them are hardware dependent, obviously data center cards have higher limits more capability than desktop cards.

Thanks! It is super fun trailblazing and realizing more of the pieces are there than everybody expects.

New comment by LegNeato in "VectorWare – from creators of `rust-GPU` and `rust-CUDA`"

LegNeato — Thu, 23 Oct 2025 21:59:25 +0000

You might be interested in a previous blog post where we showed one codebase running on many types of GPUs: https://rust-gpu.github.io/blog/2025/07/25/rust-on-every-gpu...