<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jms55</title><link>https://news.ycombinator.com/user?id=jms55</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 08:28:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jms55" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jms55 in "Building the heap: racking 30 petabytes of hard drives for pretraining"]]></title><description><![CDATA[
<p>If I remember correctly, most drives either:<p>1. Fail in the first X amount of time<p>2. Fail towards the end of their rated lifespan<p>So buying used drives doesn't seem like the worst idea to me. You've already filtered out the drives that would fail early.<p>Disclaimer: I have no idea what I'm talking about</p>
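<p>To put a shape on that intuition: this is the classic "bathtub curve" failure model. A toy sketch in Rust, where every coefficient is invented purely for illustration (real numbers would come from vendor or Backblaze data):

```rust
// Toy bathtub-curve hazard model for drive failures. All coefficients are
// made up for illustration; they do not reflect any real drive's statistics.
fn failure_rate(hours: f64) -> f64 {
    let infant_mortality = 0.05 * (-hours / 1_000.0).exp(); // early defects shake out fast
    let random_baseline = 0.001; // constant background failure rate mid-life
    let wear_out = 0.05 * (hours / 60_000.0).powi(2); // rises toward rated lifespan
    infant_mortality + random_baseline + wear_out
}

fn main() {
    let early = failure_rate(100.0); // brand new drive
    let mid = failure_rate(20_000.0); // the "survived infancy" used drive
    let late = failure_rate(60_000.0); // end of rated life
    println!("early={early:.4} mid={mid:.4} late={late:.4}");
    // A used drive that already survived infancy sits in the flat middle.
    assert!(mid < early && mid < late);
}
```

<p>The point of buying used is that you're sampling from the flat middle of this curve instead of the left edge.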
]]></description><pubDate>Wed, 01 Oct 2025 15:54:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=45439243</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=45439243</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45439243</guid></item><item><title><![CDATA[New comment by jms55 in "Bevy 0.17: ECS-driven game engine built in Rust"]]></title><description><![CDATA[
<p>Hi, author of Solari here!<p>It was pretty straightforward honestly. bevy_solari is written as a standalone crate (library), without any special private APIs or permissions or anything <a href="https://github.com/bevyengine/bevy/tree/main/crates/bevy_solari" rel="nofollow">https://github.com/bevyengine/bevy/tree/main/crates/bevy_sol...</a>.<p>The crate itself is further split into the realtime lighting plugin, a base "raytracing scene" plugin that could be used for your own custom raytracing-based rendering, and the reference pathtracer I use for comparing the realtime lighting against.<p>There were some small changes to the rest of Bevy, e.g. adding a way to set extra buffer usages for the buffers we store vertex/index data in from another plugin <a href="https://github.com/bevyengine/bevy/pull/19546" rel="nofollow">https://github.com/bevyengine/bevy/pull/19546</a>, or copying some more previous frame camera data to the GPU <a href="https://github.com/bevyengine/bevy/pull/19605" rel="nofollow">https://github.com/bevyengine/bevy/pull/19605</a>, but nothing really major. It was added pretty independently.</p>
]]></description><pubDate>Wed, 01 Oct 2025 01:58:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45433545</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=45433545</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45433545</guid></item><item><title><![CDATA[New comment by jms55 in "Ask HN: What are you working on? (September 2025)"]]></title><description><![CDATA[
<p>I've been working on raytraced lighting in the Bevy game engine, using wgpu's new support for hardware raytracing in WGSL. The initial prototype is launching with the release of Bevy 0.17 tomorrow, but there's still a ton left to improve. Lots of experimenting with shaders and different optimizations.<p>I wrote a blog post about my initial findings recently: <a href="https://jms55.github.io/posts/2025-09-20-solari-bevy-0-17" rel="nofollow">https://jms55.github.io/posts/2025-09-20-solari-bevy-0-17</a></p>
]]></description><pubDate>Mon, 29 Sep 2025 23:24:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=45420121</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=45420121</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45420121</guid></item><item><title><![CDATA[Realtime Raytracing in Bevy 0.17 (Solari)]]></title><description><![CDATA[
<p>Article URL: <a href="https://jms55.github.io/posts/2025-09-20-solari-bevy-0-17/">https://jms55.github.io/posts/2025-09-20-solari-bevy-0-17/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45317726">https://news.ycombinator.com/item?id=45317726</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 20 Sep 2025 21:29:50 +0000</pubDate><link>https://jms55.github.io/posts/2025-09-20-solari-bevy-0-17/</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=45317726</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45317726</guid></item><item><title><![CDATA[New comment by jms55 in "Shipping textures as PNGs is suboptimal"]]></title><description><![CDATA[
<p>Nothing, I haven't found a good option yet.<p>We do have the existing bindings to a two-year-old version of basis universal, but I've been looking to replace it.</p>
]]></description><pubDate>Sun, 07 Sep 2025 06:17:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=45155838</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=45155838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45155838</guid></item><item><title><![CDATA[New comment by jms55 in "Shipping textures as PNGs is suboptimal"]]></title><description><![CDATA[
<p>I've been evaluating texture compression options for inclusion in Bevy <a href="https://bevy.org" rel="nofollow">https://bevy.org</a>, and there just aren't really any good options?<p>Requirements:<p>* Generate mipmaps<p>* Convert to BC and ASTC<p>* Convert to ktx2 with zstd super-compression<p>* Handle color, normal maps, alpha masks, HDR textures, etc<p>* Open source<p>* (Nice to have) runs on the GPU to be fast<p>I unfortunately haven't found any option that covers all of these points. Some tools only write DDS, or don't handle ASTC, or want to use basis universal, or don't generate mipmaps, etc.</p>
]]></description><pubDate>Sat, 06 Sep 2025 23:29:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=45153852</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=45153852</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45153852</guid></item><item><title><![CDATA[New comment by jms55 in "ARM adds neural accelerators to GPUs"]]></title><description><![CDATA[
<p>For people not familiar with how ML is being used by games, check out this great and very recent SIGGRAPH 2025 course <a href="https://dl.acm.org/doi/suppl/10.1145/3721241.3733999" rel="nofollow">https://dl.acm.org/doi/suppl/10.1145/3721241.3733999</a>. Slides are in the supplementary material section, and code is at <a href="https://github.com/shader-slang/neural-shading-s25" rel="nofollow">https://github.com/shader-slang/neural-shading-s25</a>.<p>Neural nets are great for replacing manually-written heuristics or complex function approximations, and 3d rendering is _full_ of these heuristics. Texture compression, light sampling, image denoising/upscaling/antialiasing, etc.<p>Actual "generative" AI in graphics is pretty rare, at least currently. That's more of an artist thing. But there are a lot of really great use cases for small neural networks (think 3-layer MLPs, absolutely nowhere near LLM-levels of size) to approximate expensive or manually-tuned heuristics in existing rendering pipelines, and it just so happens that the GPUs used for rendering also now come with dedicated NPU accelerator things.</p>
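<p>To make "tiny MLP as heuristic replacement" concrete, here's a minimal self-contained sketch: a 1→4→1 network fit to a smoothstep-style curve with finite-difference gradient descent. The sizes, learning rate, and target function are all invented for illustration and are nothing like a production neural-shading setup (which would train offline and run the forward pass in a shader):

```rust
// Minimal sketch: fit a tiny MLP (1 input -> 4 tanh hidden units -> 1 output,
// 13 parameters total) to a smoothstep-like heuristic. Finite-difference
// gradients keep the example short; real training would use backprop.
fn forward(p: &[f64; 13], x: f64) -> f64 {
    let mut y = p[12]; // output bias
    for i in 0..4 {
        let h = (p[i] * x + p[4 + i]).tanh(); // hidden unit i
        y += p[8 + i] * h;
    }
    y
}

fn loss(p: &[f64; 13]) -> f64 {
    // Target heuristic: smoothstep(x) = 3x^2 - 2x^3 on [0, 1].
    (0..16)
        .map(|i| {
            let x = i as f64 / 15.0;
            let target = 3.0 * x * x - 2.0 * x * x * x;
            let err = forward(p, x) - target;
            err * err
        })
        .sum::<f64>()
        / 16.0
}

fn main() {
    // Deterministic "random" init so the example is reproducible without a rand crate.
    let mut p = [0.0f64; 13];
    for (i, w) in p.iter_mut().enumerate() {
        *w = ((i as f64 * 2.399) % 1.0) - 0.5;
    }
    let initial = loss(&p);
    let (eps, lr) = (1e-4, 0.2);
    for _ in 0..1000 {
        let mut grad = [0.0f64; 13];
        for i in 0..13 {
            let (mut hi, mut lo) = (p, p);
            hi[i] += eps;
            lo[i] -= eps;
            grad[i] = (loss(&hi) - loss(&lo)) / (2.0 * eps);
        }
        for i in 0..13 {
            p[i] -= lr * grad[i];
        }
    }
    let final_loss = loss(&p);
    println!("loss: {initial:.4} -> {final_loss:.6}");
    assert!(final_loss < initial * 0.5);
}
```

<p>13 parameters is the scale that matters here: the whole network fits in registers, so evaluating it per-pixel is cheap in a way an LLM-sized model never could be.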
]]></description><pubDate>Sat, 16 Aug 2025 21:43:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=44927130</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=44927130</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44927130</guid></item><item><title><![CDATA[New comment by jms55 in "Shipping WebGPU on Windows in Firefox 141"]]></title><description><![CDATA[
<p>> Metal also definitely has a healthy balance between convenience and low overhead - and more recent Metal versions are an excellent example that a high performance modern 3D API doesn't have to be hard to use, nor require thousands of lines of boilerplate to get a triangle on screen.<p>Metal 4 has moved a lot in the other direction, and now copies a lot of concepts from Vulkan.<p><a href="https://developer.apple.com/documentation/metal/understanding-the-metal-4-core-api" rel="nofollow">https://developer.apple.com/documentation/metal/understandin...</a><p><a href="https://developer.apple.com/documentation/metal/resource-synchronization" rel="nofollow">https://developer.apple.com/documentation/metal/resource-syn...</a></p>
]]></description><pubDate>Wed, 16 Jul 2025 19:09:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44585769</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=44585769</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44585769</guid></item><item><title><![CDATA[New comment by jms55 in "Improving performance of rav1d video decoder"]]></title><description><![CDATA[
<p><a href="https://pharr.org/matt/blog/2018/07/16/moana-island-pbrt-all" rel="nofollow">https://pharr.org/matt/blog/2018/07/16/moana-island-pbrt-all</a></p>
]]></description><pubDate>Thu, 22 May 2025 18:05:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44064856</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=44064856</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44064856</guid></item><item><title><![CDATA[New comment by jms55 in "Bevy 0.16"]]></title><description><![CDATA[
<p>I wrote most of the virtual geometry / meshlet feature, so thanks for the kind words!<p>It's not quite at the level of Nanite, but I'm slowly getting there. Main limiter is that I do this in my spare time after work, since I don't have any dedicated funding for my work on Bevy. So, expect progress, but it's going to take a while :)</p>
]]></description><pubDate>Thu, 24 Apr 2025 21:52:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=43787911</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43787911</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43787911</guid></item><item><title><![CDATA[New comment by jms55 in "Zod v4 Beta"]]></title><description><![CDATA[
<p>I've used LiveView a little in the past, and besides the lack of static typing (at the time; I think Elixir has gotten some type annotations since), I really liked it and found it easy to work with.<p>Ecto though, I could never figure out how to use. If I could just make SQL queries in an ORM-style I would understand it, but the repositories and auto-generated relations and such I couldn't figure out. Do you have any good resources for learning Ecto? I didn't find the official docs helpful.</p>
]]></description><pubDate>Sun, 13 Apr 2025 01:02:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43669178</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43669178</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43669178</guid></item><item><title><![CDATA[New comment by jms55 in "Shadertoys Ported to Rust GPU"]]></title><description><![CDATA[
<p>If you want to make nice looking materials and effects, you need a combination of good lighting (comes from the rendering engine, not the material) and artistic capabilities/talent. Art is a lot harder to teach than programming I feel, or at least I don't know how to teach it.<p>Programming the shaders themselves is pretty simple imo; they're just pure functions that return color data or triangle positions. The syntax might be a little different than what you're used to depending on the shader language, but it should be easy enough to pick up in a day.<p>If you want to write compute shaders for computation, then it gets a lot more tricky, and you need to spend some time learning about memory accesses, the underlying hardware, and profiling.</p>
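<p>To illustrate the "pure function" point: here's the shape of a fragment shader transliterated into plain Rust. In WGSL or GLSL the syntax differs and the function runs once per pixel on the GPU, but conceptually it's exactly this (the circle/colors are arbitrary):

```rust
// A fragment shader is conceptually just this: a pure function from a pixel's
// UV coordinate to a color. Plain Rust here; in WGSL/GLSL the same logic would
// run once per pixel on the GPU with no shared mutable state.
fn fragment(uv: (f64, f64)) -> [f64; 3] {
    // Signed distance to a circle centered at (0.5, 0.5) with radius 0.3.
    let (dx, dy) = (uv.0 - 0.5, uv.1 - 0.5);
    let dist = (dx * dx + dy * dy).sqrt() - 0.3;
    if dist < 0.0 {
        [1.0, 0.5, 0.2] // inside the circle: orange
    } else {
        [0.0, 0.0, 0.0] // outside: black
    }
}

fn main() {
    // "Rasterize" an 8x8 grid on the CPU just to show the function is pure:
    // same input, same output, no side effects.
    for y in 0..8 {
        let row: String = (0..8)
            .map(|x| {
                let uv = ((x as f64 + 0.5) / 8.0, (y as f64 + 0.5) / 8.0);
                if fragment(uv)[0] > 0.0 { '#' } else { '.' }
            })
            .collect();
        println!("{row}");
    }
    assert!(fragment((0.5, 0.5))[0] > 0.0); // center lands inside the circle
    assert_eq!(fragment((0.0, 0.0)), [0.0, 0.0, 0.0]); // corner is outside
}
```

<p>The GPU just calls this function millions of times in parallel; that's why there's so little to learn syntax-wise.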
]]></description><pubDate>Sun, 13 Apr 2025 00:57:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=43669156</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43669156</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43669156</guid></item><item><title><![CDATA[New comment by jms55 in "Nvidia adds native Python support to CUDA"]]></title><description><![CDATA[
<p>> An exception that doesn't change the rule, where are the Vulkan extensions for DirectX neural shaders, and RTX kit?<p>DirectX "neural shaders" is literally the VK_NV_cooperative_vector extension I mentioned previously, which is actually easier to use in Vulkan at the moment since you don't need a custom prerelease version of DXC. Same for all the RTX kit stuff, e.g. <a href="https://github.com/NVIDIA-RTX/RTXGI">https://github.com/NVIDIA-RTX/RTXGI</a> has both VK and DX12 support.</p>
]]></description><pubDate>Sat, 05 Apr 2025 07:02:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=43591508</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43591508</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43591508</guid></item><item><title><![CDATA[New comment by jms55 in "Nvidia adds native Python support to CUDA"]]></title><description><![CDATA[
<p>> NVidia and AMD keep designing their cards with Microsoft for DirectX first, and Vulkan, eventually.<p>Not really. For instance NVIDIA released day 1 Vulkan extensions for their new raytracing and neural net tech (VK_NV_cluster_acceleration_structure, VK_NV_partitioned_tlas, VK_NV_cooperative_vector), as well as equivalent NVAPI extensions for DirectX12. Equal support, although DirectX12 is technically worse since you need to use NVAPI and rely on a prerelease version of DXC; unlike Vulkan and SPIR-V, DirectX12 has no mechanism for vendor-specific extensions (for good or bad).<p>Meanwhile the APIs, both at a surface level and in how the driver implements them under the hood, are basically identical. So identical, in fact, that NVIDIA has the nvrhi project, which provides a thin wrapper over Vulkan/DirectX12 so that you can run on multiple platforms via one API.</p>
]]></description><pubDate>Sat, 05 Apr 2025 04:52:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43590859</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43590859</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43590859</guid></item><item><title><![CDATA[New comment by jms55 in "Nvidia adds native Python support to CUDA"]]></title><description><![CDATA[
<p>> And not only are we getting tile/block-level primitives and TileIR<p>As someone working on graphics programming, it always frustrates me to see so much investment in GPU APIs _for AI_, but almost nothing for GPU APIs for rendering.<p>Block level primitives would be great for graphics! PyTorch-like JIT kernels programmed from the CPU would be great for graphics! ...But there's no money to be made, so no one works on it.<p>And for some reason, GPU APIs for AI are treated like an entirely separate thing, rather than having one API used for AI and rendering.</p>
]]></description><pubDate>Sat, 05 Apr 2025 04:41:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43590809</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43590809</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43590809</guid></item><item><title><![CDATA[New comment by jms55 in "Nvidia adds native Python support to CUDA"]]></title><description><![CDATA[
<p>Never used CUDA, but I'm guessing these map to the same underlying stuff as timestamp queries in graphics APIs, yes?</p>
]]></description><pubDate>Sat, 05 Apr 2025 04:38:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43590788</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43590788</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43590788</guid></item><item><title><![CDATA[New comment by jms55 in "I want a good parallel computer"]]></title><description><![CDATA[
<p>> But the biggest problem I'm having is management of buffer space for intermediate objects<p>My advice for right now (barring new APIs), if you can get away with it, is to pre-allocate a large scratch buffer for as big of a workload as you will have over the program's life, and then have shaders virtually sub-allocate space within that buffer.</p>
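<p>The sub-allocation scheme described above usually boils down to an atomic bump allocator: one counter in the scratch buffer that every invocation atomically bumps to claim a disjoint range. A CPU-side sketch of the idea (in an actual shader this would be an atomicAdd on a storage buffer field; the sizes here are arbitrary):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// CPU-side sketch of GPU-style sub-allocation: one big pre-allocated scratch
// region plus an atomic head offset that invocations bump to claim ranges.
// In WGSL this would be an atomicAdd on a field of a storage buffer.
struct ScratchAllocator {
    capacity: usize,
    head: AtomicUsize,
}

impl ScratchAllocator {
    fn new(capacity: usize) -> Self {
        Self { capacity, head: AtomicUsize::new(0) }
    }

    /// Claim `size` bytes; returns the offset, or None if the scratch is full.
    fn alloc(&self, size: usize) -> Option<usize> {
        let offset = self.head.fetch_add(size, Ordering::Relaxed);
        (offset + size <= self.capacity).then_some(offset)
    }
}

fn main() {
    let scratch = ScratchAllocator::new(1024);
    // Simulate a few "invocations" claiming variably sized intermediate objects.
    let a = scratch.alloc(256).unwrap();
    let b = scratch.alloc(512).unwrap();
    let c = scratch.alloc(128).unwrap();
    println!("offsets: {a} {b} {c}");
    // Ranges never overlap, because fetch_add hands out disjoint offsets.
    assert!(a + 256 <= b && b + 512 <= c);
    assert!(scratch.alloc(512).is_none()); // over capacity: allocation fails
}
```

<p>The catch, as with all bump allocators, is that you can't free individual allocations; you reset the counter once the whole workload's results have been consumed.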
]]></description><pubDate>Sat, 22 Mar 2025 21:22:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=43448746</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43448746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43448746</guid></item><item><title><![CDATA[New comment by jms55 in "I want a good parallel computer"]]></title><description><![CDATA[
<p>Agreed, there are two different problems being described here.<p>1. Divergence of threads within a workgroup/SM/whatever<p>2. Dynamically scheduling new workloads (i.e. dispatches, draws, etc) in response to the output of a previous workload<p>Raytracing is problem #1 (and has its own solutions, like shader execution reordering), while Raph is talking about problem #2.</p>
]]></description><pubDate>Sat, 22 Mar 2025 21:18:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=43448717</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43448717</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43448717</guid></item><item><title><![CDATA[New comment by jms55 in "I want a good parallel computer"]]></title><description><![CDATA[
<p>> Are you arguing for a better software abstraction, a different hardware abstraction or both?<p>I don't speak for Raph, but imo it seems like he was arguing for both, and I agree with him.<p>On the hardware side, GPUs have struggled with dynamic workloads at the API level (not e.g. thread-level dynamism, that's a separate topic) for around a decade. Indirect commands gave you some of that, so at least the size of your data/workload can be variable if not the workloads themselves; then mesh shaders gave you a little more access to geometry processing, and finally workgraphs and device generated commands let you have an actually dynamically defined workload (e.g. completely skipping dispatches for shading materials that weren't used on screen this frame). However it's still very early days, and the performance issues and lack of easy portability are problematic. See <a href="https://interplayoflight.wordpress.com/2024/09/09/an-introduction-to-workgraphs-part-2-performance" rel="nofollow">https://interplayoflight.wordpress.com/2024/09/09/an-introdu...</a> for instance.<p>On the software side, shading languages have been garbage for far longer than hardware has been a problem. It's only in the last year or two that a proper language server for writing shaders has even existed (Slang's LSP). Much less the innumerable driver compiler bugs, lack of well defined semantics and memory model until the last few years, or the fact that we're still manually dividing work into the correct cache-aware chunks.</p>
]]></description><pubDate>Sat, 22 Mar 2025 21:10:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=43448657</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43448657</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43448657</guid></item><item><title><![CDATA[New comment by jms55 in "I want a good parallel computer"]]></title><description><![CDATA[
<p>You can. There are API extensions for persistently mapping memory, and it's up to you to ensure that you never write to a buffer at the same time the GPU is reading from it.<p>At least for Vulkan/DirectX12. Metal is often weird; I don't know what's available there.</p>
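<p>The standard way to uphold that "never write while the GPU reads" rule is N-buffering: carve the persistently mapped buffer into one region per frame in flight and rotate. A CPU-side sketch of just the bookkeeping (region count and sizes are arbitrary; a real Vulkan/DX12 renderer would also wait on a fence before reusing a region):

```rust
// Sketch of the classic frames-in-flight scheme for persistently mapped
// buffers: the CPU only ever writes region (frame % N), so it never touches
// the region the GPU may still be reading from an earlier in-flight frame.
const FRAMES_IN_FLIGHT: usize = 2;
const REGION_SIZE: usize = 256;

struct PersistentBuffer {
    // Stand-in for the persistently mapped allocation (e.g. from vkMapMemory).
    data: Vec<u8>,
    frame: usize,
}

impl PersistentBuffer {
    fn new() -> Self {
        Self { data: vec![0; REGION_SIZE * FRAMES_IN_FLIGHT], frame: 0 }
    }

    /// Byte offset of the region the CPU may write this frame. A real renderer
    /// would first wait on the fence for frame - FRAMES_IN_FLIGHT here.
    fn cpu_region(&self) -> usize {
        (self.frame % FRAMES_IN_FLIGHT) * REGION_SIZE
    }

    fn end_frame(&mut self) {
        self.frame += 1;
    }
}

fn main() {
    let mut buf = PersistentBuffer::new();
    let mut offsets = Vec::new();
    for _ in 0..4 {
        offsets.push(buf.cpu_region());
        buf.end_frame();
    }
    println!("write offsets per frame: {offsets:?}");
    // Consecutive frames write to different regions, so the CPU never stomps
    // on data the GPU is still consuming from the previous frame.
    assert_eq!(offsets, vec![0, 256, 0, 256]);
}
```

<p>The fence wait is the part that makes it correct; the modular indexing just decides where to write.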
]]></description><pubDate>Sat, 22 Mar 2025 21:01:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=43448595</link><dc:creator>jms55</dc:creator><comments>https://news.ycombinator.com/item?id=43448595</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43448595</guid></item></channel></rss>