<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jroesch</title><link>https://news.ycombinator.com/user?id=jroesch</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 02 May 2026 11:55:06 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jroesch" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jroesch in "Show HN: Cuq – Formal Verification of Rust GPU Kernels"]]></title><description><![CDATA[
<p>We also have a formal memory model, and the program semantics are simpler, so if anything reasoning about it should be easier.</p>
]]></description><pubDate>Thu, 23 Oct 2025 08:35:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=45679595</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=45679595</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45679595</guid></item><item><title><![CDATA[New comment by jroesch in "Nvidia announces next-gen RTX 5090 and RTX 5080 GPUs"]]></title><description><![CDATA[
<p>There was some solid commentary in the PS5 Pro tech talk stating that core rendering is so well optimized that most future gains will come from hardware process technology improvements, not from radical architecture changes. It seems clear the future of rendering is likely to be a world where the gains come from things like DLSS, with fewer and fewer free-lunch savings from easy optimizations.</p>
]]></description><pubDate>Tue, 07 Jan 2025 04:18:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=42619171</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=42619171</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42619171</guid></item><item><title><![CDATA[New comment by jroesch in "Making AMD GPUs competitive for LLM inference (2023)"]]></title><description><![CDATA[
<p>Note: this is old work; much of the team working on TVM and MLC was from OctoAI, and we have all recently joined NVIDIA.</p>
]]></description><pubDate>Tue, 24 Dec 2024 01:28:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42498963</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=42498963</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42498963</guid></item><item><title><![CDATA[New comment by jroesch in "Launch HN: Deepsilicon (YC S24) – Software and hardware for ternary transformers"]]></title><description><![CDATA[
<p>I have been working in DL inference for 7+ years now (5 of which at a startup), which makes me comparatively ancient in the AI world at this point. The performance rat race/treadmill is never ending, and to your point a large (i.e., 2x+) performance improvement is not enough of a "painkiller" for customers unless there is something that is impossible for them to achieve without your technology.<p>The second problem is distribution: it is already hard enough to obtain good distribution with software, let alone software + hardware combinations. Even large silicon companies have struggled to get their HW into products across the world. Part of this is due to the actual purchase dynamics and cycles of the people who buy chips: many design products and commit to N-year production cycles built on certain hardware SKUs, meaning you have to both land large deals and have opportune timing to catch them when they are actually shopping for a new platform. Furthermore, the players with existing distribution (i.e., the Apples, Googles, Nvidias, Intels, AMDs, and Qualcomms of the world) already have their own offerings in this space and will not partner with/buy from you.<p>My framing (which has remained unchanged since 2018) is that for a silicon platform to win you have to beat the incumbents (i.e., Nvidia) on the 3Ps: Price (really TCO), Performance, and Programmability.<p>Most hardware accelerators may win on one, but even then it is often only theoretical performance, because it assumes customers' existing software can/will work on your chip, which it often doesn't (see AMD and friends).<p>There are many other threats that come in this form; for example, if you have a fixed-function accelerator and some part of the model code has to run on the CPU, the memory traffic/synchronization can completely negate any performance improvement you might offer.<p>Even many of the existing silicon startups have been struggling with this since the middle of the last decade; the only thing that saved them is the consolidation onto Transformers, but it is very easy for a new model architecture to come out and require everyone to rework what they have built. This need for flexibility is what has given rise to the design ethos around GPGPU, as flexibility in a changing world is a requirement, not just a nice-to-have.<p>Best of luck, but these things are worth thinking deeply about; when we started in this market we were already aware of many of them, and their importance and gravity in the AI market have only grown, not shrunk :)</p>
]]></description><pubDate>Mon, 09 Sep 2024 18:59:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=41492211</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=41492211</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41492211</guid></item><item><title><![CDATA[New comment by jroesch in "AI models collapse when trained on recursively generated data"]]></title><description><![CDATA[
<p>I think this is roughly correct. My 2c is that folks used the initial web data to cold-start and bootstrap the first few models, but much of the performance increase we have seen at smaller sizes comes from a shift towards more conscientious data creation/purchase/curation/preparation and more refined evaluation datasets. I think the practice of scraping random text, except maybe for the initial language-understanding pre-training phase, will diminish over time.<p>This is understood in the academic literature as well: people were writing papers months/years ago showing that a smaller amount of high-quality data is worth more than a large amount of low-quality data (which tracks with what you pick up from an ML 101 education/training).</p>
]]></description><pubDate>Wed, 24 Jul 2024 18:59:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=41060574</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=41060574</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41060574</guid></item><item><title><![CDATA[New comment by jroesch in "Show HN: Kimchi Reader – Immersive Korean Learning with a Popup Dictionary"]]></title><description><![CDATA[
<p>I agree with this. After a short while I turned off the romanization in many learning apps, as it just messes with/undermines your actual learning.</p>
]]></description><pubDate>Mon, 30 Oct 2023 05:18:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=38065956</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=38065956</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38065956</guid></item><item><title><![CDATA[New comment by jroesch in "Apple Tests ‘Apple GPT,’ Develops Generative AI Tools to Catch OpenAI"]]></title><description><![CDATA[
<p>To chime in: Apple already has a lot of great ML talent; they are just far more deliberate and slower to change their products. People forget that FaceID was/is one of the most cutting-edge ML features ever developed/deployed when it was released a few years ago.<p>Siri is sort of a red herring because it's built by teams and tech that existed before Apple acquired most of its ML talent, and some of its inability to evolve has been due to internal politics, not an inability to build the tech. iOS 17 is an example of Apple moving towards more deep-learning speech/text work. I would bet heavily that we will see them catch up with well-integrated pieces, as they have the money, the infra, and the ability to go wide from day one (i.e., to all iOS users; again, think FaceID).</p>
]]></description><pubDate>Wed, 19 Jul 2023 17:52:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=36790459</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=36790459</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36790459</guid></item><item><title><![CDATA[New comment by jroesch in "Apple Tests ‘Apple GPT,’ Develops Generative AI Tools to Catch OpenAI"]]></title><description><![CDATA[
<p>Things are already possible on today's hardware; see <a href="https://github.com/mlc-ai/mlc-llm">https://github.com/mlc-ai/mlc-llm</a>, which allows many models to run on M1/M2 Macs, WASM, iOS, and more. The main limiting factor will be models that are small enough, and high-quality enough, that performance stays acceptable. Ultimately this is HW-limited, and they will need to improve the Neural Engine and map more computation onto it to make the mobile experience possible.</p>
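<p>For a concrete sense of what this looks like, here is a minimal sketch assuming the mlc_llm Python package and its OpenAI-style MLCEngine API; the model identifier is illustrative, and any MLC-compiled model should work the same way:</p><pre><code>from mlc_llm import MLCEngine

# Illustrative model id: an MLC-compiled 4-bit Llama model hosted on HF.
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = MLCEngine(model)

# OpenAI-style streaming chat completion, running locally
# (e.g. on an M1/M2 Mac via the Metal backend).
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)

engine.terminate()
</code></pre><p>The same compiled model artifact is what the iOS and WASM runtimes consume; only the backend target changes.</p>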
]]></description><pubDate>Wed, 19 Jul 2023 17:48:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=36790388</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=36790388</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36790388</guid></item><item><title><![CDATA[New comment by jroesch in "Looming demise of the 10x developer – an era of enthusiast programmers is ending"]]></title><description><![CDATA[
<p>This is also just straight-up FUD. ADHD is one of the few psychiatric conditions that has numerous effective medications which work reliably for a large part of the affected population.<p>Stimulants work for a large number of people diagnosed with ADHD, with very few negative effects, and are safe for long-term use modulo a few exceptions.<p>Some individuals have negative experiences with stimulant medications, but I know from personal experience, and from many friends in the ADHD community, that stimulants have literally been life-saving for them.<p>Doctors don't just reach for them because they are out to get you; they reach for them because they are effective for many people.<p>Furthermore, many people who choose to forgo medication develop lifestyle and substance-use issues whose negative effects far outweigh those of low-dose stimulants.<p>As other commenters said, they are just a tool; you still have to work on interventions, behavior modification, and so on.<p>At the end of the day, ADHD is in many ways a disability (even if sometimes a superpower), and you can't just delete it with a prescription.<p>Even if you forgo meds, there are so many ways to boost your attention and quality of life, and lots of research on what is effective; treatment can be much more than just medication.</p>
]]></description><pubDate>Fri, 14 Jul 2023 06:26:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=36720162</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=36720162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36720162</guid></item><item><title><![CDATA[New comment by jroesch in "Looming demise of the 10x developer – an era of enthusiast programmers is ending"]]></title><description><![CDATA[
<p>It has nothing to do with the blog post's quality or being different; the guy has multiple key sentences which map to key ADHD experiences/symptoms. For those of us living with ADHD it's just an empathy response, as many of us have suffered through experiences which closely match what he described in that paragraph. We are often just looking to share, as many of us improved our lives substantially after someone suggested we get ourselves checked out.<p>On this part in particular: while it can be great to deeply follow your passion with extreme focus, pursuing things regardless of their importance in your overall life, and at the cost of other interests, relationships, or responsibilities, can be an empty and unfulfilling existence in the end. Furthermore, life can be markedly better with the correct interventions and treatment.<p>People seemingly get offended by even a suggestion because many people hold extreme stigma against conditions like ADHD, along with a lot of misinformation from people who have very little understanding of its actual traits, diagnostic criteria, treatment, and prevalence.</p>
]]></description><pubDate>Fri, 14 Jul 2023 06:15:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=36720095</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=36720095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36720095</guid></item><item><title><![CDATA[New comment by jroesch in "Looming demise of the 10x developer – an era of enthusiast programmers is ending"]]></title><description><![CDATA[
<p>You just go see a psychiatrist or a psychiatric nurse practitioner and ask them to perform an evaluation. Many GPs can also perform one or refer you. ADHD understanding and awareness were very low before the 2000s/2010s, so many people now in their late 20s/30s/40s went undiagnosed.<p>I even had stereotypical symptoms as a child but just wasn't tested.</p>
]]></description><pubDate>Fri, 14 Jul 2023 06:01:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=36720008</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=36720008</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36720008</guid></item><item><title><![CDATA[New comment by jroesch in "File for divorce from LLVM"]]></title><description><![CDATA[
<p>As the parent comment said, and as I mentioned in my reply, JavaScript was just incredibly poorly optimized/not compiled, and they applied 20-30 years' worth of compiler research to make it significantly faster. You also had an alliance of every hyperscaler working on the tooling for a decade plus, with help from all major hardware vendors, to bring the best performance out of it. One driver of LLVM was Apple and WebKit, which at one point used LLVM for its JIT compiler, so many improvements figured out in that period have also already been applied to LLVM.<p>LLVM already has decades of research applied to it to make it produce fast code; it will be incredibly challenging to even match its performance across all the targets it supports, let alone improve on it in significant ways. It would be better to spend the time building an optimization pipeline for Zig itself, and being more thoughtful about what code you send to LLVM, versus trying to replace it wholesale.</p>
]]></description><pubDate>Fri, 30 Jun 2023 05:35:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=36530898</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=36530898</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36530898</guid></item><item><title><![CDATA[New comment by jroesch in "File for divorce from LLVM"]]></title><description><![CDATA[
<p>Long-time compiler hacker/engineer and compiler/programming-language PhD here: all great points. Worth saying out loud that much of the reason this stuff is slow is not bad code; it's that many of the best algorithms are in higher complexity classes and just scale poorly with program/translation-unit size. For example, when I was working on `rustc` a big challenge was the heavy reliance on inlining: inlining has a cost, and it increases program size, making everything else take longer as well.<p>I feel like Go already went through a whole saga of this, where the community started with "LLVM and SSA are bad and slow", then a few years went by and they ended up building their own SSA IR and spending a bunch of time trying to bring compilation time closer to what it was before, as it made everything much slower.</p>
]]></description><pubDate>Fri, 30 Jun 2023 05:30:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=36530864</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=36530864</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36530864</guid></item><item><title><![CDATA[New comment by jroesch in "OpenXLA Is Available Now"]]></title><description><![CDATA[
<p>This is much broader than ONNX; it's closer to ONNX Runtime + ONNX, but it has some important advantages. StableHLO is an IR already supported by most HW accelerators, including Inferentia/Trainium and TPU.<p>Much of this code is not "new" in the sense that much of the OpenXLA effort has been about extracting the existing XLA representations and compiler from the TensorFlow codebase so they can be used more modularly by the ecosystem (including PyTorch).<p>A better frame is TensorFlow exporting its stable representation, which many vendors have already built around, rather than a "new" standard.</p>
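<p>For a feel of where StableHLO sits in the stack, here is a minimal sketch, assuming a recent JAX install in which jit-compiled functions lower to StableHLO:</p><pre><code>import jax
import jax.numpy as jnp

def f(x):
    return jnp.tanh(x) + 1.0

# Lower the jitted function for a concrete input shape; recent JAX
# versions express the lowered module in StableHLO, the portable IR
# that hardware vendors target.
lowered = jax.jit(f).lower(jnp.ones((4,), dtype=jnp.float32))
print(lowered.as_text())
</code></pre><p>A backend (TPU, GPU, Inferentia, etc.) then consumes that module rather than framework-specific graphs, which is the whole point of the decoupling.</p>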
]]></description><pubDate>Thu, 09 Mar 2023 08:35:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=35079372</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=35079372</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35079372</guid></item><item><title><![CDATA[New comment by jroesch in "OpenXLA Is Available Now"]]></title><description><![CDATA[
<p>Microsoft and OpenAI have worked on different technologies, including ONNX + ONNX-RT, and OpenAI is focused on Triton, a kernel compiler being used to speed up models. Given my understanding of their heavy PyTorch use, it seems more likely they are utilizing that + Triton rather than XLA.</p>
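<p>To make the Triton point concrete, here is a minimal kernel sketch in the style of the Triton tutorials (names and sizes are illustrative): you write blocked array programs in Python, and Triton compiles them to GPU code.</p><pre><code>import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
</code></pre>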
]]></description><pubDate>Thu, 09 Mar 2023 08:32:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=35079355</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=35079355</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35079355</guid></item><item><title><![CDATA[New comment by jroesch in "Lean – Theorem Prover"]]></title><description><![CDATA[
<p>I did a PhD half focused on verification, and I'm also very biased, being an early Lean developer. IMO the Lean tooling and everything around it is much better; I personally would never use Coq again, especially after writing 200kloc+ of it over a few years.<p>Coq has improved a lot, but I still think Lean has many things going for it, including a large focus on programming and on using your Lean programs directly (versus extraction, which is roughly a really shitty compiler from Coq to OCaml or another extraction target).</p>
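<p>A small taste of the "use your Lean programs directly" point, assuming a recent Lean 4 toolchain (the definitions are illustrative):</p><pre><code>-- An ordinary executable function...
def double (n : Nat) : Nat := 2 * n

-- ...and a proof about it in the same file, checked by the same tool.
theorem double_add (a b : Nat) : double (a + b) = double a + double b := by
  simp only [double]; omega

#eval double 21  -- 42: runs directly, no extraction to OCaml required
</code></pre>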
]]></description><pubDate>Sat, 21 Jan 2023 02:09:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=34463092</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=34463092</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34463092</guid></item><item><title><![CDATA[New comment by jroesch in "Apache TVM Unity: a vision for the ML software and hardware ecosystem in 2022"]]></title><description><![CDATA[
<p>There is a lot more happening around <a href="https://www.tvmcon.org" rel="nofollow">https://www.tvmcon.org</a> this week.<p>For example, a few other cool blog posts:<p>* <a href="https://octoml.ai/blog/write-python-with-blazing-fast-cuda-level-performance/" rel="nofollow">https://octoml.ai/blog/write-python-with-blazing-fast-cuda-l...</a><p>* <a href="https://octoml.ai/blog/collage-automated-integration-of-various-deep-learning-backends/" rel="nofollow">https://octoml.ai/blog/collage-automated-integration-of-vari...</a><p>* <a href="https://octoml.ai/octoml-bert-model-acceleration-on-apple-m1-pro-and-max-chips/" rel="nofollow">https://octoml.ai/octoml-bert-model-acceleration-on-apple-m1...</a></p>
]]></description><pubDate>Thu, 16 Dec 2021 21:57:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=29584535</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=29584535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=29584535</guid></item><item><title><![CDATA[New comment by jroesch in "Does C++ still deserve a bad rap?"]]></title><description><![CDATA[
<p>TLDR: Yes.<p>I have been writing C++ extensively for research/work/etc. for the past 5 years. Compared with modern languages it completely deserves the bad rap.<p>C++ has come a long way. Can you write cleaner code than you could 10 years ago? Yes.
Are there safer abstractions than there were? Yes.<p>But the number of issues I have to deal with in C++ on a day-to-day basis that are just non-issues in Rust or Go, etc., is striking.<p>Without complaining too much: dependency management and tooling are stuck in the stone age.<p>Every time I get to use Cargo, going back to C++ is a nightmare; even Python, which has pretty poor tooling in this area, is a million light-years ahead.<p>Even simple things like having to carefully order headers, or dealing with obscure cross-platform incompatibilities, are mostly non-issues in newer languages, and the baggage of C and C++ leads to a lot of unnecessary pain that doesn't buy any low-level control or performance advantages.</p>
]]></description><pubDate>Sat, 17 Oct 2020 00:22:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=24806349</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=24806349</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=24806349</guid></item><item><title><![CDATA[New comment by jroesch in "Ask HN: Will getting a PhD lead to a more interesting life?"]]></title><description><![CDATA[
<p>Many universities just toss you an MS degree along the way for free during your PhD, but coming in with one has no real effect on your PhD length besides waiving some class requirements. At the end of the day, US PhD programs just assume you will do 5 or more years of research. For those looking to become professors this is a boon, as many US PhDs skip the postdoc step often required after EU PhDs, which are typically 3-ish years.</p>
]]></description><pubDate>Mon, 30 Sep 2019 18:51:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=21118496</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=21118496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=21118496</guid></item><item><title><![CDATA[New comment by jroesch in "Ask HN: Will getting a PhD lead to a more interesting life?"]]></title><description><![CDATA[
<p>It is still pretty much the same: the UW stipend in Seattle is enough to rent an apartment in a nice neighborhood and eat well. It's not an extravagant life, but no one is going even 20k into debt, let alone the 50/100/200k that people take on for other professional paths.<p>I know Stanford PhDs have even higher stipends to offset the costs of Palo Alto.</p>
]]></description><pubDate>Mon, 30 Sep 2019 18:45:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=21118441</link><dc:creator>jroesch</dc:creator><comments>https://news.ycombinator.com/item?id=21118441</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=21118441</guid></item></channel></rss>