<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: tmostak</title><link>https://news.ycombinator.com/user?id=tmostak</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 10 Apr 2026 11:11:04 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=tmostak" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by tmostak in "Waymo robotaxi hits a child near an elementary school in Santa Monica"]]></title><description><![CDATA[
<p>Evidence (preferably with recent Teslas/HW4)?</p>
]]></description><pubDate>Fri, 30 Jan 2026 00:22:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=46818946</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=46818946</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46818946</guid></item><item><title><![CDATA[New comment by tmostak in "Waymo robotaxi hits a child near an elementary school in Santa Monica"]]></title><description><![CDATA[
<p>Evidence of this? I own a Tesla (HW4, latest FSD) and have taken many Waymo rides, and I have found both to react well to unpredictable situations (e.g. a car unexpectedly turning in front of you), far more quickly than I would expect most human drivers to react.<p>This certainly may have been true of older Teslas with HW3 and older FSD builds (I had one, and yes, you couldn't trust it).</p>
]]></description><pubDate>Fri, 30 Jan 2026 00:21:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46818933</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=46818933</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46818933</guid></item><item><title><![CDATA[New comment by tmostak in "Waymo robotaxi hits a child near an elementary school in Santa Monica"]]></title><description><![CDATA[
<p>Do you have data to back this claim up, specifically with HW4 (most recent hardware) and FSD software releases?</p>
]]></description><pubDate>Fri, 30 Jan 2026 00:15:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=46818892</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=46818892</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46818892</guid></item><item><title><![CDATA[New comment by tmostak in "Prefix sum: 20 GB/s (2.6x baseline)"]]></title><description><![CDATA[
<p>Even without NVLink C2C, on a GPU with 16x PCIe 5.0 lanes to the host you have 128 GB/sec of theoretical bidirectional bandwidth, and 100+ GB/sec in practice (half that in each direction), so you still come out ahead with pipelining.<p>Of course, prefix sums are often used within a series of other operators, so if those are already computed on the GPU, you come out further ahead still.</p>
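<p>As a back-of-the-envelope sketch of the pipelining argument (Python; all bandwidth numbers are illustrative assumptions, not measurements):</p>
<pre><code># Pipelined prefix sum over PCIe: illustrative numbers only.
h2d_gbps = 50.0        # assumed practical PCIe 5.0 x16 bandwidth, per direction
d2h_gbps = 50.0
gpu_scan_gbps = 200.0  # assumed on-device prefix-sum throughput

# With chunked pipelining, host-to-device copy, compute, and device-to-host
# copy all overlap, so steady-state throughput is set by the slowest stage.
pipelined = min(h2d_gbps, d2h_gbps, gpu_scan_gbps)

# Without overlap, every byte pays for each stage in sequence.
serial = 1.0 / (1.0 / h2d_gbps + 1.0 / d2h_gbps + 1.0 / gpu_scan_gbps)

print(pipelined, round(serial, 1))  # 50.0 vs ~22.2 GB/sec
</code></pre>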
]]></description><pubDate>Tue, 14 Oct 2025 20:05:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=45584146</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=45584146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45584146</guid></item><item><title><![CDATA[New comment by tmostak in "Modern Minimal Perfect Hashing: A Survey"]]></title><description><![CDATA[
<p>We've made extensive use of perfect hashing in HeavyDB (formerly MapD/OmniSciDB), and it has definitely been a core part of achieving strong group-by and join performance.<p>You can use perfect hashes not only for the usual suspects of contiguous integer and dictionary-encoded string ranges, but also for use cases like binned numeric and date ranges (epoch seconds binned per year can use a perfect hash range of one bin per year for a very wide range of timestamps), and you can even handle arbitrary expressions if you propagate the ranges correctly.<p>Obviously you need a good "baseline" hash path to fall back on, but it's surprising how many real-world use cases you can profitably cover with perfect hashing.</p>
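<p>A minimal sketch of the year-binned case (illustrative Python, not HeavyDB's actual implementation):</p>
<pre><code>from datetime import datetime, timezone

# A perfect hash for timestamps binned per year: each year maps to exactly
# one slot, so the "hash" is collision-free by construction.
MIN_YEAR, MAX_YEAR = 1970, 2100
NUM_SLOTS = MAX_YEAR - MIN_YEAR + 1

def year_bin_slot(epoch_seconds):
    year = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).year
    # Out-of-range values would fall back to the baseline hash path.
    assert year in range(MIN_YEAR, MAX_YEAR + 1)
    return year - MIN_YEAR  # dense, collision-free slot index

# A group-by over year bins can then aggregate straight into a fixed-width
# array, with no collision handling at all.
counts = [0] * NUM_SLOTS
for ts in (0, 1_000_000_000, 1_700_000_000):
    counts[year_bin_slot(ts)] += 1
</code></pre>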
]]></description><pubDate>Wed, 11 Jun 2025 01:27:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44243337</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=44243337</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44243337</guid></item><item><title><![CDATA[Fast geospatial aggregation and visualization with Uber H3]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.heavy.ai/blog/put-a-hex-on-it-introducing-new-uber-h3-capabilities">https://www.heavy.ai/blog/put-a-hex-on-it-introducing-new-uber-h3-capabilities</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43846464">https://news.ycombinator.com/item?id=43846464</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 30 Apr 2025 15:16:35 +0000</pubDate><link>https://www.heavy.ai/blog/put-a-hex-on-it-introducing-new-uber-h3-capabilities</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=43846464</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43846464</guid></item><item><title><![CDATA[Benchmarking geospatial join performance on GPU vs. CPU]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.heavy.ai/blog/connect-the-dots-in-real-time-benchmarking-geospatial-join-performance-in-gpu-accelerated-heavydb-against-cpu-databases">https://www.heavy.ai/blog/connect-the-dots-in-real-time-benchmarking-geospatial-join-performance-in-gpu-accelerated-heavydb-against-cpu-databases</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43706353">https://news.ycombinator.com/item?id=43706353</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 16 Apr 2025 14:54:03 +0000</pubDate><link>https://www.heavy.ai/blog/connect-the-dots-in-real-time-benchmarking-geospatial-join-performance-in-gpu-accelerated-heavydb-against-cpu-databases</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=43706353</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43706353</guid></item><item><title><![CDATA[Benchmarking a GPU database against CPU data warehouses]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb">https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43587546">https://news.ycombinator.com/item?id=43587546</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 04 Apr 2025 20:59:39 +0000</pubDate><link>https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=43587546</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43587546</guid></item><item><title><![CDATA[Interactively Explore 20 Billion Records of Ship AIS Data Using a Single GPU]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.heavy.ai/blog/interactively-explore-20-billion-records-of-ship-ais-data-on-a-single-gpu-heavy-ai-on-the-new-nvidia-grace-hopper-superchip">https://www.heavy.ai/blog/interactively-explore-20-billion-records-of-ship-ais-data-on-a-single-gpu-heavy-ai-on-the-new-nvidia-grace-hopper-superchip</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43092168">https://news.ycombinator.com/item?id=43092168</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 18 Feb 2025 17:07:07 +0000</pubDate><link>https://www.heavy.ai/blog/interactively-explore-20-billion-records-of-ship-ais-data-on-a-single-gpu-heavy-ai-on-the-new-nvidia-grace-hopper-superchip</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=43092168</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43092168</guid></item><item><title><![CDATA[New comment by tmostak in "Show HN: TabPFN v2 – A SOTA foundation model for small tabular data"]]></title><description><![CDATA[
<p>This looks amazing!<p>Just looking through the code a bit, it seems that the model supports a (custom) attention mechanism both between features and between rows (the code uses the term "items")? If so, does the attention between rows help improve accuracy significantly?<p>Generally, for standard regression and classification use cases, rows (observations) are assumed to be independent, but I'm guessing cross-row attention might help the model see the gestalt of the data in some way that improves accuracy even when the independence assumption holds?</p>
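<p>For anyone trying to picture the two attention axes, here is an illustrative numpy sketch (not TabPFN's actual code) of attention applied first between features and then between rows of a (rows, features, dim) table:</p>
<pre><code>import numpy as np

def attend(x):
    # Plain single-head self-attention over axis 0 of x, shape (n, dim).
    scores = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)
    return w @ x

rows, features, dim = 8, 5, 16
table = np.random.randn(rows, features, dim)

# Attention between features: each row attends across its own columns.
table = np.stack([attend(row) for row in table])

# Attention between rows ("items"): each feature column attends across rows;
# this is where cross-row information can flow even for i.i.d. data.
table = np.stack([attend(col) for col in table.transpose(1, 0, 2)]).transpose(1, 0, 2)
</code></pre>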
]]></description><pubDate>Fri, 10 Jan 2025 05:27:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=42652818</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=42652818</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42652818</guid></item><item><title><![CDATA[New comment by tmostak in "All You Need Is 4x 4090 GPUs to Train Your Own Model"]]></title><description><![CDATA[
<p>You should be able to train/full-fine-tune (i.e. full weight updates, not LoRA) a much larger model with 96GB of VRAM. I have generally been able to do a full fine-tune (which has the same memory footprint as training a model from scratch) of 34B-parameter models in full bf16 on 8x A100 servers (640GB of VRAM) if I enable gradient checkpointing, meaning a 96GB VRAM box should be able to handle models of up to about 5B parameters. Of course, if you use LoRA, you should be able to go much larger than this, depending on your rank.</p>
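<p>The rough arithmetic behind those numbers (a sketch; real usage also depends on batch size, sequence length, and activation memory, even with gradient checkpointing):</p>
<pre><code># Per-parameter training state for a bf16 full fine-tune with Adam.
def full_finetune_state_gb(params_billion):
    bytes_per_param = (
        2 +   # bf16 weights
        2 +   # bf16 gradients
        12    # Adam state: fp32 master weights plus two fp32 moments
    )
    return params_billion * bytes_per_param  # 1B params x 1 byte = 1 GB

print(full_finetune_state_gb(34))  # ~544 GB of state, fits in 640 GB
print(full_finetune_state_gb(5))   # ~80 GB of state, fits in a 96 GB box
</code></pre>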
]]></description><pubDate>Sun, 29 Dec 2024 03:19:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42537182</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=42537182</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42537182</guid></item><item><title><![CDATA[Benchmarking a GPU-Accelerated Database Against CPU Data Warehouses]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb">https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42433077">https://news.ycombinator.com/item?id=42433077</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 16 Dec 2024 17:20:07 +0000</pubDate><link>https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=42433077</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42433077</guid></item><item><title><![CDATA[Benchmarking GPU-Accelerated HeavyDB Against CPU Data Warehouses]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb">https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41985908">https://news.ycombinator.com/item?id=41985908</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 29 Oct 2024 16:19:44 +0000</pubDate><link>https://www.heavy.ai/blog/speed-at-scale-benchmarking-gpu-accelerated-heavydb</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=41985908</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41985908</guid></item><item><title><![CDATA[Embed AI fuzzy logic into your SQL]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.heavy.ai/blog/making-sql-smarter-how-to-embed-ai-into-your-queries-with-heavyiq">https://www.heavy.ai/blog/making-sql-smarter-how-to-embed-ai-into-your-queries-with-heavyiq</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41799844">https://news.ycombinator.com/item?id=41799844</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 10 Oct 2024 15:26:20 +0000</pubDate><link>https://www.heavy.ai/blog/making-sql-smarter-how-to-embed-ai-into-your-queries-with-heavyiq</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=41799844</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41799844</guid></item><item><title><![CDATA[New comment by tmostak in "How Meta trains large language models at scale"]]></title><description><![CDATA[
<p>This assumes that you can linearly scale up the number of TPUs to get performance equal to Nvidia cards at lower cost. As with most things distributed, that is unlikely to be the case.</p>
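<p>A toy cost comparison (all numbers made up, purely to show how scaling efficiency can dominate the cheaper-chip argument):</p>
<pre><code>gpu_throughput, gpu_cost_per_hr = 1.0, 4.0   # one GPU as the baseline
tpu_throughput, tpu_cost_per_hr = 0.5, 1.5   # assumed cheaper, slower chip

for efficiency in (1.0, 0.7):  # linear vs. more realistic distributed scaling
    n = gpu_throughput / (tpu_throughput * efficiency)  # TPUs to match one GPU
    print(efficiency, round(n, 2), round(n * tpu_cost_per_hr, 2))
    # eff 1.0: 2.0 TPUs at 3.00/hr beats the GPU's 4.00/hr
    # eff 0.7: 2.86 TPUs at 4.29/hr loses to it
</code></pre>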
]]></description><pubDate>Thu, 13 Jun 2024 03:18:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=40665594</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=40665594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40665594</guid></item><item><title><![CDATA[Exploring 36 years of FAA flight data with AI and a GPU Database]]></title><description><![CDATA[
<p>Article URL: <a href="https://tech.marksblogg.com/heavyiq-faa-ai-llm-gpu-database.html">https://tech.marksblogg.com/heavyiq-faa-ai-llm-gpu-database.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40545127">https://news.ycombinator.com/item?id=40545127</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Sat, 01 Jun 2024 12:19:41 +0000</pubDate><link>https://tech.marksblogg.com/heavyiq-faa-ai-llm-gpu-database.html</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=40545127</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40545127</guid></item><item><title><![CDATA[Exploring 36 years of FAA flight data with AI and a GPU Database]]></title><description><![CDATA[
<p>Article URL: <a href="https://tech.marksblogg.com/heavyiq-faa-ai-llm-gpu-database.html">https://tech.marksblogg.com/heavyiq-faa-ai-llm-gpu-database.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40524350">https://news.ycombinator.com/item?id=40524350</a></p>
<p>Points: 9</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 30 May 2024 14:41:40 +0000</pubDate><link>https://tech.marksblogg.com/heavyiq-faa-ai-llm-gpu-database.html</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=40524350</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40524350</guid></item><item><title><![CDATA[New comment by tmostak in "GPT-4.5 or GPT-5 being tested on LMSYS?"]]></title><description><![CDATA[
<p>Are you measuring tokens/sec or words per second?<p>The difference matters: in my experience, Llama 3, by virtue of its giant vocabulary, generally tokenizes text into 20-25% fewer tokens than something like Mistral. So even if it's 18% slower in terms of tokens/second, it may, depending on the text content, actually output a given body of text faster.</p>
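<p>Quick arithmetic with the rough figures above (the decode speeds are illustrative assumptions):</p>
<pre><code>mistral_tokens = 1000                  # tokens for some text under Mistral's tokenizer
llama3_tokens = mistral_tokens * 0.78  # ~22% fewer tokens (midpoint of 20-25%)

mistral_tps = 100.0                    # assumed decode speed, tokens/sec
llama3_tps = mistral_tps * 0.82        # 18% slower in tokens/sec

print(mistral_tokens / mistral_tps)    # 10.0 sec
print(llama3_tokens / llama3_tps)      # ~9.5 sec: Llama 3 still finishes first
</code></pre>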
]]></description><pubDate>Tue, 30 Apr 2024 20:52:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=40216200</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=40216200</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40216200</guid></item><item><title><![CDATA[New comment by tmostak in "Ollama v0.1.33 with Llama 3, Phi 3, and Qwen 110B"]]></title><description><![CDATA[
<p>But it's likely to be much slower than what you'd get with a backend like llama.cpp on CPU (particularly if you're running on a Mac, but I think on Linux as well), and it doesn't support features like CPU offloading.</p>
]]></description><pubDate>Mon, 29 Apr 2024 03:34:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=40194152</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=40194152</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40194152</guid></item><item><title><![CDATA[New comment by tmostak in "Show HN: Use natural language to query and visualize 400M tweets"]]></title><description><![CDATA[
<p>Thank you, it's been a major team effort!</p>
]]></description><pubDate>Thu, 22 Feb 2024 17:13:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=39470036</link><dc:creator>tmostak</dc:creator><comments>https://news.ycombinator.com/item?id=39470036</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39470036</guid></item></channel></rss>