Hacker News: joshhart

New comment by joshhart in "Claude Opus 4.8"

joshhart — Fri, 29 May 2026 04:19:57 +0000

Fireworks will serve them for $1.74 / $0.14 / $3.48. That's input / cached input / output. https://fireworks.ai/models/deepseek-ai/deepseek-v4-pro . Call it about a third the price of Sonnet.
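To make those three numbers concrete, here is a rough sketch of the per-request arithmetic. It assumes the prices are USD per million tokens (the comment does not state the unit), and the workload figures are made up for illustration.

```python
# Prices quoted in the comment; the per-million-token unit is an assumption.
PRICE_PER_MILLION = {"input": 1.74, "cached_input": 0.14, "output": 3.48}


def request_cost(input_tokens, cached_input_tokens, output_tokens,
                 prices=PRICE_PER_MILLION):
    """Return the USD cost of one request under per-million-token pricing."""
    total = (
        input_tokens * prices["input"]
        + cached_input_tokens * prices["cached_input"]
        + output_tokens * prices["output"]
    )
    return total / 1_000_000


# Hypothetical workload: 20k fresh input tokens, 80k cached, 2k output.
print(f"${request_cost(20_000, 80_000, 2_000):.4f}")
```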

Not nearly as cheap as the Chinese infra but still pretty cheap.

New comment by joshhart in "Layoffs at Block"

joshhart — Thu, 26 Feb 2026 23:47:38 +0000

If you have good ideas that have a nice return on investment and leverage existing skills, sure. If you don’t have good opportunities lying around, it’s best for the business to switch to maintenance mode, which means cutting staff. Or maybe cut staff, then use equity to buy growth via acquisition. It really depends on the business. Block’s growth has slowed so perhaps this would have happened anyway and AI is just what’s getting the blame.

New comment by joshhart in "xAI joins SpaceX"

joshhart — Mon, 02 Feb 2026 21:59:13 +0000

I thought this wasn't viable due to cooling requirements - how do you cool massive amounts of compute when the only option is to radiate the heat into space, with nothing to convect it away?
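For a sense of scale, the radiating area can be estimated from the Stefan-Boltzmann law, P = εσA(T⁴ − T_sink⁴). This is a back-of-the-envelope sketch, not a thermal design: the 300 K radiator temperature and 0.9 emissivity are assumed values, the radiator is treated as single-sided, and absorbed sunlight and Earth's infrared are ignored.

```python
STEFAN_BOLTZMANN = 5.670374419e-8  # W / (m^2 K^4)


def radiator_area_m2(heat_watts, radiator_temp_k=300.0, emissivity=0.9,
                     sink_temp_k=0.0):
    """Area needed to radiate heat_watts from a single-sided radiator."""
    net_flux = emissivity * STEFAN_BOLTZMANN * (
        radiator_temp_k**4 - sink_temp_k**4
    )  # W per square metre
    return heat_watts / net_flux


# Assumed load: 1 MW of compute, all of it ending up as heat.
print(f"{radiator_area_m2(1e6):,.0f} m^2 per MW at 300 K")
print(f"{radiator_area_m2(1e6, radiator_temp_k=350.0):,.0f} m^2 per MW at 350 K")
```

Under these assumptions the area comes out in the low thousands of square metres per megawatt, and it shrinks quickly with radiator temperature because of the T⁴ term.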

Also, the incredible amount of grift here with the left hand paying the right is scarcely believable. Same story as Tesla buying SolarCity. Board of directors should be ashamed IMO.

New comment by joshhart in "Vitamin D and Omega-3 have a larger effect on depression than antidepressants"

joshhart — Thu, 29 Jan 2026 14:49:17 +0000

Vitamin D toxicity is absolutely real, causes hypercalcemia, and can occur even at the 4,000 IU dose. I would really recommend you be getting regular bloodwork done if you go beyond that. Here’s a fun podcast on a case study. https://www.barbellmedicine.com/podcast/episodes/episode-381...

New comment by joshhart in "I let ChatGPT analyze a decade of my Apple Watch data, then I called my doctor"

joshhart — Tue, 27 Jan 2026 16:45:47 +0000

Huh. The standard in your case is to measure waist circumference if BMI is high. Did no doctor do that? As long as you are below 40” or 37” if Asian you are considered good to go.

New comment by joshhart in "Nano Banana Pro"

joshhart — Fri, 21 Nov 2025 01:01:01 +0000

This is super awesome, but how in the world did they come up with the name "Nano Banana Pro"? It sounds like an April Fools joke.

New comment by joshhart in "A small number of samples can poison LLMs of any size"

joshhart — Thu, 09 Oct 2025 17:00:33 +0000

I believe it's intended to convince the audience they are experts, that this type of thing is dangerous to a business, and they are the ones doing the most to prevent it. There is no explicit statement to this effect, but I get the sense they are saying that other vendors, and especially open models that haven't done the work to curate the data as much, are vulnerable to attacks that might hurt your business.

Also a recruiting and branding effort.

All of this is educated guesses, but that's my feeling. I do think the post could have been clearer about describing the practical dangers of poisoning. Is it to spew misinformation? Is it to cause a corporate LLM powered application to leak data it shouldn't? Not really sure here.

New comment by joshhart in "Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?"

joshhart — Sat, 09 Aug 2025 21:09:12 +0000

So the inference speed at low to medium usage is memory bandwidth bound, not compute bound. By “forecasting” into the future you do not increase the memory bandwidth pressure much but you use more compute. The compute is checking each potential token in parallel for several tokens forward. That compute is essentially free though because it’s not the limiting resource. Hope this makes sense, tried to keep it simple.
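A toy sketch of that "forecast then check" loop, in its greedy form. The two models here are hypothetical stand-in functions rather than real networks, and the verification is written as a Python loop for readability; a real system would score all the guessed positions in one batched forward pass of the large model.

```python
def speculative_step(context, draft_next, target_next, k):
    """One greedy speculative-decoding step; returns the newly accepted tokens."""
    # 1. The cheap draft model guesses k tokens ahead, one after another.
    draft = []
    for _ in range(k):
        draft.append(draft_next(context + draft))

    # 2. The target model says what it would emit after each draft prefix.
    #    (One batched pass in practice; a loop here to keep the toy simple.)
    target = [target_next(context + draft[:i]) for i in range(k + 1)]

    # 3. Keep draft tokens while they match, then append the target's own
    #    token: a correction on mismatch, or a bonus token if all matched.
    accepted = []
    for i in range(k):
        if draft[i] != target[i]:
            break
        accepted.append(draft[i])
    accepted.append(target[len(accepted)])
    return accepted


# Hypothetical models: the target counts upward mod 10, the draft agrees
# except that it guesses wrong right after a 4.
def target_next(ctx):
    return (ctx[-1] + 1) % 10


def draft_next(ctx):
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


print(speculative_step([0], draft_next, target_next, k=4))
print(speculative_step([3], draft_next, target_next, k=4))
```

Every step yields at least one token, and the output is always what the target model alone would have produced; the draft only changes how many tokens are confirmed per pass.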

New comment by joshhart in "Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?"

joshhart — Sat, 09 Aug 2025 05:15:08 +0000

A single node with GPUs has a lot of FLOPs and very high memory bandwidth. When only processing a few requests at a time, the GPUs are mostly waiting on the model weights to stream from the GPU RAM to the processing units. When batching requests together, they can stream a group of weights and score many requests in parallel with that group of weights. That allows them to have great efficiency.
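A simple roofline-style estimate shows why batching helps. All hardware and model numbers below are hypothetical round figures, and the model ignores KV-cache reads and attention cost, so treat it as a sketch of the shape of the curve rather than a benchmark.

```python
def decode_step_seconds(batch_size, weight_bytes, flops_per_token,
                        bandwidth_bytes_s, peak_flops):
    """One decode step is bound by the slower of memory streaming and math."""
    memory_time = weight_bytes / bandwidth_bytes_s  # weights read once per step
    compute_time = batch_size * flops_per_token / peak_flops  # grows with batch
    return max(memory_time, compute_time)


def tokens_per_second(batch_size, **hw):
    return batch_size / decode_step_seconds(batch_size, **hw)


# Hypothetical setup: 70e9 parameters at 1 byte each, about 2 FLOPs per
# parameter per token, 3 TB/s of memory bandwidth, 1 PFLOP/s of compute.
hw = dict(weight_bytes=70e9, flops_per_token=140e9,
          bandwidth_bytes_s=3e12, peak_flops=1e15)

for batch in (1, 64, 512):
    print(batch, round(tokens_per_second(batch, **hw)))
```

With these numbers throughput scales almost linearly with batch size until the compute term overtakes the memory term, after which extra requests stop being free.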

Some of the other main tricks - compress the model to 8-bit floating point formats or even lower. This reduces the amount of data that has to stream to the compute unit, and newer GPUs can do math in 8-bit or 4-bit floating point. Mixture-of-experts models are another trick where for a given token, a router in the model decides which subset of the parameters is used so not all weights have to be streamed. Another one is speculative decoding, which uses a smaller model to generate many possible tokens in the future and, in parallel, checks whether some of those matched what the full model would have produced.
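The routing idea can be sketched in a few lines. This is a toy under stated assumptions: the experts are plain scalar functions, the router logits are passed in directly (in a real model a learned layer computes them from the token's hidden state), and a softmax over the top-k logits is just one common weighting choice.

```python
import math


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and softmax-normalise their weights."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Subtract the largest logit before exponentiating for numerical stability.
    exps = [math.exp(router_logits[i] - router_logits[top[0]]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_forward(x, router_logits, experts, k=2):
    """Weighted sum of the selected experts only; the others are never run."""
    return sum(w * experts[i](x) for i, w in top_k_route(router_logits, k))


# Four hypothetical experts, each just scaling its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
print(top_k_route([0.1, 2.0, -1.0, 2.0]))
print(moe_forward(2.0, [0.1, 2.0, -1.0, 2.0], experts))
```

Only the selected experts' weights need to be read for that token, which is where the bandwidth saving comes from.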

Add all of these up and you get efficiency! Source - was director of the inference team at Databricks

New comment by joshhart in "Apple M3 Ultra"

joshhart — Wed, 05 Mar 2025 20:17:02 +0000

This is pretty exciting. Now an organization could produce an open weights mixture of experts model that has 8-15b active parameters but could still be 500b+ parameters and it could be run locally with INT4 quantization with very fast performance. DeepSeek R1 is a similar model but over 30b active parameters which makes it a little slow.
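The arithmetic behind this, as a rough sketch: total parameters set how much memory the weights need, while active parameters set how many bytes must be read per generated token. The 800 GB/s bandwidth is an assumed round number, and the bound ignores KV-cache traffic and any compute limits.

```python
def weight_gigabytes(params_billions, bits_per_param):
    """Size of the weights in GB: 1e9 params * bits / 8 bytes each."""
    return params_billions * bits_per_param / 8


def max_decode_tokens_per_second(active_params_billions, bits_per_param,
                                 bandwidth_gb_s):
    """Upper bound for one stream: every active weight is read once per token."""
    return bandwidth_gb_s / weight_gigabytes(active_params_billions,
                                             bits_per_param)


BANDWIDTH_GB_S = 800  # assumed round figure for unified-memory bandwidth

print(weight_gigabytes(500, 4), "GB to hold a 500b model at 4 bits")
for active in (8, 15, 30):
    bound = max_decode_tokens_per_second(active, 4, BANDWIDTH_GB_S)
    print(f"{active}b active: at most {bound:.0f} tokens/s")
```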

I do not have a good sense of how well quality scales with narrow MoEs but even if we get something like Llama 3.3 70b in quality at only 8b active parameters people could do a ton locally.

New comment by joshhart in "Llama-3.3-70B-Instruct"

joshhart — Fri, 06 Dec 2024 18:28:07 +0000

Hi,

Yes you can. The community creates quantized variants of these that can run on consumer GPUs. A 4-bit quantization of Llama 70B works pretty well on MacBook Pros; the neural engine with unified CPU memory is quite solid for these. GPUs are a bit tougher because consumer GPU RAM is still kinda small.
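For intuition, 70 billion parameters at 4 bits each is about 35 GB of weights before any per-block overhead. Below is a minimal sketch of what 4-bit quantization does to one block of weights, using plain symmetric absmax scaling. This is an illustrative scheme only; the formats the community actually ships are more elaborate, and the sample weights are made up.

```python
def quantize_int4(weights):
    """Symmetric absmax quantization of one block of floats to ints in [-7, 7]."""
    # Fall back to a scale of 1.0 so an all-zero block does not divide by zero.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    codes = [max(-7, min(7, round(w / scale))) for w in weights]
    return codes, scale


def dequantize_int4(codes, scale):
    """Recover approximate floats from the 4-bit codes and the block scale."""
    return [c * scale for c in codes]


block = [0.7, -0.35, 0.1, 0.0]  # hypothetical weights
codes, scale = quantize_int4(block)
restored = dequantize_int4(codes, scale)
print(codes)
print([round(r, 3) for r in restored])
```

Each weight is stored as a small integer plus one shared scale per block, so the reconstruction error per weight is at most half a quantization step.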

You can also fine-tune them. There are a lot of frameworks like unsloth that make this easier. https://github.com/unslothai/unsloth . Fine-tuning can be pretty tricky to get right, you need to be aware of things like learning rates, but there are good resources on the internet where a lot of hobbyists have gotten things working. You do not need a PhD in ML to accomplish this. You will, however, need data that you can represent textually.

Source: Director of Engineering for model serving at Databricks.

New comment by joshhart in "DeepSeek v2.5 – open-source LLM comparable to GPT-4, but 95% less expensive"

joshhart — Wed, 30 Oct 2024 20:28:07 +0000

The benchmarks compare it favorably to GPT-4-turbo but not GPT-4o. The latest versions of GPT-4o are much higher in quality than GPT-4-turbo. The HN title here does not reflect what the article is saying.

That said the conclusion that it's a good model for cheap is true. I just would be hesitant to say it's a great model.

New comment by joshhart in "Ask HN: How do you add guard rails in LLM response without breaking streaming?"

joshhart — Wed, 16 Oct 2024 23:49:10 +0000

Hi, I run the model serving team at Databricks. Usually you run regex filters, Llama Guard, etc. on chunks at a time so you are still streaming, but it's in batches of tokens rather than single tokens at a time. Hope that helps!
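A minimal sketch of the chunked approach, using only a regex filter. The blocklist pattern, chunk size, and function names are hypothetical, and a classifier-style check could be plugged in through the same `check` argument.

```python
import re

# Hypothetical rule: redact anything shaped like a US Social Security number.
BLOCKLIST = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(text):
    return BLOCKLIST.sub("[REDACTED]", text)


def guarded_stream(tokens, chunk_size=8, check=redact):
    """Buffer tokens into chunks, run the check on each chunk, then yield it."""
    buffer = []
    for token in tokens:
        buffer.append(token)
        if len(buffer) >= chunk_size:
            yield check("".join(buffer))
            buffer = []
    if buffer:  # flush the final partial chunk
        yield check("".join(buffer))


tokens = ["My ", "SSN ", "is ", "123-45-6789", " ok", "."]
for chunk in guarded_stream(tokens, chunk_size=4):
    print(repr(chunk))
```

One caveat with this simple version: a match that straddles a chunk boundary is missed, so a sturdier variant would hold back a short tail of each chunk or check overlapping windows.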

You could of course use us and get that out of the box if you have access to Databricks.

New comment by joshhart in "Leaving LinkedIn"

joshhart — Wed, 06 Mar 2024 07:15:19 +0000

I spent 12 years at LinkedIn. Sadly, it's not even close to the engineering org it used to be. The era where Kevin Scott led engineering was a really good one in comparison.

New comment by joshhart in "Show HN: Natural-SQL-7B, a strong text-to-SQL model"

joshhart — Mon, 05 Feb 2024 20:05:47 +0000

At Databricks we have an LLM that is fine-tuned to do the problem you raise -

https://www.databricks.com/blog/announcing-public-preview-ai...

Many customers like it a lot. Although perhaps in your case if there are many pricing details it may not be quite accurate.

New comment by joshhart in "Alphabet just banked $3.0B by stretching the life of its servers"

joshhart — Wed, 31 Jan 2024 05:19:53 +0000

Makes sense, CPUs and memory sizes aren’t growing that fast anymore. But I’m sure they are spending a ton on TPUs/GPUs; the article is clear on very high capex.

New comment by joshhart in "Visualizing expert firing frequencies in Mixtral MoE"

joshhart — Fri, 22 Dec 2023 15:42:49 +0000

If you are making many requests in batch this works ok because you can shuffle the next layer in while the current one is processing a set of matrix multiplies. This takes it from being a memory bound problem to a flops bound problem. This really only works if you care about throughput and not latency.
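A rough timing model of that overlap, assuming the next layer is copied in while the current one computes. The layer count and per-layer times are hypothetical, and compute time is assumed to grow linearly with batch size, which is a simplification.

```python
def offloaded_pass_seconds(n_layers, layer_transfer_s,
                           layer_compute_s_per_request, batch_size):
    """Time for one forward pass with layer i+1 prefetched during layer i."""
    compute = layer_compute_s_per_request * batch_size
    # The first copy cannot be hidden. After that, each stage costs the slower
    # of "copy the next layer" and "compute the current one". The last layer
    # has nothing left to prefetch, so it costs only its compute time.
    return (layer_transfer_s
            + (n_layers - 1) * max(layer_transfer_s, compute)
            + compute)


# Hypothetical: 32 layers, 50 ms to copy a layer, 0.5 ms of math per request.
for batch in (1, 100, 200):
    seconds = offloaded_pass_seconds(32, 0.050, 0.0005, batch)
    print(f"batch {batch}: {seconds:.2f} s per pass, "
          f"{batch / seconds:.1f} requests/s")
```

In this model the pass takes about as long for 100 requests as for one, so throughput improves enormously while per-request latency does not, which matches the throughput-versus-latency caveat above.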

New comment by joshhart in "LinkedIn shelved plan to migrate to Microsoft Azure cloud"

joshhart — Sat, 16 Dec 2023 17:08:18 +0000

LinkedIn was already very FCF positive. They tightly managed margins to get to net income positive (accounting for dilution and so on) but it took maybe 2 years after the acquisition.

New comment by joshhart in "LinkedIn shelved plan to migrate to Microsoft Azure cloud"

joshhart — Fri, 15 Dec 2023 18:59:46 +0000

This was cancelled over a year ago, which the article notes, so it is old news. It was clear the effort would have needed a very significant push that would have required a large halt in product development, and management wasn't willing to stomach it due to high growth in 2020/2021. Which made sense. But LinkedIn revenue growth has heavily slowed with the pullback in tech hiring, and they had the space to do it and treat it as optimization time.

Also, as part of Blueshift the plan was to do batch processing first, but LinkedIn had a cultural belief in colocation of batch compute & storage, which is against the disaggregated storage paradigm we see now. IMO this led to some dragging of feet.

Source: Worked at LinkedIn 12 years, am a director at Databricks now.

New comment by joshhart in "LinkedIn shelved plan to migrate to Microsoft Azure cloud"

joshhart — Fri, 15 Dec 2023 18:56:17 +0000

I left LinkedIn 1.5 years ago. I was there 12 years. I saw the revenue & profitability growth that occurred post acquisition. I am very very confident LinkedIn would be worth north of $100B on public markets today and Microsoft made the acquisition for $26B. You might argue that in the subsequent 6 years post acquisition that wasn't enough growth and they should have bought back shares instead but it was completely a debt financed acquisition and very high ROI for Microsoft.