Hacker News: ahzhou

New comment by ahzhou in "Building a TUI is easy now"

ahzhou — Sat, 14 Feb 2026 15:51:26 +0000

Fast as in time-to-market

New comment by ahzhou in "A startup doesn't need to be a unicorn"

ahzhou — Tue, 08 Apr 2025 16:52:12 +0000

VC vs bootstrap is usually based on company TAM. There are certainly high-growth bootstrapped businesses.

New comment by ahzhou in "A startup doesn't need to be a unicorn"

ahzhou — Tue, 08 Apr 2025 16:50:57 +0000

Not common in Silicon Valley, but much more common in the rest of the country. There’s an archetype for bootstrapped tech businesses:

- highly vertical specific
- couple hundred million TAM
- founder started the business in their 30s and is now in their 40s

New comment by ahzhou in "DeepSeek's multi-head latent attention and other KV cache tricks"

ahzhou — Wed, 29 Jan 2025 03:53:46 +0000

It’s a tensor stored in GPU memory to improve inference throughput. Check out the PagedAttention (which introduces vLLM) paper for how most systems implement it nowadays.
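As a rough illustration of the idea (not how vLLM lays memory out), here is a minimal single-head sketch in plain Python. The names `attend` and `KVCache` are made up for this example, and Python lists stand in for the GPU tensor. The point is only that each decoding step appends one new key/value pair and reuses everything stored earlier.

```python
import math


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(dim)]


class KVCache:
    """Hypothetical toy cache: keeps keys/values of tokens already processed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        # Only the newest token's key/value are added; older ones are reused
        # instead of being recomputed from the whole prefix.
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


# Two decoding steps with made-up (query, key, value) vectors.
cache = KVCache()
steps = [
    ([1.0, 0.0], [1.0, 0.0], [1.0, 2.0]),
    ([0.0, 1.0], [0.0, 1.0], [3.0, 4.0]),
]
outputs = [cache.step(q, k, v) for q, k, v in steps]
print(outputs)
```

The cache grows by one entry per generated token, which is why memory management for it (the topic of the PagedAttention paper) matters so much for throughput.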

New comment by ahzhou in "Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI"

ahzhou — Tue, 28 Jan 2025 19:35:36 +0000

They slightly restructure their MoE [1], but I think the main difference is that other big models (e.g. Llama 405B) are dense and have higher FLOP requirements. MoE should represent a ~5x improvement. FP8 should be about a 2x improvement.

We don’t know how much of a speed improvement GRPO represents. They didn’t say how many GPU hours went into RLing DeepSeek-R1 and we don’t have o1 numbers to compare.

There’s definitely lots of misinformation spreading though. The $5.5m number refers to DeepSeek-V3, not DeepSeek-R1. I don't want to take away from High-Flyer's accomplishment, though. I think a lot of these innovations were forced to work around H800 networking limitations, and it's impressive what they've done.

[1] https://arxiv.org/abs/2401.06066

New comment by ahzhou in "DeepSeek could represent Nvidia CEO Jensen Huang's worst nightmare"

ahzhou — Tue, 28 Jan 2025 14:12:10 +0000

You can easily do a Fermi estimate based on the information given. They are comparing GPU hours.

See: https://planetbanatt.net/articles/v3fermi.html
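A back-of-the-envelope version of that estimate might look like the sketch below. The constants are quoted from memory of the DeepSeek-V3 technical report (GPU hours, assumed rental price, active parameters, training tokens) and should be checked against the report before reuse. The "6 FLOPs per parameter per token" figure is the usual rule of thumb, not something stated in the report, and the function names are invented for this example.

```python
# Figures recalled from the DeepSeek-V3 technical report; verify before reuse.
GPU_HOURS = 2.788e6         # H800 GPU hours reported for the full training run
PRICE_PER_GPU_HOUR = 2.0    # USD per GPU hour, the rental price the report assumes
ACTIVE_PARAMS = 37e9        # parameters activated per token (MoE)
TRAINING_TOKENS = 14.8e12   # pre-training tokens


def training_cost(gpu_hours, price_per_gpu_hour):
    """Dollar cost implied by GPU hours at a flat rental price."""
    return gpu_hours * price_per_gpu_hour


def training_flops(active_params, tokens):
    """Rule-of-thumb estimate: about 6 FLOPs per active parameter per token."""
    return 6 * active_params * tokens


def implied_flops_per_gpu(total_flops, gpu_hours):
    """Sustained FLOP/s each GPU would need to deliver to finish in that time."""
    return total_flops / (gpu_hours * 3600)


cost = training_cost(GPU_HOURS, PRICE_PER_GPU_HOUR)
flops = training_flops(ACTIVE_PARAMS, TRAINING_TOKENS)
throughput = implied_flops_per_gpu(flops, GPU_HOURS)

print(f"cost: ${cost / 1e6:.2f}M")
print(f"training compute: {flops:.2e} FLOPs")
print(f"implied sustained throughput: {throughput / 1e12:.0f} TFLOP/s per GPU")
```

If the implied per-GPU throughput lands in a plausible range for the hardware, the reported GPU hours are at least self-consistent, which is all a Fermi estimate is meant to show.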

New comment by ahzhou in "Meta scrambling 'war rooms' of engineers to figure out DeepSeek's AI"

ahzhou — Tue, 28 Jan 2025 13:22:31 +0000

I might be missing something, but DeepSeek’s recipe is right there in plain sight. Most of the cost efficiency of DeepSeek v3 seems to be attributable to MoE and FP8 training. DeepSeek R1’s improvements are from GRPO-based RL.

Interesting to note - we have no idea how much R1 cost to train. To speculate - maybe DeepSeek’s release made an upcoming Llama release moot in comparison.
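For anyone unfamiliar with GRPO, the core trick is that the baseline comes from a group of sampled responses to the same prompt rather than from a learned value model. A minimal sketch of just that advantage step is below. The function name, the example rewards, the use of the population standard deviation, and the `eps` term are assumptions for illustration; the clipped policy objective and KL penalty are left out entirely.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the other samples for the same prompt."""
    baseline = mean(rewards)   # group mean replaces a learned value function
    spread = pstdev(rewards)   # group standard deviation rescales the signal
    return [(r - baseline) / (spread + eps) for r in rewards]


# Hypothetical rewards for four sampled answers to one prompt (1.0 = correct).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Answers that beat the group average get a positive advantage and the rest get a negative one. If every sample in the group scores the same, all advantages are zero and that prompt contributes no gradient signal.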

New comment by ahzhou in "Coping with dumb LLMs using classic ML"

ahzhou — Fri, 24 Jan 2025 15:03:05 +0000

LLMs are inherently bad at this due to tokenization, scaling, and lack of training on the task. Anthropic’s computer use feature has a specialized model for pixel-counting:

> Training Claude to count pixels accurately was critical. Without this skill, the model finds it difficult to give mouse commands. [1]

For a VLM trained on identifying bounding boxes, check out PaliGemma [2].

You may also be able to get the computer use API to draw bounding boxes if the costs make sense.

That said, I think the correct solution is likely to use a non-VLM to draw bounding boxes. Depends on the dataset and problem.

[1] https://www.anthropic.com/news/developing-computer-use

[2] https://huggingface.co/blog/paligemma

New comment by ahzhou in "Were RNNs all we needed?"

ahzhou — Thu, 03 Oct 2024 20:14:09 +0000

Author: @fandzomga

Username: fsndz

Why try to funnel us to your paywalled article?

New comment by ahzhou in "Mako – fast, production-grade web bundler based on Rust"

ahzhou — Tue, 02 Jul 2024 14:36:50 +0000

Conditionally, yes. There are many libraries that cannot be tree-shaken for various reasons. Libraries typically need to stick to a subset of full JS to ensure that the code can be statically analyzed.

New comment by ahzhou in "Why we no longer use LangChain for building our AI agents"

ahzhou — Fri, 21 Jun 2024 00:41:35 +0000

GraphQL is very powerful when combined with Relay. It’s useless extra bloat if you just use it like REST.

The difference between the two technologies is that LangChain was developed and funded before anyone knew what to do with LLMs, and GraphQL was internal tooling used to solve a real problem at Meta.

In a lot of ways, LangChain is a poor abstraction because the layer it’s abstracting was (and still is) in its infancy.

New comment by ahzhou in "Hate Chatbots? You Aren't the Only One"

ahzhou — Tue, 28 May 2024 21:58:42 +0000

To be fair, LLM-based chatbots are much better about this because you don't need to discover the magic incantation to talk to a human. It's a trade-off because that same property introduces the possibility of hallucination.

New comment by ahzhou in "Hate Chatbots? You Aren't the Only One"

ahzhou — Tue, 28 May 2024 21:54:41 +0000

It depends on the business, but the kind of metrics you are talking about are measured and taken seriously. People have absolutely gotten fired for CS quality KPI drops.

New comment by ahzhou in "Hate Chatbots? You Aren't the Only One"

ahzhou — Tue, 28 May 2024 16:05:13 +0000

While it may not happen for you, “too lazy to look it up” is the vast majority of CS requests.

My understanding from talking to a couple of CS execs is that these have been a slam dunk in terms of ROI because CS agents don’t need to handle type C requests. I expect we’ll only see more as time goes on.

New comment by ahzhou in "Big Tech to EU: "Drop Dead""

ahzhou — Sun, 19 May 2024 13:07:08 +0000

> AppStore would be dead on arrival

Certainly not. PMF was already established via the jailbreaking scene and Installer.app / Cydia. Millions of people went through the annoying process of jailbreaking their phone to get apps.

New comment by ahzhou in "Financial market applications of LLMs"

ahzhou — Sun, 21 Apr 2024 14:59:25 +0000

If you’re saying that economics is a foundational driver of progress, then yes - almost by definition.

Banks and investors provide liquidity to the system, which is just one of many things the market demands.

New comment by ahzhou in "But what is a GPT? Visual intro to Transformers [video]"

ahzhou — Tue, 02 Apr 2024 03:11:56 +0000

Yes, this is a fundamental weakness with LLMs. Unfortunately this is likely unsolvable because the search space is exponential. Techniques like beam search help, but can only introduce a constant scaling factor.

That said, LLMs reach their current performance despite this limitation.
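To make the beam search point concrete, here is a toy sketch. `TOY_MODEL` is a made-up lookup table standing in for a real model's next-token distribution, and `beam_search` is a bare-bones version without length normalization or end-of-sequence handling.

```python
import math

# Hypothetical toy "model": maps a prefix (tuple of tokens) to next-token probabilities.
TOY_MODEL = {
    (): {"a": 0.6, "b": 0.4},
    ("a",): {"x": 0.5, "y": 0.5},
    ("b",): {"x": 0.9, "y": 0.1},
}


def beam_search(model, steps, beam_width):
    """Keep only the beam_width highest-scoring prefixes after every step."""
    beams = [((), 0.0)]  # (prefix, cumulative log-probability)
    for _ in range(steps):
        candidates = []
        for prefix, score in beams:
            for token, prob in model[prefix].items():
                candidates.append((prefix + (token,), score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


greedy_tokens, greedy_score = beam_search(TOY_MODEL, steps=2, beam_width=1)[0]
beam_tokens, beam_score = beam_search(TOY_MODEL, steps=2, beam_width=2)[0]

print("greedy:", greedy_tokens, round(math.exp(greedy_score), 2))
print("beam=2:", beam_tokens, round(math.exp(beam_score), 2))
```

Greedy decoding commits to "a" and can never reach the more probable sequence that starts with "b", while a width-2 beam finds it. Each step still only scores beam_width times vocabulary-size candidates, while the number of possible sequences grows as vocabulary size to the power of the length, so a wider beam buys a constant factor and nothing more.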

New comment by ahzhou in "Grats: A More Pleasant Way to Build TypeScript GraphQL Servers"

ahzhou — Fri, 08 Mar 2024 06:08:00 +0000

They fall under a few buckets:

Driver:
- node-postgres
- node-mysql2

Query Builder / Other thin clients:
- knex
- kysely
- slonik

ORM:
- TypeORM
- MikroORM
- Objection.js
- DrizzleORM
- Prisma (actually runs a separate binary)

New comment by ahzhou in "The creator economy can't rely on Patreon"

ahzhou — Tue, 27 Feb 2024 14:22:37 +0000

It’s a two-sided marketplace and companies only care about the conversion they get from different channels. If demand dries up, it will be reflected in more attractive pricing - I don’t think it’s likely that the entire market pulls out.

FWIW - it seems like the campaigns are working. You seem to be familiar with the brands and someone below chimed in on how one particular brand is great. Multiply that by the viewership - that’s definitely a win.

Some quick (unverified) research tells me that YouTuber marketing pays somewhere in the range of 30-70 CPM. You can pretty easily calculate that against Google AdWords with reasonable conversion assumptions to decide if it’s worth it.
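A rough sketch of that comparison is below. Only the 30-70 CPM range comes from the comment above. The view-to-customer rate, the cost per click, and the click-to-customer rate are placeholder assumptions to be replaced with your own numbers, and the function names are made up for this example.

```python
def cost_per_conversion_from_cpm(cpm, conversions_per_view):
    """Sponsorship priced per 1,000 views: cost of one view / conversions per view."""
    return (cpm / 1000) / conversions_per_view


def cost_per_conversion_from_cpc(cpc, conversions_per_click):
    """Search ads priced per click: cost of one click / conversions per click."""
    return cpc / conversions_per_click


# Placeholder assumptions, not measured data.
ASSUMED_VIEW_CONVERSION = 0.001   # 0.1% of viewers become customers
ASSUMED_CPC = 1.50                # dollars per ad click
ASSUMED_CLICK_CONVERSION = 0.02   # 2% of clicks become customers

search_cost = cost_per_conversion_from_cpc(ASSUMED_CPC, ASSUMED_CLICK_CONVERSION)
print(f"search ads: ${search_cost:.2f} per customer")

for cpm in (30, 70):  # the unverified range quoted above
    sponsor_cost = cost_per_conversion_from_cpm(cpm, ASSUMED_VIEW_CONVERSION)
    verdict = "cheaper" if sponsor_cost < search_cost else "more expensive"
    print(f"sponsorship at {cpm} CPM: ${sponsor_cost:.2f} per customer ({verdict})")
```

The answer flips depending on the conversion assumptions, which is the whole point: the channel is worth it whenever its cost per customer beats the alternative.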

New comment by ahzhou in "Panda CSS: build time and type-safe CSS-in-JS"

ahzhou — Tue, 06 Feb 2024 14:45:39 +0000

CSS Modules would be great, except there’s bad tooling in VS Code. Autocomplete through TypeScript is the killer feature of Panda / vanilla-extract, not that you can style.