Hacker News: ryan_glass

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Mon, 02 Jun 2025 13:50:57 +0000

Coding, mostly on my own proprietary code (hence my desire for local hosting), including a decent amount of legacy code. General troubleshooting of anything and everything, from running Linux servers to fixing my car. Occasionally summarizing and translating large documents. Also image generation and other automations, but obviously not using LLMs for those.

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Mon, 02 Jun 2025 13:43:43 +0000

Basically it comes down to the memory bandwidth of server CPUs being decent. A bit of an oversimplification here, but the model and context have to be pulled through RAM (or VRAM) every time a new token is generated. CPUs designed for servers with lots of cores have decent bandwidth: up to 480GB/s with the EPYC 9 series, which can use 12 memory channels simultaneously. So in theory they can pull 480GB through the system every second. GPUs are faster, but you also have to fit the entire model and context into RAM (or VRAM), so larger models get extremely expensive: a decent consumer GPU only has 24GB of VRAM and costs silly money if you need 20 of them. Whereas you get a lot of RDIMM RAM for a couple thousand bucks, so you can run bigger models, and 480GB/s gives output faster than most people can read.
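As a rough illustration of the reasoning above, here is a minimal Python sketch of the bandwidth-bound estimate. It assumes memory bandwidth is the only limit and that a fixed amount of data is read for each generated token; it ignores compute, caches and the fact that mixture-of-experts models read only part of their weights per token. The function names and the 40GB and 480GB example sizes are hypothetical; only the 480GB/s and 24GB figures come from the comment.

```python
import math


def max_tokens_per_second(bandwidth_gb_s: float, gb_read_per_token: float) -> float:
    """Upper bound on generation speed if memory bandwidth is the only limit."""
    return bandwidth_gb_s / gb_read_per_token


def gpus_needed(model_gb: float, vram_per_gpu_gb: float) -> int:
    """Number of GPUs needed just to hold the weights, ignoring context."""
    return math.ceil(model_gb / vram_per_gpu_gb)


# 480 GB/s is the figure quoted above; 40 GB read per token is an illustrative assumption.
print(max_tokens_per_second(480, 40))  # 12.0

# 24 GB per consumer GPU is from the comment; 480 GB of weights is an illustrative assumption.
print(gpus_needed(480, 24))  # 20
```

Real throughput will be lower than this bound, so treat it as a ceiling rather than a prediction.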

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Mon, 02 Jun 2025 08:48:45 +0000

To be honest I haven't used o3 or Sonnet, as the code I work with is my own proprietary code which I like to keep private, which is one reason for the local setup. For troubleshooting day-to-day things I have found it at least as good as the free in-browser version of ChatGPT (not sure which model it uses).

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Mon, 02 Jun 2025 08:35:28 +0000

The quality on Gemma 27B is nowhere near good enough for my needs. None of the smaller models are.

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Mon, 02 Jun 2025 08:29:29 +0000

It might be 5 to 10 times slower than a hosted provider, but that doesn't really matter when the output is still faster than a person can read. Context-wise, for troubleshooting I have never needed over 16k, and on the rare occasions when I need to summarise a very large document I can switch to a smaller model and get a huge context. I have never needed more than 32k though.

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Mon, 02 Jun 2025 08:21:36 +0000

Thank you for making the dynamic quantisations! My setup wouldn't be possible without them and for my personal use, they do exactly what I need and are indeed excellent.

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Sun, 01 Jun 2025 21:59:37 +0000

No hard numbers, I'm afraid, as I don't monitor the power draw. But the machine uses a standard ATX power supply (a Corsair RM750e 750W PSU) and the default TDP of the CPU is 280W; I have my TDP set at 300W. It is basically built like a desktop: ATX form factor, fans spin down at idle etc.

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Sun, 01 Jun 2025 21:45:23 +0000

You are right that I haven't been rigorous - it's easy to benchmark tokens/second but quality of output is more difficult to nail down. I couldn't find any decent comparisons for Unsloth either. So I just tried a few of their models out, looking for something that was 'good enough', i.e. one that does all I need: coding, summarizing documents, troubleshooting anything and everything. I would like to see head-to-head comparisons too - maybe I will invest in more RAM at some stage but so far I have no need for it. I ran some comparisons between the smaller and larger versions of the Unsloth models and interestingly (for me anyway) didn't notice a huge difference in quality between them. But the smaller models didn't run significantly faster, so I settled for the biggest model I could fit in RAM with a decent context. For more complex coding I use Deepseek R1 (again the Unsloth version), but since it's a reasoning model it isn't real-time, so it's no use as my daily driver.

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Sun, 01 Jun 2025 21:09:10 +0000

Prompt eval time varies a lot with context but it feels real-time for short prompts - approx 20 tokens per second but I haven't done much benchmarking of this. When there is a lot of re-prompting in a long back and forth it is still quite fast - I do use KV cache which I assume helps and also quantize the KV cache to Q8 if I am running contexts above 16k. However, if I want it to summarize a document of say 15,000 words it does take a long time - here I walk away and come back in about 20 minutes and it will be complete.
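A quick sanity check of those figures, as a minimal Python sketch. The conversion of roughly 4/3 tokens per English word is a rule-of-thumb assumption, not something measured on this setup; the real ratio depends on the tokenizer and the text. The function name is hypothetical.

```python
def prompt_eval_minutes(
    words: int,
    tokens_per_second: float,
    tokens_per_word: float = 4 / 3,  # assumed rule of thumb, varies by tokenizer
) -> float:
    """Estimate how long it takes to process a prompt of the given length."""
    tokens = words * tokens_per_word
    return tokens / tokens_per_second / 60


# 15,000 words at roughly 20 tokens per second, as described above.
print(round(prompt_eval_minutes(15_000, 20), 1))  # 16.7
```

Under that assumption the estimate lands in the same range as the "about 20 minutes" quoted above.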

New comment by ryan_glass in "Why DeepSeek is cheap at scale but expensive to run locally"

ryan_glass — Sun, 01 Jun 2025 18:59:45 +0000

I run Deepseek V3 locally as my daily driver and I find it affordable, fast and effective. The article assumes GPU, which in my opinion is not the best way to serve large models like this locally. I run a mid-range EPYC 9004 series based home server on a Supermicro mobo, which cost around $4000 all-in. It's a single-CPU machine with 384GB RAM (you could get 768GB using 64GB sticks but this costs more). No GPU means power draw is less than a gaming desktop. With the RAM limitation I run an Unsloth Dynamic GGUF which, quality-wise, performs very close to the original in real-world use. It is around 270GB, which leaves plenty of room for context. I normally run 16k context as I use the machine for other things too, but can up it to 24k if I need more. I get about 9-10 tokens per second, dropping to 7 tokens/second with a large context. There are plenty of people running similar setups with 2 CPUs who run the full version at similar tokens/second.
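To put those token rates in reading terms, here is a minimal Python sketch. The 0.75 words-per-token ratio is a rule-of-thumb assumption for English text, and the 200 to 300 words-per-minute reading speed mentioned in the code comment is likewise an assumed typical range, not a figure from this setup. The function name is hypothetical.

```python
def words_per_minute(tokens_per_second: float, words_per_token: float = 0.75) -> float:
    """Convert a generation rate in tokens/second to words/minute."""
    return tokens_per_second * words_per_token * 60


# Rates quoted above: 7 tokens/second with a large context, 9-10 otherwise.
# For comparison, a typical adult reading speed is assumed to be about 200-300 wpm.
for tps in (7, 9, 10):
    print(tps, words_per_minute(tps))
# 7 315.0
# 9 405.0
# 10 450.0
```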

The effect of Covid-19 lockdowns on website downtime globally

ryan_glass — Sat, 23 May 2020 10:51:37 +0000

Article URL: https://downtimemonkey.com/blog/effect-of-covid-19-lockdown-on-website-downtime.php

Comments URL: https://news.ycombinator.com/item?id=23281877

Points: 1

# Comments: 0

New comment by ryan_glass in "Show HN: I built a service to discover and monitor rapidly growing trends"

ryan_glass — Sat, 04 Apr 2020 21:56:49 +0000

Great idea - I can see this being popular. Small heads-up on responsiveness needing fixed on tablet in portrait view.

New comment by ryan_glass in "Show HN: Log-Scale Covid-19 Plots"

ryan_glass — Sat, 04 Apr 2020 13:45:23 +0000

It would be interesting to see graphs of deaths for other reasons to compare numbers. For example deaths from starvation/malnutrition are likely increasing in India due to lockdown (https://www.theguardian.com/world/commentisfree/2020/mar/29/...) and deaths due to cancer may increase due to patients' treatment being missed.

New comment by ryan_glass in "Vinod Khosla Wins Ruling Threatening Public Beach Access"

ryan_glass — Tue, 26 Nov 2019 16:56:34 +0000

Scotland has a right of access to land (including beaches) as well as inland water throughout the country. This is known as 'the right to roam' and is a good example of a jurisdiction where statutory right of access works well: https://www.scotways.com/faq/law-on-statutory-access-rights

New comment by ryan_glass in "The Value in Go’s Simplicity"

ryan_glass — Sat, 16 Nov 2019 18:20:16 +0000

This article really, really makes me want to give Go a shot - more than anything I've read about the language before.

New comment by ryan_glass in "If you're busy, you're doing something wrong (2011)"

ryan_glass — Fri, 15 Nov 2019 14:11:52 +0000

One possible issue with this article is that its focus is on students - their 'job' being to learn as opposed to producing or delivering something, which is the aim once their skills are learned.

Learning is often done best in small chunks e.g. when learning to rock climb it is usually best to climb no more than 6 hours a day, every other day. But when the skills are learned the climber may spend 15 hours a day, 3 days straight to climb their dream route and they may well sleep hanging from the side of their wall too.

I'd be interested to see how much time the elite players devoted to their work when on tour as professional musicians in later life.

New comment by ryan_glass in "Functions: To Split or Not to Split"

ryan_glass — Thu, 17 Oct 2019 22:50:40 +0000

Nice post - thanks. I'm guilty of sometimes splitting and other times not, making the choice mostly on feel. Never modifying local state when splitting functions is an easy rule of thumb to follow.