<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: htsh</title><link>https://news.ycombinator.com/user?id=htsh</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 30 Apr 2026 04:23:32 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=htsh" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by htsh in "Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model"]]></title><description><![CDATA[
<p>yes! especially b/c i want to process a lot of email and directories full of old, personal documents</p>
]]></description><pubDate>Sun, 08 Mar 2026 15:42:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47298223</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=47298223</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47298223</guid></item><item><title><![CDATA[New comment by htsh in "Apple's 512GB Mac Studio vanishes, a quiet acknowledgment of the RAM shortage"]]></title><description><![CDATA[
<p>are we sure the RAM market will stop being insane in a year or two or could this be the new norm?</p>
]]></description><pubDate>Sun, 08 Mar 2026 12:34:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47296809</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=47296809</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47296809</guid></item><item><title><![CDATA[New comment by htsh in "Claude Code: connect to a local model when your quota runs out"]]></title><description><![CDATA[
<p>thanks! came in here to ask this.<p>we can do much better with a cheap model on openrouter (glm 4.7, kimi, etc.) than anything I can run on my lowly 3090 :)</p>
]]></description><pubDate>Wed, 04 Feb 2026 23:15:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46893290</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=46893290</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46893290</guid></item><item><title><![CDATA[New comment by htsh in "The unreasonable effectiveness of an LLM agent loop with tool use"]]></title><description><![CDATA[
<p>I have been doing this with claude code and openai codex and/or cline. One of the three takes the first pass (usually claude code, sometimes codex), then I will have cline / gemini 2.5 do a "code review" and offer suggestions for fixes before it applies them.</p>
]]></description><pubDate>Fri, 16 May 2025 15:29:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=44006655</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=44006655</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44006655</guid></item><item><title><![CDATA[New comment by htsh in "Qwen3: Think deeper, act faster"]]></title><description><![CDATA[
<p>curious, why the 30b MoE over the 32b dense for local coding?<p>I do not know much about the benchmarks but the two coding ones look similar.</p>
]]></description><pubDate>Mon, 28 Apr 2025 22:03:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=43826622</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=43826622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43826622</guid></item><item><title><![CDATA[New comment by htsh in "OpenVINO AI effects [denoising and transcription] for Audacity"]]></title><description><![CDATA[
<p>A lot of us have ryzen / nvidia combos... hopefully, soon, though.</p>
]]></description><pubDate>Sun, 16 Feb 2025 16:20:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=43069186</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=43069186</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43069186</guid></item><item><title><![CDATA[New comment by htsh in "Official DeepSeek R1 Now on Ollama"]]></title><description><![CDATA[
<p>assuming you want to run entirely on GPU, with 12gb vram, your sweet spot is likely the distill 14b qwen at a 4bit quant. so just run:<p>ollama run deepseek-r1:14b<p>generally, if the model file size < your vram, it is gonna run well. this file is 9gb.<p>if you don't mind slower generation, you can run models that fit within your vram + ram, and ollama will handle the offloading of layers for you.<p>so the 32b should run on your system, but it is gonna be much slower as it will be using GPU + CPU.<p>prob of interest: 
<a href="https://simonwillison.net/2025/Jan/20/deepseek-r1/" rel="nofollow">https://simonwillison.net/2025/Jan/20/deepseek-r1/</a><p>-h</p>
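<p>A rough sketch of the sizing rule of thumb above, as code. The specific file sizes are illustrative assumptions, not exact figures for any particular quant:</p>

```python
# Rule of thumb from the comment above: if the quantized model file fits
# in VRAM, expect full-GPU speed; if it only fits in VRAM + RAM, ollama
# offloads layers and generation is slower. Sizes are illustrative.
def placement(model_file_gb: float, vram_gb: float, ram_gb: float) -> str:
    if model_file_gb < vram_gb:
        return "gpu"          # fits entirely in VRAM: fast
    if model_file_gb < vram_gb + ram_gb:
        return "gpu+cpu"      # partial offload: much slower
    return "too big"          # will not run well at all

print(placement(9.0, 12.0, 32.0))   # 14b at 4-bit, ~9gb file
print(placement(20.0, 12.0, 32.0))  # 32b at 4-bit, ~20gb file
```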
]]></description><pubDate>Tue, 21 Jan 2025 11:52:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=42779133</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=42779133</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42779133</guid></item><item><title><![CDATA[New comment by htsh in "I turned my open-source project into a full-time business"]]></title><description><![CDATA[
<p>As a longtime user of nodemailer, thank you.<p>I am gonna check out emailengine for future work.</p>
]]></description><pubDate>Tue, 27 Feb 2024 13:19:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=39523721</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=39523721</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39523721</guid></item><item><title><![CDATA[New comment by htsh in "MobileDiffusion: Rapid text-to-image generation on-device"]]></title><description><![CDATA[
<p>Dreambooth was kinda great?<p>That said, I agree that I wish there were more done post-research towards products with some of this stuff.</p>
]]></description><pubDate>Thu, 01 Feb 2024 12:46:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=39215372</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=39215372</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39215372</guid></item><item><title><![CDATA[New comment by htsh in "Brave Leo now uses Mixtral 8x7B as default"]]></title><description><![CDATA[
<p>Yes, offloading some layers to the GPU and VRAM should still help. And 11gb isn't bad.<p>If you're on linux or wsl2, I would run oobabooga with --verbose. Load a GGUF, start with a small number of GPU layers and creep up, keeping an eye on VRAM usage.<p>If you're on windows, you can try out LM Studio and fiddle with layers while you monitor VRAM usage, though windows may be doing some weird stuff sharing ram.<p>Would be curious to see the diffs, specifically whether there's a complexity tax in offloading that makes CPU-alone faster, but in my experience with a 3060 and a mobile 3080, offloading what I can makes a big diff.</p>
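<p>The "creep up" step can also be ballparked up front. This is a crude sketch under a made-up assumption that VRAM cost splits evenly across layers; real per-layer cost varies by model and quant, so treat the result as a starting point for the manual tuning described above:</p>

```python
# Estimate how many transformer layers fit in VRAM, mirroring the
# start-small-and-creep-up procedure. The even per-layer split and the
# fixed overhead for KV cache/buffers are assumptions for illustration.
def max_gpu_layers(model_file_gb: float, n_layers: int,
                   vram_gb: float, overhead_gb: float = 1.5) -> int:
    per_layer_gb = model_file_gb / n_layers   # crude even split
    budget = vram_gb - overhead_gb            # leave headroom
    if budget <= 0:
        return 0
    return min(n_layers, int(budget / per_layer_gb))

# e.g. a ~26gb GGUF with 32 layers on an 11gb card:
print(max_gpu_layers(26.0, 32, 11.0))
```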
]]></description><pubDate>Sat, 27 Jan 2024 13:35:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=39155415</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=39155415</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39155415</guid></item><item><title><![CDATA[New comment by htsh in "Brave Leo now uses Mixtral 8x7B as default"]]></title><description><![CDATA[
<p>openrouter, fireworks, together.<p>we use openrouter but have had some inconsistency with speed. i hear fireworks is faster, swapping it out soon.</p>
]]></description><pubDate>Sat, 27 Jan 2024 11:27:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=39154666</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=39154666</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39154666</guid></item><item><title><![CDATA[New comment by htsh in "Show HN: Voxos.ai – An Open-Source Desktop Voice Assistant"]]></title><description><![CDATA[
<p>Can one enter their own openai URL and api-key (so we can use openai-compatible things like openrouter or lm-studio)?</p>
]]></description><pubDate>Fri, 19 Jan 2024 17:18:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=39058049</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=39058049</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39058049</guid></item><item><title><![CDATA[New comment by htsh in "Show HN: Open source alternative to ChatGPT and ChatPDF-like AI tools"]]></title><description><![CDATA[
<p>That is what the RAG system does. The PDF is chunked and thrown into a vector store. And then when prompted, only the relevant bits are retrieved and stuffed into the context and sent to the LLM.<p>So yeah it's kinda smoke and mirrors. In some cases, for some long PDFs, it works really well. If it's a 500 page PDF with many disparate topics, it may do fine.</p>
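<p>The chunk-and-retrieve loop described above can be sketched with no external libraries. The bag-of-words overlap here is a stand-in for a real embedding model and vector store, and the document text is made up:</p>

```python
# Minimal RAG retrieval sketch: chunk a document, score each chunk
# against the question, and keep only the top-k chunks to stuff into
# the LLM context. Word overlap stands in for vector similarity.
def chunk(text: str, size: int = 40) -> list[str]:
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

doc = "payment terms are net 30 days ... warranty lasts two years ..."
top = retrieve("what are the payment terms", chunk(doc, size=5))
```

<p>Only `top` is sent to the model, which is why a 500 page PDF can still fit: the LLM never sees the whole document, just the retrieved bits.</p>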
]]></description><pubDate>Sun, 10 Dec 2023 13:14:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=38591320</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=38591320</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38591320</guid></item><item><title><![CDATA[New comment by htsh in "Show HN: CopilotKit- Build in-app AI chatbots and AI-powered textareas"]]></title><description><![CDATA[
<p>Cool! Any plans for Svelte?</p>
]]></description><pubDate>Wed, 06 Dec 2023 16:43:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=38546233</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=38546233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38546233</guid></item><item><title><![CDATA[New comment by htsh in "Ask HN: SaaS pricing pages with high prices and not “contact sales”"]]></title><description><![CDATA[
<p>supabase just added their $599 tier for their SOC 2 / HIPAA compliant product. really appreciated that.</p>
]]></description><pubDate>Sat, 30 Sep 2023 18:52:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=37718541</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=37718541</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37718541</guid></item><item><title><![CDATA[New comment by htsh in "Exllamav2: Inference library for running LLMs locally on consumer-class GPUs"]]></title><description><![CDATA[
<p>This subreddit remained open. Unfortunately, however, the oobabooga one went closed for a while and lost a lot of momentum. It is also back, however.<p>Are there good lemmy spaces for LLMs?</p>
]]></description><pubDate>Wed, 13 Sep 2023 17:19:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=37499126</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=37499126</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37499126</guid></item><item><title><![CDATA[New comment by htsh in "Chief executives cannot shut up about AI"]]></title><description><![CDATA[
<p>I know how that works. And my point was not that they should or will be replaced, but rather that they are no less expendable than developers (not very much).<p>But the decisions they make are one of the things that can be automated. I do not know if you have been inside one of these places but the executives are not doing a great job deciding (at mine they decided opensearch was a better bet than elastic and switched existing installations).<p>A new regime came in and then bad decision after bad decision drove our best talent away. Consultants, everywhere.<p>Also, that number is much lower. Full time devs are down, contractors and consultants are up. As a full time dev at one of these places, it felt like the number of executives was growing as everything else shrank.<p>Perhaps you are right about the highest levels, but think about all of the middlemen executives and what they do.<p>And even that -- I think an AI could choose to not spend millions on Deloitte or Accenture on software that inevitably failed.</p>
]]></description><pubDate>Fri, 02 Jun 2023 10:00:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=36163200</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=36163200</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36163200</guid></item><item><title><![CDATA[New comment by htsh in "Chief executives cannot shut up about AI"]]></title><description><![CDATA[
<p>Having just left a large enterprise, it certainly feels like executive jobs are replaceable soonest with the AI tech available to us now.<p>Not sure why those of us that live in code editors or even Excel should worry about our jobs more than those that live in Powerpoint.</p>
]]></description><pubDate>Fri, 02 Jun 2023 01:47:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=36159876</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=36159876</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36159876</guid></item><item><title><![CDATA[New comment by htsh in "Ask HN: Where have you found community outside of work?"]]></title><description><![CDATA[
<p>I moved back to NYC after a long time away, and to a different part of town than where many of my old friends live, and getting a dog considerably improved my connection with the folks around me.<p>And of course, as others have said, volunteering.</p>
]]></description><pubDate>Tue, 30 May 2023 19:02:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=36129114</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=36129114</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36129114</guid></item><item><title><![CDATA[New comment by htsh in "Making friends as an adult is hard (2021)"]]></title><description><![CDATA[
<p>1. go to a place where folks you want to hang out with live<p>2. get a dog</p>
]]></description><pubDate>Tue, 18 Apr 2023 11:11:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=35612421</link><dc:creator>htsh</dc:creator><comments>https://news.ycombinator.com/item?id=35612421</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35612421</guid></item></channel></rss>