<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: meatmanek</title><link>https://news.ycombinator.com/user?id=meatmanek</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 15 May 2026 11:05:02 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=meatmanek" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by meatmanek in "Googlebook"]]></title><description><![CDATA[
<p>> Doubleclicking random exe files off the internet is almost uniquely a Windows problem.<p>Tell that to my partner's grandfather, who managed to find and install malware Chrome extensions on his Chromebox.</p>
]]></description><pubDate>Tue, 12 May 2026 20:29:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48114090</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=48114090</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48114090</guid></item><item><title><![CDATA[New comment by meatmanek in "LFM2-24B-A2B: Scaling Up the LFM2 Architecture"]]></title><description><![CDATA[
<p>OK, I double-checked, and I get 21-22 tps with lmstudio-community/LFM2-24B-A2B-Q4_K_M.gguf running under LM Studio on my i5-12400 with 2x32GB sticks of DDR4-3200. This is with a small context (just "Write me a poem about a language model named Liquid" in `lms chat`).<p><pre><code>    Prediction Stats:
      Stop Reason: eosFound
      Tokens/Second: 21.10
      Time to First Token: 1.827s
      Prompt Tokens: 42
      Predicted Tokens: 187
      Total Tokens: 229</code></pre></p>
]]></description><pubDate>Sat, 02 May 2026 18:33:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47989077</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47989077</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47989077</guid></item><item><title><![CDATA[New comment by meatmanek in "LFM2-24B-A2B: Scaling Up the LFM2 Architecture"]]></title><description><![CDATA[
<p>This model is pretty cool if you don't have a GPU -- I was able to get, I think, 20 or 30 tokens per second on CPU (DDR4 RAM) alone. (I don't remember if that was with q4 or q8.)<p>Otherwise, if you have a GPU with more than about 4GB of VRAM, there are better models. Gemma4 and Qwen3.6 (or Qwen3.5 if you need the smaller dense models that haven't yet been released for 3.6) are a good place to start.</p>
]]></description><pubDate>Sat, 02 May 2026 05:37:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47983627</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47983627</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47983627</guid></item><item><title><![CDATA[New comment by meatmanek in "Granite 4.1: IBM's 8B Model Matching 32B MoE"]]></title><description><![CDATA[
<p>It's not that surprising that an 8B dense model would compete with a 35B-A3B MoE model.<p>The geometric mean rule of thumb for MoE models is that the intelligence level of an MoE model with T total parameters and A active parameters is roughly equivalent to that of a dense model with sqrt(A*T) parameters. For qwen3.6-35B-A3B, that equivalent size is 10.24B, within spitting distance of an 8B model. Good training can make up the 28% difference in size.</p>
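<p>For concreteness, here's that rule of thumb in a couple of lines of Python (a sketch of the heuristic only -- the sqrt(A*T) figure is folklore, not an exact law):<p><pre><code>    import math

    def moe_equivalent_dense_size(total_b, active_b):
        # Geometric-mean heuristic: T total / A active parameters
        # behaves roughly like a sqrt(A*T)-parameter dense model.
        return math.sqrt(total_b * active_b)

    print(moe_equivalent_dense_size(35, 3))  # ~10.25 (B params)</code></pre></p>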
]]></description><pubDate>Thu, 30 Apr 2026 17:10:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47965456</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47965456</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47965456</guid></item><item><title><![CDATA[New comment by meatmanek in "We have a 99% email reputation, but Gmail disagrees"]]></title><description><![CDATA[
<p>Several years back, when I applied for a Google internship, I missed some emails from my recruiter (soandso@google.com) because they went to my Gmail spam folder.</p>
]]></description><pubDate>Sun, 12 Apr 2026 14:28:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47740107</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47740107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47740107</guid></item><item><title><![CDATA[New comment by meatmanek in "Google releases Gemma 4 open models"]]></title><description><![CDATA[
<p>The Unsloth llama.cpp guide[1] recommends building the latest llama.cpp from source, so it's possible we need to wait for LM Studio to ship an update to its bundled llama.cpp. Fairly common with new models.<p>1. <a href="https://unsloth.ai/docs/models/gemma-4#llama.cpp-guide">https://unsloth.ai/docs/models/gemma-4#llama.cpp-guide</a></p>
]]></description><pubDate>Thu, 02 Apr 2026 18:27:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47618265</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47618265</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47618265</guid></item><item><title><![CDATA[New comment by meatmanek in "Oracle slashes 30k jobs"]]></title><description><![CDATA[
<p>By the same logic, wouldn't 4 months of severance pay be equivalent to forfeiting 92% of salary?<p>For something paid at regular intervals like RSUs, you really should never look at the total value of the grant; instead, think of it in terms of how many shares per paycheck/month/quarter/year you vest.<p>If you've got a cliff coming up, that's different. I'd be pissed if a company laid me off 11.5 months into a 12-month cliff or a few weeks before an annual bonus and didn't accelerate the vesting/bonus.</p>
]]></description><pubDate>Tue, 31 Mar 2026 21:23:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47593643</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47593643</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47593643</guid></item><item><title><![CDATA[New comment by meatmanek in "Shell Tricks That Make Life Easier (and Save Your Sanity)"]]></title><description><![CDATA[
<p>> Be careful working CTRL + W into muscle memory though, I've lost count of how many browser tabs I've closed by accident...<p>I still maintain this is why macOS is the best OS for terminal work -- all the common keybindings for GUI tools use a different modifier key, so e.g. ⌘C and ⌘W work the same in your terminal as they do in your browser.<p>(Lots of the readline/emacs-style editing keybindings work everywhere in macOS as well -- ^A, ^E, ^K, ^Y, but not ^U for some reason.)</p>
]]></description><pubDate>Thu, 26 Mar 2026 21:12:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47535823</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47535823</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47535823</guid></item><item><title><![CDATA[New comment by meatmanek in "Show HN: Three new Kitten TTS models – smallest less than 25MB"]]></title><description><![CDATA[
<p>You could try a preprocessing step where you convert to hiragana, but I guess that would lose pitch accent information (e.g. 飴 vs 雨)</p>
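<p>A rough sketch of that preprocessing step, assuming pykakasi and its convert() API (and it shows exactly the loss you describe):<p><pre><code>    # Sketch: kanji -> hiragana normalization with pykakasi
    # (pip install pykakasi). Pitch accent is lost in the process.
    import pykakasi

    kks = pykakasi.kakasi()
    for word in ["飴", "雨"]:
        hira = "".join(item["hira"] for item in kks.convert(word))
        print(word, "->", hira)  # both print あめ</code></pre></p>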
]]></description><pubDate>Fri, 20 Mar 2026 01:43:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47449330</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47449330</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47449330</guid></item><item><title><![CDATA[New comment by meatmanek in "It Took Me 30 Years to Solve This VFX Problem – Green Screen Problem [video]"]]></title><description><![CDATA[
<p>The lights are relatively easy to get. IIRC (it's been a bit since I watched their full video on the subject[1]), the hard part to find was the splitter that sends the sodium-vapor light to one camera and everything else to another camera.<p>1. <a href="https://www.youtube.com/watch?v=UQuIVsNzqDk" rel="nofollow">https://www.youtube.com/watch?v=UQuIVsNzqDk</a></p>
]]></description><pubDate>Tue, 17 Mar 2026 21:13:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47418405</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47418405</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47418405</guid></item><item><title><![CDATA[New comment by meatmanek in "Why Mathematica does not simplify sinh(arccosh(x))"]]></title><description><![CDATA[
<p>> For example, Sinh[ArcCosh[2]] returns −√3 but √(2² − 1) = √3. The expression Mathematica returns for Sinh[ArcCosh[x]] correctly evaluates to −√3<p>But the expression given is sqrt((x-1)/(x+1))*(x+1), which for x=2 would be sqrt(1/3)*3 = sqrt(3).<p>Did you mean Sinh[ArcCosh[-2]]?</p>
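<p>A quick numeric check of that guess (plain Python; assuming cmath's principal branches match Mathematica's conventions here):<p><pre><code>    # Compare sinh(arccosh(x)) against sqrt((x-1)/(x+1))*(x+1)
    # at x = 2 and x = -2, using principal branches.
    import cmath

    def lhs(x):
        return cmath.sinh(cmath.acosh(x))

    def rhs(x):
        return cmath.sqrt((x - 1) / (x + 1)) * (x + 1)

    for x in (2, -2):
        print(x, lhs(x), rhs(x))
    # x = 2  -> both ~ +1.732 (+sqrt(3)), not -sqrt(3)
    # x = -2 -> both ~ -1.732 (-sqrt(3))</code></pre></p>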
]]></description><pubDate>Sun, 15 Mar 2026 20:11:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47391354</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47391354</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47391354</guid></item><item><title><![CDATA[New comment by meatmanek in "Big data on the cheapest MacBook"]]></title><description><![CDATA[
<p>5 months is a lot worse than 1 month, which is what the parent claimed.</p>
]]></description><pubDate>Fri, 13 Mar 2026 18:18:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47367751</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47367751</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47367751</guid></item><item><title><![CDATA[New comment by meatmanek in "“This is not the computer for you”"]]></title><description><![CDATA[
<p>Yeah, I use Kinto (which seems to be what Toshy was originally based on). A recent Ubuntu update broke it, though, and I accidentally deleted my config file while trying to fix it, so maybe now's a good time to try out Toshy. Looks like Toshy creates a Python virtualenv instead of relying on system packages, which should make it a little more resilient to system package changes.</p>
]]></description><pubDate>Fri, 13 Mar 2026 18:16:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47367727</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47367727</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47367727</guid></item><item><title><![CDATA[New comment by meatmanek in "Can I run AI locally?"]]></title><description><![CDATA[
<p>For ASR/STT on a budget, you want <a href="https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3" rel="nofollow">https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3</a> - it works great on CPU.<p>I haven't tried on a raspberry pi, but on Intel it uses a little less than 1s of CPU time per second of audio. Using <a href="https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/asr_streaming_inference/asr_streaming_infer.py" rel="nofollow">https://github.com/NVIDIA-NeMo/NeMo/blob/main/examples/asr/a...</a> for chunked streaming inference, it takes 6 cores to process audio ~5x faster than realtime. I expect with all cores on a Pi 4 or 5, you'd probably be able to at least keep up with realtime.<p>(Batch inference, where you give it the whole audio file up front, is slightly more efficient, since chunked streaming inference is basically running batch inference on overlapping windows of audio.)<p>EDIT: there are also the multitalker-parakeet-streaming-0.6b-v1 and nemotron-speech-streaming-en-0.6b models, which have similar resource requirements but are built for true streaming inference instead of chunked inference. In my tests, these are slightly less accurate. In particular, they seem to completely omit any sentence at the beginning or end of a stream that was partially cut off.</p>
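<p>For reference, the batch-mode version is only a few lines. This assumes NeMo's usual from_pretrained/transcribe interface (and the wav path is a placeholder):<p><pre><code>    # Minimal batch transcription with NeMo (pip install "nemo_toolkit[asr]").
    import nemo.collections.asr as nemo_asr

    asr_model = nemo_asr.models.ASRModel.from_pretrained(
        model_name="nvidia/parakeet-tdt-0.6b-v3"
    )
    # Hand it whole files up front; the chunked-streaming script linked
    # above runs the same inference over overlapping audio windows.
    output = asr_model.transcribe(["recording.wav"])  # placeholder path
    print(output[0].text)</code></pre></p>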
]]></description><pubDate>Fri, 13 Mar 2026 17:51:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=47367464</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47367464</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47367464</guid></item><item><title><![CDATA[New comment by meatmanek in "Can I run AI locally?"]]></title><description><![CDATA[
<p>This seems to be estimating based on memory bandwidth / size of model, which is a really good estimate for dense models, but MoE models like GPT-OSS-20B don't involve the entire model for every token, so they can produce more tokens/second on the same hardware. GPT-OSS-20B has 3.6B active parameters, so it should perform similarly to a 3-4B dense model, while requiring enough VRAM to fit the whole 20B model.<p>(In terms of intelligence, they tend to score similarly to a dense model that's as big as the geometric mean of the full model size and the active parameters, i.e. for GPT-OSS-20B, it's roughly as smart as a sqrt(20B*3.6B) ≈ 8.5B dense model, but produces tokens 2x faster.)</p>
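<p>Back-of-the-envelope version of that estimate (a sketch; the bytes-per-parameter figure assumes a ~Q4 quant):<p><pre><code>    # Decode speed is roughly bandwidth-bound: each generated token has to
    # read the active weights once, so tokens/sec <= bandwidth / active bytes.
    def est_tokens_per_sec(bandwidth_gbs, active_params_b, bytes_per_param=0.6):
        return bandwidth_gbs / (active_params_b * bytes_per_param)

    # Dense 20B vs. 20B-A3.6B MoE on dual-channel DDR4-3200 (~51 GB/s):
    print(est_tokens_per_sec(51, 20.0))  # ~4 tps: whole model per token
    print(est_tokens_per_sec(51, 3.6))   # ~24 tps: only active params per token</code></pre></p>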
]]></description><pubDate>Fri, 13 Mar 2026 17:25:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47367200</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47367200</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47367200</guid></item><item><title><![CDATA[New comment by meatmanek in "“This is not the computer for you”"]]></title><description><![CDATA[
<p>macOS is the best desktop UNIX for one simple reason: the ⌘ key. The fact that 99% of your GUI keybindings use a key that your CLI tooling cannot use eliminates conflicts and means that you don't have to remember things like "Copy is ^C in Chrome but ^⇧C in the terminal".</p>
]]></description><pubDate>Fri, 13 Mar 2026 17:03:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47366945</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47366945</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47366945</guid></item><item><title><![CDATA[New comment by meatmanek in "Big data on the cheapest MacBook"]]></title><description><![CDATA[
<p>> I’ve got a first gen M1 Max and it destroys all but the largest cloud instances (that cost its entire current market value per month!)<p>You're either underestimating how big cloud instances can get or overestimating how much it costs to rent a cloud instance that would beat an M1 Max at any multi-core processing.<p>According to Geekbench, the M1 Max macbook pro has a single-core performance of 2374 and multicore of 12257; AWS's c8i.4xlarge (16 vCPUs) has 2034 and 12807, so relatively equivalent.<p>That c8i.4xlarge would cost you $246/mo at current spot pricing of $0.3425/hr, which is, what, 20% of the cost of that M1 Max MBP?<p>As discussed recently in <a href="https://news.ycombinator.com/item?id=47291906">https://news.ycombinator.com/item?id=47291906</a>, Geekbench is underestimating the multi-core performance of very large machines for parallelizable tasks -- the benchmark's performance peaks at around 12x single-core performance. (I might've picked a different benchmark but I couldn't find another benchmark that had results for both the M1 Max and the Xeon Scalable 6 family.)<p>If your tasks are _not_ like that, then even a mid-range cloud instance like a 64-vCPU c8i.16xlarge (which currently costs $0.95/hour on the spot market) will handily beat the M1 Max, by a factor of about 4. The largest cloud instances from AWS have 896 vCPUs, so I'd expect they'd outperform the M1 Max by about 50-to-1 for trivially parallelizable workloads. Even if you stay away from the exotic instances like the `u7i-12tb.224xlarge` and stick to the standard c/m/r families, the c8i.96xlarge has 384 vCPUs (so at least 24x the compute power of that M1 Max) and costs $3.76/hr.</p>
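<p>The scaling arithmetic, for anyone who wants to poke at it (spot prices are snapshots and move around):<p><pre><code>    # Embarrassingly-parallel scaling vs. the 16-vCPU c8i.4xlarge,
    # which benchmarks roughly even with the M1 Max's multicore score.
    for name, vcpus, spot_per_hr in [
        ("c8i.4xlarge", 16, 0.3425),
        ("c8i.16xlarge", 64, 0.95),
        ("c8i.96xlarge", 384, 3.76),
    ]:
        speedup = vcpus / 16
        monthly = spot_per_hr * 720  # 30-day month
        print(f"{name}: ~{speedup:.0f}x an M1 Max, ~${monthly:.0f}/mo spot")</code></pre></p>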
]]></description><pubDate>Thu, 12 Mar 2026 23:31:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47358729</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47358729</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47358729</guid></item><item><title><![CDATA[New comment by meatmanek in "Shall I implement it? No"]]></title><description><![CDATA[
<p>I find that if I ask an LLM to explain what its reasoning was, it comes up with some post-hoc justification that has nothing to do with what it was actually thinking. Most likely token predictor, etc etc.<p>As far as I understand, any reasoning tokens for previous answers are generally not kept in the context for follow-up questions, so the model can't even really introspect on its previous chain of thought.</p>
]]></description><pubDate>Thu, 12 Mar 2026 22:51:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47358338</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47358338</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47358338</guid></item><item><title><![CDATA[New comment by meatmanek in "Don't post generated/AI-edited comments. HN is for conversation between humans."]]></title><description><![CDATA[
<p>I like to think about it in terms of output-to-prompt ratio. For HN comments, I think an output ratio of 1 or less is _probably_ fine. Examples:<p><pre><code>    - Translating (relatively) literally from one language to another would be ~1:1.
    - Automatic spelling/grammar correction is ~1:1.
    - Using an LLM to help you find a concise way of expressing what you mean, i.e. giving it extra content to help it suggest a way of phrasing something that has the connotation you want, would be <1:1.
</code></pre>
Expansion (output > prompt) is where it gets problematic, at least for HN comments: if you give it an 8-word prompt and it expands it to 50, you've just wasted the reader's time -- they could've read the prompt and gotten the same information.<p>(Expansion is perfectly fine in a coding context -- it often takes way fewer words to express what you want the program to do than the generated code will contain.)</p>
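<p>A toy version of that ratio, with word counts as a crude proxy for information:<p><pre><code>    # Word-count proxy for the output-to-prompt ratio.
    def output_ratio(prompt, output):
        return len(output.split()) / max(len(prompt.split()), 1)

    prompt = "thanks, fixed in the latest commit"
    padded = ("Thank you so much for pointing that out! I have gone ahead "
              "and fixed the issue in the latest commit. Please let me "
              "know if there is anything else I can help with!")
    print(output_ratio(prompt, padded))  # ~5.5x: the reader's time, wasted</code></pre></p>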
]]></description><pubDate>Wed, 11 Mar 2026 22:28:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47343131</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47343131</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47343131</guid></item><item><title><![CDATA[New comment by meatmanek in "TSA leaves passenger needing surgery after illegally forcing her through scanner"]]></title><description><![CDATA[
<p>I want to know more about the mechanism of damage to a spinal implant from (what I assume is) a millimeter-wave scanner. I would expect millimeter waves not to penetrate very deeply -- Wikipedia says "typically less than 1 mm" (though their citation for that is behind a paywall). Seems like an implant should be more than 1 mm below the surface.</p>
]]></description><pubDate>Fri, 06 Mar 2026 21:45:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47281507</link><dc:creator>meatmanek</dc:creator><comments>https://news.ycombinator.com/item?id=47281507</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47281507</guid></item></channel></rss>