Hacker News: ljosifov

New comment by ljosifov in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"

ljosifov — Tue, 16 Jun 2026 07:03:06 +0000

Not replaced but supplemented. For off-line coding, current setup is pi + ds4-server + DeepSeek-V4-Flash REAP25 (on M2 Max 96gb). For simpler programming-related tasks (e.g. text2sql) as well as synthetic data generation, current best for me is llama.cpp + Gemma-4-26B-A4B (on gpu 7900xtx 24gb; sometimes nemotron-cascade-2-30b-a3b for 1M context). That and (dabbling now) auto-research use lots of tokens. Used to get paused running out of token quotas all the time. The 1st local model I found somewhat useful to me was glm-4.7-flash, and it's gotten way better since. Recently, between the OpenCode Go choice of models at many price points, and DeepSeek-V4 dropping the IQ/$$$ by multiples, I have become less reliant on local llms for this auxiliary work. Claude I use but with Zai GLM-5.2 subscription. And maintain GPT subscription for quality models.

New comment by ljosifov in "How to setup a local coding agent on macOS"

ljosifov — Sat, 13 Jun 2026 09:58:44 +0000

For high RAM (unified), and relatively middling to lowish Tflops and bandwidth GB/s, usually MoEs are the most promising. The current top-1 in the (iq, tok/s, @ context depth) ranks for me (M2 Max, 96gb) is DeepSeek-V4-Flash REAP25 <65gb gguf + ds4-server + pi agent. Not better than cloud API ofc, but useful enough to endure if I need to. E.g. on a non-Internet 4h flight the battery (local llm draws 60w) held long enough. REAP-supporting ds4 branch here:

https://github.com/ljubomirj/ds4/tree/reap-compact-support

DS4F dropping to unusable <10 tok/s only at 784K context (!!) makes a big difference.

New comment by ljosifov in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"

ljosifov — Tue, 09 Jun 2026 05:56:21 +0000

Yes, it's performant, and esp performant at non-trivial context depths. DeepSeek-V4 DS4 (and Flash - DS4F) drop tok/s speed much less than the rest. On my M2 Max it took context depths of 768K to drop tok/s to ~10 tok/s.

https://x.com/ljupc0/status/2062457314414587996

Other local models I've checked drop to unusable speeds way sooner. The only other model with a similarly favourable curve I've tried is nemotron-cascade-2-30b-a3b. But it's a small model, way dumber than DS4F.

Coding agent use cases have large context depths. The rate of decline is as important as the headline number.

New comment by ljosifov in "Quantum Information as Everything"

ljosifov — Mon, 08 Jun 2026 13:40:39 +0000

Thanks for the tear down. IDK anything about quantum (my knowledge there starts and ends with https://www.scottaaronson.com/democritus/lec9.html), but amused enough to follow in the background. See whether it ends crazy-bad or crazy-good. :-)

And just today now I saw this https://arxiv.org/abs/2606.07352. What's your take on it?

Am old enough to have witnessed more than one wave of "impossible things" happen for real in my lifetime, so let's see. As long as the scientific method (evidence, publication, replication, testing, etc) is mostly followed, am curious enough to check blog posts or interviews from time to time.

Quantum Information as Everything

ljosifov — Sun, 07 Jun 2026 19:51:32 +0000

Article URL: https://vlatkovedral.substack.com/p/quantum-information-as-everything

Comments URL: https://news.ycombinator.com/item?id=48437929

Points: 2

# Comments: 3

New comment by ljosifov in "Use boring languages with LLMs"

ljosifov — Wed, 27 May 2026 06:55:32 +0000

+1 for boring. Boring code is Solid Code, in the sense of "Writing Solid Code" - the old book by Steve Maguire.

New comment by ljosifov in "A few words on DS4"

ljosifov — Fri, 15 May 2026 20:57:15 +0000

Thanks for the DS4, will give it a try. Was hoping maybe I can re-quantise to shave a few GB... MiniMax-M2.7 Unsloth's UD-IQ2_XXS is down to 65GB - it runs, albeit too slow to be usable by an agent at context depth. I'm curious about DS4F being economical with the KV caches - whether that translates into keeping up with context. Was hoping 80GB 2-bit quants maybe come down to 70GB... that would be more comfortable to run.

New comment by ljosifov in "A few words on DS4"

ljosifov — Fri, 15 May 2026 14:27:03 +0000

On 96gb I can give up to about 88GB to the GPU with sysctl iogpu.wired_limit_mb=88000, without suffering any ill-effects. When pushed higher I tend to notice e.g. graphic driver errors, youtube web page not working, other semi-random glitches. So the ~80 GB of DS4-flash quants I could just about fit. Leaving some extra for the KV caches. Will try, I'm curious how's the DS4 degradation with context depth growth, how fast does tok/s drop. E.g. 2-bit lowest quant MiniMax-M2.6 runs, but starts low tok/s and degrades fast with context depth.
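The arithmetic behind that limit, as a minimal sketch. The 8 GB reserve for macOS and other apps is just what the 96gb/88000 numbers above imply, and the function names are made up for illustration; it only builds the command string, it does not run it.

```python
def wired_limit_mb(total_ram_gb: int, reserve_gb: int) -> int:
    """MB to hand to the GPU, leaving reserve_gb for macOS and other apps.

    Uses 1000 MB per GB, matching the 88GB -> 88000 value quoted above.
    """
    return (total_ram_gb - reserve_gb) * 1000


def kv_headroom_gb(limit_mb: int, model_gb: float) -> float:
    """GB left under the wired limit for KV caches once the weights are loaded."""
    return limit_mb / 1000 - model_gb


limit = wired_limit_mb(total_ram_gb=96, reserve_gb=8)
# Setting name and value format as used in the comment above.
print(f"sysctl iogpu.wired_limit_mb={limit}")
print(f"headroom for KV caches with ~80 GB of weights: {kv_headroom_gb(limit, 80):.0f} GB")
```

So an ~80 GB quant leaves roughly 8 GB under the limit, which is why it "just about" fits.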

The biggest models I can comfortably run are about 1/2 the DS4F size - like gpt-oss-120b. Lately was toying with Ling-2.6-flash. Got the agents to adapt existing metal kernels in llama.cpp, and it did run (model https://huggingface.co/ljupco/Ling-2.6-flash-GGUF, branch https://github.com/ljubomirj/llama.cpp/tree/LJ-Ling-2.6-flas...). It's 104B-A7B4, and for the M2 Max 7.4B active is about the most it can take while still producing 40 tok/s. And the hybrid arch allows for graceful degradation, still close to 30 tok/s at 64K context depth.

Too bad L2.6F, while the best I have, is not that much better in agentic benchmarks compared to my current incumbent local llm (nemotron-cascade-2). Got inspired by DS4 to start a l26f branch (WIP https://github.com/ljubomirj/l26f). :-) Trying to squeeze the most from L2.6F. There should be low hanging fruit in good integration of the agent and the inferencing engine. On input - considering the huge difference between cached vs. non-cached tokens. On output - considering that the NN gives us the complete logits set for the whole 200K+ token vocabulary.

New comment by ljosifov in "A few words on DS4"

ljosifov — Fri, 15 May 2026 09:21:28 +0000

Love this, even if I can't use it atm (not got the h/w - only 96gb on M2 Max). I get it that the general comp/public will find it unusable or worse. Reminds me of how home computers were - mere toys - before they became personal computers (PC). On my h/w the only passable combo for me atm is pi agent + llama.cpp + nemotron cascade-2 model: to 1M context, hybrid arch doesn't crash & burn 1/N^2 with context depths of 10K-50K-100K used by code agents. Was on a plane without Internet the other day. Brought a smile to my face that I could run pi agent (with llama.cpp serving), and it was just about usable at 40-30 tok/s. Afaik the usual API speeds are double that, 60-80 tok/s. Sensors showed 60W being used when running inference. So battery probably would not last more than 3h. Model only 30B in size leaves plenty of space for KV-caches, and other programs - even at generous 8-bit quant. Only 3B active params at one time (with MoE A3B) is about the most that the ageing M2 Max can carry it seems.

New comment by ljosifov in "As researchers age, they produce less disruptive work"

ljosifov — Wed, 13 May 2026 14:02:07 +0000

What we see and experience - it's all natural, it's the natural order. :-) When people claim something is un-natural, usually it's natural in that it occurs in nature, only - they themselves find it objectionable. It's something they personally dislike, and would rather it not happen.

New comment by ljosifov in "DeepSeek 4 Flash local inference engine for Metal"

ljosifov — Fri, 08 May 2026 10:43:08 +0000

In the same boat with 7900xtx. 24GB vram, on paper decent performance, in reality most things don't run. Only llama.cpp is consistent in that it can run most models, even if maybe not at top performance (afaik - lacking MTP, problems with cache invalidation with hybrid models). At least with llama.cpp I know what runs. With various python-based inferencers, between their uv/venv, my venv, system envs/pythons/libs yadayada - I need an agent to get to the bottom of what's actually running. :-) Yeah IK skill issue/user errors - but don't have seconds in the day left to spend them on that.

Even if not perfect, if you publish on GH or HF, some other agent can maybe start there and not from zero. I did this for Ling-2.6-flash (107B-A7B4 MoE) that's the biggest llm I can run for practical use on the other h/w I got for local llms (M2 Max). Even if MTP is not working well, it's still an improvement on the current llama.cpp that does not run Ling-2.6-flash at all. This - https://huggingface.co/inclusionAI/Ling-2.6-flash/discussion.... The 4-bit quants are at https://huggingface.co/ljupco/Ling-2.6-flash-GGUF, the branch is at https://github.com/ljubomirj/llama.cpp/tree/LJ-Ling-2.6-flas....

New comment by ljosifov in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"

ljosifov — Sun, 26 Apr 2026 16:05:24 +0000

~/llama.cpp$ build-.../bin/llama-batched-bench -m models/....gguf -npp 512,1024,2048,4096,8192,16384,32768 -ntg 128 -npl 1 -c 36000

On amd 7900xtx

Qwen3.6-27B-Q4_K_M

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.743 |   689.35 |    4.605 |    27.80 |    5.348 |   119.68 |
|  1024 |    128 |    1 |   1152 |    1.188 |   862.17 |    4.573 |    27.99 |    5.761 |   199.96 |
|  2048 |    128 |    1 |   2176 |    2.566 |   798.09 |    4.602 |    27.81 |    7.168 |   303.57 |
|  4096 |    128 |    1 |   4224 |    5.936 |   690.00 |    4.639 |    27.59 |   10.575 |   399.43 |
|  8192 |    128 |    1 |   8320 |   15.034 |   544.90 |    4.729 |    27.06 |   19.763 |   420.98 |
| 16384 |    128 |    1 |  16512 |   42.807 |   382.74 |    4.886 |    26.20 |   47.694 |   346.21 |
| 32768 |    128 |    1 |  32896 |  137.377 |   238.53 |    5.188 |    24.67 |  142.566 |   230.74 |

Qwen3.6-27B-IQ4_NL

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.535 |   957.45 |    3.715 |    34.45 |    4.250 |   150.59 |
|  1024 |    128 |    1 |   1152 |    1.124 |   911.16 |    3.677 |    34.81 |    4.801 |   239.97 |
|  2048 |    128 |    1 |   2176 |    2.447 |   836.89 |    3.698 |    34.62 |    6.145 |   354.13 |
|  4096 |    128 |    1 |   4224 |    5.711 |   717.17 |    3.729 |    34.32 |    9.441 |   447.43 |
|  8192 |    128 |    1 |   8320 |   14.615 |   560.52 |    3.821 |    33.50 |   18.436 |   451.30 |
| 16384 |    128 |    1 |  16512 |   41.966 |   390.41 |    3.967 |    32.26 |   45.933 |   359.48 |
| 32768 |    128 |    1 |  32896 |  135.789 |   241.32 |    4.253 |    30.09 |  140.042 |   234.90 |

On mbp M2 Max

Qwen3.6-27B-UD-Q8_K_XL

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    2.583 |   198.18 |   22.049 |     5.81 |   24.633 |    25.98 |
|  1024 |    128 |    1 |   1152 |    8.321 |   123.06 |   22.364 |     5.72 |   30.685 |    37.54 |
|  2048 |    128 |    1 |   2176 |   17.873 |   114.59 |   23.290 |     5.50 |   41.164 |    52.86 |
|  4096 |    128 |    1 |   4224 |   41.967 |    97.60 |   23.624 |     5.42 |   65.591 |    64.40 |
|  8192 |    128 |    1 |   8320 |   68.722 |   119.20 |   21.077 |     6.07 |   89.799 |    92.65 |
| 16384 |    128 |    1 |  16512 |  142.184 |   115.23 |   22.026 |     5.81 |  164.210 |   100.55 |
| 32768 |    128 |    1 |  32896 |  339.778 |    96.44 |   24.465 |     5.23 |  364.243 |    90.31 |

Compared to similar prior models

On amd 7900xtx

Qwen3.6-35B-A3B-UD-Q4_K_S

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.203 |  2517.60 |    1.482 |    86.35 |    1.686 |   379.67 |
|  1024 |    128 |    1 |   1152 |    0.427 |  2399.22 |    1.471 |    87.04 |    1.897 |   607.15 |
|  2048 |    128 |    1 |   2176 |    0.946 |  2165.23 |    1.478 |    86.59 |    2.424 |   897.67 |
|  4096 |    128 |    1 |   4224 |    2.253 |  1818.33 |    1.502 |    85.22 |    3.755 |  1125.01 |
|  8192 |    128 |    1 |   8320 |    5.849 |  1400.51 |    1.525 |    83.91 |    7.375 |  1128.17 |
| 16384 |    128 |    1 |  16512 |   17.115 |   957.27 |    1.589 |    80.55 |   18.705 |   882.78 |
| 32768 |    128 |    1 |  32896 |   56.008 |   585.06 |    1.704 |    75.10 |   57.712 |   570.00 |

Qwen3.6-35B-A3B-UD-IQ4_XS

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.204 |  2508.94 |    1.313 |    97.46 |    1.517 |   421.78 |
|  1024 |    128 |    1 |   1152 |    0.423 |  2418.64 |    1.296 |    98.80 |    1.719 |   670.18 |
|  2048 |    128 |    1 |   2176 |    0.946 |  2164.61 |    1.323 |    96.78 |    2.269 |   959.13 |
|  4096 |    128 |    1 |   4224 |    2.235 |  1832.54 |    1.326 |    96.52 |    3.561 |  1186.06 |
|  8192 |    128 |    1 |   8320 |    5.845 |  1401.44 |    1.352 |    94.70 |    7.197 |  1156.03 |
| 16384 |    128 |    1 |  16512 |   17.096 |   958.38 |    1.417 |    90.33 |   18.513 |   891.94 |
| 32768 |    128 |    1 |  32896 |   56.013 |   585.00 |    1.530 |    83.66 |   57.543 |   571.67 |

Carnice-Qwen3.6-MoE-35B-A3B-Q4_K_S

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.205 |  2499.78 |    1.483 |    86.31 |    1.688 |   379.16 |
|  1024 |    128 |    1 |   1152 |    0.434 |  2361.36 |    1.448 |    88.40 |    1.882 |   612.25 |
|  2048 |    128 |    1 |   2176 |    0.947 |  2161.87 |    1.478 |    86.62 |    2.425 |   897.27 |
|  4096 |    128 |    1 |   4224 |    2.259 |  1813.00 |    1.472 |    86.94 |    3.732 |  1131.98 |
|  8192 |    128 |    1 |   8320 |    5.892 |  1390.42 |    1.505 |    85.06 |    7.397 |  1124.85 |
| 16384 |    128 |    1 |  16512 |   17.397 |   941.77 |    1.568 |    81.61 |   18.965 |   870.63 |
| 32768 |    128 |    1 |  32896 |   56.296 |   582.07 |    1.690 |    75.74 |   57.986 |   567.31 |

Nemotron-Cascade-2-30B-A3B-IQ4_XS

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.195 |  2622.33 |    0.972 |   131.69 |    1.167 |   548.30 |
|  1024 |    128 |    1 |   1152 |    0.407 |  2514.76 |    0.934 |   137.10 |    1.341 |   859.16 |
|  2048 |    128 |    1 |   2176 |    0.854 |  2396.99 |    0.942 |   135.90 |    1.796 |  1211.42 |
|  4096 |    128 |    1 |   4224 |    1.895 |  2161.89 |    0.953 |   134.36 |    2.847 |  1483.50 |
|  8192 |    128 |    1 |   8320 |    4.593 |  1783.70 |    0.967 |   132.43 |    5.559 |  1496.60 |
| 16384 |    128 |    1 |  16512 |   12.213 |  1341.53 |    0.996 |   128.56 |   13.209 |  1250.10 |
| 32768 |    128 |    1 |  32896 |   36.998 |   885.66 |    1.059 |   120.89 |   38.057 |   864.39 |

On mbp M2 Max

Qwen3.6-35B-A3B-UD-Q6_K_XL

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.540 |   947.31 |    2.489 |    51.42 |    3.030 |   211.22 |
|  1024 |    128 |    1 |   1152 |    0.951 |  1077.21 |    3.237 |    39.54 |    4.188 |   275.10 |
|  2048 |    128 |    1 |   2176 |    2.994 |   684.10 |    3.139 |    40.77 |    6.133 |   354.80 |
|  4096 |    128 |    1 |   4224 |    6.245 |   655.85 |    3.210 |    39.88 |    9.455 |   446.75 |
|  8192 |    128 |    1 |   8320 |   12.411 |   660.08 |    3.284 |    38.98 |   15.694 |   530.13 |
| 16384 |    128 |    1 |  16512 |   28.321 |   578.51 |    3.584 |    35.71 |   31.905 |   517.53 |
| 32768 |    128 |    1 |  32896 |   65.725 |   498.56 |    4.029 |    31.77 |   69.754 |   471.60 |

Nemotron-Cascade-2-30B-A3B-Q8_0

|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.528 |   969.13 |    2.036 |    62.87 |    2.564 |   249.59 |
|  1024 |    128 |    1 |   1152 |    1.079 |   948.84 |    3.201 |    39.99 |    4.280 |   269.15 |
|  2048 |    128 |    1 |   2176 |    3.390 |   604.10 |    2.952 |    43.36 |    6.342 |   343.11 |
|  4096 |    128 |    1 |   4224 |    6.756 |   606.28 |    2.991 |    42.79 |    9.747 |   433.35 |
|  8192 |    128 |    1 |   8320 |   13.647 |   600.30 |    3.061 |    41.81 |   16.708 |   497.97 |
| 16384 |    128 |    1 |  16512 |   29.491 |   555.56 |    3.414 |    37.50 |   32.905 |   501.81 |
| 32768 |    128 |    1 |  32896 |   65.867 |   497.49 |    3.663 |    34.95 |   69.530 |   473.12 |

Dang I saw some lowish numbers there for Sparks (and Strix). As I was eyeing a spark to get some CUDA exposure... :-O
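A rough way to read those llama-batched-bench numbers, as a sketch only: how much of the generation speed (S_TG) survives going from N_KV=640 to N_KV=32896. The figures are copied from the results above, on the assumption that the result tables come in the same order as the model names; the dict and function names are made up for illustration.

```python
# (S_TG t/s at N_KV=640, S_TG t/s at N_KV=32896), copied from the benchmark output above.
# Assumption: the result tables are listed in the same order as the model names.
S_TG = {
    "7900xtx Qwen3.6-27B-Q4_K_M": (27.80, 24.67),
    "7900xtx Nemotron-Cascade-2-30B-A3B-IQ4_XS": (131.69, 120.89),
    "M2 Max Qwen3.6-35B-A3B-UD-Q6_K_XL": (51.42, 31.77),
    "M2 Max Nemotron-Cascade-2-30B-A3B-Q8_0": (62.87, 34.95),
}


def tg_retention(shallow: float, deep: float) -> float:
    """Fraction of token-generation speed kept at the deeper context."""
    return deep / shallow


def total_speed(n_kv: int, t_pp: float, t_tg: float) -> float:
    """The S t/s column: all tokens processed divided by total wall time."""
    return n_kv / (t_pp + t_tg)


for name, (shallow, deep) in S_TG.items():
    print(f"{name}: keeps {tg_retention(shallow, deep):.0%} of S_TG at 32K")

# First row of the first table: 640 tokens in 0.743 s + 4.605 s.
print(f"S t/s check: {total_speed(640, 0.743, 4.605):.2f}")
```

By this measure the 7900xtx runs keep around 90% of their generation speed at 32K context, while the two M2 Max MoE runs drop to roughly 55-60%.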



New comment by ljosifov in "Hammerspoon"
ljosifov — Sat, 14 Mar 2026 12:17:38 +0000

Glad to see other people using it. Saved my life, was going crazy click-clicking to nab the right window. Now Cmd-1..9 brings to focus a window of my chosen application (Chrome). In case it helps someone else, myself and Codex iterating over time https://github.com/ljubomirj/dotfiles/blob/main/.hammerspoon....
Cmd-1..9 switches focus to a particular window, Cmd-0 presents an (ugly; but suffices) dialog box to select the window with arrows (of the App of interest - Chrome for me atm) to switch to. But more important - to see which window name is recalled by the particular Cmd-1..9 shortcut. Option-arrows shuffle window-to-key ordering. I right-click-Name Window my windows. Thinking back now - on restart they may even be preserved?? Don't recall re/naming them manually recently. (possible I've forgotten though)



New comment by ljosifov in "How to run Qwen 3.5 locally"
ljosifov — Sun, 08 Mar 2026 11:38:12 +0000

Say more please if you can. How/why is ik_llama.cpp faster than mainline, for the 27B dense? I'd like to be able to run 27B dense faster on a 24GB vram gpu, and also on an M2 max.



New comment by ljosifov in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"
ljosifov — Wed, 11 Feb 2026 15:54:01 +0000

Everyone should do the calculation for themselves. I too pay for a couple of subs. But I'm noticing having an agent work for me 24/7 changes the calculation somewhat. Often not taken into account: the price of input tokens. To produce 1K of code for me, the agent may need to churn through 1M tokens of codebase. IDK if that will be cached by the API provider or not, but that makes a x5-7 price difference. OK discussion today about that and more https://x.com/alexocheema/status/2020626466522685499
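A minimal sketch of that calculation. The per-million-token prices are made-up placeholders, not any provider's real price list, so plug in your own; the function name is hypothetical too.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Cost in dollars, with prices given per 1M tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical prices per 1M tokens (assumptions for illustration only).
INPUT = 1.00         # uncached input
CACHED_INPUT = 0.20  # cached input
OUTPUT = 5.00        # output

# The case above: read 1M tokens of codebase to write 1K tokens of code.
uncached = request_cost(1_000_000, 1_000, INPUT, OUTPUT)
cached = request_cost(1_000_000, 1_000, CACHED_INPUT, OUTPUT)

print(f"uncached: ${uncached:.3f}  cached: ${cached:.3f}  ratio: x{uncached / cached:.1f}")
```

With these placeholder prices the output tokens are a rounding error; nearly all of the bill is input, and whether it hits the cache decides the total.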



How to Run Local LLMs with Claude Code and OpenAI Codex
ljosifov — Thu, 29 Jan 2026 21:38:08 +0000

Article URL: https://unsloth.ai/docs/basics/claude-codex
Comments URL: https://news.ycombinator.com/item?id=46816996
Points: 2
# Comments: 0



New comment by ljosifov in "Ask HN: Share your personal website"
ljosifov — Wed, 14 Jan 2026 19:48:34 +0000

https://ljubomirj.github.io small personal ~/public_html



When competition leads to human values by Beren Millidge [video]
ljosifov — Wed, 14 Jan 2026 07:16:35 +0000

Article URL: https://www.youtube.com/watch?v=ua67aXBP76k
Comments URL: https://news.ycombinator.com/item?id=46613277
Points: 1
# Comments: 0



New comment by ljosifov in "I rebooted my social life"
ljosifov — Thu, 01 Jan 2026 16:48:00 +0000

+1. For every one like the author of the blog post, there's likely to be another one in the opposite direction. But they will be unlikely to write a post about that. I too found weighing 'spend time with human persons v.s. with my own thoughts, or programming and writing, or reading a paper or a post, or listening to a podcast while walking in nature' lately comes down on the side away from humans. So far - it's been way more interesting. When/if that changes and becomes boring - will think what next and change.



A new way to extract detailed transcripts from Claude Code
ljosifov — Fri, 26 Dec 2025 10:36:00 +0000

Article URL: https://simonwillison.net/2025/Dec/25/claude-code-transcripts/
Comments URL: https://news.ycombinator.com/item?id=46390913
Points: 3
# Comments: 0