<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ljosifov</title><link>https://news.ycombinator.com/user?id=ljosifov</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 21 Jun 2026 13:47:44 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ljosifov" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ljosifov in "Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?"]]></title><description><![CDATA[
<p>Not replaced but supplemented. For off-line coding the current setup is pi + ds4-server + DeepSeek-V4-Flash REAP25 (on M2 Max 96gb). For simpler programming-related tasks (e.g. text2sql) as well as synthetic data generation, current best for me is llama.cpp + Gemma-4-26B-A4B (on gpu 7900xtx 24gb; sometimes nemotron-cascade-2-30b-a3b for 1M context). That and (dabbling now) auto-research use lots of tokens. Used to get paused for running out of token quota all the time. The 1st local model I found somewhat useful was glm-4.7-flash, and local models have gotten way better since. Recently, between the OpenCode Go choice of models at many price points and DeepSeek-V4 improving IQ/$$$ by multiples, I have become less reliant on local llms for this auxiliary work. Claude I use, but with a Zai GLM-5.2 subscription. And I maintain a GPT subscription for quality models.</p>
]]></description><pubDate>Tue, 16 Jun 2026 07:03:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48551616</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48551616</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48551616</guid></item><item><title><![CDATA[New comment by ljosifov in "How to setup a local coding agent on macOS"]]></title><description><![CDATA[
<p>For high RAM (unified) and relatively middling-to-lowish TFLOPS and bandwidth (GB/s), MoEs are usually the most promising. The current top-1 in the (iq, tok/s, @ context depth) ranks for me (M2 Max, 96gb) is DeepSeek-V4-Flash REAP25 <65gb gguf + ds4-server + pi agent. Not better than cloud API ofc, but useful enough to endure if I need to. E.g. on a 4h flight without Internet the battery (local llm draws 60w) held long enough. The REAP-supporting ds4 branch is here:<p><a href="https://github.com/ljubomirj/ds4/tree/reap-compact-support" rel="nofollow">https://github.com/ljubomirj/ds4/tree/reap-compact-support</a><p>DS4F dropping to an unusable <10 tok/s only at 784K context (!!) makes a big difference.</p>
]]></description><pubDate>Sat, 13 Jun 2026 09:58:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=48515485</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48515485</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48515485</guid></item><item><title><![CDATA[New comment by ljosifov in "MiMo-v2.5-Pro-UltraSpeed: 1T model with 1000 tokens per second"]]></title><description><![CDATA[
<p>Yes, it's performant, and esp performant at non-trivial context depths. DeepSeek-V4 DS4 (and Flash - DS4F) drop tok/s speed much less than the rest. On my M2 Max it took a context depth of 768K to drop the speed to ~10 tok/s.<p><a href="https://x.com/ljupc0/status/2062457314414587996" rel="nofollow">https://x.com/ljupc0/status/2062457314414587996</a><p>Other local models I've checked drop to unusable speeds way sooner. The only other model I've tried with a similarly favourable curve is nemotron-cascade-2-30b-a3b. But it's a small model, way dumber than DS4F.<p>Coding-agent use cases have large context depths. The rate of decline is as important as the headline number.</p>
]]></description><pubDate>Tue, 09 Jun 2026 05:56:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48457071</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48457071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48457071</guid></item><item><title><![CDATA[New comment by ljosifov in "Quantum Information as Everything"]]></title><description><![CDATA[
<p>Thanks for the teardown. IDK anything about quantum (my knowledge there starts and ends with <a href="https://www.scottaaronson.com/democritus/lec9.html" rel="nofollow">https://www.scottaaronson.com/democritus/lec9.html</a>), but I'm amused enough to follow in the background. See whether it ends crazy-bad or crazy-good. :-)<p>And just today I saw this <a href="https://arxiv.org/abs/2606.07352" rel="nofollow">https://arxiv.org/abs/2606.07352</a>. What's your take on it?<p>Am old enough to have witnessed more than one wave of "impossible things" happen for real in my lifetime, so let's see. As long as the scientific method (evidence, publication, replication, testing, etc) is mostly followed, am curious enough to check blog posts or interviews from time to time.</p>
]]></description><pubDate>Mon, 08 Jun 2026 13:40:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48445228</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48445228</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48445228</guid></item><item><title><![CDATA[Quantum Information as Everything]]></title><description><![CDATA[
<p>Article URL: <a href="https://vlatkovedral.substack.com/p/quantum-information-as-everything">https://vlatkovedral.substack.com/p/quantum-information-as-everything</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48437929">https://news.ycombinator.com/item?id=48437929</a></p>
<p>Points: 2</p>
<p># Comments: 3</p>
]]></description><pubDate>Sun, 07 Jun 2026 19:51:32 +0000</pubDate><link>https://vlatkovedral.substack.com/p/quantum-information-as-everything</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48437929</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48437929</guid></item><item><title><![CDATA[New comment by ljosifov in "Use boring languages with LLMs"]]></title><description><![CDATA[
<p>+1 for boring. Boring code is Solid Code, in the sense of "Writing Solid Code" - the old book by Steve Maguire.</p>
]]></description><pubDate>Wed, 27 May 2026 06:55:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48290651</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48290651</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48290651</guid></item><item><title><![CDATA[New comment by ljosifov in "A few words on DS4"]]></title><description><![CDATA[
<p>Thanks for the DS4, will give it a try. Was hoping maybe I can re-quantise to shave a few GB... MiniMax-M2.7 Unsloth's UD-IQ2_XXS is down to 65GB - it runs, albeit too slow to be usable by an agent at context depth. I'm curious about DS4F being economical with the KV caches - whether that translates into keeping up speed with context. Was hoping 80GB 2-bit quants maybe come down to 70GB... that would be more comfortable to run.</p>
]]></description><pubDate>Fri, 15 May 2026 20:57:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48153705</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48153705</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48153705</guid></item><item><title><![CDATA[New comment by ljosifov in "A few words on DS4"]]></title><description><![CDATA[
<p>On 96gb I can give up to about 88GB to the GPU with sysctl iogpu.wired_limit_mb=88000, without suffering any ill effects. When pushed higher I tend to notice e.g. graphics driver errors, youtube web page not working, other semi-random glitches. So the ~80 GB of DS4-flash quants I could just about fit, leaving some extra for the KV caches. Will try, I'm curious how DS4 degrades as context depth grows, how fast tok/s drops. E.g. the lowest 2-bit quant of MiniMax-M2.6 runs, but starts at low tok/s and degrades fast with context depth.<p>The biggest models I can comfortably run are about 1/2 the DS4F size - like gpt-oss-120b. Lately was toying with Ling-2.6-flash. Got the agents to adapt existing Metal kernels in llama.cpp, and it did run (model <a href="https://huggingface.co/ljupco/Ling-2.6-flash-GGUF" rel="nofollow">https://huggingface.co/ljupco/Ling-2.6-flash-GGUF</a>, branch <a href="https://github.com/ljubomirj/llama.cpp/tree/LJ-Ling-2.6-flash-r2" rel="nofollow">https://github.com/ljubomirj/llama.cpp/tree/LJ-Ling-2.6-flas...</a>). It's 104B-A7B4, and for the M2 Max 7.4B active is about the most it can take while still producing 40 tok/s. And the hybrid arch allows for graceful degradation, still close to 30 tok/s at 64K context depth.<p>Too bad L2.6F, while the best I have, is not that much better in agentic benchmarks than my current incumbent local llm (nemotron-cascade-2). Got inspired by DS4 to start an l26f branch (WIP <a href="https://github.com/ljubomirj/l26f" rel="nofollow">https://github.com/ljubomirj/l26f</a>). :-) Trying to squeeze the most from L2.6F. There should be low-hanging fruit in good integration of the agent and the inferencing engine. On input - considering the huge difference between cached vs. non-cached tokens. On output - considering that the NN gives us the complete logits set for the whole 200K+ token vocabulary.</p>
]]></description><pubDate>Fri, 15 May 2026 14:27:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=48149067</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48149067</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48149067</guid></item><item><title><![CDATA[New comment by ljosifov in "A few words on DS4"]]></title><description><![CDATA[
<p>Love this, even if I can't use it atm (not got the h/w - only 96gb on M2 Max). I get it that the general comp/public will find it unusable or worse. Reminds me of how home computers were - mere toys - before they became personal computers (PC). On my h/w the only passable combo for me atm is pi agent + llama.cpp + nemotron cascade-2 model: up to 1M context, and the hybrid arch doesn't crash & burn 1/N^2 at the context depths of 10K-50K-100K used by code agents. Was on a plane without Internet the other day. Brought a smile to my face that I could run pi agent (with llama.cpp serving), and it was just about usable at 40-30 tok/s. Afaik the usual API speeds are double that, 60-80 tok/s. Sensors showed 60W draw when running inference, so the battery probably would not last more than 3h. The model being only 30B in size leaves plenty of space for KV-caches and other programs - even at a generous 8-bit quant. Only 3B active params at one time (with MoE A3B) is about the most that the ageing M2 Max can carry, it seems.</p>
]]></description><pubDate>Fri, 15 May 2026 09:21:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=48146381</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48146381</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48146381</guid></item><item><title><![CDATA[New comment by ljosifov in "As researchers age, they produce less disruptive work"]]></title><description><![CDATA[
<p>What we see and experience - it's all natural, it's the natural order. :-) When people claim something is un-natural, usually it's natural in that it occurs in nature - only they themselves find it objectionable. It's something they personally dislike, and would rather it not happen.</p>
]]></description><pubDate>Wed, 13 May 2026 14:02:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48122022</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48122022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48122022</guid></item><item><title><![CDATA[New comment by ljosifov in "DeepSeek 4 Flash local inference engine for Metal"]]></title><description><![CDATA[
<p>In the same boat with 7900xtx. 24GB vram, on paper decent performance, in reality most things don't run. Only llama.cpp is consistent in that it can run most models, even if maybe not at top performance (afaik - lacking MTP, problems with cache invalidation for hybrid models). At least with llama.cpp I know what runs. With various python-based inferencers, between their uv/venv, my venv, system envs/pythons/libs yadayada - I need an agent to get to the bottom of what's actually running. :-)  Yeah IK skill issue/user errors - but I don't have seconds left in the day to spend on that.<p>Even if not perfect, if you publish on GH or HF, some other agent can maybe start there and not from zero. I did this for Ling-2.6-flash (107B-A7B4 MoE), which is the biggest llm I can run for practical use on the other h/w I've got for local llms (M2 Max). Even if MTP is not working well, it's still an improvement on the current llama.cpp, which does not run Ling-2.6-flash at all. This - <a href="https://huggingface.co/inclusionAI/Ling-2.6-flash/discussions/8" rel="nofollow">https://huggingface.co/inclusionAI/Ling-2.6-flash/discussion...</a>. The 4-bit quants are at <a href="https://huggingface.co/ljupco/Ling-2.6-flash-GGUF" rel="nofollow">https://huggingface.co/ljupco/Ling-2.6-flash-GGUF</a>, the branch is at <a href="https://github.com/ljubomirj/llama.cpp/tree/LJ-Ling-2.6-flash-r2" rel="nofollow">https://github.com/ljubomirj/llama.cpp/tree/LJ-Ling-2.6-flas...</a>.</p>
]]></description><pubDate>Fri, 08 May 2026 10:43:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=48061265</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=48061265</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48061265</guid></item><item><title><![CDATA[New comment by ljosifov in "Qwen3.6-27B: Flagship-Level Coding in a 27B Dense Model"]]></title><description><![CDATA[
<p>~/llama.cpp$ build-.../bin/llama-batched-bench -m models/....gguf -npp 512,1024,2048,4096,8192,16384,32768 -ntg 128 -npl 1 -c 36000<p><pre><code>  On amd 7900xtx

  Qwen3.6-27B-Q4_K_M
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.743 |   689.35 |    4.605 |    27.80 |    5.348 |   119.68 |
  |  1024 |    128 |    1 |   1152 |    1.188 |   862.17 |    4.573 |    27.99 |    5.761 |   199.96 |
  |  2048 |    128 |    1 |   2176 |    2.566 |   798.09 |    4.602 |    27.81 |    7.168 |   303.57 |
  |  4096 |    128 |    1 |   4224 |    5.936 |   690.00 |    4.639 |    27.59 |   10.575 |   399.43 |
  |  8192 |    128 |    1 |   8320 |   15.034 |   544.90 |    4.729 |    27.06 |   19.763 |   420.98 |
  | 16384 |    128 |    1 |  16512 |   42.807 |   382.74 |    4.886 |    26.20 |   47.694 |   346.21 |
  | 32768 |    128 |    1 |  32896 |  137.377 |   238.53 |    5.188 |    24.67 |  142.566 |   230.74 |

  Qwen3.6-27B-IQ4_NL
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.535 |   957.45 |    3.715 |    34.45 |    4.250 |   150.59 |
  |  1024 |    128 |    1 |   1152 |    1.124 |   911.16 |    3.677 |    34.81 |    4.801 |   239.97 |
  |  2048 |    128 |    1 |   2176 |    2.447 |   836.89 |    3.698 |    34.62 |    6.145 |   354.13 |
  |  4096 |    128 |    1 |   4224 |    5.711 |   717.17 |    3.729 |    34.32 |    9.441 |   447.43 |
  |  8192 |    128 |    1 |   8320 |   14.615 |   560.52 |    3.821 |    33.50 |   18.436 |   451.30 |
  | 16384 |    128 |    1 |  16512 |   41.966 |   390.41 |    3.967 |    32.26 |   45.933 |   359.48 |
  | 32768 |    128 |    1 |  32896 |  135.789 |   241.32 |    4.253 |    30.09 |  140.042 |   234.90 |

  On mbp M2 Max

  Qwen3.6-27B-UD-Q8_K_XL
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    2.583 |   198.18 |   22.049 |     5.81 |   24.633 |    25.98 |
  |  1024 |    128 |    1 |   1152 |    8.321 |   123.06 |   22.364 |     5.72 |   30.685 |    37.54 |
  |  2048 |    128 |    1 |   2176 |   17.873 |   114.59 |   23.290 |     5.50 |   41.164 |    52.86 |
  |  4096 |    128 |    1 |   4224 |   41.967 |    97.60 |   23.624 |     5.42 |   65.591 |    64.40 |
  |  8192 |    128 |    1 |   8320 |   68.722 |   119.20 |   21.077 |     6.07 |   89.799 |    92.65 |
  | 16384 |    128 |    1 |  16512 |  142.184 |   115.23 |   22.026 |     5.81 |  164.210 |   100.55 |
  | 32768 |    128 |    1 |  32896 |  339.778 |    96.44 |   24.465 |     5.23 |  364.243 |    90.31 |

  Compared to similar prior models

  On amd 7900xtx

  Qwen3.6-35B-A3B-UD-Q4_K_S
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.203 |  2517.60 |    1.482 |    86.35 |    1.686 |   379.67 |
  |  1024 |    128 |    1 |   1152 |    0.427 |  2399.22 |    1.471 |    87.04 |    1.897 |   607.15 |
  |  2048 |    128 |    1 |   2176 |    0.946 |  2165.23 |    1.478 |    86.59 |    2.424 |   897.67 |
  |  4096 |    128 |    1 |   4224 |    2.253 |  1818.33 |    1.502 |    85.22 |    3.755 |  1125.01 |
  |  8192 |    128 |    1 |   8320 |    5.849 |  1400.51 |    1.525 |    83.91 |    7.375 |  1128.17 |
  | 16384 |    128 |    1 |  16512 |   17.115 |   957.27 |    1.589 |    80.55 |   18.705 |   882.78 |
  | 32768 |    128 |    1 |  32896 |   56.008 |   585.06 |    1.704 |    75.10 |   57.712 |   570.00 |

  Qwen3.6-35B-A3B-UD-IQ4_XS
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.204 |  2508.94 |    1.313 |    97.46 |    1.517 |   421.78 |
  |  1024 |    128 |    1 |   1152 |    0.423 |  2418.64 |    1.296 |    98.80 |    1.719 |   670.18 |
  |  2048 |    128 |    1 |   2176 |    0.946 |  2164.61 |    1.323 |    96.78 |    2.269 |   959.13 |
  |  4096 |    128 |    1 |   4224 |    2.235 |  1832.54 |    1.326 |    96.52 |    3.561 |  1186.06 |
  |  8192 |    128 |    1 |   8320 |    5.845 |  1401.44 |    1.352 |    94.70 |    7.197 |  1156.03 |
  | 16384 |    128 |    1 |  16512 |   17.096 |   958.38 |    1.417 |    90.33 |   18.513 |   891.94 |
  | 32768 |    128 |    1 |  32896 |   56.013 |   585.00 |    1.530 |    83.66 |   57.543 |   571.67 |

  Carnice-Qwen3.6-MoE-35B-A3B-Q4_K_S
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.205 |  2499.78 |    1.483 |    86.31 |    1.688 |   379.16 |
  |  1024 |    128 |    1 |   1152 |    0.434 |  2361.36 |    1.448 |    88.40 |    1.882 |   612.25 |
  |  2048 |    128 |    1 |   2176 |    0.947 |  2161.87 |    1.478 |    86.62 |    2.425 |   897.27 |
  |  4096 |    128 |    1 |   4224 |    2.259 |  1813.00 |    1.472 |    86.94 |    3.732 |  1131.98 |
  |  8192 |    128 |    1 |   8320 |    5.892 |  1390.42 |    1.505 |    85.06 |    7.397 |  1124.85 |
  | 16384 |    128 |    1 |  16512 |   17.397 |   941.77 |    1.568 |    81.61 |   18.965 |   870.63 |
  | 32768 |    128 |    1 |  32896 |   56.296 |   582.07 |    1.690 |    75.74 |   57.986 |   567.31 |

  Nemotron-Cascade-2-30B-A3B-IQ4_XS
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.195 |  2622.33 |    0.972 |   131.69 |    1.167 |   548.30 |
  |  1024 |    128 |    1 |   1152 |    0.407 |  2514.76 |    0.934 |   137.10 |    1.341 |   859.16 |
  |  2048 |    128 |    1 |   2176 |    0.854 |  2396.99 |    0.942 |   135.90 |    1.796 |  1211.42 |
  |  4096 |    128 |    1 |   4224 |    1.895 |  2161.89 |    0.953 |   134.36 |    2.847 |  1483.50 |
  |  8192 |    128 |    1 |   8320 |    4.593 |  1783.70 |    0.967 |   132.43 |    5.559 |  1496.60 |
  | 16384 |    128 |    1 |  16512 |   12.213 |  1341.53 |    0.996 |   128.56 |   13.209 |  1250.10 |
  | 32768 |    128 |    1 |  32896 |   36.998 |   885.66 |    1.059 |   120.89 |   38.057 |   864.39 |

  On mbp M2 Max

  Qwen3.6-35B-A3B-UD-Q6_K_XL
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.540 |   947.31 |    2.489 |    51.42 |    3.030 |   211.22 |
  |  1024 |    128 |    1 |   1152 |    0.951 |  1077.21 |    3.237 |    39.54 |    4.188 |   275.10 |
  |  2048 |    128 |    1 |   2176 |    2.994 |   684.10 |    3.139 |    40.77 |    6.133 |   354.80 |
  |  4096 |    128 |    1 |   4224 |    6.245 |   655.85 |    3.210 |    39.88 |    9.455 |   446.75 |
  |  8192 |    128 |    1 |   8320 |   12.411 |   660.08 |    3.284 |    38.98 |   15.694 |   530.13 |
  | 16384 |    128 |    1 |  16512 |   28.321 |   578.51 |    3.584 |    35.71 |   31.905 |   517.53 |
  | 32768 |    128 |    1 |  32896 |   65.725 |   498.56 |    4.029 |    31.77 |   69.754 |   471.60 |

  Nemotron-Cascade-2-30B-A3B-Q8_0
  |    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
  |-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
  |   512 |    128 |    1 |    640 |    0.528 |   969.13 |    2.036 |    62.87 |    2.564 |   249.59 |
  |  1024 |    128 |    1 |   1152 |    1.079 |   948.84 |    3.201 |    39.99 |    4.280 |   269.15 |
  |  2048 |    128 |    1 |   2176 |    3.390 |   604.10 |    2.952 |    43.36 |    6.342 |   343.11 |
  |  4096 |    128 |    1 |   4224 |    6.756 |   606.28 |    2.991 |    42.79 |    9.747 |   433.35 |
  |  8192 |    128 |    1 |   8320 |   13.647 |   600.30 |    3.061 |    41.81 |   16.708 |   497.97 |
  | 16384 |    128 |    1 |  16512 |   29.491 |   555.56 |    3.414 |    37.50 |   32.905 |   501.81 |
  | 32768 |    128 |    1 |  32896 |   65.867 |   497.49 |    3.663 |    34.95 |   69.530 |   473.12 |
</code></pre>
Dang I saw some lowish numbers there for Sparks (and Strix). As I was eyeing a Spark to get some CUDA exposure... :-O</p>
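<p>For comparing runs like the ones above, a minimal sketch (not part of llama.cpp) that parses the pipe-delimited rows printed by llama-batched-bench and reports how much generation speed survives at the deepest prompt. The helper names parse_rows and tg_retention are made up for illustration; the two sample rows are copied from the Qwen3.6-27B-Q4_K_M table on the 7900xtx.</p>

```python
def parse_rows(table: str) -> list[dict[str, float]]:
    """Parse a llama-batched-bench result table into one dict per data row."""
    lines = [ln.strip() for ln in table.strip().splitlines() if ln.strip()]
    header = [cell.strip() for cell in lines[0].strip("|").split("|")]
    rows = []
    for ln in lines[2:]:  # skip the header row and the |---| separator row
        cells = [float(cell) for cell in ln.strip("|").split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def tg_retention(rows: list[dict[str, float]]) -> float:
    """Generation speed at the deepest prompt as a fraction of the shallowest."""
    ordered = sorted(rows, key=lambda r: r["PP"])
    return ordered[-1]["S_TG t/s"] / ordered[0]["S_TG t/s"]


# First and last rows of the Qwen3.6-27B-Q4_K_M run on the 7900xtx (from the table above).
QWEN_27B_Q4_K_M = """
|    PP |     TG |    B |   N_KV |   T_PP s | S_PP t/s |   T_TG s | S_TG t/s |      T s |    S t/s |
|-------|--------|------|--------|----------|----------|----------|----------|----------|----------|
|   512 |    128 |    1 |    640 |    0.743 |   689.35 |    4.605 |    27.80 |    5.348 |   119.68 |
| 32768 |    128 |    1 |  32896 |  137.377 |   238.53 |    5.188 |    24.67 |  142.566 |   230.74 |
"""

if __name__ == "__main__":
    rows = parse_rows(QWEN_27B_Q4_K_M)
    print(f"TG speed retained at PP=32768 vs PP=512: {tg_retention(rows):.1%}")
```

<p>Pasting a full table instead of two rows works the same way; the ratio is just one crude way to put a number on the shape of the curve.</p>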
]]></description><pubDate>Sun, 26 Apr 2026 16:05:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47911366</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=47911366</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47911366</guid></item><item><title><![CDATA[New comment by ljosifov in "Hammerspoon"]]></title><description><![CDATA[
<p>Glad to see other people using it. Saved my life, was going crazy click-clicking to nab the right window. Now Cmd-1..9 brings into focus a window of my chosen application (Chrome). In case it helps someone else, here is what myself and Codex have iterated on over time: <a href="https://github.com/ljubomirj/dotfiles/blob/main/.hammerspoon/init.lua" rel="nofollow">https://github.com/ljubomirj/dotfiles/blob/main/.hammerspoon...</a>.
Cmd-1..9 switches focus to a particular window, Cmd-0 presents an (ugly, but it suffices) dialog box to select with arrows the window (of the App of interest - Chrome for me atm) to switch to. But more important - it lets me see which window name is recalled by each Cmd-1..9 shortcut. Option-arrows shuffle the window-to-key ordering. I right-click-Name Window my windows. Thinking back now - on restart they may even be preserved?? Don't recall re/naming them manually recently. (possible I've forgotten though)</p>
]]></description><pubDate>Sat, 14 Mar 2026 12:17:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47375912</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=47375912</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47375912</guid></item><item><title><![CDATA[New comment by ljosifov in "How to run Qwen 3.5 locally"]]></title><description><![CDATA[
<p>Say more please if you can. How/why is ik_llama.cpp faster than mainline, for the 27B dense? I'd like to be able to run 27B dense faster on a 24GB vram gpu, and also on an M2 Max.</p>
]]></description><pubDate>Sun, 08 Mar 2026 11:38:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47296534</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=47296534</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47296534</guid></item><item><title><![CDATA[New comment by ljosifov in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>Everyone should do the calculation for themselves. I too pay for a couple of subs. But I'm noticing that having an agent work for me 24/7 changes the calculation somewhat. Often not taken into account: the price of input tokens. To produce 1K of code for me, the agent may need to churn through 1M tokens of codebase. IDK if that will be cached by the API provider or not, but that makes a 5-7x price difference. OK discussion today about that and more <a href="https://x.com/alexocheema/status/2020626466522685499" rel="nofollow">https://x.com/alexocheema/status/2020626466522685499</a></p>
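<p>A rough sketch of that calculation, with all prices made up for illustration (they are not any provider's real rates): a job that reads 1M input tokens to write 1K output tokens, priced once with no cache hits and once fully cached. The function name job_cost and its parameters are hypothetical.</p>

```python
def job_cost(
    input_tokens: int,
    output_tokens: int,
    in_price: float,         # $ per 1M fresh (non-cached) input tokens, hypothetical
    cached_in_price: float,  # $ per 1M cached input tokens, hypothetical
    out_price: float,        # $ per 1M output tokens, hypothetical
    cached_fraction: float,  # share of input tokens served from the provider cache
) -> float:
    """Dollar cost of one agent job under a simple three-rate price model."""
    cached = input_tokens * cached_fraction
    fresh = input_tokens - cached
    total = fresh * in_price + cached * cached_in_price + output_tokens * out_price
    return total / 1_000_000


if __name__ == "__main__":
    # Assumed prices: $1.00/M input, $0.20/M cached input, $4.00/M output.
    uncached = job_cost(1_000_000, 1_000, 1.00, 0.20, 4.00, cached_fraction=0.0)
    cached = job_cost(1_000_000, 1_000, 1.00, 0.20, 4.00, cached_fraction=1.0)
    print(f"no cache: ${uncached:.3f}  fully cached: ${cached:.3f}")
    print(f"ratio: {uncached / cached:.1f}x")
```

<p>With this shape of job the output tokens barely register; almost all of the bill is input, so whether the cache hits decides the price.</p>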
]]></description><pubDate>Wed, 11 Feb 2026 15:54:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46976494</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=46976494</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46976494</guid></item><item><title><![CDATA[How to Run Local LLMs with Claude Code and OpenAI Codex]]></title><description><![CDATA[
<p>Article URL: <a href="https://unsloth.ai/docs/basics/claude-codex">https://unsloth.ai/docs/basics/claude-codex</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46816996">https://news.ycombinator.com/item?id=46816996</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 29 Jan 2026 21:38:08 +0000</pubDate><link>https://unsloth.ai/docs/basics/claude-codex</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=46816996</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46816996</guid></item><item><title><![CDATA[New comment by ljosifov in "Ask HN: Share your personal website"]]></title><description><![CDATA[
<p><a href="https://ljubomirj.github.io" rel="nofollow">https://ljubomirj.github.io</a> small personal ~/public_html</p>
]]></description><pubDate>Wed, 14 Jan 2026 19:48:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=46621849</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=46621849</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46621849</guid></item><item><title><![CDATA[When competition leads to human values by Beren Millidge [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=ua67aXBP76k">https://www.youtube.com/watch?v=ua67aXBP76k</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46613277">https://news.ycombinator.com/item?id=46613277</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 14 Jan 2026 07:16:35 +0000</pubDate><link>https://www.youtube.com/watch?v=ua67aXBP76k</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=46613277</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46613277</guid></item><item><title><![CDATA[New comment by ljosifov in "I rebooted my social life"]]></title><description><![CDATA[
<p>+1. For every one like the author of the blog post, there's likely to be another one in the opposite direction. But they will be unlikely to write a post about that.  I too found that weighing 'spend time with human persons vs. with my own thoughts, or programming and writing, or reading a paper or a post, or listening to a podcast while walking in nature' has lately come down on the side away from humans. So far - it's been way more interesting. When/if that changes and becomes boring - will think what next and change.</p>
]]></description><pubDate>Thu, 01 Jan 2026 16:48:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46455534</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=46455534</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46455534</guid></item><item><title><![CDATA[A new way to extract detailed transcripts from Claude Code]]></title><description><![CDATA[
<p>Article URL: <a href="https://simonwillison.net/2025/Dec/25/claude-code-transcripts/">https://simonwillison.net/2025/Dec/25/claude-code-transcripts/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46390913">https://news.ycombinator.com/item?id=46390913</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 26 Dec 2025 10:36:00 +0000</pubDate><link>https://simonwillison.net/2025/Dec/25/claude-code-transcripts/</link><dc:creator>ljosifov</dc:creator><comments>https://news.ycombinator.com/item?id=46390913</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46390913</guid></item></channel></rss>