<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: brucehoult</title><link>https://news.ycombinator.com/user?id=brucehoult</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 03:29:35 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=brucehoult" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by brucehoult in "The MilkV Jupiter 2/SpacemiT K3 (RISC-V vector compute)"]]></title><description><![CDATA[
<p>I don't know what kind of code sysbench is using, but I get far better results with a very simple `memcpy()` loop:<p>See <a href="https://news.ycombinator.com/item?id=48523343">https://news.ycombinator.com/item?id=48523343</a></p>
]]></description><pubDate>Sun, 14 Jun 2026 01:42:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=48523372</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48523372</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48523372</guid></item><item><title><![CDATA[New comment by brucehoult in "The MilkV Jupiter 2/SpacemiT K3 (RISC-V vector compute)"]]></title><description><![CDATA[
<p>I don't know how they got their 3 GB/s memory bandwidth.<p>My own testing shows 5347.7 MB/s on a 64 MiB to 64 MiB `memcpy()` using a basic 7-instruction RVV copy loop on an X100 core. That's a total of 10.7 GB/s memory bandwidth (read plus write).<p>The A100 "AI" cores do better, with 13225.9 MB/s on the 64 MiB to 64 MiB copy, for a total of 26.5 GB/s memory bandwidth.<p>Both core types do a 25 GB/s `memcpy()` (50 GB/s total) in cache.<p>On X100 cores:<p><pre><code>    bruce@k3:~$ ./test_memcpy 
    Byte size :              ns     Speed
            0 :             6.3       0.0 MB/s
            1 :             6.5     147.6 MB/s
            2 :             6.5     295.7 MB/s
            4 :             6.3     602.7 MB/s
            8 :             6.4    1193.6 MB/s
           16 :             6.4    2402.1 MB/s
           32 :             6.4    4796.1 MB/s
           64 :             7.1    8558.1 MB/s
          128 :             7.1   17313.7 MB/s
          256 :            12.6   19444.2 MB/s
          512 :            20.8   23424.8 MB/s
         1024 :            39.8   24563.3 MB/s
         2048 :            80.4   24284.2 MB/s
         4096 :           158.0   24722.1 MB/s
         8192 :           312.5   24997.6 MB/s
        16384 :           609.6   25630.4 MB/s
        32768 :          1287.0   24281.6 MB/s
        65536 :          2761.8   22630.4 MB/s
       131072 :          6463.0   19340.9 MB/s
       262144 :         12897.6   19383.5 MB/s
       524288 :         25779.1   19395.6 MB/s
      1048576 :         52356.4   19099.9 MB/s
      2097152 :        111030.3   18013.1 MB/s
      4194304 :        569240.2    7026.9 MB/s
      8388608 :       1468409.2    5448.1 MB/s
     16777216 :       2905474.6    5506.8 MB/s
     33554432 :       5769324.2    5546.6 MB/s
     67108864 :      11967851.6    5347.7 MB/s
</code></pre>
And on A100:<p><pre><code>    bruce@k3:~$ ai ./test_memcpy 
    Byte size :              ns     Speed
            0 :            21.0       0.0 MB/s
            1 :            82.7      11.5 MB/s
            2 :            82.9      23.0 MB/s
            4 :            82.9      46.0 MB/s
            8 :            82.8      92.2 MB/s
           16 :            82.9     184.2 MB/s
           32 :            82.9     368.2 MB/s
           64 :            87.2     699.7 MB/s
          128 :            87.1    1401.7 MB/s
          256 :            87.2    2799.1 MB/s
          512 :            77.2    6326.1 MB/s
         1024 :            82.9   11784.2 MB/s
         2048 :            98.4   19855.9 MB/s
         4096 :           193.5   20191.4 MB/s
         8192 :           313.5   24916.8 MB/s
        16384 :           627.0   24919.0 MB/s
        32768 :          1254.2   24915.7 MB/s
        65536 :          2508.0   24920.1 MB/s
       131072 :          5017.3   24913.6 MB/s
       262144 :         10036.5   24909.0 MB/s
       524288 :         20075.0   24906.6 MB/s
      1048576 :         62556.9   15985.4 MB/s
      2097152 :        152324.5   13129.9 MB/s
      4194304 :        303466.3   13181.0 MB/s
      8388608 :        610230.0   13109.8 MB/s
     16777216 :       1186394.5   13486.2 MB/s
     33554432 :       2317591.8   13807.4 MB/s
     67108864 :       4838988.3   13225.9 MB/s
</code></pre>
That's using the following `memcpy()` in both cases.<p><pre><code>    .globl memcpy
    memcpy:
            mv      a3, a0
    0:      vsetvli a4, a2, e8, m4, ta, ma
            vle8.v  v0, (a1)
            sub     a2, a2, a4
            add     a1, a1, a4
            vse8.v  v0, (a3)
            add     a3, a3, a4
            bnez    a2, 0b
            ret</code></pre></p>
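<p>For anyone checking the arithmetic: the speed column appears to be bytes per second divided by 2^20 (so MiB/s, labelled MB/s), and the "total" figures count every copied byte twice, once read and once written. A minimal Python sketch using the last row of each table above. The function names are made up for illustration and are not part of `test_memcpy`, and the 2^20 interpretation is inferred from the numbers rather than from the benchmark source:</p>
<pre><code>
```python
MIB = 1 << 20  # the table's "MB" matches 2**20 bytes (inferred from the numbers)

def copy_rate(size_bytes, time_ns):
    """Copy rate for one memcpy of size_bytes taking time_ns, in the table's units."""
    return (size_bytes / MIB) / (time_ns * 1e-9)

def total_traffic_gb_s(rate):
    """Memory traffic: every copied byte is read once and written once."""
    return 2 * rate / 1000

x100 = copy_rate(67108864, 11967851.6)  # last X100 row
a100 = copy_rate(67108864, 4838988.3)   # last A100 row
print(f"X100 {x100:.1f} MB/s copy, {total_traffic_gb_s(x100):.1f} GB/s total")
print(f"A100 {a100:.1f} MB/s copy, {total_traffic_gb_s(a100):.1f} GB/s total")
```
</code></pre>
<p>Feeding in any other row's byte size and time should reproduce that row's speed to within rounding of the printed nanoseconds.</p>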
]]></description><pubDate>Sun, 14 Jun 2026 01:37:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=48523343</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48523343</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48523343</guid></item><item><title><![CDATA[New comment by brucehoult in "The MilkV Jupiter 2/SpacemiT K3 (RISC-V vector compute)"]]></title><description><![CDATA[
<p>See:<p><a href="https://github.com/brucehoult/k3_ai" rel="nofollow">https://github.com/brucehoult/k3_ai</a><p>Or my longer top level comment.<p>Unlike a GPU or NPU, you can just run all your normal RISC-V Linux programs on the AI cores. Bash, gcc, emacs, nodejs ... whatever you want. It's an extra 40% of scalar processing power, for free.<p><pre><code>    Build Linux kernel 7503345ac5f5
    
    14m25.56s SpacemiT K3 8 X100 cores plus 8x A100 cores distcc
    18m3.871s SpacemiT K3 8 X100 cores @2.4 GHz
    19m12.787s i9-13900HX, 24C/32T @5.4 GHz, riscv64/ubuntu docker
    39m23.187s SpacemiT K3 8 A100 cores @2.0 GHz
    42m12.414s Milk-V Megrez 4 P550 cores @1.8 GHz
    67m35.189s VisionFive 2, 4 U74 cores @1.5 GHz
    70m57.001s LicheePi 3A, 8 X60 cores @1.6 GHz
    88m04.185s LicheePi 4A, 4 C910 cores @1.85 GHz
</code></pre>
The A100 cores, all by themselves, give more normal processing power (e.g. for `gcc`) than any previous RISC-V SBC except the $2500 64 core Milk-V Pioneer.</p>
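<p>To put rough numbers on the table, here is a small Python sketch that turns three of the build times into throughput ratios. It assumes throughput is simply inversely proportional to wall-clock build time; the `seconds` helper and variable names are made up for illustration:</p>
<pre><code>
```python
import re

def seconds(t):
    """Convert a time like '18m3.871s' to seconds."""
    minutes, secs = re.fullmatch(r"(\d+)m([\d.]+)s", t).groups()
    return int(minutes) * 60 + float(secs)

x100 = seconds("18m3.871s")    # 8 X100 cores
a100 = seconds("39m23.187s")   # 8 A100 cores
both = seconds("14m25.56s")    # X100 plus A100 via distcc

# Throughput is taken as inversely proportional to build time.
a100_vs_x100 = x100 / a100     # A100 throughput as a fraction of X100
distcc_speedup = x100 / both   # gain from adding the A100 cores via distcc
print(f"A100 cores alone: {a100_vs_x100:.0%} of X100 throughput")
print(f"distcc across both: {distcc_speedup:.2f}x the X100-only build")
```
</code></pre>
<p>On this one workload the A100 cores alone come out a bit under half the X100 throughput, and the distcc run captures roughly a quarter extra, so the "extra 40%" is a ballpark rather than a measured constant.</p>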
]]></description><pubDate>Fri, 12 Jun 2026 02:02:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48498967</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48498967</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48498967</guid></item><item><title><![CDATA[New comment by brucehoult in "The MilkV Jupiter 2/SpacemiT K3 (RISC-V vector compute)"]]></title><description><![CDATA[
<p>> That documentation also seems to be the original source of the marketing claims that the K3 provides 60 TOPS of AI compute and can run 30B models at over 10 tokens/s. Well, sort of– as another spoiler, I can share that I hit a hard cap at an effective 3B<p>I'm running Qwen3-Coder-30B-A3B-Instruct-Q5_K_M.gguf on mine ... picked kind of at random from a web page as I'm a complete n00b at running local LLMs, have never used anything other than ChatGPT or (mostly) Grok.<p>I get 6-7 tok/s, which is slow for throwing around program code, but fine for general knowledge queries. It's a reasonable speed to read along as it outputs. Simple queries start to give output in about 2 seconds.<p>e.g. I dunno ..<p><pre><code>    > What does auipc do?
    
    `auipc` stands for "Add Upper Immediate to PC" and is a RISC-V instruction that:
    
    ## What it does:
    - Takes a 20-bit immediate value and adds it to the program counter (PC)
    - The immediate is sign-extended and shifted left by 12 bits
    - The result is stored in a destination register
    
    ## Syntax:
    ```
    auipc rd, imm
    ```
    
    ## Key characteristics:
    - **PC-relative addressing**: Uses the current program counter as a base
    - **12-bit offset**: The immediate is shifted left by 12 bits, so it's effectively a 32-bit value
    - **No memory access**: Only performs arithmetic, no memory operations
    - **Used for**: Creating 32-bit immediate values, setting up base addresses for other instructions
    
    ## Example:
    ```assembly
    auipc x5, 0x1000    # x5 = PC + (0x1000 << 12)
    ```
    
    ## Common uses:
    - Setting up base addresses for load/store instructions
    - Creating 32-bit immediate values when the standard `li` instruction can't handle the full range
    - Jump table setup
    - Address calculation for large data structures
    
    The instruction is particularly useful in RISC-V because it allows for efficient 32-bit address generation while maintaining the 16-bit instruction size constraint.
    
    [ Prompt: 20.6 t/s | Generation: 6.5 t/s ]
</code></pre>
Output started in about 2 seconds.<p><pre><code>    > what is the airspeed velocity of an unladen swallow?
    
    The classic answer to this question comes from Monty Python's "Holy Grail" - when asked about the airspeed velocity of an unladen swallow, the character replies "What do you mean? An African or European swallow?"
    
    But in reality, the airspeed velocity depends on the species:
    
    **European Swallow (Hirundo rustica):**
    - Typical flight speed: 15-20 mph (24-32 km/h)
    - Maximum speed: around 25 mph (40 km/h)
    
    **African Swallow (Hirundo sp.):**
    - Similar speeds to European swallows, though some species can reach up to 30 mph (48 km/h)
    
    The "unladen" part refers to the swallow not carrying anything, which would slow it down. Swallows are quite agile birds that can fly at impressive speeds while maneuvering through the air.
    
    So while the Python reference is the more famous answer, the real-world airspeed of an unladen swallow is roughly 15-25 mph, depending on the specific species and conditions.
    
    [ Prompt: 25.5 t/s | Generation: 6.6 t/s ]
</code></pre>
Again, output starts in about two seconds.<p>This is offline, no internet, and uses 14W while running all 8 A100 "AI" cores at max.<p>Is this useful?  I mean, for something, right?<p>I asked it to review <a href="https://github.com/brucehoult/trv" rel="nofollow">https://github.com/brucehoult/trv</a> which is a total of 320 lines of code (I used `/read` on a tar file containing the two code files). It thought for 22 minutes before output started and then spent 8 minutes outputting comments at just over 6.5 tok/s.<p>Nothing there to scare Claude, but 30 minutes total is still faster than asking a colleague for a code review, and probably more comprehensive too. And it did it on about 0.25 cents of electricity.<p>> Turns out getting a thread onto the A100 cores requires a two-step handshake:
> 
> write the thread’s TID to /proc/set_ai_thread (a kernel interface that unlocks scheduling on cores 8–15 for that specific thread)
> then call sched_setaffinity to pin it.<p>If you want to just run arbitrary Linux programs on the A100 cores, I wrote a small assembly language launcher which does the above PID writing and then EXECs the thing you really want.<p><pre><code>    # just run a single program on the A100 cores
    ai as hello.s -o hello.o
    
    # same thing but maybe 1ms faster
    aix /usr/bin/as hello.s -o hello.o
    
    # run a whole build. All processes started by `make` will run on the A100 cores.
    ai make -j8 test
    
    # start a shell on the A100 cores. All programs run from it will be run only on the A100 cores
    ai bash
</code></pre>
<a href="https://github.com/brucehoult/k3_ai" rel="nofollow">https://github.com/brucehoult/k3_ai</a><p>As normal CPUs the eight 2-wide in-order A100 cores (like an A53 or A55 or Pentium or PPC603) add about 40% normal scalar processing power to the eight X100 cores.<p>That's better than Hyperthreading and well worth using for some additional processing power. Just kick off a background build, or CI or something there while you do something else on the X100 cores. If you ignore the special "AI" matrix processing extension they are just perfectly normal RISC-V RVA23 cores as far as user code is concerned — and in fact significantly faster than the previous generation K1 chip.<p>A Linux kernel build on just the A100 "AI" cores is faster than on any previous RISC-V SBC under $1000, including the HiFive Premier P550 or Milk-V Megrez. It's several times faster than the VisionFive 2 or Milk-V Jupiter / BPI-F3.<p><pre><code>    Build Linux kernel 7503345ac5f5
    
    14m25.56s SpacemiT K3 8 X100 cores plus 8x A100 cores distcc
    18m3.871s SpacemiT K3 8 X100 cores @2.4 GHz
    19m12.787s i9-13900HX, 24C/32T @5.4 GHz, riscv64/ubuntu docker
    39m23.187s SpacemiT K3 8 A100 cores @2.0 GHz
    42m12.414s Milk-V Megrez 4 P550 cores @1.8 GHz
    67m35.189s VisionFive 2, 4 U74 cores @1.5 GHz
    70m57.001s LicheePi 3A, 8 X60 cores @1.6 GHz
    88m04.185s LicheePi 4A, 4 C910 cores @1.85 GHz
</code></pre>
The K3 is also faster than using QEMU/Docker on my 24 core i9-13900 laptop, while using 25W instead of 200W.<p>Note that the fastest time uses one distccd on the X100 cores and another distccd on the A100 cores. This adds a lot of overhead in preprocessing and communication over the network (loopback, but still), but it still gives a pretty nice boost. Running independent tasks on each set of cores is more efficient. Teaching `gmake` or `ninja` to distribute to two pools of cores using my `ai` launcher would be even better ...</p>
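<p>For reference, the two-step handshake quoted earlier can be sketched in Python roughly as follows. This is not how the assembly launcher in k3_ai is written; it is only an illustration of the sequence. The function name `run_on_ai_cores` and its `proc_file`, `set_affinity` and `exec_fn` parameters are made up (they exist so the logic can be exercised off-target), and it assumes `/proc/set_ai_thread` accepts the TID as plain decimal text:</p>
<pre><code>
```python
import os
import threading

AI_PROC_FILE = "/proc/set_ai_thread"   # path as quoted above
AI_CORES = range(8, 16)                # cores 8-15, as quoted above

def run_on_ai_cores(argv, proc_file=AI_PROC_FILE, cores=AI_CORES,
                    set_affinity=None, exec_fn=None):
    """Opt the calling thread in to the A100 cores, pin it, then exec argv."""
    set_affinity = set_affinity or os.sched_setaffinity
    exec_fn = exec_fn or os.execvp
    tid = threading.get_native_id()
    # Step 1: write our TID to the kernel interface.
    # ASSUMPTION: the file accepts the TID as plain decimal text.
    with open(proc_file, "w") as f:
        f.write(str(tid))
    # Step 2: pin the calling thread (pid 0 means "this thread") to the AI cores.
    set_affinity(0, set(cores))
    # Replace this process with the requested program.
    exec_fn(argv[0], argv)
```
</code></pre>
<p>On a K3 something like `run_on_ai_cores(["make", "-j8", "test"])` would be the equivalent of `ai make -j8 test`, under the assumptions above. The affinity mask set in step 2 is preserved across `execve` and inherited by children on Linux, which is what keeps a whole build on the same cores.</p>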
]]></description><pubDate>Fri, 12 Jun 2026 01:54:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=48498901</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48498901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48498901</guid></item><item><title><![CDATA[New comment by brucehoult in "News about Raspberry Pi 6 and Microcontroller Development"]]></title><description><![CDATA[
<p>Yeah, in 20 years instead of 100 years.<p>I'm going to replace it before then.</p>
]]></description><pubDate>Fri, 29 May 2026 01:44:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48317951</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48317951</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48317951</guid></item><item><title><![CDATA[New comment by brucehoult in "Riscrithm – An intuitive RISC-V assembler and optimizer coded in Go"]]></title><description><![CDATA[
<p>> instead of a PDP 11 as the virtual machine, why not risc V<p>What would that look like, in your view? I can't see them being significantly different, as computation models go. And I'm familiar with both.<p>If you're thinking of ++ and --, those were introduced in the B language before the first PDP-11 was even made, plus of course the PDP-11 only offers pre-decrement and post-increment while C offers all four combinations.</p>
]]></description><pubDate>Wed, 27 May 2026 00:41:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=48287995</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48287995</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48287995</guid></item><item><title><![CDATA[New comment by brucehoult in "Initial Benchmarks of the SpacemiT K3 RVA23 RISC-V CPU with the K3 Pico-ITX"]]></title><description><![CDATA[
<p>They've finally stopped comparing RISC-V boards to Pi 3!<p>But why don't they include Pi 3 and Pi 4 in the charts *anyway*, as well as some more appropriate x86 machines such as Core2Duo, i7-860 (or other Nehalem), 2nd-5th gen i5 etc?<p>It's not telling anyone anything they didn't know that a K3 isn't going to compare to an Apple M5 or Core Ultra 9 or whatever.<p>I've got a 32 GB PicoITX K3 and I've been comparing it to a "Late 2012" Mac Mini with i7-3720QM running Ubuntu 24.04. The Mac is 20% or 30% faster single-core, but of course on some things the K3 wins from having lots of cores.<p>It's starting to be a race at least.</p>
]]></description><pubDate>Thu, 21 May 2026 10:04:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=48220210</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48220210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48220210</guid></item><item><title><![CDATA[New comment by brucehoult in "RISC-V and Floating-Point"]]></title><description><![CDATA[
<p>It depends on what you call "high performance".<p>I have in my hands one of the new SpacemiT K3 machines. It arrived today. I'm comparing it to several other things, and finding that it is pretty comparable to a "late 2012" Mac Mini with an i7-3720QM with base 2.6 GHz turbo 3.6 GHz running Ubuntu 24.04. They are quite close in feel for general use, web browsing, code editing, watching YouTube etc. The Mac is a little faster on many things, a LOT slower on others (anything that can use 8 cores, obviously).<p>You might say that's not "high performance" but we thought it was pretty good a dozen years ago.<p>The previous SpacemiT K1 chip two years ago was more like one of the last Pentium IIIs or PowerPC G4s, except with a lot more cores.<p>SpacemiT have a next generation K5 coming out, they say, at the end of the year. Tenstorrent have their new Ascalon-X core comparable to Apple's late 2020 M1 — and designed by the same guy who designed the M1. They've taped out a chip using that and say they'll be selling a dev board in Q2 or Q3. For now the first version is using an old chip process and it will be running at half the clock speed of the M1, but that's still going to be a very decent machine.<p>The HiFive Unleashed was of course 8 years ago. Since then there have been the HiFive Unmatched (roughly like Cortex A55) and the HiFive Premier P550 (a bit better than Cortex A72, other than no SIMD).</p>
]]></description><pubDate>Wed, 20 May 2026 13:11:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=48207111</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48207111</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48207111</guid></item><item><title><![CDATA[New comment by brucehoult in "RISC-V and Floating-Point"]]></title><description><![CDATA[
<p>If you want to get the absolute most out of a specific CPU that is in your hands then you of course have to refer to the documentation for that specific CPU.<p>That process doesn't depend on whether it's an x86 or an Arm or a RISC-V.<p>That's why x86 people refer to the HUGE document maintained by Agner Fog.<p>If you want your code to run well on all standards-compliant implementations then you write according to the ISA documentation, in this case RVA23. Or ARMv9-A. Or x86_64 v3.</p>
]]></description><pubDate>Wed, 20 May 2026 12:58:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206942</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48206942</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206942</guid></item><item><title><![CDATA[New comment by brucehoult in "RISC-V and Floating-Point"]]></title><description><![CDATA[
<p>I have a machine in my hot little hands. It arrived today.<p><a href="https://x.com/BruceHoult/status/2056911834737975635/photo/1" rel="nofollow">https://x.com/BruceHoult/status/2056911834737975635/photo/1</a><p>I've already posted on github my first project written on and for it today:<p><a href="https://github.com/brucehoult/k3_ai" rel="nofollow">https://github.com/brucehoult/k3_ai</a><p>Sipeed have posted photos four days ago of the first batch of customer orders going out:<p><a href="https://x.com/SipeedIO/status/2055549071931404291" rel="nofollow">https://x.com/SipeedIO/status/2055549071931404291</a><p>> the average bystander probably isn't used to importing computer parts directly from China, either.<p>It won't take long for them to be available on amazon, just as D1 and JH7110 and K1 boards are now. e.g.<p><a href="https://www.amazon.com/Orange-Pi-RV-Frequency-Development/dp/B0F2HKSKSK" rel="nofollow">https://www.amazon.com/Orange-Pi-RV-Frequency-Development/dp...</a></p>
]]></description><pubDate>Wed, 20 May 2026 12:53:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=48206886</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48206886</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48206886</guid></item><item><title><![CDATA[New comment by brucehoult in "RISC-V and Floating-Point"]]></title><description><![CDATA[
<p>The average bystander doesn't have to care, just buy a machine implementing the RVA23 profile (standard set of extensions) and be happy.<p>If you're building your own embedded hardware then you determine what your needs actually are e.g. do you need double precision? half precision? vector?. Then you choose a chip implementing that. Then you copy the ISA string from your chip's documentation to the `-march=` argument for GCC/Clang and be happy.<p>It's not hard and you don't have to think about it unless you very specifically want to.</p>
]]></description><pubDate>Tue, 19 May 2026 02:43:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=48188596</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48188596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48188596</guid></item><item><title><![CDATA[New comment by brucehoult in "Hosting a website on an 8-bit microcontroller"]]></title><description><![CDATA[
<p>AVR is vastly better to program than 8 bit PIC — either by hand or by compiler from C — but some people still insist on using those PICs for simple things.<p>The "PIC32" name was originally used for MIPS CPUs but more recently ARM ones and PIC32A is an extended dsPIC (16 bit).<p>There is also now PIC64 which is currently a couple of different RISC-V implementations, one based on the quad core SiFive U54 from 2018 (same as PolarFire SoC FPGAs), and a higher performance (and rad-tolerant in some versions) octa-core SiFive X280 with vector processing. Microchip have I think also indicated there will be future Arm-based 64 bit PICs.</p>
]]></description><pubDate>Sun, 17 May 2026 09:24:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48167335</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48167335</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48167335</guid></item><item><title><![CDATA[New comment by brucehoult in "RISC-V Router"]]></title><description><![CDATA[
<p>I'm not sure why anyone would care what chip is in a router that should just sit there doing its job and you're not going to write or run other software on?<p>Sure it's kind of nice to know the car media player thing I've had for a couple of years has a RISC-V D1s/F133 chip in it, but I bought it because it receives CarPlay (and Android Auto) and transmits audio on FM (actually the best quality one I've had, and I've had a few) and cost $30, not because it's RISC-V.<p>And I'm eagerly awaiting a pico-ITX SpacemiT K3 box arriving in the next week or so.<p>But a router? Why do I care, past price and functionality?</p>
]]></description><pubDate>Sat, 16 May 2026 01:31:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48155974</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48155974</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48155974</guid></item><item><title><![CDATA[New comment by brucehoult in "RISC-V Router"]]></title><description><![CDATA[
<p>> it’s not even clear if it can saturate a gigabit<p>If that's the case then it's not the CPU's fault. I can't open the linked site but assuming it's really the same as a BPI-F3 i.e. a SpacemiT K1 chip, that can do 2.8 GB/sec on large RAM to RAM memcpy using a CPU core i.e. 44 Gbps total, 22 Gbps each read and write. Plus I assume it's got DMA so no need to involve the CPU anyway.<p>Here is a test I ran in April 2025 on a Sipeed LicheePi 3A (same chip).<p><a href="https://hoult.org/K1_memcpy.txt" rel="nofollow">https://hoult.org/K1_memcpy.txt</a><p>> RISC-V is quite wimpy this far<p>The new K3 chip from the same manufacturer does 8.7 GB/s RAM to RAM memcpy using a dual issue in-order A100 ("AI") core, just over 3x faster.<p>Sure this pales in comparison to recent Apple / Intel / AMD but it's a lot faster than home networking.</p>
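<p>The unit conversion behind those figures, as a quick Python sketch. The helper name is made up; the 2.8 and 8.7 GB/s inputs are the memcpy rates given above:</p>
<pre><code>
```python
def gbytes_to_gbits(gb_per_s):
    """Convert GB/s to Gbit/s (8 bits per byte)."""
    return gb_per_s * 8

k1_copy = 2.8   # GB/s, K1 large RAM-to-RAM memcpy (figure from above)
k3_copy = 8.7   # GB/s, K3 A100 core (figure from above)

k1_each_way = gbytes_to_gbits(k1_copy)   # read stream; writes are the same again
k1_total = 2 * k1_each_way               # read plus write
print(f"K1: {k1_each_way:.1f} Gbps each way, {k1_total:.1f} Gbps total")
print(f"K1 copy rate vs 1 Gbps Ethernet: {k1_each_way:.0f}x")
print(f"K3 vs K1: {k3_copy / k1_copy:.1f}x")
```
</code></pre>
<p>So even a single K1 core copying memory moves data about twenty times faster than a gigabit link can deliver it.</p>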
]]></description><pubDate>Thu, 14 May 2026 22:44:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=48142256</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48142256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48142256</guid></item><item><title><![CDATA[New comment by brucehoult in "EEVblog: The 555 Timer is 55 years old [video]"]]></title><description><![CDATA[
<p>The size of a register is not the largest value you can conveniently count on a computer. You can use multiple registers.<p>Old computers often had a "carry flag" specifically to make this easier e.g. on Arm:<p><pre><code>    adds r0,r0,#1
    adc r1,r1,#0
</code></pre>
But even on RISC-V, often criticised for not having a carry flag, it's not hard:<p><pre><code>    addi  a0,a0,1
    sltiu t1,a0,1 # set to 1 if a0 wrapped back to 0
    add   a1,a1,t1</code></pre></p>
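<p>The same three-instruction RISC-V sequence modelled in Python, one line per instruction, in case the `sltiu` trick isn't obvious. It assumes 32-bit registers (RV32); `increment64` and `MASK32` are names made up for this sketch:</p>
<pre><code>
```python
MASK32 = 0xFFFFFFFF  # ASSUMPTION: 32-bit registers (RV32)

def increment64(lo, hi):
    """Increment a 64-bit counter held in two 32-bit words, RISC-V style."""
    lo = (lo + 1) & MASK32       # addi  a0,a0,1
    carry = int(lo == 0)         # sltiu t1,a0,1  (unsigned a0 below 1 means a0 == 0)
    hi = (hi + carry) & MASK32   # add   a1,a1,t1
    return lo, hi

print(increment64(0xFFFFFFFF, 0))
print(increment64(5, 7))
```
</code></pre>
<p>The low word can only be zero after the increment if it wrapped, so "is it now less than 1, unsigned" is exactly the carry out.</p>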
]]></description><pubDate>Fri, 08 May 2026 02:59:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=48057979</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48057979</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48057979</guid></item><item><title><![CDATA[New comment by brucehoult in "EEVblog: The 555 Timer is 55 years old [video]"]]></title><description><![CDATA[
<p>That 10c microcontroller has 15 32 bit registers, allowing you to make up to a 480 bit counter. That ought to be enough until long after the heat death of the universe.<p>It also has 2k (16384 bits) of SRAM, allowing even larger counters.<p>It runs off 2.8V - 5.5V DC, so supplying power is pretty trivial. Doesn't need a crystal, though of course adding one will improve the timing accuracy.</p>
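<p>A back-of-envelope check on the 480 bit counter, in Python. The 48 MHz clock and "one increment per cycle" are assumptions for this sketch; a real multi-word increment takes several cycles, so this is the shortest possible time to wrap:</p>
<pre><code>
```python
REGISTERS = 15                  # usable 32 bit registers, from the comment above
BITS = REGISTERS * 32           # 480
CLOCK_HZ = 48_000_000           # ASSUMPTION: 48 MHz, one increment per cycle
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years_to_wrap = 2**BITS / CLOCK_HZ / SECONDS_PER_YEAR
print(f"{BITS} bit counter wraps after about {years_to_wrap:.1e} years")
```
</code></pre>
<p>That comes out well over 10^100 years even with the optimistic increment rate.</p>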
]]></description><pubDate>Thu, 07 May 2026 03:02:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=48044952</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48044952</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48044952</guid></item><item><title><![CDATA[New comment by brucehoult in "EEVblog: The 555 Timer is 55 years old [video]"]]></title><description><![CDATA[
<p>Trivial with a 10c microcontroller ...</p>
]]></description><pubDate>Wed, 06 May 2026 01:04:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48030888</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48030888</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48030888</guid></item><item><title><![CDATA[New comment by brucehoult in "EEVblog: The 555 Timer is 55 years old [video]"]]></title><description><![CDATA[
<p>I find it much easier to write a ten line program for an 8 pin CH32V003 (or ATTiny85 in past times) to do exactly the timing or ADC comparisons I want than to figure out the circuit and component values for a 555 or op-amp.<p>For that matter, a 16 pin CH32V003 can emulate a vast array of 7400 series devices as long as you don't need ns timing — no problem for µs. It's also cheaper.</p>
]]></description><pubDate>Wed, 06 May 2026 01:03:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48030873</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=48030873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48030873</guid></item><item><title><![CDATA[New comment by brucehoult in "Banana Pi Announces RISC-V Based BPI‑SM10 Developer Kit and K3 Pico‑ITX AI SBC"]]></title><description><![CDATA[
<p>Not THAT much of a watershed. I didn't even see a lot of stuff about the equivalent first ARMv9 SBCs (and first with SVE almost TEN YEARS after the spec was published), the Radxa Orion O6 a year ago and Orange Pi 6 Plus half a year ago (same chip).<p>Also, none of us actually HAVE them yet. Sure, I've been using a pre-production board at SpacemiT via ssh to China since mid January, but it's still probably two weeks until I'll have one in front of me and I can browse the web and watch YouTube on it etc.<p>All the things we could do via ssh were published three months ago. LivingLinux for example has a whole series of videos on YouTube.<p><a href="https://www.youtube.com/playlist?list=PLYxFtt1xWrthuSGclxIswGAdLbHpuqnTf" rel="nofollow">https://www.youtube.com/playlist?list=PLYxFtt1xWrthuSGclxIsw...</a><p>There's plenty of coverage over on r/riscv and r/spacemit_riscv</p>
]]></description><pubDate>Fri, 01 May 2026 23:49:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=47981836</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=47981836</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47981836</guid></item><item><title><![CDATA[New comment by brucehoult in "Ask HN: Favourite Assembly Instructions?"]]></title><description><![CDATA[
<p>My example is applicable to compiler / assembler / JIT / emulator.<p>The performance of conventional compilers and assemblers is not important to anyone but developers, but everyone uses JavaScript / WebAsm all the time. And QEMU can be important too (e.g. in docker for non-native ISAs, using binfmt_misc).<p>I guess I should point out in the proposed RISC-V example, it's 6 bytes of code as the initial shift can be a 2-byte "C" extension instruction. So that's slightly smaller code than everything except 32 bit PowerPC, which is another important aspect. Arm64 and M68k use 8 bytes of code.<p>Oh! I just realised standard RISC-V can be improved in this case (but not by so much in the general case).<p><pre><code>    srli   x12, x10, 20          # shift field down to correct position
    andi   x12, x12, 0x7FE       # mask to 10 bits
    andi   x11, x11, ~0x7FE      # clear space in the destination
    or     x11, x11, x12         # insert the field
</code></pre>
That's just 12 bytes of code.<p>In the more general case you need a `lui` or `lui;addi` pair to load the mask into a register, and then register to register ops, for 14 bytes total.<p>Note that x86_64 needs four instructions and 14 bytes of code, so no better than RISC-V.</p>
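<p>What the four-instruction sequence above computes, written out in Python on unsigned register values, one step per instruction. `insert_field` and `FIELD_MASK` are names made up for this sketch:</p>
<pre><code>
```python
FIELD_MASK = 0x7FE   # bits 1..10, the 10 bit field from the sequence above

def insert_field(dst, src):
    """Mirror the srli / andi / andi / or sequence on unsigned values."""
    field = (src >> 20) & FIELD_MASK   # srli + andi: move field down, mask it
    cleared = dst & ~FIELD_MASK        # andi: clear space in the destination
    return cleared | field             # or: insert the field

print(hex(insert_field(0xFFFFFFFF, 0)))
print(hex(insert_field(0, 0x7FE00000)))
```
</code></pre>
<p>Both masks fit in the 12 bit signed immediate of `andi` (0x7FE is 2046 and ~0x7FE is -2047), which is why no extra instruction is needed to load them.</p>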
]]></description><pubDate>Fri, 24 Apr 2026 02:49:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=47884896</link><dc:creator>brucehoult</dc:creator><comments>https://news.ycombinator.com/item?id=47884896</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47884896</guid></item></channel></rss>