Hacker News: oxxoxoxooo

New comment by oxxoxoxooo in "Optimization of 32-bit Unsigned Division by Constants on 64-bit Targets"

oxxoxoxooo — Mon, 13 Apr 2026 10:40:24 +0000

On x86, there is no vector instruction that returns the upper half of a 64-bit x 64-bit integer product. ARM SVE2 and RISC-V RVV have one; x86 unfortunately does not (and probably won't for a long time, as AVX10 does not add it either).

RISC-V Vector Primer

oxxoxoxooo — Sat, 07 Feb 2026 11:38:12 +0000

Article URL: https://github.com/simplex-micro/riscv-vector-primer/blob/main/index.md

Comments URL: https://news.ycombinator.com/item?id=46923051

Points: 69

# Comments: 22

New comment by oxxoxoxooo in "Milk-V Titan: A $329 8-Core 64-bit RISC-V mini-ITX board with PCIe Gen4x16"

oxxoxoxooo — Sun, 18 Jan 2026 18:47:14 +0000

Do you happen to know how one accesses/uses those A100 cores?

The RISC-V Instruction Tier List [video]

oxxoxoxooo — Mon, 27 Oct 2025 11:14:47 +0000

Article URL: https://www.youtube.com/watch?v=qLEKOfVQEZI

Comments URL: https://news.ycombinator.com/item?id=45719633

Points: 4

# Comments: 0

New comment by oxxoxoxooo in "Qualcomm to acquire Arduino"

oxxoxoxooo — Tue, 07 Oct 2025 13:36:25 +0000

If you ever wondered how Arduino came about: The Untold History of Arduino (https://arduinohistory.github.io/).

Crab Nebula (time-lapse movie 2008-2022)

oxxoxoxooo — Tue, 06 May 2025 20:01:02 +0000

Article URL: https://app.astrobin.com:443/u/DetlefHartmann?i=ija7jc

Comments URL: https://news.ycombinator.com/item?id=43909032

Points: 3

# Comments: 0

New comment by oxxoxoxooo in "Tell HN: GpuOwl/PRPLL, GPU software used to find the largest prime number"

oxxoxoxooo — Sun, 27 Oct 2024 15:14:17 +0000

Thank you very much for the answers, very informative!

And congratulations on the discovery!

New comment by oxxoxoxooo in "Tell HN: GpuOwl/PRPLL, GPU software used to find the largest prime number"

oxxoxoxooo — Sun, 27 Oct 2024 10:12:22 +0000

Hi! I also have a few questions, if I may:

1. I guess the most time-consuming part is multiplication, right? What kind of FFT do you use: Schönhage-Strassen, multi-prime NTT, ...? Is it implemented with floating-point numbers or integers?

2. Not sure if you have encountered this, but do you have any advice for small mulmod (multiplication reduced by a prime modulus)? By small I mean machine-word size (i.e., preferably 64 bits).

3. For larger moduli, what do you use? Is it worth precomputing the inverse by, say, Newton iteration, or is it faster to use asymptotically slower algorithms? Do you use Montgomery representation?

4. Does the code use any kind of GCD? What algorithm did you choose?

5. This is a rather broad question, but could you perhaps compare traditional algorithms implemented sequentially (e.g., GMP) with algorithms suited to running on GPUs? I mean, does it make sense to use, say, a quadratic algorithm amenable to parallel execution rather than an asymptotically faster (and sequential) one?

New comment by oxxoxoxooo in "Symbolica Computer Algebra System"

oxxoxoxooo — Fri, 10 May 2024 19:35:18 +0000

Thanks for the reply! If you don't mind me asking more: what do you use for polynomial GCD? It is apparently quite fast; do you use a well-implemented standard algorithm, or is there some kind of algorithmic improvement? Is it described somewhere, say in a paper or a book? Have you tried benchmarking it against NTL, for example? Thanks again!

New comment by oxxoxoxooo in "Symbolica Computer Algebra System"

oxxoxoxooo — Thu, 09 May 2024 17:29:02 +0000

If I may ask, what do you use for bigints? GMP?

New comment by oxxoxoxooo in "The Era of 1-bit LLMs: ternary parameters for cost-effective computing"

oxxoxoxooo — Wed, 28 Feb 2024 19:23:46 +0000

Prior art:

Binarized Neural Networks: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1

https://arxiv.org/abs/1602.02830

Ternary Neural Networks for Resource-Efficient AI Applications

https://arxiv.org/abs/1609.00222

New comment by oxxoxoxooo in "STB: Single-file public domain libraries for C/C++"

oxxoxoxooo — Sun, 07 Jan 2024 09:56:26 +0000

True, except that it's trivial to mitigate: one only needs to wrap the whole library in one giant #ifndef, as here, for example: https://github.com/sheredom/utf8.h/blob/b7ed0a28eb92803c81d6...

New comment by oxxoxoxooo in "Passive SSH Key Compromise via Lattices [pdf]"

oxxoxoxooo — Mon, 06 Nov 2023 18:26:10 +0000

Thanks!

New comment by oxxoxoxooo in "Passive SSH Key Compromise via Lattices [pdf]"

oxxoxoxooo — Mon, 06 Nov 2023 18:08:44 +0000

> Implementations of big number math can and does contain bugs. (I used to hunt for those via fuzzing, which turned up an amazing number of them.)

I'm curious: can you give some examples of the kinds of bugs you discovered?

New comment by oxxoxoxooo in "My favorite prime number generator"

oxxoxoxooo — Wed, 23 Aug 2023 17:33:51 +0000

And another one:

The Genuine Sieve of Eratosthenes https://www.cs.hmc.edu/~oneill/papers/Sieve-JFP.pdf

New comment by oxxoxoxooo in "Zen4's AVX512 Teardown"

oxxoxoxooo — Mon, 26 Sep 2022 20:48:01 +0000

> This instruction is used in some bignum code

Could you be more specific? I think for that to work one would also need the upper half of the 64x64 multiplication, and `vpmullq` provides only the lower half. You could break one 64x64 multiplication into four 32x32 multiplications (i.e., emulate the full 64x64 = 128-bit multiplication), but I was under the impression that this was slow.

What If? 2

oxxoxoxooo — Mon, 31 Jan 2022 17:56:52 +0000

Article URL: https://xkcd.com/2575/

Comments URL: https://news.ycombinator.com/item?id=30151367

Points: 1

# Comments: 0

“Risc V greatly underperforms”

oxxoxoxooo — Thu, 02 Dec 2021 18:51:45 +0000

Article URL: https://gmplib.org/list-archives/gmp-devel/2021-September/006013.html

Comments URL: https://news.ycombinator.com/item?id=29420622

Points: 310

# Comments: 348

New comment by oxxoxoxooo in "Benchmarking division and libdivide on Apple M1 and Intel AVX512"

oxxoxoxooo — Wed, 12 May 2021 23:16:23 +0000

Hi fish,

thanks for a very interesting article, again!

Do you think the very fast division on M1 has any implications for 128/64 narrowing division as well? Do you know of a faster way than the method by Möller and Granlund? Do you plan to include 128/64 division in libdivide?

I asked this question before, but the parent post got flagged, so I'm trying once more: at the very bottom of the Labor of Division (Episode V) post [1], is it really possible for the second `qhat` (i.e., `q0`) to be off by 2? Do you have any examples of that?

[1] https://ridiculousfish.com/blog/posts/labor-of-division-epis...

New comment by oxxoxoxooo in "Labor of Division: Algorithm D"

oxxoxoxooo — Wed, 05 May 2021 20:54:16 +0000

I think that I shall never envision

An op unlovely as division

An op whose answer must be guessed

And then, through multiply, assessed;

An op for which we dearly pay,

In cycles wasted every day.

Division code is often hairy;

Long division's downright scary.

The proofs can overtax your brain,

The ceiling and floor may drive you insane.

Good code to divide takes a Knuthian hero

But even God can't divide by zero!

-- Henry S. Warren Jr.

Hacker's Delight