Hacker News: nestorD

New comment by nestorD in "Futhark by example (2020)"

nestorD — Sun, 17 May 2026 02:49:54 +0000

See also jaxtyping which, contrary to what its name might imply, covers JAX/PyTorch/NumPy/MLX/TensorFlow arrays and tensors.

https://docs.kidger.site/jaxtyping/

New comment by nestorD in "MuJoCo – Advanced Physics Simulation"

nestorD — Wed, 22 Apr 2026 16:49:20 +0000

It's what put MuJoCo on my radar recently! But I was surprised not to see him use any kind of gradient descent to optimize his hyperparameters. MuJoCo has a JAX backend, so it should be fairly straightforward.

New comment by nestorD in "Gerard of Cremona"

nestorD — Sat, 28 Mar 2026 18:57:47 +0000

For people wondering why the Islamic world would have had more texts, many of them of western (Greek/Latin) origin, than the western world: as the Roman Empire collapsed, the papyrus supply disappeared in the west (while North Africa still had papyrus, and later early paper), forcing copyists to use parchment, which was significantly more expensive and in shorter supply. As the texts on papyrus started to crumble to dust, monks had to decide which ones to save given the limited writing material available (so they saved a lot of Saint Augustine...).

New comment by nestorD in "Curating a Show on My Ineffable Mother, Ursula K. Le Guin"

nestorD — Sun, 08 Feb 2026 19:06:32 +0000

Her book, Steering the Craft, is very much her writing workshop distilled into book form.

New comment by nestorD in "Task-free intelligence testing of LLMs"

nestorD — Fri, 09 Jan 2026 18:58:43 +0000

In theory, yes! If this metric ever becomes a widely used standard, one would have to start accounting for that...

But, in practice, when asked to pick the best answer, a model sees a single question/answers pair and focuses on determining which answer it thinks is best.

New comment by nestorD in "Task-free intelligence testing of LLMs"

nestorD — Fri, 09 Jan 2026 18:55:41 +0000

It presumes some models are better than others (and we do find that providing data with a wide mix of model strengths improves convergence) but it does not need to be one model, and it does not even need to be transitive.

New comment by nestorD in "Task-free intelligence testing of LLMs"

nestorD — Thu, 08 Jan 2026 22:14:57 +0000

On alternative ways to measure LLM intelligence, we had good success with this: https://arxiv.org/abs/2509.23510

In short: start with a dataset of question and answer pairs, where each question has been answered by two different LLMs. Ask the model you want to evaluate to choose the better answer for each pair. Then measure how consistently it selects winners. Does it reliably favor some models across the questions, or does it behave close to randomly? This consistency is a strong proxy for the model’s intelligence.

It is not subject to dataset leaks, lets you measure intelligence in many fields where you might not have golden answers, and converges pretty fast, making it really cheap to measure.
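As a rough illustration of the idea (a minimal sketch, not the metric from the paper): assume each judgment is recorded as a hypothetical `(model_a, model_b, winner)` tuple, and score the judge by how often it agrees with its own majority preference for each pair of models. A judge that always favors the same model within a pairing scores 1.0, while a judge picking at random drifts toward 0.5 as the number of judgments per pair grows.

```python
from collections import Counter, defaultdict

def consistency(judgments):
    """Fraction of judgments that agree with the judge's own majority pick
    for each unordered pair of models. Illustrative only."""
    wins = defaultdict(Counter)
    for model_a, model_b, winner in judgments:
        pair = tuple(sorted((model_a, model_b)))
        wins[pair][winner] += 1
    total = sum(sum(counts.values()) for counts in wins.values())
    if total == 0:
        raise ValueError("no judgments to score")
    agree = sum(max(counts.values()) for counts in wins.values())
    return agree / total

# Hypothetical judgments: (first model, second model, model the judge preferred).
judgments = [
    ("A", "B", "A"), ("A", "B", "A"), ("B", "A", "A"), ("A", "B", "B"),
    ("B", "C", "B"), ("B", "C", "B"),
]
print(consistency(judgments))  # 5 of 6 judgments match the majority pick
```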

New comment by nestorD in "Show HN: I Ching simulator with accurate Yarrow Stalk probabilities"

nestorD — Tue, 16 Dec 2025 00:35:00 +0000

I doubt it. The I Ching does not really have bad / low interest hexagrams. Also, historians who studied the topic seem pretty sure that the yarrow stalk method is a recent introduction (by I Ching standards, we are talking about a Bronze Age divination tool...).

New comment by nestorD in "Show HN: I Ching simulator with accurate Yarrow Stalk probabilities"

nestorD — Mon, 15 Dec 2025 20:35:20 +0000

Fun fact: archaeological evidence on I Ching divinatory records shows a hexagram distribution different from the one produced by the yarrow stalk method. This means that, while it is now considered the traditional method, it was likely not the original approach.
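For reference, here is a minimal sketch of what "the distribution produced by the yarrow stalk method" looks like, assuming the commonly cited per-line probabilities for that method (1/16, 5/16, 7/16 and 3/16 for line values 6, 7, 8 and 9). The function name is made up for illustration.

```python
from fractions import Fraction

# Commonly cited yarrow stalk probabilities for a single line (assumed here):
# 6 = old (changing) yin, 7 = young yang, 8 = young yin, 9 = old (changing) yang.
YARROW = {6: Fraction(1, 16), 7: Fraction(5, 16), 8: Fraction(7, 16), 9: Fraction(3, 16)}

def reading_probability(lines):
    """Probability of casting six given line values (bottom to top)."""
    if len(lines) != 6:
        raise ValueError("a hexagram has six lines")
    p = Fraction(1)
    for value in lines:
        p *= YARROW[value]
    return p

# Yin and yang are equally likely overall, but changing lines are not symmetric.
p_yin = YARROW[6] + YARROW[8]
print(p_yin)                         # 1/2
print(reading_probability([9] * 6))  # six changing yang lines
print(reading_probability([6] * 6))  # six changing yin lines, 729 times rarer
```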

New comment by nestorD in "What if you don't need MCP at all?"

nestorD — Sun, 16 Nov 2025 23:16:42 +0000

So far I have seen two genuinely good arguments for the use of MCPs:

* They can encapsulate (API) credentials, keeping those out of reach of the model,

* Contrary to APIs, they can change their interface whenever they want and with little consequence.

New comment by nestorD in "A new Google model is nearly perfect on automated handwriting recognition"

nestorD — Sat, 15 Nov 2025 02:43:47 +0000

I started with a UI that sounded like it was built along the same lines as yours, which had the advantage of letting me enforce a pipeline and exhaustiveness of search (I don't want the 10 most promising documents, I want all of them).

But I realized I was not using it much because it was that big and inflexible (plus I keep wanting to stamp out all the bugs, which I do not have the time to do on a hobby project). So I ended up extracting it into MCPs (equipped to do full-text search and download OCR from the various databases I care about) and AGENTS.md files (defining pipelines, as well as patterns for both searching behavior and reporting of results). I also put together a sub-agent for translation (cutting away all tools besides reading and writing files, and giving it some document-specific contextual information).

That lets me use Claude Code and Codex CLI (which, anecdotally, I have found to be the better of the two for that kind of work; it seems to deal better with longer inputs produced by searches) as the driver, telling them what I am researching and maybe how I would structure the search, then letting them run in the background before checking their report and steering the search based on that.

It is not perfect (if a search surfaces 300 promising documents, it will not check all of them, and it often misunderstands things due to lacking further context), but I now find myself reaching for it regularly, and I polish out problems one at a time. The next goal is to add more data sources and to maybe unify things further.

New comment by nestorD in "A new Google model is nearly perfect on automated handwriting recognition"

nestorD — Fri, 14 Nov 2025 22:27:54 +0000

Oh! That's a nice use-case and not too far from stuff I have been playing with! (happily I do not have to deal with handwriting, just bad scans of older newspapers and texts)

I can vouch for the fact that LLMs are great at searching in the original language, summarizing key points to let you know whether a document might be of interest, then providing you with a translation where you need one.

The fun part has been building tools to turn Claude Code and Codex CLI into capable research assistants for that type of project.

New comment by nestorD in "A non-diagonal SSM RNN computed in parallel without requiring stabilization"

nestorD — Wed, 22 Oct 2025 17:40:54 +0000

The paper[0] is actually about their logarithmic number system. Deep learning is given as an example, and their reference implementation is in PyTorch, but it is far from the only application.

Anything involving a large number of multiplications that produce extremely small or extremely large numbers could make use of their number representation.

It builds on existing complex number implementations, making it fairly easy to implement in software and relatively efficient. They provide implementations of a number of common operations, including dot product (building on PyTorch's preexisting, numerically stabilized by experts, log-sum-of-exponentials) and matrix multiplication.

The main downside is that this is a very specialized number system: if you care about things other than chains of multiplications (say... addition?), then you should probably use classical floating-point numbers.
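To make the idea concrete, here is a minimal sketch using Python's built-in `cmath` rather than the paper's PyTorch implementation: each nonzero real is stored as its complex logarithm (log-magnitude in the real part, sign carried by the imaginary part), so a long chain of multiplications becomes a sum. The helper names are hypothetical.

```python
import cmath
import math

def to_log(x: float) -> complex:
    """Complex log of a nonzero real: ln|x| + 0j if x > 0, ln|x| + pi*j if x < 0."""
    if x == 0.0:
        raise ValueError("zero has no finite logarithm")
    return cmath.log(x)

def log_product(values) -> complex:
    """Log-domain product: multiplying numbers is adding their logarithms."""
    return sum((to_log(v) for v in values), 0j)

def sign_of(z: complex) -> float:
    """Recover the sign: the imaginary part is a multiple of pi, odd means negative."""
    return -1.0 if round(z.imag / math.pi) % 2 else 1.0

values = [-1e-5] * 1001
print(math.prod(values))     # underflows to zero in ordinary floats
z = log_product(values)
print(z.real, sign_of(z))    # log-magnitude stays finite; sign is negative
```

Addition is where this representation gets awkward: summing two numbers stored this way requires going through a log-sum-of-exponentials step instead of a single add.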

[0]: https://arxiv.org/abs/2510.03426

New comment by nestorD in "Vibe coding tips and tricks"

nestorD — Mon, 18 Aug 2025 15:09:33 +0000

I have found it useful to put the spec together with a model, having it try to find blind spots and write down the final take in clear and concise language.

A good next step is to have the model provide a detailed step by step plan to implement the spec.

Both steps are best done with a strong planning model like Claude Opus or ChatGPT5, having it write "for my developer", before switching to something like Claude Code.

New comment by nestorD in "Claude Code is all you need"

nestorD — Mon, 11 Aug 2025 15:52:19 +0000

I have found Claude Code to be significantly better, both in how good the model ends up being and in how polished it is, to the point that I do not drop down to Gemini CLI when I reach my Claude usage limit.

New comment by nestorD in "Ask HN: How can ChatGPT serve 700M users when I can't run one GPT-4 locally?"

nestorD — Fri, 08 Aug 2025 20:19:08 +0000

The first step is to acquire hardware fast enough to run one query quickly (and yes, for some model sizes you are looking at sharding the model and distributed runs). The next one is to batch requests, improving GPU utilization significantly.
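The batching step can be sketched as follows. This is a simplified static-batching toy with a stand-in `fake_model` function (both hypothetical); production servers schedule requests more dynamically, but the principle of amortizing one model pass over many queries is the same.

```python
from typing import Callable, Iterable, List

def batched(items: List[str], batch_size: int) -> Iterable[List[str]]:
    """Yield consecutive slices of at most `batch_size` requests."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def serve(requests: List[str],
          model: Callable[[List[str]], List[str]],
          batch_size: int = 8) -> List[str]:
    """Answer all requests with one model call per batch instead of one per request."""
    answers: List[str] = []
    for batch in batched(requests, batch_size):
        answers.extend(model(batch))
    return answers

call_sizes: List[int] = []  # records how many prompts each model call received

def fake_model(prompts: List[str]) -> List[str]:
    """Stand-in for a GPU forward pass over a whole batch."""
    call_sizes.append(len(prompts))
    return [p.upper() for p in prompts]

requests = [f"query {i}" for i in range(20)]
answers = serve(requests, fake_model, batch_size=8)
print(len(answers), call_sizes)  # 20 answers from batches of 8, 8 and 4
```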

Take a look at vLLM for an open source solution that is pretty close to the state of the art as far as handling many user queries: https://docs.vllm.ai/en/stable/

New comment by nestorD in "Measuring the Impact of AI on Experienced Open-Source Developer Productivity"

nestorD — Thu, 10 Jul 2025 17:42:34 +0000

One thing I could not find on a cursory read is how used to AI tools those developers were. I would expect someone using them regularly to benefit, while someone who has only played with them a couple of times would likely be slowed down as they deal with the friction of learning to be productive with the tool.

New comment by nestorD in "XBOW, an autonomous penetration tester, has reached the top spot on HackerOne"

nestorD — Wed, 25 Jun 2025 09:00:59 +0000

Yes! I recently had to manually answer and close a GitHub issue telling me I might have pushed an API key to GitHub. No, "API_KEY=put-your-key-here;" is a placeholder and I should not have to waste time writing that.
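A scanner could avoid this kind of false positive with a cheap placeholder check before raising an alert. The sketch below is purely illustrative: the regular expression and the list of placeholder hints are assumptions, not how any real secret scanner works.

```python
import re

# Hypothetical pattern: a few well-known variable names followed by a value.
ASSIGNMENT = re.compile(r"""\b(?:API_KEY|TOKEN|SECRET)\s*=\s*["']?([^"';\s]+)""")
# Hypothetical hints that a value is documentation filler rather than a real key.
PLACEHOLDER_HINTS = ("your", "example", "changeme", "placeholder", "xxx")

def looks_like_placeholder(value: str) -> bool:
    """True if the value contains an obvious placeholder word."""
    lowered = value.lower()
    return any(hint in lowered for hint in PLACEHOLDER_HINTS)

def find_suspicious_keys(text: str):
    """Return assigned values that do not look like placeholders."""
    return [
        match.group(1)
        for match in ASSIGNMENT.finditer(text)
        if not looks_like_placeholder(match.group(1))
    ]

print(find_suspicious_keys("API_KEY=put-your-key-here;"))      # []
print(find_suspicious_keys('API_KEY="9f8a7b6c5d4e3f2a1b0c"'))  # ['9f8a7b6c5d4e3f2a1b0c']
```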

New comment by nestorD in "Show HN: Shelly, terminal assistant that translates natural language into shell"

nestorD — Mon, 16 Jun 2025 09:22:08 +0000

I don't use it to avoid reading man pages. Rather, as often with LLMs, this is a faster way to do things I already know how to do. It looks at commands I run in various situations and types them for me, faster than I can remember the name of a flag I use weekly with a PDF processing tool or type 5 consecutive shell commands.

Money-wise, my full usage so far (including running purposely large inputs/outputs to stress test it) has cost me... 19c. And I am not even using the cheapest model available. But you could also run it with a local model.

New comment by nestorD in "Show HN: Shelly, terminal assistant that translates natural language into shell"

nestorD — Sat, 14 Jun 2025 13:03:55 +0000

Yes, it is API-based and uses your last 100 unique shell commands as part of its prompt: it seemed important to remind users that this data does leave their machine. A fork using a local model should be fairly easy to set up.
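For illustration, selecting the last 100 unique commands from a shell history could look like the sketch below. This is a guess at the general approach, not Shelly's actual code, and the function name is hypothetical.

```python
def last_unique_commands(history, limit=100):
    """Return up to `limit` most recent distinct commands, oldest first.

    `history` is a list of command strings ordered from oldest to newest.
    When a command repeats, only its most recent occurrence is kept.
    """
    if limit <= 0:
        return []
    seen = set()
    recent = []
    for command in reversed(history):
        command = command.strip()
        if command and command not in seen:
            seen.add(command)
            recent.append(command)
            if len(recent) == limit:
                break
    recent.reverse()
    return recent

history = ["ls", "cd src", "ls", "git status", "ls", "make test"]
print(last_unique_commands(history, limit=3))  # ['git status', 'ls', 'make test']
```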