<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: timlarshanson</title><link>https://news.ycombinator.com/user?id=timlarshanson</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 24 Apr 2026 08:23:52 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=timlarshanson" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by timlarshanson in "Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model"]]></title><description><![CDATA[
<p>This was very surprising to me, so I just fact-checked this statement (using Kimi K2 Thinking, natch), and it's presently off by a factor of 2-4.  In 2024 China installed 277 GW of solar, i.e. about 0.25 GW per 8 hours.  In the first half of 2025 they installed 210 GW, i.e. about 0.39 GW per 8 hours.<p>Not quite 1 GW per 8 hours, but approaching that figure rapidly!<p>(I'm not sure where the coal plant comes in - really, those numbers should be derated relative to a coal plant, which can run 24/7.)</p>
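For the record, the back-of-envelope arithmetic (assuming installations are spread evenly over each period):

```python
# Back-of-envelope check of the solar install rates quoted above.
# Assumes installation is spread evenly over each period.
rate_2024 = 277 / (365 * 24 / 8)     # GW per 8-hour block, full year 2024
rate_h1_2025 = 210 / (181 * 24 / 8)  # GW per 8-hour block, Jan-Jun 2025

print(round(rate_2024, 2))           # 0.25
print(round(rate_h1_2025, 2))        # 0.39
```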
]]></description><pubDate>Sat, 08 Nov 2025 02:44:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=45853655</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=45853655</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45853655</guid></item><item><title><![CDATA[New comment by timlarshanson in ""Ensuring Accountability for All Agencies" – Executive Order"]]></title><description><![CDATA[
<p>Re the Reapportionment Act of 1929 -- care to elaborate?  Are there figures for "the worst representation in the free world"?<p>My impression is that there are many reasons for the dysfunction of Congress; the media feedback control system (in a literal and metaphorical sense) plays an important role, as do the filibuster, lobbyists, and other corruption.<p>(Aside: in aging, an organism's feedback and homeostatic systems tend to degrade / become simpler with time, which leads to decreased function / cancer etc.  While some degree of refactoring & dead-code cruft-removal is necessary - and hopefully is happening now, as I think most Americans desire - the explicit decline in operational structure is bad.  (Not that you'd want a systems biologist to run the country.))</p>
]]></description><pubDate>Wed, 19 Feb 2025 19:08:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43106060</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=43106060</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43106060</guid></item><item><title><![CDATA[New comment by timlarshanson in "Titans: Learning to Memorize at Test Time"]]></title><description><![CDATA[
<p>Ok, thanks for the clarification.<p>Seems the implicit assumption then is that M(q) -> v 'looks like' or 'is smooth like' the dot product, otherwise 'train on keys, inference on queries' wouldn't work?  (A safe assumption imo with that l2 norm & in general; unsafe if q and k are drawn from different distributions.)<p>Correct me if I'm wrong, but typically k and v are generated via affine projections K, V of the tokens; if M is matrix-valued and there are no forget and remember gates (to somehow approximate the softmax?), then M = V K^-1</p>
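A toy numpy sketch of that last point (shapes are mine; assumes K is square and invertible, so a plain inverse works):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                # toy key/value dimension
K = rng.standard_normal((d, d))      # keys, one per column
V = rng.standard_normal((d, d))      # values, one per column

# A gate-free, matrix-valued memory that maps every stored key to its
# value must satisfy M @ K = V, hence M = V @ K^-1.
M = V @ np.linalg.inv(K)

assert np.allclose(M @ K, V)         # memory reproduces all stored pairs
```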
]]></description><pubDate>Fri, 17 Jan 2025 21:28:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42743468</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=42743468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42743468</guid></item><item><title><![CDATA[New comment by timlarshanson in "Titans: Learning to Memorize at Test Time"]]></title><description><![CDATA[
<p>I doubt it.  This does not seem to be a particularly well-written or well-thought-out paper -- e.g. equations 6 and 7 contradict their descriptions in the sentence below them; the 'theorem' is an assertion.<p>After reading a few times, I gather that, rather than kernelizing or linearizing attention (which has been thoroughly explored in the literature), they are using an MLP to do run-time modelling of the attention operation.  If that's the case (?), which is interesting, sure: 
1 -- Why did they not say this plainly?  
2 -- Why does eq. 12 show the memory MLP being indexed by the key, whereas eq. 15 shows it indexed by the query? 
3 -- What's with all the extra LSTM-esque forget and remember gates?  Meh. Wouldn't trust it without ablations.<p>I guess if an MLP can model a radiance field (NeRF) well, it stands to reason it can approximate attention too.  The Q, K, V projection matrices will need to be learned beforehand using standard training.<p>While the memory & compute savings are clear, it's uncertain whether this helps with reasoning or generalization thereof.  I doubt that too.</p>
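To make point 2 concrete, here is the read/write asymmetry as I understand it, in a toy sketch (my own shapes and a one-hidden-layer memory, not the paper's architecture): the memory is "written" by gradient steps on key->value pairs, then "read" with queries.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, h = 8, 16, 32
keys = rng.standard_normal((n, d))       # eq. 12: memory is *written* with keys
vals = rng.standard_normal((n, d))
W1 = 0.1 * rng.standard_normal((d, h))   # one-hidden-layer memory MLP
W2 = 0.1 * rng.standard_normal((h, d))

def read(x):                             # eq. 15: memory is *read* with queries
    return np.tanh(x @ W1) @ W2

def loss():
    return 0.5 * np.mean((read(keys) - vals) ** 2)

loss_before = loss()
for _ in range(1000):                    # test-time "write": SGD on (k, v) pairs
    z = np.tanh(keys @ W1)
    err = (z @ W2 - vals) / n
    gW2 = z.T @ err
    gW1 = keys.T @ ((err @ W2.T) * (1 - z ** 2))
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2
loss_after = loss()
assert loss_after < loss_before          # memorization improves with training
```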
]]></description><pubDate>Fri, 17 Jan 2025 01:35:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=42733107</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=42733107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42733107</guid></item><item><title><![CDATA[New comment by timlarshanson in "I am rich and have no idea what to do"]]></title><description><![CDATA[
<p>Yes, this bothered me as well - the Department of Government Efficiency, as with all government agencies, is working for the public good in the public interest.  This means everything must default to being open, unless there is a good reason not to be (military, CIA, etc.).<p>I don't trust Elon, and don't see why DOGE should (or could) be secret - unless it's a cover to acquire more power, which seems to be his true objective.  (Recently, at least.)</p>
]]></description><pubDate>Mon, 06 Jan 2025 18:47:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42613831</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=42613831</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42613831</guid></item><item><title><![CDATA[New comment by timlarshanson in "Differential Transformer"]]></title><description><![CDATA[
<p>Yep. From what I've seen, if the head wants to do nothing, it can attend to itself = no inter-token communication.<p>Still, differential attention is pretty interesting & the benchmarking looks good; seems worth a try! It's in the same vein as linear or non-softmax attention, which can also work.<p>Note that there is an error below Eq. 1: W^V should be shape [d_model x d_model], not [d_model x 2*d_model] as in the Q, K matrices.<p>Idea: why not replace the lambda parameterization between softmax operations with something more general, like a matrix or MLP?  E.g.: attention as the affine combination of N softmax attention operations (say, across heads). If the transformer learns an identity matrix here, then you know the original formulation was correct for the data; if it's sparse, these guys were right; if it's something else entirely, then who knows...</p>
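The idea in code, as a rough sketch (names and shapes are mine; differential attention falls out as the special case c = [1, -lambda], standard attention as c = [1]):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def combined_attention(Q, K, V, c):
    # Affine combination of N softmax attention maps.
    # Q, K: [N, T, d] per-branch queries/keys; V: [T, d]; c: [N] weights.
    N, T, d = Q.shape
    A = sum(c[i] * softmax(Q[i] @ K[i].T / np.sqrt(d)) for i in range(N))
    return A @ V

rng = np.random.default_rng(2)
T, d = 5, 8
Q = rng.standard_normal((2, T, d))
K = rng.standard_normal((2, T, d))
V = rng.standard_normal((T, d))

diff = combined_attention(Q, K, V, c=[1.0, -0.3])   # differential attention
std = combined_attention(Q[:1], K[:1], V, c=[1.0])  # vanilla softmax attention
```

Generalizing c to a learned matrix (mixing across heads) or an MLP on the logits is the suggestion above.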
]]></description><pubDate>Wed, 09 Oct 2024 18:52:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=41791335</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=41791335</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41791335</guid></item><item><title><![CDATA[New comment by timlarshanson in "Company builds 500cc ‘one-stroke’ engine"]]></title><description><![CDATA[
<p>Agreed.<p>Also, how does it get started?  Seems like the only force pushing the piston against the axial cam is the fuel explosion.  (Perhaps extra springs to retract the piston for intake?)</p>
]]></description><pubDate>Sat, 15 Jul 2023 18:58:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=36739872</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=36739872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36739872</guid></item><item><title><![CDATA[New comment by timlarshanson in "New lightweight material is stronger than steel"]]></title><description><![CDATA[
<p>Yep.  This is a very interesting material, and of course it's a research prototype -- but it's not very strong.  They list the modulus as 12.7 GPa and the yield strength (= ultimate tensile strength, since the film tears) as 488 MPa.<p>In comparison, polyimide (PMDA-PPD), which is also easily solvent-processable, has a modulus of 8.9 GPa and a yield strength of 350 MPa.<p>Less equal comparisons involve polymers that are molecularly aligned by drawing, spinning, or chemical processes.  Dyneema UHMWPE has a modulus of 110 GPa and an ultimate tensile strength of 3.5 GPa.  Kevlar is similar; it utilizes interlocking hydrogen bonds to convey strength.  Even stronger are glass fibers (>4 GPa tensile strength) or PAN carbon fiber (>6 GPa tensile strength).<p>You of course lose some strength when you make composites out of fibers -- but regardless, this polymer is many times weaker and softer.</p>
]]></description><pubDate>Thu, 03 Feb 2022 02:30:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=30187708</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=30187708</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30187708</guid></item><item><title><![CDATA[New comment by timlarshanson in "Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning"]]></title><description><![CDATA[
<p>I don't follow how punctuated equilibrium fits in here, but I do agree with your general intuition.  Evolution 'likes' spaces that are navigable.  Protein evolution is, in my mind, the paragon of this: even though the space of possible amino acid sequences is tremendously huge, relatively few new folds have been discovered since 2010, and it seems that there are only ~100k of them in nature.  See <a href="https://ebrary.net/44216/health/limits_fold_space" rel="nofollow">https://ebrary.net/44216/health/limits_fold_space</a><p>Proteins get the substrate right, and a handful of folds are sufficient for all the interactions an organism could need -- so evolution can find new solutions quickly.  (It only took hundreds of millions of years for LUCA's parents to figure /that/ out.)<p>It seems that being able to parameterize the problem space such that solutions are plentiful and accessible via random search is nearly equivalent to solving the problem...  In this case, using an ANN to stand in for ('parameterize') organismal development is entirely reasonable (and would hence 'solve' the problem); I look forward to seeing the results of that. But as with the OP, I'm cautious as to the efficiency of backprop vs evolution.</p>
]]></description><pubDate>Tue, 16 Feb 2021 04:15:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=26150869</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=26150869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26150869</guid></item><item><title><![CDATA[New comment by timlarshanson in "Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning"]]></title><description><![CDATA[
<p>I was aware of Neftci's work, but not your result -- I stand corrected!  Given that perspective (LIF networks are causal systems), of course you can reverse them with sufficient memory.  I understand the memory in this case is the input synaptic currents at the time of every spike (i.e. which synapses contributed to the spike).  This is suspiciously similar to spine and dendritic calcium concentrations.  Those variables are usually only stored for a short time - but, that said, the hippocampus (at least) is adept at reverse replay, so there is no reason calcium could not be a proxy for the 'adjoint'.  Hmm.<p>Interesting Maass references too.  Cheers</p>
]]></description><pubDate>Tue, 16 Feb 2021 02:05:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=26150107</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=26150107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26150107</guid></item><item><title><![CDATA[New comment by timlarshanson in "Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning"]]></title><description><![CDATA[
<p>But, if your realistically-spiking, stateful, noisy biological neural network is non-differentiable (which, so far as I know, is true), then how are you going to propagate gradients back through it to update your ANN-approximated learning rule?<p>I suspect that, given the small size of synapses, the algorithmic complexity of learning rules (and there are several) is small.  Hence, you can productively use evolutionary or genetic algorithms to perform this search/optimization - which I think you'd have to, given the lack of gradients, or simply the computational cost.  Plenty of research going on in this field. (Heck, while you're at it, might as well perform a similar search over wiring topologies & recapitulate our own evolution without having to deal with signaling cascades, transport of mRNA & protein along dendrites, metabolic limits, etc.)<p>Anyway, coming from a biological perspective: evolution is still more general than backprop, even if in some domains it's slower.</p>
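A minimal (1+lambda)-style evolutionary search over a non-differentiable objective, to make the point concrete (the 'spiking' fitness is a made-up stand-in, not any real learning-rule benchmark):

```python
import random

def fitness(w):
    # Non-differentiable stand-in for evaluating a spiking network:
    # hard thresholding destroys gradients, but random search still works.
    spikes = sum(1 for wi in w if wi > 0.5)
    return -abs(spikes - 3)              # want exactly 3 units above threshold

def evolve(n_params=8, pop=20, gens=60, sigma=0.2, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(0, 1) for _ in range(n_params)]
    for _ in range(gens):
        kids = [[wi + rng.gauss(0, sigma) for wi in best] for _ in range(pop)]
        kids.append(best)                # elitism: fitness never decreases
        best = max(kids, key=fitness)
    return best

w = evolve()                             # reaches the optimum (fitness 0)
```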
]]></description><pubDate>Mon, 15 Feb 2021 21:32:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=26147727</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=26147727</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26147727</guid></item><item><title><![CDATA[New comment by timlarshanson in "Not everyone has an internal monologue"]]></title><description><![CDATA[
<p>Agreed.  As I've grown older, I spend less time running tight verbal loops in my mind, and more time examining things visually.  It seems more externally-oriented, and allows for better sleep.</p>
]]></description><pubDate>Fri, 31 Jan 2020 04:54:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=22199185</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=22199185</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=22199185</guid></item><item><title><![CDATA[New comment by timlarshanson in "The “sewing machine” for minimally invasive neural recording"]]></title><description><![CDATA[
<p>> This is mostly meant for neuroscience research in animals.<p>Exactly!  And thank you, nice summary.<p>It should be noted, with respect to other comments here as well, that in animals the cranial vault re-closes 4-8 weeks after a craniotomy/craniectomy (rats).  Dead neurons basically never grow back.  Hence 'minimally invasive' refers to the attempt to minimize the brain insult and injury.</p>
]]></description><pubDate>Sat, 13 Apr 2019 06:01:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=19651336</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=19651336</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19651336</guid></item></channel></rss>