<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: timlarshanson</title><link>https://news.ycombinator.com/user?id=timlarshanson</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 24 Apr 2026 08:23:52 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=timlarshanson" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by timlarshanson in "Kimi K2 Thinking, a SOTA open-source trillion-parameter reasoning model"]]></title><description><![CDATA[
<p>This was very surprising to me, so I just fact-checked this statement (using Kimi K2 Thinking, natch), and it's presently off by a factor of 2-4.  In 2024 China installed 277 GW of solar, i.e. about 0.25 GW per 8 hours.  In the first half of 2025 they installed 210 GW, i.e. about 0.39 GW per 8 hours.<p>Not quite 1 GW per 8 hours, but approaching that figure rapidly!<p>(I'm not sure where the coal plant comes in - really, those numbers should be derated relative to a coal plant, which can run 24/7.)</p>
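For the record, the back-of-envelope arithmetic (assuming installations are spread evenly over each period):

```python
# Back-of-envelope check of the solar install rates quoted above.
# Assumes installation is spread evenly over each period.
rate_2024 = 277 / (365 * 24 / 8)     # GW per 8-hour block, full year 2024
rate_h1_2025 = 210 / (181 * 24 / 8)  # GW per 8-hour block, Jan-Jun 2025

print(round(rate_2024, 2))           # 0.25
print(round(rate_h1_2025, 2))        # 0.39
```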
]]></description><pubDate>Sat, 08 Nov 2025 02:44:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=45853655</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=45853655</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45853655</guid></item><item><title><![CDATA[New comment by timlarshanson in ""Ensuring Accountability for All Agencies" – Executive Order"]]></title><description><![CDATA[
<p>Re the Reapportionment Act of 1929 -- care to elaborate?  Are there figures for "the worst representation in the free world"?<p>My impression is that there are many reasons for the dysfunction of Congress; the media feedback control system (in a literal and metaphorical sense) plays an important role, as do the filibuster, lobbyists, and other corruption.<p>(Aside: in aging, an organism's feedback and homeostatic systems tend to degrade / become simpler with time, which leads to decreased function / cancer etc.  While some degree of refactoring & dead-code cruft-removal is necessary - and hopefully is happening now, as I think most Americans desire - the explicit decline in operational structure is bad.  (Not that you'd want a systems biologist to run the country.))</p>
]]></description><pubDate>Wed, 19 Feb 2025 19:08:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43106060</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=43106060</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43106060</guid></item><item><title><![CDATA[New comment by timlarshanson in "Titans: Learning to Memorize at Test Time"]]></title><description><![CDATA[
<p>Ok, thanks for the clarification.<p>Seems the implicit assumption then is that M(q) -> v 'looks like' or 'is smooth like' the dot product, otherwise 'train on keys, inference on queries' wouldn't work?  (A safe assumption imo with that l2 norm & in general; unsafe if q and k are drawn from different distributions.)<p>Correct me if I'm wrong, but typically k and v are generated via affine projections K, V of the tokens; if M is matrix-valued and there are no forget and remember gates (to somehow approximate the softmax?), then M = V K^-1</p>
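A toy numpy sketch of that last point (shapes are mine; assumes K is square and invertible, so a plain inverse works):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                # toy key/value dimension
K = rng.standard_normal((d, d))      # keys, one per column
V = rng.standard_normal((d, d))      # values, one per column

# A gate-free, matrix-valued memory that maps every stored key to its
# value must satisfy M @ K = V, hence M = V @ K^-1.
M = V @ np.linalg.inv(K)

assert np.allclose(M @ K, V)         # memory reproduces all stored pairs
```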
]]></description><pubDate>Fri, 17 Jan 2025 21:28:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42743468</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=42743468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42743468</guid></item><item><title><![CDATA[New comment by timlarshanson in "Titans: Learning to Memorize at Test Time"]]></title><description><![CDATA[
<p>I doubt it.  This does not seem to be a particularly well-written or well-thought-out paper -- e.g. equations 6 and 7 contradict their descriptions in the sentence below them; the 'theorem' is an assertion.<p>After reading a few times, I gather that, rather than kernelizing or linearizing attention (which has been thoroughly explored in the literature), they are using an MLP to do run-time modelling of the attention operation.  If that's the case (?), which is interesting, sure: 
1 -- Why did they not say this plainly?  
2 -- Why does eq. 12 show the memory MLP being indexed by the key, whereas eq. 15 shows it indexed by the query? 
3 -- What's with all the extra LSTM-esque forget and remember gates?  Meh. Wouldn't trust it without ablations.<p>I guess if an MLP can model a radiance field (NeRF) well, it stands to reason it can approximate attention too.  The Q, K, V projection matrices will need to be learned beforehand using standard training.<p>While the memory & compute savings are clear, it's uncertain whether this helps with reasoning or generalization thereof.  I doubt that too.</p>
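To make point 2 concrete, here is the read/write asymmetry as I understand it, in a toy sketch (my own shapes and a one-hidden-layer memory, not the paper's architecture): the memory is "written" by gradient steps on key->value pairs, then "read" with queries.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n, h = 8, 16, 32
keys = rng.standard_normal((n, d))       # eq. 12: memory is *written* with keys
vals = rng.standard_normal((n, d))
W1 = 0.1 * rng.standard_normal((d, h))   # one-hidden-layer memory MLP
W2 = 0.1 * rng.standard_normal((h, d))

def read(x):                             # eq. 15: memory is *read* with queries
    return np.tanh(x @ W1) @ W2

def loss():
    return 0.5 * np.mean((read(keys) - vals) ** 2)

loss_before = loss()
for _ in range(1000):                    # test-time "write": SGD on (k, v) pairs
    z = np.tanh(keys @ W1)
    err = (z @ W2 - vals) / n
    gW2 = z.T @ err
    gW1 = keys.T @ ((err @ W2.T) * (1 - z ** 2))
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2
loss_after = loss()
assert loss_after < loss_before          # memorization improves with training
```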
]]></description><pubDate>Fri, 17 Jan 2025 01:35:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=42733107</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=42733107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42733107</guid></item><item><title><![CDATA[New comment by timlarshanson in "I am rich and have no idea what to do"]]></title><description><![CDATA[
<p>Yes, this bothered me as well - the Department of Government Efficiency, as with all government agencies, is working for the public good in the public interest.  This means everything must default to being open, unless there is a good reason not to be (military, CIA, etc.).<p>I don't trust Elon, and don't see why DOGE should (or could) be secret - unless it's a cover to acquire more power, which seems to be his true objective.  (Recently, at least.)</p>
]]></description><pubDate>Mon, 06 Jan 2025 18:47:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=42613831</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=42613831</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42613831</guid></item><item><title><![CDATA[New comment by timlarshanson in "Differential Transformer"]]></title><description><![CDATA[
<p>Yep. From what I've seen, if the head wants to do nothing, it can attend to itself = no inter-token communication.<p>Still, differential attention is pretty interesting & the benchmarking looks good; seems worth a try! It's in the same vein as linear or non-softmax attention, which can also work.<p>Note that there is an error below Eq. 1: W^V should be shape [d_model x d_model], not [d_model x 2*d_model] as in the Q, K matrices.<p>Idea: why not replace the lambda parameterization between softmax operations with something more general, like a matrix or MLP?  E.g.: attention as the affine combination of N softmax attention operations (say, across heads). If the transformer learns an identity matrix here, then you know the original formulation was correct for the data; if it's sparse, these guys were right; if it's something else entirely, then who knows...</p>
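The idea in code, as a rough sketch (names and shapes are mine; differential attention falls out as the special case c = [1, -lambda], standard attention as c = [1]):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def combined_attention(Q, K, V, c):
    # Affine combination of N softmax attention maps.
    # Q, K: [N, T, d] per-branch queries/keys; V: [T, d]; c: [N] weights.
    N, T, d = Q.shape
    A = sum(c[i] * softmax(Q[i] @ K[i].T / np.sqrt(d)) for i in range(N))
    return A @ V

rng = np.random.default_rng(2)
T, d = 5, 8
Q = rng.standard_normal((2, T, d))
K = rng.standard_normal((2, T, d))
V = rng.standard_normal((T, d))

diff = combined_attention(Q, K, V, c=[1.0, -0.3])   # differential attention
std = combined_attention(Q[:1], K[:1], V, c=[1.0])  # vanilla softmax attention
```

Generalizing c to a learned matrix (mixing across heads) or an MLP on the logits is the suggestion above.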
]]></description><pubDate>Wed, 09 Oct 2024 18:52:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=41791335</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=41791335</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41791335</guid></item><item><title><![CDATA[New comment by timlarshanson in "Company builds 500cc ‘one-stroke’ engine"]]></title><description><![CDATA[
<p>Agreed.<p>Also, how does it get started?  Seems like the only force pushing the piston against the axial cam is the fuel explosion.  (Perhaps extra springs to retract the piston for intake?)</p>
]]></description><pubDate>Sat, 15 Jul 2023 18:58:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=36739872</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=36739872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36739872</guid></item><item><title><![CDATA[New comment by timlarshanson in "New lightweight material is stronger than steel"]]></title><description><![CDATA[
<p>Yep.  This is a very interesting material, and of course it's a research prototype -- but it's not very strong.  They list the modulus as 12.7 GPa and the yield strength (= ultimate tensile strength, since the film tears) as 488 MPa.<p>In comparison, polyimide (PMDA-PPD), which is also easily solvent-processable, has a modulus of 8.9 GPa and a yield strength of 350 MPa.<p>Less equal comparisons involve polymers that are molecularly aligned by drawing, spinning, or chemical processes.  Dyneema UHMWPE has a modulus of 110 GPa and an ultimate tensile strength of 3.5 GPa.  Kevlar is similar; it utilizes interlocking hydrogen bonds to convey strength.  Even stronger are glass fibers (>4 GPa tensile strength) or PAN carbon fiber (>6 GPa tensile strength).<p>You of course lose some strength when you make composites out of fibers -- but regardless, this polymer is many times weaker and softer.</p>
]]></description><pubDate>Thu, 03 Feb 2022 02:30:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=30187708</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=30187708</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30187708</guid></item><item><title><![CDATA[New comment by timlarshanson in "Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning"]]></title><description><![CDATA[
<p>I don't follow how punctuated equilibrium fits in here, but I do agree with your general intuition.  Evolution 'likes' spaces that are navigable.  Protein evolution is, in my mind, the paragon of this: even though the space of possible amino acid sequences is tremendously huge, relatively few new folds have been discovered since 2010, and it seems that there are only ~100k of them in nature.  See <a href="https://ebrary.net/44216/health/limits_fold_space" rel="nofollow">https://ebrary.net/44216/health/limits_fold_space</a><p>Proteins get the substrate right, and a handful of folds are sufficient for all the interactions an organism could need -- so evolution can find new solutions quickly.  (It only took hundreds of millions of years for LUCA's parents to figure /that/ out.)<p>It seems that being able to parameterize the problem space such that solutions are plentiful and accessible via random search is nearly equivalent to solving the problem...  In this case, using an ANN to stand in for ('parameterize') organismal development is entirely reasonable (and would hence 'solve' the problem); I look forward to seeing the results of that. But as with the OP, I'm cautious as to the efficiency of backprop vs evolution.</p>
]]></description><pubDate>Tue, 16 Feb 2021 04:15:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=26150869</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=26150869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26150869</guid></item><item><title><![CDATA[New comment by timlarshanson in "Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning"]]></title><description><![CDATA[
<p>I was aware of Neftci's work, but not your result -- I stand corrected!  Given that perspective (LIF networks are causal systems), of course you can reverse them with sufficient memory.  I understand the memory in this case is the input synaptic currents at the time of every spike (i.e. which synapses contributed to the spike).  This is suspiciously similar to spine and dendritic calcium concentrations.  Those variables are usually only stored for a short time - but, that said, the hippocampus (at least) is adept at reverse replay, so there is no reason calcium could not be a proxy for the 'adjoint'.  Hmm.<p>Interesting Maass references too.  Cheers</p>
]]></description><pubDate>Tue, 16 Feb 2021 02:05:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=26150107</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=26150107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26150107</guid></item><item><title><![CDATA[New comment by timlarshanson in "Don't Mess with Backprop: Doubts about Biologically Plausible Deep Learning"]]></title><description><![CDATA[
<p>But, if your realistically-spiking, stateful, noisy biological neural network is non-differentiable (which, so far as I know, is true), then how are you going to propagate gradients back through it to update your ANN-approximated learning rule?<p>I suspect that, given the small size of synapses, the algorithmic complexity of learning rules (and there are several) is small.  Hence, you can productively use evolutionary or genetic algorithms to perform this search/optimization - which I think you'd have to, given the lack of gradients, or simply the computational cost.  Plenty of research going on in this field. (Heck, while you're at it, might as well perform a similar search over wiring topologies & recapitulate our own evolution without having to deal with signaling cascades, transport of mRNA & protein along dendrites, metabolic limits, etc.)<p>Anyway, coming from a biological perspective: evolution is still more general than backprop, even if in some domains it's slower.</p>
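A minimal (1+lambda)-style evolutionary search over a non-differentiable objective, to make the point concrete (the 'spiking' fitness is a made-up stand-in, not any real learning-rule benchmark):

```python
import random

def fitness(w):
    # Non-differentiable stand-in for evaluating a spiking network:
    # hard thresholding destroys gradients, but random search still works.
    spikes = sum(1 for wi in w if wi > 0.5)
    return -abs(spikes - 3)              # want exactly 3 units above threshold

def evolve(n_params=8, pop=20, gens=60, sigma=0.2, seed=0):
    rng = random.Random(seed)
    best = [rng.uniform(0, 1) for _ in range(n_params)]
    for _ in range(gens):
        kids = [[wi + rng.gauss(0, sigma) for wi in best] for _ in range(pop)]
        kids.append(best)                # elitism: fitness never decreases
        best = max(kids, key=fitness)
    return best

w = evolve()                             # reaches the optimum (fitness 0)
```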
]]></description><pubDate>Mon, 15 Feb 2021 21:32:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=26147727</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=26147727</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26147727</guid></item><item><title><![CDATA[New comment by timlarshanson in "Not everyone has an internal monologue"]]></title><description><![CDATA[
<p>Agreed.  As I've grown older, I spend less time running tight verbal loops in my mind, and more time examining things visually.  It seems more externally-oriented, and allows for better sleep.</p>
]]></description><pubDate>Fri, 31 Jan 2020 04:54:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=22199185</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=22199185</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=22199185</guid></item><item><title><![CDATA[New comment by timlarshanson in "The “sewing machine” for minimally invasive neural recording"]]></title><description><![CDATA[
<p>> This is mostly meant for neuroscience research in animals.<p>Exactly!  And thank you, nice summary.<p>It should be noted, with respect to other comments here as well, that in animals the cranial vault re-closes 4-8 weeks after a craniotomy/craniectomy (rats).  Dead neurons basically never grow back.  Hence 'minimally invasive' refers to the attempt to minimize the brain insult and injury.</p>
]]></description><pubDate>Sat, 13 Apr 2019 06:01:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=19651336</link><dc:creator>timlarshanson</dc:creator><comments>https://news.ycombinator.com/item?id=19651336</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=19651336</guid></item></channel></rss>