<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: flebron</title><link>https://news.ycombinator.com/user?id=flebron</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 06 May 2026 18:02:42 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=flebron" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by flebron in "Accelerating Gemma 4: faster inference with multi-token prediction drafters"]]></title><description><![CDATA[
<p>The standard way of doing MTP is to run the drafter autoregressively for k steps, and <i>then</i> (not concurrently) use the larger model to verify all k of those tokens in a single pass. The larger model can then accept a prefix of those k tokens, and in any case it generates one more token (which is needed in case you accepted zero tokens from the drafter). The larger model can effectively use this k as a "batch" dimension, amortizing the cost of loading its large weights. Meanwhile the drafter is much smaller, so it's fine for _it_ to be autoregressive, as long as the main model runs in parallel.</p>
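<p>A minimal sketch of the greedy accept rule, in Haskell (illustrative names, not any particular implementation; assumes the verifier returns its own greedy prediction at each of the k+1 positions):<p><pre><code>-- Greedy speculative-decoding accept rule (sketch).
--   drafts: k tokens proposed autoregressively by the small drafter
--   verify: k+1 tokens: the large model's greedy prediction at every
--           position, computed in one parallel pass over the drafts
-- Returns the accepted prefix plus one token from the large model, so
-- even a fully rejected draft still yields one new token.
acceptDraft :: Eq tok => [tok] -> [tok] -> [tok]
acceptDraft drafts verify =
  let agree = length (takeWhile (uncurry (==)) (zip drafts verify))
  in take agree drafts ++ [verify !! agree]
</code></pre>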
]]></description><pubDate>Tue, 05 May 2026 20:11:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48027836</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=48027836</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48027836</guid></item><item><title><![CDATA[New comment by flebron in "Generalised plusequals"]]></title><description><![CDATA[
<p>The website asks what they do in Haskell. The answer is that property modification and reading, as well as very powerful traversal constructs, use lenses (<a href="https://hackage.haskell.org/package/lens" rel="nofollow">https://hackage.haskell.org/package/lens</a> , tutorial at <a href="https://hackage.haskell.org/package/lens-tutorial-1.0.5/docs/Control-Lens-Tutorial.html" rel="nofollow">https://hackage.haskell.org/package/lens-tutorial-1.0.5/docs...</a>).</p>
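<p>A small sketch of what that looks like (my own hypothetical Person/Address records, using makeLenses from the lens package):<p><pre><code>{-# LANGUAGE TemplateHaskell #-}
import Control.Lens
import Data.Char (toUpper)

data Address = Address { _city :: String } deriving Show
data Person  = Person  { _name :: String, _address :: Address } deriving Show
makeLenses ''Address
makeLenses ''Person

main :: IO ()
main = do
  let p = Person "Ada" (Address "London")
  print (p ^. address . city)            -- read a nested property
  print (p & address . city .~ "Paris")  -- set a nested property
  print (p & name %~ map toUpper)        -- modify with a function
</code></pre>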
]]></description><pubDate>Sat, 25 Apr 2026 01:16:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47897754</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=47897754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47897754</guid></item><item><title><![CDATA[New comment by flebron in "A list is a monad"]]></title><description><![CDATA[
<p>I think the point is that a monad is a useful concept _purely_ because of what it _allows_ you to do, and _not_ because of anything syntactical. Those rules that you're glossing over there, the commutative squares, are precisely what lets us have powerful intuitions about these objects. The type signatures matter a lot less. If, for example, you don't have functoriality (which fails for `std::vector`, since `std::vector<bool>` is special-cased), you lose the ability to reason powerfully about abstract algorithms.<p>Thus, explaining the syntax and where the type variables go is explaining the thing least relevant to the power and importance of monads. It's certainly easy to showcase both the syntax and the list and Maybe monads; that's part of the "monad tutorial fallacy". Gaining intuition for how to think about monads _in general_ is a lot harder and requires practice. Like, yes, list and Maybe are "containers", but is `(->) t` a container? Is `IO`? How do these compose, if at all? What is this about "effect" semantics, "I thought monads were just burritos/containers"? etc. These are the hard questions, both conceptually and pedagogically. Yes, you need to know the syntax to use monads in any given programming language, but knowing what scabbard your knife fits in doesn't give you the skill to use the knife :)</p>
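<p>To make the `(->) t` question concrete, here is a minimal sketch (my own example) of do-notation in the function monad, where a "value" is a computation reading a shared input rather than anything container-like:<p><pre><code>-- (->) t is a Monad: do-notation threads the shared input t.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Both maximum and mean implicitly receive the same list argument.
spread :: [Double] -> Double
spread = do
  m  <- maximum
  mu <- mean
  pure (m - mu)

-- spread [1, 2, 9] == 9.0 - 4.0 == 5.0
</code></pre>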
]]></description><pubDate>Thu, 03 Jul 2025 01:55:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=44450885</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=44450885</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44450885</guid></item><item><title><![CDATA[New comment by flebron in "Attention Wasn't All We Needed"]]></title><description><![CDATA[
<p>This is an excellent summary of these techniques :) I like that every single one comes with an example implementation, with shape comments on the tensors. Thanks Stephen!</p>
]]></description><pubDate>Fri, 23 May 2025 19:57:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44076071</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=44076071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44076071</guid></item><item><title><![CDATA[New comment by flebron in "Google is winning on every AI front"]]></title><description><![CDATA[
<p>Perhaps this chapter can help? <a href="https://jax-ml.github.io/scaling-book/tpus/" rel="nofollow">https://jax-ml.github.io/scaling-book/tpus/</a><p>It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (<a href="https://openxla.org/xla/operation_semantics" rel="nofollow">https://openxla.org/xla/operation_semantics</a>), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).</p>
]]></description><pubDate>Sat, 12 Apr 2025 15:39:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=43665307</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=43665307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43665307</guid></item><item><title><![CDATA[New comment by flebron in "You could have designed state of the art positional encoding"]]></title><description><![CDATA[
<p>All of them are vectors of embedded representations of tokens. In a transformer, you want to compute the inner product between a query (the token that is doing the attending) and a key (the token that is being attended to). An inductive bias we have is that the neural network's performance will be better if this inner product depends on the relative distance between the query token's position and the key token's position. We thus encode each one with positional information, in such a way that (for RoPE at least) the inner product depends only on the distance between these tokens, and not on their absolute positions in the input sentence.</p>
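<p>For the two-dimensional case, the identity is a one-liner (a sketch; R_theta denotes the 2x2 rotation by angle theta, and the general case applies this block-diagonally to pairs of coordinates):<p><pre><code>q_m = R_{m\theta} q                  (query at position m)
k_n = R_{n\theta} k                  (key at position n)

\langle q_m, k_n \rangle = q^T R_{m\theta}^T R_{n\theta} k
                         = q^T R_{(n-m)\theta} k
</code></pre>which depends only on n - m.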
]]></description><pubDate>Mon, 18 Nov 2024 18:29:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42175364</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=42175364</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42175364</guid></item><item><title><![CDATA[New comment by flebron in "What Is ChatGPT Doing and Why Does It Work? (2023)"]]></title><description><![CDATA[
<p>No. In the common use of the word fine-tuning, one is in the supervised learning scenario. One has an input prompt and an output sentence, and one teaches the model to say that output in response to that prompt. In the reinforcement learning scenario, one has a prompt and a way of rewarding the model for different outputs. One can have, for instance, a reward model that assigns a reward to a given model output. One could also have a pairwise reward model, where the learner is sampled with that prompt twice (with different RNGs), and the reward model gives a reward based on the better of the two samples. You could also have humans give these pointwise or pairwise rewards.<p>In essence, one is not telling the model "This. This is what you should output next time." but rather "I liked this reply. Have a cookie." The behaviors that you can learn in RL are more subtle, but you get a lot less information per step. That's because, in a causal language modeling objective, when I tell you "For the prompt X, you should output exactly Y[0...m)", you get a gradient for P(Y[0] | X), another one for P(Y[1] | X Y[0..1)), another for P(Y[2] | X Y[0..2)), another for P(Y[3] | X Y[0..3)), and so on. It's much more step-by-step guidance than the sequence-level reward you get in the RL framework. In RL, I'd give you a cookie for P(Y | X). What part of Y made me give you that cookie? Was there even such a part? Was it perhaps some internal representation that made everything in Y better? That's for the model to learn.</p>
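<p>Schematically (a sketch, using the same Y[0..t) prefix notation; the RL gradient shown is plain REINFORCE, ignoring baselines and other variance reduction):<p><pre><code>SFT:  L(\theta) = - \sum_{t=0}^{m-1} \log P_\theta(Y[t] | X, Y[0..t))
      -- one gradient term per output token

RL:   J(\theta) = E_{Y ~ P_\theta(.|X)} [ R(X, Y) ]
      \nabla_\theta J = E [ R(X, Y) \nabla_\theta \log P_\theta(Y | X) ]
      -- one scalar reward for the whole sampled sequence
</code></pre>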
]]></description><pubDate>Tue, 18 Jun 2024 16:45:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=40719780</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=40719780</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40719780</guid></item><item><title><![CDATA[New comment by flebron in "Almost no one pays a 6% real-estate commission except Americans"]]></title><description><![CDATA[
<p>"Almost nobody except X does Y." and "Z does Y, with Z != X" are consistent. Is your disagreement entirely due to the (possibly nil) distinction between "Almost nobody except X does Y" and "Almost nobody does Y -- except X", which is how the article is worded? If so, at least one native English speaker will disagree.<p>The poster you're replying to seems to be objecting more to how the "Wrong." parent comment was worded. There was no need to be that confrontational when adding a point of information ("Z does Y, with Z != X") that is most likely consistent with the thread title and with the article content.</p>
]]></description><pubDate>Fri, 17 Nov 2023 18:29:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=38307792</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=38307792</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38307792</guid></item><item><title><![CDATA[New comment by flebron in "90% of Kidnappings in São Paulo result from dates on Tinder and similar apps"]]></title><description><![CDATA[
<p>I don't see a link between that Bloomberg article and these kidnappings happening mostly to married men cheating. Nor is it the case that "love motels" are primarily used for cheating. We have plenty of those in Argentina. As the Bloomberg article indicates, they serve a social need because of the traditional multigenerational family homes people live in.</p>
]]></description><pubDate>Thu, 02 Mar 2023 01:08:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=34990644</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=34990644</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34990644</guid></item><item><title><![CDATA[New comment by flebron in "YouTube CEO Susan Wojcicki is stepping down"]]></title><description><![CDATA[
<p>I think k8sToGo's point is that she never "joined Alphabet": Alphabet did not exist when she joined Larry and Sergey. She joined around the time of Google's founding as a company, possibly back when the search engine was still named BackRub. Alphabet would be created more than a decade later.</p>
]]></description><pubDate>Thu, 16 Feb 2023 17:41:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=34822162</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=34822162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34822162</guid></item><item><title><![CDATA[New comment by flebron in "2000 Years of Matrix Multiplication"]]></title><description><![CDATA[
<p>The trifecta of:<p><pre><code>  * There's almost always a simple geometric intuition, and low-dimensional intuition can get you quite far even in high dimensional cases.
  * You can surprisingly often get by with closing your eyes and saying "my problem is linear" three times. See: All of neural networks.
  * Linear problems have practically all nice properties you could ever ask of any function.
</code></pre>
has made linear algebra by far the best bang-for-the-buck mathematics topic I've studied in my life. Close behind is asymptotic analysis.</p>
]]></description><pubDate>Sat, 04 Feb 2023 20:10:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=34657703</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=34657703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34657703</guid></item><item><title><![CDATA[New comment by flebron in "Archaeologists ask Netflix to reclassify Graham Hancock’s docuseries as fiction"]]></title><description><![CDATA[
<p>That seems like a weird use of the word. By that notion, every convicted criminal is a victim, because the court imposed a sentence that they, presumably, do not like. This removes practically all meaning from the word "victim". In common parlance, suffering the well-understood consequences of one's own actions does not make one a victim.</p>
]]></description><pubDate>Sun, 04 Dec 2022 23:26:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=33859201</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=33859201</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33859201</guid></item><item><title><![CDATA[New comment by flebron in "A Sun-like star orbiting a black hole"]]></title><description><![CDATA[
<p>pc = Parsec :) <a href="https://en.wikipedia.org/wiki/Parsec" rel="nofollow">https://en.wikipedia.org/wiki/Parsec</a><p>I don't know what the G stands for. Possibly luminosity of some kind?</p>
]]></description><pubDate>Tue, 08 Nov 2022 16:45:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=33521079</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=33521079</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33521079</guid></item><item><title><![CDATA[New comment by flebron in "Gödel, Escher, Bach: an in-depth explainer"]]></title><description><![CDATA[
<p>Well, most interesting properties of computer programs are, in general over all programs, undecidable (<a href="https://en.wikipedia.org/wiki/Rice%27s_theorem" rel="nofollow">https://en.wikipedia.org/wiki/Rice%27s_theorem</a>). Undecidability is a notion closely related to unprovability (<a href="https://en.wikipedia.org/wiki/Undecidable_problem#Relationship_with_G%C3%B6del's_incompleteness_theorem" rel="nofollow">https://en.wikipedia.org/wiki/Undecidable_problem#Relationsh...</a>).</p>
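<p>For reference, one standard statement of Rice's theorem:<p><pre><code>For any property P of partial computable functions with
\emptyset \neq P \neq { all partial computable functions },
the index set { e : \varphi_e \in P } is undecidable.
</code></pre>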
]]></description><pubDate>Wed, 14 Sep 2022 02:51:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=32832763</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=32832763</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32832763</guid></item><item><title><![CDATA[New comment by flebron in "Examples of common false beliefs in mathematics"]]></title><description><![CDATA[
<p>Consider a disconnected domain (say, the union of a few disjoint open balls in R^n), and f constant on each connected component but with a different value on each ball. The differential is indeed 0 everywhere in the entire domain, yet f is not constant.</p>
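<p>In symbols, a minimal instance of that construction:<p><pre><code>U = B(0, 1) \cup B(3 e_1, 1) \subset R^n     (two disjoint open balls)
f = 0 on B(0, 1),    f = 1 on B(3 e_1, 1)

df = 0 everywhere on U, but f is not constant.
"df = 0" only forces f to be locally constant; connectedness of the
domain is what upgrades "locally constant" to "constant".
</code></pre>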
]]></description><pubDate>Mon, 31 Jan 2022 10:14:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=30146461</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=30146461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30146461</guid></item><item><title><![CDATA[Reversing an integer hash function]]></title><description><![CDATA[
<p>Article URL: <a href="https://taxicat1.github.io/hash6432shift_inversion.html">https://taxicat1.github.io/hash6432shift_inversion.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=29937844">https://news.ycombinator.com/item?id=29937844</a></p>
<p>Points: 87</p>
<p># Comments: 16</p>
]]></description><pubDate>Fri, 14 Jan 2022 18:06:31 +0000</pubDate><link>https://taxicat1.github.io/hash6432shift_inversion.html</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=29937844</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=29937844</guid></item><item><title><![CDATA[New comment by flebron in "Monads are monoids in the category of endofunctors"]]></title><description><![CDATA[
<p>I've always found the definition of monoid objects in a category of endofunctors to be tougher to grasp than the Kleisli-category definition of a monad, at least if one is being formal about "monoid object" and checking that the necessary diagrams commute.<p>Given a functor m on Hask, we define a squiggly arrow ~>, where a ~> b is a -> m b. If these ~> are the arrows in some category (which we call the Kleisli category for m), then we call m a monad. Here id :: a ~> a in that category is what we call return :: a -> m a, and the composition (.) :: (b ~> c) -> (a ~> b) -> (a ~> c) in that category is used to create bind :: m a -> (a -> m b) -> m b, via bind x f = (.) f (const x) (return ()), where (.) is the squiggly-arrow composition we just mentioned (and const x ignores its argument and returns x). Bind is what's spelled >>= in Haskell, and is what's behind the "x <- f" do-notation.</p>
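<p>Spelled out for Maybe, without assuming >>= already exists (a sketch with made-up names):<p><pre><code>-- Kleisli composition for Maybe: (b ~> c) -> (a ~> b) -> (a ~> c),
-- where a ~> b means a -> Maybe b.
compK :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c)
compK g f = \a -> case f a of
                    Nothing -> Nothing
                    Just b  -> g b

-- The identity arrow a ~> a is return.
returnK :: a -> Maybe a
returnK = Just

-- bind, recovered exactly by the recipe above.
bindK :: Maybe a -> (a -> Maybe b) -> Maybe b
bindK x f = compK f (const x) (returnK ())
</code></pre>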
]]></description><pubDate>Fri, 10 Sep 2021 04:44:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=28477901</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=28477901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28477901</guid></item><item><title><![CDATA[New comment by flebron in "Microsoft's forthcoming Minecraft Education Edition is written in C++"]]></title><description><![CDATA[
<p>Yes.</p>
]]></description><pubDate>Tue, 26 Jan 2016 17:33:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=10974584</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=10974584</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=10974584</guid></item><item><title><![CDATA[New comment by flebron in "Show HN: Intuitive nutrition information"]]></title><description><![CDATA[
<p>"1 bowl of rice" gets interpreted as 1 cup of Sake?</p>
]]></description><pubDate>Tue, 18 Aug 2015 06:48:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=10077871</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=10077871</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=10077871</guid></item><item><title><![CDATA[New comment by flebron in "Who owns copyright to Deep Dream images?"]]></title><description><![CDATA[
<p>To a sufficiently advanced alien species, _we_ are neural networks fed with some pictures (and sounds, smells, touches, and tastes) :)</p>
]]></description><pubDate>Wed, 22 Jul 2015 01:07:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=9927081</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=9927081</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=9927081</guid></item></channel></rss>