<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: flebron</title><link>https://news.ycombinator.com/user?id=flebron</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 06 May 2026 18:02:42 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=flebron" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by flebron in "Accelerating Gemma 4: faster inference with multi-token prediction drafters"]]></title><description><![CDATA[
<p>The standard way of doing MTP is to run the drafter autoregressively for k steps, and <i>then</i> (not concurrently) use the larger model to verify all k of those tokens in a single pass. The larger model can then accept a prefix of those k tokens, and in any case it generates one more token (which is needed in case you accepted zero tokens from the drafter). The larger model can effectively use this k as a "batch" dimension, amortizing the cost of loading its large weights. Meanwhile the drafter is much smaller, so it's fine for _it_ to be autoregressive, as long as the main model runs in parallel.</p>
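<p>A minimal sketch of the greedy accept rule, in Haskell (illustrative names, not any particular implementation; assumes the verifier returns its own greedy prediction at each of the k+1 positions):<p><pre><code>-- Greedy speculative-decoding accept rule (sketch).
--   drafts: k tokens proposed autoregressively by the small drafter
--   verify: k+1 tokens: the large model's greedy prediction at every
--           position, computed in one parallel pass over the drafts
-- Returns the accepted prefix plus one token from the large model, so
-- even a fully rejected draft still yields one new token.
acceptDraft :: Eq tok => [tok] -> [tok] -> [tok]
acceptDraft drafts verify =
  let agree = length (takeWhile (uncurry (==)) (zip drafts verify))
  in take agree drafts ++ [verify !! agree]
</code></pre>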
]]></description><pubDate>Tue, 05 May 2026 20:11:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48027836</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=48027836</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48027836</guid></item><item><title><![CDATA[New comment by flebron in "Generalised plusequals"]]></title><description><![CDATA[
<p>The website asks what they do in Haskell. The answer is that property modification and reading, as well as very powerful traversal constructs, use lenses (<a href="https://hackage.haskell.org/package/lens" rel="nofollow">https://hackage.haskell.org/package/lens</a> , tutorial at <a href="https://hackage.haskell.org/package/lens-tutorial-1.0.5/docs/Control-Lens-Tutorial.html" rel="nofollow">https://hackage.haskell.org/package/lens-tutorial-1.0.5/docs...</a>).</p>
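<p>A small sketch of what that looks like (my own hypothetical Person/Address records, using makeLenses from the lens package):<p><pre><code>{-# LANGUAGE TemplateHaskell #-}
import Control.Lens
import Data.Char (toUpper)

data Address = Address { _city :: String } deriving Show
data Person  = Person  { _name :: String, _address :: Address } deriving Show
makeLenses ''Address
makeLenses ''Person

main :: IO ()
main = do
  let p = Person "Ada" (Address "London")
  print (p ^. address . city)            -- read a nested property
  print (p & address . city .~ "Paris")  -- set a nested property
  print (p & name %~ map toUpper)        -- modify with a function
</code></pre>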
]]></description><pubDate>Sat, 25 Apr 2026 01:16:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47897754</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=47897754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47897754</guid></item><item><title><![CDATA[New comment by flebron in "A list is a monad"]]></title><description><![CDATA[
<p>I think the point is that a monad is a useful concept _purely_ because of what it _allows_ you to do, and _not_ because of anything syntactical. Those rules that you're glossing over there, the commutative squares, are precisely what lets us have powerful intuitions about these objects. The type signatures matter a lot less. If, for example, you don't have functoriality (which fails for `std::vector`, since `std::vector<bool>` is special-cased), you lose the ability to reason powerfully about abstract algorithms.<p>Thus, explaining the syntax and where the type variables go is explaining the thing least relevant to the power and importance of monads. It's certainly easy to showcase both the syntax and the list and Maybe monads; that's part of the "monad tutorial fallacy". Gaining intuition for how to think about monads _in general_ is a lot harder and requires practice. Like, yes, list and Maybe are "containers", but is `(->) t` a container? Is `IO`? How do these compose, if at all? What is this about "effect" semantics, "I thought monads were just burritos/containers"? etc. These are the hard questions, both conceptually and pedagogically. Yes, you need to know the syntax to use monads in any given programming language, but knowing what scabbard your knife fits in doesn't give you the skill to use the knife :)</p>
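<p>To make the `(->) t` question concrete, here is a minimal sketch (my own example) of do-notation in the function monad, where a "value" is a computation reading a shared input rather than anything container-like:<p><pre><code>-- (->) t is a Monad: do-notation threads the shared input t.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- Both maximum and mean implicitly receive the same list argument.
spread :: [Double] -> Double
spread = do
  m  <- maximum
  mu <- mean
  pure (m - mu)

-- spread [1, 2, 9] == 9.0 - 4.0 == 5.0
</code></pre>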
]]></description><pubDate>Thu, 03 Jul 2025 01:55:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=44450885</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=44450885</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44450885</guid></item><item><title><![CDATA[New comment by flebron in "Attention Wasn't All We Needed"]]></title><description><![CDATA[
<p>This is an excellent summary of these techniques :) I like that every single one comes with an example implementation, with shape comments on the tensors. Thanks Stephen!</p>
]]></description><pubDate>Fri, 23 May 2025 19:57:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44076071</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=44076071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44076071</guid></item><item><title><![CDATA[New comment by flebron in "Google is winning on every AI front"]]></title><description><![CDATA[
<p>Perhaps this chapter can help? <a href="https://jax-ml.github.io/scaling-book/tpus/" rel="nofollow">https://jax-ml.github.io/scaling-book/tpus/</a><p>It's a chip (and associated hardware) that can do linear algebra operations really fast. XLA and TPUs were co-designed, so as long as what you are doing is expressible in XLA's HLO language (<a href="https://openxla.org/xla/operation_semantics" rel="nofollow">https://openxla.org/xla/operation_semantics</a>), the TPU can run it, and in many cases run it very efficiently. TPUs have different scaling properties than GPUs (think sparser but much larger communication), no graphics hardware inside them (no shader hardware, no raytracing hardware, etc), and a different control flow regime ("single-threaded" with very-wide SIMD primitives, as opposed to massively-multithreaded GPUs).</p>
]]></description><pubDate>Sat, 12 Apr 2025 15:39:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=43665307</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=43665307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43665307</guid></item><item><title><![CDATA[New comment by flebron in "You could have designed state of the art positional encoding"]]></title><description><![CDATA[
<p>All of them are vectors of embedded representations of tokens. In a transformer, you want to compute the inner product between a query (the token that is doing the attending) and a key (the token that is being attended to). An inductive bias we have is that the neural network's performance will be better if this inner product depends on the relative distance between the query token's position and the key token's position. We thus encode each one with positional information, in such a way that (for RoPE at least) the inner product depends only on the distance between these tokens, and not on their absolute positions in the input sentence.</p>
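<p>For the two-dimensional case, the identity is a one-liner (a sketch; R_theta denotes the 2x2 rotation by angle theta, and the general case applies this block-diagonally to pairs of coordinates):<p><pre><code>q_m = R_{m\theta} q                  (query at position m)
k_n = R_{n\theta} k                  (key at position n)

\langle q_m, k_n \rangle = q^T R_{m\theta}^T R_{n\theta} k
                         = q^T R_{(n-m)\theta} k
</code></pre>which depends only on n - m.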
]]></description><pubDate>Mon, 18 Nov 2024 18:29:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42175364</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=42175364</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42175364</guid></item><item><title><![CDATA[New comment by flebron in "What Is ChatGPT Doing and Why Does It Work? (2023)"]]></title><description><![CDATA[
<p>No. In the common use of the word fine-tuning, one is in the supervised learning scenario. One has an input prompt and an output sentence, and one teaches the model to say that output in response to that prompt. In the reinforcement learning scenario, one has a prompt and a way of rewarding the model for different outputs. One can have, for instance, a reward model that assigns a reward to a given model output. One could also have a pairwise reward model, where the learner is sampled with that prompt twice (with different RNGs), and the reward model gives a reward based on the better of the two samples. You could also have humans give these pointwise or pairwise rewards.<p>In essence, one is not telling the model "This. This is what you should output next time." but rather "I liked this reply. Have a cookie." The behaviors that you can learn in RL are more subtle, but you get a lot less information per step. That's because, in a causal language modeling objective, when I tell you "For the prompt X, you should output exactly Y[0...m)", you get a gradient for P(Y[0] | X), another one for P(Y[1] | X Y[0..1)), another for P(Y[2] | X Y[0..2)), another for P(Y[3] | X Y[0..3)), and so on. It's much more step-by-step guidance than the sequence-level reward you get in the RL framework. In RL, I'd give you a cookie for P(Y | X). What part of Y made me give you that cookie? Was there even such a part? Was it perhaps some internal representation that made everything in Y better? That's for the model to learn.</p>
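<p>Schematically (a sketch, using the same Y[0..t) prefix notation; the RL gradient shown is plain REINFORCE, ignoring baselines and other variance reduction):<p><pre><code>SFT:  L(\theta) = - \sum_{t=0}^{m-1} \log P_\theta(Y[t] | X, Y[0..t))
      -- one gradient term per output token

RL:   J(\theta) = E_{Y ~ P_\theta(.|X)} [ R(X, Y) ]
      \nabla_\theta J = E [ R(X, Y) \nabla_\theta \log P_\theta(Y | X) ]
      -- one scalar reward for the whole sampled sequence
</code></pre>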
]]></description><pubDate>Tue, 18 Jun 2024 16:45:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=40719780</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=40719780</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40719780</guid></item><item><title><![CDATA[New comment by flebron in "Almost no one pays a 6% real-estate commission except Americans"]]></title><description><![CDATA[
<p>"Almost nobody except X does Y." and "Z does Y, with Z != X" are consistent. Is your disagreement entirely due to the (possibly nil) distinction between "Almost nobody except X does Y" and "Almost nobody does Y -- except X", which is how the article is worded? If so, at least one native English speaker will disagree.<p>The poster you're replying to seems to be objecting more to how the "Wrong." parent comment was worded. There was no need to be that confrontational when adding a point of information ("Z does Y, with Z != X") that is most likely consistent with the thread title and with the article content.</p>
]]></description><pubDate>Fri, 17 Nov 2023 18:29:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=38307792</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=38307792</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38307792</guid></item><item><title><![CDATA[New comment by flebron in "90% of Kidnappings in São Paulo result from dates on Tinder and similar apps"]]></title><description><![CDATA[
<p>I don't see a link between that Bloomberg article and these kidnappings happening mostly to married men cheating. Nor is it the case that "love motels" are primarily used for cheating. We have plenty of those in Argentina. As the Bloomberg article indicates, they serve a social need because of the traditional multigenerational family homes people live in.</p>
]]></description><pubDate>Thu, 02 Mar 2023 01:08:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=34990644</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=34990644</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34990644</guid></item><item><title><![CDATA[New comment by flebron in "YouTube CEO Susan Wojcicki is stepping down"]]></title><description><![CDATA[
<p>I think k8sToGo's point is that she never "joined Alphabet": Alphabet did not exist when she joined Larry and Sergey. She joined around the time of Google's founding as a company, possibly back when the search engine was still named BackRub. Alphabet would be created more than a decade later.</p>
]]></description><pubDate>Thu, 16 Feb 2023 17:41:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=34822162</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=34822162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34822162</guid></item><item><title><![CDATA[New comment by flebron in "2000 Years of Matrix Multiplication"]]></title><description><![CDATA[
<p>The trifecta of:<p><pre><code>  * There's almost always a simple geometric intuition, and low-dimensional intuition can get you quite far even in high dimensional cases.
  * You can surprisingly often get by with closing your eyes and saying "my problem is linear" three times. See: All of neural networks.
  * Linear problems have practically all nice properties you could ever ask of any function.
</code></pre>
has made linear algebra by far the best bang-for-the-buck mathematics topic I've studied in my life. Close behind is asymptotic analysis.</p>
]]></description><pubDate>Sat, 04 Feb 2023 20:10:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=34657703</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=34657703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34657703</guid></item><item><title><![CDATA[New comment by flebron in "Archaeologists ask Netflix to reclassify Graham Hancock’s docuseries as fiction"]]></title><description><![CDATA[
<p>That seems like a weird use of the word. By that notion, every convicted criminal is a victim, because the court imposed a sentence that they, presumably, do not like. This removes practically all meaning from the word "victim". In common parlance, suffering the well-understood consequences of one's own actions does not make one a victim.</p>
]]></description><pubDate>Sun, 04 Dec 2022 23:26:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=33859201</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=33859201</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33859201</guid></item><item><title><![CDATA[New comment by flebron in "A Sun-like star orbiting a black hole"]]></title><description><![CDATA[
<p>pc = Parsec :) <a href="https://en.wikipedia.org/wiki/Parsec" rel="nofollow">https://en.wikipedia.org/wiki/Parsec</a><p>I don't know what the G stands for. Possibly luminosity of some kind?</p>
]]></description><pubDate>Tue, 08 Nov 2022 16:45:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=33521079</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=33521079</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33521079</guid></item><item><title><![CDATA[New comment by flebron in "Gödel, Escher, Bach: an in-depth explainer"]]></title><description><![CDATA[
<p>Well, most interesting properties of computer programs are, in general over all programs, undecidable (<a href="https://en.wikipedia.org/wiki/Rice%27s_theorem" rel="nofollow">https://en.wikipedia.org/wiki/Rice%27s_theorem</a>). Undecidability is a notion closely related to unprovability (<a href="https://en.wikipedia.org/wiki/Undecidable_problem#Relationship_with_G%C3%B6del's_incompleteness_theorem" rel="nofollow">https://en.wikipedia.org/wiki/Undecidable_problem#Relationsh...</a>).</p>
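<p>For reference, one standard statement of Rice's theorem:<p><pre><code>For any property P of partial computable functions with
\emptyset \neq P \neq { all partial computable functions },
the index set { e : \varphi_e \in P } is undecidable.
</code></pre>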
]]></description><pubDate>Wed, 14 Sep 2022 02:51:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=32832763</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=32832763</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32832763</guid></item><item><title><![CDATA[New comment by flebron in "Examples of common false beliefs in mathematics"]]></title><description><![CDATA[
<p>Consider a disconnected domain (say, the union of a few disjoint open balls in R^n), and f constant on each connected component but with a different value on each ball. The differential is indeed 0 everywhere in the entire domain, yet f is not constant.</p>
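<p>In symbols, a minimal instance of that construction:<p><pre><code>U = B(0, 1) \cup B(3 e_1, 1) \subset R^n     (two disjoint open balls)
f = 0 on B(0, 1),    f = 1 on B(3 e_1, 1)

df = 0 everywhere on U, but f is not constant.
"df = 0" only forces f to be locally constant; connectedness of the
domain is what upgrades "locally constant" to "constant".
</code></pre>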
]]></description><pubDate>Mon, 31 Jan 2022 10:14:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=30146461</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=30146461</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30146461</guid></item><item><title><![CDATA[Reversing an integer hash function]]></title><description><![CDATA[
<p>Article URL: <a href="https://taxicat1.github.io/hash6432shift_inversion.html">https://taxicat1.github.io/hash6432shift_inversion.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=29937844">https://news.ycombinator.com/item?id=29937844</a></p>
<p>Points: 87</p>
<p># Comments: 16</p>
]]></description><pubDate>Fri, 14 Jan 2022 18:06:31 +0000</pubDate><link>https://taxicat1.github.io/hash6432shift_inversion.html</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=29937844</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=29937844</guid></item><item><title><![CDATA[New comment by flebron in "Monads are monoids in the category of endofunctors"]]></title><description><![CDATA[
<p>I've always found the definition of monoid objects in a category of endofunctors to be tougher to grasp than the Kleisli-category definition of a monad, at least if one is being formal about "monoid object" and checking that the necessary diagrams commute.<p>Given a functor m on Hask, we define a squiggly arrow ~>, where a ~> b is a -> m b. If these ~> are the arrows in some category (which we call the Kleisli category for m), then we call m a monad. Here id :: a ~> a in that category is what we call return :: a -> m a, and the composition (.) :: (b ~> c) -> (a ~> b) -> (a ~> c) in that category is used to create bind :: m a -> (a -> m b) -> m b, via bind x f = (.) f (const x) (return ()), where (.) is the squiggly-arrow composition we just mentioned (and const x ignores its argument and returns x). Bind is what's spelled >>= in Haskell, and is what's behind the "x <- f" do-notation.</p>
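<p>Spelled out for Maybe, without assuming >>= already exists (a sketch with made-up names):<p><pre><code>-- Kleisli composition for Maybe: (b ~> c) -> (a ~> b) -> (a ~> c),
-- where a ~> b means a -> Maybe b.
compK :: (b -> Maybe c) -> (a -> Maybe b) -> (a -> Maybe c)
compK g f = \a -> case f a of
                    Nothing -> Nothing
                    Just b  -> g b

-- The identity arrow a ~> a is return.
returnK :: a -> Maybe a
returnK = Just

-- bind, recovered exactly by the recipe above.
bindK :: Maybe a -> (a -> Maybe b) -> Maybe b
bindK x f = compK f (const x) (returnK ())
</code></pre>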
]]></description><pubDate>Fri, 10 Sep 2021 04:44:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=28477901</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=28477901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28477901</guid></item><item><title><![CDATA[New comment by flebron in "Microsoft's forthcoming Minecraft Education Edition is written in C++"]]></title><description><![CDATA[
<p>Yes.</p>
]]></description><pubDate>Tue, 26 Jan 2016 17:33:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=10974584</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=10974584</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=10974584</guid></item><item><title><![CDATA[New comment by flebron in "Show HN: Intuitive nutrition information"]]></title><description><![CDATA[
<p>"1 bowl of rice" gets interpreted as 1 cup of Sake?</p>
]]></description><pubDate>Tue, 18 Aug 2015 06:48:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=10077871</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=10077871</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=10077871</guid></item><item><title><![CDATA[New comment by flebron in "Who owns copyright to Deep Dream images?"]]></title><description><![CDATA[
<p>To a sufficiently advanced alien species, _we_ are neural networks fed with some pictures (and sounds, smells, touches, and tastes) :)</p>
]]></description><pubDate>Wed, 22 Jul 2015 01:07:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=9927081</link><dc:creator>flebron</dc:creator><comments>https://news.ycombinator.com/item?id=9927081</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=9927081</guid></item></channel></rss>