<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mattalex</title><link>https://news.ycombinator.com/user?id=mattalex</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 01 Jul 2026 02:06:50 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mattalex" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mattalex in "GLM 5.2 Performance Benchmarks"]]></title><description><![CDATA[
<p>The issue with having a "no answer" option is that you implicitly add a decision problem into your test that depends on the "cost" of answering wrong.<p>Specifically, your model now has two "correct" classes p(class=y|x) and p(class=⊥|x). This makes the results ambiguous.
The way you resolve this is by adding a cost of misclassification (answering wrong) and a cost of not answering.<p>L(y, y') =<p>0 if y=y'
l_err if y≠y' and y'≠⊥
l_⊥  if y' = ⊥<p>You can then estimate the expected error over your dataset.
Notice that this now gives you additional degrees of freedom: Depending on how expensive answering wrong is compared to not answering at all, your predictor might be really bad or really good.<p>This means when benchmarking with a "no answer" action, you are often not actually benchmarking whether the model works well or not, but rather are benchmarking how well the model _happens_ to agree with the class-error weight you (implicitly) chose in your model.</p>
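<p>A minimal sketch of that effect, assuming two made-up models and hypothetical outcome lists (none of these numbers come from a real benchmark): the same results rank differently depending on the ratio of l_err to l_⊥.</p>
<pre><code>
```python
def expected_loss(outcomes, l_err, l_abstain):
    """Average loss over outcomes labelled 'correct', 'wrong' or 'abstain'."""
    cost = {"correct": 0.0, "wrong": l_err, "abstain": l_abstain}
    return sum(cost[o] for o in outcomes) / len(outcomes)


# Hypothetical results of two models on the same 10 questions.
model_a = ["correct"] * 7 + ["wrong"] * 3      # always answers
model_b = ["correct"] * 5 + ["abstain"] * 5    # abstains when unsure


def better_model(l_err, l_abstain):
    """Return the name of the model with the lower expected loss."""
    loss_a = expected_loss(model_a, l_err, l_abstain)
    loss_b = expected_loss(model_b, l_err, l_abstain)
    return "A" if loss_a < loss_b else "B"


# Abstaining as expensive as a wrong answer: the model that always answers wins.
winner_equal_cost = better_model(l_err=1.0, l_abstain=1.0)
# Abstaining is cheap: the cautious model wins on the very same outcomes.
winner_cheap_abstain = better_model(l_err=1.0, l_abstain=0.2)
```
</code></pre>
<p>Nothing about the models changed between the two calls; only the cost ratio did.</p>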
]]></description><pubDate>Wed, 17 Jun 2026 15:47:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=48572125</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=48572125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48572125</guid></item><item><title><![CDATA[New comment by mattalex in "Do transformers need three projections? Systematic study of QKV variants"]]></title><description><![CDATA[
<p>Because the size of the attention matrix depends on the number of tokens (this is what makes attention N^2).
If you don't care about having a flexible number of input tokens (e.g. in image processing) you can learn a fixed routing matrix. This is known as an MLP-Mixer <a href="https://arxiv.org/pdf/2105.01601" rel="nofollow">https://arxiv.org/pdf/2105.01601</a> : you have one layer that processes each token in isolation ("vertical MLP") but ignores the inter-token connections, followed by a layer that combines across tokens ("horizontal MLP") and treats the internals of every token identically.</p>
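<p>A rough sketch of the idea, simplified to plain linear maps (no nonlinearity, residuals or normalization, so this is not the architecture from the paper). The function names and the tiny matrices are made up for illustration. The point is that the token-mixing matrix has a fixed tokens x tokens shape, so the number of tokens is baked into the weights.</p>
<pre><code>
```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def mixer_block(x, w_channel, w_token):
    """x: tokens x channels, w_channel: channels x channels, w_token: tokens x tokens."""
    if len(x) != len(w_token):
        raise ValueError("the learned routing matrix only fits a fixed number of tokens")
    # Per-token step: every token (row) is transformed in isolation, shared weights.
    x = matmul(x, w_channel)
    # Token-mixing step: a fixed matrix routes information between tokens and is
    # applied identically to every channel (column).
    return matmul(w_token, x)


tokens = [[1, 2], [3, 4], [5, 6]]                     # 3 tokens, 2 channels
identity = [[1, 0], [0, 1]]                           # leave channels untouched
swap_first_two = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]    # fixed routing between tokens
mixed = mixer_block(tokens, identity, swap_first_two)
```
</code></pre>
<p>Feeding four tokens into this block raises an error, which is exactly the flexibility you give up compared to attention.</p>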
]]></description><pubDate>Fri, 05 Jun 2026 17:19:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=48415526</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=48415526</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48415526</guid></item><item><title><![CDATA[New comment by mattalex in "Uber's $1,500/month AI limit is a useful signal for AI tool pricing"]]></title><description><![CDATA[
<p>Nothing is stopping them, it's just not worth it: Have a look at e.g. vast.ai's pricing (<a href="https://vast.ai/pricing" rel="nofollow">https://vast.ai/pricing</a>).<p>The V100 (2017 -> 9 years old) can be rented from $0.02 to $0.37/h (right now I can find a V100 with a Xeon Gold 6140 and 48GB RAM for $0.165/h). Let's assume the guy you rent it to pins it at its 250W TDP and let's ignore the running costs of CPU/RAM/etc...
Then you draw 1/4 kWh for that compute hour. Industrial electricity prices in the US vary between 7.5 and 25 ct per kWh (depending on state, time of day, etc...), so the electricity alone costs you roughly 2 to 6 ct. At 100% efficiency, assuming nothing ever breaks and the CPU consumes 0W, you earn at best about 14 ct/h on the $0.165/h listing.<p>And remember: V100 hours are sometimes sold at 1/10th the price.<p>If I pick average conditions you need to start thinking about whether it is worth it to rent them out: Usually it isn't unless you have them anyway and just sell idle capacity.<p>It's barely worth it to run them in a pure "is it profitable" sense; if we also account for the opportunity cost of taking up a slot in your datacenter, it ceases to be worth it really quickly.</p>
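<p>The arithmetic above as a small sketch. The function name is made up; the inputs are the figures quoted in this comment (250W, $0.02 to $0.165 per hour, 7.5 to 25 ct per kWh), and everything except electricity is ignored, as in the text.</p>
<pre><code>
```python
def hourly_margin(rental_usd_per_h, power_watts, electricity_usd_per_kwh):
    """Rental income minus electricity for one hour at constant power draw."""
    energy_kwh = power_watts / 1000.0  # one hour at this power
    return rental_usd_per_h - energy_kwh * electricity_usd_per_kwh


best_case = hourly_margin(0.165, 250, 0.075)      # cheap electricity
worst_case = hourly_margin(0.165, 250, 0.25)      # expensive electricity
cheapest_listing = hourly_margin(0.02, 250, 0.25)  # $0.02/h listing, expensive electricity
```
</code></pre>
<p>The best case comes out at roughly 14.6 ct/h, the worst case at roughly 10 ct/h, and the cheapest listing loses money on electricity alone once power is expensive.</p>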
]]></description><pubDate>Wed, 03 Jun 2026 21:13:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=48390161</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=48390161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48390161</guid></item><item><title><![CDATA[New comment by mattalex in "ESP32-S31"]]></title><description><![CDATA[
<p>Regarding Depth Anything specifically: You're not running this on a microcontroller.
In general, CNNs still reign supreme on microcontrollers since you have a way lower peak memory demand, which is what usually kills you. In this case you have a couple of _kilobytes_ of SRAM, potentially extendable to a couple of megabytes of PSRAM.<p>Even for small CNNs you often need to do some quite complex interleaving of layers (i.e. running parts of layer 1 and layer 2 interleaved to take advantage of the downsampling of CNNs) to keep performance and memory impact reasonable (see e.g. <a href="https://openreview.net/pdf?id=2O8qbyxH6X" rel="nofollow">https://openreview.net/pdf?id=2O8qbyxH6X</a>).<p>Think more "image classifier", less "run an image-to-image transformer". For Depth Anything, a single layer's activation is probably significantly larger than the available SRAM (I think it is (224/16)^2 patches, each with activations [48, 96, 192, 384] for Depth Anything small: You aren't running this.)</p>
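<p>A back-of-the-envelope sketch of that estimate, using the figures guessed above (224 pixel input, 16 pixel patches, 48 to 384 channels). Those numbers are the "I think" values from this comment, not checked against the model, and the bytes per value (float32 or int8) are an assumption.</p>
<pre><code>
```python
def activation_bytes(image_size, patch_size, channels, bytes_per_value):
    """Size of one layer's activation: one vector of `channels` values per patch."""
    patches = (image_size // patch_size) ** 2
    return patches * channels * bytes_per_value


widest_float32 = activation_bytes(224, 16, 384, 4)   # widest stage, float32
narrowest_int8 = activation_bytes(224, 16, 48, 1)    # narrowest stage, int8
```
</code></pre>
<p>That is about 294 KiB for the widest stage in float32 and still about 9 KiB for the narrowest stage quantized to int8, for a single activation tensor and before counting any weights.</p>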
]]></description><pubDate>Wed, 03 Jun 2026 20:53:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48389885</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=48389885</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48389885</guid></item><item><title><![CDATA[New comment by mattalex in "Spotify will start reserving concert tickets for fans"]]></title><description><![CDATA[
<p>Then allow resale through the platform at the original purchase price.<p>You still have no scalping, but you recover the ability to back out due to unforeseen events</p>
]]></description><pubDate>Fri, 22 May 2026 08:38:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=48233427</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=48233427</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48233427</guid></item><item><title><![CDATA[New comment by mattalex in "Softmax, can you derive the Jacobian? And should you care?"]]></title><description><![CDATA[
<p>It works the same way:
softmax is essentially just applying the normalization to the vector exp(x).
From an "engineering" POV this effectively ensures that the vector you normalize has strictly positive entries, so the result ends up being a proper distribution.<p>From a theory POV you get softmax-like distributions (Gibbs distributions) by trying to balance following some energy E(x) and the entropy of the distribution.
In essence the softmax is the answer to "I try to follow the maximum of a function E(x) but I need to maintain some level of uncertainty".<p>The balancing coefficient between entropy and picking the maximum of the function is called "temperature" (following the behavior of particles in a physical system: The colder the system, the lower the chance of having particles randomly walk away from the minimal energy state).<p>Specifically, the temperature enters as<p>softmax(x/temp)<p>If you let temp->0, your softmax slowly becomes an argmax (with temp=0 being a literal argmax in the limit). If you increase the temperature, you are closer to the "random fluctuations", leaving more room for sampling x values that are not the maximum of x (this is why e.g. LLMs become deterministic as you decrease temp->0).<p>Using a base other than e implicitly changes the temperature:<p>N^x = exp(ln(N) x)<p>so base N corresponds to temp = 1/ln(N). The normalization works the same since you are still dividing a positive value N^x by the sum of all alternatives sum(N^x_i), which is a normalization by design.</p>
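<p>A small sketch to check both claims numerically. The helper names are made up, and subtracting the maximum is only there for numerical stability; it cancels in the ratio.</p>
<pre><code>
```python
import math


def softmax(xs, temp=1.0):
    """Softmax of xs / temp."""
    scaled = [x / temp for x in xs]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_base(xs, base):
    """Same normalization, but with base ** x instead of exp(x)."""
    top = max(xs)
    powers = [base ** (x - top) for x in xs]
    total = sum(powers)
    return [p / total for p in powers]


logits = [1.0, 2.0, 3.0]
cold = softmax(logits, temp=0.01)                   # nearly all mass on the maximum
base_two = softmax_base(logits, 2.0)
same_as_temp = softmax(logits, temp=1.0 / math.log(2.0))
```
</code></pre>
<p>Here base_two and same_as_temp agree up to floating point error, and cold puts almost all of its mass on the largest entry.</p>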
]]></description><pubDate>Fri, 01 May 2026 16:27:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47976645</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47976645</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47976645</guid></item><item><title><![CDATA[New comment by mattalex in "Functional programmers need to take a look at Zig"]]></title><description><![CDATA[
<p>In what way is that different?
You return early and the error cascades up the call chain until you handle it (otherwise it's always an "either" result).<p>In practice you use something like an exception monad, which makes this a lot more ergonomic since you don't need to carry a case distinction around for every unwrap: an exception monad essentially has an implicit passthrough that says "if it's a value, apply the function, if it's an exception just keep that".
You only need to "catch" the exception if you actually need the value.
In this case the exception monad is not that different from annotating a function with "throws": your calling function either needs its own throws (=error monad wrapper), in which case exceptions just roll through, or you remove the throws, but now need to handle the exception explicitly (=unwrap the monad).</p>
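<p>A minimal sketch of that passthrough, written in Python only for illustration. Ok, Err, bind and the two example functions are made-up names, not any particular library's API.</p>
<pre><code>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ok:
    value: object


@dataclass(frozen=True)
class Err:
    error: str


def bind(result, fn):
    """If it's a value, apply the function; if it's an error, just keep it."""
    if isinstance(result, Ok):
        return fn(result.value)
    return result


def parse_int(text):
    try:
        return Ok(int(text))
    except ValueError:
        return Err(f"not a number: {text!r}")


def reciprocal(n):
    if n == 0:
        return Err("division by zero")
    return Ok(1 / n)


def parse_reciprocal(text):
    # No case distinction here: errors from parse_int simply roll through.
    return bind(parse_int(text), reciprocal)
```
</code></pre>
<p>parse_reciprocal plays the role of a function annotated with "throws": it never inspects the error, and only the caller that needs the number has to unwrap the result.</p>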
]]></description><pubDate>Thu, 30 Apr 2026 10:11:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47960372</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47960372</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47960372</guid></item><item><title><![CDATA[New comment by mattalex in "Functional programmers need to take a look at Zig"]]></title><description><![CDATA[
<p>Of course you can: you just have to define it in your type.
The output set becomes a union type of the normal output and whatever you want as an exception.<p>If you write this as a monad, you get very similar syntax to procedural code.</p>
]]></description><pubDate>Thu, 30 Apr 2026 07:10:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47959220</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47959220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47959220</guid></item><item><title><![CDATA[New comment by mattalex in "Is Germany's gold safe in New York ?"]]></title><description><![CDATA[
<p>Effectively none. The US has a huge trade deficit with Germany/Europe so there is practically never a case where the US receives gold from Germany: It's always more than offset by the deficit.<p>The equivalent for the US would be the consumption goods that are already flowing into the US. I.e. the US gets goods but doesn't sell enough to Germany, so the difference to maintain the total exchange rate is the gold.<p>That's also why it was trivial for France to repatriate its gold compared to Germany: Germany holds about 10x the amount of gold in the US compared to France (France was ~120 tons, Germany is roughly 1200 tons: France earned its gold through different trade).<p>That's also why it is such a complex thing to repatriate German reserves: France took almost 1 year to repatriate its gold. For Germany, the effort would span decades (though maybe with recent changes there is a little more urgency).</p>
]]></description><pubDate>Mon, 06 Apr 2026 19:45:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47665954</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47665954</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47665954</guid></item><item><title><![CDATA[New comment by mattalex in "Is Germany's gold safe in New York ?"]]></title><description><![CDATA[
<p>Germany already repatriated about half of its gold reserves between 2013 and 2017 from Paris and New York to Frankfurt.<p>There has been a recent (as in "18th of March" recent) petition to the Bundestag to repatriate the gold.<p>The reason not to repatriate the remaining gold back then was that Germany has substantial trade with the US, which is why Germany held gold in New York to begin with: It's the easiest way to resolve USD-Euro currency exchange at the central bank level (this is also why Germany got rid of the Paris gold reserves: with the euro you don't need currency exchange anymore).<p>Also, as you mentioned, the idea of "officially" repatriating gold with the current administration is quite dicey. It is very possible that the correct way of resolving this is to just stop buying gold in New York and let the currency exchange flux deal with the slow unwinding of the reserves without explicit repatriation.</p>
]]></description><pubDate>Mon, 06 Apr 2026 17:31:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47664101</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47664101</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47664101</guid></item><item><title><![CDATA[New comment by mattalex in "Is Germany's gold safe in New York ?"]]></title><description><![CDATA[
<p>Whether the US is capable of hiding their maleficence or not should not be an indicator of whether it is safe to deal with them. If your indicator for the US being a good partner in _anything_ is that "well we did corrupt things in the past, but people didn't use to care about it", then the US is still not a good partner.<p>It's not like the US has never e.g. openly threatened NATO allies with war: There is quite literally a standing law that allows the US president to invade the Netherlands if any US military personnel is ever detained by the International Criminal Court.
This law has been on the books for over 20 years and has the publicly announced intention to prevent the US from being prosecuted for all the other atrocities committed in e.g. Iraq. This bill was supported by both Democrats and Republicans.<p>The reality is that the US' stance towards the rest of the world has not changed with the recent administrations (nor would I expect it to: Trump does not happen in a vacuum). What did change was the willingness of the rest of the world to act on the US' actions.</p>
]]></description><pubDate>Mon, 06 Apr 2026 14:22:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47661302</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47661302</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47661302</guid></item><item><title><![CDATA[New comment by mattalex in "BitNet: Inference framework for 1-bit LLMs"]]></title><description><![CDATA[
<p>There were plenty of models the size of GPT-3 in industry.<p>The core insight necessary for ChatGPT was not scaling (that was already widely accepted): the insight was that instead of finetuning for each individual task, you can finetune once for the meta-task of instruction following, which brings a problem specification directly into the data stream.</p>
]]></description><pubDate>Wed, 11 Mar 2026 18:29:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47339323</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47339323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47339323</guid></item><item><title><![CDATA[New comment by mattalex in "OpenAI agrees with Dept. of War to deploy models in their classified network"]]></title><description><![CDATA[
<p>Assuming this is real: Why do you think Anthropic was put on what is essentially an "enemy of the state" list and OpenAI wasn't?<p>The two things Anthropic refused to do are mass surveillance and autonomous weapons, so why do _you_ think OpenAI refused and still did not get placed on the exact same list?<p>It's fine to say "I'm not going to resign. I didn't even sign that letter", but thinking that OpenAI can get away with not developing autonomous weapons or mass surveillance is naive at the very best.</p>
]]></description><pubDate>Sat, 28 Feb 2026 08:02:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47192010</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=47192010</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47192010</guid></item><item><title><![CDATA[New comment by mattalex in "Microsoft Favors Anthropic over OpenAI for Visual Studio Code"]]></title><description><![CDATA[
<p>It might be that they pay less for Anthropic depending on how many tokens are generated by each model: total cost is token cost times number of tokens. I haven't checked GPT-5, but it is not impossible that price-wise they are very comparable once you account for the reasoning tokens used.</p>
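<p>To make the "token cost times number of tokens" point concrete, here is a tiny sketch. All prices and token counts are invented for illustration and are not the real pricing of any model.</p>
<pre><code>
```python
def request_cost(usd_per_million_tokens, visible_tokens, reasoning_tokens):
    """Total cost of one request: price per token times all generated tokens."""
    generated = visible_tokens + reasoning_tokens
    return usd_per_million_tokens * generated / 1_000_000


# Hypothetical: a cheaper model that reasons at length versus a pricier, terser one.
cheap_but_verbose = request_cost(10.0, visible_tokens=1_000, reasoning_tokens=5_000)
pricey_but_terse = request_cost(15.0, visible_tokens=1_000, reasoning_tokens=3_000)
```
</code></pre>
<p>With these made-up numbers both requests cost $0.06, even though the list price per token differs by 50%.</p>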
]]></description><pubDate>Tue, 16 Sep 2025 18:05:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45265657</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=45265657</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45265657</guid></item><item><title><![CDATA[New comment by mattalex in "Into CPS, Never to Return"]]></title><description><![CDATA[
<p>This is essentially the principle behind algebraic effects (which, in practice, do get implemented as delimited continuations):<p>When you have an impure effect (e.g. check a database, generate a random number, write to a file, nondeterministic choices,...), instead of directly implementing the impure action, you have a symbol, e.g. "read", "generate number", ...<p>When executing the function, you also provide a context of "interpreters" that map the symbol to whatever action you want.
This is very useful, since the actual business logic can be analyzed in an isolated way.
For instance, if you want to test your application you can use a dummy interpreter for "check database" that returns whatever values you need for testing, but without needing to go to an actual SQL database.
It also allows you to switch backends rather easily: If your database uses the symbols "read", "write", "delete" then you just need to implement those calls in your backend. If you want to formally prove properties of your code, you can also do that by noting the properties of your symbols, e.g. `∀ key. read (delete key) = None`.<p>Since you always capture the symbol using an interpreter, you can also do fancy things like dynamically overriding the interpreter:
To implement a seeded random number generator, you can have an interpreter that always overrides itself using the new seed. The interpreter would look something like this<p>```<p>Pseudorandom_interpreter(seed)(argument, continuation):<p><pre><code>  rnd, new_seed <- generate_pseudorandom(seed, argument)
  with Pseudorandom_interpreter(new_seed):
       continuation(rnd)</code></pre>
```<p>You can clearly see the continuation passing style and the power of self-overriding your own interpreter.
In fact, this is a nice way of handling state in a pure way: Just put something other than new_seed into the new interpreter.<p>If you want to debug a state machine, you can use an interpreter like this<p>```
replace_state_interpreter(state)(new_state, continuation):<p><pre><code>  with replace_state_interpreter(new_state ++ state):
       continuation(head state)</code></pre>
```<p>To trace the state.
This way the "state" always holds the entire history of state changes, which can be very nice for debugging.
During deployment, you can then use a different interpreter<p>```<p>replace_state_interpreter(state)(new_state, continuation):<p><pre><code>  with replace_state_interpreter(new_state):
       continuation(state)</code></pre>
```<p>which just holds the current state.</p>
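<p>Python has no delimited continuations, but generators get close enough for a rough sketch: the business logic yields symbols, and a driver answers them through whatever interpreter you pass in. All names here are made up, the state is threaded explicitly instead of re-installing a handler, and the generator constants are arbitrary illustrative values.</p>
<pre><code>
```python
def business_logic():
    """Pure description of the work: yields effect symbols instead of acting."""
    first = yield ("random", 10)
    second = yield ("random", 10)
    return first + second


def run(program, interpreter, state):
    """Drive the program, answering each yielded symbol via the interpreter."""
    generator = program()
    try:
        request = next(generator)
        while True:
            # The interpreter returns the answer plus the state for the next call,
            # which mirrors "override yourself with the new seed".
            answer, state = interpreter(state, request)
            request = generator.send(answer)
    except StopIteration as finished:
        return finished.value


def pseudorandom_interpreter(seed, request):
    """Seeded generator: answers 'random' and hands back the next seed."""
    _symbol, upper = request
    new_seed = (1103515245 * seed + 12345) % 2**31
    return new_seed % upper, new_seed


def dummy_interpreter(state, request):
    """Test double: always answers 3, keeps no state."""
    return 3, state


test_result = run(business_logic, dummy_interpreter, None)
seeded_result = run(business_logic, pseudorandom_interpreter, 42)
```
</code></pre>
<p>The business logic never changes; swapping the interpreter is all it takes to go from a deterministic test double to a seeded generator.</p>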
]]></description><pubDate>Thu, 26 Dec 2024 11:33:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=42514613</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=42514613</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42514613</guid></item><item><title><![CDATA[New comment by mattalex in "A DSL for peephole transformation rules of integer operations in the PyPy JIT"]]></title><description><![CDATA[
<p>Once you have strong normalization you can just check local confluence and use Newman's lemma to get (global) confluence. That should be pretty easy: just build all n^2 pairs of rules and run the resulting terms to termination (which you have proven before). If those pairs are confluent, so is the full rewriting scheme.</p>
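<p>A sketch of that check for plain string rewriting, which is much simpler than the integer rules in the article's DSL. It assumes every left-hand side is non-empty and that termination has already been proven; the function names and the two example rule sets are made up.</p>
<pre><code>
```python
def normalize(word, rules):
    """Rewrite until no rule applies. Assumes the system terminates."""
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs in word:
                word = word.replace(lhs, rhs, 1)
                changed = True
                break
    return word


def critical_pairs(rules):
    """Yield both one-step results for every way two left-hand sides overlap."""
    for l1, r1 in rules:
        for l2, r2 in rules:
            # Proper overlap: a suffix of l1 is a prefix of l2.
            for k in range(1, min(len(l1), len(l2))):
                if l1[-k:] == l2[:k]:
                    yield r1 + l2[k:], l1[:-k] + r2
            # Containment: l2 occurs somewhere inside l1.
            start = l1.find(l2)
            while start != -1:
                if (l1, r1) != (l2, r2) or start != 0:
                    yield r1, l1[:start] + r2 + l1[start + len(l2):]
                start = l1.find(l2, start + 1)


def locally_confluent(rules):
    """True if both sides of every critical pair reach the same normal form."""
    return all(normalize(a, rules) == normalize(b, rules)
               for a, b in critical_pairs(rules))


joinable = locally_confluent([("aa", "a"), ("bb", "b")])
diverging = locally_confluent([("ab", "a"), ("ba", "b")])
```
</code></pre>
<p>In the second rule set the word "aba" can be rewritten to "aa" or to "ab", and those end in different normal forms, so the check fails. For the first rule set every pair joins, and together with termination Newman's lemma then gives confluence.</p>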
]]></description><pubDate>Thu, 24 Oct 2024 11:36:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=41934523</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=41934523</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41934523</guid></item><item><title><![CDATA[New comment by mattalex in "AI engineers claim new algorithm reduces AI power consumption by 95%"]]></title><description><![CDATA[
<p>That entirely depends on what AMD device you look at: gaming GPUs are not well supported, but their Instinct line of accelerators works just as well as CUDA. Keep in mind that, in contrast to Nvidia, AMD uses different architectures for compute and gaming (though they are changing that in the next generation).</p>
]]></description><pubDate>Sun, 20 Oct 2024 11:31:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=41894656</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=41894656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41894656</guid></item><item><title><![CDATA[New comment by mattalex in "Sony shutting down Concord, refunds after 2 week launch. 8 year dev, 25k sales"]]></title><description><![CDATA[
<p>To expand on that: there's also the issue that these games have to be (somewhat) competitive multiplayer games: multiplayer because otherwise there's no way to create enough content, and competitive since otherwise there's less of a reason to play the game for long periods of time.<p>If you've ever played a dead/dying competitive game as a newcomer you will know the problem this creates: since the people that stay around are either new or very dedicated players, the skill gap becomes gigantic, which turns off most new players.<p>If your game wins the live-service race, you draw other players in. If your game dies, the very same structure that keeps players around will prevent new players from joining.</p>
]]></description><pubDate>Wed, 04 Sep 2024 06:18:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=41442448</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=41442448</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41442448</guid></item><item><title><![CDATA[New comment by mattalex in "Iron as an inexpensive storage medium for hydrogen"]]></title><description><![CDATA[
<p>There are alternatives to iron that have higher efficiency and lower prices. For instance <a href="https://hydrogenious.net/" rel="nofollow">https://hydrogenious.net/</a> does exactly that but with benzene-like structures. The advantage of this is that you can reuse existing infrastructure for transport and you have higher transport efficiency: while the square-cube law exists, the same thing holds for the forces on the chamber walls, which have to increase in thickness. Hydrogen tanks are also very expensive as they have to be manufactured to tight tolerances (and they need to be replaced rather often due to hydrogen creep weakening the chamber walls).</p>
]]></description><pubDate>Sun, 01 Sep 2024 06:21:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=41414747</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=41414747</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41414747</guid></item><item><title><![CDATA[New comment by mattalex in "Encyclopedia of Optimization"]]></title><description><![CDATA[
<p>The paper I have mentioned can be found here <a href="https://arxiv.org/pdf/2206.09787" rel="nofollow">https://arxiv.org/pdf/2206.09787</a><p>There are so many things that have only been invented in the last couple of years like RINS, MCF cuts, conflict analysis, symmetry detection, dynamic search,... (see e.g. Tobias Achterberg's line of work).<p>On the other hand, hardware improvements were not as relevant for LP and MILP solvers as one would expect: For instance, as of now there is still no solver that really uses GPU compute (though people are working on that). The reason is that parallelization of simplex solvers is quite tough since the algorithm is inherently sequential (it's a descent over simplex vertices) and the actual linear algebra is very sparse (if not entirely matrix free). You can do some things like lookahead for better pricing or row/column generation approaches, but you have to be very careful with that (interior point methods are arguably nicer to parallelize but in many cases have a penalty in performance compared to simplex).<p>MILP/MINLP solvers are much nicer to parallelize at first glance since you can parallelize across branches in the branch-and-bound, but in practice that is also pretty hard: Modern solvers are so efficient that it can easily happen that you spend a lot of compute exploring a branch that is quickly proven to be unnecessary to explore by a different branch (e.g. SCIP, the fastest open-source MINLP solver, is completely single threaded and still _somewhat_ competitive). This means that a lot of the algorithmic improvements are hidden inside the parallelization improvements. I.e. a lot of time has been spent on the question of "What do we have to do to parallelize the solver without just wasting the additional threads".</p>
]]></description><pubDate>Sun, 18 Aug 2024 11:15:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=41281640</link><dc:creator>mattalex</dc:creator><comments>https://news.ycombinator.com/item?id=41281640</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41281640</guid></item></channel></rss>