<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: thesz</title><link>https://news.ycombinator.com/user?id=thesz</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 10 Jun 2026 06:42:06 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=thesz" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by thesz in "Transformers are inherently succinct"]]></title><description><![CDATA[
<p><p><pre><code>  > What kind of models are you including here?
</code></pre>
Logistic regression, non-linear logistic regression, simple neural networks, etc. Not code in the sense of a "bunch of ifs", and not decision trees.<p>The truth tables? Predict the zeroth bit of each byte in enwik9 from, say, the bits of the 64 previous bytes. Or multiply two 16-bit numbers and predict bit 16 of the product.</p>
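<p>A minimal Python sketch of how such truth-table datasets could be generated. Assumptions not in the comment: "zeroth bit" is taken to be the least significant bit of the current byte, features are listed least significant bit first, and the function names are hypothetical.</p><pre><code>def product_bit_example(a, b, bit=16):
    """One truth-table row: the 32 input bits of two 16-bit numbers and one bit of their product."""
    features = [(a >> i) % 2 for i in range(16)] + [(b >> i) % 2 for i in range(16)]
    target = (a * b >> bit) % 2  # bit 16 of the 32-bit product
    return features, target


def enwik_bit_examples(data, context=64):
    """Yield (bits of the previous `context` bytes, zeroth bit of the current byte)."""
    for t in range(context, len(data)):
        window = data[t - context:t]
        features = [(byte >> i) % 2 for byte in window for i in range(8)]
        yield features, data[t] % 2


# Usage with the real file (not run here): rows = enwik_bit_examples(open("enwik9", "rb").read())
print(product_bit_example(256, 256)[1])  # 256 * 256 = 2**16, so bit 16 is set: prints 1</code></pre>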
]]></description><pubDate>Sat, 06 Jun 2026 09:44:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48423162</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48423162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48423162</guid></item><item><title><![CDATA[New comment by thesz in "Transformers are inherently succinct"]]></title><description><![CDATA[
<p><p><pre><code>  > doesn't that mean we're approaching optimality?
</code></pre>
No.<p>Transformers are Markov chains [1]. Somewhere on this fascinating site [2] I read that stateful models have an advantage. The author gave an example: a state machine with two states, A and B. From state A, the machine moves with equal probability either to A (outputting 0) or to B (outputting 1); from state B, it always moves to A and outputs 1.<p>For this state machine, just one bit of memory is enough to make the optimal prediction that ones always come in pairs, whereas a Markov chain can only approximate this prediction and never reaches optimality.<p><pre><code>  [1] https://arxiv.org/abs/2410.02724
  [2] https://bactra.org/</code></pre></p>
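<p>A small Python simulation of that two-state machine (in the literature it is sometimes called the "even process"); the function names are mine. The one-bit predictor tracks the hidden state exactly, so whenever it predicts a 1 with certainty, it is right. An order-k Markov chain that has just seen k ones cannot tell whether the current run of ones has odd or even length, so its estimate of P(next = 1) stays strictly between 1/2 and 1 no matter how much data it sees (for k = 4 I get 2/3 by hand).</p><pre><code>import random


def even_process(n, seed=0):
    """Emit n symbols: A -> A (emit 0) or A -> B (emit 1) with equal probability; B -> A (emit 1)."""
    rng = random.Random(seed)
    in_b, out = False, []
    for _ in range(n):
        if in_b:                    # state B: always emit 1 and go back to A
            out.append(1)
            in_b = False
        elif rng.random() >= 0.5:   # state A: emit 1 and move to B ...
            out.append(1)
            in_b = True
        else:                       # ... or emit 0 and stay in A
            out.append(0)
    return out


def one_bit_predictions(seq):
    """P(next symbol == 1) from a predictor that tracks the hidden state in a single bit."""
    in_b, preds = False, []
    for x in seq:
        preds.append(1.0 if in_b else 0.5)
        in_b = (not in_b) and x == 1  # we are in B only right after A emitted a 1
    return preds


def order_k_estimate(seq, k):
    """Empirical P(next == 1 | previous k symbols all 1), as an order-k Markov chain would learn it."""
    ones = total = 0
    for t in range(k, len(seq)):
        if all(seq[t - k:t]):
            total += 1
            ones += seq[t]
    return ones / total


seq = even_process(100_000)
preds = one_bit_predictions(seq)
print(all(x == 1 for x, p in zip(seq, preds) if p == 1.0))  # one-bit model: never wrong when certain
print(order_k_estimate(seq, 4))  # order-4 chain: only "1 with probability about 2/3"</code></pre>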
]]></description><pubDate>Fri, 05 Jun 2026 20:51:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=48418104</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48418104</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48418104</guid></item><item><title><![CDATA[New comment by thesz in "Transformers are inherently succinct"]]></title><description><![CDATA[
<p>My comment in the previous discussion of that paper: <a href="https://news.ycombinator.com/item?id=48014197">https://news.ycombinator.com/item?id=48014197</a><p>The authors used LTL (linear temporal logic) to express, basically, non-reduced, non-ordered binary decision diagrams, or just binary decision diagrams (BDDs).<p>Such BDDs are almost guaranteed to have exponential size because they do not employ reduction (sharing of common subexpressions). Reduced BDDs are more succinct, and reduced ordered BDDs are even more succinct.<p>Also, the transformers in the paper are constructed, not trained. Training any model to express a given truth table is very hard. The authors also did not compare against, say, the Kolmogorov-Arnold representation, which is also a universal approximator.<p>So this paper is not as deep as one may think.</p>
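<p>To make the effect of reduction concrete, here is a small Python sketch (names are mine, not from the paper) that builds a reduced ordered BDD by Shannon expansion with a unique table: identical (variable, low, high) triples are shared, and a node whose two branches coincide is skipped. For the parity of n variables, the full decision tree has 2^n - 1 internal nodes, while the reduced ordered BDD needs only 2n - 1.</p><pre><code>def robdd_size(f, n):
    """Count internal nodes of the reduced ordered BDD of f over variables x0..x(n-1), in that order.

    f takes a tuple of n booleans and returns a boolean.
    The construction enumerates all 2**n assignments, so it is only meant for small n.
    """
    unique = {}  # (var, low, high) -> node id; the terminals are the ids 0 and 1

    def make_node(var, low, high):
        if low == high:  # redundant test: both branches lead to the same subgraph
            return low
        key = (var, low, high)
        if key not in unique:  # share isomorphic subgraphs
            unique[key] = len(unique) + 2
        return unique[key]

    def build(var, prefix):
        if var == n:
            return int(f(prefix))
        low = build(var + 1, prefix + (False,))
        high = build(var + 1, prefix + (True,))
        return make_node(var, low, high)

    build(0, ())
    return len(unique)


def parity(bits):
    return sum(bits) % 2 == 1


for n in (4, 8, 12):
    print(n, 2 ** n - 1, robdd_size(parity, n))  # decision-tree nodes vs. reduced ordered BDD nodes</code></pre>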
]]></description><pubDate>Fri, 05 Jun 2026 20:35:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=48417877</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48417877</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48417877</guid></item><item><title><![CDATA[New comment by thesz in "CQL: Categorical Databases"]]></title><description><![CDATA[
<p>Most probably, you do not want two of your SELECT statements to use the same seed during execution.<p>This necessitates some state, most probably a persisted one. Hence, imperativeness.</p>
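<p>A minimal Python sketch of the point: a pure random function of the seed returns the same value every time it is called with that seed, so two statements that must see different numbers need some state that advances between them. The linear congruential generator constants are the well-known Numerical Recipes ones; SessionRng is a hypothetical stand-in for the state a database would have to keep (and, most probably, persist).</p><pre><code>def lcg_step(seed):
    """Pure function: the next 32-bit LCG state, which doubles as the random value."""
    return (1664525 * seed + 1013904223) % 2 ** 32


class SessionRng:
    """Hypothetical mutable RNG state shared by all statements of a session."""

    def __init__(self, seed):
        self.state = seed

    def draw(self):
        self.state = lcg_step(self.state)  # the side effect: the state advances
        return self.state


# Two SELECTs evaluated with the same seed see the same "random" value ...
print(lcg_step(42) == lcg_step(42))   # True
# ... while statements drawing from shared state see different ones.
rng = SessionRng(42)
print(rng.draw() == rng.draw())       # False</code></pre>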
]]></description><pubDate>Thu, 04 Jun 2026 14:29:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=48399219</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48399219</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48399219</guid></item><item><title><![CDATA[New comment by thesz in "CQL: Categorical Databases"]]></title><description><![CDATA[
<p>While it is true at the schema level, it is not true at the query level.<p>The most frequently used kind of join is a join over a general relation.<p>[1] <a href="https://github.com/agirish/tpcds/blob/master/query1.sql" rel="nofollow">https://github.com/agirish/tpcds/blob/master/query1.sql</a><p>Query 1 from TPC-DS [1] creates a multi-column relation using GROUP BY. That relation is then partially constrained by different columns from different tables.</p>
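<p>A cut-down sketch in the shape of the query linked above, run through Python's built-in sqlite3 module. The tables, columns and rows are made up and reduced to what the query needs, and the original's date filter and LIMIT are dropped, so details may differ from the real benchmark query. The GROUP BY builds a (customer, store) relation, which is then constrained by a column of store and a column of customer.</p><pre><code>import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (c_customer_sk INTEGER, c_customer_id TEXT);
CREATE TABLE store (s_store_sk INTEGER, s_state TEXT);
CREATE TABLE store_returns (sr_customer_sk INTEGER, sr_store_sk INTEGER, sr_fee REAL);
INSERT INTO customer VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
INSERT INTO store VALUES (10, 'TN'), (20, 'CA');
INSERT INTO store_returns VALUES
  (1, 10, 50.0), (1, 10, 40.0), (2, 10, 5.0), (3, 10, 10.0),
  (1, 20, 1.0), (2, 20, 99.0);
""")

QUERY = """
WITH customer_total_return AS (        -- a (customer, store) relation built by GROUP BY
  SELECT sr_customer_sk AS ctr_customer_sk, sr_store_sk AS ctr_store_sk,
         SUM(sr_fee) AS ctr_total_return
  FROM store_returns
  GROUP BY sr_customer_sk, sr_store_sk
)
SELECT c_customer_id
FROM customer_total_return ctr1, store, customer
WHERE ctr1.ctr_total_return > (SELECT AVG(ctr_total_return) * 1.2
                               FROM customer_total_return ctr2
                               WHERE ctr1.ctr_store_sk = ctr2.ctr_store_sk)
  AND s_store_sk = ctr1.ctr_store_sk         -- constrained by a column of store ...
  AND s_state = 'TN'
  AND ctr1.ctr_customer_sk = c_customer_sk   -- ... and by a column of customer
ORDER BY c_customer_id
"""
rows = conn.execute(QUERY).fetchall()
print(rows)</code></pre><p>With this toy data, only alice's returns at the Tennessee store exceed 1.2 times that store's average; bob's large total belongs to the CA store and is filtered out by the store constraint.</p>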
]]></description><pubDate>Tue, 02 Jun 2026 16:10:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48372179</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48372179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48372179</guid></item><item><title><![CDATA[New comment by thesz in "CQL: Categorical Databases"]]></title><description><![CDATA[
<p>Imperative languages such as C/C++ specify "microtransactions": an ordering of memory accesses (including (de)allocations) within a statement or group of statements.<p>Compilers are free to rearrange these accesses as long as the final result is the same as if they had been executed in the order these microtransactions specify.<p>Consider loop fusion, loop splitting and/or loop skewing.</p>
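<p>A toy illustration of loop fusion, written in Python for readability (a C compiler would apply it to its own intermediate representation, not to source code). Both functions produce the same result, but the fused one makes a single pass and never materializes the intermediate array b, so the order of memory accesses changes while the final result does not.</p><pre><code>def unfused(a):
    b = [x * 2 for x in a]        # loop 1 writes every b[i]
    return [y + 1 for y in b]     # loop 2 then reads every b[i]


def fused(a):
    out = []
    for x in a:                   # one loop: each b[i] is consumed right after it is produced
        out.append(x * 2 + 1)
    return out


print(unfused(range(5)) == fused(range(5)))  # True</code></pre>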
]]></description><pubDate>Tue, 02 Jun 2026 16:04:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48372066</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48372066</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48372066</guid></item><item><title><![CDATA[New comment by thesz in "CQL: Categorical Databases"]]></title><description><![CDATA[
<p>For all practical purposes, it very much is: consider random number generation.</p>
]]></description><pubDate>Tue, 02 Jun 2026 15:58:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=48371963</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48371963</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48371963</guid></item><item><title><![CDATA[New comment by thesz in "Only 17% of all 64-bit Integers are products of two 32-bit integers"]]></title><description><![CDATA[
<p>Some of them, with the notable exception of squares of primes, can be expressed as products of different combinations of factors.<p>E.g., 6^2 = (2*2*3)*3 = 2*(2*3*3).<p>About one in 22 (ln(2^32) ≈ 22.2) perfect squares will be the square of a prime. Most won't.</p>
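<p>A quick Python check of both claims (the function name is mine): 36 has several factor pairs, while the square of a prime p has only (1, p^2) and (p, p), and ln(2^32) is about 22.18, which by the prime number theorem is roughly the gap between primes near 2^32.</p><pre><code>from math import isqrt, log


def factor_pairs(n):
    """All pairs (a, b) with a * b == n and a not exceeding b."""
    return [(a, n // a) for a in range(1, isqrt(n) + 1) if n % a == 0]


print(factor_pairs(36))   # [(1, 36), (2, 18), (3, 12), (4, 9), (6, 6)]
print(factor_pairs(49))   # [(1, 49), (7, 7)]
print(log(2 ** 32))       # about 22.18</code></pre>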
]]></description><pubDate>Mon, 01 Jun 2026 19:07:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=48361210</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48361210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48361210</guid></item><item><title><![CDATA[New comment by thesz in "Why AI Agents Cannot Change Software Systems"]]></title><description><![CDATA[
<p>PSP/TSP have everything to do with the discussion.</p>
]]></description><pubDate>Fri, 29 May 2026 11:19:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=48321700</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48321700</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48321700</guid></item><item><title><![CDATA[New comment by thesz in "A Eureka machine that thinks like nature and explores what AI cannot"]]></title><description><![CDATA[
<p><p><pre><code>  > gradient descent isn't good at combinatorial optimisation.
</code></pre>
If you convolve your problem with a sufficiently wide Gaussian, you can use gradient descent. The approach is called Natural Evolution Strategies [1].<p>[1] <a href="https://en.wikipedia.org/wiki/Natural_evolution_strategy#Natural_gradient_ascent" rel="nofollow">https://en.wikipedia.org/wiki/Natural_evolution_strategy#Nat...</a><p>In the original formulation, it requires O(N^4) evaluations to compute the Fisher information matrix for an N-dimensional parameterization of the problem. But there are closed-form solutions and more economical representations of the covariance matrix (LoRA, hehe).</p>
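<p>A minimal Python sketch of the smoothing idea. Note that this is the plain search-gradient estimator with a fixed isotropic Gaussian, not full NES: there is no natural-gradient (Fisher) correction and no covariance adaptation. The toy objective (how many coordinate signs match a hidden bit string) is piecewise constant, so its ordinary gradient is zero almost everywhere, but the gradient of its Gaussian-smoothed version can be estimated from function values alone. All parameter values are arbitrary choices for this toy problem.</p><pre><code>import random


def smoothed_gradient(f, theta, sigma, pop, rng):
    """Antithetic Monte Carlo estimate of the gradient of E[f(theta + sigma * eps)], eps ~ N(0, I)."""
    grad = [0.0] * len(theta)
    for _ in range(pop):
        eps = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = f([t + sigma * e for t, e in zip(theta, eps)])
        minus = f([t - sigma * e for t, e in zip(theta, eps)])
        for i, e in enumerate(eps):
            grad[i] += (plus - minus) * e / (2 * sigma * pop)
    return grad


def smoothed_ascent(f, dim, sigma=0.5, lr=0.5, pop=200, steps=100, seed=0):
    """Gradient ascent on the Gaussian-smoothed objective, starting from the origin."""
    rng = random.Random(seed)
    theta = [0.0] * dim
    for _ in range(steps):
        g = smoothed_gradient(f, theta, sigma, pop, rng)
        theta = [t + lr * gi for t, gi in zip(theta, g)]
    return theta


# Combinatorial objective: number of coordinates whose sign matches a hidden bit string.
TARGET = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]


def matches(x):
    return sum((xi > 0) == (t == 1) for xi, t in zip(x, TARGET))


theta = smoothed_ascent(matches, len(TARGET))
print(matches(theta), "of", len(TARGET))</code></pre>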
]]></description><pubDate>Thu, 28 May 2026 18:16:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=48313143</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48313143</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48313143</guid></item><item><title><![CDATA[New comment by thesz in "I think Anthropic and OpenAI have found product-market fit"]]></title><description><![CDATA[
<p>Multiply "inference + backwards pass (~2x inference cost) + activations (vram overhead)" by the batch size (thousands) to get the actual RAM and compute cost. An optimizer like Adam adds only two or three model-sized buffers of overhead.<p>And last, but not least, for inference you need to keep only one hidden layer's activations in RAM, but to compute the gradient for even one sample you need all of them (61 for DeepSeek models).</p>
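<p>The same arithmetic as a back-of-the-envelope Python function. It is a deliberately simplified model under my own assumptions: one number format for everything, Adam represented as two extra model-sized buffers (its first and second moment estimates), and a fixed activation size per layer per sample. All inputs in the usage lines are hypothetical.</p><pre><code>def training_bytes(n_params, n_layers, act_bytes_per_layer, batch, bytes_per_value=2, optimizer_buffers=2):
    """Rough memory for one training step: weights, gradients, optimizer state and saved activations."""
    weights = n_params * bytes_per_value
    gradients = n_params * bytes_per_value
    optimizer = optimizer_buffers * n_params * bytes_per_value  # e.g. Adam's two moment estimates
    activations = n_layers * act_bytes_per_layer * batch         # every layer is kept for the backward pass
    return weights + gradients + optimizer + activations


def inference_bytes(n_params, act_bytes_per_layer, bytes_per_value=2):
    """Rough memory for a forward pass of one sample: weights plus one live layer of activations."""
    return n_params * bytes_per_value + act_bytes_per_layer


# Hypothetical model: 7e9 parameters, 61 layers, 50 MB of activations per layer per sample.
print(training_bytes(7e9, 61, 50e6, batch=1024) / 1e12, "TB")
print(inference_bytes(7e9, 50e6) / 1e9, "GB")</code></pre>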
]]></description><pubDate>Wed, 27 May 2026 21:29:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=48301007</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48301007</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48301007</guid></item><item><title><![CDATA[New comment by thesz in "I think Anthropic and OpenAI have found product-market fit"]]></title><description><![CDATA[
<p>The differences between training and inference are that 1) in training, one has to keep intermediate results for the backward pass, and 2) the backward pass at least doubles the computation.<p>Training is also done over batches, which increases memory requirements by several orders of magnitude. This is why training needs costly compute.<p>One of the ways out of this unfortunate situation is to use something like Stochastic Average Gradient Descent [1]. The examples there mostly concern regularized logistic regression, which makes the problem more or less convex. Neural networks are inherently non-convex. Still, maybe some ideas from there can be utilized for neural networks, like using an estimated Lipschitz constant to derive the curvature and an appropriate learning step.<p><pre><code>  [1] https://www.cs.ubc.ca/~schmidtm/Courses/540-W19/L12.pdf</code></pre></p>
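<p>For reference, a minimal Python sketch of SAG on L2-regularized logistic regression, the setting of the linked lecture. It keeps the last gradient computed for every sample plus their running sum, so each step evaluates one sample's gradient but moves along an average over all samples. The step size comes from the Lipschitz constant of the per-sample gradients, ||x_i||^2 / 4 + lambda for the logistic loss; I use the conservative 1/(16L) from the SAG convergence analysis. The toy data and parameter values are my own.</p><pre><code>import math
import random


def sag_logistic(X, y, lam=0.1, iters=10000, seed=0):
    """Stochastic Average Gradient for L2-regularized logistic regression with labels in {-1, +1}.

    Minimizes (1/n) * sum_i log(1 + exp(-y_i * w.x_i)) + (lam / 2) * ||w||^2.
    """
    n, dim = len(X), len(X[0])
    # Lipschitz constant of each per-sample gradient: ||x_i||^2 / 4 for the logistic loss, plus lam.
    L = max(sum(v * v for v in x) for x in X) / 4 + lam
    step = 1 / (16 * L)
    w = [0.0] * dim
    memory = [[0.0] * dim for _ in range(n)]  # last gradient computed for each sample
    total = [0.0] * dim                        # running sum of the stored gradients
    rng = random.Random(seed)
    for _ in range(iters):
        i = rng.randrange(n)
        margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
        coef = -y[i] / (1 + math.exp(margin))  # derivative of log(1 + exp(-y z)) with respect to z
        g = [coef * xj for xj in X[i]]
        total = [t + gj - mj for t, gj, mj in zip(total, g, memory[i])]
        memory[i] = g
        w = [wj - step * (t / n + lam * wj) for wj, t in zip(w, total)]
    return w


def full_gradient(X, y, w, lam=0.1):
    """Exact gradient of the same objective, used to check convergence."""
    n = len(X)
    grad = [lam * wj for wj in w]
    for x, yi in zip(X, y):
        coef = -yi / (1 + math.exp(yi * sum(wj * xj for wj, xj in zip(w, x))))
        grad = [g + coef * xj / n for g, xj in zip(grad, x)]
    return grad


X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]  # first column is a constant bias feature
y = [-1, -1, 1, 1]
w = sag_logistic(X, y)
print(w, max(abs(g) for g in full_gradient(X, y, w)))</code></pre>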
]]></description><pubDate>Wed, 27 May 2026 20:31:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=48300200</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48300200</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48300200</guid></item><item><title><![CDATA[New comment by thesz in "Why AI Agents Cannot Change Software Systems"]]></title><description><![CDATA[
<p>English is not my native language.<p>PSP/TSP (Personal and Team Software Processes) and RUP (Rational Unified Process, to a lesser extent) are very valuable approaches to software engineering. Please read about them; they are very interesting.</p>
]]></description><pubDate>Wed, 27 May 2026 18:30:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48298426</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48298426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48298426</guid></item><item><title><![CDATA[New comment by thesz in "Why AI Agents Cannot Change Software Systems"]]></title><description><![CDATA[
<p>LLMs are Markov chains [1]. The "emergent abilities" of LLMs can be explained by the decrease in perplexity of text prediction [2].<p><pre><code>  [1] https://arxiv.org/abs/2410.02724
  [2] https://arxiv.org/abs/2304.15004</code></pre></p>
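<p>One way to see the argument of [2] in a few lines of Python, under the simplifying assumption that per-token errors are independent: if the per-token success probability improves smoothly, as a falling perplexity would suggest, an exact-match score over an answer of L tokens behaves like accuracy^L and can look like a sudden jump. The numbers are illustrative, not taken from the paper.</p><pre><code>def exact_match_rate(per_token_accuracy, answer_length):
    """Probability that every token of the answer is right, assuming independent per-token errors."""
    return per_token_accuracy ** answer_length


for p in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"per-token accuracy {p:.2f} -> exact match on 10 tokens {exact_match_rate(p, 10):.3f}")</code></pre>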
]]></description><pubDate>Wed, 27 May 2026 14:50:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=48295243</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48295243</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48295243</guid></item><item><title><![CDATA[New comment by thesz in "Why AI Agents Cannot Change Software Systems"]]></title><description><![CDATA[
<p><p><pre><code>  > central concept or software engineering is architecture patterns.
</code></pre>
Both RUP and PSP/TSP are grounded in defect prevention: all sorts of defects, from incorrect sets of requirements to memory corruption.<p>Architecture patterns can help in that regard, but they can also be very error-prone: right now I am in the process of removing a bug introduced through a misunderstanding of one rather old singleton.</p>
]]></description><pubDate>Wed, 27 May 2026 14:28:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48294939</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48294939</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48294939</guid></item><item><title><![CDATA[New comment by thesz in "Electrobun 2.0 will be decoupled from Bun due to the Rust rewrite"]]></title><description><![CDATA[
<p><p><pre><code>  > As long as the tests pass and the performance doesn't regress...
</code></pre>
AMD had 20+ million tests for their FPUs back in the days of Intel's FDIV bug, and ACL2 still found bugs in their implementations of floating-point computation.<p>Agentic vibe coding is not an application of the ACL2 theorem prover to anything. Agentic vibe coding is the opposite of it: it will find a way to pass the tests by any means necessary, be it patching the code, the tests, or the expected results.<p><pre><code>   > it's secure
</code></pre>
You can't say that before formal verification, which is the opposite of what vibe coding is.</p>
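<p>A toy Python illustration of why passing tests is not the same as being correct. This is brute-force enumeration of a small input space, a crude stand-in for formal verification and not what ACL2 does; the function and the test cases are made up. The 8-bit midpoint below wraps around on overflow, passes its hand-written tests, and is wrong for almost half of all inputs.</p><pre><code>def midpoint_u8(a, b):
    """Buggy midpoint of two 8-bit values: the sum wraps around like an 8-bit register."""
    return ((a + b) % 256) // 2


def midpoint_spec(a, b):
    return (a + b) // 2


unit_tests = [(0, 0), (2, 4), (10, 20), (100, 50)]
print(all(midpoint_u8(a, b) == midpoint_spec(a, b) for a, b in unit_tests))  # True: tests pass

counterexamples = [(a, b) for a in range(256) for b in range(256)
                   if midpoint_u8(a, b) != midpoint_spec(a, b)]
print(len(counterexamples), counterexamples[0])  # 32640 (1, 255)</code></pre>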
]]></description><pubDate>Tue, 26 May 2026 00:36:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=48273595</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48273595</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48273595</guid></item><item><title><![CDATA[New comment by thesz in "Haskell Foundation 2026 Update"]]></title><description><![CDATA[
<p>The steepness of the Haskell learning curve is exaggerated.<p>In my experience, a completely unmotivated (due to a very bad divorce that was ongoing at the time) former Java programmer took a description of a CPU for model code generation in a Haskell eDSL and successfully extended it within a month. The model had (for the time) tricky type-level programming and type classes with associated types.</p>
]]></description><pubDate>Thu, 21 May 2026 13:14:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=48222110</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48222110</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48222110</guid></item><item><title><![CDATA[New comment by thesz in "Haskell Foundation 2026 Update"]]></title><description><![CDATA[
<p>I think you may find this [1] almost 20-year-old [2] agent helpful.<p><pre><code>  [1] https://github.com/augustss/djinn
  [2] http://lambda-the-ultimate.org/node/1178
</code></pre>
The context window it requires for AI coding can be as short as half a dozen tokens.</p>
]]></description><pubDate>Thu, 21 May 2026 13:06:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48222006</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48222006</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48222006</guid></item><item><title><![CDATA[New comment by thesz in "Learnings from 100K lines of Rust with AI (2025)"]]></title><description><![CDATA[
<p><p><pre><code>  > LLMs have a temperature parameter. At zero temperature they are deterministic: they always choose the most likely next token at each step based on what came before and the model weights, and they will always generate the same output given the same input.
</code></pre>
<a href="https://en.wikipedia.org/wiki/Softmax_function" rel="nofollow">https://en.wikipedia.org/wiki/Softmax_function</a><p>"A value proportional to the reciprocal of β is sometimes referred to as the temperature: β = 1/kT, where k is typically 1 or the Boltzmann constant and T is the temperature. A higher temperature results in a more uniform output distribution (i.e. with higher entropy; it is "more random"), while a lower temperature results in a sharper output distribution, with one value dominating."<p>"Temperature" in the context of softmax does not change the "winning" token; it changes how probable (in the sense of the softmax distribution) the winning token is. If the winning token is "New York", it will be the winner both with a temperature close to 0 and with a temperature of 1e9.<p>The actual selection of the random token is done separately, using inputs from outside the softmax distribution, for example a random number generator. I believe most LLM configs have a seed for the random number generator.<p>More than that, generation of code in most programming languages is done with more guardrails, such as beam search guided by schema, syntax and semantics.</p>
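<p>A small Python sketch of that separation (the logits and function names are hypothetical): temperature rescales the softmax distribution but never changes which token has the highest probability, and the actual choice comes from a random number generator, so a fixed seed reproduces the same samples.</p><pre><code>import math
import random


def softmax(logits, temperature):
    scaled = [z / temperature for z in logits]
    top = max(scaled)                          # subtract the maximum for numerical stability
    exps = [math.exp(z - top) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(probs, rng):
    """Pick a token index with probability probs[i]; the randomness comes only from rng."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def draws(probs, seed, n=5):
    rng = random.Random(seed)
    return [sample(probs, rng) for _ in range(n)]


logits = [2.0, 1.0, 0.5]  # hypothetical scores; index 0 plays "New York"
for t in (0.01, 1.0, 1e9):
    probs = softmax(logits, t)
    winner = max(range(len(probs)), key=probs.__getitem__)
    print(t, winner, [round(p, 3) for p in probs])  # the winner is index 0 at every temperature

probs = softmax(logits, 1.0)
print(draws(probs, seed=7) == draws(probs, seed=7))  # True: same seed, same tokens</code></pre>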
]]></description><pubDate>Wed, 20 May 2026 20:35:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=48213735</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48213735</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48213735</guid></item><item><title><![CDATA[New comment by thesz in "Amazon employees are "tokenmaxxing" due to pressure to use AI tools"]]></title><description><![CDATA[
<p>Have you tried changing your HDL to something more modern, like Bluespec System Verilog or, god forbid, anything embedded into Haskell or Scala?<p>I read that BSV source code is about three times shorter than a similar design in Verilog and also has a three times lower defect density (defects per significant line of code). Since the number of defects is the number of lines times the defect density, just by changing the HDL from Verilog to BSV one can have nine (9) times fewer defects in the design (1/3 × 1/3 = 1/9).</p>
]]></description><pubDate>Tue, 12 May 2026 21:33:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=48114901</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=48114901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48114901</guid></item></channel></rss>