<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: thesz</title><link>https://news.ycombinator.com/user?id=thesz</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 18:10:33 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=thesz" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by thesz in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"]]></title><description><![CDATA[
<p><p><pre><code>  > I think over-thinking is only solved by thinking more, not less.
</code></pre>
Despite "thinking" tokens being determined by the preceding tokens, they are still drawn from some probability distribution, just a complex one. This means that at each token-selection step there is a probability P_e of an error, of selecting a wrong token.<p>These errors compound: the probability of selecting at least one wrong token over N steps is 1-(1-P_e)^N, which approaches 1 as N grows.<p>The shorter the "thinking", the lower the probability of it going astray.</p>
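A back-of-the-envelope sketch of that compounding, assuming (my idealization, not a measured value) a fixed, independent per-token error probability:

```haskell
-- Probability that a chain of n sampled tokens contains at least one
-- wrong token, assuming a fixed, independent per-token error
-- probability pe (an idealization; real error rates vary with context).
atLeastOneError :: Double -> Int -> Double
atLeastOneError pe n = 1 - (1 - pe) ^ n
```

With pe = 0.001, a 100-token chain goes astray with probability about 0.10, while a 1000-token chain already fails with probability about 0.63: longer thinking gives more chances to derail.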
]]></description><pubDate>Wed, 08 Apr 2026 06:20:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47686079</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47686079</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47686079</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p>It is important to question "how to judge," not "who is to judge."<p>My answer to the "how to judge?" question is another question: "how easy is it to implement new, unforeseen functionality with the code under scrutiny?"</p>
]]></description><pubDate>Tue, 07 Apr 2026 14:21:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675842</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675842</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675842</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  >  Most programmers suck at programming so badly that LLM/AI production IS better than 90+% (possibly 99%+).
</code></pre>
How do you know?</p>
]]></description><pubDate>Tue, 07 Apr 2026 14:15:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675756</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675756</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675756</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  > AI hasn't observably reduced the quality of software I use everyday in a way that is meaningfully separable from normal incidents in the past.
</code></pre>
Most probably, you are not looking closely enough.<p>The average duration of AWS outages [2] was 1.5 hours per outage, 38 hours in total. The most recent AWS outage, in 2026 [1], took AWS down for 13 hours, a third of the 38 hours accumulated over the preceding year, and was caused by an AWS LLM coding tool.<p>[1] <a href="https://www.theguardian.com/technology/2026/feb/20/amazon-cloud-outages-ai-tools-amazon-web-services-aws" rel="nofollow">https://www.theguardian.com/technology/2026/feb/20/amazon-cl...</a><p>[2] <a href="https://www.cherryservers.com/blog/cloud-outages" rel="nofollow">https://www.cherryservers.com/blog/cloud-outages</a></p>
]]></description><pubDate>Tue, 07 Apr 2026 14:13:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675718</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675718</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675718</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  > I have been writing a library called "assume" where you can specify a type signature, give it a prompt, and it generates a function on the fly in the background with Claude Code, so you still write some code, but whenever you need a function you "assume" that such a function exists.
</code></pre>
This is very much like the good old djinn [1], which would generate code from a Haskell type specification.<p>[1] <a href="https://mail.haskell.org/pipermail/haskell/2005-December/017098.html" rel="nofollow">https://mail.haskell.org/pipermail/haskell/2005-December/017...</a><p>And this is why I boldly compare the current LLM craze to the much less hyped craze for strong type systems. I was part of that strong-type-system discussion, advocating for them. ;)</p>
]]></description><pubDate>Tue, 07 Apr 2026 14:04:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675584</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675584</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675584</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p><p><pre><code>  > If you make the prompts specific enough and provide tests that it has to run before it passes, then it should be fairly close to deterministic.
</code></pre>
Your prompt may work on a specific state of the code base and not before or after some changes. Your tests can check for specific behavior, but not for the absence of undesirable behaviors induced by the absence of some specific code or by the addition of other specific code.<p><pre><code>  > I assure you that's already happening. A lot.
</code></pre>
Thank you for the assurance. Can we have less of it? Thank you again.</p>
]]></description><pubDate>Tue, 07 Apr 2026 13:58:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47675499</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47675499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47675499</guid></item><item><title><![CDATA[New comment by thesz in "A cryptography engineer's perspective on quantum computing timelines"]]></title><description><![CDATA[
<p>Given that quantum computing (QC) can speed up training of neural networks (LLMs), it would be wise for Google to invest in QC as much as possible.<p>Google, with SoftBank, invested about $230M in QC last year. Microsoft, IBM and Google have spent $15B on QC combined, over all the time they have researched it: $15B in 20 years, less than $1B per year, across three companies.<p>Google spent upwards of $150B last year on datacenters alone.<p>This may tell us something about how close we are to working quantum computing.</p>
]]></description><pubDate>Tue, 07 Apr 2026 07:42:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47671928</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47671928</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47671928</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p>It is hard to find the market shares of Viaweb and Yahoo at the time of Viaweb's purchase; the best I've found is that Viaweb was bought for 1/4 of Yahoo's net revenue at the time ($49M price / $203M net revenue in 1998). Viaweb was not profitable at the time of purchase, but it had about four people and quite modest hardware costs.<p>While there are no companies with $1.5 trillion (4*$380B) of net revenue, the difference is that Anthropic is cash-flow negative, has far more than 4 people on staff (none of them hungry artists like PG), and its hardware spending, I think, is astronomical. They are cash-flow negative because of the hardware needed to train models.<p>There should be more than one company able to offer good purchase terms to Anthropic's owners.<p>I also think that Anthropic, just like OpenAI and most other LLM companies and company departments, rides "test set leakage," hoping the general public and investors do not understand. Their models do not generalize well, being unable to generate working code in Haskell [1], at the very least.<p>[1] <a href="https://haskellforall.com/2026/03/a-sufficiently-detailed-spec-is-code" rel="nofollow">https://haskellforall.com/2026/03/a-sufficiently-detailed-sp...</a><p>PG's Viaweb had awful code as a liability. Anthropic's Claude Code has an awful implementation (code) and produces awful code, with more liability than code written by a human.</p>
]]></description><pubDate>Mon, 06 Apr 2026 22:21:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668099</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47668099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668099</guid></item><item><title><![CDATA[New comment by thesz in "The cult of vibe coding is dogfooding run amok"]]></title><description><![CDATA[
<p>This product rides a hype wave. That is why it is crazily popular and successful.<p>The situation is akin to Viaweb: Viaweb also rode a hype wave, and its code situation was awful as well (see PG's stories about fixing bugs live during customer issue reproduction).<p>What did Viaweb's buyer do? They rewrote the thing in C++.<p>If history rhymes, then the buyer of Anthropic would do something close to "rewrite it in C++" with the current Claude Code implementation.</p>
]]></description><pubDate>Mon, 06 Apr 2026 19:35:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47665820</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47665820</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47665820</guid></item><item><title><![CDATA[New comment by thesz in "Finnish sauna heat exposure induces stronger immune cell than cytokine responses"]]></title><description><![CDATA[
<p>It may originate from Roman thermae: <a href="https://en.wikipedia.org/wiki/Thermae" rel="nofollow">https://en.wikipedia.org/wiki/Thermae</a></p>
]]></description><pubDate>Sun, 05 Apr 2026 16:19:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47650939</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47650939</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47650939</guid></item><item><title><![CDATA[New comment by thesz in "Finnish sauna heat exposure induces stronger immune cell than cytokine responses"]]></title><description><![CDATA[
<p>A hammam is not as hot as a sauna and not as dry. Sauna air temperatures can reach above 100 degrees Celsius while humidity is usually relatively low (around 20%) [1].<p>[1] <a href="https://en.wikipedia.org/wiki/Sauna" rel="nofollow">https://en.wikipedia.org/wiki/Sauna</a><p>A hammam's temperatures are around 40-50 degrees Celsius with humidity close to 100%.<p>These are very different conditions, with very different body responses.</p>
]]></description><pubDate>Sun, 05 Apr 2026 16:17:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47650924</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47650924</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47650924</guid></item><item><title><![CDATA[New comment by thesz in "The revenge of the data scientist"]]></title><description><![CDATA[
<p><p><pre><code>  > You recognize that you haven't really needed strong mathematical (or coding) skills to create models for some time.
</code></pre>
And then along comes something like this [1], where researchers found a failure to control for multiple comparisons: "In this particular setting, emergent abilities claims are possibly infected by a failure to control for multiple comparisons. In BIG-Bench alone, there are ≥220 tasks, ∼40 metrics per task, ∼10 model families, for a total of ∼10^6 task-metric-model family triplets, meaning probability that no task-metric-model family triplet exhibits an emergent ability by random chance might be small."<p>[1] <a href="https://arxiv.org/abs/2304.15004" rel="nofollow">https://arxiv.org/abs/2304.15004</a></p>
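To make the arithmetic concrete, here is a small sketch (mine, not from the paper) of the standard Bonferroni correction for that many comparisons:

```haskell
-- Bonferroni correction: to keep the family-wise false-positive rate
-- at alphaFW across m comparisons, each individual test must clear
-- a per-test threshold of alphaFW / m.
bonferroni :: Double -> Int -> Double
bonferroni alphaFW m = alphaFW / fromIntegral m
```

At the paper's ∼10^6 triplets, a conventional per-test alpha of 0.05 shrinks to a per-triplet threshold of 5.0e-8; testing each triplet at 0.05 instead makes a spurious "emergent ability" somewhere virtually certain.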
]]></description><pubDate>Thu, 02 Apr 2026 19:12:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47618854</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47618854</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47618854</guid></item><item><title><![CDATA[New comment by thesz in "Slop is not necessarily the future"]]></title><description><![CDATA[
<p><p><pre><code>  > No one has ever made a purchasing decision based on how good your code is.
</code></pre>
People make purchasing decisions based on the availability of source code all the time, preferring products whose source is available and usable. It is safe to assume that they can also make purchasing decisions based on the quality of source code, all else being equal.</p>
]]></description><pubDate>Wed, 01 Apr 2026 17:58:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47604252</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47604252</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47604252</guid></item><item><title><![CDATA[New comment by thesz in "Google's 200M-parameter time-series foundation model with 16k context"]]></title><description><![CDATA[
<p><p><pre><code>  > How can the same model predict egg prices in Italy, and global inflation in a reliable way?
</code></pre>
For one, there's Benford's law: <a href="https://en.wikipedia.org/wiki/Benford%27s_law" rel="nofollow">https://en.wikipedia.org/wiki/Benford%27s_law</a><p>So: predict the sign (branch predictors in modern CPUs also use neural networks of sorts), then the exponent (most probably it changes slowly), and then predict the mantissa using Benford's law.</p>
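For reference, the Benford distribution itself is one line; this is my generic illustration, not anything from the linked model:

```haskell
-- Benford's law: in many naturally occurring datasets, the leading
-- digit d (1..9) appears with probability log10 (1 + 1/d).
benford :: Int -> Double
benford d = logBase 10 (1 + 1 / fromIntegral d)
```

The nine probabilities telescope to log10 10 = 1; a leading 1 appears about 30.1% of the time and a leading 9 only about 4.6%, which is what makes the mantissa predictable at all.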
]]></description><pubDate>Tue, 31 Mar 2026 11:54:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47586032</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47586032</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47586032</guid></item><item><title><![CDATA[New comment by thesz in "Reports of code's death are greatly exaggerated"]]></title><description><![CDATA[
<p><p><pre><code>  > A system that is not even Turing complete is extremely limited.
</code></pre>
Agda is not Turing-complete, yet it is very useful.</p>
]]></description><pubDate>Mon, 30 Mar 2026 18:54:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47578243</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47578243</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47578243</guid></item><item><title><![CDATA[New comment by thesz in "VHDL's Crown Jewel"]]></title><description><![CDATA[
<p><p><pre><code>  > The Delta Cycle logic is actually quite similar to functional reactive programming. It separates how a value changes from when a process responds to that change.
</code></pre>
This is what I use when I play with hardware simulation in Haskell:<p><pre><code>  type S a = [a]

  register :: a -> S a -> S a
  register a0 as = a0:as

  -- Combinational logic can be represented as ordinary pure
  -- functions and then glued into "circuits" with registers
  -- and map/zip/unzip functions.
</code></pre>
This also separates externally visible events, recorded in the (infinite) list of values, from externally unobservable pure (combinational) logic. As a bonus, one can test the combinational logic separately, with property-based testing, etc.</p>
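For illustration, a toy circuit built from these combinators (my example; the definitions are repeated so the snippet stands alone): an accumulator that registers a running sum of its input stream.

```haskell
type S a = [a]  -- a signal: the value observed at each clock cycle

register :: a -> S a -> S a
register a0 as = a0 : as  -- a register delays its input by one cycle

-- An accumulator circuit: a combinational adder (zipWith (+)) glued
-- into a registered feedback loop. Lazy lists make the recursion well
-- defined: cycle n's output depends only on inputs before cycle n.
accumulator :: S Int -> S Int
accumulator inputs = sums
  where
    sums = register 0 (zipWith (+) sums inputs)
```

`take 5 (accumulator [1,2,3,4,5])` yields `[0,1,3,6,10]`: the register's initial value first, then each running sum one cycle late.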
]]></description><pubDate>Mon, 30 Mar 2026 12:27:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47573408</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47573408</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47573408</guid></item><item><title><![CDATA[New comment by thesz in "Douglas Lenat's Automated Mathematician Source Code"]]></title><description><![CDATA[
<p>Automated Mathematician was what led to Eurisko: <a href="https://en.wikipedia.org/wiki/Eurisko" rel="nofollow">https://en.wikipedia.org/wiki/Eurisko</a><p>Eurisko demonstrated superhuman ability to play strategy games in the early 1980s, and even reused strategies from the VLSI place-and-route task when planning fleet placement in games. That is knowledge transfer between tasks.</p>
]]></description><pubDate>Mon, 30 Mar 2026 10:24:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47572541</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47572541</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47572541</guid></item><item><title><![CDATA[New comment by thesz in "Further human + AI + proof assistant work on Knuth's "Claude Cycles" problem"]]></title><description><![CDATA[
<p>It still is [1].<p>[1] <a href="https://www.vice.com/en/article/a-human-amateur-beat-a-top-go-playing-ai-using-a-simple-trick/" rel="nofollow">https://www.vice.com/en/article/a-human-amateur-beat-a-top-g...</a></p>
]]></description><pubDate>Sun, 29 Mar 2026 07:18:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47561046</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47561046</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47561046</guid></item><item><title><![CDATA[New comment by thesz in "Reports of code's death are greatly exaggerated"]]></title><description><![CDATA[
<p><p><pre><code>  > This is meaningless.
</code></pre>
Turing machines grew out of constructive mathematics [1], where proofs are constructions of the objects or, in other words, algorithms to compute them.<p><pre><code>  [1] https://en.wikipedia.org/wiki/Constructivism_(philosophy_of_mathematics)#Constructive_mathematics
</code></pre>
Saying that there is no difference between things that can be constructed (quantum oracles) and things that are given and cannot be constructed (Turing oracles, which are not machines of any sort) is a direct refutation of the very foundation of Turing machine theory.</p>
]]></description><pubDate>Sat, 28 Mar 2026 22:38:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47558705</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47558705</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47558705</guid></item><item><title><![CDATA[New comment by thesz in "CERN uses ultra-compact AI models on FPGAs for real-time LHC data filtering"]]></title><description><![CDATA[
<p>Thanks.<p>The paper [1] referenced in your link follows the legacy of the paper on the HIGGS dataset, and does not report quantities like accuracy and/or perplexity. The HIGGS dataset paper provided area under ROC, from which one had to approximate accuracy. I used the accuracy from the ADMM paper [2] to compare my results against. As I checked later, the area under ROC in [1] mostly agrees with the SGD training results on HIGGS in [2].<p><pre><code>  [1] https://arxiv.org/pdf/2505.19689
  [2] https://proceedings.mlr.press/v48/taylor16.pdf
</code></pre>
I think the perplexity measure is appropriate there in [1] because we need to discern between three outcomes. This calls for softmax, and for perplexity as a standard measure.<p>So, my questions are: 1) what perplexity should I target when dealing with the "mc-flavtag-ttbar-small" dataset? And 2) what is the train/validate/test split ratio there?</p>
]]></description><pubDate>Sat, 28 Mar 2026 22:30:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=47558664</link><dc:creator>thesz</dc:creator><comments>https://news.ycombinator.com/item?id=47558664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47558664</guid></item></channel></rss>