<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: techbruv</title><link>https://news.ycombinator.com/user?id=techbruv</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 05:56:22 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=techbruv" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by techbruv in "How uv got so fast"]]></title><description><![CDATA[
<p>At a previous job, I recall that updating a dependency via Poetry would take on the order of 5-30 minutes. God forbid something failed to resolve after 30 minutes and you had to wait another 30 to see whether your change fixed the problem. It was not an enjoyable experience.<p>uv has been a delight to use</p>
]]></description><pubDate>Fri, 26 Dec 2025 21:30:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46396489</link><dc:creator>techbruv</dc:creator><comments>https://news.ycombinator.com/item?id=46396489</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46396489</guid></item><item><title><![CDATA[New comment by techbruv in "Claude’s memory architecture is the opposite of ChatGPT’s"]]></title><description><![CDATA[
<p>I don’t understand the argument “AI is just XYZ mechanism, therefore it cannot be intelligent”.<p>Does the mechanism really disqualify it from intelligence if, behaviorally, you cannot distinguish it from “real” intelligence?<p>I’m not saying that LLMs have certainly surpassed the “indistinguishable from real intelligence” threshold, but saying there’s not even a little bit of intelligence in a system that can solve more complex math problems than I can seems like a stretch.</p>
]]></description><pubDate>Thu, 11 Sep 2025 20:32:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=45215835</link><dc:creator>techbruv</dc:creator><comments>https://news.ycombinator.com/item?id=45215835</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45215835</guid></item><item><title><![CDATA[New comment by techbruv in "Better and Faster Large Language Models via Multi-Token Prediction"]]></title><description><![CDATA[
<p>> So it will not get worse in performance but only faster<p>A bit confused by this statement. Speculative decoding does not decrease the performance of the model in terms of "accuracy" or "quality" of output: mathematically, the distribution being sampled from is identical to the one you would get with regular autoregressive decoding. Any variability between autoregressive and speculative decoding comes purely from sampling randomness.<p>Unless you meant performance as in "speed", in which case it's possible that speculative decoding could degrade speed (though on most inputs, and with a good choice of draft model, it shouldn't).</p>
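For concreteness, here's a toy NumPy sketch (my own, not any production implementation) of the accept/reject rule that makes speculative decoding exact for a single drafted token; the distributions p and q are made up for illustration:

```python
import numpy as np

def sample(dist, rng):
    return rng.choice(len(dist), p=dist)

def speculative_step(target_p, draft_p, rng):
    """One accept/reject step of speculative decoding for a single drafted token.
    target_p, draft_p: next-token distributions from the large and draft models.
    The returned token's marginal distribution equals target_p exactly."""
    x = sample(draft_p, rng)                      # draft model proposes token x
    if rng.random() < min(1.0, target_p[x] / draft_p[x]):
        return x                                  # accept: keep the cheap token
    # reject: resample from the residual max(0, p - q), renormalized
    residual = np.maximum(target_p - draft_p, 0.0)
    residual /= residual.sum()
    return sample(residual, rng)

# Empirical check: accept/reject reproduces the target distribution.
rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # hypothetical "large model" distribution
q = np.array([0.3, 0.3, 0.4])   # hypothetical "draft model" distribution
counts = np.zeros(3)
for _ in range(100_000):
    counts[speculative_step(p, q, rng)] += 1
print(counts / counts.sum())
```

The residual resampling on rejection is exactly what cancels the draft model's bias, which is why the output distribution matches plain autoregressive sampling.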
]]></description><pubDate>Wed, 01 May 2024 13:42:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=40223112</link><dc:creator>techbruv</dc:creator><comments>https://news.ycombinator.com/item?id=40223112</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40223112</guid></item><item><title><![CDATA[New comment by techbruv in "What is a transformer model? (2022)"]]></title><description><![CDATA[
<p>Some other good resources:<p>[0]: The original paper: <a href="https://arxiv.org/abs/1706.03762" rel="nofollow noreferrer">https://arxiv.org/abs/1706.03762</a><p>[1]: Full walkthrough for building a GPT from Scratch: <a href="https://www.youtube.com/watch?v=kCc8FmEb1nY">https://www.youtube.com/watch?v=kCc8FmEb1nY</a><p>[2]: A simple inference only implementation in just NumPy, that's only 60 lines: <a href="https://jaykmody.com/blog/gpt-from-scratch/" rel="nofollow noreferrer">https://jaykmody.com/blog/gpt-from-scratch/</a><p>[3]: Some great visualizations and high-level explanations: <a href="http://jalammar.github.io/illustrated-transformer/" rel="nofollow noreferrer">http://jalammar.github.io/illustrated-transformer/</a><p>[4]: An implementation that is presented side-by-side with the original paper: <a href="https://nlp.seas.harvard.edu/2018/04/03/attention.html" rel="nofollow noreferrer">https://nlp.seas.harvard.edu/2018/04/03/attention.html</a></p>
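In the spirit of [2], here is a minimal single-head causal self-attention in plain NumPy (random weights, purely illustrative; shapes and names are my own, not from any of the linked implementations):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head causal attention: softmax(Q K^T / sqrt(d_k)) V,
    masked so position i can only attend to positions <= i."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)
    mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(mask, -1e9, scores)   # block attention to future tokens
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq, d = 4, 8
x = rng.standard_normal((seq, d))
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
out = causal_self_attention(x, w_q, w_k, w_v)
print(out.shape)  # (4, 8)
```

Note the causal mask means position 0 can only attend to itself, so its output is just its own value projection.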
]]></description><pubDate>Fri, 23 Jun 2023 18:55:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=36450844</link><dc:creator>techbruv</dc:creator><comments>https://news.ycombinator.com/item?id=36450844</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36450844</guid></item><item><title><![CDATA[New comment by techbruv in "PaLM 2 Technical Report [pdf]"]]></title><description><![CDATA[
<p>The idea that GPT-4 is 1 trillion parameters has been refuted by Sam Altman himself on the Lex Fridman podcast (THIS IS WRONG, SEE CORRECTION BELOW).<p>These days, the largest models that have been trained compute-optimally (in terms of model size w.r.t. tokens) typically hover around 50B parameters (likely PaLM 2-L's size, and LLaMA maxes out at 70B). We simply do not have enough pre-training data to optimally train a 1T parameter model. For GPT-4 to be 1 trillion parameters, OpenAI would have needed to:<p>1) somehow unlock 20x the amount of data (1T tokens -> 20T tokens)
2) somehow engineer an inference engine for a 1T GPT model that is significantly faster than anything anyone else has built
3) somehow be able to eat the cost of hosting a 1T parameter model<p>The probability that all three of the above have happened seems incredibly low.<p>CORRECTION: The refutation on the Lex Fridman podcast was of the claim that GPT-4 is 100T parameters, not 1T (and it wasn't a direct refutation, they were just joking about it). The three points above still stand, however.</p>
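For a sense of scale, a back-of-envelope Chinchilla-style calculation (compute-optimal tokens ~ 20x parameters, training compute ~ 6*N*D FLOPs; the numbers are illustrative heuristics, not anything OpenAI has confirmed):

```python
# Rough compute-optimal scaling heuristics (Chinchilla-style rules of thumb).
def compute_optimal_tokens(n_params, tokens_per_param=20):
    """Tokens needed to train a dense model of n_params compute-optimally."""
    return n_params * tokens_per_param

def training_flops(n_params, n_tokens):
    """Standard dense-transformer approximation: C ~ 6 * N * D FLOPs."""
    return 6 * n_params * n_tokens

for n in (50e9, 175e9, 1e12):
    d = compute_optimal_tokens(n)
    print(f"{n/1e9:>6.0f}B params -> {d/1e12:>5.1f}T tokens, "
          f"{training_flops(n, d):.2e} FLOPs")
```

Under these heuristics a 50B model wants about 1T tokens, while a 1T model wants about 20T tokens, which is the 20x data gap in point 1.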
]]></description><pubDate>Wed, 10 May 2023 19:51:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=35892585</link><dc:creator>techbruv</dc:creator><comments>https://news.ycombinator.com/item?id=35892585</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35892585</guid></item><item><title><![CDATA[New comment by techbruv in "PaLM 2 Technical Report [pdf]"]]></title><description><![CDATA[
<p>> "We then train several models from 400M to 15B on the same pre-training mixture for up to 1 × 10^22 FLOPs."<p>It seems that for the last year or so these models have been getting smaller. I would be surprised if GPT-4 had more parameters than GPT-3 (i.e. 175B).<p>Edit: Those numbers are just for their scaling-laws study. They don't explicitly state the size of PaLM 2-L, but they do say "The largest model in the PaLM 2 family, PaLM 2-L, is significantly smaller than the largest PaLM model but uses more training compute." So it is likely in the range of 10B - 100B.</p>
]]></description><pubDate>Wed, 10 May 2023 19:23:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=35892214</link><dc:creator>techbruv</dc:creator><comments>https://news.ycombinator.com/item?id=35892214</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35892214</guid></item><item><title><![CDATA[New comment by techbruv in "What is ChatGPT doing and why does it work?"]]></title><description><![CDATA[
<p>ChatGPT and other LLMs for that matter are most definitely not using beam search or greedy sampling.<p>Greedy sampling is prone to repetition and just in general gives pretty subpar results that make no sense.<p>While beam search is better than greedy sampling, it's too expensive (beam search with a beam width of 4 is 4x more expensive) and performs worse than other methods.<p>In practice, you probably just wanna sample from the distribution directly after applying something like top-p: <a href="https://arxiv.org/pdf/1904.09751.pdf" rel="nofollow">https://arxiv.org/pdf/1904.09751.pdf</a></p>
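Here's a short NumPy sketch of nucleus (top-p) sampling from that paper (my own toy version, not ChatGPT's actual decoder; the logits are made up for illustration):

```python
import numpy as np

def top_p_sample(logits, p=0.9, temperature=1.0, rng=None):
    """Nucleus (top-p) sampling, as in Holtzman et al. 2019: keep the smallest
    set of tokens whose cumulative probability reaches p, then renormalize."""
    rng = rng if rng is not None else np.random.default_rng()
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]            # tokens by probability, descending
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, p) + 1       # smallest nucleus with mass >= p
    keep = order[:cutoff]
    nucleus = probs[keep] / probs[keep].sum()  # renormalize over the nucleus
    return keep[rng.choice(len(keep), p=nucleus)]

rng = np.random.default_rng(0)
logits = np.array([3.0, 2.0, 1.0, -5.0, -5.0])
# With p=0.9, only the top two tokens make it into the nucleus, so the
# low-probability tail can never be sampled.
samples = {int(top_p_sample(logits, p=0.9, rng=rng)) for _ in range(1000)}
print(samples)
```

Cutting the tail like this is exactly what suppresses the degenerate repetition you get from greedy decoding, while still sampling rather than always taking the argmax.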
]]></description><pubDate>Tue, 14 Feb 2023 23:08:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=34797548</link><dc:creator>techbruv</dc:creator><comments>https://news.ycombinator.com/item?id=34797548</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34797548</guid></item></channel></rss>