<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: araghuvanshi</title><link>https://news.ycombinator.com/user?id=araghuvanshi</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 09:27:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=araghuvanshi" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by araghuvanshi in "Show HN: Plug and play computer vision tool"]]></title><description><![CDATA[
<p>Wait, this is actually pretty good! What interesting use cases have you seen so far?</p>
]]></description><pubDate>Mon, 16 Jun 2025 03:29:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=44286421</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=44286421</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44286421</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Look man, Claude 3, GPT-4, etc. didn't work for my startup out of the box. I thought it would be helpful to tell others what I went through. Why hate on the truth?</p>
]]></description><pubDate>Wed, 24 Apr 2024 06:42:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=40141385</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40141385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40141385</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Fair, valid point. I admit that this is far from a perfect analysis. I hope, though, that it helps people at least classify their problems into categories where they need to design around the flaw rather than assume that the thing “just works”. I appreciate the discussion!</p>
]]></description><pubDate>Wed, 24 Apr 2024 02:42:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=40139883</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40139883</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40139883</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Why should I? If a person told you that they could multiply, divide, add, and subtract, would you not also assume that they could at least count?<p>The point here is: the justifications from AI engineers for why counting and math aren't the same task, while valid, are irrelevant, because marketing never brings up the limitation in the first place. So any logical person who doesn't know a lot about AI will arrive at a logical, albeit practically incorrect, conclusion.</p>
]]></description><pubDate>Tue, 23 Apr 2024 21:58:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=40137838</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40137838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40137838</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Well, LLMs are claimed to be good at math too, and yet they can't count. The same point applies to long contexts. And our actual use case (insurance) needs the model to do both.<p>My hope for this article is to help non-AI experts figure out when they need to design around a flaw versus believe what's marketed.</p>
]]></description><pubDate>Tue, 23 Apr 2024 21:35:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=40137596</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40137596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40137596</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Well, the same principle of false advertising re: context window sizes also applies to the models' inability to count, no? AI companies claim that their models can do math, so wouldn't a regular developer assume that they can also count?<p>And if I can't trust a so-called SOTA model to partially answer - say, to recall each mention of the word "wizard" instead of just giving me the wrong answer - then why should I trust it to list out specific scenes? That's even harder to benchmark.</p>
]]></description><pubDate>Tue, 23 Apr 2024 21:18:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=40137407</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40137407</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40137407</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Direct quote from Anthropic's website: "Opus - Our most intelligent model, which can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks."<p>So you tell me: if a regular developer reads the above, how can they surmise that the model which can do higher-order math can't count?</p>
]]></description><pubDate>Tue, 23 Apr 2024 21:12:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=40137328</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40137328</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40137328</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>I don't think that this is obvious at all. Yes, AI people who read papers on arXiv and know what "SOTA" stands for know it, but they are no longer the main user base of LLMs.<p>This is meant for the developer who doesn't fit the above profile and thinks a model that has a million-token context window and "can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks" (direct quote from Anthropic's website) can actually do those things.</p>
]]></description><pubDate>Tue, 23 Apr 2024 21:10:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=40137307</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40137307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40137307</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>How much context? One sentence? Two? One paragraph? One page? It's very similar to the insurance policy problem: the text surrounding the information you're looking for - which could extend one sentence or ten pages around it - is just as important as the information itself.</p>
]]></description><pubDate>Tue, 23 Apr 2024 21:08:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40137275</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40137275</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40137275</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Totally agree there. And that's one of my points: you have to design around this flaw by doing things like what you proposed (or build an ontology like we did, which is also helpful). And the first step in this process is figuring out whether your task falls into a category like the ones I described.<p>The structured output element is really important too - subject for another post though!</p>
]]></description><pubDate>Tue, 23 Apr 2024 20:04:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=40136625</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40136625</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40136625</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>I'm talking about the fact that they boast about their models having large context windows. And Anthropic says: "Opus - Our most intelligent model, which can handle complex analysis, longer tasks with multiple steps, and higher-order math and coding tasks." So if I were a non-AI expert, would I not infer that because it can do "higher-order math and coding tasks" it can also count?</p>
]]></description><pubDate>Tue, 23 Apr 2024 19:56:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=40136532</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40136532</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40136532</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>That's true, but the problem of long context understanding (say, "summarize each of the situations where the word 'wizard' is mentioned") remains. And that gets much closer to the insurance policy thing.</p>
]]></description><pubDate>Tue, 23 Apr 2024 19:49:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=40136416</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40136416</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40136416</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Then why do the creators of this vacuum advertise that it's really good at raking? And unlike in your analogy, to actually figure out that it's bad at raking, you have to read a bunch of academic papers.</p>
]]></description><pubDate>Tue, 23 Apr 2024 19:39:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=40136278</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40136278</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40136278</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Do most readers know that if you give a so-called million token context model that many tokens, it'll actually stop paying attention after the first ~30k tokens? And that if they were to try to use this product for anything serious, they would encounter hallucinations and incompleteness that could have material implications?<p>Not everything needs to be entertaining to be useful.</p>
]]></description><pubDate>Tue, 23 Apr 2024 19:36:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=40136237</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40136237</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40136237</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Please see my comment below, and the "Why should I care" section of the post. Yes, you can count the number of times the word "wizard" is mentioned, but for tasks that aren't quite as cut-and-dry (say, listing out all of the core arguments of a 100-page legal case), you cannot just write a Python script.<p>The agentic approach falls apart because, again, a self-querying mechanism or a multi-agent framework still needs to know where in the document to look for each subset of information. That's why I argue that you need an ontology (roughly sketched below). And at that point, agents are moot. A small 7B model with a simple prompt suffices, without any of the unreliability of agents. I suggest trying agents on an actually serious document; the problems are pretty evident. That said, I do hope that they get there one day, because it will be cool.</p>
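<p>For illustration, here's a minimal sketch of what I mean by ontology-guided extraction. The topic names, page ranges, and the ask() helper are all hypothetical stand-ins, not our actual implementation:</p>
<pre><code># Hypothetical ontology: map each topic to the page span where it lives.
ONTOLOGY = {
    "coverage_limits": (12, 18),
    "exclusions": (19, 31),
    "claims_process": (32, 40),
}

def extract(pages, topic, ask):
    """Send only the ontology-selected pages to a small model.

    pages: list of per-page text strings; ask: any LLM completion function.
    """
    start, end = ONTOLOGY[topic]
    excerpt = "\n".join(pages[start:end + 1])
    return ask(f"List every statement about {topic} in the following text:\n{excerpt}")
</code></pre>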
]]></description><pubDate>Tue, 23 Apr 2024 19:32:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=40136202</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40136202</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40136202</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Some counterarguments:
1. If an AI company promises that their LLM has a million token context window, but in practice it only pays attention to the first and last 30k tokens, and then hallucinates, that is a bad practice. And prompt construction does not help here - the issue is with the fundamentals of how LLMs actually work. Proof: <a href="https://arxiv.org/abs/2307.03172" rel="nofollow">https://arxiv.org/abs/2307.03172</a>
2. Regarding writing the code snippet: as I described in my post, the main issue is that the model does not understand the relationships between information in the long document. So yes, it can write a script that counts the number of times the word "wizard" appears (a sketch of that script is below), but if I gave it a legal case of similar length, how would it write a script that extracts all of the core arguments that live across tens of pages?</p>
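<p>To be concrete about the easy half of that comparison, here's roughly the kind of word-counting script an LLM can reliably produce (a minimal sketch; the file path is a hypothetical stand-in for a plain-text copy of the book):</p>
<pre><code>import re

# Count whole-word occurrences of "wizard", case-insensitively.
# The \b word boundaries keep "wizardry" from being counted.
with open("harry_potter.txt", encoding="utf-8") as f:
    text = f.read()

count = len(re.findall(r"\bwizard\b", text, flags=re.IGNORECASE))
print(count)
</code></pre>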
]]></description><pubDate>Tue, 23 Apr 2024 19:24:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=40136103</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40136103</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40136103</guid></item><item><title><![CDATA[New comment by araghuvanshi in "LLMs and the Harry Potter problem"]]></title><description><![CDATA[
<p>Good share, thank you! Yeah, I think Contextual AI has also been doing some interesting work in this area. The glossary idea is definitely interesting and an area we're looking into. I'm curious to see what work is being done on building knowledge graphs; that's another area where we've seen positive results.</p>
]]></description><pubDate>Tue, 23 Apr 2024 19:16:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=40135998</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40135998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40135998</guid></item><item><title><![CDATA[LLMs and the Harry Potter problem]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.pyqai.com/blog/llms-and-the-harry-potter-problem">https://www.pyqai.com/blog/llms-and-the-harry-potter-problem</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40134392">https://news.ycombinator.com/item?id=40134392</a></p>
<p>Points: 65</p>
<p># Comments: 61</p>
]]></description><pubDate>Tue, 23 Apr 2024 17:13:42 +0000</pubDate><link>https://www.pyqai.com/blog/llms-and-the-harry-potter-problem</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=40134392</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40134392</guid></item><item><title><![CDATA[New comment by araghuvanshi in "Show HN: Unify – Dynamic LLM Benchmarks and SSO for Multi-Vendor Deployment"]]></title><description><![CDATA[
<p>The cost comparisons for the same model are interesting. I'm curious why certain providers are a lot cheaper than others - for example, Mixtral 8x7B on OctoAI costs $0.20/1M tokens, whereas it's $0.66/1M using Mistral's own inference. My best guess is that one price includes cold starts. Any thoughts there?</p>
]]></description><pubDate>Wed, 07 Feb 2024 21:34:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=39294616</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=39294616</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39294616</guid></item><item><title><![CDATA[New comment by araghuvanshi in "Show HN: depot.ai – easily embed ML / AI models in your Dockerfile"]]></title><description><![CDATA[
<p>We use this product at pyq and I have to say, the speedup in our build times is amazing, especially when HuggingFace is slow or backed up.</p>
]]></description><pubDate>Tue, 18 Jul 2023 18:29:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=36777163</link><dc:creator>araghuvanshi</dc:creator><comments>https://news.ycombinator.com/item?id=36777163</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36777163</guid></item></channel></rss>