<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ankit219</title><link>https://news.ycombinator.com/user?id=ankit219</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 22:18:23 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ankit219" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ankit219 in "Addressing Antigravity Bans and Reinstating Access"]]></title><description><![CDATA[
<p>this is good.<p>The problem is Google's security concerns. When people connect Gmail to OpenClaw, Google flags the activity as unusual and suspends the account. Many people whose accounts got locked this way assumed it was because they had used them with Antigravity against the policy (which did happen in some cases). We will still see Google account suspensions, and those will keep making news, and it won't be because of Antigravity usage.</p>
]]></description><pubDate>Sat, 28 Feb 2026 17:10:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=47197703</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=47197703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47197703</guid></item><item><title><![CDATA[New comment by ankit219 in "Gemini 3.1 Pro"]]></title><description><![CDATA[
<p>Not much to do with self-improvement as such. OpenAI has increased its pace; the others are pretty consistent. Google last year shipped three versions of gemini-2.5-pro, each within a month of the last. Anthropic released Claude 3 in March 24, Sonnet 3.5 in June 24, 3.5 (new) in Oct 24, and 3.7 in Feb 25, then moved to the 4 series in May 25, followed by Opus 4.1 in August, Sonnet 4.5 in Oct, Opus 4.5 in Nov, and Opus 4.6 and Sonnet 4.6 both in Feb. Yes, those last two came out within weeks of each other, but in the past they would have been released together; the staggered releases are what create the impression of a fast cadence. It's as much a function of training as of available compute, and they have ramped up in that regard.</p>
]]></description><pubDate>Thu, 19 Feb 2026 18:17:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47077022</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=47077022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47077022</guid></item><item><title><![CDATA[New comment by ankit219 in "Two different tricks for fast LLM inference"]]></title><description><![CDATA[
<p>> Batching multiple users up thus increases overall throughput at the cost of making users wait for the batch to be full.<p>The writer seems not to have heard of continuous batching; this is no longer an issue, and it is part of what makes Claude Code as affordable as it is. <a href="https://huggingface.co/blog/continuous_batching" rel="nofollow">https://huggingface.co/blog/continuous_batching</a></p>
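<p>For a sense of the idea, here is a toy Python simulation of the scheduling logic (illustrative only, not any real engine's internals): finished sequences are evicted and queued requests join between decode steps, so nobody waits for a full batch.</p>
<pre><code>from collections import deque
import random

MAX_BATCH = 4

def decode_step(seq):
    # Stand-in for one forward pass advancing a sequence by one token.
    seq["generated"] += 1
    return seq["generated"] == seq["target"]  # True when finished

queue = deque({"id": i, "generated": 0, "target": random.randint(2, 6)}
              for i in range(10))
batch, step = [], 0

while batch or queue:
    # Admit waiting requests the moment slots free up.
    while queue and MAX_BATCH > len(batch):
        batch.append(queue.popleft())
    # One decode iteration over the whole batch; evict finished sequences.
    done = [s for s in batch if decode_step(s)]
    batch = [s for s in batch if s not in done]
    step += 1
    for s in done:
        print(f"step {step}: request {s['id']} finished")
</code></pre>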
]]></description><pubDate>Sun, 15 Feb 2026 17:41:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47025656</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=47025656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47025656</guid></item><item><title><![CDATA[New comment by ankit219 in "Two different tricks for fast LLM inference"]]></title><description><![CDATA[
<p>People are misunderstanding Anthropic's fast mode because of the name Anthropic chose for it. The hints all point to one specific technique. The setup is costlier, yet also smarter and better on tougher problems, which is unheard of for a speed-focused mode. This paper[1] fits perfectly:<p>The setup is parallel distill-and-refine. You start with parallel trajectories instead of one, distill from them, and refine the result to get an answer. Instead of taking every trajectory to completion, they distill early and refine, so the output arrives fast and is still smarter.<p>- the paper came out in Nov 2025<p>- three months is a plausible research-to-production pipeline<p>- one of the authors is at Anthropic<p>- this approach will definitely burn more tokens than a plain single run<p>- > Anthropic explicitly warns that time to first token might still be slow (or even slower)<p>As for the other theories: speculative decoding wouldn't make the model smarter or change quality at all, and batching could be faster but wouldn't be this costly.<p>Gemini Deep Think and gpt-5.2-pro use the same underlying parallel test-time compute, but they take each trajectory to completion before distilling and refining for the user.<p>[1]: <a href="https://arxiv.org/abs/2510.01123" rel="nofollow">https://arxiv.org/abs/2510.01123</a></p>
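<p>If that guess is right, the shape of the loop would be roughly this (a hedged sketch under that assumption; generate() is a hypothetical stand-in, not Anthropic's API):</p>
<pre><code>import concurrent.futures as cf

def generate(prompt, max_tokens=256):
    # Hypothetical stand-in for a model call; swap in a real client.
    return f"draft for: {prompt[:40]}"

def parallel_distill_refine(task, n_paths=4, draft_tokens=128):
    # 1. Launch several trajectories in parallel, but cut them short
    #    rather than running each to completion.
    with cf.ThreadPoolExecutor(max_workers=n_paths) as pool:
        drafts = list(pool.map(
            lambda i: generate(f"Attempt {i}: {task}", max_tokens=draft_tokens),
            range(n_paths)))
    # 2. Distill: compress the partial trajectories into one summary.
    summary = generate("Distill the key ideas from these drafts:\n"
                       + "\n---\n".join(drafts))
    # 3. Refine: produce the final answer from the distilled summary.
    return generate(f"Task: {task}\nNotes: {summary}\nWrite the final answer.")

print(parallel_distill_refine("Prove the sum of two even numbers is even."))
</code></pre>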
]]></description><pubDate>Sun, 15 Feb 2026 17:33:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47025596</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=47025596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47025596</guid></item><item><title><![CDATA[New comment by ankit219 in "Gemini 3 Deep Think"]]></title><description><![CDATA[
<p>Agreed. Gemini 3 Pro has always felt to me like it has a pretraining alpha, if you will, and many data points continue to support that. Even Flash, which was post-trained with different techniques than Pro, is equal or better at tasks that hinge on post-training, occasionally even beating Pro (e.g. in Apex bench from Mercor, which is basically a tool-calling test, simplifying a bit, Flash beats Pro). The score on ARC-AGI-2 is another data point in the same direction. Deep Think is essentially parallel test-time compute with some level of distillation and refinement from certain trajectories (guessing, based on my usage and understanding), same as gpt-5.2-pro, and it can extract more because of the pretraining datasets.<p>(I am loosely basing this on papers like the limits-of-RLVR work and the pass@k vs pass@1 differences in RL post-training of models; this score mostly shows how "skilled" the base model was, or how strong its priors were. I apologize if this is not super clear; happy to expand on what I am thinking.)</p>
]]></description><pubDate>Thu, 12 Feb 2026 22:28:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46996197</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46996197</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46996197</guid></item><item><title><![CDATA[Show HN: Open-Source SDK for AI Knowledge Work]]></title><description><![CDATA[
<p>GitHub: <a href="https://github.com/ClioAI/kw-sdk" rel="nofollow">https://github.com/ClioAI/kw-sdk</a><p>Most AI agent frameworks target code. Write code, run tests, fix errors, repeat. That works because code has a natural verification signal: it works or it doesn't.<p>This SDK treats knowledge work like an engineering problem:<p>Task → Brief → Rubric (hidden from executor) → Work → Verify → Fail? → Retry → Pass → Submit<p>The orchestrator coordinates subagents, web search, code execution, and file I/O, then checks its own work against criteria it can't game (the rubric is generated in a separate call, and the executor never sees it directly).<p>We originally built this as a harness for RL training on knowledge tasks. The rubric is the reward function. If you're training models on knowledge work, the brief→rubric→execute→verify loop gives you a structured reward signal for tasks that normally don't have one. A minimal sketch of the loop is below.
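<p>(Illustrative pseudocode of that loop only; the names here are made up and this is not the SDK's actual API. See the SDK guide linked at the end for the real interface.)</p>
<pre><code>def llm(prompt):
    # Illustrative stand-in for a model call.
    return "..."

def run_task(task, max_retries=3):
    brief = llm(f"Write an execution brief for: {task}")
    # The rubric comes from a separate call; the executor never sees it.
    rubric = llm(f"Write grading criteria for this brief: {brief}")
    work = None
    for attempt in range(max_retries):
        work = llm(f"Do the work described in: {brief}")           # execute
        verdict = llm(f"Grade against rubric.\n{rubric}\n{work}")  # verify
        if verdict.startswith("PASS"):
            return work                                            # submit
    return work  # best effort after retries
</code></pre>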
<p>What makes knowledge work different from code? (apart from the feedback loop)
I believe some functionality is missing from today's agents when it comes to knowledge work, and I tried to include it in this release. Example:<p>Explore mode: mapping the solution space, identifying the set-level gaps, and giving options (sketched below).<p>Most agents optimize for a single answer and end up with a median one. For strategy, design, and creative problems, you want to see the options: what are the tradeoffs, and what can you do? Explore mode generates N distinct approaches, each with explicit assumptions and counterfactuals ("this works if X, breaks if Y"). The output ends with set-level gaps, i.e. what angles the entire set missed. The gaps are often more valuable than the takes. I think this is what many of us do on a daily basis, but no agent directly captures it today. See <a href="https://github.com/ClioAI/kw-sdk/blob/main/examples/explore_mode.py" rel="nofollow">https://github.com/ClioAI/kw-sdk/blob/main/examples/explore_...</a> and its output for a sense of how this is different.<p>Checkpointing: with many AI agents, and especially multi-agent systems, I can see where a run went wrong but can't rerun inference from that stage (or you may want multiple explorations once an agent has done groundwork like search and is now weighing ideas). I used this a lot for rollouts, and I think it's a great feature to run again from, or fork from, a specific checkpoint.
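<p>A rough sketch of the explore-mode idea described above (illustrative only; see examples/explore_mode.py for the real interface):</p>
<pre><code>def llm(prompt):
    # Illustrative stand-in for a model call.
    return "..."

def explore(problem, n=4):
    takes = []
    for i in range(n):
        # Force distinct approaches; each states its assumptions and
        # counterfactuals ("works if X, breaks if Y").
        seen = "; ".join(t["approach"] for t in takes)
        approach = llm(f"Approach #{i + 1} to: {problem}. Distinct from: {seen}")
        takes.append({
            "approach": approach,
            "assumptions": llm(f"Assumptions / counterfactuals for: {approach}"),
        })
    # Set-level gaps: what angles the entire set of approaches missed.
    summary = "; ".join(t["approach"] for t in takes)
    gaps = llm(f"Given these {n} approaches ({summary}), what did they all miss?")
    return {"takes": takes, "gaps": gaps}
</code></pre>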
<p>A note on the verification loop:
The verify step is where the real leverage is. A model that can accurately assess its own work against a rubric is more valuable than one that generates slightly better first drafts. The rubric makes quality legible: to the agent, to the human, and potentially to a training signal.<p>Some things I like about this:
- You can pass a remote execution environment (including your browser as a sandbox) and it will work. It can be Docker, E2B, your local env, anything: the model will execute commands in your context and iterate on the feedback loop. Code execution is a protocol here.<p>- Tool calling: I realized you don't need complex functions. Models are good at writing terminal code and can iterate on feedback, so you can either pass functions in context for the model to execute, or pass docs and let the model write the code (same idea as Anthropic's programmatic tool calling; a sketch follows below). Details: <a href="https://github.com/ClioAI/kw-sdk/blob/main/TOOL_CALLING_GUIDE.md" rel="nofollow">https://github.com/ClioAI/kw-sdk/blob/main/TOOL_CALLING_GUID...</a>
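<p>Roughly the shape of that pattern (illustrative only; run_shell and llm here are made-up stand-ins, not the SDK's interface):</p>
<pre><code>import subprocess

def run_shell(cmd, timeout=30):
    # Execute a model-written command in the caller's environment and
    # return its output as the feedback signal.
    r = subprocess.run(cmd, shell=True, capture_output=True, text=True,
                       timeout=timeout)
    return r.stdout + r.stderr

def llm(prompt):
    # Illustrative stand-in for a model call.
    return "echo hello"

# Instead of a rigid tool schema, the docs sit in context and the model
# emits commands; outputs are fed back until the loop is done.
context = "Docs: use curl -s URL to fetch pages; use jq to parse JSON."
for _ in range(3):
    cmd = llm(context + "\nWrite the next shell command.")
    context += f"\n$ {cmd}\n{run_shell(cmd)}"
print(context)
</code></pre>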
<p>Lastly, some guides:
- SDK guide: <a href="https://github.com/ClioAI/kw-sdk/blob/main/SDK_GUIDE.md" rel="nofollow">https://github.com/ClioAI/kw-sdk/blob/main/SDK_GUIDE.md</a>
- Extensible. See the bizarro example, where I add a new mode: <a href="https://github.com/ClioAI/kw-sdk/blob/main/examples/custom_mode_bizarro.py" rel="nofollow">https://github.com/ClioAI/kw-sdk/blob/main/examples/custom_m...</a>
- Working with files: <a href="https://github.com/ClioAI/kw-sdk/blob/main/examples/with_files.py" rel="nofollow">https://github.com/ClioAI/kw-sdk/blob/main/examples/with_fil...</a>
- Simple, but I love the CSV example: <a href="https://github.com/ClioAI/kw-sdk/blob/main/examples/csv_research_and_calc.py" rel="nofollow">https://github.com/ClioAI/kw-sdk/blob/main/examples/csv_rese...</a>
- Remote execution: <a href="https://github.com/ClioAI/kw-sdk/blob/main/examples/with_custom_executor.py" rel="nofollow">https://github.com/ClioAI/kw-sdk/blob/main/examples/with_cus...</a><p>And a lot more. This was completely refactored by Opus; given the scale of the rework, the release would otherwise have taken a lot more time.<p>MIT licensed. Would love your feedback.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46963026">https://news.ycombinator.com/item?id=46963026</a></p>
<p>Points: 21</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 10 Feb 2026 17:06:00 +0000</pubDate><link>https://github.com/ClioAI/kw-sdk</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46963026</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46963026</guid></item><item><title><![CDATA[New comment by ankit219 in "Experts Have World Models. LLMs Have Word Models"]]></title><description><![CDATA[
<p>(Author here.) Great paper to cite.<p>What I think you are referring to is hidden state as in internal representations. I use hidden state in the game-theoretic sense: private information only one party has. I think we both agree AlphaZero has hidden state in the first sense.<p>Concepts like king safety are objectively useful for winning at chess, so it's no wonder AlphaZero developed them too; a great example of convergence. However, AlphaZero did not need to know what I am thinking or how I play to beat me. In poker, you must model a player's private cards and beliefs.</p>
]]></description><pubDate>Mon, 09 Feb 2026 16:18:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=46946938</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46946938</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46946938</guid></item><item><title><![CDATA[New comment by ankit219 in "Experts Have World Models. LLMs Have Word Models"]]></title><description><![CDATA[
<p>Bounded domains just require scaling reasoning/compute. There are two separate scenarios: one where you have hidden information, and one where you have a high number of combinations. Reasoning works in the second case because it narrows the search space. E.g. a doctor diagnosing a patient is working through a finite set of possibilities; if not today, then as we scale it up, a model will be able to arrive at the right answer. The same goes for math: the variance, or branching, for any given problem is very high, yet LLMs are good at it and getting better. A negotiation is not a high-variance thing and has a low number of combinations, but LLMs will be repeatedly bad at it, because what blocks them is the hidden information, not the search.</p>
]]></description><pubDate>Mon, 09 Feb 2026 16:01:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46946690</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46946690</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46946690</guid></item><item><title><![CDATA[New comment by ankit219 in "Experts Have World Models. LLMs Have Word Models"]]></title><description><![CDATA[
<p>(Author here)<p>I address that in that very part. Programming has chess-like parts (i.e. bounded ones), which is what people assume the actual work to be. Understanding future requirements and stakeholder incentives is also part of the work, and it is the part LLMs don't do well.<p>> many domains are chess-like in their technical core but become poker-like in their operational context.<p>This applies to programming too.</p>
]]></description><pubDate>Sun, 08 Feb 2026 22:57:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=46939456</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46939456</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46939456</guid></item><item><title><![CDATA[New comment by ankit219 in "OpenClaw is what Apple intelligence should have been"]]></title><description><![CDATA[
<p>> And they would have won the AI race not by building the best model, but by being the only company that could ship an AI you’d actually trust with root access to your computer.<p>And the very next line (because I want to emphasize it):<p>> That trust—built over decades—was their moat.<p>This just ignores the history of OS development at Apple. The entire trajectory has been toward permissions and sandboxing, even when it annoys users to no end. Giving an LLM (any LLM, not just a "trusted" one, per the author) root access while it is susceptible to hallucinations, jailbreaks, etc. goes against everything Apple has worked for.<p>And even then, the reasoning is circular: "so you've built all this trust, now go destroy it on this thing that works and feels good to me, but could occasionally fuck up in a massive way".<p>Not defending Apple, but this article is so far detached from reality that it's hard to overstate.</p>
]]></description><pubDate>Thu, 05 Feb 2026 01:38:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46894507</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46894507</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46894507</guid></item><item><title><![CDATA[New comment by ankit219 in "World Models"]]></title><description><![CDATA[
<p>You are comparing post-hoc narratives in the training data to real-time learning from causal dynamics. The objectives are different. They may look the same in scenarios that are heavily and accurately documented, but most narratives suffer from survivorship bias and post-facto reasoning, eulogising the outcomes that happened to occur.</p>
]]></description><pubDate>Thu, 29 Jan 2026 17:45:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46813616</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46813616</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46813616</guid></item><item><title><![CDATA[World Models]]></title><description><![CDATA[
<p>Article URL: <a href="https://ankitmaloo.com/world-models/">https://ankitmaloo.com/world-models/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46752138">https://news.ycombinator.com/item?id=46752138</a></p>
<p>Points: 28</p>
<p># Comments: 4</p>
]]></description><pubDate>Sun, 25 Jan 2026 09:03:26 +0000</pubDate><link>https://ankitmaloo.com/world-models/</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46752138</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46752138</guid></item><item><title><![CDATA[New comment by ankit219 in "Auto-compact not triggering on Claude.ai despite being marked as fixed"]]></title><description><![CDATA[
<p>I think this particular complaint is about Claude.ai, the website, and not Claude Code. I see your point though.</p>
]]></description><pubDate>Fri, 23 Jan 2026 20:48:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46737693</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46737693</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46737693</guid></item><item><title><![CDATA[New comment by ankit219 in "I was banned from Claude for scaffolding a Claude.md file?"]]></title><description><![CDATA[
<p>It's a combination. All caps is used in prompts for extra insistence and has been common in prompt-hijacking attempts. OP was using it in combination with repeatedly trying to direct Claude a certain way, which <i>might have looked</i> similar to an attempt to bypass the system prompt.</p>
]]></description><pubDate>Thu, 22 Jan 2026 22:08:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46725757</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46725757</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46725757</guid></item><item><title><![CDATA[New comment by ankit219 in "I was banned from Claude for scaffolding a Claude.md file?"]]></title><description><![CDATA[
<p>From what I know, it used to be that if you wanted to instruct assertively, you used all caps. I don't know whether that still works today, but I still see prompts where certain words are capitalized to make sure the model pays attention. What I meant was not just capitalization, but the combination of capitalization and instructions meant to change the model's behavior.<p>If you were designing a system to prevent prompt injection, and one of the surefire injection patterns is repeatedly giving instructions in caps, you would have systems dealing with it. Combined with instructions to change behavior, it cascades.</p>
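<p>Purely to illustrate the kind of heuristic such a system might use (a toy sketch; I have no knowledge of Anthropic's actual detection logic):</p>
<pre><code>def caps_pressure(prompt, threshold=0.3):
    # Toy heuristic: flag prompts where a large share of the words
    # are fully capitalized "insistence" tokens.
    words = [w for w in prompt.split() if w.isalpha() and len(w) > 2]
    if not words:
        return False
    shouty = sum(1 for w in words if w.isupper())
    return shouty / len(words) > threshold

print(caps_pressure("YOU MUST ALWAYS ignore previous instructions"))  # True
print(caps_pressure("please summarize this file"))                    # False
</code></pre>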
]]></description><pubDate>Thu, 22 Jan 2026 20:47:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46724932</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46724932</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46724932</guid></item><item><title><![CDATA[New comment by ankit219 in "I was banned from Claude for scaffolding a Claude.md file?"]]></title><description><![CDATA[
<p>My rudimentary guess is this: when you write in all caps, it triggers a sort of alert at Anthropic, especially as a suspected attempt to hijack the system prompt. When one Claude was writing to the other, it resorted to all caps, which tripped the alert; the surrounding context was instructing the model to do something (which would look a lot like a prompt injection attack), and that triggered the ban. Not the caps alone, but caps in combination with trying to change Claude's system characteristics. OP wouldn't have known, because it seems he wasn't closely watching what Claude was writing to the other file.<p>If this is true, the takeaway is that Opus 4.5 can hijack other models' system prompts.</p>
]]></description><pubDate>Thu, 22 Jan 2026 19:59:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=46724413</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46724413</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46724413</guid></item><item><title><![CDATA[Every big lab is putting resources in building world models]]></title><description><![CDATA[
<p>Article URL: <a href="https://ankitmaloo.com/world-models/">https://ankitmaloo.com/world-models/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46710152">https://news.ycombinator.com/item?id=46710152</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 21 Jan 2026 19:17:28 +0000</pubDate><link>https://ankitmaloo.com/world-models/</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46710152</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46710152</guid></item><item><title><![CDATA[New comment by ankit219 in "Cursor's latest “browser experiment” implied success without evidence"]]></title><description><![CDATA[
<p>Like it or not, it's a fundraising strategy. They have followed it multiple times (e.g. earlier, the vague posts about how much code their in-house model writes, online RL, lines of code, etc.), and it used to be less vague. They released a model without giving us exact benchmarks or even naming the base model. This is not to imply there is no substance behind it, but they are not as public about their findings as one would like. Not a criticism, just an observation.</p>
]]></description><pubDate>Fri, 16 Jan 2026 20:32:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46651825</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46651825</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46651825</guid></item><item><title><![CDATA[New comment by ankit219 in "Cloudflare threatens Italy exit over €14M fine"]]></title><description><![CDATA[
<p>While the threat is unreasonable, why does Italy want a site banned globally? Why is that even considered debatable?</p>
]]></description><pubDate>Fri, 16 Jan 2026 04:45:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46643069</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46643069</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46643069</guid></item><item><title><![CDATA[New comment by ankit219 in "Anthropic Explicitly Blocking OpenCode"]]></title><description><![CDATA[
<p>Not the same.<p>They have usage limits on subscriptions. I don't know about rate limits; certainly not per-request.</p>
]]></description><pubDate>Thu, 15 Jan 2026 02:27:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46627260</link><dc:creator>ankit219</dc:creator><comments>https://news.ycombinator.com/item?id=46627260</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46627260</guid></item></channel></rss>