<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: kcorbitt</title><link>https://news.ycombinator.com/user?id=kcorbitt</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 02 May 2026 11:54:13 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=kcorbitt" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Codex, File My Taxes. Make No Mistakes]]></title><description><![CDATA[
<p>Article URL: <a href="https://corbt.com/posts/codex-file-my-taxes-make-no-mistakes">https://corbt.com/posts/codex-file-my-taxes-make-no-mistakes</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47327098">https://news.ycombinator.com/item?id=47327098</a></p>
<p>Points: 5</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 10 Mar 2026 18:33:14 +0000</pubDate><link>https://corbt.com/posts/codex-file-my-taxes-make-no-mistakes</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=47327098</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47327098</guid></item><item><title><![CDATA[A Pocket Guide to Surviving the Robot Apocalypse]]></title><description><![CDATA[
<p>Article URL: <a href="https://corbt.com/posts/a-pocket-guide-to-surviving-the-robot-apocalypse/">https://corbt.com/posts/a-pocket-guide-to-surviving-the-robot-apocalypse/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47040155">https://news.ycombinator.com/item?id=47040155</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 16 Feb 2026 20:50:03 +0000</pubDate><link>https://corbt.com/posts/a-pocket-guide-to-surviving-the-robot-apocalypse/</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=47040155</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47040155</guid></item><item><title><![CDATA[New comment by kcorbitt in "My AI Adoption Journey"]]></title><description><![CDATA[
<p>And lately, the sweet spot has been moving upwards every 6-8 weeks with the model release cycle.</p>
]]></description><pubDate>Thu, 05 Feb 2026 22:52:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46906564</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=46906564</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46906564</guid></item><item><title><![CDATA[New comment by kcorbitt in "Do not mistake a resilient global economy for populist success"]]></title><description><![CDATA[
<p>Is it?</p>
]]></description><pubDate>Fri, 09 Jan 2026 07:05:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=46550890</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=46550890</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46550890</guid></item><item><title><![CDATA[New comment by kcorbitt in "Show HN: RULER – Easily apply RL to any agent"]]></title><description><![CDATA[
<p>Dang, hadn't seen that. Namespace collision strikes again.</p>
]]></description><pubDate>Fri, 11 Jul 2025 23:41:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44537934</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44537934</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44537934</guid></item><item><title><![CDATA[New comment by kcorbitt in "Show HN: RULER – Easily apply RL to any agent"]]></title><description><![CDATA[
<p>I really like RLPR for when you have a known-good answer to compare to as well!</p>
]]></description><pubDate>Fri, 11 Jul 2025 23:41:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=44537930</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44537930</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44537930</guid></item><item><title><![CDATA[New comment by kcorbitt in "Show HN: RULER – Easily apply RL to any agent"]]></title><description><![CDATA[
<p>No, we don't currently do anything to correct for ordering. Theoretically we could judge several times with different orderings.<p>We could measure order bias really easily though; we just need to look at the average score by rollout position across many runs. I'll add that to my list of experiments!</p>
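<p>The measurement itself is just something like this (a quick sketch; assumes we've saved one score list per judging call, each ordered by the position the rollout appeared in):</p>
<pre><code>    from statistics import mean

    def avg_score_by_position(runs):
        # runs: list of score lists, one per judging call,
        # each ordered by rollout position within the judge prompt.
        return [mean(scores) for scores in zip(*runs)]
</code></pre>
<p>A flat result across positions would suggest little order bias; a consistent slope would mean rollouts get rewarded or penalized just for where they appear.</p>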
]]></description><pubDate>Fri, 11 Jul 2025 23:40:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44537925</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44537925</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44537925</guid></item><item><title><![CDATA[New comment by kcorbitt in "Show HN: RULER – Easily apply RL to any agent"]]></title><description><![CDATA[
<p>Thanks! If there are any topics that you'd find particularly interesting, let me know and I can try to find time. :)</p>
]]></description><pubDate>Fri, 11 Jul 2025 21:06:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=44536788</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44536788</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44536788</guid></item><item><title><![CDATA[Show HN: RULER – Easily apply RL to any agent]]></title><description><![CDATA[
<p>Hey HN, Kyle here, one of the co-founders of OpenPipe.<p>Reinforcement learning is one of the best techniques for making agents more reliable, and has been widely adopted by frontier labs. However, adoption in the outside community has been slow because it's so hard to implement.<p>One of the biggest challenges when adapting RL to a new task is the need for a task-specific "reward function" (way of measuring success). This is often difficult to define, and requires either high-quality labeled data and/or significant domain expertise to generate.<p>RULER is a drop-in reward function that works across different tasks without any of that complexity.<p>It works by showing N trajectories to an LLM judge and asking it to rank them relative to each other. This sidesteps the calibration issues that plague most LLM-as-judge approaches. Combined with GRPO (which only cares about relative scores within groups), it just works (surprisingly well!).<p>We have a full writeup on the blog, including results on 4 production tasks. On all 4 tasks, small Qwen 2.5 models trained with RULER+GRPO beat the best prompted frontier model, despite being significantly smaller and cheaper to run. Surprisingly, they even beat models trained with hand-crafted reward functions on 3/4 tasks! <a href="https://openpipe.ai/blog/ruler">https://openpipe.ai/blog/ruler</a><p>Repo: <a href="https://github.com/OpenPipe/ART">https://github.com/OpenPipe/ART</a></p>
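<p>To make the core idea concrete, here's a rough sketch of the relative-scoring step (not the actual RULER code; the judge call and prompt format are placeholders):</p>
<pre><code>    import json

    def ruler_style_rewards(trajectories, judge):
        # Show all N rollouts to the judge in a single call and ask for
        # relative scores; GRPO only needs the within-group ordering.
        prompt = "Score each trajectory from 0 to 1 relative to the others."
        for i, t in enumerate(trajectories):
            prompt += f"\n\n[Trajectory {i}]\n{t}"
        prompt += f"\n\nReply with a JSON list of {len(trajectories)} floats."
        scores = json.loads(judge(prompt))  # judge() stands in for any LLM call
        assert len(scores) == len(trajectories)
        return scores
</code></pre>
<p>Because the judge only ever compares rollouts against each other, its absolute calibration doesn't matter, which is what makes this workable as a drop-in reward.</p>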
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44535078">https://news.ycombinator.com/item?id=44535078</a></p>
<p>Points: 81</p>
<p># Comments: 11</p>
]]></description><pubDate>Fri, 11 Jul 2025 17:47:36 +0000</pubDate><link>https://openpipe.ai/blog/ruler</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44535078</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44535078</guid></item><item><title><![CDATA[New comment by kcorbitt in "Lossless LLM 3x Throughput Increase by LMCache"]]></title><description><![CDATA[
<p>Looks cool! With vLLM v1, prefix caching is enabled by default and seems quite performant. Is the advantage of LMCache the fact that you can offload to CPU and disk as well? How much is throughput/latency affected if you need to pull a large KV cache from disk/cpu instead of GPU RAM?<p>Also, how realistic would it be to share the KV cache across vllm nodes within a data center? It would be really nice to be able to freely distribute requests to a pool of vLLM workers without worrying about prefix-aware routing, but maybe that isn't the right approach because moving the KV cache around would be too slow?</p>
]]></description><pubDate>Sat, 28 Jun 2025 14:57:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44405139</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44405139</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44405139</guid></item><item><title><![CDATA[New comment by kcorbitt in "Fault Tolerant Llama training"]]></title><description><![CDATA[
<p>I was curious about this so I had o3 do a bit of research. Turns out 300 L40s have more compute than any supercomputer before 2013 (and arguably before 2016, depending on how you count reduced-precision FLOPs).<p><a href="https://chatgpt.com/share/685dea79-26ec-8002-bd62-7ed83aedf4a5" rel="nofollow">https://chatgpt.com/share/685dea79-26ec-8002-bd62-7ed83aedf4...</a></p>
]]></description><pubDate>Fri, 27 Jun 2025 00:49:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44392891</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44392891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44392891</guid></item><item><title><![CDATA[New comment by kcorbitt in "Self-Adapting Language Models"]]></title><description><![CDATA[
<p>The real answer is that nobody trusts their automated evals enough to be confident that any given automatically-trained release actually improves performance, even if eval scores go up. So for now everyone batches up updates and vibe-checks them before rolling them out.</p>
]]></description><pubDate>Fri, 13 Jun 2025 23:26:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44273151</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44273151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44273151</guid></item><item><title><![CDATA[Everything I know about reward hacking]]></title><description><![CDATA[
<p>Article URL: <a href="https://openpipe.ai/blog/reward-hacking">https://openpipe.ai/blog/reward-hacking</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44260189">https://news.ycombinator.com/item?id=44260189</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 12 Jun 2025 17:14:46 +0000</pubDate><link>https://openpipe.ai/blog/reward-hacking</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44260189</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44260189</guid></item><item><title><![CDATA[New comment by kcorbitt in "Look Ma, No Bubbles: Designing a Low-Latency Megakernel for Llama-1B"]]></title><description><![CDATA[
<p>It seems like the speedups here are most useful for small models, since on larger models a smaller fraction of the total time would be spent swapping between kernels? Would be interesting to see at least theoretical results for LLMs in the 14-70B parameter range, which is what most folks deploy in practice.<p>And of course the effect on throughput at larger batch sizes, which they allude to at the end.<p>Overall a very interesting result!</p>
]]></description><pubDate>Wed, 28 May 2025 02:22:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=44112248</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44112248</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44112248</guid></item><item><title><![CDATA[New comment by kcorbitt in "Sorry, grads: Entry-level tech jobs are getting wiped out"]]></title><description><![CDATA[
<p>There are <i>many</i> industries where you need lots of experience before you're a net contributor to productivity. This is true for everything from hairdressers to doctors. We have ways of dealing with this (e.g. taking out loans to undergo years of training).<p>The problem comes if the number of years of experience you need to outperform the frontier AI models grows by more than one year per year, which is not out of the question.</p>
]]></description><pubDate>Thu, 22 May 2025 14:22:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=44062358</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44062358</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44062358</guid></item><item><title><![CDATA[New comment by kcorbitt in "Gemma 3n preview: Mobile-first AI"]]></title><description><![CDATA[
<p>I wonder if they've trained the model to operate with a shallower stack; e.g. the full model may be composed of 24 transformer blocks, but they've also trained it to accept embeddings at layer 8, so it can be operated with just 16 transformer blocks on lower-resourced devices.<p>Experimenters in the open source tinkering community have done the opposite (copy/pasting layers in existing models to make them deeper) and it seems to work... fine, with minimal post-training on the new, deeper model required to exceed the performance of the original model. So it's not a crazy idea.</p>
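<p>In pseudocode, the mechanism I'm speculating about is just this (illustrative only; not Gemma's actual implementation):</p>
<pre><code>    def forward_from(x, blocks, entry_layer=0):
        # Run only a suffix of the transformer stack; a model trained
        # to accept embeddings at entry_layer can skip the earlier blocks.
        for block in blocks[entry_layer:]:
            x = block(x)
        return x
</code></pre>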
]]></description><pubDate>Tue, 20 May 2025 20:46:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44045751</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44045751</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44045751</guid></item><item><title><![CDATA[New comment by kcorbitt in "Windsurf SWE-1: Our First Frontier Models"]]></title><description><![CDATA[
<p>It's very unlikely that they're doing their own pre-training, which is the longest and most expensive part of creating a frontier model (if they were, they'd likely brag about it).<p>Most likely they built this as a post-train of an open model that is already strong on coding like Qwen 2.5.</p>
]]></description><pubDate>Fri, 16 May 2025 06:18:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=44002309</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=44002309</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44002309</guid></item><item><title><![CDATA[New comment by kcorbitt in "The unreasonable effectiveness of an LLM agent loop with tool use"]]></title><description><![CDATA[
<p>For "that last 10% of reliability" RL is actually working pretty well right now too! <a href="https://openpipe.ai/blog/art-e-mail-agent">https://openpipe.ai/blog/art-e-mail-agent</a></p>
]]></description><pubDate>Thu, 15 May 2025 21:40:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43999593</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=43999593</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43999593</guid></item><item><title><![CDATA[New comment by kcorbitt in "Show HN: ART – a new open-source RL framework for training agents"]]></title><description><![CDATA[
<p>Ok good questions here.<p>By fine-tuning in this context I assume you mean "supervised fine-tuning", or SFT. SFT trains a model to produce a specific string of output tokens, given an input. With SFT, if you were trying to train an assistant to solve math problems using a code interpreter, you might train it on a dataset that looks like:<p><pre><code>    input: 'What is 934+1208'  
    output: `print(934+1208)`

    input: 'how many "r"s in strawberry'
    output: `print(len([l for l in "strawberry" if l == 'r']))`
</code></pre>
etc, etc.<p>RL, on the other hand, just means training a model not to produce a concrete string of output tokens, but rather to create an output that maximizes some reward function (you get to decide on the reward).<p>For the example above, you might create the following dataset for RL training:<p><pre><code>    input: 'What is 934+1208'
    ground_truth: 2142

    input: 'how many "r"s in strawberry'
    ground_truth: 3
</code></pre>
You would then train the model to write python code that produces the ground_truth output. Your training code would take the model's output, run the python it produced, and then check whether the output matches the expected ground_truth. Importantly, this doesn't require you to actually write the code to solve the problem (you don't even have to know if it's solvable, technically!). Over time, the training loop would make the model more likely to produce outputs that get high rewards, which hopefully means it gets better at producing valid and applicable python.<p>This is useful in lots of domains where it's easier to check the answer than actually produce it. In the blog post[1] linked above, we train the agent to effectively use keyword search to try to find the correct emails in an inbox. As the model trainer, I didn't actually know what the right strategy was to choose keywords that would most quickly find the relevant email, but through training with RL, the model was able to figure it out on its own!<p>[1]: <a href="https://openpipe.ai/blog/art-e-mail-agent?refresh=1746030513873">https://openpipe.ai/blog/art-e-mail-agent?refresh=1746030513...</a></p>
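<p>Concretely, the reward function for the example above can be as simple as this sketch (run_in_sandbox is a stand-in for whatever sandboxed code runner your training setup provides):</p>
<pre><code>    def reward(model_output, ground_truth):
        # Execute the python the model wrote and compare its printed
        # output to the known-good answer: 1.0 on a match, else 0.0.
        try:
            printed = run_in_sandbox(model_output)  # hypothetical sandboxed runner
        except Exception:
            return 0.0
        return 1.0 if printed.strip() == str(ground_truth) else 0.0
</code></pre>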
]]></description><pubDate>Wed, 30 Apr 2025 19:30:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=43849667</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=43849667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43849667</guid></item><item><title><![CDATA[New comment by kcorbitt in "Show HN: ART – a new open-source RL framework for training agents"]]></title><description><![CDATA[
<p>Figured now was a good time to post this since we recently got surprisingly good results on training an email research agent. Link is above, but will put it here as well since I think it's a good example of RL's promise: <a href="https://openpipe.ai/blog/art-e-mail-agent">https://openpipe.ai/blog/art-e-mail-agent</a></p>
]]></description><pubDate>Wed, 30 Apr 2025 17:47:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=43848552</link><dc:creator>kcorbitt</dc:creator><comments>https://news.ycombinator.com/item?id=43848552</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43848552</guid></item></channel></rss>