<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: starzmustdie</title><link>https://news.ycombinator.com/user?id=starzmustdie</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 14:09:06 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=starzmustdie" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE)]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/zafstojano/policy-gradients">https://github.com/zafstojano/policy-gradients</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46659553">https://news.ycombinator.com/item?id=46659553</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 17 Jan 2026 16:53:45 +0000</pubDate><link>https://github.com/zafstojano/policy-gradients</link><dc:creator>starzmustdie</dc:creator><comments>https://news.ycombinator.com/item?id=46659553</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46659553</guid></item><item><title><![CDATA[New comment by starzmustdie in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>GitHub: <a href="https://github.com/open-thought/reasoning-gym">https://github.com/open-thought/reasoning-gym</a></p>
]]></description><pubDate>Mon, 02 Jun 2025 09:44:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=44157157</link><dc:creator>starzmustdie</dc:creator><comments>https://news.ycombinator.com/item?id=44157157</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44157157</guid></item><item><title><![CDATA[Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/open-thought/reasoning-gym">https://github.com/open-thought/reasoning-gym</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44110148">https://news.ycombinator.com/item?id=44110148</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 27 May 2025 19:55:06 +0000</pubDate><link>https://github.com/open-thought/reasoning-gym</link><dc:creator>starzmustdie</dc:creator><comments>https://news.ycombinator.com/item?id=44110148</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44110148</guid></item><item><title><![CDATA[New comment by starzmustdie in "Ask HN: What are you working on? (March 2025)"]]></title><description><![CDATA[
<p>Reasoning Gym (<a href="https://github.com/open-thought/reasoning-gym" rel="nofollow">https://github.com/open-thought/reasoning-gym</a>)<p>A library that procedurally generates datasets with verifiable rewards for training reasoning models (such as OpenAI o1 and DeepSeek-R1).</p>
]]></description><pubDate>Sun, 30 Mar 2025 23:27:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=43528844</link><dc:creator>starzmustdie</dc:creator><comments>https://news.ycombinator.com/item?id=43528844</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43528844</guid></item><item><title><![CDATA[Show HN: Word Game Bench – evaluating language models on word puzzles]]></title><description><![CDATA[
<p>Hey HN!<p>Word Game Bench is a fun benchmark for evaluating language models on word puzzle games. It is a relatively hard benchmark: no model currently scores above a 50% average win rate.<p>Currently, the models are evaluated on two tasks:<p>1. Wordle is a word puzzle game where the player has to guess a five-letter word within six attempts. For each letter of a guess, the player receives feedback on whether the letter appears in the target word and whether it is in the correct position.<p>2. Connections is a word association game where the player has to group 16 words into 4 categories of 4 words each. The player doesn't know the categories beforehand and has to group the shuffled words based on their associations.<p>I believe this benchmark has several advantages:<p>- Instead of being prompted once and returning a single response, the model interacts with the game: its final output results from its own actions in the previous steps of the game and from the feedback it receives from the environment.<p>- Tokenizers are one of the main pain points of today's language models. Because Wordle gives character-level feedback on each guess, it tests how well the model incorporates that feedback into a next guess that satisfies the constraints of the environment.<p>- Connections, on the other hand, requires the model to reason about abstract relationships between words and group them into categories.<p>- "Controversially", I don't plan to maintain a fixed evaluation set for reproducibility, because test-set leakage is so common. Each daily puzzle is evaluated only once!<p>Let me know what you think!<p>Page: <a href="https://wordgamebench.github.io" rel="nofollow">https://wordgamebench.github.io</a></p>
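<p>The per-letter Wordle feedback described above can be sketched in a few lines of Python. This is a minimal illustration, not the benchmark's own code: the function name <code>wordle_feedback</code> and the G/Y/B encoding are hypothetical, and the handling of repeated letters (greens assigned first, yellows only while unmatched copies of a letter remain in the target) follows the usual Wordle convention, which the post does not spell out.</p>
<pre><code>
```python
from collections import Counter


def wordle_feedback(guess, target):
    """Return per-letter feedback for a Wordle guess (hypothetical helper).

    "G" = right letter, right position (green)
    "Y" = letter is in the target but elsewhere (yellow)
    "B" = letter is absent, or all its copies are already accounted for (gray)
    """
    guess, target = guess.lower(), target.lower()
    if len(guess) != 5 or len(target) != 5:
        raise ValueError("Wordle words have exactly 5 letters")

    feedback = ["B"] * 5
    # Target letters that are NOT matched exactly; only these copies
    # are available to earn a yellow mark.
    unmatched = Counter(t for g, t in zip(guess, target) if g != t)

    # First pass: exact matches (greens).
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            feedback[i] = "G"

    # Second pass: misplaced letters (yellows), consuming remaining copies
    # so that a repeated letter in the guess is not over-reported.
    for i, g in enumerate(guess):
        if feedback[i] == "B" and unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1

    return "".join(feedback)
```
</code></pre>
<p>For example, <code>wordle_feedback("speed", "abide")</code> returns <code>"BBYBY"</code>: the target contains only one "e", so only the first "e" in the guess is marked yellow. This kind of constraint is exactly what a model has to carry into its next guess.</p>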
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41401850">https://news.ycombinator.com/item?id=41401850</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 30 Aug 2024 15:51:51 +0000</pubDate><link>https://wordgamebench.github.io/</link><dc:creator>starzmustdie</dc:creator><comments>https://news.ycombinator.com/item?id=41401850</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41401850</guid></item><item><title><![CDATA[Show HN: Answers to Chip Huyen's ML Interview Questions]]></title><description><![CDATA[
<p>Hi HN,<p>While preparing for ML interviews during my first job hunt, I came across Chip Huyen's ML interview questions [1], which I found incredibly helpful.<p>Over several weeks, I compiled my answers into a LaTeX document, which I have since open-sourced.<p>I thought this document might be useful to others preparing for ML roles, especially because there is no centralized, comprehensive repository of answers to most of these questions.<p>Best,
Zafir<p>[1] <a href="https://huyenchip.com/ml-interviews-book/" rel="nofollow">https://huyenchip.com/ml-interviews-book/</a></p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=39716022">https://news.ycombinator.com/item?id=39716022</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 15 Mar 2024 14:17:50 +0000</pubDate><link>https://github.com/zafstojano/ml-interview-questions-and-answers</link><dc:creator>starzmustdie</dc:creator><comments>https://news.ycombinator.com/item?id=39716022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39716022</guid></item></channel></rss>