Hacker News: starzmustdie

A minimal hackable implementation of policy gradients (GRPO, PPO, REINFORCE)

starzmustdie — Sat, 17 Jan 2026 16:53:45 +0000

Article URL: https://github.com/zafstojano/policy-gradients

Comments URL: https://news.ycombinator.com/item?id=46659553

Points: 1

# Comments: 0

New comment by starzmustdie in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

starzmustdie — Mon, 02 Jun 2025 09:44:09 +0000

GitHub: https://github.com/open-thought/reasoning-gym

Reasoning Gym: Procedural Dataset Generation for Reinforcement Learning

starzmustdie — Tue, 27 May 2025 19:55:06 +0000

Article URL: https://github.com/open-thought/reasoning-gym

Comments URL: https://news.ycombinator.com/item?id=44110148

Points: 1

# Comments: 0

New comment by starzmustdie in "Ask HN: What are you working on? (March 2025)"

starzmustdie — Sun, 30 Mar 2025 23:27:59 +0000

Reasoning Gym (https://github.com/open-thought/reasoning-gym)

A library that procedurally generates datasets for training reasoning models (like o1/r1) with verifiable rewards.

Show HN: Word Game Bench – evaluating language models on word puzzles

starzmustdie — Fri, 30 Aug 2024 15:51:51 +0000

Hey HN!

Word Game Bench is a fun benchmark for evaluating language models on word puzzle games. It is a relatively hard benchmark: no model currently achieves an average win rate above 50%.

Currently, the models are evaluated on 2 tasks:

1. Wordle is a word puzzle game where the player has to guess a 5-letter word in 6 attempts. For each letter in the guessed word, the player receives feedback on whether the letter is in the target word and, if so, whether it is in the correct position.

2. Connections is a word association game where the player has to group 16 words into 4 categories of 4 words each. The player doesn't know the categories beforehand and has to group the shuffled words based on their associations.
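To make the Wordle feedback concrete, here is a minimal sketch of how per-letter feedback could be computed. This is not the benchmark's actual code: the function name `wordle_feedback` and the `G`/`Y`/`-` encoding are assumptions chosen for illustration, and repeated letters are handled the way the standard game does (a letter is only marked "present" as many times as it still remains unmatched in the target).

```python
from collections import Counter


def wordle_feedback(guess: str, target: str) -> str:
    """Return one symbol per guessed letter (hypothetical encoding):
    'G' = correct letter in the correct position,
    'Y' = letter is in the target but in a different position,
    '-' = letter is not in the target (or all its copies are used up).
    """
    if len(guess) != len(target):
        raise ValueError("guess and target must have the same length")
    guess, target = guess.lower(), target.lower()

    feedback = ["-"] * len(guess)

    # First pass: mark exact matches and count the target letters
    # that were NOT matched exactly; only those can produce a 'Y'.
    remaining = Counter()
    for i, (g, t) in enumerate(zip(guess, target)):
        if g == t:
            feedback[i] = "G"
        else:
            remaining[t] += 1

    # Second pass: mark misplaced letters, consuming one unmatched
    # copy from the target each time so duplicates are not over-counted.
    for i, g in enumerate(guess):
        if feedback[i] == "-" and remaining[g] > 0:
            feedback[i] = "Y"
            remaining[g] -= 1

    return "".join(feedback)


if __name__ == "__main__":
    print(wordle_feedback("crane", "react"))
```

A string like this (or an equivalent textual description) is the kind of character-level signal a model would have to incorporate before making its next guess.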

I believe there are several advantages of this benchmark:

- Instead of prompting the model once and getting back a response, in this benchmark the model interacts with the game and produces its final output as a result of its own actions/predictions in the previous steps of the game, as well as the feedback it receives from the environment.

- Tokenizers are one of the main pain points of language models today. Wordle, by providing character-level feedback for the guessed word, tests how well the model incorporates this new knowledge into its next guess so that it satisfies the constraints of the environment.

- On the other hand, Connections is a game that requires the model to reason about the abstract relationships between words and group them into categories.

- "Controversially", I don't plan to maintain a fixed evaluation set for reproducibility purposes because of the commonly occurring test set leakage. Each daily puzzle is evaluated only once!

Let me know what you think!

Page: https://wordgamebench.github.io

Comments URL: https://news.ycombinator.com/item?id=41401850

Points: 1

# Comments: 0

Show HN: Answers to Chip Huyen's ML Interview Questions

starzmustdie — Fri, 15 Mar 2024 14:17:50 +0000

Hi HN,

When I was preparing for ML interviews in my first job hunt, I came across Chip Huyen's ML interview questions [1] which I found incredibly helpful.

Over several weeks I compiled my answers into a LaTeX document, which I have since open sourced.

I thought this document would be useful to other people preparing for ML roles, especially because there is no centralized and comprehensive repository for most of the answers.

Best, Zafir

[1] https://huyenchip.com/ml-interviews-book/

Comments URL: https://news.ycombinator.com/item?id=39716022

Points: 3

# Comments: 0