<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: henriquegodoy</title><link>https://news.ycombinator.com/user?id=henriquegodoy</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 15:50:45 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=henriquegodoy" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Extract-0: A specialized language model for document information extraction]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2509.22906">https://arxiv.org/abs/2509.22906</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45427634">https://news.ycombinator.com/item?id=45427634</a></p>
<p>Points: 195</p>
<p># Comments: 58</p>
]]></description><pubDate>Tue, 30 Sep 2025 16:31:27 +0000</pubDate><link>https://arxiv.org/abs/2509.22906</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=45427634</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45427634</guid></item><item><title><![CDATA[Learning Perl in one day and the importance of building strong foundations]]></title><description><![CDATA[
<p>Article URL: <a href="https://guilhermenl.dev/articles/9096ed7725d387606d713e7964e2b3ac06f9bebd2650080b9ca070f0106f5c70">https://guilhermenl.dev/articles/9096ed7725d387606d713e7964e2b3ac06f9bebd2650080b9ca070f0106f5c70</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45038732">https://news.ycombinator.com/item?id=45038732</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 27 Aug 2025 12:28:04 +0000</pubDate><link>https://guilhermenl.dev/articles/9096ed7725d387606d713e7964e2b3ac06f9bebd2650080b9ca070f0106f5c70</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=45038732</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45038732</guid></item><item><title><![CDATA[Alvorada-Bench: Can Language Models Solve Brazilian University Entrance Exams?]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2508.15835">https://arxiv.org/abs/2508.15835</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45014995">https://news.ycombinator.com/item?id=45014995</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 25 Aug 2025 15:35:12 +0000</pubDate><link>https://arxiv.org/abs/2508.15835</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=45014995</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45014995</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Launch HN: Design Arena (YC S25) – Head-to-head AI benchmark for aesthetics"]]></title><description><![CDATA[
<p>This is actually really needed. Current AI design tools are so predictable and formulaic: every output feels like the same purple gradients with rounded corners and that one specific sans serif font every model seems obsessed with. It's gotten to the point where you can spot AI-generated designs from a mile away, because they all have this weird sterile aesthetic that screams "made by a model".</p>
]]></description><pubDate>Tue, 12 Aug 2025 17:08:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=44879135</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44879135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44879135</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Show HN: Omnara – Run Claude Code from anywhere"]]></title><description><![CDATA[
<p>This is pretty cool and feels like we're heading in the right direction. The whole idea of being able to hop between devices while Claude Code is thinking through problems is neat, but honestly what excites me more is the broader pattern here: we're moving toward a world where coding isn't really about sitting down and grinding out syntax for hours. It's becoming more about organizing tasks and letting AI agents figure out the implementation details.<p>I can already see how this evolves into something where you're basically managing a team of specialized agents rather than doing the actual coding. You set up some high-level goals, maybe break them down into chunks, and then different agents pick up different pieces and coordinate with each other; the human becomes more like a project manager, making decisions when the agents get stuck or need direction. Imho tools like Omnara are just the first step toward that. Right now it's one agent that needs your input occasionally, but eventually it'll probably be orchestrating multiple agents working in parallel, which beats sitting there watching progress bars for 10 minutes.</p>
]]></description><pubDate>Tue, 12 Aug 2025 17:03:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=44879050</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44879050</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44879050</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Evaluating LLMs playing text adventures"]]></title><description><![CDATA[
<p>Looking at this evaluation, it's pretty fascinating how badly these models perform even on decades-old games that almost certainly have walkthroughs scattered all over their training data. You'd think they'd at least brute-force their way through the early game mechanics by now. Honestly, this validates something I've been thinking: real intelligence isn't just about having seen the answers before, it's about being good at games and, specifically, at new situations where you can't just pattern-match your way out.<p>This is exactly why something like ARC-AGI-3 feels so important right now. Instead of static benchmarks that these models can basically brute-force with enough training data, it's designed around interactive environments where you actually need to perceive, decide, and act over multiple steps without prior instructions. That shift from "can you reproduce known patterns" to "can you figure out new patterns" seems like the real test of intelligence.<p>What's clever about the game-environment approach is that it captures something fundamental about human intelligence that static benchmarks miss entirely. When humans encounter a new game, we explore, form plans, remember what worked, and adjust our strategy: all that interactive reasoning over time that these text-adventure results show LLMs are terrible at. We need systems that can actually understand and adapt to new situations, not just really good autocomplete engines that happen to know a lot of trivia.</p>
]]></description><pubDate>Tue, 12 Aug 2025 17:00:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=44879004</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44879004</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44879004</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Claude Sonnet 4 now supports 1M tokens of context"]]></title><description><![CDATA[
<p>It's incredible to see how AI models are improving; I'm really happy with this news. (Imo it's more impactful than the release of GPT-5.) Now we need more tokens per second, and then the self-improvement of the model will accelerate.</p>
]]></description><pubDate>Tue, 12 Aug 2025 16:49:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=44878857</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44878857</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44878857</guid></item><item><title><![CDATA[New comment by henriquegodoy in "GPT-5"]]></title><description><![CDATA[
<p>That SWE-bench chart with the mismatched bars (52.8% somehow appearing larger than 69.1%) was emblematic of the entire presentation - rushed and underwhelming. It's the kind of error that would get flagged in any internal review, yet here it is in a billion-dollar product launch. Combined with the Bernoulli effect demo confidently explaining how airplane wings work incorrectly (the equal transit time fallacy that NASA explicitly debunks), it doesn't inspire confidence in either the model's capabilities or OpenAI's quality control.<p>The actual benchmark improvements are marginal at best - we're talking single-digit percentage gains over o3 on most metrics, which hardly justifies a major version bump. What we're seeing looks more like the plateau of an S-curve than a breakthrough. The pricing is competitive ($1.25/1M input tokens vs Claude's $15), but that's about optimization and economics, not the fundamental leap forward that "GPT-5" implies. Even their "unified system" turns out to be multiple models with a router, essentially admitting that the end-to-end training approach has hit diminishing returns.<p>The irony is that while OpenAI maintains their secretive culture (remember when they claimed o1 used tree search instead of RL?), their competitors are catching up or surpassing them. Claude has been consistently better for coding tasks, Gemini 2.5 Pro has more recent training data, and everyone seems to be converging on similar performance levels. This launch feels less like a victory lap and more like OpenAI trying to maintain relevance while the rest of the field has caught up. Looking forward to seeing what Gemini 3.0 brings to the table.</p>
]]></description><pubDate>Thu, 07 Aug 2025 18:41:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=44828661</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44828661</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44828661</guid></item><item><title><![CDATA[New comment by henriquegodoy in "GPT-5 for Developers"]]></title><description><![CDATA[
<p>I don't think there's much difference between Opus 4.1 and GPT-5, probably just the context size. Waiting for Gemini 3.0.</p>
]]></description><pubDate>Thu, 07 Aug 2025 18:35:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=44828586</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44828586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44828586</guid></item><item><title><![CDATA[New comment by henriquegodoy in "I gave the AI arms and legs then it rejected me"]]></title><description><![CDATA[
<p>I think this blog post was the best way to get into Anthropic, and it was well-deserved. That's the reality of hiring in tech: there are many non-technical people judging whether technical people are competent or not. Escaping that matrix through things like blog posts, cold emails, and Twitter threads can be great ways to break in and get noticed by these companies.</p>
]]></description><pubDate>Wed, 06 Aug 2025 12:31:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=44811099</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44811099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44811099</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Open models by OpenAI"]]></title><description><![CDATA[
<p>Seeing a 20B model competing with o3's performance is mind-blowing. Just a year ago, most of us would've called this impossible: not just the intelligence leap, but getting this level of capability in such a compact size.<p>What excites me most is that we can train trillion-parameter giants and distill them down to just billions without losing the magic. Imagine coding with Claude 4 Opus-level intelligence packed into a 10B model running locally at 2000 tokens/sec: instant AI collaboration. That would fundamentally change how we develop software.</p>
]]></description><pubDate>Tue, 05 Aug 2025 18:16:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=44801998</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44801998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44801998</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Vibe code is legacy code"]]></title><description><![CDATA[
<p>I'm seeing a real-world example of Jevons paradox playing out here. When AI coding tools first emerged, everyone predicted mass developer unemployment. Instead, I'm watching demand for skilled developers actually increase.<p>What's happening is that all this "vibe coded" software needs someone to fix it when it breaks. I've been getting more requests than ever to debug AI-generated codebases where the original "developer" can't explain what any of it does. The security audit work alone is keeping me busy - these AI-generated apps often have vulnerabilities that would never pass a human code review.
It reminds me of when WordPress democratized web development. Suddenly everyone could build a website, but that just created a massive market for developers who could fix broken WordPress sites, migrate databases, and patch security holes.
The difference now is the scale and complexity. At least with WordPress, there was some underlying structure you could reason about. With vibe coding, you get these sprawling codebases where the AI has reinvented the wheel five different ways in the same project, used deprecated libraries because they were in its training data, and created bizarre architectural decisions that only make sense if you don't understand the problem domain.<p>So yeah, the jobs aren't disappearing - they're just shifting from "build new features" to "fix the mess the PM made last weekend when they tried to ship their own feature."</p>
]]></description><pubDate>Wed, 30 Jul 2025 22:04:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=44740043</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44740043</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44740043</guid></item><item><title><![CDATA[New comment by henriquegodoy in "I launched 17 side projects. Result? I'm rich in expired domains"]]></title><description><![CDATA[
<p>Have you ever thought about automating the process of creating these side projects? I think the future increasingly looks like a lot of people running really big swarms of "agents" that can research ideas on the internet (like finding problems on Twitter, Reddit, etc. that a SaaS could solve), with a team implementing and deploying everything from code to marketing at a frenetic rhythm.</p>
]]></description><pubDate>Wed, 30 Jul 2025 21:12:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=44739546</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44739546</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44739546</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production"]]></title><description><![CDATA[
<p>My view is that the market isn't really ready for that right now; the best approach is what these guys are doing: solving a really niche problem with their platform and then expanding into more areas.</p>
]]></description><pubDate>Wed, 30 Jul 2025 21:04:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44739474</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44739474</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44739474</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production"]]></title><description><![CDATA[
<p>Nice, I think y'all are on the right path betting on evals, but please make your UI less "generic".</p>
]]></description><pubDate>Wed, 30 Jul 2025 21:03:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=44739456</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44739456</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44739456</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Fast"]]></title><description><![CDATA[
<p>I'll apply this to the next interfaces I build.</p>
]]></description><pubDate>Wed, 30 Jul 2025 20:59:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=44739426</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44739426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44739426</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Show HN: A GitHub Action that quizzes you on a pull request"]]></title><description><![CDATA[
<p>Can I automate the process of answering these PR questions too?</p>
]]></description><pubDate>Tue, 29 Jul 2025 22:01:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44728769</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44728769</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44728769</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Supervised fine tuning on curated data is reinforcement learning"]]></title><description><![CDATA[
<p>It's cool to see the perspective that many problems (certain kinds of communication problems: look at lawyers, compliance, etc.) can be solved by treating AI less as agents and more as modular components within a larger system. Once we build a working process, monitored through evals, we can then reduce costs by distilling these modules. That means starting with superintelligent models and later distilling them down to just a few billion parameters, instead of needing hundreds of billions.</p>
]]></description><pubDate>Tue, 29 Jul 2025 21:59:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=44728751</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44728751</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44728751</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Study mode"]]></title><description><![CDATA[
<p>The point is that you can have a highly advanced teacher with infinite patience, available 24/7. Even getting an answer to a question at 3 a.m. is a game changer, and people who know how to use that will have extreme leverage in their lives.</p>
]]></description><pubDate>Tue, 29 Jul 2025 17:50:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=44726343</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44726343</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44726343</guid></item><item><title><![CDATA[New comment by henriquegodoy in "Principles for production AI agents"]]></title><description><![CDATA[
<p>I've been tinkering with agentic systems for a while now, and this post nails some key pain points that hit close to home. The emphasis on splitting context and designing tight feedback loops feels spot on—I've seen agents go off the rails without them, hallucinating solutions because the prompt was too bloated or the validation was half-baked. It's like building a machine where every part needs to click just right, or else you're debugging forever.<p>What really resonates is the bit about frustrating behaviors signaling deeper system issues, not just model quirks. In my own experiments, I've had agents stubbornly ignore tools because I forgot to expose the right APIs, and it made me rethink how we treat these as "intelligent" when they're really just following our flawed setups. It pushes us toward more robust orchestration, where humans handle the high-level intentions and AI fills in the execution gaps seamlessly.<p>This ties into broader ideas on how AI interfaces will evolve as models get smarter. I extrapolate more of this thinking and dive deeper into human–AI interfaces on my blog if anyone’s interested in checking it out: <a href="https://henriquegodoy.com/blog/stream-of-consciousness" rel="nofollow">https://henriquegodoy.com/blog/stream-of-consciousness</a></p>
]]></description><pubDate>Tue, 29 Jul 2025 02:09:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=44718170</link><dc:creator>henriquegodoy</dc:creator><comments>https://news.ycombinator.com/item?id=44718170</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44718170</guid></item></channel></rss>