<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jeffreysmith</title><link>https://news.ycombinator.com/user?id=jeffreysmith</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 01 May 2026 23:54:23 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jeffreysmith" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jeffreysmith in "Ask HN: Who is hiring? (May 2026)"]]></title><description><![CDATA[
<p>Burnin (London or NYC) | Founding Research Scientist + Founding Engineering Lead | Hybrid (Some on-site) | Full-time<p>We are building the trust stack for AI code generation, targeted at high-stakes computing where wrong numbers cost real money: a statically typed functional language designed for coding models, a compiler whose guarantees double as audit-grade trust infrastructure, and a coding model fine-tuned on the language with compiler fitness as the training signal. Language and model are co-designed.<p>Small team out of PyTorch, FAIR, and Meta. Strong institutional VC support from our pre-seed; active investor interest heading into seed.<p>Founding Research Scientist: own the agenda across the language and the model. Type system design, compiler analyses that produce useful fitness gradients, and fine-tuning open-weight coding models on a language with no pretraining footprint. Publish in PL and ML venues. Strongest fits cross between machine learning, programming languages, and formal methods. PhD preferred, equivalent output equally fine. Stack is Rust, Lean 4, Python.<p>Founding Engineering Lead: own engineering across compiler internals, language runtime, GPU backends, notebook and library tooling, and the AI infrastructure around training, evaluation, model release, and likely public inference. We already support x86 and ARM, CUDA and AMD, macOS and Linux, and the matrix grows. Real feel for statically typed functional programming expected; the kind of engineer who picks up Lean or Haskell on a weekend because they wanted to. Have led engineering before, formally or not. Stack is mostly Rust with Python where it earns its place, primarily on AWS.<p>Both roles: comfortable with early-stage ambiguity, define your own roadmap, defend it with evidence.<p>Apply:
RS: <a href="https://wellfound.com/l/2Carrr" rel="nofollow">https://wellfound.com/l/2Carrr</a>
Eng Lead: <a href="https://wellfound.com/l/2CewDC" rel="nofollow">https://wellfound.com/l/2CewDC</a><p>Socials:
YT: <a href="https://www.youtube.com/channel/UC4DLS_emqwKXO7A9qaOtxlw" rel="nofollow">https://www.youtube.com/channel/UC4DLS_emqwKXO7A9qaOtxlw</a>
SS: <a href="https://burninai.substack.com/" rel="nofollow">https://burninai.substack.com/</a></p>
]]></description><pubDate>Fri, 01 May 2026 15:12:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=47975741</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47975741</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47975741</guid></item><item><title><![CDATA[New comment by jeffreysmith in "A Basket of Eggs"]]></title><description><![CDATA[
<p>OP here. This is the sequel to the original Good Egg post (<a href="https://news.ycombinator.com/item?id=47151678">https://news.ycombinator.com/item?id=47151678</a>).<p>We ran three experiments since v1 that changed the model substantially.<p>We tried to detect suspended GitHub accounts from behavioral signals (merge rate, network centrality, TF-IDF on PR titles, LLM classification with ~31K Gemini calls). Best individual AUC was 0.619 on a 1.9% base rate. The merged-PR population is too homogeneous. Accounts that pass code review look like everyone else. The interesting finding: the suspension rate among contributors with merged PRs is under 2%. The review process is a better filter than the discourse around AI slop suggests.<p>That led us to question the scoring model. The graph score (bipartite construction, personalized ranking, language normalization, the whole pipeline from v1) actively hurts predictions for the contributors who actually need scoring: unknown people with a handful of merged PRs. Merge rate alone outperforms merge rate plus graph at every tier we tested. The new default model is merged / (merged + closed). We also pulled account age out of the score into a separate advisory after DeLong tests showed it adds nothing once you condition on merge rate.<p>The post has the full data, including the tables.<p>Next we're working on content scoring (does this PR fit this repo's conventions?) and cold-start tooling (helping new contributors understand project expectations before they submit). Contributor reputation is one input to review triage. The PR itself carries more signal.<p>Repo: <a href="https://github.com/2ndSetAI/good-egg" rel="nofollow">https://github.com/2ndSetAI/good-egg</a><p>pip install good-egg<p>Or just run it via uvx.</p>
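<p>A minimal sketch of the merge-rate score described above, merged / (merged + closed). The function name merge_rate and the choice to return None for a contributor with no resolved PRs are assumptions made for illustration; this is not the good-egg API.</p>

```python
def merge_rate(merged, closed):
    """Share of a contributor's resolved PRs that were merged.

    merged: number of PRs that were merged.
    closed: number of PRs closed without being merged.
    Returning None when there is no history is an assumption of this
    sketch, not documented good-egg behavior.
    """
    total = merged + closed
    if total == 0:
        return None  # no resolved PRs, so there is nothing to score
    return merged / total


print(merge_rate(3, 1))  # 0.75
```

<p>Keeping the no-history case separate from a score of 0.0 means a brand-new contributor is not treated the same as one whose PRs were all closed.</p>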
]]></description><pubDate>Mon, 16 Mar 2026 17:39:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47402185</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47402185</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47402185</guid></item><item><title><![CDATA[A Basket of Eggs]]></title><description><![CDATA[
<p>Article URL: <a href="https://neotenyai.substack.com/p/a-basket-of-eggs">https://neotenyai.substack.com/p/a-basket-of-eggs</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47402184">https://news.ycombinator.com/item?id=47402184</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 16 Mar 2026 17:39:33 +0000</pubDate><link>https://neotenyai.substack.com/p/a-basket-of-eggs</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47402184</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47402184</guid></item><item><title><![CDATA[New comment by jeffreysmith in "3D-Knitting: The Ultimate Guide"]]></title><description><![CDATA[
<p>They definitely are a powerful option for smaller scale runs. Very much optimized to have the unit economics and turnaround time work for smaller brands.<p>I don't really know the answer around supplying your own yarn. I'd assume that's the abnormal case, but just a guess.</p>
]]></description><pubDate>Thu, 12 Mar 2026 17:58:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47354757</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47354757</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47354757</guid></item><item><title><![CDATA[New comment by jeffreysmith in "3D-Knitting: The Ultimate Guide"]]></title><description><![CDATA[
<p>I interviewed these guys for an article on the use of seaweed in yarn and fabric. And I bought the 3D knit seaweed sweater. Great team, with a lot of heart and good intentions.<p>I'm also a hand knitter, and I don't really see any conflict between what they're doing and hand knitting. The grist of the yarn that you use as a hand knitter is generally much thicker than what these machines commonly use. Commercial 3D knitting machines can do all of the stretchy, thin, and light stuff that the modern wardrobe is built around.<p>As folks note, this technology was really pioneered by Shima Seiki's work in Japan just decades ago. What OC and the similar Brooklyn-based Tailored Industry are really innovating on is the business model and its connection to the production process. Folks like this are really serious about not producing all of the waste that comes with most fashion production processes, and it shows up at several levels of the stack.<p>For the HN crowd, TI's platform gives you more of a sense of why this sort of tech is really like the cloud for knitwear: <a href="https://tailoredindustry.com/platform" rel="nofollow">https://tailoredindustry.com/platform</a><p>Really a fascinating part of the global fashion production world, and one we would all benefit from seeing grow.</p>
]]></description><pubDate>Thu, 12 Mar 2026 11:20:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47349132</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47349132</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47349132</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Ask HN: Maintainers, do LLM-only users often clutter your issues/PRs?"]]></title><description><![CDATA[
<p>And I filed that suggestion as an enhancement issue on the repo: <a href="https://github.com/2ndSetAI/good-egg/issues/43" rel="nofollow">https://github.com/2ndSetAI/good-egg/issues/43</a><p>Thanks for the idea.</p>
]]></description><pubDate>Thu, 05 Mar 2026 11:34:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47260426</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47260426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47260426</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Ask HN: Maintainers, do LLM-only users often clutter your issues/PRs?"]]></title><description><![CDATA[
<p>Thanks!<p>Yeah, scanning non-GitHub is on the roadmap and really should be done. I expect there would be value in understanding all of the current GitHub competitors. And I think the forecasts of new GH competitors getting launched (likely by AI companies) will become relevant in the near future.</p>
]]></description><pubDate>Thu, 05 Mar 2026 11:15:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47260306</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47260306</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47260306</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Ask HN: Maintainers, do LLM-only users often clutter your issues/PRs?"]]></title><description><![CDATA[
<p>Quick footnote to call out this really good summary from the team at :probabl (the scikit-learn/skore company): <a href="https://blog.probabl.ai/maintaining-open-source-age-of-gen-ai" rel="nofollow">https://blog.probabl.ai/maintaining-open-source-age-of-gen-a...</a></p>
]]></description><pubDate>Wed, 04 Mar 2026 19:11:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47252337</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47252337</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47252337</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Ask HN: Maintainers, do LLM-only users often clutter your issues/PRs?"]]></title><description><![CDATA[
<p>I'm a bit obsessed with this topic lately, so I'm going to keep refreshing this thread to see if folks have good answers.<p>One thing I've been working with is this little util to try to do a quick sniff test on the contributors: <a href="https://github.com/2ndSetAI/good-egg" rel="nofollow">https://github.com/2ndSetAI/good-egg</a> (Longer explanation on Substack: <a href="https://neotenyai.substack.com/p/scoring-open-source-contributors" rel="nofollow">https://neotenyai.substack.com/p/scoring-open-source-contrib...</a> )<p>From what I've seen in the data, acceptance rates at all major OSS projects have been down since coding agents arrived.<p>And when I talk to maintainers, most of them are talking about some version of doing fast and easy pocket vetoes (leaving the PRs to rot) or even just banning on the first offense.<p>It's been building for a bit, but I think the crisis point is solidly here. And things like OpenClaw turn up the dials. I'm sure more tools and changes to practices will be coming.</p>
]]></description><pubDate>Wed, 04 Mar 2026 19:11:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=47252330</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47252330</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47252330</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Scoring Open Source Contributors in the Age of AI Slop: Finding Good Eggs"]]></title><description><![CDATA[
<p>Jeff, the author, here. We built a tool that scores PR authors by mining their contribution graph from the GitHub API. Every input is a merge/reject decision a human maintainer already made. It doesn't look at PR content or try to detect AI usage. It just answers: has this person gotten code accepted into projects before, and how relevant is that history to your project?<p>The scoring is graph-based (bipartite user-repo graph, personalized ranking, 180-day recency decay). Scores are context-specific, so the same person can score differently against different repos. The post walks through how Guillermo Rauch scores MEDIUM against his own company's Next.js repo because he has zero merged PRs there, and how v2 rescues that with merge rate and account age.<p>We validated on 5,129 PRs across 49 repos. Three features survived statistical testing, four didn't. The most surprising failure: text similarity between PR descriptions and project READMEs predicted lower merge rates. We published all of it, including the failures.<p>More detail on the Substack post.<p>Repo: <a href="https://github.com/2ndSetAI/good-egg" rel="nofollow">https://github.com/2ndSetAI/good-egg</a> (MIT, pip install good-egg). Runs as a CLI, GitHub Action, Python library, and MCP server.</p>
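<p>A rough sketch of what a 180-day recency decay could look like. It assumes the decay is exponential and that 180 days is a half-life; the comment above does not say which form is used, and the names recency_weight and HALF_LIFE_DAYS are hypothetical rather than taken from good-egg.</p>

```python
# Assumption: "180-day recency decay" is read here as an exponential
# half-life. The actual decay form is not stated in the comment above.
HALF_LIFE_DAYS = 180.0


def recency_weight(age_days, half_life_days=HALF_LIFE_DAYS):
    """Weight for a merged PR that is age_days old.

    A PR merged today gets weight 1.0, and the weight halves for every
    half_life_days that have passed since the merge.
    """
    return 0.5 ** (age_days / half_life_days)


def decayed_merge_count(ages_in_days):
    """Sum of recency weights over a contributor's merged PRs."""
    return sum(recency_weight(age) for age in ages_in_days)


print(decayed_merge_count([0, 180, 360]))  # 1.75
```

<p>Under this reading, three merged PRs that are 0, 180, and 360 days old count for 1.0 + 0.5 + 0.25 rather than 3.</p>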
]]></description><pubDate>Wed, 25 Feb 2026 14:08:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47151679</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47151679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47151679</guid></item><item><title><![CDATA[Scoring Open Source Contributors in the Age of AI Slop: Finding Good Eggs]]></title><description><![CDATA[
<p>Article URL: <a href="https://neotenyai.substack.com/p/scoring-open-source-contributors">https://neotenyai.substack.com/p/scoring-open-source-contributors</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47151678">https://news.ycombinator.com/item?id=47151678</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 25 Feb 2026 14:08:16 +0000</pubDate><link>https://neotenyai.substack.com/p/scoring-open-source-contributors</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=47151678</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47151678</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Vouch"]]></title><description><![CDATA[
<p>I think this is really a key problem to solve, but I couldn't convince myself that it was the right solution. So, I put up my alternative proposal, Good Egg: <a href="https://github.com/2ndSetAI/good-egg" rel="nofollow">https://github.com/2ndSetAI/good-egg</a><p>Key differences:
- Based on commit history, with nuance around relatedness of projects, types of projects, age, etc.
- Requires no ongoing work. Just add it to your GH Actions CI.
- Agent ready with an MCP interface, Python lib, and CLI<p>Discussion on HN here: <a href="https://news.ycombinator.com/item?id=46960412">https://news.ycombinator.com/item?id=46960412</a><p>Feedback and PRs welcome.</p>
]]></description><pubDate>Tue, 10 Feb 2026 18:45:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=46964813</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=46964813</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46964813</guid></item><item><title><![CDATA[Show HN: Good Egg: Trust Scoring for GitHub PR Authors]]></title><description><![CDATA[
<p>I'm Jeff Smith. I've been contributing to AI in open source for a long time, across the Spark, Elixir, and PyTorch ecosystems. I've seen firsthand how open source can be a great place for people to collaborate and build AI together. I even wrote a book about it: <a href="https://www.manning.com/books/machine-learning-systems" rel="nofollow">https://www.manning.com/books/machine-learning-systems</a> with all open source code: <a href="https://github.com/jeffreyksmithjr/reactive-machine-learning-systems" rel="nofollow">https://github.com/jeffreyksmithjr/reactive-machine-learning...</a><p>But the challenges are real. AI-generated code slop and low-quality submissions are flooding projects. Contribution volume is up; signal-to-noise is down. Maintainers can no longer assume a PR represents genuine investment.<p>Good Egg is a tool I built to help. It mines a contributor's merged PR history across the GitHub ecosystem and computes a trust score relative to your project. The core idea: good contributors are already exhibiting good behavior -- merged PRs in established repos, sustained contributions over time, work across multiple projects. That track record is a strong signal, and it already exists in the GitHub API.<p>How it works:<p>- Builds a bipartite contribution graph (users ↔ repositories) from merged PRs<p>- Applies personalized graph scoring biased toward your project and language ecosystem<p>- Accounts for recency decay, repository quality (stars, language normalization), and anti-gaming measures (self-contribution penalties, per-repo caps)<p>- Classifies contributors as HIGH / MEDIUM / LOW / UNKNOWN / BOT<p>The methodology doc goes into the full detail: <a href="https://github.com/2ndSetAI/good-egg/blob/main/docs/methodology.md" rel="nofollow">https://github.com/2ndSetAI/good-egg/blob/main/docs/methodol...</a><p>Runs four ways:<p>- GitHub Action: drop it into any PR workflow and get a comment with the score
- CLI:  good-egg score <user> --repo <owner/repo>
- Python library: await score_pr_author(login, repo_owner, repo_name, token)
- MCP server: plug it into Claude or other AI assistants<p>On Vouch and the circle-of-trust approach:<p>Mitchell Hashimoto's Vouch takes a different angle: maintainers manually vouch for contributors they trust, building a web-of-trust. I think that's a valid approach and have seen circles of trust work well (on PyTorch specifically, where contributors came from all over, including major corporate partners). But I've also seen gaps that could easily be filled by a bit of data that already exists. Vouch requires active maintainer participation in a separate system and has a cold-start problem. Good Egg is complementary. It's automated, doesn't ask maintainers to do extra work, and works from day one on any repo.<p>What it doesn't do:<p>Good Egg doesn't send data to any remote service. It reads from the GitHub API, computes locally, and that's it. I'm not building a training set or a contributor database. This is just a tool for the community.<p>Configuration and extensibility:<p>Scoring parameters (thresholds, graph weights, recency decay, language multipliers) are all configurable via YAML or environment variables.
More extensibility is planned, particularly around additional data sources (e.g., GitLab) and methodology variations like graph-based project relatedness and incorporating review/issue activity alongside PRs.<p>Code: <a href="https://github.com/2ndSetAI/good-egg" rel="nofollow">https://github.com/2ndSetAI/good-egg</a>
PyPI: pip install good-egg
Docs: <a href="https://github.com/2ndSetAI/good-egg/tree/main/docs" rel="nofollow">https://github.com/2ndSetAI/good-egg/tree/main/docs</a></p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46960412">https://news.ycombinator.com/item?id=46960412</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 10 Feb 2026 14:50:37 +0000</pubDate><link>https://github.com/2ndSetAI/good-egg</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=46960412</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46960412</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Olmo 3: Charting a path through the model flow to lead open-source AI"]]></title><description><![CDATA[
<p>Weird that this late, dupe thread came alive after my earlier submission didn't seem to get noticed: <a href="https://news.ycombinator.com/item?id=45993118">https://news.ycombinator.com/item?id=45993118</a></p>
]]></description><pubDate>Sat, 22 Nov 2025 02:29:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46011527</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=46011527</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46011527</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Olmo 3: Charting a path through the model flow to lead open-source AI"]]></title><description><![CDATA[
<p>Totally. I don't get why people sleep on AI2's launches. They're such powerful platforms for AI R&D.</p>
]]></description><pubDate>Fri, 21 Nov 2025 00:32:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=45999855</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=45999855</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45999855</guid></item><item><title><![CDATA[Olmo 3: Charting a path through the model flow to lead open-source AI]]></title><description><![CDATA[
<p>Article URL: <a href="https://allenai.org/blog/olmo3">https://allenai.org/blog/olmo3</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45993118">https://news.ycombinator.com/item?id=45993118</a></p>
<p>Points: 20</p>
<p># Comments: 3</p>
]]></description><pubDate>Thu, 20 Nov 2025 14:49:51 +0000</pubDate><link>https://allenai.org/blog/olmo3</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=45993118</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45993118</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Leaving Meta and PyTorch"]]></title><description><![CDATA[
<p>I'm one of the many people who Soumith hired to Meta and PyTorch. I had the privilege of working on PyTorch with him and lots of the folks on this post.<p>As his longtime colleague, the one thing I would want people to know about him and this decision is that Soumith has always viewed PyTorch as a community project. He consistently celebrated the contributions of his co-creators Adam and Sam, and he extended the same view towards Yangqing and the Caffe2 crew that we merged into PyTorch. At the very beginning, by Soumith's highly intentional design, PyTorch was aimed at being truly developed by and for the AI research community, and for many years that was the key way in which we grew the framework, FB PT team, and the wider community. At every single stage of PT's lifecycle, he always ensured that our conception of PT and its community grew to include and celebrate the new people and organizations growing what was possible with PT. He's an incredible talent magnet, and thus more and more smart people kept dedicating their blood, sweat, and tears to making PT bigger and better for more people.<p>I've worked with some very well known and highly compensated leaders in tech, but *no one* has done the job he has done with ameliorating a bus factor problem with his baby. PT has a unique level of broad support that few other open source technologies can reach. In a world of unbounded AI salaries, people who want to move AI research methods forward still freely give their time and attention to PyTorch and its ecosystem. It's the great lever of this era of AI that is moving the world, *due in large part* to the strength of the community he fostered and can now let continue without his direct involvement.<p>His departure is the end of an era, but it's also operationally a true non-event. PyTorch is going strong and can afford to let one of its creators retire from stewardship. 
This is precisely what success looks like in open source software.<p>He deserves our congratulations and our thanks. Enjoy your PT retirement, man.</p>
]]></description><pubDate>Fri, 07 Nov 2025 19:43:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=45850209</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=45850209</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45850209</guid></item><item><title><![CDATA[New comment by jeffreysmith in "If my kids excel, will they move away?"]]></title><description><![CDATA[
<p>American here who went to a Chinese (grad) school for CS and was admitted to every Chinese school I applied to. This is very much a possible route, if you’re appropriately qualified for the program. The main issue is language: outside of HK, programs in English are rare.</p>
]]></description><pubDate>Sun, 14 Sep 2025 03:31:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=45237186</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=45237186</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45237186</guid></item><item><title><![CDATA[New comment by jeffreysmith in "NSF and Nvidia award Ai2 $152M to support building an open AI ecosystem"]]></title><description><![CDATA[
<p>Not sure what's with the HN tone on this announcement. AI2 are really some of the best people around for creating truly open artifacts for the whole ecosystem. Their work on OLMo and Molmo is some of the most transparent and educational material you can find on model building. This is just great news for everyone.</p>
]]></description><pubDate>Thu, 14 Aug 2025 15:22:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=44901496</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=44901496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44901496</guid></item><item><title><![CDATA[New comment by jeffreysmith in "Taking a Second Look"]]></title><description><![CDATA[
<p>Howdy, HN. Authors here. We got tired of text-to-image leaderboards that only focus on aesthetics, so we built our own benchmarks to test what matters for real work: fidelity to complex prompts, safety, bias, and IP infringement.<p>We analyzed 18 models and found that no single model is good at everything. For example, GPT-4o has the best safety guardrails but also a 98% IP infringement rate on celebrity likenesses. Google's Imagen 4 Ultra actively counters bias (e.g., 90% of its "CEOs" are female) but struggles with generating crowds. xAI's Grok 2 blocks almost nothing.<p>Lots more detail in the post. We'll be here all day to answer questions.</p>
]]></description><pubDate>Thu, 14 Aug 2025 14:38:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44900926</link><dc:creator>jeffreysmith</dc:creator><comments>https://news.ycombinator.com/item?id=44900926</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44900926</guid></item></channel></rss>