<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: boole1854</title><link>https://news.ycombinator.com/user?id=boole1854</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 04:29:58 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=boole1854" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by boole1854 in "The Connection Machine CM-1 "Feynman" T-shirt"]]></title><description><![CDATA[
<p>I ordered one of these a while back. Be warned that it <i>will</i> shrink if put in the dryer.</p>
]]></description><pubDate>Tue, 03 Feb 2026 02:02:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46865400</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=46865400</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46865400</guid></item><item><title><![CDATA[New comment by boole1854 in "GPT-5.2"]]></title><description><![CDATA[
<p><a href="https://openai.com/index/hello-gpt-4o/" rel="nofollow">https://openai.com/index/hello-gpt-4o/</a><p>I see evaluations comparing against Claude, Gemini, and Llama there on the GPT-4o post.</p>
]]></description><pubDate>Thu, 11 Dec 2025 19:11:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46235745</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=46235745</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46235745</guid></item><item><title><![CDATA[New comment by boole1854 in "Building more with GPT-5.1-Codex-Max"]]></title><description><![CDATA[
<p>Today I did some comparisons of GPT-5.1-Codex-Max (on high) in the Codex CLI versus Gemini 3 Pro in the Gemini CLI.<p>- As a general observation, Gemini is harder to work with as a collaborator. If I ask the same question to both models, Codex will answer the question. Gemini will read some intention behind the question, write code to implement the intention, and only then answer the question. In one case, it took me five rounds of rewriting my prompt in various ways before I could get it to <i>not code</i> but just answer the question.<p>- Subjectively, it seemed to me that the code that Gemini wrote was closer to the code that I, as a senior-level developer, would have written than what I have come to expect from recent iterations of GPT-5.1. The code seemed more readable-by-default and not merely technically correct. I was happy to see this.<p>- Gemini seems to have a tendency to put its "internal dialogue" into comments. For example, "// Here we will do X because of reason Y. Wait, the plan calls for Z instead. Ok, we'll do Z.". Very annoying.<p>I did two concrete head-to-head comparisons where both models had the same code and the same prompt.<p>First, both models were told to take a high-level overview of some new functionality that we needed and were told to create a detailed plan for implementing it. Both models' plans were then reviewed by me and also by both models (in fresh conversations). All three of us agreed that Codex's plan was better. In particular, Codex's plan was more comprehensive, and Codex better understood how to integrate the new functionality naturally into the existing code.<p>Then (in fresh conversations), both models were told to implement that plan. Afterwards, again, all three of us compared the resulting solutions. 
And, again, all three of us agreed that Codex's implementation was better.<p>Notably, Gemini (1) hallucinated database column names, (2) ignored parts of the functionality that the plan called for, and (3) did not produce code that was integrated as well with the existing codebase. In its favor, it did produce a better version of a particular finance-related calculation function than Codex did.<p>Overall, Codex was the clear winner today. Hallucinations and ignored requirements are <i>big</i> problems that are very annoying to deal with when they happen. Additionally, Gemini's tendencies to include odd comments and to jump past the discussion phase of projects both make it more frustrating to work with, at this stage.</p>
]]></description><pubDate>Wed, 19 Nov 2025 21:36:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=45985606</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=45985606</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45985606</guid></item><item><title><![CDATA[New comment by boole1854 in "Blue Prince (1989)"]]></title><description><![CDATA[
<p>Ok, so this post is a joke of some kind (there was no 1989 version of Blue Prince).<p>But it raises an interesting question: would it have been possible to implement that upside down floppy disk puzzle in a game?<p>1. Was it even possible to insert floppy disks upside down? I lived through the floppy disk era in my childhood, but I have to admit I can't remember if the drives would even let you do this.<p>2. If the answer to #1 is yes, would there be any way of programmatically detecting the floppy-disk-was-inserted-the-wrong-way state?</p>
]]></description><pubDate>Wed, 05 Nov 2025 14:44:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=45823381</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=45823381</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45823381</guid></item><item><title><![CDATA[New comment by boole1854 in "You are the scariest monster in the woods"]]></title><description><![CDATA[
<p>If anyone knows of a steelman version of the "AGI is not possible" argument, I would be curious to read it. I also have trouble understanding what goes into that point of view.</p>
]]></description><pubDate>Wed, 15 Oct 2025 14:45:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=45593436</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=45593436</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45593436</guid></item><item><title><![CDATA[New comment by boole1854 in "Grok Code Fast 1"]]></title><description><![CDATA[
<p>It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.<p>I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.</p>
]]></description><pubDate>Fri, 29 Aug 2025 14:21:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=45064512</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=45064512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45064512</guid></item><item><title><![CDATA[New comment by boole1854 in "O3 Turns Pro"]]></title><description><![CDATA[
<p>Here are my own anecdotes from using o3-pro recently.<p>My primary use case where I am willing to wait 10-20 minutes for an answer from the "big slow" model (o3-pro) is code review of large amounts of code. I have been comparing results on this task from the three models above.<p>Oddly, I see many cases where each model will surface issues that the other two miss. In previous months when running this test (e.g., Claude 3.7 Sonnet vs o1-pro vs earlier Gemini), that wasn't the case. Back then, the best model (o1-pro) would almost always find all the issues that the other models found. But now it seems they each have their own blind spots (although they are also all better than the previous generation of models).<p>With that said, I am seeing Claude Opus 4 (w/ extended thinking) be distinctly worse, missing problems that o3-pro and Gemini find. It seems fairly consistent that Opus will be the worst of the three (despite sometimes noticing things the others do not).<p>Whether o3-pro or Gemini 2.5 Pro is better is less clear. o3-pro will report <i>more</i> issues, but it also has a tendency to confabulate problems. My workflow involves providing the model with a diff of all changes, plus the full contents of the files that were changed. o3-pro seems to have a tendency to imagine and report problems in files that were not provided to it. It also has an odd new failure mode, which is very consistent: it gets confused by the fact that I provide both the diff and the full file contents. It "sees" parts of the same code twice and will usually report that some code has accidentally been duplicated. Base o3 does this as well. 
None of the other models get confused in that way, and I also do not remember seeing that failure mode with o1-pro.<p>Nevertheless, o3-pro seems to find real issues that Gemini 2.5 Pro and Opus 4 miss more often than vice versa.<p>Back in the o1-pro days, it was fairly straightforward in my testing for this use case that o1-pro was simply better across the board. Now with o3-pro, particularly compared with Gemini 2.5 Pro, it's no longer clear whether the bonus of occasionally finding a problem that Gemini misses is worth the trouble of (1) waiting <i>way</i> longer for an answer and (2) sifting through more false positives.<p>My other common code-related use case is actually writing code. Here, Claude Code (with Opus 4) is amazing and has replaced all my other use of coding models, including Cursor. I now code almost exclusively by pair programming with Claude Code, allowing it to be the code writer while I oversee and review. The OpenAI competitor to Claude Code, called Codex CLI, feels distinctly undercooked. It has a recurring problem where it seems to "forget" that it is an agent that needs to go ahead and edit files, and it will instead start to offer me suggestions about how I can make the change. It also hallucinates running commands on a regular basis (e.g., I tell it to commit the changes we've made, and it outputs that it has done so, but it has not).<p>So where will I spend my $200 monthly model budget? Answer: Claude, for nearly unlimited use of Claude Code. For highly complex tasks, I switch to Gemini 2.5 Pro, which is still free in AI Studio. If I can wait 10+ minutes, I may hand it to o3-pro. But once my ChatGPT Pro subscription expires this month, I may either stop using o3-pro altogether, or I may occasionally use it as a second opinion by paying on-demand through the API.</p>
]]></description><pubDate>Tue, 17 Jun 2025 18:34:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=44302297</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=44302297</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44302297</guid></item><item><title><![CDATA[New comment by boole1854 in "The Gentle Singularity"]]></title><description><![CDATA[
<p>You can hover over places on the chart to get exact values. In January 1980, the index was at 37.124. In April 2025, it was at 125.880.<p>Then calculate cumulative inflation as the proportional change in the price level, like this:<p>(P_final - P_initial) / P_initial
= (125.880 - 37.124) / 37.124
= 2.39<p>This shows that the overall price level (the cumulative inflation embodied in the PCEPI) has increased by about 239% over the period, i.e., prices are roughly 3.39 times their January 1980 level.</p>
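The arithmetic above can be checked in a couple of lines (index values are the ones read off the FRED chart):

```python
# PCEPI index levels from the FRED chart referenced above
p_initial = 37.124   # January 1980
p_final = 125.880    # April 2025

# Cumulative inflation as the proportional change in the price level
cumulative = (p_final - p_initial) / p_initial
print(round(cumulative, 2))  # 2.39, i.e. about 239%
```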
]]></description><pubDate>Wed, 11 Jun 2025 17:32:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=44249857</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=44249857</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44249857</guid></item><item><title><![CDATA[New comment by boole1854 in "The Gentle Singularity"]]></title><description><![CDATA[
<p>It rose 2.75% per year (239% over 45 years).<p>Source with details: <a href="https://fred.stlouisfed.org/graph/?g=1JxIa" rel="nofollow">https://fred.stlouisfed.org/graph/?g=1JxIa</a></p>
]]></description><pubDate>Wed, 11 Jun 2025 15:37:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44248783</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=44248783</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44248783</guid></item><item><title><![CDATA[New comment by boole1854 in "The Gentle Singularity"]]></title><description><![CDATA[
<p>Even without including employer health insurance costs, real wages are up 67% since 1980.<p>Source: <a href="https://fred.stlouisfed.org/graph/?g=1JxBn" rel="nofollow">https://fred.stlouisfed.org/graph/?g=1JxBn</a><p>Details: uses the "Wage and salary accruals per full-time-equivalent employee" time series, which is the broadest wage measure for FTE employees, and adjusts for inflation using the PCE price index, which is the most economically meaningful measure of "how much did prices change for consumers" (and is the inflation index that the Fed targets)</p>
]]></description><pubDate>Wed, 11 Jun 2025 14:11:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=44247830</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=44247830</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44247830</guid></item><item><title><![CDATA[New comment by boole1854 in "OpenAI o3-pro"]]></title><description><![CDATA[
<p>I also don't have that tweet saved, but I do remember it.</p>
]]></description><pubDate>Tue, 10 Jun 2025 21:13:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44241499</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=44241499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44241499</guid></item><item><title><![CDATA[New comment by boole1854 in "OpenAI o3-pro"]]></title><description><![CDATA[
<p>No, this doesn't seem to be correct, although confusion regarding model names is understandable.<p>o4-mini-high is the label on chatgpt.com for what in the API is called o4-mini with reasoning={"effort": "high"}. Whereas o4-mini on chatgpt.com is the same thing as reasoning={"effort": "medium"} in the API.<p>o3 can also be run via the API with reasoning={"effort": "high"}.<p>o3-pro is <i>different</i> than o3 with high reasoning. It has a separate endpoint, and it runs for much longer.<p>See <a href="https://platform.openai.com/docs/guides/reasoning?api-mode=responses" rel="nofollow">https://platform.openai.com/docs/guides/reasoning?api-mode=r...</a></p>
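To make the naming concrete, the effort level is just a request parameter in the API. A minimal sketch of the two payloads (the prompt text is illustrative; nothing here makes a network call):

```python
# Request parameters for the OpenAI Responses API (sketch only).
# Passing either dict to client.responses.create(**params) selects the effort level.
o4_mini_high = {
    "model": "o4-mini",
    "reasoning": {"effort": "high"},    # what chatgpt.com labels "o4-mini-high"
    "input": "Explain the halting problem.",
}
o4_mini_default = {
    "model": "o4-mini",
    "reasoning": {"effort": "medium"},  # plain "o4-mini" on chatgpt.com
    "input": "Explain the halting problem.",
}
```

Note that both payloads use the same underlying model; only the `reasoning.effort` value differs, which is why "o4-mini-high" is not a separate model the way o3-pro is.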
]]></description><pubDate>Tue, 10 Jun 2025 21:10:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=44241469</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=44241469</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44241469</guid></item><item><title><![CDATA[New comment by boole1854 in "Google AI Ultra"]]></title><description><![CDATA[
<p>They are working on it: <a href="https://jules.google/" rel="nofollow">https://jules.google/</a></p>
]]></description><pubDate>Tue, 20 May 2025 19:15:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44044876</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=44044876</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44044876</guid></item><item><title><![CDATA[New comment by boole1854 in "ChatGPT Saved My Life (no, seriously, I'm writing this from the ER)"]]></title><description><![CDATA[
<p>According to the story, the ChatGPT conversation that led to the ER visit happened on a Sunday. In my part of the world, all local pharmacies are closed on Sundays, so going to a pharmacy and showing the results would not have been an option.</p>
]]></description><pubDate>Tue, 25 Feb 2025 15:17:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=43172967</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=43172967</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43172967</guid></item><item><title><![CDATA[New comment by boole1854 in "Launch HN: A0.dev (YC W25) – React Native App Generator"]]></title><description><![CDATA[
<p>> which involves alot of stuff outside of code-gen that we're working on<p>Could you elaborate on what extra stuff you are working on that will be a value-add over standalone Cursor?</p>
]]></description><pubDate>Tue, 11 Feb 2025 21:53:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43018838</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=43018838</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43018838</guid></item><item><title><![CDATA[New comment by boole1854 in "Delaware faces exodus of tech companies"]]></title><description><![CDATA[
<p>That article states:<p>> Although some scholars and practitioners have long argued that officers should or do owe a duty of oversight, and as a practical matter many officers likely assume that they have such an obligation, McDonald’s marks the first time this duty was explicitly acknowledged by a Delaware court.<p>To me, this seems to imply that the ruling incorporated an already-existing practice into law, which would suggest it isn't a big shift.</p>
]]></description><pubDate>Sat, 01 Feb 2025 20:20:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=42901851</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=42901851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42901851</guid></item><item><title><![CDATA[New comment by boole1854 in "DeepSeek-R1"]]></title><description><![CDATA[
<p>In their paper, they explain that "in the case of math problems with deterministic results, the model is required to provide the final answer in a specified format (e.g., within a box), enabling reliable rule-based verification of correctness. Similarly, for LeetCode problems, a compiler can be
used to generate feedback based on predefined test cases."<p>Basically, they have an external source-of-truth that verifies whether the model's answers are correct or not.</p>
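For the math case, that kind of rule-based check can be sketched in a few lines (the \boxed{} convention matches the paper's description; the helper name and sample strings are illustrative):

```python
import re

def verify_boxed_answer(model_output: str, expected: str) -> bool:
    """Rule-based reward check: extract the final \\boxed{...} answer from the
    model's output and compare it to a known-correct result. No learned reward
    model is involved; the ground truth is the external source of correctness."""
    matches = re.findall(r"\\boxed\{([^}]*)\}", model_output)
    if not matches:
        return False  # no answer in the required format counts as incorrect
    return matches[-1].strip() == expected.strip()

print(verify_boxed_answer(r"... so the answer is \boxed{42}", "42"))  # True
print(verify_boxed_answer("The answer is 42", "42"))                  # False
```

The LeetCode case works the same way in spirit, except the "verifier" is a compiler plus predefined test cases rather than a string comparison.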
]]></description><pubDate>Tue, 21 Jan 2025 18:08:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=42783343</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=42783343</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42783343</guid></item><item><title><![CDATA[New comment by boole1854 in "No Calls"]]></title><description><![CDATA[
<p>Ah ha! Makes sense. Thank you.</p>
]]></description><pubDate>Thu, 16 Jan 2025 15:32:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=42726633</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=42726633</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42726633</guid></item><item><title><![CDATA[New comment by boole1854 in "No Calls"]]></title><description><![CDATA[
<p>The post is about how they have a no-calls policy, even for enterprise sales. The author brags, "I nuked the 'book a call' button from my pricing page".<p>...But their pricing page actually has a big "Schedule a Call" button when you drag the pricing slider into enterprise territory: <a href="https://keygen.sh/pricing/" rel="nofollow">https://keygen.sh/pricing/</a><p>What am I missing?</p>
]]></description><pubDate>Thu, 16 Jan 2025 15:28:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=42726556</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=42726556</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42726556</guid></item><item><title><![CDATA[New comment by boole1854 in "Narcolepsy is weird but I didn't notice"]]></title><description><![CDATA[
<p>Oddly, the author compares their cataplexy experience to sleep paralysis and says they are <i>not</i> similar because in sleep paralysis "you can't feel" whereas in cataplexy "you can feel all your limbs and it feels like they're all ready to obey you".<p>I have experienced sleep paralysis several times, and I have always apparently retained the ability to feel my body/limbs, as I think most people do. It would seem that the author's experience of sleep paralysis is different from most people's.</p>
]]></description><pubDate>Sat, 11 Jan 2025 20:03:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=42668537</link><dc:creator>boole1854</dc:creator><comments>https://news.ycombinator.com/item?id=42668537</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42668537</guid></item></channel></rss>