<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: guilamu</title><link>https://news.ycombinator.com/user?id=guilamu</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 25 May 2026 22:21:26 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=guilamu" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by guilamu in "Oura says it gets government demands for user data"]]></title><description><![CDATA[
<p>If you're concerned about that, don't connect your TV to the internet; use a TV box instead (Shield TV, Apple TV, etc.).</p>
]]></description><pubDate>Sat, 23 May 2026 16:17:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=48248897</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=48248897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48248897</guid></item><item><title><![CDATA[New comment by guilamu in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"]]></title><description><![CDATA[
<p>It's not.
<p>France: €0.149/kWh (~$0.175)
<p>US: ~$0.12–$0.14/kWh
<p><a href="https://www.globalpetrolprices.com/France/electricity_prices/" rel="nofollow">https://www.globalpetrolprices.com/France/electricity_prices...</a></p>
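<p>A quick way to check the comparison is this minimal TypeScript sketch. It uses only the figures above and assumes nothing else. The dollar conversion of the French price is the one quoted from globalpetrolprices.</p>

```typescript
// Prices quoted in the comment, in USD per kWh.
const FRANCE_USD = 0.175; // €0.149/kWh, converted as quoted
const US_LOW = 0.12;
const US_HIGH = 0.14;

// Percentage by which `price` exceeds `reference`.
function premiumPct(price: number, reference: number): number {
  return (price / reference - 1) * 100;
}

// France compared with the top and the bottom of the quoted US range.
const vsUsHigh = premiumPct(FRANCE_USD, US_HIGH); // about 25
const vsUsLow = premiumPct(FRANCE_USD, US_LOW); // about 45.8
```

<p>With these numbers, French electricity comes out roughly 25% to 46% more expensive per kWh than the quoted US range.</p>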
]]></description><pubDate>Sun, 17 May 2026 19:11:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=48172228</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=48172228</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48172228</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>You're right, I was a bit presumptuous to call this 'a benchmark'. It is indeed a flawed test. Still, it has given me the opportunity to try some open-source models, and for my workflow some of them are remarkably competitive with SOTA closed-source models.</p>
]]></description><pubDate>Mon, 27 Apr 2026 10:02:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47919664</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47919664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47919664</guid></item><item><title><![CDATA[New comment by guilamu in "Bob Odenkirk would like to remind you that life is a meaningless farce"]]></title><description><![CDATA[
<p>Most people, including me, beg to disagree. Better Call Saul was a masterpiece.<p><a href="https://www.metacritic.com/tv/better-call-saul/" rel="nofollow">https://www.metacritic.com/tv/better-call-saul/</a></p>
]]></description><pubDate>Mon, 27 Apr 2026 09:44:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47919567</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47919567</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47919567</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>Yeah, as I said, this is a benchmark for my use case only, a single use case, which is obviously not representative of everybody's needs.<p>What strikes me as very strange, though, is that not a single model was able to just use the search input already present on the Gravity Forms forms list page; all of them created a second input.<p>Also, I know it's not in the prompt, but adding a Ctrl+F shortcut to a search input? Is that so crazy? I don't know.</p>
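<p>For illustration, here is a minimal TypeScript sketch of that idea. It binds Ctrl+F (Cmd+F on macOS) to the search box Gravity Forms already renders instead of adding a second one. The structural types and function names are hypothetical. The #form_list_search id in the wiring comment is the native input's id as given in the benchmark write-up. Treat this as a sketch, not tested plugin code.</p>

```typescript
// Minimal structural types so the sketch runs without DOM typings;
// in a browser these would be KeyboardEvent and HTMLInputElement.
interface KeyEventLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  preventDefault(): void;
}
interface FocusableLike {
  focus(): void;
  select(): void;
}

// True for Ctrl+F, or Cmd+F on macOS.
function isSearchShortcut(e: KeyEventLike): boolean {
  return (e.ctrlKey || e.metaKey) && e.key.toLowerCase() === "f";
}

// Returns a keydown handler that sends the shortcut to the page's
// existing search box instead of opening the browser's find bar.
function makeShortcutHandler(
  findInput: () => FocusableLike | null,
): (e: KeyEventLike) => void {
  return (e: KeyEventLike) => {
    const input = findInput();
    if (input === null || !isSearchShortcut(e)) return;
    e.preventDefault(); // only hijack Ctrl+F when the native box exists
    input.focus();
    input.select();
  };
}

// Browser wiring (assumption: the native input is #form_list_search):
// document.addEventListener("keydown", makeShortcutHandler(() =>
//   document.querySelector("#form_list_search") as HTMLInputElement | null));
```

<p>The handler calls preventDefault only when the native input exists. On pages without that search box, the browser's own find bar still works.</p>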
]]></description><pubDate>Fri, 24 Apr 2026 22:10:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47896428</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47896428</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47896428</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p><a href="https://openrouter.ai/openai/gpt-5.5-pro" rel="nofollow">https://openrouter.ai/openai/gpt-5.5-pro</a><p>$30/$180 per million input/output tokens on OpenRouter. Did I miss something?</p>
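<p>To put those numbers in perspective, here is a minimal TypeScript sketch of per-request cost. It assumes the 30/180 figures are USD per million input/output tokens (OpenRouter's usual convention) and uses hypothetical token counts.</p>

```typescript
// Listed GPT-5.5 Pro prices on OpenRouter, assumed to be USD per
// million tokens (input / output).
const INPUT_USD_PER_M = 30;
const OUTPUT_USD_PER_M = 180;

// Cost in USD of one request for the given token counts.
function requestCostUsd(inputTokens: number, outputTokens: number): number {
  return (inputTokens * INPUT_USD_PER_M + outputTokens * OUTPUT_USD_PER_M) / 1e6;
}

// Hypothetical run: 20,000 tokens in, 10,000 tokens out.
const exampleCost = requestCostUsd(20000, 10000); // 2.4
```

<p>Output is priced at six times the input rate. In this hypothetical run, output therefore accounts for $1.80 of the $2.40.</p>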
]]></description><pubDate>Fri, 24 Apr 2026 20:41:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47895498</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47895498</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47895498</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>When nothing is noted, it's maximum reasoning (xhigh in Copilot Chat in VS Code, if available).<p>The models not available on Copilot were tested through opencode (max reasoning), and DeepSeek V4 was tested through Cline (also with max reasoning).</p>
]]></description><pubDate>Fri, 24 Apr 2026 20:36:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=47895428</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47895428</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47895428</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>Yes, those two models were tested on my own PC (local inference on my own CPU/GPU), so something may be broken in my setup. gemma4-26b should be far better than gemma4-e4b.</p>
]]></description><pubDate>Fri, 24 Apr 2026 20:30:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47895363</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47895363</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47895363</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>Yes, the prompt is slim by design. I might be wrong, but the point was to see what the model can do "on its own".<p>The eval prompt is quite extensive: <a href="https://github.com/guilamu/llms-wordpress-plugin-benchmark/blob/main/Level%201/protocol/prompt.%C3%A9valuation.txt" rel="nofollow">https://github.com/guilamu/llms-wordpress-plugin-benchmark/b...</a></p>
]]></description><pubDate>Fri, 24 Apr 2026 20:28:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47895353</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47895353</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47895353</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>Haha, just fixed the date!<p>I haven't evaluated the judge benchmark. You have everything needed in the repo to do so, though, so be my guest. It took me quite some time to put all this together, and I won't have much more time to dedicate to it for a couple of weeks.<p>BTW, if you explore the repo, sorry for all the French files...</p>
]]></description><pubDate>Fri, 24 Apr 2026 20:25:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=47895322</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47895322</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47895322</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>Yes, Opus 4.7 fast (no reasoning) did a worse job than Sonnet 4.6 high (with reasoning), according to Gemini 3.1 Pro's evaluation.</p>
]]></description><pubDate>Fri, 24 Apr 2026 20:22:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47895291</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47895291</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47895291</guid></item><item><title><![CDATA[New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"]]></title><description><![CDATA[
<p>Just tested it on my homemade WordPress + Gravity Forms benchmark. Performance-wise it's one of the worst models on the leaderboard, and value-wise it's the worst: <a href="https://github.com/guilamu/llms-wordpress-plugin-benchmark" rel="nofollow">https://github.com/guilamu/llms-wordpress-plugin-benchmark</a><p>I know it's only a single benchmark, but I don't understand how it can be so bad...</p>
]]></description><pubDate>Fri, 24 Apr 2026 20:05:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=47895122</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47895122</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47895122</guid></item><item><title><![CDATA[New comment by guilamu in "Show HN: I blind-tested 14 LLMs on a WP plugin task. Surprising Findings"]]></title><description><![CDATA[
<p>Good point! I don't think I'll change anything right now, or I'd have to redo all the tests... I'll use your input for the Level 2 task I plan to work on.</p>
]]></description><pubDate>Fri, 24 Apr 2026 06:51:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47886545</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47886545</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47886545</guid></item><item><title><![CDATA[Show HN: I blind-tested 14 LLMs on a WP plugin task. Surprising Findings]]></title><description><![CDATA[
<p>Recently, GitHub Copilot silently dropped support for Claude Opus on Pro accounts. Since Opus was my go-to model for my daily workflow (developing WordPress plugins), I needed a reliable replacement.<p>I decided to run a rigorous, blind benchmark across 14 state-of-the-art and local LLMs to objectively measure which model understands WordPress development best. To ensure a perfectly fair test, I started with a completely fresh IDE and zero context for every single generation.<p>I asked each model to build a "Gravity Forms Live Search" plugin using a minimal, zero-shot prompt. To avoid personal bias, I had Gemini 3.1 Pro blindly grade the anonymized outputs against a strict 100-point rubric, comparing them to my own reference implementation.<p>Surprising Findings<p>1. The "Blind Spot" (reinventing the wheel): Out of 14 models, not one hooked into the native Gravity Forms search input (#form_list_search). Instead of analyzing the implicit context (the DOM), every single model injected a brand-new, redundant &lt;input&gt; into the page.<p>2. Complete lack of advanced UX foresight: Because it wasn't explicitly asked for, no model anticipated the need for a keyboard shortcut (Ctrl+F), and none attempted to update the native item counter as rows were hidden. No model implemented background fetching of paginated pages to make the search global.<p>3. The Diacritics Separator: Most models used a simple .toLowerCase() for filtering, which breaks on accented characters. Only a select few implemented robust normalization (.normalize('NFD')) to handle diacritics correctly.<p>4. Local models struggled: Local inference couldn't keep up on my low-end hardware (7700X, 64 GB RAM; RX 6700, 10 GB). Gemma4-26b underperformed significantly, generating a fatal PHP error and scoring 18/100.<p>The Standouts<p>The Winner: Claude 4.7 Opus (68/100). It wrote highly performant JS (caching DOM text, 120 ms debounce), handled diacritics perfectly, and used modern WordPress i18n. 
It stands out as the most capable direct replacement for Opus on Copilot Pro.<p>The Value King: GLM 5.1 (61/100). GLM secured a notable 2nd place, ahead of Opus 4.6! On OpenRouter, GLM 5.1 ($1.05 in / $3.50 out) is ~3-4x cheaper than Sonnet 4.6 and ~5-7x cheaper than Opus 4.6/4.7, making it a very cost-effective alternative for this task.<p>The Leaderboard<p>1. Claude 4.7 Opus plan – 68<p>2. GLM 5.1 – 61<p>3. Claude 4.6 Opus plan – 59<p>4. Mimo v2.5 pro – 58<p>5. Qwen 3.6+ – 55<p>6. Sonnet 4.6 – 55<p>7. Gemini 3.1 pro – 53<p>8. Kimi K2.6 – 49<p>9. GPT 5.4 xHigh – 49<p>10. Gemini 3 flash – 47<p>11. Claude 4.7 Opus fast – 46<p>12. Minimax m2.7 – 36<p>13. Gemma4-e4b (Local rx6700) – 32<p>14. Gemma4-26b (Local CPU) – 18<p>Takeaway<p>Even the best LLMs default to the path of least resistance: "just make it work." If you want native-feeling, fully integrated UX, you cannot rely on the model's implicit knowledge; you have to explicitly prompt for it.<p>I've published the full leaderboard, the exact prompts used, the detailed scoring grid, and all the generated code in the GitHub repository here: <a href="https://github.com/guilamu/llms-wordpress-plugin-benchmark" rel="nofollow">https://github.com/guilamu/llms-wordpress-plugin-benchmark</a><p>Next, I will test a Level 2 prompt, feeding the models a WordPress + Gravity Forms reference file to see how they adapt.</p>
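<p>Here is a minimal TypeScript sketch of the filtering techniques credited to the winner above. It covers accent folding with .normalize('NFD'), text cached once per row, a 120 ms debounce, and a visible-row count that could feed the native item counter. The Row shape, the Scheduler interface and all function names are hypothetical, chosen so the logic runs without a DOM. This is not the code any benchmarked model produced.</p>

```typescript
// Hypothetical row shape: the real plugin would wrap the table's row elements.
interface Row {
  text: string; // visible text of the row, e.g. the form title
  hidden: boolean; // true when the row is filtered out
}

// Accent- and case-insensitive folding: NFD splits "é" into "e" + U+0301,
// then the combining marks (U+0300 to U+036F) are stripped.
function fold(s: string): string {
  return s.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Fold every row's text once up front ("caching DOM text"), so each
// keystroke only compares strings instead of re-reading the page.
function buildIndex(rows: Row[]): string[] {
  return rows.map((r) => fold(r.text));
}

// Hide non-matching rows; return the visible count for the item counter.
function applyFilter(rows: Row[], index: string[], query: string): number {
  const q = fold(query.trim());
  let visible = 0;
  rows.forEach((row, i) => {
    row.hidden = q !== "" && !index[i].includes(q);
    if (!row.hidden) visible++;
  });
  return visible;
}

// Timer injected so the sketch needs neither DOM nor Node typings.
interface Scheduler {
  set(fn: () => void, ms: number): unknown;
  clear(handle: unknown): void;
}

// Trailing-edge debounce: only the last query typed within waitMs runs.
function debounce(
  fn: (query: string) => void,
  waitMs: number,
  timer: Scheduler,
): (query: string) => void {
  let handle: unknown = undefined;
  return (query: string) => {
    if (handle !== undefined) timer.clear(handle);
    handle = timer.set(() => {
      handle = undefined;
      fn(query);
    }, waitMs);
  };
}
```

<p>In a browser, set and clear could simply wrap setTimeout and clearTimeout, with waitMs set to 120. Stripping U+0300 to U+036F covers common Latin accents (é, à, ç) but not the combining marks of every script.</p>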
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47880678">https://news.ycombinator.com/item?id=47880678</a></p>
<p>Points: 3</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 23 Apr 2026 19:42:02 +0000</pubDate><link>https://github.com/guilamu/llms-wordpress-plugin-benchmark/blob/main/README.md</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47880678</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47880678</guid></item><item><title><![CDATA[New comment by guilamu in "No more Opus for Copilot Pro plan users"]]></title><description><![CDATA[
<p>Indeed. I'm still sad though :(</p>
]]></description><pubDate>Mon, 20 Apr 2026 19:14:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47839160</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47839160</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47839160</guid></item><item><title><![CDATA[No more Opus for Copilot Pro plan users]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.blog/changelog/2026-04-20-changes-to-github-copilot-plans-for-individuals/">https://github.blog/changelog/2026-04-20-changes-to-github-copilot-plans-for-individuals/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47838896">https://news.ycombinator.com/item?id=47838896</a></p>
<p>Points: 37</p>
<p># Comments: 5</p>
]]></description><pubDate>Mon, 20 Apr 2026 18:53:29 +0000</pubDate><link>https://github.blog/changelog/2026-04-20-changes-to-github-copilot-plans-for-individuals/</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47838896</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47838896</guid></item><item><title><![CDATA[Reasons to think that the Claude Mythos announcement was overblown]]></title><description><![CDATA[
<p>Article URL: <a href="https://garymarcus.substack.com/p/three-reasons-to-think-that-the-claude">https://garymarcus.substack.com/p/three-reasons-to-think-that-the-claude</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47709400">https://news.ycombinator.com/item?id=47709400</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 09 Apr 2026 20:26:41 +0000</pubDate><link>https://garymarcus.substack.com/p/three-reasons-to-think-that-the-claude</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47709400</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47709400</guid></item><item><title><![CDATA[New comment by guilamu in "Proton Meet isn't what they told you it was"]]></title><description><![CDATA[
<p>"Proton Mail, one of the services he moved to, is ultimately controlled by the US Gov,"<p>Would you mind elaborating, pretty please?</p>
]]></description><pubDate>Fri, 03 Apr 2026 10:08:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47624916</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47624916</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47624916</guid></item><item><title><![CDATA[New comment by guilamu in "People inside Microsoft are fighting to drop mandatory Microsoft Account"]]></title><description><![CDATA[
<p>Why not just get the ISO, install it, activate it with massgravel, and be done for life?</p>
]]></description><pubDate>Sat, 28 Mar 2026 12:29:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47553961</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47553961</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47553961</guid></item><item><title><![CDATA[New comment by guilamu in "People inside Microsoft are fighting to drop mandatory Microsoft Account"]]></title><description><![CDATA[
<p>That's true, but Microsoft isn't giving us any other option, so why not use the good version at home? I mean, what's the risk, really?</p>
]]></description><pubDate>Sat, 28 Mar 2026 07:05:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47552293</link><dc:creator>guilamu</dc:creator><comments>https://news.ycombinator.com/item?id=47552293</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47552293</guid></item></channel></rss>