Hacker News: guilamu

New comment by guilamu in "Oura says it gets government demands for user data"

guilamu — Sat, 23 May 2026 16:17:12 +0000

If you're concerned about that, don't connect your TV to the internet and use any kind of TV box instead (Shield TV, Apple TV, etc.).

New comment by guilamu in "Mistral's CEO: Europe has 2 years to stop becoming America's AI 'vassal state'"

guilamu — Sun, 17 May 2026 19:11:10 +0000

It's not. France: €0.149/kWh (~$0.175) US: ~$0.12–$0.14/kWh https://www.globalpetrolprices.com/France/electricity_prices...

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Mon, 27 Apr 2026 10:02:45 +0000

You're right, I was certainly a bit presumptuous to call this 'a benchmark'. It is indeed a flawed test. Yet it has given me the occasion to try some open source models, and for my workflow some of them are incredibly competitive with SOTA closed source models.

New comment by guilamu in "Bob Odenkirk would like to remind you that life is a meaningless farce"

guilamu — Mon, 27 Apr 2026 09:44:37 +0000

Most people, including me, beg to disagree. Better Call Saul was a masterpiece.

https://www.metacritic.com/tv/better-call-saul/

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 22:10:00 +0000

Yeah, as I said, this is a benchmark for my use case only, a single use case, which is obviously not representative of everybody's needs.

What strikes me as very strange, though, is that 0 models were able to just use the search input already present on the Gravity Forms forms list page; all of them created a second input.

Also, I know it's not in the prompt, but adding a ctrl+f shortcut to a search input? Is that that crazy? I don't know.

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 20:41:49 +0000

https://openrouter.ai/openai/gpt-5.5-pro

30/180 USD on OpenRouter. Did I miss something?

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 20:36:16 +0000

When nothing is noted, it's max reasoning (xhigh in Copilot Chat in VS Code, if available).

The models not available on Copilot were tested through opencode (max reasoning), and DeepSeek V4 was tested through Cline (with max reasoning too).

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 20:30:14 +0000

Yes, those two models were tested on my own PC (local inference using my own CPU/GPU), so something may be bugged in my setup. gemma4-26b should be far better than gemma4-e4b.

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 20:28:33 +0000

Yes, the prompt is slim by design. I might be wrong, but the point was to see what the model can do "on its own".

The eval prompt is quite extensive: https://github.com/guilamu/llms-wordpress-plugin-benchmark/b...

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 20:25:31 +0000

Haha, just fixed the date!

I haven't evaluated the judge benchmark. You have everything needed in the repo to do so though, so be my guest. It took me a bit of time to put all this together, and I won't have much more time to dedicate to it for a couple of weeks.

BTW, if you explore the repo, sorry for all the French files...

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 20:22:53 +0000

Yes, Opus 4.7 fast (no reasoning) did a worse job than Sonnet 4.6 high (with reasoning), according to the Gemini 3.1 Pro evaluation.

New comment by guilamu in "OpenAI releases GPT-5.5 and GPT-5.5 Pro in the API"

guilamu — Fri, 24 Apr 2026 20:05:07 +0000

Just tested it on my homemade WordPress+Gravity Forms benchmark, and it's one of the worst models on the leaderboard performance-wise and the worst value-wise: https://github.com/guilamu/llms-wordpress-plugin-benchmark

I know it's only a single benchmark, but I don't understand how it can be so bad...

New comment by guilamu in "Show HN: I blind-tested 14 LLMs on a WP plugin task. Surprising Findings"

guilamu — Fri, 24 Apr 2026 06:51:33 +0000

Good point! I think I won't change anything right now, or I'll have to redo all the tests... I'll use your input for the Level 2 task I plan on working on.

Show HN: I blind-tested 14 LLMs on a WP plugin task. Surprising Findings

guilamu — Thu, 23 Apr 2026 19:42:02 +0000

Recently, GitHub Copilot silently dropped support for Claude Opus on Pro accounts. Since Opus was my go-to model for my daily workflow (developing WordPress plugins), I needed a reliable replacement.

I decided to run a rigorous, blind benchmark across 14 state-of-the-art and local LLMs to objectively measure which model understands WordPress development best. To ensure a perfectly fair test, I started with a completely fresh IDE and zero context for every single generation.

I asked each model to build a "Gravity Forms Live Search" plugin using a minimal, zero-shot prompt. To avoid personal bias, I had Gemini 3.1 Pro blindly grade the anonymized outputs against a strict 100-point rubric, comparing them to my own reference implementation.

Surprising Findings

1. The "Blind Spot" (Re-inventing the wheel): Out of 14 models, exactly 0 successfully hooked into the native Gravity Forms search input (#form_list_search). Instead of analyzing the implicit context (the DOM), every single model forcefully injected a brand new, redundant input into the page.

2. Complete lack of advanced UX foresight: Because it wasn't explicitly asked for, no model anticipated the need for keyboard shortcuts (Ctrl+F), nor did any attempt to update the native item counter as rows were hidden. Zero models implemented background fetching of paginated pages to make the search global.

3. The Diacritics Separator: Most models used a simple .toLowerCase() for filtering, which breaks on accented characters. Only a select few implemented robust normalization (.normalize('NFD')) to handle diacritics correctly.

4. Local models struggled: Local inference failed to keep up on my low-end hardware (7700X 64 GB, RX 6700 10 GB). Gemma4-26b underperformed significantly, generating a fatal PHP error and scoring 18/100.
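To make the diacritics finding (point 3) concrete, here is a minimal sketch of accent-insensitive matching. It is not taken from any of the generated plugins or from the reference implementation; the helper names foldText and matchesQuery are hypothetical, and the approach (decompose with NFD, then strip the combining marks in the U+0300 to U+036F range) is just one common way to do it.

```typescript
// Hypothetical helper: lowercases and removes combining accents so that
// "Élève" and "eleve" compare equal. Assumes Latin-script form titles.
function foldText(value: string): string {
  return value
    .normalize("NFD")                 // split "é" into "e" + combining acute accent
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining diacritical marks
    .toLowerCase();
}

// Hypothetical helper: true when the folded title contains the folded query.
function matchesQuery(title: string, query: string): boolean {
  return foldText(title).includes(foldText(query));
}

// A plain .toLowerCase() comparison misses the accented title:
const naive = "Élève".toLowerCase().includes("eleve"); // false
const folded = matchesQuery("Élève", "eleve");         // true
```

Calling .normalize('NFD') alone is not enough for this kind of search: the accents are still present as separate code points until they are stripped.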

The Standouts

The Winner: Claude 4.7 Opus (68/100). It wrote highly performant JS (caching DOM text, 120ms debounce), handled diacritics perfectly, and used modern WordPress i18n. It stands out as the most capable direct replacement for Copilot Pro Opus.

The Value King: GLM 5.1 (61/100). GLM secured a notable 2nd place, ahead of Opus 4.6! When checking OpenRouter, GLM 5.1 ($1.05 in / $3.50 out) is ~3-4x cheaper than Sonnet 4.6 and ~5-7x cheaper than Opus 4.6/4.7, making it a very cost-effective alternative for this task.
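For readers unfamiliar with the techniques credited to the winning entry (caching the row text once and debouncing the filter by 120ms), the following is a rough sketch of the idea, not the code Opus actually produced. All names here (debounce, buildRowCache, visibleRowIds, the row shape) are hypothetical, and the filtering is kept as pure functions so it can run outside a browser.

```typescript
// Hypothetical debounce: delays fn until waitMs have passed without a new call.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}

interface CachedRow {
  id: string;
  text: string; // folded title, computed once instead of on every keystroke
}

// Lowercase and strip accents (same idea as .normalize('NFD') in finding 3).
function fold(value: string): string {
  return value.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Build the cache once, e.g. when the list page loads.
function buildRowCache(rows: { id: string; title: string }[]): CachedRow[] {
  return rows.map((row) => ({ id: row.id, text: fold(row.title) }));
}

// Return the ids of rows that should stay visible for the given query.
function visibleRowIds(cache: CachedRow[], query: string): string[] {
  const q = fold(query.trim());
  return cache.filter((row) => q === "" || row.text.includes(q)).map((row) => row.id);
}

// Example wiring: the 120ms value mirrors the figure quoted above.
const cache = buildRowCache([
  { id: "1", title: "Contact" },
  { id: "2", title: "Enquête été" },
]);
let lastVisible: string[] = [];
const onInput = debounce((query: string) => {
  lastVisible = visibleRowIds(cache, query);
}, 120);
```

In a real plugin, onInput would be attached to the input event of the search field (for example the native #form_list_search input mentioned above), and the returned ids would drive which table rows are hidden.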

The Leaderboard

1. Claude 4.7 Opus plan – 68

2. GLM 5.1 – 61

3. Claude 4.6 Opus plan – 59

4. Mimo v2.5 pro – 58

5. Qwen 3.6+ – 55

6. Sonnet 4.6 – 55

7. Gemini 3.1 pro – 53

8. Kimi K2.6 – 49

9. GPT 5.4 xHigh – 49

10. Gemini 3 flash – 47

11. Claude 4.7 Opus fast – 46

12. Minimax m2.7 – 36

13. Gemma4-e4b (Local rx6700) – 32

14. Gemma4-26b (Local CPU) – 18

Takeaway

Even the best LLMs default to the path of least resistance: "just make it work." If you want native-feeling, fully integrated UX, you cannot rely on the model's implicit knowledge; you have to explicitly prompt for it.

I've published the full leaderboard, the exact prompts used, the detailed scoring grid, and all the generated code in the GitHub repository here: https://github.com/guilamu/llms-wordpress-plugin-benchmark

I will be testing a Level 2 prompt next, feeding the models a WordPress+Gravity Forms reference file to see how they adapt.

Comments URL: https://news.ycombinator.com/item?id=47880678

Points: 3

# Comments: 2

New comment by guilamu in "No more Opus for Copilot Pro plan users"

guilamu — Mon, 20 Apr 2026 19:14:24 +0000

Indeed. I'm still sad though :(

No more Opus for Copilot Pro plan users

guilamu — Mon, 20 Apr 2026 18:53:29 +0000

Article URL: https://github.blog/changelog/2026-04-20-changes-to-github-copilot-plans-for-individuals/

Comments URL: https://news.ycombinator.com/item?id=47838896

Points: 37

# Comments: 5

Reasons to think that the Claude Mythos announcement was overblown

guilamu — Thu, 09 Apr 2026 20:26:41 +0000

Article URL: https://garymarcus.substack.com/p/three-reasons-to-think-that-the-claude

Comments URL: https://news.ycombinator.com/item?id=47709400

Points: 2

# Comments: 0

New comment by guilamu in "Proton Meet isn't what they told you it was"

guilamu — Fri, 03 Apr 2026 10:08:54 +0000

"Proton Mail, one of the services he moved to, is ultimately controlled by the US Gov,"

Would you mind elaborating, pretty please?

New comment by guilamu in "People inside Microsoft are fighting to drop mandatory Microsoft Account"

guilamu — Sat, 28 Mar 2026 12:29:04 +0000

Why not just get the ISO, install, activate with massgravel and be done for life?

New comment by guilamu in "People inside Microsoft are fighting to drop mandatory Microsoft Account"

guilamu — Sat, 28 Mar 2026 07:05:53 +0000

That's true indeed, but Microsoft is not giving us any other option so why not use the good version at home? I mean what is the risk really?