Hacker News: couAUIA

New comment by couAUIA in "Fable 5 vs. GPT-5.6 Sol on an NP-Hard Problem: Does /goal help?"

couAUIA — Sat, 18 Jul 2026 12:51:18 +0000

Yes, I agree, but I actually did a lot more runs, with different prompts, different times, etc. Each time, /goal had a small or insignificant impact.

New comment by couAUIA in "Fable 5 vs. GPT-5.6 Sol on an NP-Hard Problem: Does /goal help?"

couAUIA — Sat, 18 Jul 2026 12:27:50 +0000

Well, thank you so much for this.

New comment by couAUIA in "Fable 5 vs. GPT-5.6 Sol on an NP-Hard Problem: Does /goal help?"

couAUIA — Sat, 18 Jul 2026 11:01:23 +0000

A deep dive into the effect of /goal on a problem literally made for this.

Fable 5 vs. GPT-5.6 Sol on an NP-Hard Problem: Does /goal help?

couAUIA — Sat, 18 Jul 2026 11:00:29 +0000

Article URL: https://charlesazam.com/blog/fable-5-gpt-5-6-sol-goal/

Comments URL: https://news.ycombinator.com/item?id=48956879

Points: 236

# Comments: 115

Gemini 3.5 Pro delays due to coding performance, upgraded Flash model in testing

couAUIA — Fri, 17 Jul 2026 08:23:29 +0000

Article URL: https://9to5google.com/2026/07/16/gemini-3-5-pro-delays/

Comments URL: https://news.ycombinator.com/item?id=48944680

Points: 2

# Comments: 0

New comment by couAUIA in "Kimi K3 is ranked 3rd on artificial analysis, only 2 points behind Sol"

couAUIA — Thu, 16 Jul 2026 19:47:10 +0000

It seems to be very strong at frontend work. I wonder how good its multimodal capabilities are. This is very important to me and, in my opinion, not well represented in benchmarks.

Kimi K3 is ranked 3rd on artificial analysis, only 2 points behind Sol

couAUIA — Thu, 16 Jul 2026 19:23:41 +0000

Article URL: https://artificialanalysis.ai

Comments URL: https://news.ycombinator.com/item?id=48939066

Points: 8

# Comments: 2

I ran an AI nuclear engineering department for a week

couAUIA — Sat, 04 Jul 2026 19:42:54 +0000

Article URL: https://charlesazam.com/blog/ai-engineering-department/

Comments URL: https://news.ycombinator.com/item?id=48788296

Points: 2

# Comments: 0

New comment by couAUIA in "White House lifts ban on Anthropic models"

couAUIA — Wed, 01 Jul 2026 07:21:45 +0000

The US Department of Commerce told Anthropic it had lifted the ban on foreigners accessing the company’s Mythos and Fable models on Tuesday evening, according to people with knowledge of the matter. The decision allows the AI group to re-release its latest model, Fable 5, to the general public.

White House lifts ban on Anthropic models

couAUIA — Wed, 01 Jul 2026 07:21:45 +0000

Article URL: https://www.ft.com/content/137ddb71-852f-438c-ad76-25e2dc43486b

Comments URL: https://news.ycombinator.com/item?id=48743332

Points: 2

# Comments: 1

Hardware Engineering as Code

couAUIA — Tue, 23 Jun 2026 14:09:44 +0000

Article URL: https://github.com/charles-azam/pyforge

Comments URL: https://news.ycombinator.com/item?id=48645242

Points: 1

# Comments: 0

Scientific documents should be written in Python (2022)

couAUIA — Mon, 22 Jun 2026 20:46:31 +0000

Article URL: https://github.com/charles-azam/pyforge

Comments URL: https://news.ycombinator.com/item?id=48635935

Points: 1

# Comments: 0

Mistral Compute? I hear Mistral Cloud

couAUIA — Thu, 28 May 2026 08:30:21 +0000

Article URL: https://mistral.ai/products/compute

Comments URL: https://news.ycombinator.com/item?id=48306243

Points: 13

# Comments: 0

Proton Pass for AI Agents

couAUIA — Wed, 27 May 2026 20:16:33 +0000

Article URL: https://proton.me/blog/pass-access-tokens

Comments URL: https://news.ycombinator.com/item?id=48299979

Points: 3

# Comments: 0

New comment by couAUIA in "Incident with Actions and Pages"

couAUIA — Tue, 26 May 2026 12:19:28 +0000

LOL, they added "Copilot AI Model Providers" to githubstatus and it has 100% uptime.

Thanks for pointing out that nobody is using that thing.

Ask HN: Should I continue this project ? (Being able to change AI harness)

couAUIA — Tue, 05 May 2026 17:58:21 +0000

Article URL: https://github.com/charles-azam/OmniAgents

Comments URL: https://news.ycombinator.com/item?id=48026121

Points: 1

# Comments: 1

Evaluate Your Own RAG: Why Best Practices Failed Us

couAUIA — Mon, 16 Feb 2026 14:02:22 +0000

Article URL: https://charlesazam.com/blog/rag/

Comments URL: https://news.ycombinator.com/item?id=47035041

Points: 2

# Comments: 0

New comment by couAUIA in "GLM-5 topped the coding benchmarks. Then I used it"

couAUIA — Sat, 14 Feb 2026 20:21:27 +0000

TL;DR: GLM-5 tops coding benchmarks. I tested it on an unpublished NP-hard optimization problem (KIRO) and 89-task Terminal-Bench. Best case: competitive. Typical case: 30% invalid output, every trial timed out, and two identical runs could produce a valid solution or complete garbage. Zhipu AI reports 56% on Terminal-Bench; I got 40%.

GLM-5 topped the coding benchmarks. Then I used it

couAUIA — Sat, 14 Feb 2026 20:21:27 +0000

Article URL: https://charlesazam.com/blog/glm5-benchmark-reality/

Comments URL: https://news.ycombinator.com/item?id=47018003

Points: 5

# Comments: 1

New comment by couAUIA in "I benchmarked 4 coding agents on an NP-hard problem I solved 8 years ago"

couAUIA — Thu, 12 Feb 2026 14:44:20 +0000

I gave an unpublished fiber network optimization problem to Claude Code, Codex, Gemini CLI, and Mistral. The score is total fiber length (lower is better). A good human solution in 30 minutes: ~40,000. My best after days of C++: 34,123. Given one hour, Claude Code hit 34,061 — beating me by 62 points. A 7-word prompt hint improved every agent by 18-30%. About 15% of all trials produced completely invalid outputs.