<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: Jianghong94</title><link>https://news.ycombinator.com/user?id=Jianghong94</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 07:03:18 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=Jianghong94" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by Jianghong94 in "Mercury 2: Fast reasoning LLM powered by diffusion"]]></title><description><![CDATA[
<p>Honestly I don't understand why they, or any fast-but-error-prone model, position themselves as coding agents; my experience tells me I'd much rather work with a slow-but-correct model and let it run a longer session than handhold a fast-but-wrong model.</p>
]]></description><pubDate>Wed, 25 Feb 2026 19:15:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47156353</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=47156353</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47156353</guid></item><item><title><![CDATA[New comment by Jianghong94 in "I'm not worried about AI job loss"]]></title><description><![CDATA[
<p>This. At this point AI/LLM/Claude Code is still a power-user tool; the more you know about your domain, and the more you're willing to use it judiciously, the more you gain.<p>That being said, the real danger today is not coming from the AI itself; it's C-suites believing AI can just zero-shot any problem you throw at it.</p>
]]></description><pubDate>Fri, 13 Feb 2026 21:46:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=47008264</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=47008264</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47008264</guid></item><item><title><![CDATA[New comment by Jianghong94 in "I'm not worried about AI job loss"]]></title><description><![CDATA[
<p>Well, like I said, there are hidden incentives behind the scenes; in my case, the hidden incentive is that the requester/client is one of the company's subpar brokers, and the PM probably decided to offer only an average level of commitment, not going above and beyond. Hence the plan was to do exactly what the broker wanted, even though that was messy and inferior. You can't write down that kind of motivation on paper anywhere.<p>---
I say that because I did the analysis and realized that if I implemented the original version, which is basically a convoluted way of iteratively solving the MIP problem, it would be much harder to reason about internally and much harder to code correctly. But obviously it keeps the broker happy (the developer is doing exactly what I said).</p>
]]></description><pubDate>Fri, 13 Feb 2026 21:35:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47008162</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=47008162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47008162</guid></item><item><title><![CDATA[New comment by Jianghong94 in "I'm not worried about AI job loss"]]></title><description><![CDATA[
<p>Maybe I'm being naive here, but for AI (heck, for any good algorithm) to work well, you need at least loosely defined objectives. I assume that's much more straightforward in semiconductors, but in many industries, once you get into the details, all kinds of incentives start to misalign, and I doubt AI could understand all the nuances.<p>E.g., I was once tasked with building a new matching algorithm for a trading platform, and upon fully understanding the specs I realized it could be interpreted as a mixed-integer programming problem; the idea got shot down right away because the PM didn't understand it. There are all kinds of limiting factors once you get into the details.</p>
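To make the matching-as-optimization framing concrete, here is a toy pure-Python sketch (the order book, prices, and objective are invented for illustration; a real matching engine would encode this as a mixed-integer program with binary assignment variables rather than brute force):

```python
from itertools import permutations

# Toy order book: each buy/sell order is (id, limit price, quantity).
buys = [("B1", 101, 5), ("B2", 100, 3)]
sells = [("S1", 100, 4), ("S2", 99, 6)]

def matched_qty(buy, sell):
    # A trade is feasible only if the buyer's limit covers the seller's ask;
    # the matched quantity is capped by both sides.
    _, bid, bqty = buy
    _, ask, sqty = sell
    return min(bqty, sqty) if bid >= ask else 0

def best_matching(buys, sells):
    # Brute-force search over one-to-one pairings, maximizing total matched
    # quantity: the same objective a MIP would express declaratively.
    best, best_pairs = 0, []
    for perm in permutations(sells, len(buys)):
        pairs = list(zip(buys, perm))
        total = sum(matched_qty(b, s) for b, s in pairs)
        if total > best:
            best, best_pairs = total, pairs
    return best, best_pairs

total, pairs = best_matching(buys, sells)
print(total)  # 8 (B1-S2 matches 5, B2-S1 matches 3)
```

The point is that once the objective is written down this way, the "best" schedule falls out of the formulation; the hard part in practice is getting stakeholders to agree that this is the objective.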
]]></description><pubDate>Fri, 13 Feb 2026 20:00:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47007056</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=47007056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47007056</guid></item><item><title><![CDATA[New comment by Jianghong94 in "Is 2026 next year?"]]></title><description><![CDATA[
<p>I believe so; see my result with Haiku with extended thinking on. I think the weights are just too biased towards blurting out 'next year is xxx' from the bulk of the training data. An interesting problem to solve indeed.</p>
]]></description><pubDate>Tue, 02 Dec 2025 20:17:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=46126251</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=46126251</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46126251</guid></item><item><title><![CDATA[New comment by Jianghong94 in "Is 2026 next year?"]]></title><description><![CDATA[
<p>I think the current trick for LLM API providers is to insert "today is $DATE" into the system prompt, so maybe it's worthwhile to do that and see if it automatically fixes those OSS models?</p>
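The date-injection trick can be sketched in a few lines (the helper names here are hypothetical, not any provider's actual API; the only idea is to prepend the current date so the model doesn't have to guess it from stale training data):

```python
from datetime import date

def build_system_prompt(base_prompt: str, today: date) -> str:
    # Prepend the current date so the model can ground "next year"
    # in the actual calendar rather than its training-data cutoff.
    return f"Today is {today.strftime('%A, %B %d, %Y')}.\n\n{base_prompt}"

def next_year(today: date) -> int:
    # Ground truth the model should reproduce once it knows the date.
    return today.year + 1

prompt = build_system_prompt("You are a helpful assistant.", date(2025, 12, 2))
print(prompt.splitlines()[0])  # Today is Tuesday, December 02, 2025.
print(next_year(date(2025, 12, 2)))  # 2026
```

Wrapping whatever OSS model you're testing with this kind of preamble is a cheap experiment to run before concluding the weights themselves are at fault.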
]]></description><pubDate>Tue, 02 Dec 2025 20:15:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46126224</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=46126224</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46126224</guid></item><item><title><![CDATA[New comment by Jianghong94 in "Is 2026 next year?"]]></title><description><![CDATA[
<p>I did a similar test, specifically with extended thinking on and off for Haiku, and once extended thinking is on, the result is more or less the same as Sonnet's.<p>Thought process:
The user is asking if 2026 is next year. According to the context, today's date is Tuesday, December 02, 2025. So the current year is 2025. That means next year would be 2026. So yes, 2026 is next year.<p>Actual response:
Yes, 2026 is next year. Since we're currently in December 2025, 2026 is just about a month away.</p>
]]></description><pubDate>Tue, 02 Dec 2025 20:13:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46126207</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=46126207</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46126207</guid></item><item><title><![CDATA[New comment by Jianghong94 in "The lazy Git UI you didn't know you need"]]></title><description><![CDATA[
<p>I don't think JB's UI has been changing that much, albeit I haven't been working in the industry that long. I think the last major UI redo was 2-3 years ago, and most of it was to make the UI more compact, which I definitely like. That being said, YMMV.<p>IMHO, another good thing about using the JB IDE's git UI (especially in a corporate setting) instead of separate software is that everyone has the IDE, so it's easier to collaborate. Imagine you're helping a junior member debug their local branch and they don't have lazygit installed.</p>
]]></description><pubDate>Thu, 13 Nov 2025 00:39:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=45908991</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=45908991</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45908991</guid></item><item><title><![CDATA[New comment by Jianghong94 in "The lazy Git UI you didn't know you need"]]></title><description><![CDATA[
<p>Wait, no one mentions the default JetBrains IDE git UI? I mean, I get it if you're working from another IDE/text editor that doesn't have good git UI support out of the box, but JB's git UI is good enough that I don't want anything else.<p>Things that I use (and like):
1. Quick checkout to another branch that automatically stashes and unstashes your local changes; when I just need to inspect code elsewhere I find it really useful. My changes are small, so I can always remember to unstash them later.
2. Comparing branches/commits via the UI; I know you can do that with git diff, but then you need to know the command and the commit SHAs to compare. In the UI it comes in really handy: just select the branches or commits you want to compare and that's it. I've seen coworkers struggling to come up with the command, and I just say "use the IDE", and in a couple of clicks they have it working.
3. Filtering commits by user and by folder.</p>
]]></description><pubDate>Tue, 11 Nov 2025 15:07:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=45888087</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=45888087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45888087</guid></item><item><title><![CDATA[New comment by Jianghong94 in "What we talk about when we talk about sideloading"]]></title><description><![CDATA[
<p>An even more grotesque practice is to charge a stratospheric premium for the product itself <i>AND</i> put its controls behind a subscription, e.g. 8sleep.</p>
]]></description><pubDate>Tue, 28 Oct 2025 21:02:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=45739129</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=45739129</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45739129</guid></item><item><title><![CDATA[New comment by Jianghong94 in "Living Dangerously with Claude"]]></title><description><![CDATA[
<p>This seems solvable if the whitelist just allows regexes.</p>
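A minimal sketch of what a regex-based whitelist might look like (the allowlist patterns below are invented examples, not Claude's actual permission format):

```python
import re

# Hypothetical allowlist: a proposed shell command must fully match one
# of these patterns before the agent is permitted to run it.
ALLOWED = [
    r"git (status|diff|log)( .*)?",
    r"ls( -[a-zA-Z]+)?( \S+)?",
    r"pytest( \S+)*",
]

def is_allowed(command: str) -> bool:
    # fullmatch (not search) so a dangerous suffix can't ride along
    # after an innocent-looking prefix.
    return any(re.fullmatch(pat, command) for pat in ALLOWED)

print(is_allowed("git diff main"))  # True
print(is_allowed("rm -rf /"))       # False
```

The `fullmatch` detail matters: with `search`, `git status; rm -rf /` would sail through on the `git status` prefix.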
]]></description><pubDate>Fri, 24 Oct 2025 04:00:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=45690622</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=45690622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45690622</guid></item><item><title><![CDATA[New comment by Jianghong94 in "OpenAI acquires Sky.app"]]></title><description><![CDATA[
<p>My take is that it's a standalone business consideration: Apple users are more inclined to pay for software (definitely the case for iPhone vs. Android, although I haven't found a source for macOS vs. Windows).</p>
]]></description><pubDate>Fri, 24 Oct 2025 03:52:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=45690578</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=45690578</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45690578</guid></item><item><title><![CDATA[New comment by Jianghong94 in ""Vibe code hell" has replaced "tutorial hell" in coding education"]]></title><description><![CDATA[
<p>OR problems are hard because whoever tries to vibe-code them probably doesn't realize they fall under a specific algorithm and could prompt the LLM to use it; what's worse, even if you tell them so, they won't be able to understand the math behind it and would much prefer their vibe-coded solution.</p>
]]></description><pubDate>Fri, 10 Oct 2025 16:37:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=45540848</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=45540848</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45540848</guid></item><item><title><![CDATA[New comment by Jianghong94 in "AI 2027"]]></title><description><![CDATA[
<p>Not only does the article claim that when we get to self-improving AI it becomes generally intelligent, it also assumes AI is pretty close right now:<p>> OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we’ll call “DeepCent”)16 and their US competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research.<p>> It’s good at this due to a combination of explicit focus to prioritize these skills, their own extensive codebases they can draw on as particularly relevant and high-quality training data, and coding being an easy domain for procedural feedback.<p>> OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R&D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.<p>> what do we mean by 50% faster algorithmic progress? We mean that OpenBrain makes as much AI research progress in 1 week with AI as they would in 1.5 weeks without AI usage.<p>To me, claiming today's AI IS capable of such a thing is too hand-wavy. And I think that's the crux of the article.</p>
]]></description><pubDate>Fri, 04 Apr 2025 17:36:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=43585531</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=43585531</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43585531</guid></item><item><title><![CDATA[New comment by Jianghong94 in "AI 2027"]]></title><description><![CDATA[
<p>Putting the geopolitical discussion aside, I think the biggest question is how likely it is that a *current-paradigm LLM* (think any SOTA stock LLM you can get today, e.g., 3.7 Sonnet, Gemini 2.5, etc.) plus fine-tuning is capable of directly contributing to LLM research in a major way.<p>To quote the original article,<p>> OpenBrain focuses on AIs that can speed up AI research. They want to win the twin arms races against China (whose leading company we’ll call “DeepCent”)16 and their US competitors. The more of their research and development (R&D) cycle they can automate, the faster they can go. So when OpenBrain finishes training Agent-1, a new model under internal development, it’s good at many things but great at helping with AI research. (footnote: It’s good at this due to a combination of explicit focus to prioritize these skills, their own extensive codebases they can draw on as particularly relevant and high-quality training data, and coding being an easy domain for procedural feedback.)<p>> OpenBrain continues to deploy the iteratively improving Agent-1 internally for AI R&D. Overall, they are making algorithmic progress 50% faster than they would without AI assistants—and more importantly, faster than their competitors.<p>> what do we mean by 50% faster algorithmic progress?
We mean that OpenBrain makes as much AI research progress in 1 week with AI as they would in 1.5 weeks without AI usage.<p>> AI progress can be broken down into 2 components:<p>> Increasing compute: More computational power is used to train or run an AI. This produces more powerful AIs, but they cost more.<p>> Improved algorithms: Better training methods are used to translate compute into performance. This produces more capable AIs without a corresponding increase in cost, or the same capabilities with decreased costs.<p>> This includes being able to achieve qualitatively and quantitatively new results. “Paradigm shifts” such as the switch from game-playing RL agents to large language models count as examples of algorithmic progress.<p>> Here we are only referring to (2), improved algorithms, which makes up about half of current AI progress.<p>---<p>Given that the article chose a pretty aggressive timeline (the algorithm needs to contribute late this year so that its research results can feed into the next-gen LLM coming out early next year), the AI that contributes significantly to research has to be a current SOTA LLM.<p>Now, using LLMs in day-to-day engineering tasks is no secret at major AI labs, but we're talking about something different: something that gives you two extra days of output per week. I have no evidence to either confirm or deny that such an AI exists, and it would be outright ignorant to think no one has ever come up with or tried such an idea. So I think it comes down to two possibilities:<p>1. The claim is made top-down: if AI reaches superhuman level in 2027, what would be the most likely starting condition? The authors pick this as the most likely starting point; since they don't work in a major AI lab (and even if they did, they couldn't just leak such a trade secret), they simply assume it's likely to happen anyway (and you can't dismiss that).
2. The claim is made bottom-up: the authors did witness such an AI existing to some extent and extrapolated from there.</p>
]]></description><pubDate>Fri, 04 Apr 2025 17:26:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=43585417</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=43585417</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43585417</guid></item><item><title><![CDATA[New comment by Jianghong94 in "AI 2027"]]></title><description><![CDATA[
<p>Well, based on my reading, the OP's point is that not all validation (hence 'fully'), if not most of it, can be done in silico. I think we all agree on that, and it's the major bottleneck to making agents useful: you have to have a human in the loop to closely guardrail the whole process.<p>Of course you can get a lot of mileage via synthetically generated CoT, but whether that leads to LLMs speeding up LLM development is a big IF.</p>
]]></description><pubDate>Fri, 04 Apr 2025 16:54:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=43585030</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=43585030</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43585030</guid></item><item><title><![CDATA[New comment by Jianghong94 in "AI agents: Less capability, more reliability, please"]]></title><description><![CDATA[
<p>Yep, that's what I've been thinking. This shouldn't be that hard; at this point LLMs should already have all the 'rules' (e.g., credit card A buying flight X gives you m points, which can be converted into n miles) in their parameters, or can easily query the web for them. The dev needs to encode the whole thing into a decision mechanism and, once it's executed, ask the LLM to chase down the chosen path (e.g., bombard the ticket office with emails).</p>
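The split between a deterministic decision mechanism and the LLM follow-up could look something like this (the reward rules and numbers below are entirely made up for illustration):

```python
# Hypothetical reward rules: (card name, points per dollar, miles per point).
# In practice these would be scraped or queried, not hard-coded.
RULES = [
    ("CardA", 3, 1.0),
    ("CardB", 2, 1.6),
    ("CardC", 5, 0.5),
]

def best_card_for_flight(price: float):
    # Deterministic decision step: rank cards by total miles earned.
    # Only after this step would the LLM be asked to execute the fuzzy
    # follow-up work (emails, phone calls) for the winning path.
    def miles(rule):
        _, points_per_dollar, miles_per_point = rule
        return price * points_per_dollar * miles_per_point
    return max(RULES, key=miles)

card, _, _ = best_card_for_flight(400.0)
print(card)  # CardB (400 * 2 * 1.6 = 1280 miles beats the others)
```

Keeping the arithmetic out of the LLM and in plain code is exactly the reliability win: the model handles the unstructured tail of the workflow, not the part where a wrong number costs money.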
]]></description><pubDate>Mon, 31 Mar 2025 16:16:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43536703</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=43536703</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43536703</guid></item><item><title><![CDATA[New comment by Jianghong94 in "AI agents: Less capability, more reliability, please"]]></title><description><![CDATA[
<p>Now THAT's the workflow I'd like to see AI agents automate, streamline, and democratize for everybody.</p>
]]></description><pubDate>Mon, 31 Mar 2025 15:52:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=43536409</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=43536409</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43536409</guid></item><item><title><![CDATA[New comment by Jianghong94 in "AI Agents: Less Capability, More Reliability, Please"]]></title><description><![CDATA[
<p>Superhuman results 1 time in 10 are, in fact, a very strong reliability guarantee (maybe not up to the n-nines standard we're accustomed to today, but probably much higher than any agent in a real-world workflow).</p>
]]></description><pubDate>Mon, 31 Mar 2025 15:50:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=43536391</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=43536391</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43536391</guid></item><item><title><![CDATA[New comment by Jianghong94 in "FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI"]]></title><description><![CDATA[
<p>I guess the primary reason is that the answers must be numbers that can be verified easily; otherwise, you flood the validator with long LLM reasoning that's hard to verify. People have proposed using Lean as a medium for answers, but AFAIK even Lean is not mainstream in the general math community, so there are always trade-offs.<p>Also, coming up with good problems is an art in its own right; the Soviets were famous for institutionalizing anti-Semitism via special math puzzles for Jews in Moscow University entrance exams. The questions were constructed to be hard to solve yet have elementary solutions, in order to divert criticism.</p>
]]></description><pubDate>Mon, 11 Nov 2024 00:14:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42103558</link><dc:creator>Jianghong94</dc:creator><comments>https://news.ycombinator.com/item?id=42103558</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42103558</guid></item></channel></rss>