<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: dongobread</title><link>https://news.ycombinator.com/user?id=dongobread</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 26 Apr 2026 09:55:13 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=dongobread" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by dongobread in "GLM-5: Targeting complex systems engineering and long-horizon agentic tasks"]]></title><description><![CDATA[
<p>What a strangely hostile statement on an open weight model. Running like 20 benchmark evaluations isn't trivial by itself, and even updating visuals and press statements can take a few days at a tech company. It's literally been 5 days since this "new generation" of models was released. GPT-5.3(-codex) can't even be called via API, so it's impossible to test on some benchmarks.<p>I notice the people who endlessly praise closed-source models never actually <i>USE</i> open weight models, or they assume their drop-in prompting methods and workflows will just work for other model families. This is especially true for SWEs who used Claude Code first and now think every other model is horrible because they're ONLY used to prompting Claude. It's quite scary to see how people develop this level of worship for a proprietary product that is openly distrusting of users. I'm not saying this is necessarily true of the parent poster, but it's something I notice in general.<p>As someone who uses GLM-4.7 a good bit, it's easily at Sonnet 4.5 tier - I have not tried GLM-5, but it would be surprising if it wasn't at Opus 4.5 level given the massive parameter increase.</p>
]]></description><pubDate>Wed, 11 Feb 2026 19:31:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46979661</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=46979661</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46979661</guid></item><item><title><![CDATA[New comment by dongobread in "Open models by OpenAI"]]></title><description><![CDATA[
<p>It is absolutely awful at writing and general knowledge. IMO coding is its greatest strength by far.</p>
]]></description><pubDate>Tue, 05 Aug 2025 22:33:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44805313</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=44805313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44805313</guid></item><item><title><![CDATA[New comment by dongobread in "Open models by OpenAI"]]></title><description><![CDATA[
<p>How up to date are you on current open weights models? After playing around with it for a few hours I find it to be nowhere near as good as Qwen3-30B-A3B. The world knowledge is severely lacking in particular.</p>
]]></description><pubDate>Tue, 05 Aug 2025 22:32:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44805294</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=44805294</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44805294</guid></item><item><title><![CDATA[New comment by dongobread in "Why Austin Is Falling Out of Favor for Tech Workers"]]></title><description><![CDATA[
<p>This is a little misleading. The data they quote is based on their previous article[1], which just uses this analysis[2] provided by a VC company. Funnily enough, the same VC company put out a separate clickbait-ish article[3] just a year before that one, claiming the exact opposite findings (about startups ditching SV).<p>I would guess a lot of these annual trends are just random fluctuations in their dataset, though to be honest I wonder how they're even trying to estimate this kind of information.<p>[1] <a href="https://www.wsj.com/articles/austins-reign-as-a-tech-hub-might-be-coming-to-an-end-02836bc3" rel="nofollow">https://www.wsj.com/articles/austins-reign-as-a-tech-hub-mig...</a><p>[2] <a href="https://www.signalfire.com/blog/signalfire-state-of-talent-report-2025" rel="nofollow">https://www.signalfire.com/blog/signalfire-state-of-talent-r...</a><p>[3] <a href="https://www.signalfire.com/blog/state-of-talent-tech-trends" rel="nofollow">https://www.signalfire.com/blog/state-of-talent-tech-trends</a></p>
]]></description><pubDate>Mon, 07 Jul 2025 04:24:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=44486742</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=44486742</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44486742</guid></item><item><title><![CDATA[New comment by dongobread in "Meta invests $14.3B in Scale AI to kick-start superintelligence lab"]]></title><description><![CDATA[
<p>The corporate politics at Meta is the result of Zuck's own decisions. Even in big tech, Meta is (along with Amazon) rather famous for its highly political and backstabby culture.<p>This is because these two companies have extremely performance-review-oriented cultures where results need to be proven every quarter or you're liable to be laid off.<p>Labs known for being innovative all share the same trait of allowing researchers to go YEARS without high impact results. But both Meta and Scale are known for being grind shops.</p>
]]></description><pubDate>Fri, 13 Jun 2025 21:04:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=44272234</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=44272234</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44272234</guid></item><item><title><![CDATA[New comment by dongobread in "Trump temporarily drops tariffs to 10% for most countries"]]></title><description><![CDATA[
<p>The US has crashed its own stock market, tanked its own government's approval ratings, and had its own business leaders speak out against the government. This definitely does not increase leverage.</p>
]]></description><pubDate>Wed, 09 Apr 2025 20:57:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=43637891</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=43637891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43637891</guid></item><item><title><![CDATA[New comment by dongobread in "Andrew Gelman: Is marriage associated with happiness for men or for women?"]]></title><description><![CDATA[
<p>The paragraph immediately after that paragraph explains that the study was based on a faulty analysis (and links to the article below).<p><a href="https://www.vox.com/future-perfect/2019/6/4/18650969/married-women-miserable-fake-paul-dolan-happiness" rel="nofollow">https://www.vox.com/future-perfect/2019/6/4/18650969/married...</a></p>
]]></description><pubDate>Mon, 02 Sep 2024 21:31:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=41428961</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=41428961</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41428961</guid></item><item><title><![CDATA[New comment by dongobread in "GPT-4 LLM simulates people well enough to replicate social science experiments"]]></title><description><![CDATA[
<p>I'm very skeptical of this; the paper they linked is not convincing. It says that GPT-4 is correct at predicting the experiment outcome direction 69% of the time versus 66% of the time for human forecasters. But this is a silly benchmark, because people don't trust human forecasters in the first place - that's the whole reason the experiment is run. Knowing that GPT-4 is slightly better at predicting experiments than a human guessing doesn't make it a useful substitute for the actual experiment.</p>
]]></description><pubDate>Thu, 08 Aug 2024 01:30:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=41187239</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=41187239</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41187239</guid></item><item><title><![CDATA[New comment by dongobread in "XLSTMTime: Long-Term Time Series Forecasting with xLSTM"]]></title><description><![CDATA[
<p>They definitely would, and do; the vast majority of time series work is not about asset prices or beating the stock market.</p>
]]></description><pubDate>Tue, 16 Jul 2024 22:20:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=40980855</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40980855</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40980855</guid></item><item><title><![CDATA[New comment by dongobread in "XLSTMTime: Long-Term Time Series Forecasting with xLSTM"]]></title><description><![CDATA[
<p>I think what you say is true when comparing transformers to CNNs/RNNs, but not to MLPs.<p>Transformers, RNNs, and CNNs are all techniques to reduce parameter count compared to a pure-MLP model. If you took a transformer model and replaced each self-attention layer with a linear layer+activation function, you'd have a pure MLP model that can model every relationship the transformer does, and more possible relationships besides (at the cost of far more parameters). MLPs are more powerful/scalable but transformers are more efficient.<p>Compared to MLPs, transformers save on parameter count by skimping on the number of parameters devoted to modeling the relationship between tokens. This works in language modeling, where relationships between tokens aren't <i>that</i> important - you can jumble up the words in this sentence and it still mostly makes sense. This doesn't work in time series, where relationships between tokens (timesteps) are the most important thing of all. The LTSF paper linked in the OP paper also mentions this same problem: <a href="https://arxiv.org/pdf/2205.13504" rel="nofollow">https://arxiv.org/pdf/2205.13504</a> (see section 1)</p>
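<p>To make that parameter-count gap concrete, here is a back-of-the-envelope sketch (the sizes d=512 and T=128 are assumed for illustration, not taken from the comment above):<pre><code># Rough per-layer parameter counts, assuming d=512 and T=128 tokens.
d, T = 512, 128

# Self-attention: Q, K, V, and output projections, each d x d,
# so ~4*d^2 parameters regardless of sequence length T.
attn_params = 4 * d * d                 # ~1.0M

# Pure-MLP replacement that mixes every token position with every
# other directly: one dense layer over the flattened (T*d) sequence.
mlp_params = (T * d) * (T * d)          # ~4.3B

print(attn_params, mlp_params, mlp_params // attn_params)
</code></pre>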
]]></description><pubDate>Tue, 16 Jul 2024 21:39:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=40980574</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40980574</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40980574</guid></item><item><title><![CDATA[New comment by dongobread in "XLSTMTime: Long-Term Time Series Forecasting with xLSTM"]]></title><description><![CDATA[
<p>From experience in payments/spending forecasting, I've found that deep learning generally underperforms gradient-boosted tree models. Deep learning models tend to be good at learning seasonality but do not handle complex trends or shocks very well. Economic/financial data tends to have straightforward seasonality with complex trends, so deep learning tends to do quite poorly.<p>I do agree with this paper - all of the good deep learning time series architectures I've tried are simple extensions of MLPs or RNNs (e.g. DeepAR or N-BEATS). The transformer-based architectures I've used have been absolutely awful, especially the endless stream of transformer-based "foundational models" that are coming out these days.</p>
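<p>For anyone curious what that kind of gradient-boosted baseline looks like, here is a minimal sketch on synthetic monthly data (the features and sizes are illustrative assumptions, not the actual payments models):<pre><code># Minimal sketch: gradient-boosted trees on lag + calendar features,
# the sort of baseline that tends to beat deep nets on trend-heavy series.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 120                                    # 10 years of monthly data
t = np.arange(n)
y = 0.5 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 1, n)

# Features for predicting y[i]: calendar month plus the last 3 values.
X = np.column_stack([t[3:] % 12, y[2:-1], y[1:-2], y[:-3]])
target = y[3:]

model = GradientBoostingRegressor().fit(X[:-12], target[:-12])
preds = model.predict(X[-12:])             # one-step-ahead, held-out year
print(np.mean(np.abs(preds - target[-12:])))  # mean absolute error
</code></pre>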
]]></description><pubDate>Tue, 16 Jul 2024 20:43:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=40980151</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40980151</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40980151</guid></item><item><title><![CDATA[New comment by dongobread in "Lots of people in education disagree with the premise of maximizing learning"]]></title><description><![CDATA[
<p>I get what this piece is trying to say, but it's ignoring the fact that schools are trying to maximize learning with pupils who often don't want to learn or care about learning (unlike athletes or musicians, who are generally learning their craft by choice).<p>A significant part of teaching disinterested students (not just in grade school but in general) is making the subject interesting enough that students will want to spend time on learning and continue to delve further in their free time.<p>If you're trying to teach someone web development, would you have them churn through a stack of predetermined bootcamp-style projects, or would you let them try to build something they have personal interest in? I bet the latter method would turn out much better for the student in the long run.</p>
]]></description><pubDate>Tue, 09 Jul 2024 21:47:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=40921583</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40921583</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40921583</guid></item><item><title><![CDATA[New comment by dongobread in "Maker of RStudio launches new R and Python IDE"]]></title><description><![CDATA[
<p>I'm not sure what would lead you to believe this. I've worked in the data science/ML space for over a decade now and I see the majority of pure analytics projects started in R, including at big tech companies I've worked at recently.<p>Of course, ML projects and other things that need to result in production-grade models are almost always done in Python. This is currently the most visible form of "data project" due to all the ML/AI hype, but it is far from the only data work going on.</p>
]]></description><pubDate>Fri, 28 Jun 2024 02:47:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=40817272</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40817272</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40817272</guid></item><item><title><![CDATA[New comment by dongobread in "Gemma 2: Improving Open Language Models at a Practical Size [pdf]"]]></title><description><![CDATA[
<p>The knowledge distillation is very interesting, but generating trillions of outputs from a large teacher model seems insanely expensive. Is this really more cost-efficient than just using that compute to train your model with more data/more epochs?</p>
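<p>For context, the core of the technique (training the student to match the teacher's softened output distribution instead of one-hot next tokens) looks roughly like this (a generic sketch; the function name and temperature are illustrative, and this is not Gemma's actual training code):<pre><code># Generic knowledge-distillation loss: KL divergence between the
# teacher's and student's temperature-softened token distributions.
import torch
import torch.nn.functional as F

def distill_loss(student_logits, teacher_logits, temp=2.0):
    s = F.log_softmax(student_logits / temp, dim=-1)   # student log-probs
    t = F.softmax(teacher_logits / temp, dim=-1)       # teacher probs
    # temp**2 keeps gradient magnitudes comparable across temperatures
    return F.kl_div(s, t, reduction="batchmean") * temp ** 2

# vocab-sized logits for a batch of 8 token positions
student = torch.randn(8, 32000)
teacher = torch.randn(8, 32000)
print(distill_loss(student, teacher))
</code></pre>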
]]></description><pubDate>Thu, 27 Jun 2024 18:15:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=40813385</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40813385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40813385</guid></item><item><title><![CDATA[New comment by dongobread in "Why we no longer use LangChain for building our AI agents"]]></title><description><![CDATA[
<p>Langchain feels very much like shovelware that was created for the sole purpose of parting VCs from their money. At one point the codebase had a "prompt template" class that was literally just a wrapper around Python's f-string-style formatting.</p>
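<p>Roughly, that abstraction amounted to something like this (a paraphrase from memory, not LangChain's actual source):<pre><code># What the criticized "prompt template" class boils down to:
# a thin wrapper around Python's built-in string interpolation.
class PromptTemplate:
    def __init__(self, template: str):
        self.template = template

    def format(self, **kwargs) -> str:
        return self.template.format(**kwargs)  # just str.format

prompt = PromptTemplate("Summarize {doc} in {n} bullet points.")
print(prompt.format(doc="the RFC", n=3))
</code></pre>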
]]></description><pubDate>Fri, 21 Jun 2024 01:49:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=40745397</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40745397</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40745397</guid></item><item><title><![CDATA[New comment by dongobread in "Tokyo government to launch dating app in bid to boost birth rate"]]></title><description><![CDATA[
<p>I'm skeptical of that. Most western European countries have birth rates similarly low to Japan's despite having some of the lowest working hours in the world.</p>
]]></description><pubDate>Tue, 18 Jun 2024 03:03:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=40713689</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40713689</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40713689</guid></item><item><title><![CDATA[New comment by dongobread in "Ask HN: What's the best way to learn machine learning in 2024?"]]></title><description><![CDATA[
<p>Assuming you already know some basic linear algebra and calculus, know Python (or R), and have a decent-but-not-advanced grasp of statistics, I'd recommend working through these books. They are very readable and focus on intuitive understanding/practical applications, but give enough technical foundation for you to jump into more specific subfields if needed.<p>Stats & ML - <a href="https://www.statlearning.com/" rel="nofollow">https://www.statlearning.com/</a><p>Deep Learning - <a href="https://udlbook.github.io/udlbook/" rel="nofollow">https://udlbook.github.io/udlbook/</a><p>Reinforcement Learning - <a href="https://web.stanford.edu/class/psych209/Readings/SuttonBartoIPRLBook2ndEd.pdf" rel="nofollow">https://web.stanford.edu/class/psych209/Readings/SuttonBarto...</a><p>As with anything else, people usually fail to learn ML not because of content quality but because of lack of effort/time/consistency. Take handwritten notes, solve exercises, etc., and expect to spend at least a hundred hours on each book.</p>
]]></description><pubDate>Fri, 24 May 2024 15:17:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=40467078</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40467078</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40467078</guid></item><item><title><![CDATA[New comment by dongobread in "Multi AI agent systems using OpenAI's assistants API"]]></title><description><![CDATA[
<p>We tried using a multi-agent system for a complex NLP-type task and we found:<p>- Too many errors that just propagate on top of each other; if a single agent in the chain generates something even a little bit off, then the whole system goes off the rails.<p>- You often end up having to pass a massive amount of shared context to every agent, which just increases the cost dramatically.<p>Curiously enough, we had an architect from OpenAI tell us the same thing about agent systems a few days ago (our company is a big spender so they serve a consulting function), so I don't think anybody is really finding success with multi-agent systems currently. IMO the core tech is nowhere near good enough yet.</p>
]]></description><pubDate>Sat, 18 May 2024 00:36:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=40395550</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40395550</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40395550</guid></item><item><title><![CDATA[New comment by dongobread in "Falcon 2"]]></title><description><![CDATA[
<p>Their benchmark results seem roughly on par with Mistral 7B and Llama 3 8B, which hardly seems that great given the increase in model size.<p><a href="https://huggingface.co/tiiuae/falcon-11B" rel="nofollow">https://huggingface.co/tiiuae/falcon-11B</a><p><a href="https://huggingface.co/meta-llama/Meta-Llama-3-8B" rel="nofollow">https://huggingface.co/meta-llama/Meta-Llama-3-8B</a><p><a href="https://mistral.ai/news/announcing-mistral-7b/" rel="nofollow">https://mistral.ai/news/announcing-mistral-7b/</a></p>
]]></description><pubDate>Mon, 13 May 2024 15:42:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=40344589</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40344589</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40344589</guid></item><item><title><![CDATA[New comment by dongobread in "Stack Overflow simply bans folks who don't want their advice used to train AI"]]></title><description><![CDATA[
<p>Legality aside, I think the "payment" people get from posting free knowledge on the Internet is the human connection, and the satisfaction of knowing that other people are reading and appreciating it directly.<p>Injecting an LLM middleman between your post and the end user changes this dynamic quite a bit - without the human component, the feeling is that you're just doing unpaid labor for a profit-oriented company (OpenAI).</p>
]]></description><pubDate>Fri, 10 May 2024 16:35:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40320929</link><dc:creator>dongobread</dc:creator><comments>https://news.ycombinator.com/item?id=40320929</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40320929</guid></item></channel></rss>