<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: d3m0t3p</title><link>https://news.ycombinator.com/user?id=d3m0t3p</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 29 Apr 2026 08:09:47 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=d3m0t3p" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by d3m0t3p in "AGENTS.md outperforms skills in our agent evals"]]></title><description><![CDATA[
<p>Yeah, but the goal is not to bloat the context space.
Here you "waste" context by providing non-useful information.
What they did instead is put an index of the documentation into the context; the LLM can then fetch the documentation. It is the same idea as skills, but it apparently works better without the agentic part of skills.
Furthermore, instead of having a nice index pointing to the docs, they compressed it.</p>
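The idea above can be sketched in a few lines (everything here is hypothetical, not the article's actual setup): only a lightweight index enters the context window, and the full document is pulled via a tool call on demand.

```python
# Toy sketch of "index in context, fetch on demand".
# Doc names, summaries, and contents are all made up.
DOCS = {
    "auth.md": ("token-based auth", "Full text of the auth guide ..."),
    "deploy.md": ("deploy pipeline", "Full text of the deploy guide ..."),
}

def index_prompt():
    # Only the lightweight index (name + one-line summary) goes into context.
    return "\n".join(f"- {name}: {summary}" for name, (summary, _) in DOCS.items())

def read_doc(name):
    # Tool call: the model fetches the full document only when it needs it.
    return DOCS[name][1]

print(index_prompt())
```

The context cost is then proportional to the number of docs, not their total size.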
]]></description><pubDate>Thu, 29 Jan 2026 23:49:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46818607</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46818607</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46818607</guid></item><item><title><![CDATA[New comment by d3m0t3p in "I asked AI researchers and economists about SWE career strategies given AI"]]></title><description><![CDATA[
<p>Same, Firefox iOS</p>
]]></description><pubDate>Mon, 29 Dec 2025 12:30:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=46420008</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46420008</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46420008</guid></item><item><title><![CDATA[New comment by d3m0t3p in "History LLMs: Models trained exclusively on pre-1913 texts"]]></title><description><![CDATA[
<p>The model is fine-tuned for chat behavior. So the style might be due to:
- fine-tuning
- more stylised text in the corpus; English evolved a lot in the last century.</p>
]]></description><pubDate>Thu, 18 Dec 2025 23:51:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46320494</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46320494</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46320494</guid></item><item><title><![CDATA[New comment by d3m0t3p in "F-35 Fighter Jet's C++ Coding Standards [pdf]"]]></title><description><![CDATA[
<p>Is that really the only thing you managed to remember?</p>
]]></description><pubDate>Sun, 07 Dec 2025 22:10:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46185732</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=46185732</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46185732</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Nvidia DGX Spark: When benchmark numbers meet production reality"]]></title><description><![CDATA[
<p>Because the ML ecosystem is more mature on the Nvidia side. Software-wise, the CUDA platform is more advanced, and it will be hard for AMD to catch up. It is good to see competition, though.</p>
]]></description><pubDate>Sun, 26 Oct 2025 20:59:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=45715162</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45715162</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45715162</guid></item><item><title><![CDATA[New comment by d3m0t3p in "The Missing Semester of Your CS Education (2020)"]]></title><description><![CDATA[
<p>In my own studies, software engineering was mostly about structuring code and coding patterns such as visitor, singleton, etc., i.e. how to create a maintainable codebase</p>
]]></description><pubDate>Sat, 25 Oct 2025 10:57:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=45702827</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45702827</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45702827</guid></item><item><title><![CDATA[New comment by d3m0t3p in "How does gradient descent work?"]]></title><description><![CDATA[
<p>Would you have some literature about that?</p>
]]></description><pubDate>Tue, 07 Oct 2025 21:42:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=45509287</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45509287</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45509287</guid></item><item><title><![CDATA[New comment by d3m0t3p in "How does gradient descent work?"]]></title><description><![CDATA[
<p>This sounds a lot like what the Muon / Shampoo optimizers do.</p>
]]></description><pubDate>Tue, 07 Oct 2025 20:11:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=45508205</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45508205</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45508205</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Apertus: An open, transparent, multilingual language model"]]></title><description><![CDATA[
<p>Interesting to see that they enforce a retroactive opt-out for data collection.
I wonder how they do that:
what if the model has already been trained on your data when you opt out?</p>
]]></description><pubDate>Tue, 02 Sep 2025 11:02:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=45101385</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=45101385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45101385</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Open models by OpenAI"]]></title><description><![CDATA[
<p>You can batch only if you have distinct chats in parallel.</p>
]]></description><pubDate>Tue, 05 Aug 2025 20:38:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44803946</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44803946</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44803946</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Adding lookbehinds to rust-lang/regex"]]></title><description><![CDATA[
<p>Nice to see a master's thesis highlighted on the research group's page</p>
]]></description><pubDate>Tue, 15 Jul 2025 16:50:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44573220</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44573220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44573220</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Cognition (Devin AI) to Acquire Windsurf"]]></title><description><![CDATA[
<p>Your first link is (in my opinion) highly biased in its sample: they hired maintainers of open-source repos, i.e. people with many years of experience on their specific repo.<p>So indeed, IF you are in that case (many years of experience on the same project), it is not useful; otherwise it might be.
This means it might be useful for juniors and for experienced devs who are switching projects.
It is a tool like any other; indeed, if you have a workflow that you have optimized through years of use, it won't help.</p>
]]></description><pubDate>Mon, 14 Jul 2025 20:48:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44565070</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44565070</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44565070</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Show HN: Refine – A Local Alternative to Grammarly"]]></title><description><![CDATA[
<p>It is Gemma 3n. I can't give feedback on the battery hit yet, but I would not expect anything bad, as these models were developed for much smaller devices (phones)</p>
]]></description><pubDate>Mon, 14 Jul 2025 08:08:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=44557561</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44557561</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44557561</guid></item><item><title><![CDATA[New comment by d3m0t3p in "ETH Zurich and EPFL to release a LLM developed on public infrastructure"]]></title><description><![CDATA[
<p>Hey, really cool project, I’m excited to see the outcome.
Is there a blog post / paper summarizing how you are doing it?
Also, which research group at ETH is currently working on it?</p>
]]></description><pubDate>Sat, 12 Jul 2025 10:13:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=44540873</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44540873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44540873</guid></item><item><title><![CDATA[New comment by d3m0t3p in "A non-anthropomorphized view of LLMs"]]></title><description><![CDATA[
<p>Do they?
LLMs embed the token sequence N^{L} into R^{LxD}; after some attention the output is also R^{LxD}, then we apply a projection to the vocabulary and get R^{LxV}, i.e. for each token a likelihood over the vocabulary.
In the attention you can have multi-head attention (or whatever variant is fashionable: GQA, MLA) and therefore multiple representations, but each is always tied to a token. I would argue that there is no hidden state independent of a token.<p>Whereas LSTMs, or structured state-space models for example, have a state that is updated and not tied to a specific item in the sequence.<p>I would argue that his text is easily understandable except for the function notation; explaining that you can compute a probability based on the previous words is understandable by everyone, without resorting to anthropomorphic terminology</p>
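The shapes described above can be checked with a toy single-head causal attention in numpy (made-up sizes, random weights; a sketch of the shape bookkeeping, not any particular model):

```python
import numpy as np

L, D, V = 4, 8, 10  # sequence length, model dim, vocab size (made up)
rng = np.random.default_rng(0)

X = rng.normal(size=(L, D))            # embedded tokens: R^{LxD}
W_qkv = rng.normal(size=(3, D, D))
Q, K, Vm = (X @ W for W in W_qkv)      # each R^{LxD}

scores = Q @ K.T / np.sqrt(D)          # R^{LxL}
mask = np.triu(np.ones((L, L)), k=1).astype(bool)
scores[mask] = -np.inf                 # causal mask: no looking ahead
A = np.exp(scores - scores.max(axis=-1, keepdims=True))
A /= A.sum(axis=-1, keepdims=True)     # softmax over keys, rows sum to 1

H = A @ Vm                             # still R^{LxD}: one vector per token
W_out = rng.normal(size=(D, V))
logits = H @ W_out                     # R^{LxV}: per-token scores over the vocab

assert H.shape == (L, D) and logits.shape == (L, V)
```

Every intermediate tensor keeps a leading L axis, which is the point: each representation stays tied to a token position, unlike an LSTM's carried state.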
]]></description><pubDate>Sun, 06 Jul 2025 23:46:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=44485223</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44485223</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44485223</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Occurences of swearing in the Linux kernel source code over time"]]></title><description><![CDATA[
<p>You can check company names too! It's interesting to see that by default the graph shows Google and Apple.
But adding Meta and IBM really changes the plot.<p>Meta went from 2K to 10K+ between 2018 and 2025, while IBM seems to have stopped contributing in 2008. Since the merger with Red Hat, I would have expected to see them increase again, but neither Red Hat nor IBM seems to have increased.
<a href="https://www.vidarholen.net/contents/wordcount/#redhat,oracle,google,apple,microsoft,IBM,facebook,meta" rel="nofollow">https://www.vidarholen.net/contents/wordcount/#redhat,oracle...</a>
Not sure their name appearing means they are contributing, though.<p>Really cool project,</p>
]]></description><pubDate>Mon, 16 Jun 2025 08:56:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=44287731</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=44287731</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44287731</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Show HN: WhatsApp MCP Server"]]></title><description><![CDATA[
<p>Apparently this is the case: <a href="https://github.com/tulir/whatsmeow/discussions/199" rel="nofollow">https://github.com/tulir/whatsmeow/discussions/199</a></p>
]]></description><pubDate>Mon, 31 Mar 2025 11:23:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=43533679</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=43533679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43533679</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Ask HN: Who is hiring? (March 2025)"]]></title><description><![CDATA[
<p>Hi, do you offer visa sponsorship / allow remote work from the EU (GMT+2)?</p>
]]></description><pubDate>Mon, 03 Mar 2025 23:30:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43248134</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=43248134</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43248134</guid></item><item><title><![CDATA[New comment by d3m0t3p in "CTO / cofounder exit deal after 1.5y at 600k revenue without SHA"]]></title><description><![CDATA[
<p>Would you mind sharing your company name?
I'm a master's student in AI, and after finishing my master's thesis at IBM this summer, I'll be looking for jobs.</p>
]]></description><pubDate>Sun, 05 Jan 2025 13:07:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42601499</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=42601499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42601499</guid></item><item><title><![CDATA[New comment by d3m0t3p in "Beyond Gradient Averaging in Parallel Optimization"]]></title><description><![CDATA[
<p>Well, it's just like stochastic gradient descent, if you think about it. Normal gradient descent is computed on the whole training set; the stochastic gradient is computed on a batch (a subset of the training set), and in the distributed case we process two batches at once by taking the gradient of each in parallel.
The intuition holds IMO, but indeed, applying the first batch's update and then the second is not the same as applying the mean update.<p>This is indeed super cool!</p>
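That last point can be checked numerically with a toy 1-D least-squares problem (all numbers made up): one step with the averaged gradient differs from two sequential per-batch steps, because the second sequential gradient is evaluated at already-updated weights.

```python
import numpy as np

# Toy 1-D least squares: loss_i(w) = 0.5 * (w*x_i - y_i)^2
# so grad_i(w) = x_i * (w*x_i - y_i). Two "batches" of one point each.
x = np.array([1.0, 2.0])
y = np.array([2.0, 2.0])
lr, w0 = 0.1, 0.0

def grad(w, i):
    return x[i] * (w * x[i] - y[i])

# Parallel case: one step with the mean of the two batch gradients.
w_avg = w0 - lr * 0.5 * (grad(w0, 0) + grad(w0, 1))

# Sequential case: batch 0's update, then batch 1's, each at the latest weights.
w_seq = w0 - lr * grad(w0, 0)
w_seq = w_seq - lr * grad(w_seq, 1)

print(w_avg, w_seq)  # the two schemes land at different weights
```

The gap shrinks as the learning rate does, which is why the averaged-gradient intuition still works well in practice.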
]]></description><pubDate>Tue, 31 Dec 2024 01:16:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=42555421</link><dc:creator>d3m0t3p</dc:creator><comments>https://news.ycombinator.com/item?id=42555421</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42555421</guid></item></channel></rss>