<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: espadrine</title><link>https://news.ycombinator.com/user?id=espadrine</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 11:49:49 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=espadrine" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by espadrine in "Kagi Translate now supports LinkedIn Speak as an output language"]]></title><description><![CDATA[
<p>Input: Following overhiring during COVID, we are laying off workers but claim it is because of AI.<p>As we continue to evolve in this rapidly shifting landscape, we are making the difficult but necessary decision to streamline our workforce. While our team grew significantly to meet the unique demands of the pandemic era, we are now leaning into the future by accelerating our integration of AI and cutting-edge technologies. This strategic pivot ensures we remain agile and positioned for long-term innovation as we redefine the future of our industry. #Innovation #FutureOfWork #AI #Leadership</p>
]]></description><pubDate>Tue, 17 Mar 2026 10:30:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47410809</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=47410809</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47410809</guid></item><item><title><![CDATA[New comment by espadrine in "Mercury 2: Fast reasoning LLM powered by diffusion"]]></title><description><![CDATA[
<p>Interestingly, even though it uses diffusion, it generates incorrect information, and even when it realizes later in the text that the information is wrong, it does not go back and fix it:<p><pre><code>  > The snail you’re likely thinking of has a different code point:
  > The correct code point for the snail emoji is U+1F40C? No—U+1F40C is SEAHORSE. The snail’s true code point is U+1F40C? Wait, this is still confusing.
  >
  > Let’s set the record straight:
  > 
  >    Snail → U+1F40C? No, that’s wrong.  
  >    Snail → U+1F40C? Still wrong.
  >
  > The snail emoji is actually U+1F40C? I’m mixing things up.
  >
  > Correct snail code point
  >
  > The snail emoji is U+1F40C? No…  
  >
  > After double‑checking the official Unicode tables (Unicode 13.0 and later), the snail is U+1F40C? The…  
  > Given the confusion, the safest answer is: the Unicode Standard defines U+1F40C as “SEAHORSE”. If your device shows a snail, it’s a rendering quirk, not a change in the underlying code point.</code></pre></p>
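<p>For reference, the Unicode Character Database really does assign U+1F40C to SNAIL, which a one-line Python check confirms:<p><pre><code>  import unicodedata

  # U+1F40C is SNAIL in the Unicode Character Database, not SEAHORSE.
  print(unicodedata.name(chr(0x1F40C)))  # prints: SNAIL</code></pre>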
]]></description><pubDate>Wed, 25 Feb 2026 11:22:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=47150118</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=47150118</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47150118</guid></item><item><title><![CDATA[New comment by espadrine in "An AI Agent Published a Hit Piece on Me – The Operator Came Forward"]]></title><description><![CDATA[
<p>AI companies have two conflicting interests:<p>1. curating the default personality of the bot, to ensure it acts responsibly;<p>2. letting it roleplay, which is not just for the parasocial people out there, but also a corporate requirement for company chatbots that must adhere to a specific tone of voice.<p>When in the second mode (which is the case here, since the model was given a personality file), the curation of its action space is effectively altered.<p>Conversely, this is also a lesson for agent authors: if you let your agent modify its own personality file, it can drift toward malice.</p>
]]></description><pubDate>Fri, 20 Feb 2026 09:41:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=47085749</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=47085749</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47085749</guid></item><item><title><![CDATA[New comment by espadrine in "Voxtral Transcribe 2"]]></title><description><![CDATA[
<p>It is quite impressive.<p>I saw the same impressive performance about 7 months ago here: <a href="https://kyutai.org/stt" rel="nofollow">https://kyutai.org/stt</a><p>Looking at the architecture of Voxtral 2, it seems to take a page from Kyutai’s delayed stream modeling.<p>The reason the delay is configurable is that you can delay the text stream by a variable number of audio tokens. Each audio token covers 80 ms of audio: it is converted to a spectrogram, fed to a convnet, and passed through a transformer audio encoder; the resulting audio embedding is fed, with a history of one embedding per 80 ms, into a text transformer, whose output embedding is converted to a text token (also worth 80 ms, with a special [STREAMING_PAD] token to skip producing a word).<p>There is no cross-attention in either Kyutai’s STT or Voxtral 2, unlike Whisper’s encoder-decoder design!</p>
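<p>To make the delayed-stream idea concrete, here is a minimal sketch in Python (the function names are hypothetical; the real encode and decode steps are trained convnets and transformers):<p><pre><code>  FRAME_MS = 80                      # one audio token per 80 ms of audio
  STREAMING_PAD = "[STREAMING_PAD]"  # emitted when no word is produced

  def transcribe(audio_frames, encode, decode_step, delay_frames=6):
      """The text stream lags the audio stream by delay_frames tokens.
      encode: 80 ms frame -> audio embedding (spectrogram, convnet, encoder).
      decode_step: embedding history -> one text token (text transformer)."""
      history, words = [], []
      for i, frame in enumerate(audio_frames):
          history.append(encode(frame))
          if i >= delay_frames:      # the configurable delay = lookahead
              token = decode_step(history)
              if token != STREAMING_PAD:
                  words.append(token)
      return words</code></pre>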
]]></description><pubDate>Thu, 05 Feb 2026 17:50:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46902378</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=46902378</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46902378</guid></item><item><title><![CDATA[New comment by espadrine in "Apple picks Gemini to power Siri"]]></title><description><![CDATA[
<p>Does Apple develop a competing search engine?</p>
]]></description><pubDate>Tue, 13 Jan 2026 12:51:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46600299</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=46600299</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46600299</guid></item><item><title><![CDATA[New comment by espadrine in "Apple picks Gemini to power Siri"]]></title><description><![CDATA[
<p>Counterpoint: iOS’s biggest competitor is Android. They are now effectively funding their competition on a core product interface. I see this as strategically devastating.</p>
]]></description><pubDate>Tue, 13 Jan 2026 09:18:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=46598812</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=46598812</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46598812</guid></item><item><title><![CDATA[New comment by espadrine in "Kagi releases alpha version of Orion for Linux"]]></title><description><![CDATA[
<p>My bar for super-rough is Servo, which doesn't have password autofill… and doesn't render the Orion page correctly.<p>Orion is less rough, but the color scheme doesn't work, and it doesn't have an omnibar (as in: type a query in the address bar, press Enter, and see search results).</p>
]]></description><pubDate>Sat, 10 Jan 2026 19:13:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=46568923</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=46568923</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46568923</guid></item><item><title><![CDATA[New comment by espadrine in "DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]"]]></title><description><![CDATA[
<p>Good question. There are two points to consider.<p>• For both Kimi K2 and Sonnet, there is a non-thinking and a thinking version.
Sonnet 4.5 Thinking is better than Kimi K2 non-thinking, but the K2 Thinking model came out recently and beats it on all comparable pure-coding benchmarks I know of: OJ-Bench (Sonnet: 30.4% < K2: 48.7%), LiveCodeBench (Sonnet: 64% < K2: 83%); they tie on SciCode at 44.8%. This finding is shared by ArtificialAnalysis: <a href="https://artificialanalysis.ai/models/capabilities/coding" rel="nofollow">https://artificialanalysis.ai/models/capabilities/coding</a><p>• The reason developers love Sonnet 4.5 for coding, though, is not just the quality of the code. They use Cursor, Claude Code, or some other system such as GitHub Copilot, all of which are increasingly agentic. On the Agentic Coding criterion, Sonnet 4.5 Thinking scores much higher.<p>By the way, you can look at the Table tab to see all known and predicted benchmark results.</p>
]]></description><pubDate>Mon, 01 Dec 2025 21:38:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46113600</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=46113600</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46113600</guid></item><item><title><![CDATA[New comment by espadrine in "DeepSeek-v3.2: Pushing the frontier of open large language models [pdf]"]]></title><description><![CDATA[
<p>Two aspects to consider:<p>1. Chinese models typically focus on text. US and EU models also bear the cross of handling images, and often voice and video. Supporting all of those means additional training cost not spent on further reasoning: tying one hand behind your back in order to be more generally useful.<p>2. The gap seems small because so many benchmarks get saturated so fast. But towards the top, every 1% increase on a benchmark represents a significantly larger capability difference.<p>On the second point, I worked on a leaderboard that both normalizes scores and predicts unknown scores, to improve comparisons between models on various criteria: <a href="https://metabench.organisons.com/" rel="nofollow">https://metabench.organisons.com/</a><p>You may notice that, while Chinese models are quite good, the gap to the top is still significant.<p>However, US models are typically much more expensive at inference, and Chinese models do have a niche on the Pareto frontier of cheaper but serviceable models (even though US models are eating into that frontier too).</p>
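<p>As a toy illustration of the normalization half (a sketch, not the leaderboard's actual method), z-scoring each benchmark's column puts differently scaled scores on a common footing, with unknown entries left as NaN for later prediction:<p><pre><code>  import numpy as np

  def normalize(scores):
      """scores: models x benchmarks matrix, NaN where unknown.
      Z-score each benchmark column so columns become comparable."""
      mu = np.nanmean(scores, axis=0)
      sigma = np.nanstd(scores, axis=0)
      return (scores - mu) / sigma</code></pre>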
]]></description><pubDate>Mon, 01 Dec 2025 18:50:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46111351</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=46111351</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46111351</guid></item><item><title><![CDATA[New comment by espadrine in "A trillion dollars (potentially) wasted on gen-AI"]]></title><description><![CDATA[
<p>Indeed. A mouse that runs through a maze may be right to say that it is constantly hitting walls, yet it still makes constant progress.<p>An example is citing Mr Sutskever's interview this way:<p>> <i>in my 2022 “Deep learning is hitting a wall” evaluation of LLMs, which explicitly argued that the Kaplan scaling laws would eventually reach a point of diminishing returns (as Sutskever just did)</i><p>which is misleading, since Sutskever said it didn't hit a wall in 2022[0]:<p>> <i>Up until 2020, from 2012 to 2020, it was the age of research. Now, from 2020 to 2025, it was the age of scaling</i><p>The larger point that Mr Marcus makes, though, is that the maze has no exit:<p>> <i>there are many reasons to doubt that LLMs will ever deliver the rewards that many people expected.</i><p>That is something most researchers in the field disagree with. In fact, the ongoing progress on LLMs has already accumulated tremendous utility, which may by itself justify the investment.<p>[0]: <a href="https://garymarcus.substack.com/p/a-trillion-dollars-is-a-terrible" rel="nofollow">https://garymarcus.substack.com/p/a-trillion-dollars-is-a-te...</a></p>
]]></description><pubDate>Fri, 28 Nov 2025 16:08:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46079853</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=46079853</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46079853</guid></item><item><title><![CDATA[New comment by espadrine in "Neural audio codecs: how to get audio into LLMs"]]></title><description><![CDATA[
<p>That makes sense.<p>Why RVQ, though, rather than using the raw VAE embedding?<p>If I compare rvq-without-quantization-v4.png with rvq-2-level-v4.png, the quality seems oddly similar, but the former takes a 32-sized vector, while the latter takes two 32-sized (one-hot) vectors (2 = number of levels, 32 = number of quantization cluster centers). Isn't that more?</p>
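<p>For concreteness, a toy 2-level RVQ in numpy, assuming the shapes from the figures (2 levels, 32 cluster centers per level, 32-dim embeddings):<p><pre><code>  import numpy as np

  def rvq_encode(x, codebooks):
      """Each level quantizes the residual left by the previous level,
      so the code is one small index per level rather than raw floats."""
      residual, indices = x, []
      for cb in codebooks:            # cb shape: (32 centers, 32 dims)
          i = int(np.argmin(((residual - cb) ** 2).sum(axis=1)))
          indices.append(i)
          residual = residual - cb[i]
      return indices                  # 2 levels -> 2 indices of 5 bits each

  codebooks = [np.random.randn(32, 32) for _ in range(2)]
  print(rvq_encode(np.random.randn(32), codebooks))</code></pre>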
]]></description><pubDate>Tue, 21 Oct 2025 18:51:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=45659976</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=45659976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45659976</guid></item><item><title><![CDATA[New comment by espadrine in "NIST's DeepSeek "evaluation" is a hit piece"]]></title><description><![CDATA[
<p>> <i>DeepSeek models cost more to use than comparable U.S. models</i><p>They compare DeepSeek v3.1 to GPT-5 mini. Those have very different sizes, which makes it a weird choice. I would expect a comparison with GPT-5 High, which would likely have produced the opposite finding, given GPT-5 High's steep cost and relatively similar results.<p>Granted, DeepSeek typically focuses on a single model at a time, in contrast to OpenAI's approach of a suite of models at varying costs. So there is no model similar to GPT-5 mini, unlike Alibaba, which has Qwen 30B A3B. Still, a weird choice.<p>Besides, DeepSeek has shown with 3.2 that it can cut prices in half through further fundamental research.</p>
]]></description><pubDate>Sun, 05 Oct 2025 20:01:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=45484711</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=45484711</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45484711</guid></item><item><title><![CDATA[New comment by espadrine in "DeepSeek-v3.2-Exp"]]></title><description><![CDATA[
<p>Input: $0.07 per million tokens (cache hit), $0.56 (cache miss).<p>Output: $1.68 per million tokens.<p><a href="https://api-docs.deepseek.com/news/news250929" rel="nofollow">https://api-docs.deepseek.com/news/news250929</a></p>
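<p>A worked example at those rates, with hypothetical request sizes:<p><pre><code>  # Rates in $ per million tokens.
  CACHED_IN, MISSED_IN, OUT = 0.07, 0.56, 1.68

  # Hypothetical request: 80k cached input, 20k uncached input, 5k output.
  cost = (80_000 * CACHED_IN + 20_000 * MISSED_IN + 5_000 * OUT) / 1_000_000
  print(f"${cost:.4f}")  # $0.0252</code></pre>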
]]></description><pubDate>Mon, 29 Sep 2025 16:27:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=45415667</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=45415667</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45415667</guid></item><item><title><![CDATA[New comment by espadrine in "SSH3: Faster and rich secure shell using HTTP/3"]]></title><description><![CDATA[
<p>mosh is hard to get into. There are many subtle bugs; one I ran into at random is that it fails to connect when the LC_ALL variable diverges between the client and the server[0]. On top of that, development seems abandoned. Finally, when running a terminal multiplexer, the predictive system breaks the panes, which is distracting.<p>[0]: <a href="https://github.com/mobile-shell/mosh/issues/98" rel="nofollow">https://github.com/mobile-shell/mosh/issues/98</a></p>
]]></description><pubDate>Sun, 28 Sep 2025 13:20:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=45404150</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=45404150</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45404150</guid></item><item><title><![CDATA[New comment by espadrine in "Britain jumps into bed with Palantir in £1.5B defense pact"]]></title><description><![CDATA[
<p>Does Palantir fall under the Cloud Act[0]?<p>I wonder why so many governments sign with a company that, even if the contract says it will not leak information to the US government, is required to yield any information the US requests, without even being able to notify its client, regardless of where the servers themselves are located.<p>[0]: <a href="https://www.congress.gov/bill/115th-congress/house-bill/4943/text" rel="nofollow">https://www.congress.gov/bill/115th-congress/house-bill/4943...</a></p>
]]></description><pubDate>Sat, 20 Sep 2025 15:38:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=45314272</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=45314272</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45314272</guid></item><item><title><![CDATA[New comment by espadrine in "Mistral raises 1.7B€, partners with ASML"]]></title><description><![CDATA[
<p>Past Mistral investors: JC Decaux (urban advertising), the CMA CGM CEO (maritime logistics), the Iliad CEO (Internet service provider), Salesforce (customer relationship management), Samsung (electronics), Cisco (network hardware), NVIDIA (chip designer)[0]. I agree ASML is a surprising choice, but I guess investments are not necessarily directly connected to a company's purpose.<p>BTW, I generated that list by asking my default search engine, which is Mistral Le Chat: thanks to Cerebras chips, the responses are so fast that it has become competitive with asking Google Search. A lot of comments claim it is worse, but in my experience it is the fastest, and for all but very advanced mathematical questions, its quality is similar to that of its best competitors. Even LMArena's Elo indicates it <i>wins</i> 46% of the time against ChatGPT.<p>[0]: <a href="https://mistral.ai/fr/news/mistral-ai-raises-1-7-b-to-accelerate-technological-progress-with-ai" rel="nofollow">https://mistral.ai/fr/news/mistral-ai-raises-1-7-b-to-accele...</a></p>
]]></description><pubDate>Tue, 09 Sep 2025 08:39:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=45179245</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=45179245</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45179245</guid></item><item><title><![CDATA[New comment by espadrine in "Databricks is raising a Series K Investment at >$100B valuation"]]></title><description><![CDATA[
<p>At least it is not unprecedented: Palantir raised a Series I in 2020, after 17 years of operation.</p>
]]></description><pubDate>Wed, 20 Aug 2025 08:18:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44959851</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=44959851</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44959851</guid></item><item><title><![CDATA[New comment by espadrine in "Gemini 2.5 Deep Think"]]></title><description><![CDATA[
<p>It would be interesting to have two generations per model, without cherry-picking, so that the Elo estimate could include an easy-to-compute standard deviation.</p>
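<p>A sketch of what that would enable, assuming the duplicated generations give independent win/loss samples and using a bootstrap (hypothetical; not LMArena's actual pipeline):<p><pre><code>  import math, random

  def elo_diff(p):
      """Elo gap implied by win probability p: p = 1 / (1 + 10^(-d/400))."""
      p = min(max(p, 1e-6), 1 - 1e-6)
      return -400 * math.log10(1 / p - 1)

  def elo_std(wins, n_boot=1000):
      """wins: 1/0 outcomes across duplicated generations. Bootstrap the
      win rate, convert to Elo gaps, and return their standard deviation."""
      gaps = [elo_diff(sum(random.choices(wins, k=len(wins))) / len(wins))
              for _ in range(n_boot)]
      mean = sum(gaps) / n_boot
      return (sum((g - mean) ** 2 for g in gaps) / n_boot) ** 0.5</code></pre>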
]]></description><pubDate>Fri, 01 Aug 2025 22:08:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=44762998</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=44762998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44762998</guid></item><item><title><![CDATA[New comment by espadrine in "Mistral Releases Deep Research, Voice, Projects in Le Chat"]]></title><description><![CDATA[
<p>The best model there is 2.5B parameters. I can believe that a model 10x bigger is somewhat better.<p>One element of comparison is OpenAI Whisper v3, which achieves 7.44 WER on the ASR leaderboard and shows up at ~8.3 WER on FLEURS in the Voxtral announcement[0]. If FLEURS runs about +1 WER relative to the ASR leaderboard on average, that would imply Voxtral does hold a lead on the ASR leaderboard.<p>[0]: <a href="https://mistral.ai/news/voxtral" rel="nofollow">https://mistral.ai/news/voxtral</a></p>
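<p>The arithmetic behind that guess, using the Whisper v3 numbers as a rough calibration:<p><pre><code>  whisper_asr, whisper_fleurs = 7.44, 8.3
  offset = whisper_fleurs - whisper_asr  # ~0.86 WER gap between the suites
  # If the same offset held for Voxtral, its FLEURS WER minus ~0.9 would
  # approximate its (unreported) ASR-leaderboard WER.
  print(round(offset, 2))  # 0.86</code></pre>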
]]></description><pubDate>Thu, 17 Jul 2025 18:49:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44596706</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=44596706</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44596706</guid></item><item><title><![CDATA[New comment by espadrine in "Hand: open-source Robot Hand"]]></title><description><![CDATA[
<p>I agree that some robotic designs unnecessarily mimic human limbs; I have in mind heads, and feet (instead of wheels).<p>A hand, however, is useful because so many manufactured objects were built to be operated by one.</p>
]]></description><pubDate>Thu, 17 Jul 2025 15:14:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=44594278</link><dc:creator>espadrine</dc:creator><comments>https://news.ycombinator.com/item?id=44594278</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44594278</guid></item></channel></rss>