Hacker News: shenberg

New comment by shenberg in "PA bench: Evaluating web agents on real world personal assistant workflows"

shenberg — Thu, 26 Feb 2026 11:42:47 +0000

Using existing enterprise apps probably - this solution is scalable for the vendor and it's easier to sell using existing software as-is than to start out by writing new custom tools.

New comment by shenberg in "Elsevier shuts down its finance journal citation cartel"

shenberg — Mon, 23 Feb 2026 19:40:47 +0000

Mid-way I realized this was AI writing (took me a while), then I read a quote in the text about a comment that "The tragedy isn’t that they cheated; it’s that the system was designed to let them thrive for a decade before anyone bothered to look at the data." I didn't find this comment in EJMR, or anywhere on the internet except the OP post, for that matter.

New comment by shenberg in "Audio is the one area small labs are winning"

shenberg — Mon, 16 Feb 2026 10:02:34 +0000

Moshi was an amazing tech demo. Building the entire stack from scratch in 6 months with a small team was an amazing show of skill: 7B text LLM data + training, emotive TTS for synth data generation (again model + data collection), synth data pipeline, novel speech codec, Rust inference stack for low latency, and an audio LLM architecture incl. a text "thoughts" stream, which was novel.

But this is a fluff piece: "underfunded" means a total of around $400 million ($330 million in the initial round, $70 million for Gradium). Compare to ElevenLabs, who used a $2 million pre-seed to create their initial product.

A bunch of other stuff there is disingenuous, like comparing their 7B model to Llama-3 405B (hint: the 7B model is a _lot_ dumber). There's also the outright lie: team of 4 made Moshi, which is corrected _in the same piece_ to 8 if you read enough.

New comment by shenberg in "Ask HN: Who wants to be hired? (February 2026)"

shenberg — Thu, 05 Feb 2026 11:18:50 +0000

Location: Paris, France (US citizen, EU resident)

Remote: Yes

Willing to relocate: Not until 2027

Technologies:

ML / DS: PyTorch, CUDA, distributed training & inference, performance profiling/optimization (audio & speech focus, some LLM inference acceleration)

Systems: C/C++/asm, low-level performance work, reliability/scale engineering

Backend / Infra: Python/Java/C# prod services, ETL / data pipelines, k8s (incl. operator work)

Roles: Tech Lead, Research Engineer (training and/or inference of large models)

Résumé/CV: https://www.linkedin.com/in/roeeshenberg/

Email: roee.shenberg@upai.dev

I’m a hands-on engineer who’s spent the last 6 years doing freelance ML + data science, primarily in audio/speech, and before that 10+ years in startups building and scaling production systems.

I’m looking for where research meets real systems: training and/or inference for large models, especially roles that value end-to-end ownership. Open to freelance engagements or full-time roles.

New comment by shenberg in "FlashAttention-T: Towards Tensorized Attention"

shenberg — Wed, 04 Feb 2026 12:15:41 +0000

There are two ingredients that don't fit the "attention-is-kernel-smoothing" framing, as far as I can tell: positional encoding and causal masking (another way to say positional encoding, I guess).

Also, simplicial attention is pretty much what the OP was going for, but the hardware lottery is such that it's gonna be pretty difficult to get competitive in terms of engineering. Not that people aren't trying (e.g. https://arxiv.org/pdf/2507.02754).
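To make the kernel-smoothing reading concrete, here is a minimal pure-Python sketch (my own illustration, not code from the paper or the OP). It writes softmax attention as a Nadaraya-Watson weighted average with kernel exp(q·k / sqrt(d)). The function name and the `causal` flag are assumptions for illustration. Note that the causal mask depends on the indices i and j rather than on the content of q and k, which is one way to see why it sits awkwardly in a pure kernel-smoothing picture.

```python
import math

def kernel_smoother_attention(queries, keys, values, causal=False):
    """Softmax attention written as a kernel-weighted average of values."""
    d = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # The mask is a function of position only: query i sees keys 0..i.
        visible = range(i + 1) if causal else range(len(keys))
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d)
                  for j in visible]
        top = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        # Normalized kernel weights times values, per output channel.
        out = [sum(w * values[j][c] for w, j in zip(weights, visible)) / total
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs
```

With `causal=True` the first position can only average over itself, so its output is just its own value; without the mask it blends in later positions as well.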

New comment by shenberg in "Cyclic Subgroup Sum"

shenberg — Tue, 27 Jan 2026 08:59:42 +0000

I don't understand how using group-theory language to describe number-theoretic properties provides extra insight in this case (e.g. conjecture: all perfect numbers are even is more concise than the group-theoretic description given in the page). Can you expand on why you believe the tools of group theory have something to say about this? (e.g. for polynomial roots, the connection with symmetry groups comes from symmetries of factorized polynomials, while there's no obvious-to-me connection here as there is no unique-up-to-symmetry integer factorization)

New comment by shenberg in "Exe.dev"

shenberg — Sat, 27 Dec 2025 16:38:38 +0000

ssh exe.dev works

New comment by shenberg in "Ask HN: How are Markov chains so different from tiny LLMs?"

shenberg — Fri, 21 Nov 2025 09:17:15 +0000

The short and unsatisfying answer is that LLM generation is a Markov chain, except that instead of counting n-grams in order to generate the next-token distribution, the training process compresses the statistics into the LLM's weights.

There was an interesting paper a while back which investigated using unbounded n-gram models as a complement to LLMs: https://arxiv.org/pdf/2401.17377 (I found the implementation to be clever and I'm somewhat surprised it received so little follow-up work)
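For the counting side of that comparison, a minimal sketch of an n-gram Markov model (function names are mine, purely illustrative, and unrelated to the linked paper's implementation). An LLM fills the same role as `next_token_distribution` here, except that the mapping from context to probabilities is computed from learned weights instead of a lookup table of counts.

```python
from collections import Counter, defaultdict

def fit_ngram(tokens, n):
    """Count how often each token follows each (n-1)-token context."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        counts[context][tokens[i + n - 1]] += 1
    return counts

def next_token_distribution(counts, context):
    """Normalize the counts for one context into probabilities."""
    followers = counts.get(tuple(context), Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # unseen context: the table has nothing to say
    return {tok: c / total for tok, c in followers.items()}
```

The empty result for an unseen context is the practical difference: a count table cannot generalize to contexts it never saw, while a trained model always produces some distribution.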

New comment by shenberg in "US declines to join more than 70 countries in signing UN cybercrime treaty"

shenberg — Thu, 30 Oct 2025 14:56:17 +0000

When countries like North Korea, which depends on cybercrime to fund itself, are signatories, you have to wonder whether this agreement means what its title says.

New comment by shenberg in "What would an efficient and trustworthy meeting culture look like?"

shenberg — Mon, 28 Jul 2025 08:38:26 +0000

The reality of meetings in most places I've seen is that key stakeholders have already formed an opinion beforehand; the meeting is a place to disseminate decisions that have already been made and align the organization.

New comment by shenberg in "Learnings from building AI agents"

shenberg — Thu, 26 Jun 2025 14:45:28 +0000

When I read "51% fewer false positives" followed immediately by "Median comments per pull request cut by half", it makes me wonder how many true positives they find. That's maybe unfair, as my reference is automated tooling in the security world, where the true-positive/false-positive ratio is so bad that a 50% reduction in false positives is a drop in the bucket.
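A quick back-of-the-envelope version of the "drop in the bucket" point. The counts are hypothetical, picked only to mimic a noisy security scanner; they are not figures from the article.

```python
def precision(true_positives, false_positives):
    """Fraction of reported findings that are real."""
    return true_positives / (true_positives + false_positives)

# Hypothetical counts: 10 real findings buried in 990 false alarms.
tp, fp = 10, 990
before = precision(tp, fp)       # 10 / 1000
after = precision(tp, fp // 2)   # 10 / 505, false positives halved
```

Here precision goes from 1% to just under 2%: halving the false positives still leaves about 98 out of 100 findings as noise.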

New comment by shenberg in "Sam Altman said startups with $10M were 'hopeless' competing with OpenAI"

shenberg — Tue, 28 Jan 2025 17:58:08 +0000

The DeepSeek v3 model had a net training cost of >$5m for the final training run, and the paper lists over 100 authors[1], meaning highly-paid engineers. This is also one of a sequence of models (v1, v2, math, coder) trained in order to build the institutional knowledge necessary to get to the frontier, and the total ends up still far above the $10m mark. It's hardly a "trio of super-smart engineers".

[1] https://arxiv.org/abs/2412.19437v1

New comment by shenberg in "Israel, Hamas reach ceasefire deal to end 15 months of war in Gaza"

shenberg — Thu, 16 Jan 2025 12:09:11 +0000

That's really not true; see e.g. the Wikipedia page on population transfer in the Ottoman empire[1]. This dates way back to the Assyrian and Persian empires explicitly moving conquered peoples around in their empires in order to safeguard their rule. This book on population transfer in the Ottoman empire[2] explicitly states, with references, that the Ottomans' habits were inherited from the steppe Turks, the Byzantines (=the Romans) and the Arabs.

[1] https://en.wikipedia.org/w/index.php?title=Population_transf...

[2] https://websites.umich.edu/~gocek/Work/ja/Gocek.Muge.ja.popu...

New comment by shenberg in "Z-Library Helps Students to Overcome Academic Poverty, Study Finds"

shenberg — Thu, 21 Nov 2024 11:55:57 +0000

Anecdotally, a pro-audio software company I worked with had to fire 1/3 of the company when their copy-protection was cracked and sales tanked immediately afterwards, and recovered once a new copy-protection scheme was developed and applied. And just to be clear, software licenses in direct-to-user sales are not that company's only revenue stream (they sell hardware and software to OEMs).

This is to say, the evidence in this natural experiment points towards piracy reducing sales by a lot.

New comment by shenberg in "SpawELO – small free matchmaking system for LAN parties"

shenberg — Sun, 03 Nov 2024 17:09:55 +0000

Under the leaderboard tab, if the "Solution" column has an icon, it's clickable. 2nd place solution is by Jeremy Howard (of fast.ai fame), which I'd summarize as TrueSkill Through Time (Microsoft Research paper) + some overfitting on the public leaderboard (1st place was #26 in the public leaderboard).

New comment by shenberg in "No "Zero-Shot" Without Exponential Data"

shenberg — Thu, 09 May 2024 16:20:17 +0000

The CLIP plot (Fig. 2) is damning; however, some of the generative models show flat responses in Fig. 3 (e.g. Adobe GigaGAN, DALL-E-mini). While those are technically linear relationships, they are also exactly what we'd want: an image generation aesthetic score that doesn't care about concept frequency. Maybe the issue is with the contrastive training target used in CLIP?

New comment by shenberg in "Plasticity through patterned ultrasound-induced brainwave entrainment in mice"

shenberg — Sun, 25 Feb 2024 12:18:05 +0000

I would have expected a sham-treatment arm in the experiment. Otherwise, how do you differentiate between "an intervention 30 minutes beforehand caused improved learning" and "our specific intervention 30 minutes beforehand caused improved learning"?

New comment by shenberg in "Explaining the SDXL Latent Space"

shenberg — Wed, 07 Feb 2024 12:06:13 +0000

I suspect that weight initializations are geared towards inputs being normal random variables with mean 0 and variance 1. Deviating from that makes the learning process unhappy.
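A small simulation of that suspicion, under assumptions of my own: a single linear unit with LeCun-style weights of variance 1/fan_in (the comment above does not name a specific scheme), fed Gaussian inputs. The function name and parameters are illustrative.

```python
import random
import statistics

def preactivation_std(input_std, fan_in=128, samples=1000, seed=0):
    """Empirical std of w.x when w ~ N(0, 1/fan_in) and x ~ N(0, input_std^2)."""
    rng = random.Random(seed)
    weight_std = (1.0 / fan_in) ** 0.5
    outs = []
    for _ in range(samples):
        x = [rng.gauss(0.0, input_std) for _ in range(fan_in)]
        w = [rng.gauss(0.0, weight_std) for _ in range(fan_in)]
        outs.append(sum(a * b for a, b in zip(x, w)))
    return statistics.pstdev(outs)
```

With this scaling the output spread tracks the input spread, so unit-variance inputs give roughly unit-variance pre-activations, while inputs with a larger spread push the first layer's outputs off the scale the initialization was tuned for.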

New comment by shenberg in "Google Cuts Jobs in Engineering and Other Divisions"

shenberg — Thu, 11 Jan 2024 17:28:47 +0000

"since 2020, the US has printed nearly 80% of ALL US Dollars in circulation" - I've seen this notion repeated and I assume it's a reference to M1 as published by FRED: https://fred.stlouisfed.org/series/M1SL

The actual story, as far as I can tell, is that money that had previously been considered as M2 (=less liquid) is also counted as M1 due to rule changes regarding savings accounts.

To see that this is the case, you can plot both together. If new money had in fact been printed, you would expect M2 to show the same jump as M1, as M2 is M1 + more stuff. However, you see a much smaller jump:

https://fred.stlouisfed.org/graph/fredgraph.png?g=1dwhY

The rule-change coincided with COVID relief measures which did include money-printing, but at a much smaller scale than implied.
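The accounting argument in toy form. The numbers are made up (think trillions of dollars) and are not FRED data; the point is only that moving balances from the M2-only bucket into M1 inflates M1 while leaving M2 unchanged.

```python
def reclassify(m1, m2_only, moved):
    """Move `moved` from the M2-only bucket (e.g. savings deposits) into M1."""
    return m1 + moved, m2_only - moved

# Hypothetical balances, not real data.
m1, m2_only = 4.0, 11.0
new_m1, new_m2_only = reclassify(m1, m2_only, 10.0)

m2_before = m1 + m2_only          # M2 is M1 plus the less liquid bucket
m2_after = new_m1 + new_m2_only   # same total after reclassification
```

In this toy case M1 more than triples while M2 does not move at all, which is the signature of a definition change and not of newly created money.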

New comment by shenberg in "The 1988 shooting down of Flight 655 as a user interface disaster"

shenberg — Wed, 29 Nov 2023 12:35:23 +0000

If you're seriously suggesting that attacking unarmed civilians intentionally, killing parents in front of their children and then kidnapping the children, slaughtering defenseless party-goers, etc. is what any resistance movement would do, that's ridiculous. If Hamas had only attacked military targets, there would be no legitimacy to Israel's actions. However, what actually happened was that they attacked plenty of civilian targets, in a premeditated fashion, in areas that are recognized internationally to be part of Israel.