<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: qumpis</title><link>https://news.ycombinator.com/user?id=qumpis</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 15 Apr 2026 00:15:43 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=qumpis" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by qumpis in "Tree Search Distillation for Language Models Using PPO"]]></title><description><![CDATA[
<p>I may never understand what harness means - it's used in so many contexts</p>
]]></description><pubDate>Sun, 15 Mar 2026 14:03:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47387499</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=47387499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47387499</guid></item><item><title><![CDATA[New comment by qumpis in "Beyond Diffusion: Inductive Moment Matching"]]></title><description><![CDATA[
<p>What happens if we don't add any moment matching objective? E.g. at train time, just fit a diffusion model that predicts the target given any pair of timesteps (t, t'). Why is moment matching critical here?<p>Also, regarding linearity, why is it inflexible? It seems quite convenient that a simple linear interpolation is used for reconstruction; besides, even in DDIM the direction towards the final target changes at each step as the images become less noisy. In standard diffusion models, or even flow matching, denoising is always the prediction of the original data plus a direction from the current timestep to the timestep t'. Just to be clear, it is intuitive that such models are inferior in few-step generation since they don't optimise for test-time efficiency (in terms of the trade-off of quality vs compute), but it's unclear what inflexibility exists there beyond this limitation.<p>Presumably there's no expected benefit in quality if all timesteps are used in denoising?</p>
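<p>For concreteness, here's a minimal sketch of the update I mean (predict the clean data, then step toward timestep t', as in deterministic DDIM). The names eps_model and alpha_bar are placeholders for a noise-prediction network and a cumulative noise schedule, not anything from the paper:<p><pre><code>import numpy as np

def ddim_step(x_t, t, t_next, eps_model, alpha_bar):
    # alpha_bar: cumulative product of (1 - beta), indexed by timestep (assumed schedule)
    a_t, a_next = alpha_bar[t], alpha_bar[t_next]
    eps = eps_model(x_t, t)
    # predict the original (clean) data from the current noisy sample
    x0_hat = (x_t - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
    # "prediction of the original data + direction toward timestep t'"
    return np.sqrt(a_next) * x0_hat + np.sqrt(1.0 - a_next) * eps
</code></pre>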
]]></description><pubDate>Wed, 12 Mar 2025 13:04:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=43342784</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=43342784</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43342784</guid></item><item><title><![CDATA[New comment by qumpis in "I always knew I was different, I didn't know I was a sociopath"]]></title><description><![CDATA[
<p>Fascinating article. I wonder what made her take the steps to find out what's "different" about her, and most of all, why the need to fix it arose. Is it to "fit in", understandably? It somehow felt alien to me, and this shows my ignorance of the topic, that people lacking in the empathy department would attempt to understand the reasons and act "good" towards others even if this feeling is only understood intellectually.</p>
]]></description><pubDate>Thu, 14 Mar 2024 12:54:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=39703312</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=39703312</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39703312</guid></item><item><title><![CDATA[New comment by qumpis in "Stable Diffusion 3"]]></title><description><![CDATA[
<p>Yes, it makes sense a bit. Many popular convnets operate on 3x3 kernels, but the number of channels increases per layer. This, coupled with the fact that the receptive field grows with depth and lets convnets essentially see the whole image relatively early in the model's depth (especially with pooling operations, which enlarge the receptive field rapidly), makes this intuition questionable. Transformers, on the other hand, operate on attention, which allows them to weight each patch dynamically, but it's not clear to me that this lets them attend to all parts of the image in a way fundamentally different from convnets.</p>
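<p>To make the receptive-field point concrete, here's a toy calculation using the standard recurrence (the field widens by (kernel - 1) times the cumulative stride at each layer); the layer stack is just an assumed example, not any particular architecture:<p><pre><code>def receptive_field(layers):
    # layers: list of (kernel_size, stride), e.g. 3x3 convs and 2x2 pools
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # each layer widens the field by (k-1) input-space steps
        jump *= s              # striding/pooling multiplies the step size
    return rf

# two 3x3 convs followed by a 2x2 stride-2 pool, repeated four times
print(receptive_field([(3, 1), (3, 1), (2, 2)] * 4))  # 76 pixels: most of a small image
</code></pre>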
]]></description><pubDate>Fri, 23 Feb 2024 06:14:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=39477520</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=39477520</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39477520</guid></item><item><title><![CDATA[New comment by qumpis in "Stable Diffusion 3"]]></title><description><![CDATA[
<p>Convolutions are bad at long range spatial dependencies? What makes you say that - any chance you have a reference?</p>
]]></description><pubDate>Fri, 23 Feb 2024 02:07:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=39476050</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=39476050</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39476050</guid></item><item><title><![CDATA[New comment by qumpis in "A* tricks for videogame path finding"]]></title><description><![CDATA[
<p>I haven't seen RL with decision trees! It sounds really interesting. Any classic results worth looking into?</p>
]]></description><pubDate>Mon, 01 Jan 2024 20:23:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=38835022</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38835022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38835022</guid></item><item><title><![CDATA[New comment by qumpis in "Reindeer sleep and eat simultaneously"]]></title><description><![CDATA[
<p>Was the toll noticeable only back in those days, or even into the future?</p>
]]></description><pubDate>Thu, 28 Dec 2023 11:32:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=38792397</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38792397</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38792397</guid></item><item><title><![CDATA[New comment by qumpis in "Reindeer sleep and eat simultaneously"]]></title><description><![CDATA[
<p>Was it more productive, objectively speaking, than being consistent and not dealing with the constant drain of severely lacking sleep?</p>
]]></description><pubDate>Thu, 28 Dec 2023 02:24:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=38789276</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38789276</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38789276</guid></item><item><title><![CDATA[New comment by qumpis in "Augmenting long-term memory (2018)"]]></title><description><![CDATA[
<p>Can you give some example use cases of your application? I wonder how it scales to structurally complex information processing, e.g. digesting scientific topics.</p>
]]></description><pubDate>Sun, 17 Dec 2023 06:02:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=38670691</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38670691</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38670691</guid></item><item><title><![CDATA[New comment by qumpis in "What’s behind the Freud resurgence?"]]></title><description><![CDATA[
<p>So therapy is not effective, as per the paper. This seems like an astounding conclusion. Has anyone read the paper in detail and formed a deeper opinion?</p>
]]></description><pubDate>Thu, 14 Dec 2023 04:18:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=38637727</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38637727</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38637727</guid></item><item><title><![CDATA[New comment by qumpis in "Mamba: Linear-Time Sequence Modeling with Selective State Spaces"]]></title><description><![CDATA[
<p>How does this differ from RNNs and their gating mechanism?</p>
]]></description><pubDate>Tue, 05 Dec 2023 00:10:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=38525201</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38525201</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38525201</guid></item><item><title><![CDATA[New comment by qumpis in "Oracle of Zotero: LLM QA of Your Research Library"]]></title><description><![CDATA[
<p>At a glance, the paper looks very polished. Combine this with the fact that arXiv is invite-only, and your prediction might not come about.</p>
]]></description><pubDate>Mon, 27 Nov 2023 04:19:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=38427994</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38427994</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38427994</guid></item><item><title><![CDATA[New comment by qumpis in "Sqids – Generate short unique IDs from numbers"]]></title><description><![CDATA[
<p>How big and random were these "few extras"?</p>
]]></description><pubDate>Sun, 26 Nov 2023 01:20:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=38418402</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38418402</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38418402</guid></item><item><title><![CDATA[New comment by qumpis in "Hacking ADHD: Strategies for the modern developer"]]></title><description><![CDATA[
<p>I've got good results with white noise; not sure if that's something that would work for you. And of course I didn't suggest that this eliminates other distractions.</p>
]]></description><pubDate>Thu, 16 Nov 2023 14:08:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=38289656</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38289656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38289656</guid></item><item><title><![CDATA[New comment by qumpis in "Hacking ADHD: Strategies for the modern developer"]]></title><description><![CDATA[
<p>Why not put these earbuds in the office?</p>
]]></description><pubDate>Wed, 15 Nov 2023 16:08:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=38278238</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=38278238</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38278238</guid></item><item><title><![CDATA[New comment by qumpis in "ArXiv receives $10M for upgrades"]]></title><description><![CDATA[
<p>"Ar5iv" does a pretty good job from my experience</p>
]]></description><pubDate>Fri, 20 Oct 2023 01:53:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=37951397</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=37951397</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37951397</guid></item><item><title><![CDATA[New comment by qumpis in "The worst programmer I know"]]></title><description><![CDATA[
<p>A bit surprising to hear that you weren't swimming in offers to hire you all the time. Any idea why that was? Maybe you're somewhat picky about the job and responsibilities that come with it?</p>
]]></description><pubDate>Sun, 03 Sep 2023 08:27:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=37368718</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=37368718</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37368718</guid></item><item><title><![CDATA[New comment by qumpis in "Do Machine Learning Models Memorize or Generalize?"]]></title><description><![CDATA[
<p>Slightly related, but the sparsity-inducing activation function ReLU is often used in neural networks.</p>
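<p>A tiny illustration of why ReLU is called sparsity-inducing (plain numpy, nothing model-specific): it clamps every negative pre-activation to exactly zero, so roughly half of the outputs of a zero-centered input vanish.<p><pre><code>import numpy as np

x = np.random.randn(1000)     # zero-centered pre-activations
relu = np.maximum(x, 0.0)     # ReLU: max(0, x)
print((relu == 0).mean())     # roughly 0.5 of the outputs are exactly zero
</code></pre>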
]]></description><pubDate>Thu, 10 Aug 2023 17:41:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=37079224</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=37079224</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37079224</guid></item><item><title><![CDATA[New comment by qumpis in "Nvidia H100 GPUs: Supply and Demand"]]></title><description><![CDATA[
<p>I don't remember the exact tweet, but here's one discussion [1]. I guess something changed in the meantime.<p>[1] <a href="https://www.reddit.com/r/Amd/comments/140uct5/geohot_giving_up_on_amd_gpus_for_compute/" rel="nofollow noreferrer">https://www.reddit.com/r/Amd/comments/140uct5/geohot_giving_...</a></p>
]]></description><pubDate>Tue, 01 Aug 2023 09:12:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=36953998</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=36953998</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36953998</guid></item><item><title><![CDATA[New comment by qumpis in "Nvidia H100 GPUs: Supply and Demand"]]></title><description><![CDATA[
<p>Didn't they abandon development of AMD-related software?</p>
]]></description><pubDate>Tue, 01 Aug 2023 07:50:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=36953499</link><dc:creator>qumpis</dc:creator><comments>https://news.ycombinator.com/item?id=36953499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36953499</guid></item></channel></rss>