<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: indeed30</title><link>https://news.ycombinator.com/user?id=indeed30</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 27 Apr 2026 09:59:59 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=indeed30" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by indeed30 in "Show HN: An easy-to-use online curve fitting tool"]]></title><description><![CDATA[
<p>The radioactive decay example specifically? Fit A and k (e.g. by nonlinear least squares) and then use the Jacobian to obtain the approximate covariance matrix. The square roots of the diagonal elements of that matrix give you the standard error estimates.</p>
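<p>A minimal sketch of that recipe, assuming SciPy is available (the data here is synthetic, with made-up true values A = 10, k = 0.3 and Gaussian noise; curve_fit computes the covariance from the Jacobian internally):

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(t, A, k):
    return A * np.exp(-k * t)

# Synthetic measurements: true A = 10, k = 0.3, plus small noise
rng = np.random.default_rng(0)
t = np.arange(10.0)
y = decay(t, 10.0, 0.3) + rng.normal(0.0, 0.2, t.size)

# Nonlinear least squares; pcov is the approximate covariance matrix,
# derived from the Jacobian at the solution
popt, pcov = curve_fit(decay, t, y, p0=[1.0, 0.1])
se = np.sqrt(np.diag(pcov))  # standard errors for A and k
```

The same pattern works for any parametric model, as long as the residuals justify the underlying noise assumptions.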
]]></description><pubDate>Fri, 14 Nov 2025 17:33:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=45929216</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=45929216</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45929216</guid></item><item><title><![CDATA[New comment by indeed30 in "Show HN: An easy-to-use online curve fitting tool"]]></title><description><![CDATA[
<p>I think we are just coming at this from different angles. I do understand and agree that we are estimating the parameters of the fit curves.<p>> That already makes strong modeling assumptions (usually including IID, Gaussian noise, etc.,) to get the parameter estimates in the first place<p>You lose me here - I don't agree with "usually". I guess you're thinking of examples where you are sampling from a population and estimating features of that population. There's nothing wrong with that, but that is a <i>much</i> smaller domain than curve fitting in general.<p>If you give me a set of x and y, I can fit a parametric curve that minimises the average squared distance between fitted and observed values of y without making any assumptions whatsoever. This is a purely mechanical, non-stochastic procedure.<p>For example, if you give me the points {(0,0), (1,1), (2,4), (3,9)} and the curve y = a x^b, then I'm going to fit a=1, b=2, and I certainly don't need to assume anything about the data-generating process to do so. However, there is no concept of a confidence interval in this example - the estimates are the estimates, the residual error is 0, and that is pretty much all that can be said.<p>If you go further and tell me that each of these pairs (x,y) is randomly sampled, or maybe the x is fixed and the y is sampled, then I can do more. But that is often not the case.</p>
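<p>That mechanical fit can be reproduced directly; a sketch using SciPy's curve_fit on those four points (the bounds are just a convenience to keep b positive so 0^b stays well-defined, not a modelling assumption):

```python
import numpy as np
from scipy.optimize import curve_fit

x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 1.0, 4.0, 9.0])

def power(x, a, b):
    return a * x ** b

# Minimise squared residuals; no distributional story involved
popt, _ = curve_fit(power, x, y, p0=[0.5, 1.5],
                    bounds=([0.01, 0.01], [10.0, 10.0]))
a, b = popt
residual = np.sum((power(x, a, b) - y) ** 2)
```

The optimiser lands on a = 1, b = 2 with residual essentially zero, exactly as described.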
]]></description><pubDate>Fri, 14 Nov 2025 17:05:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=45928889</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=45928889</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45928889</guid></item><item><title><![CDATA[New comment by indeed30 in "Show HN: An easy-to-use online curve fitting tool"]]></title><description><![CDATA[
<p>I don’t think you can do anything sensible here without making much stronger modelling assumptions. A vanilla non-parametric bootstrap is only valid under a very specific generative story: IID sampling from a population. Many (most?) curve-fitting problems won't satisfy that.<p>For example, suppose you measure the decay of a radioactive source at fixed times t = 0,1,2,... and fit y = A e^{-kt}. The only randomness is small measurement error with, say, SD = 0.5. The bootstrap sees the huge spread in the y-values that comes from the deterministic decay curve itself, not from noise. It interprets that structural variation as sampling variability and you end up with absurdly wide bootstrap confidence intervals that have nothing to do with the actual uncertainty in the experiment.</p>
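<p>One way to make the failure concrete (all values made up: A = 100, k = 0.5, measurement SD = 0.5) is to resample the y-values as though they were an IID sample and watch the bootstrap standard error dwarf what the measurement noise alone implies:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(11.0)
y = 100.0 * np.exp(-0.5 * t) + rng.normal(0.0, 0.5, t.size)

# Naive bootstrap: treat the y-values as IID draws from one population
boot_means = np.array([
    rng.choice(y, size=y.size, replace=True).mean()
    for _ in range(2000)
])
boot_se = boot_means.std()

# The spread the bootstrap "sees" is dominated by the deterministic
# decay from ~100 down to ~0, not by the SD = 0.5 measurement noise
noise_se = 0.5 / np.sqrt(y.size)  # what measurement error alone implies
```

The bootstrap standard error comes out orders of magnitude larger than noise_se, purely because of the structural variation along the curve.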
]]></description><pubDate>Fri, 14 Nov 2025 13:13:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=45926404</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=45926404</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45926404</guid></item><item><title><![CDATA[New comment by indeed30 in "UK Millionaire exodus did not occur, study reveals"]]></title><description><![CDATA[
<p>As long as UK taxes are flow-based and not stock-based, it seems a bit silly to base analysis on a stock-based denominator like the number of millionaires.</p>
]]></description><pubDate>Mon, 22 Sep 2025 16:10:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=45335481</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=45335481</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45335481</guid></item><item><title><![CDATA[New comment by indeed30 in "How big are our embeddings now and why?"]]></title><description><![CDATA[
<p>I wouldn’t call the embedding layer "separate" from the LLM. It’s learned jointly with the rest of the network, and its dimensionality is one of the most fundamental architectural choices. You’re right though that, in principle, you can pick an embedding size independent of other hyperparameters like number of layers or heads, so I see where you're coming from.<p>However, the embedding dimension sets the rank of the token representation space. Each layer can transform or refine those vectors, but it can’t expand their intrinsic capacity. A tall but narrow network is bottlenecked by that width. Width-first scaling tends to outperform pure depth scaling: you want enough representational richness per token before you start stacking more layers of processing.<p>So yeah, embedding size doesn’t <i>have</i> to scale up in lockstep with model size, but in practice it usually does, because once models grow deeper and more capable, narrow embeddings quickly become the limiting factor.</p>
]]></description><pubDate>Fri, 05 Sep 2025 18:38:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=45142050</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=45142050</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45142050</guid></item><item><title><![CDATA[New comment by indeed30 in "Tetris is NP-hard even with O(1) rows or columns (2020) [pdf]"]]></title><description><![CDATA[
<p>I think it's actually (2020)</p>
]]></description><pubDate>Mon, 01 Sep 2025 16:35:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=45094193</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=45094193</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45094193</guid></item><item><title><![CDATA[New comment by indeed30 in "Reverse geocoding is hard"]]></title><description><![CDATA[
<p>Ten years ago, I worked for a company that had billions of sensor readings from mobile phones. The idea was to use crowdsourced data to create truly detailed, real-world coverage maps, and then sell that data to marketing and network operations teams at telcos.<p>We used reverse geocoding extensively — but never down to street addresses, always to a higher level. We wanted to split measurements by country, region, city — any geographic unit. When you deal with country borders, you get a lot of weird measurements as phones roam onto foreign networks. We weren’t interested in reporting on the experience of users roaming while abroad, so we needed shapefiles good enough to filter all that out and to partition the rest of the data cleanly.<p>We built a 30-machine Spark cluster on AWS back when Spark was still super early — around v0.7, definitely before 1.0. At the time, you pretty much had to use Scala with Spark if you cared about performance. Most of the workload was point-in-polygon tests. Before that, we were using a brutally hacky pipeline involving PostGIS, EMR, and Pig, and it was hell.<p>It was incredibly fun, but looking back now, I can see so clearly all the mistakes I made.</p>
]]></description><pubDate>Mon, 28 Apr 2025 14:39:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=43822020</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=43822020</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43822020</guid></item><item><title><![CDATA[Stop Word Clouds]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.stopwordclouds.com/">https://www.stopwordclouds.com/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43184932">https://news.ycombinator.com/item?id=43184932</a></p>
<p>Points: 5</p>
<p># Comments: 2</p>
]]></description><pubDate>Wed, 26 Feb 2025 16:04:09 +0000</pubDate><link>https://www.stopwordclouds.com/</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=43184932</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43184932</guid></item><item><title><![CDATA[New comment by indeed30 in "Disney+ loses 700k subscribers following price increase"]]></title><description><![CDATA[
<p>I would be pretty confident that Disney has a pricing team whose entire job is to model those effects.</p>
]]></description><pubDate>Thu, 06 Feb 2025 14:48:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=42962877</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42962877</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42962877</guid></item><item><title><![CDATA[New comment by indeed30 in "Complete hardware and software setup for running Deepseek-R1 locally. ($6000)"]]></title><description><![CDATA[
<p>So, can somebody in the know speculate about how Deepseek (or OpenAI, or whoever really) is actually running their API?<p>If I wanted to run a production-grade service using the full Deepseek model, with good tokens/sec and the ability to serve concurrent requests, what sort of hardware are we looking at?</p>
]]></description><pubDate>Wed, 29 Jan 2025 16:01:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=42866614</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42866614</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42866614</guid></item><item><title><![CDATA[New comment by indeed30 in "SQL nulls are weird"]]></title><description><![CDATA[
<p>That's interesting - I believe this is exactly how Sequelize implements soft-deletion.</p>
]]></description><pubDate>Thu, 09 Jan 2025 17:00:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=42647568</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42647568</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42647568</guid></item><item><title><![CDATA[New comment by indeed30 in "Magic/tragic email links: don't make them the only option"]]></title><description><![CDATA[
<p>To discover where else you then subsequently forwarded it.<p>I'm not suggesting this is actually a problem, but that's how an argument could go.</p>
]]></description><pubDate>Wed, 08 Jan 2025 11:48:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=42633229</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42633229</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42633229</guid></item><item><title><![CDATA[New comment by indeed30 in "GPT-5 is behind schedule"]]></title><description><![CDATA[
<p>I don't disagree with what you say, but one difference is that we generally hold these people accountable and often shift liability to them when they are wrong (though not always, admittedly), which is not something I have ever seen done with any AI system.</p>
]]></description><pubDate>Mon, 23 Dec 2024 12:57:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=42494064</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42494064</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42494064</guid></item><item><title><![CDATA[Do you know how much your computer can do in a second? (2015)]]></title><description><![CDATA[
<p>Article URL: <a href="https://computers-are-fast.github.io/">https://computers-are-fast.github.io/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42451624">https://news.ycombinator.com/item?id=42451624</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 18 Dec 2024 16:03:40 +0000</pubDate><link>https://computers-are-fast.github.io/</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42451624</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42451624</guid></item><item><title><![CDATA[New comment by indeed30 in "Popeye and Tintin enter the public domain in 2025 along with Faulkner, Hemingway"]]></title><description><![CDATA[
<p>I'd also mention that Tom McCarthy's "Tintin and The Secret of Literature" completely changed the way in which I viewed the series - genuinely exciting stuff.</p>
]]></description><pubDate>Mon, 16 Dec 2024 21:16:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=42435524</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42435524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42435524</guid></item><item><title><![CDATA[New comment by indeed30 in "Why is it so hard to find a job now? Enter Ghost Jobs"]]></title><description><![CDATA[
<p>Hang on a minute. There is absolutely nothing in this research that measures the accuracy of this approach. A user saying "I was ghosted" is not, to my mind, proof of anything.<p>Job seekers almost never actually know if the job was real or not, so it's hard to see how Glassdoor reviews can ever provide the insight this work is looking for.<p>I do believe that "ghost" jobs exist, often for H1B purposes, but I don't think this work proves it.</p>
]]></description><pubDate>Thu, 14 Nov 2024 15:33:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=42137192</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=42137192</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42137192</guid></item><item><title><![CDATA[New comment by indeed30 in "Your Name in Landsat"]]></title><description><![CDATA[
<p>Easiest way is going to be to downsample the images and then apply a pre-trained classifier that can ignore the fact these are sat images. You could probably turn them into 28x28 greyscale and then use a model trained on handwritten characters, like EMNIST.<p>Whatever approach you take, you'll probably be selecting the final set by hand, so it's just about building the candidate set in an efficient manner. Low absolute accuracy isn't really an issue as long as you end up with a manageable set to review.</p>
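<p>A rough sketch of just the downsampling step in pure NumPy (block-averaging; the 280x280 tile size and the random image are made up, and the EMNIST classifier itself is left out):

```python
import numpy as np

def downsample_to_28(img):
    """Block-average a square greyscale image down to 28x28.

    Assumes each side length is a multiple of 28; real tiles would
    need cropping or interpolation first.
    """
    h, w = img.shape
    fh, fw = h // 28, w // 28
    # Group pixels into 28x28 blocks, then average within each block
    return img[:fh * 28, :fw * 28].reshape(28, fh, 28, fw).mean(axis=(1, 3))

# Hypothetical 280x280 greyscale tile with values in [0, 1]
tile = np.random.default_rng(2).random((280, 280))
small = downsample_to_28(tile)
```

The 28x28 output can then be fed straight into any MNIST/EMNIST-shaped model.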
]]></description><pubDate>Fri, 06 Sep 2024 13:58:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=41466273</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=41466273</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41466273</guid></item><item><title><![CDATA[New comment by indeed30 in "The Denmark secret: how it became the most trusting country"]]></title><description><![CDATA[
<p>I'm not sure what the complaint is here. I don't mind saying outright that I believe ethnicity is deeply intertwined with culture and that the two concepts are inseparable. Ethnic groups are essentially defined by their shared heritage, experiences and culture. When we talk about cultural diversity, we inevitably also discuss the ethnic diversity of a population.<p>I really don't think that is a controversial position, but clearly you disagree.</p>
]]></description><pubDate>Wed, 22 May 2024 13:23:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=40440651</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=40440651</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40440651</guid></item><item><title><![CDATA[New comment by indeed30 in "The Denmark secret: how it became the most trusting country"]]></title><description><![CDATA[
<p>I'm interested in what you're comparing against. Denmark is typically towards the bottom of any measure of cultural or ethnic diversity - e.g. <a href="https://en.wikipedia.org/wiki/List_of_countries_ranked_by_ethnic_and_cultural_diversity_level" rel="nofollow">https://en.wikipedia.org/wiki/List_of_countries_ranked_by_et...</a><p>I have been to Denmark many times and it strikes me as considerably more homogeneous than the UK or the US.</p>
]]></description><pubDate>Wed, 22 May 2024 12:01:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=40439917</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=40439917</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40439917</guid></item><item><title><![CDATA[New comment by indeed30 in "Interviews in the Age of AI: Ditch Leetcode – Try Code Reviews Instead"]]></title><description><![CDATA[
<p>That's not my perception of the industry right now. I think what you said was true two years ago, but not any more.</p>
]]></description><pubDate>Tue, 17 Oct 2023 13:34:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=37914813</link><dc:creator>indeed30</dc:creator><comments>https://news.ycombinator.com/item?id=37914813</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37914813</guid></item></channel></rss>