Hacker News: Fripplebubby

New comment by Fripplebubby in "The economics of software teams: Why most engineering orgs are flying blind"

Fripplebubby — Mon, 13 Apr 2026 12:25:28 +0000

Even if you could do attribution correctly (I think you can do this partially if you are really diligent about A/B testing), that is still only one input to the equation. The other fact worth considering is the scale factor - if a team develops a widget which has some ARR value today, that same widget has a future ARR value that scales with more product adoption - no additional capital required to capture more marginal value. How do you quantify this? Because it is hard and recursive (knowing how valuable a feature will be in the future means knowing how many users you have in the future which depends on how valuable your features are as well as 100 other factors), we just factor this out and don't attempt to quantify things in dollars and euros.

New comment by Fripplebubby in "OpenAI closes funding round at an $852B valuation"

Fripplebubby — Wed, 01 Apr 2026 11:49:47 +0000

> Also, if you don't like the NASDAQ 100 rules, then you don't have to invest in securities that track it.

Isn't the idea with the indexes that they allow you to intentionally not take an activist position in the market? The exposure is not tied to any underlying market hypothesis. In other words, if we make people form a market hypothesis in order to decide whether or not to hold this index, it has failed in its purpose.

New comment by Fripplebubby in "Rust at Scale: An Added Layer of Security for WhatsApp"

Fripplebubby — Wed, 28 Jan 2026 17:07:48 +0000

I think the draft covers this well: https://www.ietf.org/archive/id/draft-knodel-e2ee-definition...

New comment by Fripplebubby in "Show HN: Build Web Automations via Demonstration"

Fripplebubby — Wed, 28 Jan 2026 17:04:02 +0000

Browser use has a project "Workflow Use" that has similar aims: https://github.com/browser-use/workflow-use

New comment by Fripplebubby in "Rust at Scale: An Added Layer of Security for WhatsApp"

Fripplebubby — Wed, 28 Jan 2026 15:31:57 +0000

This is not true. The IETF draft is explicit that E2EE means that the message cannot be read by any party other than the sender and the intended receiver. When companies like Meta claim they support E2EE, this is what they claim. There are no tricky semantics or legalese at play here.

New comment by Fripplebubby in "Implications of AI to schools"

Fripplebubby — Tue, 25 Nov 2025 16:36:56 +0000

Hiring is still a pretty non-uniform thing despite attempts to make it less so - I'm sure there are some teams and orgs at all these large companies that do it well, and some that do it less well. I think it is pretty well accepted that university brand is not a good signal, but it is an easy signal and if the folks in the hiring process are a bit lazy and pressed for time, a bit overwhelmed by the number of inbound candidates, or don't really know how to evaluate for the role competencies, I think it's a tool that is still reached for today.

In a way, I think the hiring process at second-tier (not FAANG) companies is actually better because you have to "moneyball" a little bit - you know that you're going to lose the most-credentialed people to other companies that can beat you dollar for dollar, so you actually have to think a little more deeply about what a role really needs to find the right person.

New comment by Fripplebubby in "Measuring Latency (2015)"

Fripplebubby — Fri, 21 Nov 2025 04:31:50 +0000

I take it as a given that what is stored and graphed is an information-destroying aggregate, but I think that aggregate is actually still useful + meaningful

New comment by Fripplebubby in "Measuring Latency (2015)"

Fripplebubby — Fri, 21 Nov 2025 02:28:50 +0000

> This is partly a tooling problem. Many of the tools we use do not do a good job of capturing and representing this data. For example, the majority of latency graphs produced by Grafana, such as the one below, are basically worthless. We like to look at pretty charts, and by plotting what’s convenient we get a nice colorful graph which is quite readable. Only looking at the 95th percentile is what you do when you want to hide all the bad stuff. As Gil describes, it’s a “marketing system.” Whether it’s the CTO, potential customers, or engineers—someone’s getting duped. Furthermore, averaging percentiles is mathematically absurd. To conserve space, we often keep the summaries and throw away the data, but the “average of the 95th percentile” is a meaningless statement. You cannot average percentiles, yet note the labels in most of your Grafana charts. Unfortunately, it only gets worse from here.

I think this is getting a bit carried away. I don't have any argument against the observation that the average of a p95 is not something that mathematically makes sense, but if you actually understand what it is, it is absolutely still meaningful. With time series data, there is always some time denominator, so it really means (say) "the p95 per minute averaged over the last hour", which is or can be meaningful (and useful at a glance).
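To make "the p95 per minute averaged over the last hour" concrete, here is a minimal sketch that compares it with the p95 of the whole hour. The data is made up for illustration (59 quiet minutes and one bad minute), and the nearest-rank percentile definition is an assumption; real monitoring tools may interpolate differently.

```python
from statistics import mean


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, which avoids float rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def mean_of_window_p95(windows):
    """The graphed aggregate: p95 of each window, then averaged."""
    return mean(percentile(window, 95) for window in windows)


def overall_p95(windows):
    """The p95 of every sample in the period, ignoring window boundaries."""
    return percentile([v for window in windows for v in window], 95)


# Hypothetical latencies in milliseconds: 100 samples per one-minute window.
quiet_minute = [10] * 100
bad_minute = [1000] * 100
windows = [quiet_minute] * 59 + [bad_minute]

print(mean_of_window_p95(windows))  # 26.5
print(overall_p95(windows))         # 10
```

The two numbers answer different questions. The averaged figure is not "the p95 of the hour", but it does move when a single minute goes bad, which is part of why it can still be useful at a glance as long as you read it for what it is.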

Also, the claim that "[o]nly looking at the 95th percentile is what you do when you want to hide all the bad stuff" is very context dependent. As long as you understand what it actually means, I don't see the harm in it. The author makes the point that, because loading a single webpage will result in 40 requests or so, you are much more likely to hit a p99 and so you should really care about p99 and up - more power to you, if that's contextually appropriate, then that is absolutely right, but that really only applies to a webserver serving webpage assets, which is only one kind of software that you might be writing. I think it is definitely important to know, for one given "eyeball" waiting on your service to respond, what the actual flow is - whether it's just one request, or multiple concurrent requests, or some kind of dependency graph of calls to your service all needed in sequence - but I don't really think that challenges the commonsense notion of latency, does it?
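The 40-requests argument is simple arithmetic, sketched below. It assumes each request's latency is independent of the others, which is a simplification, and the function name is just for illustration.

```python
def chance_of_slow_request(requests_per_view, percentile):
    """Probability that at least one request in a page view is slower than
    the given latency percentile, assuming independent requests."""
    fast_fraction = percentile / 100
    return 1 - fast_fraction ** requests_per_view


# One request per "eyeball": about a 1% chance of exceeding the p99.
print(round(chance_of_slow_request(1, 99), 3))

# Forty requests per page load: roughly a one-in-three chance.
print(round(chance_of_slow_request(40, 99), 3))
```

This is why the number of requests behind a single user-visible action decides which percentile is worth watching: with one request per action, the p95 or p99 describes the user's experience fairly directly; with many, the tail dominates.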

New comment by Fripplebubby in "The 'Toy Story' You Remember"

Fripplebubby — Tue, 11 Nov 2025 15:13:41 +0000

They care very deeply about this and devoted a lot of resources to (re)grading the digital versions that you see today on Disney+. The versions you see are intentional and not the result of cost cutting. (I was not directly privy to this work, but I worked on Disney+ before its launch and sat in on some tech talks and saw other internal material about the digital workflows that led to the final result on the small screen, and there was a lot of attention on this at the time.)

I think there's a discussion to be had about art, perception and devotion to the "original" or "authentic" version of something that can't be resolved completely but what I don't think is correct is the perception that this was overlooked or a mistake.

New comment by Fripplebubby in "Using the expand and contract pattern for schema changes"

Fripplebubby — Mon, 10 Nov 2025 15:18:47 +0000

I'm hearing you out, but how is this going to affect the part of this that is client behavior rather than database behavior? If there is some kind of sdk that actually captures the interface here (that is, that the client needs to be compatible with both versions of the schema at once for a while) and pushes that back to the client, that could be interesting, like a way to define that column "name" and columns "first name", "last name" are conceptually part of the same thing and that the client code paths must provide handling for both at once.
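For illustration, this is roughly what "the client must handle both versions at once" looks like in client code during the transition. It is a hand-written sketch, not the output of any SDK; the column names `name`, `first_name`, and `last_name` and the naive split on the first space are assumptions.

```python
def display_name(row):
    """Read a user's name from a row in either schema version.

    Prefers the new split columns and falls back to the old single column.
    """
    first = row.get("first_name")
    last = row.get("last_name")
    if first is not None or last is not None:
        return " ".join(part for part in (first, last) if part)
    return row.get("name", "")


def name_fields_for_write(full_name):
    """During the expand phase, write the old and new columns together."""
    # Naive split on the first space; real names need more careful handling.
    first, _, last = full_name.partition(" ")
    return {"name": full_name, "first_name": first, "last_name": last}


old_row = {"name": "Ada Lovelace"}
new_row = {"first_name": "Ada", "last_name": "Lovelace"}
print(display_name(old_row))
print(display_name(new_row))
print(name_fields_for_write("Ada Lovelace"))
```

Both code paths have to exist until the contract step removes the old column, which is the part a database-side tool cannot enforce on its own.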

New comment by Fripplebubby in "The Case Against PGVector"

Fripplebubby — Mon, 03 Nov 2025 19:49:32 +0000

I think I see this point now. I thought of YAGNI as, "don't ever over-engineer because you get it wrong a lot of the time" but really it's, "don't over-engineer out of the gate and be thankful if you get a chance to come back and do it right later". That fits my case exactly, and that's what we did (and it wasn't actually that painful to migrate).

New comment by Fripplebubby in "The Case Against PGVector"

Fripplebubby — Mon, 03 Nov 2025 18:19:41 +0000

I think the tricky thing here is that the specific things I referred to (real time writes and pushing SQL predicates into your similarity search) work fine at small scale in such a way that you might not actually notice that they're going to stop working at scale. When you have 100,000 vectors, you can write these SQL predicates (return the 5 top hits where category = x and feature = y) and they'll work fine up until one day they don't work fine anymore because the vector space has gotten large. So, I suppose it is fair to say this isn't YAGNI backfiring, this is me not recognizing the shape of the problem to come and not recognizing that I do, in fact, need it (to me that feels a lot like YAGNI backfiring, because I didn't think I needed it, but suddenly I do).
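One way a filtered query like the one described above can work at small scale and then quietly stop working is sketched here in plain Python. It assumes the index hands back a fixed budget of nearest candidates and the predicate is applied afterwards; the 1-D embeddings, the budget of 40, and all function names are hypothetical stand-ins, not pgvector's actual code.

```python
def nearest(rows, query, limit):
    """Stand-in for an index scan: the `limit` rows closest to the query."""
    return sorted(rows, key=lambda row: abs(row["embedding"] - query))[:limit]


def post_filtered_search(rows, query, category, k, candidates):
    """Fetch a fixed number of nearest candidates, then apply the predicate."""
    hits = nearest(rows, query, candidates)
    return [row for row in hits if row["category"] == category][:k]


def pre_filtered_search(rows, query, category, k):
    """Apply the predicate first, then rank only the matching rows."""
    matching = [row for row in rows if row["category"] == category]
    return nearest(matching, query, k)


def make_table(other_count):
    """Five rows in category 'x' plus `other_count` rows in category 'other'."""
    rows = [{"embedding": 100.0 * n, "category": "x"} for n in range(1, 6)]
    rows += [{"embedding": n + 0.5, "category": "other"} for n in range(other_count)]
    return rows


small = make_table(20)
large = make_table(10_000)

# Small table: the candidate budget covers every row, so the filter finds all 5.
print(len(post_filtered_search(small, 0.0, "x", k=5, candidates=40)))  # 5

# Large table: the 40 nearest rows are all 'other', so nothing survives.
print(len(post_filtered_search(large, 0.0, "x", k=5, candidates=40)))  # 0

# Filtering before ranking still returns 5, at the cost of scanning every row.
print(len(pre_filtered_search(large, 0.0, "x", k=5)))                  # 5
```

Nothing in the query changes between the two tables; only the density of non-matching neighbors does, which is why the failure is easy to miss until the data has grown.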

New comment by Fripplebubby in "The Case Against PGVector"

Fripplebubby — Mon, 03 Nov 2025 14:47:39 +0000

The post is a clear example of when YAGNI backfires, because you think YAGNI but then you actually do need it. I had this experience, the author had this experience, and you might as well - the things you think you AGN are actually pretty basic expectations and not luxuries: being able to write vectors real-time without having to run other processes out of band to keep the recall from degrading over time, being able to write a query that uses normal SQL filter predicates and similarity in one go for retrieval. These things matter and you won't notice that they actually don't work at scale until later on!

New comment by Fripplebubby in "Baseball durations after the pitch clock"

Fripplebubby — Sun, 05 Oct 2025 04:18:39 +0000

One of the interesting experiences I have being a member both of this community and the baseball analytics community is seeing posts like this, where apparently the author thinks that they're the only one who had the idea to look at this, shared widely within the hacker community because it comes from one of their own. Rest assured, within the _baseball_ community, this has been discussed and analyzed to death - it just doesn't get posted here because nobody mentions using unix tools to do it, because it isn't really relevant.

See for example:

https://blogs.fangraphs.com/how-have-the-new-rules-changed-t...

https://www.baseball-reference.com/friv/rules-changes-stats....

And many others; these are two early and relatively canonical ones. If folks reading this post are interested enough in baseball, please, come join us in the baseball analytics community where this is merely the very tippy top of the iceberg of interesting things.

New comment by Fripplebubby in "Deep researcher with test-time diffusion"

Fripplebubby — Wed, 24 Sep 2025 18:59:50 +0000

The way I read the paper, "diffusion" was more of a metaphor - you start with the output of the LLM as the overview (very much _not_ random noise), and then refine it over many steps. However, seeing this, I wonder myself whether or not in-house they actually mean it more literally or have actually tried using it more literally.

New comment by Fripplebubby in "Interstellar Comet 3I/Atlas: What We Know Now"

Fripplebubby — Mon, 28 Jul 2025 20:40:34 +0000

One interesting thing I learned from this was how they estimated the probable size of this comet statistically rather than through direct observation. Based on the observations, it could either be really big (10km) or really small (0.5km), and we can basically rule out really big: we've been looking for comets for years, and seeing one that big implies that we _should have seen_ thousands of much smaller ones over that period, because the sizes of space objects follow a power law (they're always whacking into each other and breaking up). Since we've only seen one small interstellar object during that time rather than thousands, a large comet is so unlikely that we can conclude that it is 0.5km in size. I'm sure at some point this will be confirmed in a more conventional way as well.

New comment by Fripplebubby in "Ask HN: What are you working on? (July 2025)"

Fripplebubby — Mon, 28 Jul 2025 01:06:12 +0000

Have you heard of https://www.autobiographer.com/ ? Is it similar, different?

New comment by Fripplebubby in "I am a SOTA 0-shot classifier of your slop"

Fripplebubby — Sat, 26 Jul 2025 15:20:28 +0000

Maybe I worded this more harshly than I meant. I value anybody who tries to communicate with me and I don't mean to try to discourage people from having ideas or communicating them to me, but - writing is thinking, the act of trying to actually use your language and your reasoning will improve your idea. How many times have I thought I had a good idea, then in the process of writing it out, I realize its flaws (many times)? If you pass this process off to an LLM, you skip a key step, and you leave it to me the receiver to do this work for you.

New comment by Fripplebubby in "I am a SOTA 0-shot classifier of your slop"

Fripplebubby — Sat, 26 Jul 2025 15:08:36 +0000

I think this is a great post. It will ruffle some feathers and people will feel attacked, but I think the core idea is exactly right: if we are communicating and the goal is the exchange of information, use your incredible language faculty to communicate with me. To do otherwise is a disrespect to me and it indicates that you value the act of showing me your brilliant idea (in your estimation) more than you value taking the effort to actually communicate the idea to me. You are essentially an "ideas guy". I know that an LLM is a yes-man, and that a yes-man and an "ideas guy" is a combination that produces confident mistakes. If you can't be bothered to communicate your idea, or the essence of your idea, in your own words, please keep it to yourself until you've put in that effort.

New comment by Fripplebubby in "Up to date prices for LLM APIs all in one place"

Fripplebubby — Fri, 25 Jul 2025 15:49:00 +0000

Maybe I am blinded by my own use case, but I find the caching pricing and strategy (since different providers implement caching differently and price it differently) to be a major factor rather than just the "raw" per token cost, and that is missing here, as well as on the Simon Willison site [1]. Do most people just not care, or not use caching enough for it to matter?

[1] https://llm-prices.com/
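
To show why cached-token pricing can matter more than the headline per-token rate, here is a rough sketch of the arithmetic. The prices and token counts are hypothetical, not any provider's real rates, and the model ignores any cache-write surcharge or expiry rules a provider may have.

```python
def request_cost(input_tokens, cached_tokens, output_tokens, prices):
    """Dollar cost of one request; prices are dollars per million tokens."""
    uncached_tokens = input_tokens - cached_tokens
    total = (
        uncached_tokens * prices["input"]
        + cached_tokens * prices["cached_input"]
        + output_tokens * prices["output"]
    )
    return total / 1_000_000


# Hypothetical price sheet (dollars per million tokens).
prices = {"input": 3.0, "cached_input": 0.3, "output": 15.0}

# A long, mostly repeated prompt: 100k input tokens, 1k output tokens.
without_cache = request_cost(100_000, 0, 1_000, prices)
with_cache = request_cost(100_000, 90_000, 1_000, prices)

print(f"no cache hits:    ${without_cache:.3f}")
print(f"90% of prompt hit: ${with_cache:.3f}")
```

With these made-up numbers the same request costs about $0.32 without cache hits and about $0.07 with them, so a comparison based only on the raw input and output rates would misjudge a workload like this one.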