<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: molf</title><link>https://news.ycombinator.com/user?id=molf</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 02:11:44 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=molf" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by molf in "No strcpy either"]]></title><description><![CDATA[
<p>That's very interesting! It links to:<p><a href="https://daniel.haxx.se/blog/2025/10/10/a-new-breed-of-analyzers/" rel="nofollow">https://daniel.haxx.se/blog/2025/10/10/a-new-breed-of-analyz...</a><p>and its HN discussion:<p><a href="https://news.ycombinator.com/item?id=45449348">https://news.ycombinator.com/item?id=45449348</a></p>
]]></description><pubDate>Tue, 30 Dec 2025 14:49:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46433828</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=46433828</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46433828</guid></item><item><title><![CDATA[New comment by molf in "Go ahead, self-host Postgres"]]></title><description><![CDATA[
<p>> Every company out there is using the cloud and yet still employs infrastructure engineers<p>Every company <i>beyond a particular size</i>, surely? For many small and medium-sized companies, hiring an infrastructure team makes just as little sense as hiring kitchen staff to make lunch.</p>
]]></description><pubDate>Sat, 20 Dec 2025 16:50:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337466</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=46337466</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337466</guid></item><item><title><![CDATA[New comment by molf in "Go ahead, self-host Postgres"]]></title><description><![CDATA[
<p>Interesting. Is this an issue with RDS?<p>I use Google Cloud SQL for PostgreSQL and it's been rock solid. No issues; troubleshooting works fine; all extensions we need already installed; can adjust settings where needed.</p>
]]></description><pubDate>Sat, 20 Dec 2025 16:43:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337426</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=46337426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337426</guid></item><item><title><![CDATA[New comment by molf in "Go ahead, self-host Postgres"]]></title><description><![CDATA[
<p>> I'd argue self-hosting is the right choice for basically everyone, with the few exceptions at both ends of the extreme:<p>> If you're just starting out in software & want to get something working quickly with vibe coding, it's easier to treat Postgres as just another remote API that you can call from your single deployed app<p>> If you're a really big company and are reaching the scale where you need trained database engineers to just work on your stack, you might get economies of scale by just outsourcing that work to a cloud company that has guaranteed talent in that area. The second full freight salaries come into play, outsourcing looks a bit cheaper.<p>This is funny. I'd argue the exact opposite. I would self-host only:<p>* if I were on a tight budget, where trading an hour or two of my time for a cost saving of a hundred dollars or so is a good deal; or<p>* at a company that has reached the scale where employing engineers to manage self-hosted databases is more cost-effective than outsourcing.<p>I have nothing against self-hosting PostgreSQL. Do whatever you prefer. But to me outsourcing this to cloud providers seems entirely reasonable for small and medium-sized businesses. According to the author's article, self-hosting costs you between 30 and 120 minutes per month (after setup, and if you already know what to do). It's easy to do the math...</p>
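<p>The math in question, using the numbers already mentioned (30–120 minutes per month, a saving of roughly a hundred dollars) plus an assumed engineer rate; the $100/hour figure is illustrative, not from the article:</p>

```python
# Back-of-envelope cost comparison for self-hosting Postgres.
# The time range comes from the linked article; the hourly rate and
# managed-service premium are illustrative assumptions.
hourly_rate = 100              # $/hour, assumed fully loaded engineer cost
self_host_hours = (0.5, 2.0)   # 30-120 minutes of upkeep per month
managed_premium = 100          # assumed $/month saved by self-hosting

# Monthly cost of the engineer time spent on self-hosting.
time_cost = tuple(h * hourly_rate for h in self_host_hours)
```

<p>At these assumed numbers the time cost ($50–$200/month) brackets the saving (~$100/month), which is the point: it only clearly pays off on a tight budget or at a scale where the per-database overhead amortizes.</p>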
]]></description><pubDate>Sat, 20 Dec 2025 16:31:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46337330</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=46337330</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46337330</guid></item><item><title><![CDATA[New comment by molf in "Show HN: OCR Arena – A playground for OCR models"]]></title><description><![CDATA[
<p>What is needed to evaluate OCR for most business applications (above everything else) is accuracy.<p>Some results look plausible but are just plain wrong. That is worse than useless.<p>Example: the "Table" sample document contains chemical substances and their properties. How many numbers did the LLM output and associate correctly? That is all that matters. There is no "preference" aspect that is relevant until the data is correct. Nicely formatted incorrect data is still incorrect.<p>I reviewed the output from Qwen3-VL-8B on this document. It mixes up the rows, resulting in many values associated with the wrong substance. I presume using its output for any real purpose would be incredibly dangerous. This model should not be used for such a purpose. There is no winning aspect to it. Does another model produce worse results? Then both models should be avoided at all costs.<p>Are there models available that are accurate enough for this purpose? I don't know. It is very time-consuming to evaluate. This particular table seems pretty legible. A real production-grade OCR solution would probably need a 100% score on this example before it can be adopted. The output of such a table is not something humans are good at reviewing. It is difficult to spot errors. It either needs to be entirely correct, or the OCR has failed completely.<p>I am confident we'll reach a point where a mix of traditional OCR and LLM models can produce correct and usable output. I would welcome a benchmark where (objective) correctness is rated separately from the (subjective) output structure.<p>Edit: Just checked a few other models for errors on this example.<p>* GPT 5.1 is confused by the column labelled "C4" and mismatches the last 4 columns entirely. And almost all of the numbers in the last column are wrong.<p>* olmOCR 2 omits the single value in column "C4" from the table.<p>* Gemini 3 produces "1.001E-04" instead of "1.001E-11" as viscosity at T_max for Argon.
Off by 7 orders of magnitude! There is zero ambiguity in the original table. On the second try it got it right. Which is interesting! I want to see this in a benchmark!<p>There might be more errors! I don't know, I'd like to see them!</p>
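<p>The kind of objective scoring argued for here can start as simple as exact cell-by-cell comparison against ground truth; a minimal sketch (the function name and sample data are illustrative, not from the benchmark):</p>

```python
def table_cell_accuracy(predicted, truth):
    """Fraction of ground-truth cells matched exactly, position by position.

    Rows and cells are compared by index, so a model that shifts or mixes
    up rows is penalised for every misaligned value.
    """
    total = correct = 0
    for p_row, t_row in zip(predicted, truth):
        for p_cell, t_cell in zip(p_row, t_row):
            total += 1
            correct += (p_cell.strip() == t_cell.strip())
    return correct / total if total else 0.0

# The Argon example from above: one wrong exponent halves the score.
truth = [["Argon", "1.001E-11"]]
predicted = [["Argon", "1.001E-04"]]
```

<p>A score below 1.0 on a legible table would then be an outright failure, separate from any preference ranking of the output formatting.</p>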
]]></description><pubDate>Tue, 25 Nov 2025 11:01:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=46044660</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=46044660</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46044660</guid></item><item><title><![CDATA[New comment by molf in "Friendly attributes pattern in Ruby"]]></title><description><![CDATA[
<p>This is a philosophy, one that many people who write Ruby subscribe to. The fundamental idea is: create a DSL that makes it very easy to implement your application. It is what made Rails different when it was created: it is a DSL that makes expressing web applications easy.<p>I don't know its history well enough, but it seems to originate from Lisp. PG wrote about it before [1].<p>It can result in code that is extremely easy to read and reason about. It can also be incredibly messy. I have seen lots of examples of both over the years.<p>It is the polar opposite of Go's philosophy (be explicit & favour predictability across all codebases over expressiveness).<p>[1]: <a href="https://paulgraham.com/progbot.html" rel="nofollow">https://paulgraham.com/progbot.html</a></p>
]]></description><pubDate>Sat, 08 Nov 2025 10:54:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45855795</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=45855795</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45855795</guid></item><item><title><![CDATA[New comment by molf in "Exploring PostgreSQL 18's new UUIDv7 support"]]></title><description><![CDATA[
<p>Good question. There are a few reasons to pick UUID over serial keys:<p>- Serial keys leak information about the total number of records and the rate at which records are added. Users/attackers may be able to guess how many records you have in your system (counting the number of users/customers/invoices/etc). This is a subtle issue that needs consideration on a case-by-case basis. It can be harmless or disastrous depending on your application.<p>- Serial keys are required to be created by the database. UUIDs can be created anywhere (including your backend or frontend application), which can sometimes simplify logic.<p>- Because UUIDs can be generated anywhere, sharding is easier.<p>The obvious downside to UUIDs is that they are slightly slower than serial keys. UUIDv7 improves insert performance at the cost of leaking creation time.<p>I've found that the data leaked by serial keys is problematic often enough, whereas UUIDs (v4) are almost always fast enough. And migrating a table to UUIDv7 is relatively straightforward if needed.</p>
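<p>As a sketch of the "generated anywhere" point: a minimal UUIDv7-style generator, following the RFC 9562 layout (48-bit millisecond Unix timestamp, then random bits). Note this <code>uuid7</code> is hand-rolled for illustration; the Python stdlib only gained a built-in one in 3.14.</p>

```python
import os
import time
import uuid


def uuid7():
    # 48-bit Unix timestamp in milliseconds, then random bits,
    # with the version (7) and RFC variant bits set per RFC 9562.
    ts_ms = time.time_ns() // 1_000_000
    b = bytearray(ts_ms.to_bytes(6, "big") + os.urandom(10))
    b[6] = (b[6] & 0x0F) | 0x70  # version 7
    b[8] = (b[8] & 0x3F) | 0x80  # RFC 4122/9562 variant
    return uuid.UUID(bytes=bytes(b))
```

<p>Because the timestamp leads, values generated in any tier of the application still sort roughly by creation time, which is what gives UUIDv7 its B-tree insert locality.</p>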
]]></description><pubDate>Fri, 17 Oct 2025 21:50:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=45622515</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=45622515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45622515</guid></item><item><title><![CDATA[New comment by molf in "Exploring PostgreSQL 18's new UUIDv7 support"]]></title><description><![CDATA[
<p>There is a big difference though. Serial keys allow attackers to estimate the rate at which data is being added.<p>UUIDv7 allows anyone to learn a record's creation time, but not (even approximately) how many records were created in a particular time frame. It leaks data about the record itself, but not about other records.</p>
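<p>Concretely, recovering that creation time is trivial: the first 48 bits of a UUIDv7 are a big-endian Unix timestamp in milliseconds. A sketch (<code>uuid7_created_at</code> is an illustrative helper, not a library function):</p>

```python
import datetime
import uuid


def uuid7_created_at(u: uuid.UUID) -> datetime.datetime:
    # UUIDv7 leads with a 48-bit big-endian Unix timestamp in milliseconds.
    ms = int.from_bytes(u.bytes[:6], "big")
    return datetime.datetime.fromtimestamp(ms / 1000, tz=datetime.timezone.utc)
```

<p>So anyone holding a single ID learns when that one row was created, but nothing about how many other rows exist.</p>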
]]></description><pubDate>Fri, 17 Oct 2025 21:31:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=45622329</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=45622329</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45622329</guid></item><item><title><![CDATA[New comment by molf in "Mistral raises 1.7B€, partners with ASML"]]></title><description><![CDATA[
<p>ASML CEO: Mistral investment not aimed at strategic autonomy for Europe<p>"In the long run, all AI models will be similar. It's about how you use the models in a well-protected environment. We will never allow our data and that of our customers to leave ASML. So a partner must be willing to work with us and adapt its model to our needs. Not only did Mistral want to do that, it is also their business model."<p><a href="https://fd.nl/bedrijfsleven/1569378/asml-ceo-strategische-autonomie-van-europa-is-niet-doel-investering-mistral" rel="nofollow">https://fd.nl/bedrijfsleven/1569378/asml-ceo-strategische-au...</a><p>--<p>Full article translated:<p>“A good reason to collaborate.” That's how ASML's CEO described his company's remarkable €1.3 billion investment in French AI company Mistral on Wednesday. Since the investment was leaked by Reuters on Sunday, there has been much speculation about ASML's reasons for investing in the European challenger to giants such as OpenAI and Anthropic. Analysts and commentators pointed to the geopolitical implications or the strong French link between the companies. But according to ASML CEO Christophe Fouquet, the reason was purely business. “Sovereignty has never been the goal.”<p>Mistral AI is a start-up founded in 2023 that specializes in building large language models. The French CEO of ASML and Mistral CEO Arthur Mensch met at an AI summit in Paris earlier this year and decided to work together to use Mistral's models to further improve ASML's chip machines.<p>Surprising investment<p>Each ASML machine generates approximately 1 terabyte of data per day. “Our machines are very complex,” Fouquet explains in an interview with the FD. "We have highly advanced control systems on our machines to enable them to operate very quickly and with great accuracy. The amount of data our machines generate gives us the opportunity to use AI. 
With the current software and machine learning models, we are limited in what we can do with the data and how quickly we can adjust the machine," says the CEO. "AI is the next step in making better use of all that data."<p>ASML has invested in other companies in the past, such as German lens manufacturer Zeiss and Eindhoven-based photonics company Smart Photonics, but those were either suppliers or potential customers. Mistral is neither.<p>Running AI models in-house<p>According to the ASML CEO, the Dutch company's investment in Mistral stems from the conviction that both companies can create value together. If Mistral becomes more valuable as a result of the collaboration, ASML can benefit from that.<p>ASML is the main investor in a new €1.7 billion financing round for Mistral. This makes Mistral an important AI player in Europe, but small compared to its American rivals. OpenAI raised $40 billion in its latest round alone. Anthropic, the company behind the Claude program, which is popular among programmers, just closed a $13 billion round.<p>“European sovereignty was not the goal”<p>According to Fouquet, the reason for the collaboration lies primarily in the way Mistral develops its AI models. “In the long run, all AI models will be similar. It's about how you use the models in a well-protected environment,” says Fouquet. “We will never allow our data and that of our customers to leave ASML. So a partner must be willing to work with us and adapt its model to our needs. Not only did Mistral want to do that, it is also their business model.”<p>According to Fouquet, the collaboration is not motivated by a desire for greater European sovereignty. “That was not the goal. But if it contributes to that, we are happy,” says Fouquet.<p>ASML supports EU initiatives to strengthen the chip sector in Europe, but always maintains a politically neutral stance in the geopolitical struggle between the United States, China, and the European Union.
This is understandable, as the company has major customers in all regions, such as TSMC in Taiwan, SK Hynix in South Korea, SMIC in China, and Intel in the US.<p>“Two birds with one stone”<p>Although ASML itself does not play the European card, some analysts and politicians do see such a motive for the collaboration with Mistral. “Thousands of large companies worldwide make extensive use of AI in their product development by using the services of OpenAI, Meta, Microsoft, Google, Mistral, without investing in these companies,” writes investment bank Jefferies in a commentary. “We also do not believe that ASML needed an investment in an AI company to benefit from AI models in its lithography products. In our view, the investment stems primarily from geopolitical motives to support and develop a European AI company and ecosystem,” the bank states.<p>Wouter Huygen, CEO of AI consultancy Rewire, also sees a clear link to European sovereignty. “ASML is known for taking internal technology development very far. It is therefore quite understandable that ASML is taking this step: access to and influence on the development of a strategic technology. Plus European sovereignty. That's two birds with one stone.”</p>
]]></description><pubDate>Tue, 09 Sep 2025 11:38:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=45180569</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=45180569</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45180569</guid></item><item><title><![CDATA[New comment by molf in "Mistral raises 1.7B€, partners with ASML"]]></title><description><![CDATA[
<p>Not only are the CEO + COO French; they also recently hired Le Maire, the French ex-minister of Finance, as a strategic advisor. ASML has also been rumoured to exit the Netherlands and relocate to France.<p>It is definitely a political move.</p>
]]></description><pubDate>Tue, 09 Sep 2025 08:24:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=45179120</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=45179120</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45179120</guid></item><item><title><![CDATA[New comment by molf in "Google can keep its Chrome browser but will be barred from exclusive contracts"]]></title><description><![CDATA[
<p>If there were a culture of always including the original source, or journalists massively advocating to include the original source, then surely the CMS would cater to it. I think it's safe to draw the conclusion that most journalists don't care about it.</p>
]]></description><pubDate>Wed, 03 Sep 2025 10:13:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=45114076</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=45114076</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45114076</guid></item><item><title><![CDATA[New comment by molf in "Do Things That Don't Scale (2013)"]]></title><description><![CDATA[
<p>I feel this is an insanely distorted take.<p>How about extreme and utter irrelevance (such as after building a thing nobody wants)?<p>Or how about this, arguably the most common: slightly successful; nobody hates it but nobody loves it either. Something people feel mildly positive about, but there is zero “hype” and also no “moat” and nobody cares enough to hate it.</p>
]]></description><pubDate>Fri, 15 Aug 2025 23:10:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44918324</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44918324</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44918324</guid></item><item><title><![CDATA[New comment by molf in "The bitter lesson is coming for tokenization"]]></title><description><![CDATA[
<p>The key insight is that you can represent different features by vectors that aren't exactly perpendicular, just nearly perpendicular (for example between 85 and 95 degrees apart). If you tolerate such noise, then the number of vectors you can fit grows exponentially relative to the number of dimensions [1].<p>12288 dimensions (GPT-3 size) can fit more than 40 billion nearly perpendicular vectors.<p>[1]: <a href="https://www.3blue1brown.com/lessons/mlp#superposition" rel="nofollow">https://www.3blue1brown.com/lessons/mlp#superposition</a></p>
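<p>That random high-dimensional vectors are already nearly perpendicular is easy to check empirically; a quick stdlib sketch at the GPT-3 width mentioned above:</p>

```python
import math
import random


def angle_deg(u, v):
    # Angle between two vectors, in degrees.
    dot = sum(a * b for a, b in zip(u, v))
    norm = lambda w: math.sqrt(sum(x * x for x in w))
    return math.degrees(math.acos(dot / (norm(u) * norm(v))))


random.seed(0)
dim = 12288  # GPT-3 embedding width
angles = [
    angle_deg([random.gauss(0, 1) for _ in range(dim)],
              [random.gauss(0, 1) for _ in range(dim)])
    for _ in range(5)
]
# Every random pair lands within a degree or two of 90.
```

<p>The spread around 90° shrinks like 1/√d, so at d = 12288 random pairs sit comfortably inside the 85–95 degree tolerance.</p>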
]]></description><pubDate>Tue, 24 Jun 2025 16:22:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44367849</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44367849</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44367849</guid></item><item><title><![CDATA[New comment by molf in "“Don’t mock what you don't own” in 5 minutes (2022)"]]></title><description><![CDATA[
<p>But ultimately it suggests this test, which only tests an empty loop?<p><pre><code>  from unittest.mock import Mock

  def test_empty_drc():
      drc = Mock(
          spec_set=DockerRegistryClient,
          get_repos=lambda: []
      )

      assert {} == get_repos_w_tags_drc(drc)
</code></pre>
Maybe it's just a poor example to make the point. I personally think it's the wrong point to make. I would argue: don't mock anything _at all_ – unless you absolutely have to. And if you have to mock, by all means mock code you don't own, as far _down_ the stack as possible. And only mock your own code if it significantly reduces the amount of test code you have to write and maintain.<p>I would not write the test from the article in the way presented. I would capture the actual HTTP responses and replay those in my tests. It is a completely different approach.</p>
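<p>A minimal sketch of that capture-and-replay approach (the path and payload below are made up for illustration, not from the article): record real response bodies once, then stub the transport, the lowest layer you don't own, with the recordings.</p>

```python
import json
from unittest.mock import Mock

# Bodies captured once against the real service and checked into the repo.
CAPTURED = {
    "/v2/_catalog": '{"repositories": []}',
}


def replaying_get(path):
    # Stands in for the HTTP client's get(): returns a response-like
    # object whose .json() replays the recorded body for that path.
    resp = Mock()
    resp.json = lambda body=CAPTURED[path]: json.loads(body)
    return resp
```

<p>Everything above the transport then runs for real in the test, so a change in how the code parses responses is still exercised, unlike a mock of your own client class.</p>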
]]></description><pubDate>Wed, 18 Jun 2025 10:18:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=44308467</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44308467</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44308467</guid></item><item><title><![CDATA[New comment by molf in "“Don’t mock what you don't own” in 5 minutes (2022)"]]></title><description><![CDATA[
<p>I’m not sure this is good advice. I prefer to test as much of the stack as possible. The most common mistake I see these days is people testing too much in isolation, which leads to a false sense of safety.<p>If you care about being alerted when your dependencies break, writing only the kind of tests described in the article is risky. You’ve removed those dependencies from your test suite. If a minor library update changes `.json()` to `.parse(format="json")`, and you assumed they followed semver but they didn’t, you’ll find out after deployment.<p>Ah, but you use static typing? Great! That’ll catch some API changes. But if you discover an API changed without warning (because you thought nobody would ever do that) you’re on your own again. I suggest using a nice HTTP recording/replay library for your tests so you can adapt easily (without making live HTTP calls in your tests, which would be way too flaky, even if feasible).<p>I stopped worrying long ago about what is or isn’t “real” unit testing. I test as much of the software stack as I can. If a test covers too many abstraction layers at once, I split it into lower- and higher-level cases. These days, I prefer fewer “poorly” factored tests that cover many real layers of the code over countless razor-thin unit tests that only check whether a loop was implemented correctly, while risking that the whole system doesn’t work together. Because by the time you get to write your system/integration/whatever tests, you’re already exhausted from writing and refactoring all those near-pointless micro-tests.</p>
]]></description><pubDate>Wed, 18 Jun 2025 09:30:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=44308197</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44308197</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44308197</guid></item><item><title><![CDATA[New comment by molf in "Selfish reasons for building accessible UIs"]]></title><description><![CDATA[
<p>Not just in dev tools; that mess is also in your source code...</p>
]]></description><pubDate>Tue, 17 Jun 2025 13:20:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=44298869</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44298869</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44298869</guid></item><item><title><![CDATA[New comment by molf in "In case of emergency, break glass"]]></title><description><![CDATA[
<p>The points about visual hierarchy are spot on, in particular on macOS. I think Apple has two realistic paths forward to resolve this mess:<p>1. Double down on the aesthetic and gradually redesign apps to improve the hierarchy. That would mean adapting UX across countless apps to serve the new look.<p>2. Tone down the glass effects and shadows drastically. Preserve existing app layouts without compromising usability as much. We'll be left with shimmering buttons and panels, and a bit more blurred transparency than in the current design language.<p>My guess is they will end up choosing option 2, simply because it’s cheaper.</p>
]]></description><pubDate>Thu, 12 Jun 2025 08:48:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44255425</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44255425</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44255425</guid></item><item><title><![CDATA[New comment by molf in "How we’re responding to The NYT’s data demands in order to protect user privacy"]]></title><description><![CDATA[
<p>It would help tremendously if OpenAI made it possible to apply for zero data retention (ZDR). For many business needs there is no reason to store or log any request at all.<p>In theory it is possible to apply (it's mentioned in multiple places in the documentation), but in practice requests are just being ignored. I get that approval needs to be given, and that there are barriers to entry. But it seems to me they mention zero data retention only for marketing purposes.<p>We have applied multiple times and have yet to receive ANY response. Reading through the forums, this seems very common.</p>
]]></description><pubDate>Fri, 06 Jun 2025 10:36:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=44199509</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44199509</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44199509</guid></item><item><title><![CDATA[New comment by molf in "A maths proof that is only true in Japan"]]></title><description><![CDATA[
<p>Totally get your point, but math is still a human creation. The symbols, language, and frameworks we use are cultural, and disagreement over proofs like this one shows math depends on shared understanding, not just objective truth.</p>
]]></description><pubDate>Fri, 06 Jun 2025 08:12:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44198775</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44198775</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44198775</guid></item><item><title><![CDATA[New comment by molf in "A South Korean grand master on the art of the perfect soy sauce"]]></title><description><![CDATA[
<p>This is the first time I've heard of keeping soy sauce in the fridge. Is this common?</p>
]]></description><pubDate>Thu, 22 May 2025 21:31:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44067305</link><dc:creator>molf</dc:creator><comments>https://news.ycombinator.com/item?id=44067305</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44067305</guid></item></channel></rss>