Hacker News: mweidner

New comment by mweidner in "When AI Builds Itself: Our progress toward recursive self-improvement"

mweidner — Fri, 05 Jun 2026 01:53:01 +0000

Indeed, I do not buy this argument. Would China's progress be close to where it is today without the US labs' examples? Would any of this be happening if OpenAI had not created ChatGPT?

New comment by mweidner in "When AI Builds Itself: Our progress toward recursive self-improvement"

mweidner — Thu, 04 Jun 2026 21:04:45 +0000

The folks I met who were talking about AI Safety in 2018 were certainly sincere, and the two people I knew who later joined Anthropic seem like the type to do it for the greater good instead of money.

I expect that Anthropic will eventually behave as you describe, like any other public corporation. However, my impression is that its current leaders are still more sincere than greedy.

New comment by mweidner in "When AI Builds Itself: Our progress toward recursive self-improvement"

mweidner — Thu, 04 Jun 2026 20:19:32 +0000

Is the idea to keep the world in balance via MAD? I could see that, though it's a dangerous gamble.

From Richard Rhodes's "The Making of the Atomic Bomb", I got the impression that most scientists involved thought they could manage a US or UN monopoly on nukes after the war. General Groves attempted to buy up all of the world's uranium ore. Unfortunately, it is only high-grade ore that is rare; many countries have low-grade ore.

New comment by mweidner in "When AI Builds Itself: Our progress toward recursive self-improvement"

mweidner — Thu, 04 Jun 2026 20:09:06 +0000

I fail to see how pursuing recursive self-improvement at full speed is compatible with Anthropic's stated goal of AI Safety. If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?

I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.

New comment by mweidner in "Show HN: Freenet, a peer-to-peer platform for decentralized apps"

mweidner — Thu, 21 May 2026 21:29:21 +0000

For values that don't have a natural merge function (or where you don't want to bother writing one), would it make sense to sync update logs instead? That is:

- The synced value is a history of client updates, sorted in some eventually consistent order (e.g. by hybrid logical clocks). Merging takes the union of the update sets.

- The user-visible value is the result of processing these updates in order, using arbitrary contract code.

This is overkill for simple last-writer-wins values, but it lets you support fairly general data types & arbitrary update functions, including ones that preserve application-specific invariants.
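
A minimal sketch of the log-syncing idea above, not Freenet's actual contract API. The `Update` fields, the `(wall_ms, counter, client_id)` timestamp standing in for a hybrid logical clock, and the bank-balance `apply_bank` function are all hypothetical names chosen for illustration; timestamps are assumed to be unique per update.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, FrozenSet


@dataclass(frozen=True, order=True)
class Update:
    # (wall_ms, counter, client_id) stands in for a hybrid logical clock
    # timestamp; it is assumed unique, so it also identifies the update.
    wall_ms: int
    counter: int
    client_id: str
    payload: Any = field(compare=False)


def merge(a: FrozenSet[Update], b: FrozenSet[Update]) -> FrozenSet[Update]:
    """The synced value is a set of updates; merging is set union."""
    return a | b


def view(log: FrozenSet[Update], apply: Callable[[Any, Any], Any], initial: Any) -> Any:
    """The user-visible value: replay all updates in timestamp order."""
    state = initial
    for update in sorted(log):
        state = apply(state, update.payload)
    return state


def apply_bank(balance: int, payload: tuple) -> int:
    """Example update function that preserves the invariant balance >= 0."""
    kind, amount = payload
    if kind == "deposit":
        return balance + amount
    if kind == "withdraw" and balance >= amount:
        return balance - amount
    return balance  # reject a withdrawal that would overdraw


replica_a = frozenset({Update(1, 0, "A", ("deposit", 50))})
replica_b = frozenset({
    Update(2, 0, "B", ("withdraw", 80)),
    Update(3, 0, "A", ("withdraw", 30)),
})
balance_ab = view(merge(replica_a, replica_b), apply_bank, 0)
balance_ba = view(merge(replica_b, replica_a), apply_bank, 0)
```

Because union is commutative, associative, and idempotent, replicas converge regardless of merge order; the update function only needs to be deterministic.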

The Automerge CRDT library works like this already [1][2], but it only allows specific updates to JSON data. Sharing code via your contracts solves the hard part of generalizing that to arbitrary data & updates.

[1] https://automerge.org/

[2] https://arxiv.org/abs/1805.04263

New comment by mweidner in "My Favorite Bugs: Invalid Surrogate Pairs"

mweidner — Mon, 18 May 2026 19:38:27 +0000

A CRDT that operates on code units should work out okay, because each grapheme cluster will always be inserted and deleted in a single edit - hence it should stick together in the text. (Some CRDTs actually can mess this up by interleaving concurrently inserted code units, but Yjs avoids doing so.)

From the fix PR, I believe the issue in this case was with the insertion operations passed to the CRDT, not the CRDT itself. Specifically, Yjs's ProseMirror integration infers what text was inserted by diffing before and after states, instead of directly capturing user inputs (even though those are provided by ProseMirror transactions). The diff algorithm, lib0/diff, was not grapheme aware and hence could generate an inaccurate diff containing lone surrogates.

Operating on code units is convenient in JavaScript because then your CRDT's `length` matches the language's `String.length`, and likewise for indexed access.
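
To illustrate how a diff that is not aware of surrogate pairs can emit lone surrogates, here is a rough sketch. It uses a simple common-prefix/common-suffix diff over UTF-16 code units, which is an assumption for illustration and not the actual lib0/diff code; `utf16_units`, `naive_diff`, and `has_lone_surrogate` are hypothetical helpers.

```python
def utf16_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (what JavaScript strings index by)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def naive_diff(old: list[int], new: list[int]) -> tuple[int, list[int], list[int]]:
    """Strip the common prefix and suffix; return (index, removed, inserted)."""
    limit = min(len(old), len(new))
    start = 0
    while start < limit and old[start] == new[start]:
        start += 1
    end = 0
    while end < limit - start and old[-1 - end] == new[-1 - end]:
        end += 1
    return start, old[start:len(old) - end], new[start:len(new) - end]


def has_lone_surrogate(units: list[int]) -> bool:
    """True if any surrogate code unit is not part of a high+low pair."""
    i = 0
    while i < len(units):
        if 0xD800 <= units[i] <= 0xDBFF:  # high surrogate
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2
                continue
            return True
        if 0xDC00 <= units[i] <= 0xDFFF:  # low surrogate with no high before it
            return True
        i += 1
    return False


# U+1F600 and U+1F601 share the same high surrogate (0xD83D).
old_units = utf16_units("\U0001F600")
new_units = utf16_units("\U0001F601")
index, removed, inserted = naive_diff(old_units, new_units)
```

Here the diff keeps the shared high surrogate and reports only the low surrogates as changed, so both the removed and inserted spans are invalid on their own.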

New comment by mweidner in "The future of version control"

mweidner — Sun, 22 Mar 2026 22:13:47 +0000

I'm surprised to see the emphasis on tracking lines of text, which ties in to the complexity of merge vs merge-the-other-way vs rebase. If we are committed to enhancing the change history, it seems wiser to go all in and store high-level, semantically-meaningful changes, like "move this code into an `if` block and add `else` block ...".

Consider the first example in the readme, "Left deletes the entire function [calculate]. Right adds a logging line in the middle". If you store the left operation as "delete function calculate" and the right operation as "add line ... to function calculate", then it's obvious how to get the intended result (calculate is completely deleted), regardless of how you order these operations.
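
A toy sketch of that example, assuming a program is modeled as a dict from function name to a list of body lines. The operation names `delete_function` and `add_line` are hypothetical, and "add a line to a function that no longer exists" is defined here as a no-op, which is one possible choice.

```python
def apply_op(program: dict, op: tuple) -> dict:
    """Apply one high-level operation and return the new program."""
    kind = op[0]
    if kind == "delete_function":
        _, name = op
        return {n: body for n, body in program.items() if n != name}
    if kind == "add_line":
        _, name, index, line = op
        if name not in program:
            return program  # the target function is gone: nothing to do
        body = program[name]
        return {**program, name: body[:index] + [line] + body[index:]}
    raise ValueError(f"unknown operation: {kind}")


def replay(program: dict, ops: list) -> dict:
    for op in ops:
        program = apply_op(program, op)
    return program


base = {
    "calculate": ["x = a + b", "return x"],
    "main": ["print(calculate(1, 2))"],
}
left = ("delete_function", "calculate")
right = ("add_line", "calculate", 1, "log(x)")

left_then_right = replay(base, [left, right])
right_then_left = replay(base, [right, left])
```

Both orders end with `calculate` deleted, without any line-level conflict resolution.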

I personally think of version control's job not as collaborating on the actual files, but as collaborating on the canonical order of (high-level) operations on those files. This is what a branch is; merge/rebase/cherry-pick are ways of updating a branch's operation order, and you fix a conflict by adding new operations on top. (Though I argue rebase makes the most sense in this model: your end goal is to append to the main branch.)

Once you have high-level operations, you can start adding high-level conflict markers like "this operation changed the docs for function foo; flag a conflict on any new calls to foo". Note that you will need to remember some info about operations' original context (not just their eventual order in the main branch) to surface these conflicts.

New comment by mweidner in "The future of version control"

mweidner — Sun, 22 Mar 2026 21:55:18 +0000

You can think of the semantics (i.e., specification) of any CRDT as a function that inputs the operation history DAG and outputs the resulting user-facing state. However, algorithms and implementations usually have a more programmatic description, like "here is a function `(internal state, new operation) -> new internal state`", both for efficiency (update speed; storing less info than the full history) and because DAGs are hard to reason about. But you do see the function-of-history approach in the paper "Pure Operation-Based Replicated Data Types" [1].

[1] https://arxiv.org/abs/1710.04469
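
As a small illustration of the two styles, here is a sketch of a multi-value register written both ways. The register is chosen only for brevity; the history encoding (each op ID maps to its value and the set of op IDs it directly follows) and the function names are assumptions, not taken from the cited paper.

```python
def mv_register_from_history(history: dict) -> set:
    """Specification style: a function of the whole operation DAG.

    history maps op_id -> (value, parents), where parents is the set of
    op IDs that this write directly follows. The register's value is the
    set of writes that no other write follows.
    """
    superseded = set()
    for _, parents in history.values():
        superseded |= parents
    return {value for op_id, (value, _) in history.items() if op_id not in superseded}


def mv_register_step(heads: dict, op_id: str, value: str, parents: frozenset) -> dict:
    """Implementation style: (internal state, new operation) -> new internal state.

    Assumes ops are delivered in causal order. Only the current heads are
    stored, not the full history.
    """
    new_heads = {h: v for h, v in heads.items() if h not in parents}
    new_heads[op_id] = value
    return new_heads


history = {
    "a1": ("x", frozenset()),
    "b1": ("y", frozenset({"a1"})),  # b1 and c1 are concurrent writes
    "c1": ("z", frozenset({"a1"})),
}
spec_value = mv_register_from_history(history)

heads = {}
for op_id in ["a1", "c1", "b1"]:  # one causal delivery order
    value, parents = history[op_id]
    heads = mv_register_step(heads, op_id, value, parents)
incremental_value = set(heads.values())
```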

New comment by mweidner in "The future of version control"

mweidner — Sun, 22 Mar 2026 21:41:29 +0000

While this is technically correct, folks discussing CRDTs in the context of text editing are typically thinking of a fairly specific family of algorithms, in which each character (or line) is assigned an immutable ID drawn from some abstract total order. That is the sense in which the original post uses the term (without mentioning a specific total order).
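
A bare-bones sketch of that family, leaving the total order abstract: the IDs below are hand-written tuples compared with Python's tuple ordering, which is purely an assumption for illustration. Real algorithms differ mainly in how they generate IDs so that a new character lands between its intended neighbors.

```python
from itertools import permutations


def apply_op(state: tuple, op: tuple) -> tuple:
    """state is (inserted, deleted): a set of (id, char) pairs and a set of IDs."""
    inserted, deleted = state
    if op[0] == "insert":
        _, char_id, char = op
        return inserted | {(char_id, char)}, deleted
    if op[0] == "delete":
        _, char_id = op
        return inserted, deleted | {char_id}
    raise ValueError(f"unknown operation: {op[0]}")


def text(state: tuple) -> str:
    """Characters that were inserted but not deleted, sorted by their IDs."""
    inserted, deleted = state
    return "".join(char for char_id, char in sorted(inserted) if char_id not in deleted)


ops = [
    ("insert", (10, "A"), "h"),
    ("insert", (20, "A"), "i"),
    ("insert", (15, "B"), "!"),  # concurrent insert between the two
    ("delete", (20, "A")),
]

results = set()
for order in permutations(ops):
    state = (frozenset(), frozenset())
    for op in order:
        state = apply_op(state, op)
    results.add(text(state))
```

Since IDs are immutable and the state is just two grow-only sets, every delivery order yields the same text.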

New comment by mweidner in "Lies I was told about collaborative editing, Part 2: Why we don't use Yjs"

mweidner — Mon, 16 Mar 2026 21:42:53 +0000

Your part 1 post was one of the inspirations for that :)

Specifically, it inspired the question: how can one let programmers customize the way edits are processed, to avoid e.g. the "colour" -> "u" anomaly*, without violating CRDT/OTs' strict algebraic requirements? To which the answer is: find a way to get rid of those requirements.

*This is not just common behavior, but also features in a formal specification [1] of how collaborative text-editing algorithms should behave! "[The current text] contains exactly the [characters] that have been inserted, but not deleted."

[1] http://www.cs.ox.ac.uk/people/hongseok.yang/paper/podc16-ful...

New comment by mweidner in ""

mweidner — Mon, 16 Mar 2026 14:39:01 +0000

The rebasing step is indeed a transformation. Some info in the "rebasing" link here [1].

Unlike traditional Operational Transformation, though, there are no "transformation properties" [2] that this rebasing needs to satisfy. (Normally a central-server OT would need to satisfy TP1, or else users may end up in inconsistent states.) Instead, the rebased operations just need to "make sense" to users, i.e., be a reasonable way to apply your original edit to a slightly-further-ahead state. ProseMirror has this sort of rebasing built in, via its step mappings, which lets the collaboration-specific parts of the algorithm look very simple - perhaps deceptively so.

[1] https://prosemirror.net/docs/guide/#collab

[2] https://en.wikipedia.org/wiki/Operational_transformation#Tra...

New comment by mweidner in "Lies I was told about collaborative editing, Part 2: Why we don't use Yjs"

mweidner — Mon, 16 Mar 2026 14:20:23 +0000

The PowerSync folks and I worked on a different approach to ProseMirror collaboration here: https://www.powersync.com/blog/collaborative-text-editing-ov... It is neither CRDT nor OT, but does use per-character IDs (like CRDTs) and an authoritative server order of changes (like OT).

The current implementation does suffer from the same issue noted for the Yjs-ProseMirror binding: collaborative changes cause the entire document to be replaced, which messes with some ProseMirror plugins. Specifically, when the client receives a remote change, it rolls back to the previous server state (without any pending local updates), applies the incoming change, and then re-applies its pending local updates; instead of sending a minimal representation of this overall change to ProseMirror, we merely calculate the final state and replace with that.

This is not an inherent limitation of the collaboration algorithm, just an implementation shortcut (as with the Yjs binding). It could be solved by diffing ProseMirror states to find the minimal representation of the overall change, or perhaps by using ProseMirror's built-in undo/redo features to "map" the remote change through the rollback & re-apply steps.
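
The rollback / apply / re-apply step can be sketched roughly as follows. This is a simplified model, not the actual implementation: the state is a plain list of `(id, char, visible)` entries, and the `insert_after` and `delete` operations are hypothetical stand-ins for real editor steps.

```python
def apply_op(state: list, op: tuple) -> list:
    """Apply one operation to a list of (id, char, visible) entries."""
    if op[0] == "insert_after":
        _, anchor_id, new_id, char = op
        ids = [entry[0] for entry in state]
        index = 0 if anchor_id is None else ids.index(anchor_id) + 1
        return state[:index] + [(new_id, char, True)] + state[index:]
    if op[0] == "delete":
        _, target_id = op
        # Keep the entry as a tombstone so later ops can still anchor on it.
        return [(i, c, False) if i == target_id else (i, c, v) for i, c, v in state]
    raise ValueError(f"unknown operation: {op[0]}")


def text(state: list) -> str:
    return "".join(char for _, char, visible in state if visible)


def receive_remote(server_state: list, pending: list, remote_op: tuple) -> tuple:
    """Roll back to the server state, apply the remote op, re-apply pending ops.

    Returns (new_server_state, new_visible_state). The client would then
    replace its editor state with new_visible_state.
    """
    new_server = apply_op(server_state, remote_op)
    visible = new_server
    for op in pending:
        visible = apply_op(visible, op)
    return new_server, visible


server_state = [("a", "c", True), ("b", "t", True)]  # server text: "ct"
pending = [("insert_after", "a", "x", "a")]          # local, unacknowledged: "cat"
remote = ("insert_after", "b", "r", "s")             # remote user appends "s"

new_server, visible = receive_remote(server_state, pending, remote)
```

Because operations refer to character IDs rather than indices, the pending local insert still lands after "c" when replayed on top of the remote change.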

New comment by mweidner in "Lies I was told about collaborative editing, Part 2: Why we don't use Yjs"

mweidner — Mon, 16 Mar 2026 13:05:42 +0000

This was my impression as well. If you ignore the paper and just look at the source code - and carefully study Seph Gentle's Yjs-like RGA implementation [1] - I believe you find that it is equivalent to an RGA-style tree, but with a different rule for sorting insertions that have the same left origin. That rule is hard to describe, but with some effort one can prove that concurrent insertions commute; I'm hoping to include this in a paper someday.

[1] https://josephg.com/blog/crdts-are-the-future/

New comment by mweidner in "Is the nested block/tree model the wrong foundation for modern rich text editors"

mweidner — Sat, 07 Feb 2026 00:58:56 +0000

Managing "a flat-ish collection of nodes" that can be moved around (without merely deleting and re-inserting nodes) is tricky because of how paragraphs can be split and merged. Notion tackled this for their offline mode: https://www.youtube.com/watch?v=AKDcWRkbjYs

If you take that as a solved problem, do your concerns change?

> Selection & Cursors: Selection across regions is notoriously hard. If "Region A" and "Region B" aren't siblings in a tree, how do we handle a user dragging a selection across both?

You could render them in the DOM as an old-fashioned tree, while internally manipulating your "flat" IR, to make selections work nicely.

This is not too different from how Yjs-ProseMirror works already: Yjs has its own representation of the state as a CRDT tree, which it converts to a separate ProseMirror tree on each update (& it uses a diff algorithm to map local user edits in the other direction).

> Prior Art: Has anyone seen a production system (perhaps in the desktop publishing or CAD world) that successfully treated rich text as a non-hierarchical "content pool"?

This might be how Dato CMS works? https://www.datocms.com/docs/content-modelling (I say this based off of 5 minutes spent watching someone else use it.)

> Are we stuck with trees because they are the "right" abstraction, or just because the browser gives them to us for free?

For lists specifically, I would argue the latter. It's natural to think of a list as a flat sequence of list items, in parallel to any surrounding paragraphs; forcing you to wrap your list items in a UL or OL is (to me) a browser quirk.

I made some progress fighting this in Tiptap: https://github.com/commoncurriculum/tiptap-extension-flat-li... Quill.js already models lists in this "flat" way.

New comment by mweidner in "Why haven't local-first apps become popular?"

mweidner — Tue, 23 Sep 2025 00:28:08 +0000

> More specifically, if you can edit different parts of a same document on different devices, then the document should be split across multiple files that can be synced independently

A more robust idea is to store a log of changes in files that are synced with Dropbox/OneDrive/etc. To prevent conflict warnings from Dropbox, you'll want each device to write to a separate file. Readers re-assemble the logs into (some) canonical total order, then reduce over that to get the document state.

The hard part is re-architecting your app to record all changes, instead of just writing out the current state to disk. However, once you do that, it can form the basis for other features like undo/redo, a view of the file's history, etc.

(The changes don't need to be CRDT/OT messages - anything deterministic works, though it's a good idea to make them "rebase-able", i.e., they will still do something reasonable when replayed on top of a collaborator's concurrent changes.)
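
A minimal sketch of the per-device log layout, assuming one JSON-lines file per device inside the synced folder and a `(lamport, device)` pair as the canonical sort key. The file naming, entry fields, and the `set`/`delete` change types are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_change(folder: Path, device: str, lamport: int, op: dict) -> None:
    """Each device appends only to its own file, so the sync service never sees conflicts."""
    entry = {"lamport": lamport, "device": device, "op": op}
    with open(folder / f"{device}.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def load_document(folder: Path) -> dict:
    """Read every device's log, sort into one total order, and reduce to a state."""
    entries = []
    for path in sorted(folder.glob("*.jsonl")):
        with open(path, encoding="utf-8") as f:
            entries.extend(json.loads(line) for line in f if line.strip())
    entries.sort(key=lambda e: (e["lamport"], e["device"]))

    doc: dict = {}
    for entry in entries:
        op = entry["op"]
        if op["type"] == "set":
            doc[op["key"]] = op["value"]
        elif op["type"] == "delete":
            doc.pop(op["key"], None)
    return doc


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    append_change(folder, "laptop", 1, {"type": "set", "key": "title", "value": "Draft"})
    append_change(folder, "phone", 1, {"type": "set", "key": "title", "value": "Notes"})
    append_change(folder, "laptop", 2, {"type": "set", "key": "body", "value": "hi"})
    document = load_document(folder)
    log_files = sorted(p.name for p in folder.glob("*.jsonl"))
```

Every reader sorts the same entries with the same key, so all devices compute the same document once the files have synced.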

New comment by mweidner in "How Figma’s multiplayer technology works (2019)"

mweidner — Wed, 20 Aug 2025 02:12:17 +0000

Indeed, Replicache works this way, using server reconciliation (one part of client-side prediction): https://doc.replicache.dev/concepts/how-it-works

New comment by mweidner in "Collaborative Text Editing Without CRDTs or OT"

mweidner — Fri, 23 May 2025 01:46:06 +0000

This sounds similar to the idea behind articulated (though with IDs of the form UUID-counter instead of time-counter): https://github.com/mweidner037/articulated

I will check out Antirez.

New comment by mweidner in "Collaborative Text Editing Without CRDTs or OT"

mweidner — Thu, 22 May 2025 00:30:02 +0000

A decentralized, eventually consistent total order on operations is a fully general CRDT, in the sense that you can put whatever (deterministic) operations you want in the total order and clients will end up in eventually consistent states.

Whether the converged result is at all reasonable is a different question.

New comment by mweidner in "Collaborative Text Editing Without CRDTs or OT"

mweidner — Thu, 22 May 2025 00:08:07 +0000

As the author, same.

My best guess is:

- Central-server collaborative editing work focuses on Operational Transformation (OT), likely due to inertia (studied since 1989) and the perception that storing an ID per character is inefficient. In fairness, it is, absent the optimizations introduced by RGASplit and Yjs (~2015).

- For decentralized editing, OT is very complicated, and CRDTs took over as the solution of interest (studied since 2005). Storing every operation permanently in a log - needed to use the linked approach without a server - feels inefficient, as does server reconciliation's undo/re-apply process. So CRDT research has focused on avoiding those inefficiencies, sacrificing simplicity along the way, instead of just embracing them as the easy way out.

To me, the "inefficiencies" seem quite manageable. Storage is cheap, text is small, and you probably want a complete op log anyway for auditing and document histories (cf. git). Server reconciliation's undo/re-apply process can be batched aggressively, e.g., only do it a few times per second; that just makes remote ops take a little longer to show up.

Granted, I have not built a complete app around server reconciliation or the linked approach, so perhaps there is a hidden catch. But I am encouraged by the success of Replicache (https://doc.replicache.dev/concepts/how-it-works), which is where I learned of server reconciliation.

New comment by mweidner in "Collaborative Text Editing Without CRDTs or OT"

mweidner — Wed, 21 May 2025 20:38:59 +0000

Even in the absence of a central server, you can still avoid CRDT/OT complexity if you have a decentralized way to eventually totally order operations & apply them in that order: https://mattweidner.com/2025/05/21/text-without-crdts.html#d...

As others in the comments argue, this is technically a CRDT (though a fully general one); also, undoing/replaying ops is itself non-trivial to implement. However, I hope this is still simpler than using a traditional CRDT/OT for each data type.