Hacker News: brig90

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Tue, 20 May 2025 02:57:00 +0000

Honestly, I've never heard of the Rohonc Codex. I'll have to check it out! Thanks!

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Mon, 19 May 2025 15:13:10 +0000

I'm definitely not the most comfortable writing in public forums, so guilty as charged with throwing my comments through an LLM to make sure my point isn't being misconstrued.

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Mon, 19 May 2025 03:07:44 +0000

Honestly, I had never even heard of the manuscript before this weekend. I’ve been looking for interesting ways to strengthen my understanding of NLP, and thought: 1) maybe this would be a good fit, and 2) maybe it hadn’t been approached in quite this way before?

That second part wasn’t super important though — this was more about learning and experimenting than trying to break new ground. Really appreciate the kind words, and hopefully it sparks someone to take it even further.

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Mon, 19 May 2025 03:04:18 +0000

Great question — and something I've been thinking about. I stripped suffixes mostly to normalize some of the repeated endings (aiin, dy, etc.) that felt like filler, but you’re totally right that preserving them might preserve structure I lost.

Clustering by sentence or page would be interesting too — I haven't gone that far yet, but it’d be fascinating to see if there’s consistency across visual/media sections. Appreciate the insight!
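
For anyone following along, here is a minimal sketch of the kind of suffix stripping described above. It is an illustration under assumptions, not the repo's code: only "aiin" and "dy" are named in this thread, so the suffix list, the `strip_suffix` helper, and the sample tokens are hypothetical. Returning the stripped suffix next to the root keeps the information available for a later run that preserves endings.

```python
# Hypothetical suffix list: only "aiin" and "dy" are named in the thread.
SUFFIXES = ("aiin", "dy")

def strip_suffix(word, suffixes=SUFFIXES, min_root=1):
    """Return (root, suffix); suffix is "" when nothing was stripped."""
    # Try longer suffixes first so a long ending wins over a shorter overlap.
    for suffix in sorted(suffixes, key=len, reverse=True):
        # Refuse to strip if it would leave fewer than min_root characters.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return word[: -len(suffix)], suffix
    return word, ""

# Illustrative EVA-style tokens, not a sample drawn from the real dataset.
words = ["qokaiin", "chedy", "okeeodair", "dy"]
pairs = [strip_suffix(w) for w in words]
for word, (root, suffix) in zip(words, pairs):
    print(f"{word:10s} root={root!r} suffix={suffix!r}")
```

Keeping the `(root, suffix)` pair makes it cheap to compare clustering on roots alone against clustering on full words.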

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Mon, 19 May 2025 01:23:47 +0000

Totally — I love that kind of sideways thinking. Earthquake prediction feels like one of those massive, noisy systems where patterns might exist, but they’re buried deep in complexity. I’ll admit, I know absolutely nothing about seismology, so I have no idea how realistic that kind of modelling would be — but yeah, it feels like one of those domains where structure might be hiding in what looks like chaos.

Appreciate the nudge — always fascinating to see where people take this kind of thinking.

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 22:03:12 +0000

This doesn’t burst my bubble at all — if anything, it’s great to hear that others have been able to make meaningful progress using different methods. I wasn’t trying to crack the manuscript or stake a claim on the origin; this project was more about exploring how modern tools like NLP and clustering could model structure in unknown languages.

My main goal was to learn and see if the manuscript behaved like a real language, not necessarily to translate it. Appreciate the link — I’ll check it out (once I get my German up to speed!).

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 20:27:01 +0000

That’s a really interesting question — and one I’ve been circling in the back of my head, honestly. I’m not a cryptographer, so I can’t speak to how feasible a brute-force approach is at scale, but the idea of mapping each Voynich “word” to a real word in another language and optimizing for coherence definitely lines up with some of the more experimental approaches people have tried.

The challenge (as I understand it) is that the vocabulary size is pretty massive — thousands of unique words — and the structure might not be 1:1 with how real language maps. Like, is a “word” in Voynich really a word? Or is it a chunk, or a stem with affixes, or something else entirely? That makes brute-forcing a direct mapping tricky.

That said… using cluster IDs instead of individual words (tokens) and scoring the outputs with something like a language model seems like a pretty compelling idea. I hadn’t thought of doing it that way. Definitely some room there for optimization or even evolutionary techniques. If nothing else, it could tell us something about how “language-like” the structure really is.

Might be worth exploring. Thanks for tossing that out; hopefully someone with more awareness or knowledge in the space sees it!
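
To make the cluster-ID idea concrete, here is a toy sketch of what such a search could look like. Everything in it is an assumption for illustration: the cluster sequence, the candidate vocabulary, and the bigram table are made up, the bigram scorer is only a stand-in for a real language model, and plain hill climbing stands in for the evolutionary techniques mentioned above. It also lets several clusters map to the same word, which a serious attempt would probably constrain.

```python
import random

def score(tokens, bigram_logp, floor=-10.0):
    """Toy stand-in for a language model: sum of bigram log-probabilities."""
    return sum(bigram_logp.get(pair, floor) for pair in zip(tokens, tokens[1:]))

def decode(cluster_ids, mapping):
    """Replace each cluster ID with its currently assigned candidate word."""
    return [mapping[c] for c in cluster_ids]

def hill_climb(cluster_ids, vocab, bigram_logp, steps=500, seed=0):
    rng = random.Random(seed)
    clusters = sorted(set(cluster_ids))
    # Start from an arbitrary assignment of one candidate word per cluster.
    mapping = {c: rng.choice(vocab) for c in clusters}
    best = score(decode(cluster_ids, mapping), bigram_logp)
    for _ in range(steps):
        # Propose changing the word assigned to one randomly chosen cluster.
        candidate = dict(mapping)
        candidate[rng.choice(clusters)] = rng.choice(vocab)
        s = score(decode(cluster_ids, candidate), bigram_logp)
        if s > best:  # keep only strict improvements
            mapping, best = candidate, s
    return mapping, best

# Made-up inputs, purely for illustration.
cluster_ids = [0, 1, 2, 0, 1, 2, 0, 3, 2]
vocab = ["the", "red", "root", "small"]
bigram_logp = {
    ("the", "red"): -1.0, ("red", "root"): -1.0, ("root", "the"): -2.0,
    ("the", "small"): -1.5, ("small", "root"): -1.0,
}
mapping, best = hill_climb(cluster_ids, vocab, bigram_logp)
print(mapping, best)
```

Even if no mapping ever reads as coherent text, how high the score can climb compared with a shuffled cluster sequence might say something about how language-like the ordering is.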

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 20:13:14 +0000

Apologies, but it's not letting me edit the post any longer (I'm new to HN). Here's the link, though: https://brig90.substack.com/p/modeling-the-voynich-manuscrip...

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 19:01:41 +0000

Thanks for pointing those out — I hadn’t seen PaCMAP or LocalMAP before, but that definitely looks like the kind of structure-preserving approach that would fit this data better than PCA. Appreciate the nudge — going to dig into those a bit more.

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 17:25:30 +0000

I’m definitely not a Voynich expert or linguist — I stumbled into this more or less by accident and thought it would make for a fun NLP learning project. Really appreciate you pointing to those names and that forum — I wasn’t aware of the deeper work on QOKEDAR/CHOLDAIIN cycles or the slot alphabet stuff. It’s encouraging to hear that the kind of structure I modeled seems to resonate with where serious research is heading.

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 17:04:58 +0000

Totally fair — I defaulted to paraphrase-multilingual-MiniLM-L12-v2 mostly for speed and wide compatibility, but you’re right that it’s long in the tooth by today’s standards. I’d be really curious to see how something like all-mpnet-base-v2 or even text-embedding-ada-002 would behave, especially if we keep the suffixes in and lean into full contextual embeddings rather than reducing to root forms.

Appreciate you calling that out — that’s a great push toward iteration.

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 16:46:56 +0000

Great points — thank you. PCA gave me surprisingly clean separation early on, so I stuck with it for the initial run. But you’re right — throwing UMAP or t-SNE at it would definitely give a nonlinear perspective that could catch subtler patterns (or failure cases).

And yes to the cross-cluster reference idea — I didn’t build a similarity matrix between clusters, but now that you’ve said it, it feels like an obvious next step to test how much signal is really being captured.

Might spin those up as a follow-up. Appreciate the thoughtful nudge.
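
A rough sketch of what that cluster-to-cluster similarity matrix could look like, assuming each word already has an embedding vector and a cluster label. The helper names and the tiny two-dimensional vectors are hypothetical; real SBERT embeddings have hundreds of dimensions. It compares cluster centroids with cosine similarity, which is one reasonable choice among several.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def cluster_similarity(embeddings, labels):
    """Return (sorted cluster ids, matrix of centroid cosine similarities)."""
    groups = {}
    for vec, label in zip(embeddings, labels):
        groups.setdefault(label, []).append(vec)
    ids = sorted(groups)
    cents = [centroid(groups[i]) for i in ids]
    return ids, [[cosine(a, b) for b in cents] for a in cents]

# Made-up 2-D vectors standing in for real embeddings.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
ids, sim = cluster_similarity(embeddings, labels)
for i, row in zip(ids, sim):
    print(i, [round(v, 3) for v in row])
```

Off-diagonal values near 1 would suggest two clusters are capturing nearly the same thing, which is a useful sanity check on the chosen number of clusters.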

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 16:45:39 +0000

Yep, that’s exactly right — the words like "okeeodair" come directly from the EVA transliteration files, which map the original Voynich glyphs to ASCII approximations. So I’m not working with the glyphs themselves, but rather the standardized transliterated words based on the EVA (European Voynich Alphabet) system. The transliterations I used can be found here: https://www.voynich.nu/

I didn’t re-map anything back to glyphs in this project — everything’s built off those EVA transliterations as a starting point. So if "okeeodair" exists in the dataset, that’s because someone much smarter than me saw a sequence of glyphs and agreed to call it that.

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 16:41:11 +0000

Great question — and you’re right to catch the assumption there. I did assume left-to-right when stripping suffixes, mostly because that’s how the transliteration files were structured and how most Voynich analyses approach it. I didn’t test the reverse — though flipping the structure and checking clustering/syntax behavior would be a super interesting follow-up. Appreciate you calling it out!

New comment by brig90 in "Show HN: I modeled the Voynich Manuscript with SBERT to test for structure"

brig90 — Sun, 18 May 2025 16:39:59 +0000

Yep, that was my takeaway too — the structure feels too consistent to be random, and it echoes known linguistic patterns.

Show HN: I modeled the Voynich Manuscript with SBERT to test for structure

brig90 — Sun, 18 May 2025 16:09:01 +0000

I built this project as a way to learn more about NLP by applying it to something weird and unsolved.

The Voynich Manuscript is a 15th-century book written in an unknown script. No one’s been able to translate it, and many think it’s a hoax, a cipher, or a constructed language. I wasn’t trying to decode it — I just wanted to see: does it behave like a structured language?

I stripped a handful of common suffix-like endings (aiin, dy, etc.) to isolate what looked like root forms. I know that’s a strong assumption — I call it out directly in the repo — but it helped clarify the clustering. From there, I used SBERT embeddings and KMeans to group similar roots, inferred POS-like roles based on position and frequency, and built a Markov transition matrix to visualize cluster-to-cluster flow.

It’s not translation. It’s not decryption. It’s structural modeling — and it revealed some surprisingly consistent syntax across the manuscript, especially when broken out by section (Botanical, Biological, etc.).
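
For readers who want the Markov step spelled out, here is a minimal sketch of a cluster-to-cluster transition matrix. It assumes each line of text has already been reduced to a sequence of cluster IDs; the `transition_matrix` helper and the sample sequences are hypothetical and are not taken from the repo.

```python
from collections import Counter, defaultdict

def transition_matrix(sequences):
    """Row-normalized counts of cluster a being followed by cluster b."""
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count within each sequence only, so transitions never span lines.
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    states = sorted({s for seq in sequences for s in seq})
    matrix = {}
    for a in states:
        total = sum(counts[a].values())
        # A state with no outgoing transitions gets an all-zero row.
        matrix[a] = {b: (counts[a][b] / total if total else 0.0) for b in states}
    return matrix

# Made-up cluster-ID sequences, one per line of text.
lines = [[0, 1, 2], [0, 1, 1, 2], [2, 0, 1]]
P = transition_matrix(lines)
for a, row in P.items():
    print(a, {b: round(p, 2) for b, p in row.items()})
```

Building one matrix per section (Botanical, Biological, etc.) and comparing the rows is one way to check whether the flow really is consistent across sections.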

GitHub repo: https://github.com/brianmg/voynich-nlp-analysis

Write-up: https://brig90.substack.com/p/modeling-the-voynich-manuscrip...

I’m new to the NLP space, so I’m sure there are things I got wrong — but I’d love feedback from people who’ve worked with structured language modeling or weird edge cases like this.

Comments URL: https://news.ycombinator.com/item?id=44022353

Points: 381

# Comments: 132