Hacker News: fs90

New comment by fs90 in "Ntsc-rs – open-source video emulation of analog TV and VHS artifacts"

fs90 — Sun, 07 Jun 2026 00:27:07 +0000

That's interesting!

Show HN: NodeDB – High Performance Multi-Model Database

fs90 — Mon, 11 May 2026 23:21:42 +0000

Hey HN,

I've been working on a multi-model database called NodeDB.

Originally, I liked the idea behind SurrealDB. However, it lacks some graph and vector features that I need. And since it is a wrapper around a KV store rather than a purpose-built engine, its performance will never come close to that of specialized databases (like Neo4j, Pinecone, ClickHouse, etc.).

So I asked myself: what if there were a database with the same idea, but built differently? Instead of treating everything as KV data, we build specialized engines for each kind of data.

Besides that, I want it to support my IoT/edge project, where I need offline sync capabilities (currently still in progress).

Will it work?

I put it to the test. I've been experimenting and researching for a year, building multiple versions along the way, and the result is NodeDB.

Disclaimer: It is still in public beta (as of May 2026), but I'm really excited about making this DB work. I also use AI as an assistant for coding and planning; a project like this is nearly impossible for a solo developer without AI assistance.

Would love feedback from HN:

- Are there specific features or improvements that would make it more useful?

If you're interested in experimenting or contributing, the repo is here: https://github.com/nodedb-lab/nodedb

Looking forward to your thoughts!

Comments URL: https://news.ycombinator.com/item?id=48102084

Points: 5

# Comments: 1

Show HN: High performance database - Graph, vector, array, columnar, KV

fs90 — Sat, 02 May 2026 19:11:41 +0000

Article URL: https://github.com/nodedb-lab/nodedb

Comments URL: https://news.ycombinator.com/item?id=47989443

Points: 3

# Comments: 0

Show HN: Splintr – Rust BPE tokenizer, 12x faster than tiktoken for batches

fs90 — Thu, 27 Nov 2025 01:11:15 +0000

Hi HN,

I built Splintr, a BPE tokenizer in Rust (with Python bindings), because I found existing Python-based tokenizers were bottlenecking my data processing pipelines.

While OpenAI's tiktoken is the gold standard for correctness, I found I could get significantly better throughput on modern multi-core CPUs by rethinking how parallelism is applied.

Splintr achieves ~111 MB/s batch throughput (vs ~9 MB/s for tiktoken).

The Design Choice: "Sequential by Default"

One of the most interesting findings during development was that naive parallelism actually hurts performance for typical LLM inputs. Thread pool overhead is significant for texts under 1 MB.

I implemented a hybrid strategy:

Single Text (encode): Purely sequential. It’s 3-4x faster than tiktoken simply by using pcre2 with JIT instead of standard regex handling.

Batch Processing (encode_batch): Parallelizes across texts using Rayon, rather than within a text. This saturates all cores without the overhead of splitting small strings.
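To make the shape of that hybrid strategy concrete, here is a minimal Python sketch. It is not Splintr's code: the toy regex, the integer mapping standing in for BPE, and the `max_workers` parameter are all assumptions for illustration. Python threads are also limited by the GIL, so this shows only the structure (sequential for one text, parallel across texts for a batch), not the speedup a Rust and Rayon implementation would get.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional

# Toy pre-tokenization pattern (assumption). A real BPE tokenizer uses a much
# richer regex and then applies byte-pair merges to each piece.
_PATTERN = re.compile(r"\w+|[^\w\s]+|\s+")


def encode(text: str) -> List[int]:
    """Sequential path: a single text never touches a thread pool."""
    # Hypothetical stand-in for BPE: map each piece to an integer.
    return [sum(piece.encode("utf-8")) for piece in _PATTERN.findall(text)]


def encode_batch(texts: List[str], max_workers: Optional[int] = None) -> List[List[int]]:
    """Batch path: parallelize across texts, never within one text."""
    if len(texts) <= 1:
        # Nothing to spread across workers, so skip the pool overhead.
        return [encode(t) for t in texts]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results line up with `texts`.
        return list(pool.map(encode, texts))
```

The unit of parallel work here is a whole text, so small strings are never split further and scheduling overhead is paid once per text rather than once per fragment.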

Other Features:

Safety: Strict UTF-8 compliance, including a streaming decoder that correctly buffers incomplete multi-byte characters.

Compatibility: Drop-in support for cl100k_base (GPT-4), o200k_base (GPT-4o), and llama3 vocabularies.
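For readers unfamiliar with the streaming decoder problem mentioned under Safety: token boundaries do not have to align with UTF-8 character boundaries, so a chunk of decoded bytes can end in the middle of a multi-byte character. The sketch below illustrates the buffering idea using Python's standard `codecs` incremental decoder. The `StreamingDecoder` class and its method names are hypothetical and are not Splintr's API.

```python
import codecs


class StreamingDecoder:
    """Hypothetical wrapper: holds back incomplete UTF-8 sequences between chunks."""

    def __init__(self) -> None:
        # The stdlib incremental decoder keeps trailing partial bytes internally.
        self._decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")

    def feed(self, chunk: bytes) -> str:
        # Returns only the complete characters available so far.
        return self._decoder.decode(chunk)

    def finish(self) -> str:
        # Raises UnicodeDecodeError if the stream ended mid-character.
        return self._decoder.decode(b"", final=True)


decoder = StreamingDecoder()
first = decoder.feed(b"\xe2\x82")   # first two bytes of the euro sign
second = decoder.feed(b"\xac!")     # final byte plus an ASCII character
rest = decoder.finish()
```

In this example `first` is an empty string because the euro sign is still incomplete, and `second` contains the euro sign followed by `!` once the last byte arrives.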

The repo is written in Rust with PyO3 bindings. I’d love feedback on the implementation or other potential optimization tricks for BPE.

Thanks!

Comments URL: https://news.ycombinator.com/item?id=46064322

Points: 1

# Comments: 0