<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: gunnarmorling</title><link>https://news.ycombinator.com/user?id=gunnarmorling</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 14:45:04 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=gunnarmorling" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by gunnarmorling in "Hardwood: A New Parser for Apache Parquet"]]></title><description><![CDATA[
<p>Thanks! The heavy dependency footprint of parquet-java was the main driver for kicking off this project. Hardwood doesn't have any mandatory dependencies; libraries for any compression algorithms in use can be added by the user as needed (most of them are single JARs with no further transitive dependencies). The same goes for log bindings (Hardwood uses the System.Logger abstraction).</p>
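As a brief illustration of the zero-dependency logging mentioned above: the JDK's built-in System.Logger abstraction (JDK 9+) lets a library emit log messages without pulling in any logging dependency; applications can route those messages to their framework of choice by providing a System.LoggerFinder. A minimal sketch (class and message names are illustrative):

```java
// Using the JDK's System.Logger abstraction: no external logging
// dependency is needed; the runtime picks a backend via System.LoggerFinder
// (defaulting to java.util.logging if none is provided).
public class LoggerDemo {

    private static final System.Logger LOG =
            System.getLogger(LoggerDemo.class.getName());

    public static void main(String[] args) {
        // Parameterized message; formatting only happens if the level is enabled
        LOG.log(System.Logger.Level.INFO, "Reading file {0}", "data.parquet");
    }
}
```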
]]></description><pubDate>Sun, 01 Mar 2026 14:17:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47206905</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=47206905</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47206905</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Hardwood: A New Parser for Apache Parquet"]]></title><description><![CDATA[
<p>Yes, absolutely, DuckDB is great. But I think there's a space and need for a pure Java library.</p>
]]></description><pubDate>Sun, 01 Mar 2026 14:15:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47206881</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=47206881</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47206881</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Hardwood: A New Parser for Apache Parquet"]]></title><description><![CDATA[
<p>Thanks! See <a href="https://news.ycombinator.com/item?id=47206861">https://news.ycombinator.com/item?id=47206861</a> for some general comments on performance. I haven't measured bit unpacking specifically yet.</p>
]]></description><pubDate>Sun, 01 Mar 2026 14:14:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47206876</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=47206876</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47206876</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Hardwood: A New Parser for Apache Parquet"]]></title><description><![CDATA[
<p>We have some first benchmarks here: <a href="https://github.com/hardwood-hq/hardwood/blob/main/performance-testing/end-to-end/src/test/java/dev/hardwood/perf/FlatPerformanceTest.java" rel="nofollow">https://github.com/hardwood-hq/hardwood/blob/main/performanc...</a>.<p>From the post:<p>> As an example, the values of three out of 20 columns of the NYC taxi ride data set (a subset of 119 files overall, ~9.2 GB total, ~650M rows) can be summed up in ~2.7 sec using the row reader API with indexed access on my MacBook Pro M3 Max with 16 CPU cores. With the column reader API, the same task takes ~1.2 sec.<p>In my measurements, this is significantly faster than parquet-java for the same task (which is not surprising, as Hardwood is multi-threaded); but I want to be sure I am setting up and configuring parquet-java correctly before publishing any comparisons. The test above is also hooked up to run parquet-java (and there's a set-up for PyArrow, too), so you could run it yourself on your machine if you wanted to.<p>So far, we've spent most of our time optimizing for flat (non-nested) data sets which are fully parsed (either all columns, or with projections), and I think it's faring really well for those. There's no support for predicate push-down yet, so right now, Hardwood isn't optimal for use cases with high query selectivity; this is the next thing on the roadmap though.</p>
]]></description><pubDate>Sun, 01 Mar 2026 14:13:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47206861</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=47206861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47206861</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Ask HN: What are you working on? (February 2026)"]]></title><description><![CDATA[
<p>I am working on a new Java parser for the Apache Parquet file format, with minimal dependencies and multi-threaded execution: <a href="https://github.com/hardwood-hq/hardwood" rel="nofollow">https://github.com/hardwood-hq/hardwood</a>.<p>Approaching the home stretch for a first 1.0 preview release, including support for parsing Parquet files with flat and nested schemas, all physical and logical column types, core and advanced encodings, projections, compression, multi-threading, etc., all with pretty decent performance.<p>Next on the roadmap are SIMD support, predicate push-down (bloom filters, statistics, etc.), and writer support.</p>
]]></description><pubDate>Tue, 10 Feb 2026 10:46:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46957927</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46957927</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46957927</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Show HN: pgwire-replication - pure rust client for Postgres CDC"]]></title><description><![CDATA[
<p>Nice one, great to see this addition to the Rust ecosystem!<p>Reading through the README, this piqued my curiosity:<p>> Small or fast transactions may share the same WAL position.<p>I don't think that's true; each data change and each commit (whether explicit or not) has its own dedicated LSN.<p>> LSNs should be treated as monotonic but not dense.<p>That's not correct; commit LSNs are monotonically increasing, and within a transaction, event LSNs are monotonically increasing. I.e., the tuple (commit LSN, event LSN) is monotonically increasing, but LSNs per se are not. You can run multiple concurrent transactions to observe this.</p>
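To sketch the ordering argument above: a logical replication stream delivers transactions in commit order, so events of a transaction that started earlier (with lower event LSNs) can arrive after events of a later-starting, earlier-committing transaction. A hypothetical illustration (record and field names are made up, not part of pgwire-replication):

```java
import java.util.List;

// Illustrative only: ordering CDC events by the (commit LSN, event LSN)
// tuple, which is monotonic, unlike the raw event LSN sequence.
record CdcEvent(long commitLsn, long eventLsn) implements Comparable<CdcEvent> {
    @Override
    public int compareTo(CdcEvent other) {
        int byCommit = Long.compare(commitLsn, other.commitLsn);
        return byCommit != 0 ? byCommit : Long.compare(eventLsn, other.eventLsn);
    }
}

class LsnOrderDemo {
    public static void main(String[] args) {
        // Tx A: changes at LSN 100 and 110, committed at 150.
        // Tx B (concurrent): change at LSN 105, committed at 160.
        // Stream order (by commit): 100, 110, 105 -- event LSNs alone
        // are NOT monotonic, but the (commitLsn, eventLsn) tuples are.
        List<CdcEvent> stream = List.of(
                new CdcEvent(150, 100),
                new CdcEvent(150, 110),
                new CdcEvent(160, 105));
        for (int i = 1; i < stream.size(); i++) {
            assert stream.get(i - 1).compareTo(stream.get(i)) < 0;
        }
    }
}
```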
]]></description><pubDate>Fri, 16 Jan 2026 22:34:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46653125</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46653125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46653125</guid></item><item><title><![CDATA[New comment by gunnarmorling in "The Most Popular Blogs of Hacker News in 2025"]]></title><description><![CDATA[
<p>Oh nice, thanks for providing that data!<p>Made it to #369 in 2025 with morling.dev; let's see what's in stock this year :)<p><pre><code>  year  total_score  rank  days_mentioned
  2025  903          369   8
  2024  604          581   2
  2023  547          861   3
  2022  450          1165  4
  2021  188          2308  2</code></pre></p>
]]></description><pubDate>Sat, 03 Jan 2026 20:35:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=46481237</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46481237</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46481237</guid></item><item><title><![CDATA[New comment by gunnarmorling in "2026: The Year of Java in the Terminal?"]]></title><description><![CDATA[
<p>It's a non-issue with GraalVM native binaries. See <a href="https://news.ycombinator.com/item?id=46445989" rel="nofollow">https://news.ycombinator.com/item?id=46445989</a> for an example: this CLI tool starts in milliseconds, fast enough that you can launch it during tab completion and have it invoke a REST API without any noticeable delay whatsoever.<p>But also when running on the JVM, things have improved dramatically over the last few years, e.g. due to things such as AOT class loading and linking. For instance, a single-node Kafka broker starts in ~300 ms.</p>
]]></description><pubDate>Wed, 31 Dec 2025 19:17:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=46447306</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46447306</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46447306</guid></item><item><title><![CDATA[New comment by gunnarmorling in "2026: The Year of Java in the Terminal?"]]></title><description><![CDATA[
<p>Assuming JVM installation is not required (to which I agree, it shouldn't be), why would you care which language a CLI tool is written in? I mean, do you even know whether a given binary is implemented in Go, Rust, etc.? I don't see how it makes any meaningful difference from a user perspective.<p>> Pkl, which is at least built using Graal Native Image, but (IMO) would _still_ have better adoption if it was written in something else.<p>Why do you think this is?</p>
]]></description><pubDate>Wed, 31 Dec 2025 17:25:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=46446179</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46446179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46446179</guid></item><item><title><![CDATA[New comment by gunnarmorling in "2026: The Year of Java in the Terminal?"]]></title><description><![CDATA[
<p>As a practical example of a Java-based CLI tool in the wild, here's kcctl, a command line client for Kafka Connect: <a href="https://github.com/kcctl/kcctl/" rel="nofollow">https://github.com/kcctl/kcctl/</a>. It's a native binary (via GraalVM), starting up in a few ms, so that it can actually be invoked during tab completion and do a round-trip to the Kafka Connect REST API without any noticeable delay whatsoever.<p>Installation is via brew, so it's the same experience as for all the other CLI tools you're using. The binary size is on the higher end (52 MB), but I don't think this makes any relevant difference for practical purposes. Build times with GraalVM are still not ideal (though getting better). Cross compilation is another sore point; I'm managing it via platform-specific GitHub Action runners. From a user perspective, none of this matters; I'd bet most users don't know that kcctl is written in Java.</p>
]]></description><pubDate>Wed, 31 Dec 2025 17:09:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46445989</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46445989</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46445989</guid></item><item><title><![CDATA[New comment by gunnarmorling in "2026: The Year of Java in the Terminal?"]]></title><description><![CDATA[
<p>Can you back up your claim the post is written by AI?</p>
]]></description><pubDate>Wed, 31 Dec 2025 16:56:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46445861</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46445861</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46445861</guid></item><item><title><![CDATA[New comment by gunnarmorling in "A framework for technical writing in the age of LLMs"]]></title><description><![CDATA[
<p>To me it fundamentally does not make sense to publish texts generated by an LLM. These tools are available to everyone, so readers can just use the LLM directly. When I'm writing a blog post, it's to share my own perspective, ideas, and experiences, and that's something which an LLM can't do.<p>That's not to say I don't use LLMs at all; I don't use them for original writing though, but rather as a copy editor, helping with improving expressions, getting language and grammar right, etc.<p>I have publicly documented my usage of AI for writing on my blog: <a href="https://www.morling.dev/ai/" rel="nofollow">https://www.morling.dev/ai/</a>. I wish more folks would do that, as I think transparency is key here: every reader should always know whether they're reading a text written by a human or by a machine.</p>
]]></description><pubDate>Fri, 26 Dec 2025 12:14:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46391350</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46391350</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46391350</guid></item><item><title><![CDATA[GitHub to postpone the announced billing change for self-hosted GitHub Actions]]></title><description><![CDATA[
<p>Article URL: <a href="https://twitter.com/github/status/2001372894882918548">https://twitter.com/github/status/2001372894882918548</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46305311">https://news.ycombinator.com/item?id=46305311</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 17 Dec 2025 20:48:46 +0000</pubDate><link>https://twitter.com/github/status/2001372894882918548</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46305311</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46305311</guid></item><item><title><![CDATA[Ten Tips to Make Conference Talks Suck Less (2022)]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.morling.dev/blog/ten-tips-make-conference-talks-suck-less/">https://www.morling.dev/blog/ten-tips-make-conference-talks-suck-less/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46266627">https://news.ycombinator.com/item?id=46266627</a></p>
<p>Points: 5</p>
<p># Comments: 1</p>
]]></description><pubDate>Sun, 14 Dec 2025 20:37:47 +0000</pubDate><link>https://www.morling.dev/blog/ten-tips-make-conference-talks-suck-less/</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46266627</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46266627</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Getting into public speaking"]]></title><description><![CDATA[
<p>This approach can work for experienced speakers, in particular if you have spoken about the given topic before, but I'd strongly advise folks a bit newer to their speaking career against not rehearsing. So often I have seen talks where folks were either done after half of their time slot, or they ran out of time towards the end. Or they lost track of the plot, went off on a tangent for way too long, etc.<p>All this is not great for the audience (who have "invested" in your session by paying for the ticket, spending time away from work and family, not attending other concurrent sessions, etc.), and it can so easily be avoided by rehearsing.<p>The most common reason I have seen for folks skipping rehearsals is the awkward feeling of speaking out loud all by yourself. If that's the issue, it can help to do a dry run in front of colleagues. In any case, "winging it" is best reserved for later on, after having gathered quite a bit of speaking experience and having spoken about the same, or very similar, topics before.<p>I'd also recommend avoiding reading from slides during a talk as much as possible; it's not a great experience for the audience either. There shouldn't be much text on slides to begin with, as folks will either read that or listen to what you say, but typically have a hard time doing both at once.<p>(All this is a general recommendation, not a comment on your talks, which I have not seen.)</p>
]]></description><pubDate>Sun, 14 Dec 2025 19:50:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46266178</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46266178</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46266178</guid></item><item><title><![CDATA[New comment by gunnarmorling in "You gotta push if you wanna pull"]]></title><description><![CDATA[
<p>Yepp, I'm making exactly that same point in the post:<p>> This is why we have indexes in databases, which, if you squint a little, are just another kind of materialized view, at least in their covering form.</p>
]]></description><pubDate>Fri, 12 Dec 2025 09:46:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46242493</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46242493</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46242493</guid></item><item><title><![CDATA[New comment by gunnarmorling in "You Gotta Push If You Wanna Pull"]]></title><description><![CDATA[
<p>Thank you so much! Very glad to hear the post resonated with you.</p>
]]></description><pubDate>Mon, 08 Dec 2025 05:55:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=46188853</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46188853</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46188853</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Idempotency keys for exactly-once processing"]]></title><description><![CDATA[
<p>"Idempotency key" is a widely accepted term [1] for this concept; arguably, you could call it "Deduplication key" instead, but I think this ship has sailed.<p>[1] <a href="https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Idempotency-Key" rel="nofollow">https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/...</a></p>
]]></description><pubDate>Sat, 06 Dec 2025 09:50:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46172015</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46172015</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46172015</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Idempotency keys for exactly-once processing"]]></title><description><![CDATA[
<p>Exactly that.</p>
]]></description><pubDate>Sat, 06 Dec 2025 09:47:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46172003</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46172003</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46172003</guid></item><item><title><![CDATA[New comment by gunnarmorling in "Idempotency keys for exactly-once processing"]]></title><description><![CDATA[
<p>> A more performant variation of this approach is the "hi/low" algorithm<p>I am discussing this approach, just not under that name:<p>> Gaps in the sequence are fine, hence it is possible to increment the persistent state of the sequence or counter in larger steps, and dispense the actual values from an in-memory copy.<p>In that model, a database sequence (e.g. fetched in 100 increments) represents the hi value, and local increments to the fetched sequence value are the low value.<p>However, unlike the log-based approach, this does not ensure monotonicity across multiple concurrent requests.</p>
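The scheme described above can be sketched as follows; this is an illustrative hi/lo-style generator where the "hi" value comes from a database sequence fetched in large increments and "low" values are dispensed from memory (names, the supplier, and the increment size are made up for the example):

```java
import java.util.function.LongSupplier;

// Sketch of a hi/lo ID generator: one database round-trip per `increment`
// IDs; gaps appear when the process restarts with an unused in-memory block.
class HiLoGenerator {

    private final LongSupplier fetchNextHi; // e.g. SELECT nextval('id_seq')
    private final long increment;           // step size reserved per fetch
    private long hi;
    private long low;

    HiLoGenerator(LongSupplier fetchNextHi, long increment) {
        this.fetchNextHi = fetchNextHi;
        this.increment = increment;
        this.low = increment; // forces a fetch on the first call
    }

    synchronized long next() {
        if (low >= increment) {
            // in-memory block exhausted; reserve the next block
            hi = fetchNextHi.getAsLong() * increment;
            low = 0;
        }
        return hi + low++;
    }
}
```

Note the `synchronized` block only guards local state; as said above, this hands out values monotonically within one process but not across concurrent processes each holding their own reserved block.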
]]></description><pubDate>Sat, 06 Dec 2025 09:44:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46171989</link><dc:creator>gunnarmorling</dc:creator><comments>https://news.ycombinator.com/item?id=46171989</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46171989</guid></item></channel></rss>