Hacker News: DRMacIver

New comment by DRMacIver in "Hegel, a universal property-based testing protocol and family of PBT libraries"

DRMacIver — Fri, 10 Apr 2026 16:23:10 +0000

So with the caveat that I'm not super familiar with Validity...

The biggest thing that leaps out at me looking at it is that Hegel is very much built around flexible user-specified data generation (using the library's base generators and combinators) mixed freely with test execution. Validity in contrast looks extremely type-based, which is convenient when you're only testing fairly general properties of built-in types, but which I've never found flexible enough to be a really good basis for property-based testing once your testing needs get even a bit more complicated. For example, a lot of tests want some sort of upper and lower bounds on their numbers, and I don't want to define a type for each.

For an only slightly more involved example of this, suppose you've got, say, a Project type, and Projects have an owner that is a User. You might want a test that is about a single user that has some number of projects. In a generator-based approach, this is easy: You just generate a User object, then you generate a bunch of Project objects that have to have that User as their owner. Just works.

In contrast, in a type based approach, there's basically no way to express this without e.g. defining a new ProjectsOwnedByASingleUser type and defining what it means to be a valid instance of that type... It's a lot of machinery for what is IMO a strictly worse user experience.
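A minimal sketch of the generator-based version of the User/Project example, in plain Python with the standard `random` module. This is not Hegel's or Hypothesis's API: the `User` and `Project` fields and the helper names (`integers`, `users`, `projects_owned_by`, `user_with_projects`) are hypothetical, chosen only to show that bounds and the "same owner" constraint are ordinary function arguments rather than new types.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    name: str


@dataclass(frozen=True)
class Project:
    title: str
    owner: User


def integers(rng, min_value, max_value):
    """Bounded integers as a plain function call, with no new type per range."""
    return rng.randint(min_value, max_value)


def users(rng):
    """Generate an arbitrary user (field ranges are illustrative assumptions)."""
    return User(user_id=integers(rng, 1, 10_000), name=f"user{integers(rng, 0, 99)}")


def projects_owned_by(rng, owner):
    """Generate between 0 and 5 projects that all share the given owner."""
    count = integers(rng, 0, 5)
    return [Project(title=f"project{i}", owner=owner) for i in range(count)]


def user_with_projects(rng):
    """Generate the user first, then the projects that depend on that user."""
    owner = users(rng)
    return owner, projects_owned_by(rng, owner)


rng = random.Random(0)
owner, owned = user_with_projects(rng)
assert all(project.owner is owner for project in owned)
```

The constraint lives in the order of generation: the owner exists before any project is built, so there is nothing like a `ProjectsOwnedByASingleUser` type to define.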

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Wed, 25 Mar 2026 14:57:08 +0000

They're random but with a lot of tweaks to the distribution that make weird edge cases pop up with fairly high probability, and with some degree of internal mutation, followed by shrinking to turn them into nice tidy test cases. In Python we do a little bit of code analysis to find interesting constants, but Hegel doesn't do that, it's just tuned to common edge cases.
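As a rough illustration of what "tweaks to the distribution" can look like, here is a toy integer generator in plain Python that returns a boundary value some of the time instead of sampling uniformly. The list of edge cases and the 0.2 probability are assumptions made up for this sketch, not the actual tuning used by Hypothesis or Hegel.

```python
import random

# Illustrative "interesting" values: zero, small numbers, and common integer limits.
EDGE_CASES = [0, 1, -1, 2**31 - 1, -(2**31), 2**63 - 1, -(2**63)]


def biased_integer(rng, min_value=-(2**63), max_value=2**63 - 1, edge_probability=0.2):
    """Draw an integer, returning a boundary or special value some of the time."""
    # The range's own endpoints always count as edge cases.
    candidates = [v for v in EDGE_CASES + [min_value, max_value] if min_value <= v <= max_value]
    if rng.random() < edge_probability:
        return rng.choice(candidates)
    return rng.randint(min_value, max_value)


rng = random.Random(0)
draws = [biased_integer(rng) for _ in range(1000)]
zero_hits = draws.count(0)  # Uniform sampling over 2**64 values would almost never hit 0.
```

With uniform sampling over the full 64-bit range, a test that only fails on 0 or on an integer limit would essentially never fail; biasing the draw is one simple way to make such cases show up within a few hundred runs.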

I think all the examples I had in the post are typically found in the first 100 test cases and reliably found in the first 1000, but I wouldn't swear that that's the case without double checking.

We don't do any coverage-guidance in Hegel or Hypothesis, because for unit testing style workflows it's rarely worth it - it's very hard to do good coverage guidance in under like... 10k test runs at a minimum, 100k is more likely. You don't have enough time to get really good at exploring the state space, and you haven't hit the point where pure random testing has exhausted itself enough that you have to do something smarter to win.

It's been a long-standing desire of mine to figure out a way to use coverage to do better even on short runs, and there are some kinda neat things you can do with it, but we've not found anything really compelling.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Wed, 25 Mar 2026 12:48:12 +0000

Yeah, that's true. I was going to say that it's maybe not fair to count things that just don't even make sense in Rust, but I guess the logical analogue is something like `Box` which it would make sense to have a default generator for but also we're totally not going to support that.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Wed, 25 Mar 2026 12:23:35 +0000

> To all you amateur Hegel enthusiasts out there: there is no synthesis in Hegel.

Looks like the mods deleted the last long thread about this, so best not to relitigate, but short version: Yes, we know. We liked the name and thought it was funny so we kept it.

> Otherwise: Congratulations on the QuickCheck-style testing in Rust. At work, I’m always surprised that property-based testing is so little known and so rarely used outside of functional programming.

Actually, it's Hypothesis-style testing in Rust. There was already QuickCheck style.

Property-based testing is in fact far more widely used in Python than in functional programming (probably not as a percentage of users, but in terms of raw numbers), which I'm always surprised that the functional programming community seems mostly unaware of.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 21:44:53 +0000

You're very welcome! I'm glad it's been useful for you.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 21:18:02 +0000

Ugh, yeah. Duplicating the code under test is a bad habit that Claude has had when writing property-based tests from very early on and has never completely gone away.

Hmm now that you mention it we should add some instructions not to do that in the hegel-skill, though oddly I've not seen it doing it so far.
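To make the anti-pattern concrete: a test that reimplements the function under test and compares the two outputs can only confirm that the two copies agree. The sketch below, in plain Python with a hypothetical run-length encoder as the code under test, instead checks properties that are stated independently of how the encoder works. The function names and the tiny random driver are assumptions for illustration, not anything from hegel-skill.

```python
import random


def run_length_encode(text):
    """Code under test: collapse runs of equal characters into (char, count) pairs."""
    encoded = []
    for ch in text:
        if encoded and encoded[-1][0] == ch:
            encoded[-1] = (ch, encoded[-1][1] + 1)
        else:
            encoded.append((ch, 1))
    return encoded


def run_length_decode(pairs):
    """Inverse operation, used for the round-trip property."""
    return "".join(ch * count for ch, count in pairs)


def random_text(rng):
    # Small alphabet so that runs of repeated characters actually occur.
    return "".join(rng.choice("ab") for _ in range(rng.randint(0, 20)))


def check_properties(text):
    encoded = run_length_encode(text)
    # Round trip: decoding must recover the input.
    assert run_length_decode(encoded) == text
    # Adjacent runs must use different characters, otherwise they should have merged.
    assert all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))
    # Every run is non-empty.
    assert all(count >= 1 for _, count in encoded)


rng = random.Random(0)
for _ in range(200):
    check_properties(random_text(rng))
```

None of the three assertions restates the encoding loop, so a bug in the loop cannot be mirrored into the test by copy and paste.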

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 20:48:57 +0000

What do you think we're currently missing that Python's `from_type` has? I actually think the auto-deriving stuff we currently have in Rust is as good as or better than `from_type` (e.g. it gets you the builder methods, has support for enums), but I've never been a heavy `from_type` user.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 20:14:50 +0000

Please let us know how it goes!

As Liam says, the derive generator is not very well dogfooded at present. The claude skill is a bit better, but we've only been through a few iterations of using it and getting Claude to improve it, and porting from proptest is one of the less well tested areas (because we don't use proptest much ourselves).

I expect all of this works, but I'd like to know ways that it works less well than it could. Or, you know, to bask in the glow of praise of it working perfectly if that turns out to be an option.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 17:38:57 +0000

Answered this over here: https://news.ycombinator.com/item?id=47506274

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 17:32:10 +0000

The short answer to how it fits into existing ecosystems is... in competition I suppose. We've got a lot of respect for the people working on these libraries, but we think the Hypothesis-based approach is better than the various approaches people have adopted. I don't love that the natural languages for us to start with are ones where there are already pretty good property-based testing libraries whose toes we're stepping on, but it ended up being the right choice because those are the languages people care about writing correct software in, and also the ones we most want the tools in ourselves!

I think right now if you're a happy proptest user it's probably not clear that you should switch to Hegel. I'd love to hear about people trying, but I can't hand on my heart say that it's clearly the correct thing for you to do given its early state, even though I believe it will eventually be.

But roughly the things that I think are clearly better about the Hegel approach and why it might be worth trying Hegel if you're starting greenfield are:

* Much better generator language than proptest (I really dislike proptest's choices here. This is partly personal aesthetic preferences, but I do think the explicitly constructed generators work better as an approach and I think this has been borne out in Hypothesis). Hegel has a lot of flexible tooling for generating the data you want.

* Hegel gets you great shrinking out of the box which always respects the validity requirements of your data. If you've written a generator to always ensure something is true, that should also be true of your shrunk data. This is... only kind of true in proptest at best. It's not got quite as many footguns in this space as the original QuickCheck and its purely type-based shrinking, but you will often end up having to make a choice between shrinking that produces good results and shrinking that you're sure will give you valid data.

* Hegel's test replay is much better than seed saving. If you have a failing test and you rerun it, it will almost immediately fail again in exactly the same way. With approaches that don't use the Hypothesis model, the best you can hope for is to save a random seed, then rerun shrinking from that failing example, which is a lot slower.

There are probably a bunch of other quality of life improvements, but these are the things that have stood out to me when I've used proptest, and are in general the big contrast between the Hypothesis model and the more classic QuickCheck-derived ones.
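The shrinking point can be sketched with a toy model in plain Python: instead of shrinking the generated value directly, shrink the list of low-level choices the generator consumed and rerun the generator on each candidate. Whatever invariant the generator enforces then holds for every shrunk value by construction. This is a heavily simplified illustration, not the real Hypothesis or Hegel shrinker; `Choices`, `ordered_pair`, `fails` and `shrink` are hypothetical names.

```python
class Choices:
    """Replays a fixed list of integer choices; reads past the end yield 0."""

    def __init__(self, values):
        self.values = list(values)
        self.index = 0

    def draw(self, max_value):
        raw = self.values[self.index] if self.index < len(self.values) else 0
        self.index += 1
        return min(raw, max_value)


def ordered_pair(choices):
    """Generator with a validity requirement: always returns (lo, hi) with lo <= hi."""
    lo = choices.draw(1000)
    hi = lo + choices.draw(1000)
    return lo, hi


def fails(pair):
    """Stand-in for a failing test: any pair spanning more than 10 counts as a bug."""
    lo, hi = pair
    return hi - lo > 10


def shrink(values, generator, is_failing):
    """Greedily lower each choice while the regenerated value still fails."""
    values = list(values)
    improved = True
    while improved:
        improved = False
        for i in range(len(values)):
            for candidate in (0, values[i] // 2, values[i] - 1):
                if 0 <= candidate < values[i]:
                    attempt = values[:i] + [candidate] + values[i + 1:]
                    # Validity is never checked: the generator rebuilds a valid value.
                    if is_failing(generator(Choices(attempt))):
                        values = attempt
                        improved = True
                        break
    return values


shrunk = shrink([700, 450], ordered_pair, fails)
lo, hi = ordered_pair(Choices(shrunk))  # (0, 11): minimal, and still lo <= hi
```

A shrinker that worked on the pair itself would have to be told about `lo <= hi` separately, which is where invalid shrunk examples tend to come from.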

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 17:11:06 +0000

Ouch. Classic Claude. It does tend to cheat when it gets stuck, and I've had some success with stricter harnesses, reflection prompts and getting it to redo work when it notices it's cheated, but it's definitely not a solved problem.

My guess is that you wouldn't have had a better time without PBT here and it would still have either cheated or claimed victory incorrectly, but definitely agreed that PBT can't fully fix the problem, especially if it's PBT that the agent is allowed to modify. I've still anecdotally found that the results are better than without it because even if agents will often cheat when problems are pointed out, they'll definitely cheat if problems aren't pointed out.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 17:05:21 +0000

TBF PBT has been the present in Python for a while now.

10 years ago might have been a little early (Hypothesis 1.0 came out 11 years ago this coming Thursday), but we had pretty wide adoption by year two and it's only been growing. It's just that the other languages have all lagged behind.

It's by no means universally adopted, but it's not a weird rare thing that nobody has heard of.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 16:40:28 +0000

So I think a short list of big API differences is something like:

* Hypothesis/Hegel are very much focused on using test assertions rather than a single property that can be true or false. This naturally drives a style that is much more like "normal" testing, but also has the advantage that you can distinguish between different types of failing test. We don't go too hard on this, but both Hegel and Hypothesis will report multiple distinct failures if your test can fail in multiple ways.

* Data generation in Hypothesis/Hegel, and how it interacts with testing, is much more flexible and basically fully imperative. You can generate whatever data you like wherever in your test you like, freely interleaving data generation and test execution.

* QuickCheck is very much type-first, with explicit generators as an afterthought. I think this is mostly a mistake even in Haskell, but in languages where "just wrap your thing in a newtype and define a custom implementation for it" will get you a "did you just tell me to go fuck myself?" response, it's a nonstarter. Hegel is generator-first, and you can get the default generator for a type if you want but it's mostly a convenience function with the assumption that you're going to want a real generator specification at some point soon.
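A small sketch of what interleaving data generation with test execution means in practice, written in plain Python rather than against the real Hypothesis or Hegel APIs. The test receives a `draw_int` callable (here simply `random.Random.randint`) and decides what to generate next based on state that earlier steps produced, with ordinary assertions along the way. `BoundedStack` and `check_stack_matches_model` are hypothetical names for illustration.

```python
import random


class BoundedStack:
    """Code under test: a stack that refuses pushes beyond its capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = []

    def push(self, item):
        if len(self.items) >= self.capacity:
            return False
        self.items.append(item)
        return True

    def pop(self):
        return self.items.pop()


def check_stack_matches_model(draw_int):
    capacity = draw_int(1, 5)
    stack, model = BoundedStack(capacity), []
    for _ in range(draw_int(0, 30)):
        # What we generate next depends on state produced by earlier execution.
        if model and draw_int(0, 1) == 1:
            assert stack.pop() == model.pop()
        else:
            value = draw_int(-100, 100)
            accepted = stack.push(value)
            assert accepted == (len(model) < capacity)
            if accepted:
                model.append(value)
        # A separate assertion, so this failure is distinguishable from the ones above.
        assert len(stack.items) <= capacity


for seed in range(200):
    rng = random.Random(seed)
    check_stack_matches_model(rng.randint)
```

There is no single up-front input type here: the number of steps, the choice of operation and the pushed values are all drawn in the middle of the test, and each assertion marks a distinct way the test can fail.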

From an implementation point of view, and what enables the big conveniences, Hypothesis has a uniform underlying representation of test cases and does all its operations on them. This means you get:

* Test caching (if you rerun a failing test, it will immediately fail in the same way with the previously shrunk example)

* Validity guarantees on shrinking (your shrunk test cases will always be ones your generators could have produced. It's a huge footgun in QuickCheck that you can shrink to an invalid test case)

* Automatically improving the quality of your generators, never having to write your own shrinkers, and a whole bunch of other quality of life improvements that the universal representation lets us implement once and users don't have to care about.

The validity thing in particular is a huge pain point for a lot of users of PBT, and is what drove a lot of the core Hypothesis model to make sure that this problem could never happen.

The test caching is because I personally hated rerunning tests and not knowing whether it was just a coincidence that they were passing this time or that the test case had changed.
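The caching behaviour described above can be sketched in plain Python as follows. The idea is that a test case is just the list of low-level choices it consumed, so a failure can be stored and replayed exactly before any fresh random cases are tried. This is a toy model under stated assumptions: the in-memory `cache` dict, `Choices`, `run_with_cache` and `buggy_test` are hypothetical, and unlike the real tools it stores the first failure found rather than a shrunk one.

```python
import random


class Choices:
    """Records integer choices when drawing fresh, or replays a saved list."""

    def __init__(self, rng=None, replay=None):
        self.rng = rng
        self.replay = list(replay) if replay is not None else None
        self.recorded = []

    def draw(self, min_value, max_value):
        if self.replay is not None:
            value = self.replay[len(self.recorded)]
        else:
            value = self.rng.randint(min_value, max_value)
        self.recorded.append(value)
        return value


def run_with_cache(name, test, cache, seed, max_examples=1000):
    """Replay the saved failure first; only then fall back to fresh random cases."""
    if name in cache:
        choices = Choices(replay=cache[name])
        try:
            test(choices)
        except AssertionError:
            return "failed from cache", choices.recorded
        del cache[name]  # The saved case passes now, so the bug looks fixed.
    rng = random.Random(seed)
    for _ in range(max_examples):
        choices = Choices(rng=rng)
        try:
            test(choices)
        except AssertionError:
            cache[name] = choices.recorded
            return "failed fresh", choices.recorded
    return "passed", None


def buggy_test(choices):
    x = choices.draw(0, 100)
    y = choices.draw(0, 100)
    assert x + y <= 150  # Hypothetical bug: fails whenever the draws sum past 150.


cache = {}
first = run_with_cache("buggy_test", buggy_test, cache, seed=1)
second = run_with_cache("buggy_test", buggy_test, cache, seed=2)
```

The second run uses a different seed but fails on exactly the same recorded choices as the first, which is what removes the "did it pass because it's fixed, or because the test case changed?" ambiguity.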

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 16:23:39 +0000

> But the problem remains verifying that the tests actually test what they're supposed to.

Definitely. It's a lot harder to fake this with PBT than with example-based testing, but you can still write bad property-based tests and agents are pretty good at doing so.

I have generally found that agents with property-based tests are much better at not lying to themselves about it than agents with just example-based testing, but I still spend a lot of time yelling at Claude.

> So "a huge part" - possibly, but there are other huge parts still missing.

No argument here. We're not claiming to solve agentic coding. We're just testing people doing testing things, and we think that good testing tools are extra important in an agentic world.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 16:18:09 +0000

It's on the agenda! We definitely want to rewrite the Hegel core server in Rust, but not as much as we wanted to get it working well first.

My personal hope is that we can port most of the Hypothesis test suite to hegel-rust, then point Claude at all the relevant code and tell it to write us a hegel-core in Rust with that as its test harness. Liam thinks this isn't going to work, I think it's like... 90% likely to get us close enough to working that we can carry it over the finish line. It's not a small project though. There are a lot of fiddly bits in Hypothesis, and the last time I tried to get Claude to port it to Rust the result was better than I expected but still not good enough to use.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 16:15:34 +0000

We looked at it and given that the repo was archived nearly two years ago decided it wasn't a problem.

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 16:06:26 +0000

Conversation with Will (Antithesis CEO) a couple months ago, heavily paraphrased:

Will: "Apparently Hegel actually hated the whole Hegelian dialectic and it's falsely attributed to him."

Me: "Oh, hm. But the name is funny and I'm attached to it now. How much of a problem is that?"

Will: "Well someone will definitely complain about it on hacker news."

Me: "That's true. Is that a problem?"

Will: "No, probably not."

(Which is to say: You're entirely right. But we thought the name was funny so we kept it. Sorry for the philosophical inaccuracy)

New comment by DRMacIver in "Hypothesis, Antithesis, synthesis"

DRMacIver — Tue, 24 Mar 2026 15:42:42 +0000

Post author here btw, happy to take questions, whether they're about Hegel in particular, property-based testing in general, or some variant on "WTF do you mean you wrote rust bindings to a python library?"

New comment by DRMacIver in "Hypothesis: Property-Based Testing for Python"

DRMacIver — Wed, 05 Nov 2025 17:48:49 +0000

How popular do you want it to be?

The Python survey data (https://lp.jetbrains.com/python-developers-survey-2024/) holds pretty consistently at 4% of Python users saying they use it, which isn't as large as I'd like, but given that only 64% of people in the survey say they use testing at all isn't doing too badly, and I think certainly falsifies the claim that Python programs don't have properties you can test.

Teaching My Younger Self to Program

DRMacIver — Mon, 02 Dec 2024 12:47:36 +0000

Article URL: https://thinkfeelplay.substack.com/p/teaching-my-younger-self-to-program

Comments URL: https://news.ycombinator.com/item?id=42295666

Points: 1

# Comments: 0