Hacker News: frankling_

New comment by frankling_ in "A few interesting modern pixel fonts"

frankling_ — Wed, 27 May 2026 03:57:56 +0000

There are also these somewhat classic-looking bitmap terminal fonts large enough for modern displays: https://github.com/B2HDPI/B2HDPI

New comment by frankling_ in "5x5 Pixel font for tiny screens"

frankling_ — Thu, 23 Apr 2026 02:43:44 +0000

Here are a few made by upscaling and then manually cleaning up classic fonts: https://github.com/B2HDPI/B2HDPI

The glyph coverage is enough for most programming languages; missing glyphs just fall back to a pixelized look.

Lode 1.5x works really well on 110 ppi displays, which seems to be the uncanny valley for antialiasing.

New comment by frankling_ in "ArXiv declares independence from Cornell"

frankling_ — Fri, 20 Mar 2026 07:39:36 +0000

The recent announcement to reject review articles and position papers already smelled like a shift towards a more "opinionated" stance, and this move smells worse.

The vacuum that arXiv originally filled was one of a glorified PDF hosting service with just enough of a reputation to allow some preprints to be cited in a formally published paper, and with just enough moderation to not devolve into spam and chaos. It has also been instrumental in pushing publishers towards open access (i.e., to finally give up).

Unfortunately, over the years, arXiv has become something like a "venue" in its own right, particularly in ML, with some decently cited papers never formally published and "preprints" being cited left and right. Consider the impression you get when seeing a reference to an arXiv preprint vs. a link to an author's institutional website.

In my view, arXiv fulfills its function better the less power it has as an institution, and I thus have exactly zero trust that the split from Cornell is driven by that function. We've seen the kind of appeasement prose from their statement and FAQ [1] countless times before, and it's now time for the usual routine of snapshotting the site to watch the inevitable amendments to the mission statement.

"What positive changes should users expect to see?" - I guess the negative ones we'll have to see for ourselves.

[1] https://tech.cornell.edu/arxiv/

New comment by frankling_ in "Why reinforcement learning plateaus without representation depth (NeurIPS 2025)"

frankling_ — Mon, 19 Jan 2026 01:43:08 +0000

Wow, they finally figured out that it's actually not A, it's B—at least if C.

New comment by frankling_ in "Author-paid publication fees corrupt science and should be abandoned"

frankling_ — Sat, 24 Aug 2024 09:47:58 +0000

If you're able to predict the future with 50% accuracy, you should start filling out some lottery tickets.

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Mon, 27 May 2024 08:55:40 +0000

The convolution is approximated via a form of sampling with additional bookkeeping at each encountered branch. How well that scales for deeply branching programs depends on the probabilities of the branches and the diversity in the output on the resulting paths, the worst case being a program where all branches are equally likely and each path generates an entirely different output (as in a hash function, for example). In practice, we've been dealing with problems involving up to tens of thousands of branches or so.

We haven't done a direct comparison to MCMC approaches yet, but it's on the to-do list. My intuition is that MCMC will win out for problems where finding "just any" local optimum is not good enough.

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 20:35:52 +0000

In DiscoGrad, smoothing would be applied by adding Gaussian noise with some configurable variance to x and running the program on those x's. The gradient would then be calculated based on the branch condition's derivative wrt. x (which is 1) and an estimate of the distribution of the condition (which is Gaussian).

In this specific example, the smoothed program happens to be exactly the Gaussian cumulative distribution function (its derivative being the Gaussian density), so the user could just replace the program with that function. However, for more complex programs, it'd be hard to find such correspondences manually.
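To make the closed form concrete, here is a minimal Python sketch, assuming the program in question is the step `if (x >= 0) return 1; else return 0`. It is not DiscoGrad code; the function names are made up for illustration. It compares the sampled smoothing (Gaussian noise added to x) with the Gaussian CDF, and gives the derivative of that CDF, which is the Gaussian density.

```python
import math
import random


def step(x):
    # The step program assumed here: 1 on the "true" branch, 0 otherwise.
    return 1.0 if x >= 0 else 0.0


def gaussian_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def gaussian_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def smoothed_step_sampled(x, sigma, n_samples, rng):
    # Smoothing by sampling: perturb x with Gaussian noise, run the program, average.
    total = sum(step(x + rng.gauss(0.0, sigma)) for _ in range(n_samples))
    return total / n_samples


def smoothed_step_exact(x, sigma):
    # Closed form of the same expectation: P(x + eps >= 0) = Phi(x / sigma).
    return gaussian_cdf(x / sigma)


def smoothed_step_gradient(x, sigma):
    # Derivative of the closed form with respect to x: the Gaussian density.
    return gaussian_pdf(x / sigma) / sigma
```

With enough samples, `smoothed_step_sampled(x, sigma, n, random.Random(0))` should land close to `smoothed_step_exact(x, sigma)`; the variance `sigma**2` plays the role of the configurable smoothing strength.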

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 20:13:45 +0000

Yeah, those tricks are highly related to what we do, the main difference being that we don't require a priori information about the distributions involved in the program. Instead, we compute a density estimation of the distribution of branch conditions at runtime, which makes things quite general and fully automatic, at the cost of some accuracy.

As an aside, the combination "known distributions + automation" is covered in the Julia world by StochasticAD (https://github.com/gaurav-arya/StochasticAD.jl).
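A rough Python sketch of the "density estimation of the branch condition" idea follows. It is an illustration under strong assumptions, not DiscoGrad's actual estimator: a single branch with constant outputs on both sides, a hypothetical linear condition `2x - 1`, Gaussian noise on x, and a plain Gaussian kernel density estimate with Silverman's rule-of-thumb bandwidth. Under those assumptions, the derivative of the expected output is the jump across the branch, times the density of the condition at zero, times the condition's derivative.

```python
import math
import random
import statistics


def condition(x):
    # Hypothetical branch condition; the program branches on condition(x) >= 0.
    return 2.0 * x - 1.0


def condition_derivative(x):
    # Derivative of the (linear) condition with respect to x.
    return 2.0


def density_at_zero(samples):
    # Gaussian kernel density estimate at 0, with Silverman's rule-of-thumb bandwidth.
    n = len(samples)
    bandwidth = 1.06 * statistics.stdev(samples) * n ** (-0.2)
    kernel_sum = sum(math.exp(-0.5 * (c / bandwidth) ** 2) for c in samples)
    return kernel_sum / (n * bandwidth * math.sqrt(2.0 * math.pi))


def branch_gradient_estimate(x, sigma, value_true, value_false, n_samples, rng):
    # Sample the condition under Gaussian input noise, as a smoothed run would see it.
    conds = [condition(x + rng.gauss(0.0, sigma)) for _ in range(n_samples)]
    # d/dx E[output] ~= (jump across the branch) * (density of condition at 0) * dc/dx.
    return (value_true - value_false) * density_at_zero(conds) * condition_derivative(x)


def branch_gradient_exact(x, sigma, value_true, value_false):
    # Reference for this linear condition: c ~ N(2x - 1, (2 * sigma)^2).
    scale = 2.0 * sigma
    z = condition(x) / scale
    density = math.exp(-0.5 * z * z) / (scale * math.sqrt(2.0 * math.pi))
    return (value_true - value_false) * density * condition_derivative(x)
```

The estimate never uses the fact that the condition is Gaussian, only samples of it, which is the sense in which no a priori distribution is needed. The kernel bandwidth introduces some bias, matching the "at the cost of some accuracy" caveat.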

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 19:56:19 +0000

Well, the most common ML problems can be expressed as optimization over smooth functions (or reformulated that way manually). We might have to convince the ML world that branches do matter :) On the other hand, there are gradient-free approaches that solve problems with jumps in other ways, like many reinforcement learning algorithms, or metaheuristics such as genetic algorithms in simulation-based optimization. The jury's still out on "killer apps" where gradient descent can outperform these approaches reliably, but we're hoping to add to that body of knowledge...

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 19:35:08 +0000

Great point, the sigmoid approximation works well for certain problems and that's in fact what I used in the exploratory papers that led to this work. The downsides are the lack of a clear interpretation of how the original program and its smooth counterpart are related, and the difficulty of controlling the degree of smoothing as programs get longer. What DiscoGrad computes has a statistical interpretation: it's the convolution of the program output with whatever distribution is used for smoothing, typically a Gaussian with a configurable variance.

On top of that, if the program branches on random numbers (which is common in simulations), that suffices for the maths to work out and you get an estimate of the asymptotic gradient (for samples -> infinity) of the original program, without any artificial smoothing.

So in short, I do think it is slightly fancier :)
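The difference between the two smoothings can be sketched in a few lines of Python. The program below (`x + 1` if `x >= 0`, else `0`) and all function names are hypothetical, chosen only for illustration. The sigmoid relaxation blends the branch outputs with a logistic weight, while the Gaussian convolution is the expected output under input noise, which for this program can be worked out by hand and checked by sampling.

```python
import math
import random


def program(x):
    # Hypothetical branchy program with a non-constant "true" branch.
    return x + 1.0 if x >= 0 else 0.0


def sigmoid_relaxation(x, steepness):
    # Blend the two branch outputs with a logistic weight on the condition.
    # The tanh form equals 1 / (1 + exp(-steepness * x)) but avoids overflow.
    weight = 0.5 * (1.0 + math.tanh(0.5 * steepness * x))
    return weight * (x + 1.0) + (1.0 - weight) * 0.0


def gaussian_convolution_exact(x, sigma):
    # E[program(x + eps)], eps ~ N(0, sigma^2), worked out by hand for this program:
    # (x + 1) * Phi(x / sigma) + sigma * phi(x / sigma).
    z = x / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (x + 1.0) * cdf + sigma * pdf


def gaussian_convolution_sampled(x, sigma, n_samples, rng):
    # The same quantity estimated by running the program on perturbed inputs.
    return sum(program(x + rng.gauss(0.0, sigma)) for _ in range(n_samples)) / n_samples
```

The sampled convolution converges to the closed form as the sample count grows, so its meaning is fixed by the noise distribution alone. The sigmoid version has no such reference quantity, and its steepness has to be tuned by hand.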

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 19:16:51 +0000

That's right, plain autodiff just ignores branches. Our canonical "why is this even needed" example is a program like "if (x >= 0) return 1; else return 0", x being the input.

The autodiff derivative of this is zero, wherever you evaluate it, so if you sample x and run your program on each x as in a classical ML setup, you'd be averaging over a series of zero-derivatives. This is of course not helpful to gradient descent. In more complex programs, it's less blatant, but the gist is that just averaging sampled gradients over a program's (input-dependent!) branches yields biased or zero-valued derivatives. The traffic light optimization example shown on GitHub is a more complex case where averaged autodiff gradients are always zero.
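The zero-derivative effect is easy to reproduce with a toy forward-mode autodiff in Python. The `Dual` class below is a hand-rolled stand-in for an autodiff tool, not part of any library, and the finite-difference comparison is only there to show that the averaged output does change with x even though every per-sample derivative is zero.

```python
import random
from dataclasses import dataclass


@dataclass
class Dual:
    # Minimal forward-mode autodiff value: a number plus its derivative wrt. the input.
    val: float
    dot: float = 0.0

    def __ge__(self, other):
        # Comparisons look only at the value, so a branch carries no derivative information.
        return self.val >= (other.val if isinstance(other, Dual) else other)


def program(x):
    # The canonical example: both branches return constants.
    if x >= 0:
        return Dual(1.0, 0.0)
    return Dual(0.0, 0.0)


def mean_autodiff_derivative(x, sigma, n_samples, rng):
    # Average the per-sample autodiff derivative over noisy inputs.
    total = 0.0
    for _ in range(n_samples):
        total += program(Dual(x + rng.gauss(0.0, sigma), 1.0)).dot
    return total / n_samples


def mean_output(x, noise):
    # Average program output over a fixed list of noise draws.
    return sum(program(Dual(x + eps, 1.0)).val for eps in noise) / len(noise)


def finite_difference_of_mean(x, sigma, n_samples, rng, h=0.1):
    # Central difference of the averaged output, reusing the same noise on both sides.
    noise = [rng.gauss(0.0, sigma) for _ in range(n_samples)]
    return (mean_output(x + h, noise) - mean_output(x - h, noise)) / (2.0 * h)
```

At x = 0 with unit noise, the averaged autodiff derivative is exactly zero, while the finite difference of the averaged output comes out near the Gaussian density at zero (about 0.4). That gap is the information lost by ignoring the branch.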

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 18:33:16 +0000

Not super closely related: the polytope model (to the degree I'm familiar with it) is used as a representation that facilitates optimization of loop nests. That's optimization in the sense of finding an efficient program.

DiscoGrad deals with (or provides gradients for) mathematical optimization. In our case, the goal is to minimize or maximize the program's numerical output by adjusting its input parameters. Typically, your C++ program will run somewhat slower with DiscoGrad than without, but you can now use gradient descent to quickly find the best possible input parameters.
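As a small illustration of optimizing a program's input by gradient descent, here is a Python sketch on a hypothetical branchy objective. Note the swap: the gradient here comes from a generic Gaussian-smoothing estimator (the identity that the gradient of E[f(x + sigma z)] equals E[f(x + sigma z) z] / sigma), not from DiscoGrad's estimators, and the objective, step size, and sample counts are arbitrary choices.

```python
import random


def objective(x):
    # Hypothetical branchy objective: a flat plateau left of x = 1, a parabola right of it.
    if x < 1.0:
        return 5.0
    return (x - 3.0) ** 2


def smoothed_gradient(x, sigma, n_samples, rng):
    # Estimates d/dx E[objective(x + sigma * z)], z ~ N(0, 1), via the identity
    # E[(objective(x + sigma * z) - objective(x)) * z] / sigma.
    # Subtracting objective(x) only reduces variance; it does not change the mean.
    base = objective(x)
    total = 0.0
    for _ in range(n_samples):
        z = rng.gauss(0.0, 1.0)
        total += (objective(x + sigma * z) - base) * z
    return total / (n_samples * sigma)


def gradient_descent(x0, sigma=1.0, learning_rate=0.05, steps=400, n_samples=200, seed=0):
    # Plain gradient descent driven by the smoothed gradient estimate.
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x -= learning_rate * smoothed_gradient(x, sigma, n_samples, rng)
    return x
```

Starting from `gradient_descent(0.0)`, the iterate sits on the plateau where the plain derivative is zero, yet the smoothed gradient still points toward the parabola, and the run should settle near its minimum at x = 3.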

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 18:22:35 +0000

We're doing something less expensive: essentially, the overall gradient is computed from certain statistics of the branch condition and its derivatives, gathered whenever a branch is encountered.

We mention neural networks because DiscoGrad lets you combine branching programs with neural networks (via Torch) and jointly train/optimize them.

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 18:15:20 +0000

We actually did some preliminary experiments with Taichi hoping to benefit from the GPU parallelization. I think generally, the world of autodiff tooling is in very good shape. For anything non-exotic, we just use JAX or Torch to get things done quickly and with good performance.

Generally, integrating the ideas behind DiscoGrad into existing frameworks has been on our mind since day one. The C++ implementation is a bit of a compromise: it gave us a lot of flexibility during development while the algorithms were still a moving target, along with good performance (albeit without parallelization and GPU support as of yet). Based on DiscoGrad's current incarnation, however, it should not be terribly hard to, say, develop a JAX+DiscoGrad fork and offer some simple "branch-like" abstraction. While we've been looking into this, it can be a bit tricky in a university context to do the engineering legwork required to build something robust...

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 17:45:53 +0000

Enzyme is traditional (but super duper optimized) autodiff; that is, it returns the partial derivatives for the one path taken through the program, ignoring other branches. DiscoGrad captures the effects of alternative branches. What's special about Enzyme is that the gradient computations benefit from LLVM's optimization passes and language support.

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 17:34:28 +0000

Thanks for the kind words! We'd be super happy if this work gets picked up, whether in a commercial context or not.

We were thinking of some disco ball-based logo (among some other designs). With this encouragement, there'll probably be an update in the next few days :)

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 17:30:09 +0000

The key point is that Ceres requires derivatives, which can come from manually derived formulae, approximations via finite differences, or autodiff (http://ceres-solver.org/derivatives.html). DiscoGrad doesn't do the optimization itself (for that, we use gradient descent, for example via Adam), but essentially represents a fourth option to obtain derivatives, and one which captures the branches in an optimization problem (which autodiff doesn't).

While I'm not super familiar with the typical use cases for Ceres, the gradient estimator from DiscoGrad could possibly be integrated to better handle branchy problems.

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 17:08:24 +0000

I agree with that intuition. In our experience, it's easiest to see gains over other optimization techniques when the program is "branch-wise smooth and non-constant". Then, we get the full benefits of exact autodiff gradients "per branch", and our smoothing approach handles the branches. For SAT solving and other purely combinatorial problems, sufficiently accurate smoothing may indeed be too expensive. Also, in such problems, the average local minimum found via gradient descent may not always be that great. That said, we're still exploring where the limits really are.

New comment by frankling_ in "Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad"

frankling_ — Sun, 26 May 2024 16:39:38 +0000

Yep, that's exactly it. The smoothness can either come from randomness in the program itself (then the objective function is asymptotically smooth and DiscoGrad estimates the gradient of that smooth function), or the smoothness can be introduced artificially.

As an example, the very first thing we looked into was a transportation engineering problem, where the red/green phases of traffic lights lead to a non-smooth optimization problem. In essence, in that case we were looking for the "best possible" parameters for a transportation simulation (in the form of a C++ program) that's full of branches.

Show HN: Boldly go where Gradient Descent has never gone before with DiscoGrad

frankling_ — Sun, 26 May 2024 12:14:19 +0000

Trying to do gradient descent using automatic differentiation over branchy programs? Or to combine them with neural networks for end-to-end training? Then this might be interesting to you.

We developed DiscoGrad, a tool for automatic differentiation through C++ programs involving input-dependent control flow (e.g., "if (f(x) < c) { ... }", differentiating wrt. x) and randomness. Our initial motivation was to enable the use of gradient descent with simulations, which often rely heavily on such discrete branching. The latter makes plain autodiff mostly useless, since it can only account for the single path taken through the program. Our tool offers several backends that handle this situation, giving useful descent directions for optimization by accounting for alternative branches. Besides simulations, this problem arises in many other places, for example in deep learning when trying to combine imperative programs with neural networks.

In a nutshell, DiscoGrad applies an (LLVM-based) source-to-source transformation to your C++ program, adding some calls to our header library, which then handles the gradient computation. What sets it apart from similar tools/estimators is that it's fully automatic (no need to come up with a differentiable problem formulation/reparametrization) and that the branching condition can be any function of the program inputs (no need to know upfront what distribution the condition follows).

We're currently a team of two working on DiscoGrad as part of a research project, so don't expect to see production-grade code quality, but we do intend for it to be more than a throwaway research prototype. Use cases we've successfully tested include calibrating simulation models of epidemics or evacuation scenarios via gradient descent, and combining simulations with neural networks in an end-to-end trainable fashion.

We hope you find this interesting and useful, and we're happy to answer questions!

Comments URL: https://news.ycombinator.com/item?id=40481578

Points: 232

# Comments: 66