Hacker News: ogrisel

New comment by ogrisel in "Munich 1991: The Roots of the Current AI Boom"

ogrisel — Mon, 22 Jun 2026 09:33:32 +0000

Paul Werbos did not apply backprop to MLPs as cleanly described in Hinton's paper, but rather to some kind of autoregressive non-linear parametrized functions with a much more specific application scope.

Both papers are direct applications of the chain rule to compute the gradient of a multivariate function.
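To make the chain-rule point concrete, here is a minimal sketch that is not taken from either paper: a hypothetical two-parameter non-linear model whose gradient is computed by hand with the chain rule and can be compared against finite differences. The model, the function names and the default values of `x` and `y` are all assumptions made for illustration.

```python
import math


def loss(w1, w2, x=0.5, y=0.2):
    # Tiny non-linear model: out = w2 * tanh(w1 * x), with a squared error loss.
    return (w2 * math.tanh(w1 * x) - y) ** 2


def loss_gradient(w1, w2, x=0.5, y=0.2):
    # Forward pass, keeping the intermediate values.
    h = math.tanh(w1 * x)
    out = w2 * h
    # Backward pass: apply the chain rule from the loss back to the parameters.
    d_out = 2.0 * (out - y)          # d loss / d out
    d_w2 = d_out * h                 # d loss / d w2
    d_h = d_out * w2                 # d loss / d h
    d_w1 = d_h * (1.0 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2


def finite_difference_gradient(w1, w2, eps=1e-6):
    # Central differences, used only as an independent numerical check.
    d_w1 = (loss(w1 + eps, w2) - loss(w1 - eps, w2)) / (2 * eps)
    d_w2 = (loss(w1, w2 + eps) - loss(w1, w2 - eps)) / (2 * eps)
    return d_w1, d_w2
```

Calling `loss_gradient(0.3, -1.2)` and `finite_difference_gradient(0.3, -1.2)` should give values that agree up to numerical error.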

New comment by ogrisel in "Google Antigravity just deleted the contents of whole drive"

ogrisel — Mon, 01 Dec 2025 08:46:08 +0000

How do you deny access to prod credentials from an assistant running on your dev machine assuming you need to store them on that same machine to do manual prod investigation/maintenance work from that machine?

New comment by ogrisel in "Google Antigravity just deleted the contents of whole drive"

ogrisel — Mon, 01 Dec 2025 08:07:03 +0000

When you run Antigravity for the first time, it asks you to pick a profile (I don't remember the exact naming), and what it entails w.r.t. the level of command execution confirmation is well explained.

New comment by ogrisel in "Google Antigravity just deleted the contents of whole drive"

ogrisel — Mon, 01 Dec 2025 08:05:02 +0000

I think there is far less than a 1% chance of this happening, but there are probably millions of Antigravity users at this point, so even a one-in-a-million chance of this happening is already a problem.

We need local sandboxing for FS and network access (e.g. via `cgroups` and namespaces on Linux, or similar mechanisms on other OSes) to run these kinds of tools more safely.

New comment by ogrisel in "Penpot: The Open-Source Figma"

ogrisel — Thu, 27 Nov 2025 14:24:43 +0000

Personally, I do not understand why you think there is a bug from this screen capture alone. Maybe it's because I am not that familiar with Penpot and Figma, but still, I do not find it obvious.

This is why it's important to explicitly describe the following three points in text:

- steps to reproduce;

- what you expected to happen;

- what you actually observed instead.

Something that might be obvious to you but isn't for others will just be silently ignored most of the time.

EDIT: I now see the problem after reading your other reply above:

https://news.ycombinator.com/item?id=46064757#46069546

This is why it's important to describe explicitly the difference between what you expected and what you observed. I swear I did not see the change in button width before reading the linked comment.

New comment by ogrisel in "Penpot: The Open-Source Figma"

ogrisel — Thu, 27 Nov 2025 14:07:19 +0000

I think it would help to open an issue on GitHub and make the following three points explicit in the report:

- steps to reproduce from scratch;

- what you expected to happen;

- what you actually observed (include the screenshot or video capture in addition to a textual description).

Otherwise, you risk having your report ignored due to a silent misunderstanding about the mismatch between your expectations and the actual results.

New comment by ogrisel in "The first year of free-threaded Python"

ogrisel — Fri, 16 May 2025 13:05:12 +0000

You cannot share arbitrarily structured objects in the `ShareableList`, only atomic scalars and bytes / strings.

If you want to share structured Python objects between processes, you have to pay the cost of `pickle.dumps` / `pickle.loads` (CPU overhead for interprocess communication) + the memory cost of replicated objects in the processes.
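As a rough illustration of that cost, the sketch below does the `pickle.dumps` / `pickle.loads` round trip by hand, within a single process, on a made-up nested object (the `config` dict is purely hypothetical). `multiprocessing` performs the same kind of serialization when objects are sent between processes through queues or pipes.

```python
import pickle

# Hypothetical structured object that cannot be stored in a ShareableList.
config = {"weights": [0.1] * 1000, "labels": ("a", "b"), "meta": {"version": 1}}

# Serialize to bytes: this is the CPU cost paid on the sending side.
payload = pickle.dumps(config)

# Deserialize: this is the CPU cost paid on the receiving side, and it
# allocates a full, independent copy of the object.
restored = pickle.loads(payload)

print(f"payload size: {len(payload)} bytes")
print(f"same content: {restored == config}, same object: {restored is config}")
```

The restored object compares equal to the original but is a distinct copy, which is where the replicated memory comes from.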

New comment by ogrisel in "Probabilistic Artificial Intelligence"

ogrisel — Tue, 11 Mar 2025 10:17:44 +0000

According to the following paper, it's possible to get calibrated confidence scores by directly asking the LLM to verbalize a confidence level, but it strongly depends on how you prompt it to do so:

https://arxiv.org/abs/2412.14737

New comment by ogrisel in "AMD Announces "Instella" Open-Source 3B Language Models"

ogrisel — Thu, 06 Mar 2025 15:07:10 +0000

It appears that they reused a lot of the data preparation provided by the AllenAI team:

https://github.com/allenai/OLMoE

https://github.com/allenai/dolma

https://github.com/AMD-AIG-AIMA/Instella

New comment by ogrisel in "Understanding Reasoning LLMs"

ogrisel — Fri, 07 Feb 2025 08:31:09 +0000

Software engineering is difficult to verify because it requires dealing with an ambiguous understanding of the end user's actual needs / value and subtle trade-offs between code maintainability, feature coverage and computational performance.

Algorithmic puzzles, on the other hand, both require reasoning and are easy to verify.

There are other things in coding that are both useful and easy to verify: checking that the generated code follows formatting standards or generating outputs with a specific data schema and so on.
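For instance, a data schema check can be automated in a few lines. The following is only a sketch under assumptions: the schema in `EXPECTED_FIELDS` and the function name `matches_schema` are hypothetical, and a real setup would more likely rely on a dedicated schema validation library.

```python
import json

# Hypothetical schema: required field name -> expected Python type.
EXPECTED_FIELDS = {"name": str, "age": int, "tags": list}


def matches_schema(raw_output, expected_fields=EXPECTED_FIELDS):
    """Return True if raw_output is a JSON object with exactly the expected fields."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        # Not even valid JSON: trivially rejected.
        return False
    if not isinstance(data, dict) or set(data) != set(expected_fields):
        return False
    # Check the type of each field value.
    return all(
        isinstance(data[field], expected_type)
        for field, expected_type in expected_fields.items()
    )
```

Such a function returns a plain boolean, which is what makes this kind of task cheap to verify automatically.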

New comment by ogrisel in "Fair Pricing"

ogrisel — Wed, 05 Feb 2025 08:56:56 +0000

Similarly, for paywalled news/journals.

New comment by ogrisel in "Why DeepSeek had to be open source"

ogrisel — Wed, 29 Jan 2025 16:19:36 +0000

It's better to be specific:

- open-source inference code

- open weights (for inference and fine-tuning)

- open pretraining recipe (code + data)

- open fine-tuning recipe (code + data)

Very few entities publish the latter two items (https://huggingface.co/blog/smollm and https://allenai.org/olmo come to mind). Arguably, publishing curated large-scale pretraining data is very costly, but publishing code to automatically curate pretraining data from uncurated sources is already very valuable.

New comment by ogrisel in "DeepSeek could represent Nvidia CEO Jensen Huang's worst nightmare"

ogrisel — Tue, 28 Jan 2025 13:49:20 +0000

I don't understand why it's bad for Nvidia either.

The fact that DeepSeek-R1 is so much better than DeepSeek-V3 at various important tasks means that chain-of-thought / thinking-before-answering models are better. But they are also more compute-intensive at inference time than their instruction-tuned, non-thinking counterparts.

So even if the DeepSeek-V3 pretraining + GRPO CoT post-training procedure was cheaper than anticipated to reach o1-grade performance, inference is still costly, even if you use a distilled model.

New comment by ogrisel in "PyPI Blog: Project Quarantine"

ogrisel — Sun, 05 Jan 2025 09:55:06 +0000

Note that it's possible to disable that behavior with `pip install --only-binary :all:`.

This way, pip will fail if a dependency does not provide a `.whl` package, instead of automatically falling back to the "build from source" mode that can lead to arbitrary code execution at install time (via setuptools' `setup.py` or any other build backend mechanism).
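If you drive pip from a script, the same option can be passed programmatically. This is a minimal sketch: the helper names `wheel_only_install_command` and `install_wheel_only` are made up for illustration, only the `--only-binary :all:` option comes from pip itself.

```python
import subprocess
import sys


def wheel_only_install_command(requirement):
    """Build a pip command line that refuses to build anything from source."""
    return [
        sys.executable, "-m", "pip", "install",
        "--only-binary", ":all:",
        requirement,
    ]


def install_wheel_only(requirement):
    # check=True raises CalledProcessError if pip exits with an error,
    # for instance because no wheel is available for a dependency.
    subprocess.run(wheel_only_install_command(requirement), check=True)
```

Calling `install_wheel_only("some-package")` (a placeholder name) would then either install from wheels only or fail loudly.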

However, installing from wheels just protects from arbitrary code execution at install time. If you do not trust the source and integrity of the package you install, you would still be subject to arbitrary code execution at import time.

Therefore, tools and processes to improve package provenance tracing and integrity checking are useful for both kinds of installations.

New comment by ogrisel in "AlphaProof's Greatest Hits"

ogrisel — Mon, 18 Nov 2024 10:25:03 +0000

That should be doable, e.g. by semi-automated curation of the pre-training dataset. However, since curating such large datasets and running pre-training runs is so expensive, I doubt that anybody will run such an experiment. Especially since one would have to trust that the curation process was correct enough for the end result to be meaningful. Checking that the curation process is not flawed is probably as expensive as running it in the first place.

New comment by ogrisel in "State of Python 3.13 performance: Free-threading"

ogrisel — Fri, 08 Nov 2024 09:11:31 +0000

The race condition bugs are typically hidden by different software layers. For instance, we found one that involves OpenBLAS's pthreads-based thread pool management and maybe its scipy bindings:

- https://github.com/scipy/scipy/issues/21479

It might be the same as this one, which further involves OpenMP code generated by Cython:

- https://github.com/scikit-learn/scikit-learn/issues/30151

We haven't managed to write minimal reproducers for either of those but as you can observe, those race conditions can only be triggered when composing many independently developed components.

New comment by ogrisel in "State of Python 3.13 performance: Free-threading"

ogrisel — Wed, 06 Nov 2024 09:47:17 +0000

The IPC overhead of process-based parallelism in Python is a pain to deal with in general, even when the underlying computational bottlenecks are already optimized (calls to compiled extensions written in Cython/C/C++/Rust, calls to CPU-optimized operations written with architecture-specific intrinsics/assembly from OpenBLAS via NumPy/SciPy, calls to e.g. GPU CUDA kernels via PyTorch/Triton, ...).

Sometimes the optimal level of parallelism lies in an outer loop written in Python instead of just relying on the parallelism opportunities of the inner calls written using hardware specific native libraries. Free-threading Python makes it possible to choose which level of parallelism is best for a given workload without having to rewrite everything in a low-level programming language.
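Here is a minimal sketch of what parallelizing the outer loop with threads can look like. The `inner_task` function is a made-up pure-Python stand-in for one iteration of the outer loop. On a free-threaded build the worker threads can run it in parallel, while on a regular GIL build the code still runs correctly but the pure-Python part is serialized.

```python
from concurrent.futures import ThreadPoolExecutor


def inner_task(seed, n_steps=10_000):
    # Hypothetical stand-in for one iteration of the outer loop: in real code
    # this would typically mix Python glue code and calls to native libraries.
    state = seed
    for _ in range(n_steps):
        state = (state * 1103515245 + 12345) % 2**31
    return state


def run_outer_loop(seeds, max_workers=4):
    # Threads share the process memory: inputs and results are not pickled,
    # so there is no IPC overhead to pay.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map returns the results in the order of the inputs.
        return list(executor.map(inner_task, seeds))
```

For example, `run_outer_loop(range(8))` returns the same list as the sequential `[inner_task(s) for s in range(8)]`.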

New comment by ogrisel in "CuPy: NumPy and SciPy for GPU"

ogrisel — Fri, 20 Sep 2024 15:45:19 +0000

Note that NumPy, CuPy and PyTorch are all involved in the definition of a shared subset of their API:

https://data-apis.org/array-api/

So it's possible to write array API code that consumes arrays from any of those libraries and delegate computation to them without having to explicitly import any of them in your source code.
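As a sketch of what such library-agnostic code looks like, the function below (the name `center` is just an example) retrieves the namespace from its input through the `__array_namespace__` method defined by the standard and never imports an array library itself. It assumes that the input array implements the standard.

```python
def center(x):
    """Subtract the mean of x, using whichever array library x comes from."""
    # The array API standard defines this method on compliant array objects:
    # it returns the namespace (module-like object) holding the functions.
    xp = x.__array_namespace__()
    # `mean` is part of the standard; the subtraction relies on the array's
    # own arithmetic operators, so the computation stays in x's library.
    return x - xp.mean(x)
```

With NumPy 2.0 or later, whose arrays expose `__array_namespace__`, something like `center(numpy.asarray([1.0, 2.0, 3.0]))` should work as is.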

The only limitation for now is that PyTorch's array API compliance (and, to a lesser extent, CuPy's as well) is still incomplete, and in practice one needs to go through this compatibility layer (hopefully temporarily):

https://data-apis.org/array-api-compat/

New comment by ogrisel in "The Cheating Device (ChatGPT on a TI-84) [video]"

ogrisel — Fri, 20 Sep 2024 13:08:17 +0000

It's already very difficult to write good problem material for evaluations. Finding problems whose difficulty is intermediate for the target audience (not too easy, not too hard) but that are also too hard for LLMs would be very challenging, if not impossible, for most disciplines.

New comment by ogrisel in "Papermill: Parameterizing, executing, and analyzing Jupyter Notebooks"

ogrisel — Wed, 18 Sep 2024 17:13:19 +0000

With papermill you can parametrize a notebook and run it on different inputs to check that it is not raising uncaught exceptions. This can be wrapped to be part of a pytest test suite, possibly via some ad-hoc pytest fixture or plugin.

If the notebooks themselves contain assertions to check that expectations on the outputs are met, then you have an automated way to check that the notebooks behave the way you want on some test inputs. For long notebooks, this is more like integration/functional tests rather than unit tests, but I think this is already an improvement over manually run notebooks.
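A minimal sketch of such a wrapper could look like the following. The notebook name, the parameter sets and the helper names are all hypothetical. Only `papermill.execute_notebook` and its `parameters` argument come from papermill, which is imported lazily so that the module can be imported without it.

```python
from pathlib import Path

# Hypothetical notebook and parameter sets, for illustration only.
NOTEBOOK = Path("analysis.ipynb")
PARAMETER_SETS = [{"n_samples": 10}, {"n_samples": 1000}]


def output_path_for(notebook, index, out_dir):
    # One executed copy of the notebook per parameter set.
    return Path(out_dir) / f"{Path(notebook).stem}-case{index}.ipynb"


def check_notebook(notebook, parameter_sets, out_dir):
    import papermill as pm  # lazy import: only needed when actually executing

    for index, parameters in enumerate(parameter_sets):
        # Executes all the cells with the injected parameters; an exception
        # or a failed assertion in a cell makes this call raise.
        pm.execute_notebook(
            str(notebook),
            str(output_path_for(notebook, index, out_dir)),
            parameters=parameters,
        )
```

A pytest test function can then simply call `check_notebook(NOTEBOOK, PARAMETER_SETS, tmp_path)` using pytest's built-in `tmp_path` fixture.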

Not sure about strict types: do you mean running mypy on a notebook? Maybe this can be helpful:

- https://pypi.org/project/nb-mypy/

About linters, you can install `jupyterlab-lsp` and `python-lsp-ruff` together for instance.