Hacker News: mattbit

Confidence Sets, Confidence Intervals

mattbit — Fri, 12 Jun 2026 16:11:50 +0000

Article URL: https://bactra.org/notebooks/confidence-sets.html

Comments URL: https://news.ycombinator.com/item?id=48505947

Points: 1

# Comments: 0

Ear Training Practice

mattbit — Mon, 08 Jun 2026 16:38:07 +0000

Article URL: https://tonedear.com/

Comments URL: https://news.ycombinator.com/item?id=48447598

Points: 331

# Comments: 128

Recommended Mystery Novels

mattbit — Sat, 06 Jun 2026 06:18:16 +0000

Article URL: https://bactra.org/notebooks/mystery-recs.html

Comments URL: https://news.ycombinator.com/item?id=48421936

Points: 4

# Comments: 0

StereoTales: Multilingual Open-Ended Stereotype Discovery in LLMs

mattbit — Thu, 04 Jun 2026 09:10:46 +0000

Article URL: https://research.giskard.ai/blog/stereotales/

Comments URL: https://news.ycombinator.com/item?id=48396096

Points: 2

# Comments: 1

OpenClaw security vulnerabilities include data leakage, prompt injection risks

mattbit — Mon, 09 Feb 2026 09:28:38 +0000

Article URL: https://www.giskard.ai/knowledge/openclaw-security-vulnerabilities-include-data-leakage-and-prompt-injection-risks

Comments URL: https://news.ycombinator.com/item?id=46943350

Points: 2

# Comments: 1

Redis LangCache

mattbit — Thu, 23 Oct 2025 09:58:36 +0000

Article URL: https://redis.io/langcache/

Comments URL: https://news.ycombinator.com/item?id=45680135

Points: 3

# Comments: 0

NLP Models Think about Gender Stereotypes

mattbit — Mon, 05 Feb 2024 18:59:23 +0000

Article URL: https://www.opensamizdat.com/posts/gest/

Comments URL: https://news.ycombinator.com/item?id=39265369

Points: 1

# Comments: 0

New comment by mattbit in "Purple Llama: Towards open trust and safety in generative AI"

mattbit — Thu, 07 Dec 2023 23:01:15 +0000

In my experience, prompt injection is not a primary concern in the majority of real-world LLM applications.

The systems that I see most commonly deployed in practice are chatbots that use retrieval-augmented generation. These chatbots are typically very constrained: they can't use the internet, they can't execute tools, and essentially just serve as an interface to non-confidential knowledge bases.

While abuse through prompt injection is possible, its impact is limited. Leaking the prompt is simply uninteresting. Hijacking the system to freeload on the LLM could happen, but it is easily addressed with rate limiting or other relatively simple techniques.
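As a rough illustration of the rate-limiting point, here is a minimal token-bucket sketch. The class name, the per-user dictionary, and the capacity and rate values are all assumptions made up for this example, not taken from any particular deployment.

```python
import time


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock=time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to the elapsed time, capped at the capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


buckets = {}  # one bucket per user id (hypothetical keying scheme)


def allow_request(user_id: str) -> bool:
    """Return True if this user may send another message to the chatbot."""
    bucket = buckets.setdefault(user_id, TokenBucket(capacity=5, rate=0.5))
    return bucket.allow()


results = [allow_request("user-42") for _ in range(7)]
print(results)
```

A freeloader who sends requests in a tight loop quickly drains the bucket and then gets rejected until tokens refill, while a normal user rarely notices the limit.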

In many cases, it is much more dangerous for a company if its chatbot produces toxic, wrong, or inappropriate answers. Think of an e-commerce chatbot that gives false information about refund conditions, or an educational bot that starts exposing children to violent content. These situations can be hugely problematic from a legal and reputational standpoint.

The fact that some nerd, with some crafty and intricate prompts, intentionally manages to get some weird answer out of the LLM is almost always secondary with respect to the above issues.

However, I think the criticism is legitimate: one reason we are limited to such dumb applications of LLMs is precisely that we have not solved prompt injection, and deploying a more powerful LLM-based system would be too risky. Solving that issue could unlock a lot of the currently unexploited potential of LLMs.

New comment by mattbit in "The Dunning-Kruger effect is autocorrelation"

mattbit — Sun, 26 Nov 2023 09:16:48 +0000

This is how McIntosh & Della Sala put it:

> in the academic literature, it has been suggested that the signature pattern of the DKE (Figure 1A) might be nothing more than a statistical artefact. In a typical study, people’s tendencies to under- or overestimation are analysed as a function of their ability for the task. This involves a ‘double dipping’ into the data because the task performance score is used once to rank people for ability, and then again to determine whether the self-estimate is an under- or over-estimate. This dubious double-dipping makes the analysis prone to a slippery statistical phenomenon called ‘regression to the mean’.
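The double dipping described in the quote can be illustrated with a small simulation. This is only a sketch under made-up assumptions: ability, test noise, and self-assessment noise are all standard normal, and nobody's self-estimate is biased. The overestimation at the bottom and underestimation at the top then come purely from ranking people by a noisy score and comparing against that same score.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is reproducible

n = 10_000
people = []
for _ in range(n):
    ability = random.gauss(0.0, 1.0)
    test_score = ability + random.gauss(0.0, 1.0)     # noisy measurement of ability
    self_estimate = ability + random.gauss(0.0, 1.0)  # equally noisy, unbiased self-view
    people.append((test_score, self_estimate))

# First dip: rank people by the test score and split them into quartiles.
people.sort(key=lambda p: p[0])
quartile_size = n // 4

gaps = []
for q in range(4):
    group = people[q * quartile_size:(q + 1) * quartile_size]
    # Second dip: compare each self-estimate with the same test score.
    gap = statistics.mean(est - score for score, est in group)
    gaps.append(gap)
    print(f"quartile {q + 1}: mean(self-estimate - score) = {gap:+.2f}")
```

The bottom quartile appears to overestimate itself and the top quartile to underestimate itself, even though no such tendency was built into the simulated people.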

New comment by mattbit in "The Dunning-Kruger effect is autocorrelation"

mattbit — Sun, 26 Nov 2023 09:11:08 +0000

This is not ‘autocorrelation’, it is regression to the mean. I find the article unclear and imprecise. For those interested in a better overview of the Dunning–Kruger effect, I recommend this short article by McIntosh & Della Sala instead:

https://www.bps.org.uk/psychologist/persistent-irony-dunning...

OWASP Top 10 for LLM Applications

mattbit — Mon, 06 Nov 2023 15:43:11 +0000

Article URL: https://llmtop10.com/

Comments URL: https://news.ycombinator.com/item?id=38164057

Points: 2

# Comments: 0

New comment by mattbit in "Adding water to Martian soil samples might have been a bad idea"

mattbit — Fri, 25 Aug 2023 05:18:56 +0000

There are actually two ways of seeing this: the three-domain system and the two-domain system.

The three-domain system divides life into Archaea, Bacteria, and Eukarya. In this system, Archaea and Bacteria can be grouped together as prokaryotes.

In the two-domain system, the division is between Archaea and Bacteria. In this case, eukaryotes are seen as a subgroup of Archaea.

I hope this clears up some of the confusion.

Don’t you (forget NLP): Prompt injection with control characters in ChatGPT

mattbit — Mon, 07 Aug 2023 16:48:04 +0000

Article URL: https://dropbox.tech/machine-learning/prompt-injection-with-control-characters-openai-chatgpt-llm

Comments URL: https://news.ycombinator.com/item?id=37037021

Points: 1

# Comments: 0

New comment by mattbit in "Show HN: Giskard – Testing framework dedicated to LLMs and Tabular ML models"

mattbit — Fri, 16 Jun 2023 14:31:16 +0000

Hey, Giskard team member here! I am around to discuss and read your feedback.

I’ve worked in particular on automatic scanning of ML models for bugs and problems. The idea was to systematically scan for general issues and automatically find segments of data on which the model performs worse than average. If you have questions, I am happy to discuss here.
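To give a rough idea of what "segments of data on which the model performs worse than average" means, here is a toy sketch in plain Python. It is not the Giskard API: the records, the `underperforming_slices` helper, and the 0.1 margin are all hypothetical, and a real scanner would search over many features and use statistical tests rather than a fixed threshold.

```python
from collections import defaultdict

# Hypothetical evaluation records: one categorical feature plus whether the
# model's prediction was correct on that row.
records = [
    {"region": "north", "correct": True},
    {"region": "north", "correct": True},
    {"region": "north", "correct": True},
    {"region": "north", "correct": False},
    {"region": "south", "correct": True},
    {"region": "south", "correct": False},
    {"region": "south", "correct": False},
    {"region": "south", "correct": False},
]


def underperforming_slices(rows, feature, margin=0.1):
    """Return {slice_value: accuracy} for slices whose accuracy falls more
    than `margin` below the overall accuracy."""
    overall = sum(r["correct"] for r in rows) / len(rows)
    by_value = defaultdict(list)
    for r in rows:
        by_value[r[feature]].append(r["correct"])
    flagged = {}
    for value, outcomes in by_value.items():
        accuracy = sum(outcomes) / len(outcomes)
        if accuracy < overall - margin:
            flagged[value] = accuracy
    return flagged


flagged = underperforming_slices(records, "region")
print(flagged)
```

Here the overall accuracy is 0.5, so only the "south" slice (accuracy 0.25) is flagged.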

Show HN: Python library to scan ML models for vulnerabilities

mattbit — Tue, 13 Jun 2023 12:20:21 +0000

Hi! I’ve been working on this automatic scanner for ML models to detect issues like underperforming data slices, overconfidence in predictions, robustness problems, and others. It supports all main Python ML frameworks (sklearn, torch, xgboost, …) and integrates with the quality assurance solution we are building at Giskard AI (https://giskard.ai) to systematically test models before putting them in production.

It is still in beta, and I would love to hear your feedback if you have the time to try it out.

We have quite a few tutorials in the docs with ready-made colab notebooks to make it easy to get started.

If you are interested in the code:

https://github.com/Giskard-AI/giskard/tree/main/python-clien...

Comments URL: https://news.ycombinator.com/item?id=36309166

Points: 20

# Comments: 1

New comment by mattbit in "SafeGPT: New tool to detect LLMs' hallucinations, biases and privacy issues"

mattbit — Fri, 21 Apr 2023 16:16:03 +0000

Not exactly: metamorphic testing does not need an oracle. That’s actually the reason for its popularity in ML testing. It works by perturbing the input in a way that will produce a predictable variation of the output (or possibly no variation).

Take for example a credit scoring model: you can reasonably expect that if you increase the liquidity, the credit score should not decrease. In general it is relatively easy to come up with a set of assumptions on the effect of perturbation, which allows evaluating the robustness of a model without knowing the exact ground truth.
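The credit scoring example could be sketched roughly as follows. The `toy_credit_score` model, the feature names, and the size of the perturbation are hypothetical stand-ins. The point is only that the check compares the model with itself under a perturbation and never needs the true score.

```python
from typing import Callable, Dict, List

Applicant = Dict[str, float]


def toy_credit_score(applicant: Applicant) -> float:
    """Hypothetical stand-in model: higher liquidity and income raise the score."""
    return 300.0 + 0.002 * applicant["liquidity"] + 0.001 * applicant["income"]


def check_liquidity_monotonicity(
    model: Callable[[Applicant], float],
    applicants: List[Applicant],
    increase: float = 1_000.0,
) -> List[Applicant]:
    """Return the applicants for which raising liquidity lowered the score."""
    violations = []
    for applicant in applicants:
        # Metamorphic relation: more liquidity must not decrease the score.
        perturbed = {**applicant, "liquidity": applicant["liquidity"] + increase}
        if model(perturbed) < model(applicant):
            violations.append(applicant)
    return violations


applicants = [
    {"liquidity": 500.0, "income": 30_000.0},
    {"liquidity": 12_000.0, "income": 55_000.0},
]
violations = check_liquidity_monotonicity(toy_credit_score, applicants)
print(f"{len(violations)} violation(s) out of {len(applicants)} applicants")
```

Any applicant returned by the check is a concrete counterexample to the assumed relation, found without any labelled ground truth.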

New comment by mattbit in "The future of deep learning is photonic"

mattbit — Mon, 05 Jul 2021 20:43:42 +0000

Yeah, LightOn has been doing this for years already, so it's kind of strange that there was no mention of them in the article. I know them because their offices are close to the research center where I work (in Paris, France). If I'm not wrong, they were planning to offer their optical processor as a cloud service too.

New comment by mattbit in "How much math you need for programming (2014)"

mattbit — Sat, 26 Dec 2020 11:27:24 +0000

> Now what does does a mathematician do? He tries to understand nature and uses mathematics as a language to do that.

I would argue that this is wrong. That's what physicists do, not mathematicians. Mathematics is about abstract ideas, which can live regardless of nature or application. Physics instead is about understanding nature. Most physicists use mathematics to do that, but that's just for practical reasons. They don't always take it for granted: there's a very famous article by Eugene Wigner on this, “The Unreasonable Effectiveness of Mathematics in the Natural Sciences”.

I think it's important to understand this. Sure, computer programming is not math. Physics is not math either. Mathematics is kind of a way of thinking, and mathematical language turns out to be very useful in describing and understanding nature and many other things. Computer programming theory stems from mathematics, but I agree that everyday programming practice does not strictly require in-depth math knowledge.

But it depends. One day you wake up and you want to solve a problem: sometimes you need programming, sometimes you need math, sometimes both, or maybe you need some business experience, psychology, whatever. We need different perspectives, I don't think we can compartmentalize these things any more.

New comment by mattbit in "Chicago Police Department shuts down its arrest API"

mattbit — Thu, 09 Jul 2020 20:29:14 +0000

“Obtaining structured data from this system is costly and time-consuming due to the need to tell the system you’re not a robot every few searches. It also doesn’t include some key categories of information formerly available via the API, including the primary charge, bond hearing dates, IR number, and the FBI code associated with the type of arrest.”

New comment by mattbit in "An update on uBlock"

mattbit — Sat, 28 Jul 2018 18:12:32 +0000

It seems like they are just trying to steal the name of uBlock. The AdBlock CEO says they “do love the name” [0] and are “investing heavily” into the product, but the commits in the repo [1] are just cosmetic changes and rebranding. Oddly enough, the only active committer is anonymous.

I wonder if this will cause legal trouble for the true uBlock Origin.

[0]: https://twitter.com/judemaier/status/1020034358558670848

[1]: https://github.com/uBlock-LLC/uBlock/commits/master