<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: SnowflakeOnIce</title><link>https://news.ycombinator.com/user?id=SnowflakeOnIce</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 12 Apr 2026 16:21:34 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=SnowflakeOnIce" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[TruffleHog now detects public-key JWTs and verifies them for liveness]]></title><description><![CDATA[
<p>Article URL: <a href="https://trufflesecurity.com/blog/trufflehog-now-detects-jwts-with-public-key-signatures-and-verifies-them-for-liveness">https://trufflesecurity.com/blog/trufflehog-now-detects-jwts-with-public-key-signatures-and-verifies-them-for-liveness</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46294537">https://news.ycombinator.com/item?id=46294537</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 16 Dec 2025 21:12:18 +0000</pubDate><link>https://trufflesecurity.com/blog/trufflehog-now-detects-jwts-with-public-key-signatures-and-verifies-them-for-liveness</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=46294537</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46294537</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "GPUHammer: Rowhammer attacks on GPU memories are practical"]]></title><description><![CDATA[
<p>Suppose you have access to certain memory. If you repeatedly <i>read</i> from that memory, can't you still corrupt/alter the physically adjacent memory you <i>don't</i> have access to? Does it really need to be a write operation you repeatedly perform?</p>
]]></description><pubDate>Wed, 16 Jul 2025 11:36:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=44581055</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44581055</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44581055</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "GPUHammer: Rowhammer attacks on GPU memories are practical"]]></title><description><![CDATA[
<p>Example: A workstation or consumer GPU used both for rendering the desktop and running some GPGPU thing (like a deep neural network)</p>
]]></description><pubDate>Wed, 16 Jul 2025 02:50:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=44578211</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44578211</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44578211</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "GPUHammer: Rowhammer attacks on GPU memories are practical"]]></title><description><![CDATA[
<p>How is it a write-only attack vector?</p>
]]></description><pubDate>Wed, 16 Jul 2025 02:48:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44578197</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44578197</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44578197</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Strategies for Fast Lexers"]]></title><description><![CDATA[
<p>> Lexing, parsing and even type checking are interleaved in most C++ compilers due to the ambiguous nature of many constructs in the language.
>
> It is very hard to profile only one of these in isolation. And even with compiler built-in instrumentation, the results are not very representative of the work done behind the scenes.<p>Indeed, precise cost attribution is difficult or impossible, given how the nature of the language imposes structure on industrial compilers. But that aside, you still easily end up with hundreds of megabytes of source to deal with in each translation unit. I have so many scars from dealing with that...</p>
]]></description><pubDate>Tue, 15 Jul 2025 02:15:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44567241</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44567241</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44567241</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Strategies for Fast Lexers"]]></title><description><![CDATA[
<p>A simple hello world in C++ can pull in dozens of megabytes of header files.<p>Years back I worked at a C++ shop with a big codebase (hundreds of millions of LOC when you included vendored dependencies). Compile times there were sometimes dominated by parsing speed! Now, I don't remember the exact breakdown of lexing vs parsing, but I <i>did</i> look at it under a profiler.<p>It's very easy in C++ projects to structure your code such that you inadvertently cause hundreds of megabytes of sources to be parsed by each single #include. In such a case, lexing and parsing costs can dominate build times. Precompiled headers help, but not enough...</p>
]]></description><pubDate>Mon, 14 Jul 2025 19:30:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=44564311</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44564311</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44564311</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Introduction to the A* Algorithm (2014)"]]></title><description><![CDATA[
<p><a href="https://en.m.wikipedia.org/wiki/Simultaneous_localization_and_mapping" rel="nofollow">https://en.m.wikipedia.org/wiki/Simultaneous_localization_an...</a></p>
]]></description><pubDate>Wed, 18 Jun 2025 16:49:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=44311455</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44311455</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44311455</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "What happens when people don't understand how AI works"]]></title><description><![CDATA[
<p>> Well, the current generation of LLMs blow away that Turing Test<p>Maybe a weak version of Turing's test?<p>Passing the stronger one (from Turing's paper "Computing Machinery and Intelligence") involves an "average interrogator" being unable to distinguish between human and computer after 5 minutes of questioning more than 70% of the time. I've not seen this result published with today's LLMs.</p>
]]></description><pubDate>Mon, 09 Jun 2025 17:41:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44226947</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44226947</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44226947</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "The behavior of LLMs in hiring decisions: Systemic biases in candidate selection"]]></title><description><![CDATA[
<p>A lot of AI-based PDF processing renders the PDF as images and then works directly with that, rather than extracting text from the PDF programmatically. In such systems, text that was hidden for human view would also be hidden for the machine.<p>Though surely some AI systems do <i>not</i> use PDF image rendering first!</p>
]]></description><pubDate>Tue, 20 May 2025 13:11:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44041223</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=44041223</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44041223</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "The power of interning: making a time series database smaller"]]></title><description><![CDATA[
<p>"String interning" is discussed in the Java Language Spec, going back to the 1990s, and wasn't novel there. It goes back at least to Lisp implementations from the 1970s. Probably even earlier than that.</p>
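<p>A minimal illustration of the mechanism in Python, whose standard library exposes interning via <code>sys.intern</code> (the same idea the JLS describes for Java string literals):</p>

```python
import sys

# Build two equal strings at runtime so they start out as
# distinct objects rather than shared compile-time constants.
a = "".join(["metric", "_", "name"])
b = "".join(["metric", "_", "name"])
assert a == b

# Interning maps equal strings onto one canonical object, so later
# equality checks can be done by cheap identity (pointer) comparison.
ia, ib = sys.intern(a), sys.intern(b)
assert ia is ib
```

<p>This is the same trick a time series database plays when it stores each distinct label string once and refers to it by handle everywhere else.</p>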
]]></description><pubDate>Mon, 03 Mar 2025 20:33:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43246353</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=43246353</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43246353</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "The Secret That Colleges Should Stop Keeping"]]></title><description><![CDATA[
<p>Yes, this was my own experience!<p>When looking at universities, when I saw a high sticker price, I ignored that university, even if in hindsight I had a good chance of being accepted.<p>I wish I had had someone when I was young who encouraged me to have broader horizons.</p>
]]></description><pubDate>Sun, 23 Feb 2025 15:35:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=43149974</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=43149974</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43149974</guid></item><item><title><![CDATA[Show HN: Nosey Parker Explorer, a TUI for interactive review of secret exposures]]></title><description><![CDATA[
<p>Nosey Parker Explorer is an interactive TUI tool for reviewing possible exposed secrets detected by Nosey Parker [1], a fast secrets detector designed for offensive security (e.g., red team engagements). It makes it feasible for one person to triage thousands of findings in a few hours.<p>How did Nosey Parker Explorer come about? 2-3 years ago I was working on training ML models for tasks related to hardcoded secrets, such as TP/FP classification and detection. These are pretty specific tasks and there are no open-access datasets. So, I had to build my own dataset of several thousand labeled examples.<p>None of the existing tools for data labeling that I ran across was a good fit for the task. They were all either too general-purpose, too complicated to set up, or too slow. So I built my own purpose-built, proof-of-concept terminal labeling app, using the excellent Textual TUI framework [2] and DuckDB [3] for fast faceted search. A couple weeks later I had Nosey Parker Explorer.<p>Nosey Parker Explorer proved very effective at the dataset construction task. My team of a few people used it to label about 15k examples collected from 2TB of input.<p>However, beyond dataset creation, Nosey Parker Explorer was a huge boon for security engineers on engagements. Once you are dealing with more than a few dozen possible findings, it is _hugely_ helpful to be able to interactively slice-and-dice the data. The largest-scale use of it that I'm aware of was an assumed-breach engagement where we had tens of thousands of potential findings from 20TB of scanned inputs.<p>Nosey Parker Explorer was far too useful to be left as a proof-of-concept or only used internally. A couple weeks ago I released it under the Apache 2 license.<p>If you want to try it, it's a Python application (not yet on PyPI). There are prebuilt zipapp releases, but you can also clone the repository and `pip install` it. You will first need to use Nosey Parker to scan something.
See the project's README for details.<p>Happy to answer questions.<p>[1] Nosey Parker: <a href="https://github.com/praetorian-inc/noseyparker">https://github.com/praetorian-inc/noseyparker</a><p>[2] Textual: <a href="https://textual.textualize.io" rel="nofollow">https://textual.textualize.io</a><p>[3] DuckDB: <a href="https://duckdb.org" rel="nofollow">https://duckdb.org</a></p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43090633">https://news.ycombinator.com/item?id=43090633</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 18 Feb 2025 15:30:11 +0000</pubDate><link>https://github.com/praetorian-inc/noseyparkerexplorer</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=43090633</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43090633</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "You can use C-Reduce for any language"]]></title><description><![CDATA[
<p>About a dozen OSes / configurations were supported, and the entire test suite would take a few days to run on each such configuration.<p>This was native desktop+server software, not a hosted SaaS thing.<p>Major releases were put out every few months.<p>Developers <i>did</i> run tests regularly with every change they would make, but it was infeasible to run all the tests in every configuration for each change. So they would try to choose tests to run that seemed most relevant to the code they had changed.<p>The entire test suite would run more or less constantly on shared machines, and every few days some new tricky failure would be detected. The tricky failures were almost always the result of some unanticipated interaction of features, frequently on the more obscure configurations (like IBM z/OS).<p>The problem was not that developers were not testing, but that the tests were infeasible to run every time. So instead, testing became an optimization problem.</p>
]]></description><pubDate>Thu, 28 Nov 2024 14:35:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=42265547</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=42265547</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42265547</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "You can use C-Reduce for any language"]]></title><description><![CDATA[
<p>At a previous job we had a huge C++ codebase, ~100 developers over 20 years, and many supported platforms. A build could take hours, and the test suite took so long to run that it would take several days or weeks of real time to get through it all.<p>This cycle time, combined with occasional unexpected interactions between components, meant that in every release cycle there were dozens of complicated failing tests where it was not obvious which code change was responsible.<p>`bisect` here was extremely helpful: instead of having to pore over commit history and think very hard, I could bisect with a small wrapper script that would build and run the failing test in question. Builds still took hours, but I could usually automatically pinpoint the responsible code change for one of the tests overnight.<p>(This was not using Git, but Perforce, for which I had to write `p4-bisect`. Alas, it's not open-source...)</p>
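<p>The core of such a tool is just a binary search over changelist numbers; this is a toy sketch (not the actual `p4-bisect`), where the real <code>is_bad</code> predicate would sync the workspace, rebuild, and run the failing test:</p>

```python
def first_bad(changes, is_bad):
    """Return the first change for which is_bad(change) holds.

    Assumes the history is monotone: every change before the culprit
    passes the test, and every change from the culprit onward fails.
    Like `git bisect`, this needs only O(log n) build-and-test runs.
    """
    lo, hi = 0, len(changes) - 1
    assert is_bad(changes[hi]), "newest change must reproduce the failure"
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(changes[mid]):
            hi = mid       # culprit is at mid or earlier
        else:
            lo = mid + 1   # culprit is strictly after mid
    return changes[lo]

# Stub predicate standing in for "sync, build, run the failing test".
culprit = first_bad(list(range(100, 200)), lambda cl: cl >= 173)
```

<p>With hours-long builds, the log-factor is what makes overnight automated pinpointing possible: ~7 builds instead of ~100.</p>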
]]></description><pubDate>Thu, 28 Nov 2024 05:35:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=42262736</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=42262736</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42262736</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Greppability is an underrated code metric"]]></title><description><![CDATA[
<p>When doing appsec review in C or C++, yes!</p>
]]></description><pubDate>Tue, 03 Sep 2024 12:38:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=41434378</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=41434378</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41434378</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "GPU utilization can be a misleading metric"]]></title><description><![CDATA[
<p>> you can get 100% GPU utilization by just reading/writing to memory while doing 0 computations<p>Indeed! Utilization is a proxy for what you actually want (which is good use of available hardware). 100% GPU utilization doesn't actually indicate this.<p>On the other hand, if you <i>aren't</i> getting 100% GPU utilization, you aren't making good use of the hardware.</p>
]]></description><pubDate>Thu, 22 Aug 2024 17:51:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=41322732</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=41322732</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41322732</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Classifying all of the pdfs on the internet"]]></title><description><![CDATA[
<p>Common Crawl only pulls documents smaller than a fairly small size limit (1 MiB, last I checked). Without special handling in this project, documents bigger than that would be missing.<p>So indeed, not representative of the whole Internet.</p>
]]></description><pubDate>Mon, 19 Aug 2024 16:04:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=41292187</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=41292187</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41292187</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Anyone can access deleted and private repository data on GitHub"]]></title><description><![CDATA[
<p>'git clone --mirror' seems to pull down lots of additional content also.</p>
]]></description><pubDate>Wed, 24 Jul 2024 22:13:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41062754</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=41062754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41062754</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Anyone can access deleted and private repository data on GitHub"]]></title><description><![CDATA[
<p>There seems to be no such thing as a "private fork" on GitHub in 2024 [1]:<p>> A fork is a new repository that shares code and visibility settings with the upstream repository. All forks of public repositories are public. You cannot change the visibility of a fork.<p>[1] <a href="https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/about-permissions-and-visibility-of-forks#about-visibility-of-forks" rel="nofollow">https://docs.github.com/en/pull-requests/collaborating-with-...</a></p>
]]></description><pubDate>Wed, 24 Jul 2024 22:04:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=41062689</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=41062689</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41062689</guid></item><item><title><![CDATA[New comment by SnowflakeOnIce in "Ask HN: Fast data structures for disjoint intervals?"]]></title><description><![CDATA[
<p>If your ranges end up sparsely distributed, using roaring bitmaps can speed things up a lot.<p><a href="https://roaringbitmap.org/" rel="nofollow">https://roaringbitmap.org/</a></p>
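<p>A toy sketch of the core roaring idea (not the real library's data structures): split each 32-bit value into a 16-bit container key (high bits) and a 16-bit low part, so sparse data touches only a few small containers. Real implementations such as CRoaring also switch each container between array, bitmap, and run encodings; this sketch just uses Python sets.</p>

```python
from collections import defaultdict

class ToyRoaring:
    """Toy roaring-style set of 32-bit integers: one small container
    of 16-bit low parts per 16-bit high-part key."""

    def __init__(self):
        self.containers = defaultdict(set)

    def add_range(self, start, stop):
        # Real roaring adds whole ranges in O(containers), not O(n);
        # the per-value loop here is just for clarity.
        for v in range(start, stop):
            self.containers[v >> 16].add(v & 0xFFFF)

    def __contains__(self, v):
        return (v & 0xFFFF) in self.containers.get(v >> 16, ())

    def __len__(self):
        return sum(len(c) for c in self.containers.values())

bm = ToyRoaring()
bm.add_range(10, 20)                  # lands in container 0
bm.add_range(1_000_000, 1_000_005)    # lands in container 15
assert 12 in bm and 999_999 not in bm
```

<p>The win for disjoint intervals is that two far-apart ranges cost two tiny containers rather than one enormous bitmap spanning the gap.</p>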
]]></description><pubDate>Wed, 24 Jul 2024 03:09:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=41053298</link><dc:creator>SnowflakeOnIce</dc:creator><comments>https://news.ycombinator.com/item?id=41053298</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41053298</guid></item></channel></rss>