Hacker News: 0d0a

New comment by 0d0a in "Zlib visualizer"

0d0a — Mon, 29 Sep 2025 06:49:14 +0000

Exactly, it misses out on explaining how the fixed Huffman table is used to decode literal/length and distance codes, or how dynamic tables are derived from the input itself. Sure, it's the hardest part, but it's also the most interesting to visualize. As another commenter pointed out, we are just left with mysterious bit sequences for these codes.
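For the fixed table, the mapping is mechanical. RFC 1951 gives a code length per symbol, and the codes are then assigned canonically from those lengths. Below is a minimal Python sketch of that canonical assignment (RFC 1951, section 3.2.2), applied to the fixed lengths from section 3.2.6. It shows where those "mysterious" bit sequences come from. The helper names are my own and are not taken from the visualizer.

```python
# Code lengths of the fixed literal/length alphabet (RFC 1951, section 3.2.6):
# 0-143 -> 8 bits, 144-255 -> 9 bits, 256-279 -> 7 bits, 280-287 -> 8 bits.
FIXED_LITLEN_LENGTHS = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8
# Fixed distance codes are plain 5-bit codes.
FIXED_DIST_LENGTHS = [5] * 30


def canonical_codes(lengths):
    """Map symbol -> (code, bit_length) using the canonical rule of RFC 1951, 3.2.2."""
    max_bits = max(lengths)
    # Count how many codes exist for each bit length (length 0 = unused symbol).
    bl_count = [0] * (max_bits + 1)
    for length in lengths:
        if length:
            bl_count[length] += 1
    # Smallest code value for each bit length.
    next_code = [0] * (max_bits + 1)
    code = 0
    for bits in range(1, max_bits + 1):
        code = (code + bl_count[bits - 1]) << 1
        next_code[bits] = code
    # Hand out consecutive codes to symbols of equal length, in symbol order.
    codes = {}
    for symbol, length in enumerate(lengths):
        if length:
            codes[symbol] = (next_code[length], length)
            next_code[length] += 1
    return codes


def show(codes, symbol):
    """Render a code as bits in the order the decoder reads them.

    Huffman codes are packed starting from their most significant bit.
    """
    code, length = codes[symbol]
    return format(code, f"0{length}b")


if __name__ == "__main__":
    litlen = canonical_codes(FIXED_LITLEN_LENGTHS)
    print(show(litlen, ord("A")))  # 01110001 (literal 'A')
    print(show(litlen, 256))       # 0000000 (end-of-block)
```

Dynamic blocks use the same canonical rule. The only difference is that the code lengths are read from the block header instead of being fixed.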

It would be cool if we could supply our own Huffman table and see how that affects the stream itself. We might want to put our text right there! https://github.com/nevesnunes/deflate-frolicking?tab=readme-...

New comment by 0d0a in "Dissecting the gzip format (2011)"

0d0a — Wed, 18 Sep 2024 05:14:31 +0000

Even if it doesn't use block-based compression, corruption offsets are usually identifiable as long as there isn't a huge range of corrupted bytes, since you will quickly end up with invalid length-distance pairs and similar errors. However, errors might be reported a few bytes after the actual corruption.

I was motivated some years ago to try recovering from these errors [1] when I was handling a DEFLATE-compressed JSON file, where there seemed to be a single corrupted byte every dozen or so bytes in the stream. It looked like something you could recover from. If you output decompressed bytes as the stream was parsed, you could clearly see a prefix of the original JSON being recovered up to the first corruption.
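One rough way to reproduce that experiment is with Python's standard `zlib` module. The sketch below feeds the stream one byte at a time, records where the decompressor first complains, and keeps whatever was inflated up to that point. It assumes a zlib-wrapped stream (header plus Adler-32 trailer); for raw DEFLATE, pass `wbits=-15`. The reported offset is never before the damaged byte, but it can land some bytes after it, and the tail of the output may already contain garbage. `locate_corruption` is a hypothetical helper name.

```python
import zlib


def locate_corruption(stream: bytes, wbits: int = zlib.MAX_WBITS):
    """Feed a compressed stream to zlib one byte at a time.

    Returns (error_offset, output). error_offset is the index of the input byte
    whose arrival made zlib raise, or None if no error surfaced. output is
    everything inflated before that point.
    """
    inflater = zlib.decompressobj(wbits)
    output = bytearray()
    for offset in range(len(stream)):
        try:
            output += inflater.decompress(stream[offset:offset + 1])
        except zlib.error:
            return offset, bytes(output)
    return None, bytes(output)


if __name__ == "__main__":
    original = str(list(range(300))).encode()
    damaged = bytearray(zlib.compress(original))
    damaged[len(damaged) // 2] ^= 0xFF  # flip every bit of one byte
    offset, prefix = locate_corruption(bytes(damaged))
    print(offset, prefix[:60])
```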

In that case the decompressed payload was plaintext, but even with a binary format, something like kaitai-struct might give you an invalid offset to work from.

For these localized corruptions, it's possible to just brute-force one or two bytes along this range and reliably fix the DEFLATE stream. That's not really doable once we are talking about a sequence of four or more corrupted bytes.

[1]: https://github.com/nevesnunes/deflate-frolicking
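Here is a minimal sketch of the single-byte case. It assumes a zlib-wrapped stream, so the Adler-32 trailer rejects nearly all wrong guesses, and it adds a JSON parse as an extra sanity check. Both checks are my assumptions; a raw DEFLATE stream would need a stronger check on the payload. `repair_single_byte` and its window arguments are hypothetical names.

```python
import json
import zlib


def decodes_cleanly(stream: bytes) -> bool:
    """Assumed validity check.

    zlib inflates the stream, the Adler-32 trailer matches, and the payload
    parses as JSON.
    """
    try:
        json.loads(zlib.decompress(stream))
    except (zlib.error, ValueError):  # JSONDecodeError/UnicodeDecodeError are ValueErrors
        return False
    return True


def repair_single_byte(stream: bytes, start: int, end: int):
    """Try all 256 values for each byte in stream[start:end], one byte at a time.

    Returns the first repaired stream that passes decodes_cleanly(), or None.
    """
    if decodes_cleanly(stream):
        return stream
    candidate = bytearray(stream)
    for pos in range(max(start, 0), min(end, len(stream))):
        saved = candidate[pos]
        for value in range(256):
            if value == saved:
                continue
            candidate[pos] = value
            if decodes_cleanly(bytes(candidate)):
                return bytes(candidate)
        candidate[pos] = saved  # restore before moving to the next position
    return None
```

Each extra damaged byte multiplies the search by 256. Two bytes at known positions take 65,536 attempts, while four take about 4.3 billion, which is where this stops being practical.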

Show HN: Z80 Sans

0d0a — Sun, 25 Aug 2024 21:58:05 +0000

Article URL: https://github.com/nevesnunes/z80-sans

Comments URL: https://news.ycombinator.com/item?id=41351829

Points: 65

# Comments: 13

New comment by 0d0a in "How fast are Linux pipes anyway? (2022)"

0d0a — Fri, 06 Oct 2023 07:47:34 +0000

Can you give some examples? When batching data, you benefit from picking something like io_uring. But for two-way communication, you still need to notify either side when data is ready (maybe you don't want to consume CPU just polling), and it isn't clear to me how those options handle that synchronization faster than pipes.
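To make the comparison concrete, here is a minimal sketch of the pipe-based baseline. Two endpoints bounce a byte back and forth, and each `os.read()` blocks in the kernel until the peer writes. That way the pipe doubles as the "data is ready" notification without spinning on the CPU. Threads stand in for two processes only to keep the example self-contained; `ping_pong` is a hypothetical name.

```python
import os
import threading


def ping_pong(rounds: int) -> int:
    """Bounce one byte between two threads over a pair of pipes.

    Returns how many round trips came back intact.
    """
    to_child_r, to_child_w = os.pipe()
    to_parent_r, to_parent_w = os.pipe()

    def echo() -> None:
        for _ in range(rounds):
            msg = os.read(to_child_r, 1)  # sleeps until the other side writes
            os.write(to_parent_w, msg)    # wakes the other side up

    worker = threading.Thread(target=echo)
    worker.start()
    received = 0
    for i in range(rounds):
        token = bytes([i % 256])
        os.write(to_child_w, token)
        if os.read(to_parent_r, 1) == token:  # blocks, no busy polling
            received += 1
    worker.join()
    for fd in (to_child_r, to_child_w, to_parent_r, to_parent_w):
        os.close(fd)
    return received
```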

New comment by 0d0a in "Show HN: Ghidra Plays Mario"

0d0a — Sat, 16 Sep 2023 18:57:14 +0000

Nice, I'll give it a closer look. My only concern so far is memory hooking (still needed for hardware registers), which on the Java side was handled by FilteredMemoryState [1]. In memstate.cc, it looks like only the simpler MemoryState is implemented [2], and there's no equivalent to MemoryAccessFilter. But it might not be that complicated to add...

[1]: https://github.com/NationalSecurityAgency/ghidra/blob/4561e8...

[2]: https://github.com/NationalSecurityAgency/ghidra/blob/4561e8...
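Here, memory hooking means intercepting reads and writes to specific addresses, such as hardware registers, instead of letting them hit plain RAM. The sketch below is a hypothetical Python model of that behaviour. It is not Ghidra's `MemoryState` or `MemoryAccessFilter` API, and the class name and the $4016 (NES controller port) address in the usage are illustrative only.

```python
from typing import Callable, Dict, Optional

ReadHook = Callable[[int], int]
WriteHook = Callable[[int, int], None]


class FilteredMemory:
    """Hypothetical sketch, not Ghidra's API: flat RAM plus per-address hooks."""

    def __init__(self, size: int):
        self.ram = bytearray(size)
        self.read_hooks: Dict[int, ReadHook] = {}
        self.write_hooks: Dict[int, WriteHook] = {}

    def hook(self, address: int, on_read: Optional[ReadHook] = None,
             on_write: Optional[WriteHook] = None) -> None:
        """Route accesses to `address` (e.g. a hardware register) to callbacks."""
        if on_read is not None:
            self.read_hooks[address] = on_read
        if on_write is not None:
            self.write_hooks[address] = on_write

    def read(self, address: int) -> int:
        hook = self.read_hooks.get(address)
        return hook(address) if hook is not None else self.ram[address]

    def write(self, address: int, value: int) -> None:
        hook = self.write_hooks.get(address)
        if hook is not None:
            hook(address, value)  # e.g. forward the write to another emulator
        else:
            self.ram[address] = value & 0xFF
```

A usage example: `mem.hook(0x4016, on_read=lambda a: 0x41)` makes CPU reads of the controller port come from the callback, while every other address still behaves like RAM.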

New comment by 0d0a in "Show HN: Ghidra Plays Mario"

0d0a — Mon, 11 Sep 2023 20:37:19 +0000

I think it's closer to the 2A03. Unless I missed something, there isn't any support implemented for binary-coded decimal mode.

New comment by 0d0a in "Show HN: Ghidra Plays Mario"

0d0a — Mon, 11 Sep 2023 20:33:57 +0000

Thanks, but I think I'm going to disappoint you: the demo is using pre-recorded manual inputs, which are then replayed when emulating in Ghidra. The only logic involved is checking when we are at the right instruction to then send the input. I mentioned it briefly in the README but maybe I wasn't very clear, sorry!

New comment by 0d0a in "Show HN: Ghidra Plays Mario"

0d0a — Mon, 11 Sep 2023 20:28:03 +0000

From what I've seen, it's usually read at the vblank interrupt.

The input recording has entries in the format "<instruction count> <input value>". If I press a button and it's read from the hardware register after, let's say, 0x1000 instructions have been stepped, it is stored as "0x1000 0x80". In the Ghidra emulator script, I only need to count up to 0x1000 instructions before I send that memory write to the other emulator. While the real timings are vastly different, the input will be read after roughly the same number of vblank calls. I say "roughly" because I did find a discrepancy in the call where it was expected to be read, but it isn't yet clear if that's a logic bug on my side. I'll eventually have to look into it again.
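The replay logic can be sketched in a few lines. This sketch assumes the recording is a text file of lines pairing an instruction count with a value, as in the "0x1000 0x80" example. `parse_recording` and `InputReplayer` are hypothetical names, and actually forwarding the value as a memory write to the other emulator is left to the caller.

```python
from typing import Iterable, List, Tuple


def parse_recording(lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Parse '<instruction count> <value>' lines, e.g. '0x1000 0x80'."""
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        count, value = line.split()
        entries.append((int(count, 0), int(value, 0)))  # base 0 accepts 0x...
    return sorted(entries)


class InputReplayer:
    """Hypothetical replayer: counts stepped instructions and reports which
    recorded values are due, so they can be forwarded as memory writes."""

    def __init__(self, entries: List[Tuple[int, int]]):
        self.entries = entries
        self.next_index = 0
        self.steps = 0

    def step(self) -> List[int]:
        """Call once per emulated instruction; returns values due at this step."""
        self.steps += 1
        due = []
        while (self.next_index < len(self.entries)
               and self.entries[self.next_index][0] <= self.steps):
            due.append(self.entries[self.next_index][1])
            self.next_index += 1
        return due
```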

New comment by 0d0a in "Show HN: Ghidra Plays Mario"

0d0a — Mon, 11 Sep 2023 20:04:03 +0000

Nice to see another CTF enjoyer :) I've always thought about using Ghidra for VM challenges, but I'm still not sure if it fits the typical timeframe. Although I've never used it, something like binja seems better suited to quick-and-dirty scripting.

About custom pcodeops, yeah I was really tempted to use them for TLCS-900. For example, instruction `daa` adjusts the execution result of an add or subtract as binary-coded decimal, and the pcode for that is just inglorious (but I'm sure there's worse out there): https://github.com/nevesnunes/ghidra-tlcs900h/blob/5ff4eb851...

Pretty amusing how a single instruction takes more than a dozen lines in the decompilation: https://gist.github.com/nevesnunes/7417e8bec2cddfcaf8d7653c9...
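To show why the p-code gets long, here is a simplified Python model of decimal adjust after an 8-bit BCD addition. It follows the Z80-style rule of adding 0x06 and/or 0x60 depending on the nibbles and carries. This is an assumption for illustration only: it does not reproduce TLCS-900 `daa` exactly, and it skips subtraction and the other flags a real CPU updates.

```python
def daa_after_add(a: int, half_carry: bool, carry: bool):
    """Z80-style decimal adjust after an 8-bit BCD addition (simplified sketch).

    Returns (adjusted_a, carry_out). Subtraction and the remaining flags are
    deliberately left out.
    """
    correction = 0
    carry_out = carry
    if half_carry or (a & 0x0F) > 9:
        correction |= 0x06  # fix the low digit
    if carry or a > 0x99:
        correction |= 0x60  # fix the high digit, decimal carry out
        carry_out = True
    return (a + correction) & 0xFF, carry_out


def bcd_add(x: int, y: int):
    """Add two packed-BCD bytes like a CPU would: binary add, then adjust."""
    raw = x + y
    half_carry = ((x & 0x0F) + (y & 0x0F)) > 0x0F
    return daa_after_add(raw & 0xFF, half_carry, raw > 0xFF)
```

For example, `bcd_add(0x19, 0x28)` returns `(0x47, False)`, and `bcd_add(0x99, 0x01)` returns `(0x00, True)`, i.e. decimal 100.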

Show HN: Ghidra Plays Mario

0d0a — Sat, 09 Sep 2023 12:42:31 +0000

I've been exploring new ways of testing Ghidra processor modules. In this repo, I was able to emulate NES ROMs in Ghidra to test its 6502 specification, which resulted in finding and fixing some bugs.

Context: Ghidra is used for reverse engineering binary executables, complementing the usual disassembly view with function decompilation. Each supported architecture has a SLEIGH specification, which provides semantics for parsing and emulating instructions, not unlike the dispatch handlers you would find in interpreters written for console emulators.

Emulator devs have long had extensive test ROMs for popular consoles, but Ghidra only provides CPU emulation, so it can't run them without additional setup. What I did here is bridge the gap: by modifying a console emulator to instead delegate CPU execution to Ghidra, we can now use these same ROMs to validate Ghidra processor modules.

Previously [1], I went with a trace log diffing approach, where any hardware-specific behaviour that affected CPU execution was also encoded in trace logs. However, it required writing hardware-specific logic, and that logic is still incomplete. With the delegation approach, most of this effort is avoided, since it's easier to hook and delegate memory accesses.

I plan on continuing research in this space and generalizing my approaches, since it shows potential for complementing the existing test coverage provided by pcodetest. If a simple architecture like 6502 had a few bugs, who knows how many are in more complex architectures! I wasn't able to find similar attempts (outside of diffing and coverage analysis from trace logs), so please let me know if I missed something. Any suggestions for improvements are also welcome.

[1]: https://github.com/nevesnunes/ghidra-tlcs900h#emulation

Comments URL: https://news.ycombinator.com/item?id=37444683

Points: 207

# Comments: 29

New comment by 0d0a in "Ask HN: Could you share your personal blog here?"

0d0a — Wed, 05 Jul 2023 21:19:41 +0000

https://nevesnunes.github.io/blog/

Reverse engineering, debugging, and some silly contraptions.

New comment by 0d0a in "NSA Ghidra software reverse engineering framework"

0d0a — Thu, 30 Mar 2023 08:09:49 +0000

Oh nice, it wasn't clear from the test suite if that was the case, I'll give it a closer look.

Judging from the Python scripts, it seems to expect a whole binutils toolchain (not just a compiler, but also objdump, readelf...), and that would be a blocker for me.

New comment by 0d0a in "NSA Ghidra software reverse engineering framework"

0d0a — Wed, 29 Mar 2023 08:27:29 +0000

I'm also writing a processor module, and reading this is a bit encouraging to eventually write about it once it's finished.

Getting off the ground wasn't the hardest part so far. You can just pick the skeleton module that already comes with Ghidra, then look up some simpler existing modules, like the one for Z80, to figure out how instructions are put together. You also have the script `DebugSleighInstructionParse` to check how bits are being decoded, which is very useful when you screw up some instruction definitions.

Unfortunately, you bump into a lot of jargon-heavy error messages. The first time you hear about "Interior ellipsis in pattern", you sure have no idea what that's about. Now repeat that experience for several messages.

Then the hardest challenge is how to even test the module outside of some quick disassemblies. There's `pcodetest` but the setup is cumbersome and it seems more about validating instruction decoding rather than semantics. I might just write my own validation using pcode emulation and compare the register state against another emulator's instruction trace...
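That register-state comparison could look roughly like the following: log registers after each instruction in both emulators, then report the first step where they disagree. The trace representation (a list of dicts mapping register names to values) and the function name are assumptions for this sketch.

```python
from typing import Dict, Optional, Sequence, Tuple

RegState = Dict[str, int]


def first_divergence(
    expected: Sequence[RegState], actual: Sequence[RegState]
) -> Optional[Tuple[int, Dict[str, Tuple[int, Optional[int]]]]]:
    """Return (step, {reg: (expected, actual)}) for the first mismatching step.

    Registers missing from `actual` show up with an actual value of None.
    If one trace is just shorter, the step where it ends is returned with an
    empty diff. Returns None when the traces agree completely.
    """
    for step, (exp, act) in enumerate(zip(expected, actual)):
        diffs = {reg: (val, act.get(reg))
                 for reg, val in exp.items() if act.get(reg) != val}
        if diffs:
            return step, diffs
    if len(expected) != len(actual):
        return min(len(expected), len(actual)), {}
    return None
```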

New comment by 0d0a in "NSA Ghidra software reverse engineering framework"

0d0a — Wed, 29 Mar 2023 08:17:14 +0000

> There's a discouragement that comes in the RE community that to be useful at all you need to be able to write your own exotic packer decoders

Unless you are talking about obfuscated / virtualized payloads, isn't it common to just "cheat" by running it in an emulator / debugger, then taking the unpacked code section from memory and working from there? It was the approach I took in a CTF task: https://nevesnunes.github.io/blog/2021/10/03/CTF-Writeup-TSG...

No Thumbnails for You

0d0a — Sun, 13 Nov 2022 17:45:16 +0000

Article URL: https://nevesnunes.github.io/blog/2022/11/12/No-thumbnails-for-you.html

Comments URL: https://news.ycombinator.com/item?id=33585320

Points: 3

# Comments: 0

New comment by 0d0a in "Decompressing a Gzip File by Hand"

0d0a — Thu, 25 Nov 2021 09:19:26 +0000

This tool served as a base for a fun project of mine [0], which consisted of injecting arbitrary bytes into a DEFLATE stream without affecting the decompression output.

The main idea is that a stream is composed of blocks, and dynamic blocks can contain Huffman tables followed by an end-of-block symbol. Since there are no other symbols, such blocks won't produce any output when decompressed. So there are a few bytes in those tables that can be changed (e.g. to put in a little ASCII signature). However, this affects the codes present in the tables, which must pass some validations (so that decompressors can unambiguously match parsed bits to codes). I used an SMT solver to figure out values for the remaining bytes, so that the tables had valid code counts.

[0]: https://nevesnunes.github.io/blog/2021/09/29/Decompression-M...
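The "valid code counts" constraint comes down to the Kraft inequality. A dynamic block has three code-length sets (code length codes, literal/length, distance). For each set, summing 2^-length over the symbols actually used must not exceed 1, and strict decompressors like zlib, as far as I know, also want it to equal exactly 1 (a complete code), with a narrow exception for a lone one-bit code. Below is a rough Python model of that check. It is my approximation, not zlib's exact rules, and a solver like the one in the post would presumably encode an equivalent condition over the free table bytes rather than call a function like this.

```python
from fractions import Fraction

# Fixed literal/length code lengths from RFC 1951, used as a known-good input.
FIXED_LITLEN_LENGTHS = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8


def kraft_sum(lengths):
    """Sum of 2^-length over used symbols (length 0 means the symbol is absent)."""
    return sum((Fraction(1, 2 ** n) for n in lengths if n), Fraction(0))


def lengths_acceptable(lengths) -> bool:
    """Rough model of a strict inflater's check on one set of code lengths."""
    total = kraft_sum(lengths)
    if total > 1:
        return False  # over-subscribed: some bit strings would be ambiguous
    if total < 1:
        # Incomplete code: assumed acceptable only for a lone 1-bit code.
        return [n for n in lengths if n] == [1]
    return True  # complete prefix code
```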