Hacker News: pag

New comment by pag in "Tuple Space (2014)"

pag — Fri, 28 Apr 2023 22:19:23 +0000

I implemented something like this! The Dr. Lojekyll [1] Datalog system was built around the idea of inputs as messages received over time, and of publishing outputs (also as messages) or responding to queries (using materialized views). It worked quite well, but its differential nature ended up being its downfall for the use case we had in mind.

Specifically, it had no way to "extend" its consistency model out to consumers of its messages, so if consumers were also producers, then the distributed system as a whole could enter a state where parts of it were lying to itself!

Otherwise, there were some really fun things you could do with it. For example, we could have per-client databases that would bring themselves up to date by applying the differential outputs of the server database. This would let you engage in a kind of manual sharding of data.
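To make the per-client idea concrete, here is a minimal C++ sketch, not Dr. Lojekyll's actual design. It assumes the server publishes each change as a row paired with a signed multiplicity delta (+1 for an insertion, -1 for a retraction), and that a client "shards" by keeping only rows whose key starts with its own prefix. `Diff`, `ClientReplica`, and the prefix scheme are all hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One differential update: a row plus a signed change in its multiplicity
// (+1 = inserted, -1 = retracted). Hypothetical message format.
struct Diff {
    std::string row;
    std::int64_t delta;
};

// A per-client replica that stays current by folding in the server's diffs.
// It only keeps rows whose key starts with `shard_prefix`, a hypothetical
// stand-in for "manual sharding".
class ClientReplica {
public:
    explicit ClientReplica(std::string shard_prefix)
        : prefix_(std::move(shard_prefix)) {}

    void apply(const std::vector<Diff>& batch) {
        for (const Diff& d : batch) {
            // Skip rows that belong to some other client's shard.
            if (d.row.compare(0, prefix_.size(), prefix_) != 0) continue;
            std::int64_t count = (rows_[d.row] += d.delta);
            if (count == 0) rows_.erase(d.row);  // fully retracted
        }
    }

    bool contains(const std::string& row) const { return rows_.count(row) != 0; }
    std::size_t size() const { return rows_.size(); }

private:
    std::string prefix_;
    std::map<std::string, std::int64_t> rows_;  // row -> multiplicity
};
```

Because the client only ever sees deltas, it never needs a full snapshot after its initial state; it just keeps folding batches in as they arrive.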

[1] https://www.petergoodman.me/docs/dr-lojekyll.pdf

New comment by pag in "Compilers and IRs: LLVM IR, SPIR-V, and MLIR"

pag — Sun, 30 Oct 2022 18:53:05 +0000

These are great questions! We think our approaches are complementary. We think that CIR, or ClangIR, can be a codegen target from a combination of our high-level IR and our medium-level IR dialects.

Our understanding of ClangIR is that it has side-stepped the problem of trying to relate/map high-level values/types to low-level values/types, a mapping that Clang's codegen carries out when generating LLVM IR. We care about explicitly representing this mapping so that data can flow from a low-level (e.g. LLVM) representation all the way back up to a high-level one. There's a lot of value and implied semantics in high-level representations that are lost to Clang's codegen, and thus to the ClangIR codegen. The distinction between `time_t` and `int` is an example of this: we would like to be able to see an `i32` in LLVM and follow it back to a `time_t` in our high-level dialect. This is not a problem that ClangIR sets out to solve. Thus, ClangIR is too low-level to achieve some of our goals, but it is also at the right level to achieve some of our other goals.

New comment by pag in "Show HN: Symbolica – Try our symbolic code executor in the browser"

pag — Thu, 09 Sep 2021 02:48:36 +0000

Can you explain how your offering compares with open-source alternatives, such as KLEE [1] and DeepState [2]?

P.S. your logo is surprisingly similar to that of ForAllSecure [3], which also provides a symbolic execution engine called Mayhem [4].

[1] https://klee.github.io/ (online example: http://klee.doc.ic.ac.uk/)

[2] https://github.com/trailofbits/deepstate#readme

[3] https://forallsecure.com/

[4] https://users.ece.cmu.edu/~aavgerin/papers/mayhem-oakland-12...

New comment by pag in "Go: Fuzzing Is Beta Ready"

pag — Fri, 04 Jun 2021 15:31:25 +0000

DeepState [1] is a tool that lets you write Google Test-style unit tests, as well as property tests, in either C or C++, and plug in fuzzers and symbolic executors. That is, DeepState bridges the gap between fuzz testing and property testing.
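The bridging idea can be sketched without DeepState itself: write the test body against a stream of input bytes, then let either a fuzzer or a random generator supply those bytes. The C++ below is a hypothetical illustration (`ByteSource`, `round_trip_property`, and `check_randomly` are made-up names, not DeepState's API) that checks whether printing and re-parsing an `int` round-trips.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input source: the test body pulls values from a byte buffer,
// so the same body can be driven by a fuzzer's bytes or by random bytes.
class ByteSource {
public:
    explicit ByteSource(std::vector<std::uint8_t> bytes) : bytes_(std::move(bytes)) {}

    // Assemble a 32-bit int from the next 4 bytes (big-endian);
    // exhausted input reads as zero.
    std::int32_t next_int() {
        std::uint32_t v = 0;
        for (int i = 0; i < 4; ++i) {
            std::uint8_t b = pos_ < bytes_.size() ? bytes_[pos_++] : 0;
            v = (v << 8) | b;
        }
        return static_cast<std::int32_t>(v);
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;
};

// The property under test: printing then parsing an int round-trips.
bool round_trip_property(ByteSource& in) {
    std::int32_t x = in.next_int();
    return std::stoi(std::to_string(x)) == x;
}

// Property-testing driver: feed the same test body random byte strings.
// A fuzzer driver would instead pass in its own mutated buffers.
bool check_randomly(unsigned seed, int trials) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> byte(0, 255);
    for (int t = 0; t < trials; ++t) {
        std::vector<std::uint8_t> buf(4);
        for (auto& b : buf) b = static_cast<std::uint8_t>(byte(rng));
        ByteSource in(std::move(buf));
        if (!round_trip_property(in)) return false;
    }
    return true;
}
```

The design choice that matters is that the test body never asks "where did this input come from?", which is what lets one test run under a PRNG, a coverage-guided fuzzer, or a symbolic executor.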

[1] https://github.com/trailofbits/deepstate

New comment by pag in "Capstone Disassembler Framework"

pag — Sat, 06 Mar 2021 02:46:01 +0000

It sounds like what you want is Mishegos [1], described here [2]. In fact, this work shows that Capstone performs rather poorly compared to more permissively licensed open-source competitors such as Intel XED and Zydis. Really, if your focus is x86, then Capstone is not the right choice. If you need cross-architecture support, then perhaps Capstone is the right choice.

[1] https://github.com/trailofbits/mishegos

[2] https://blog.trailofbits.com/2019/10/31/destroying-x86_64-in...

New comment by pag in "Warren Abstract Machine"

pag — Thu, 12 Mar 2020 21:59:02 +0000

And let's not forget that Rust's borrow checker is implemented with Datalog [1].

[1] https://smallcultfollowing.com/babysteps/blog/2018/04/27/an-...

New comment by pag in "Warren Abstract Machine"

pag — Thu, 12 Mar 2020 21:57:27 +0000

Datalog also comes up in cluster management / provisioning. For example: Differential Datalog [1].

[1] https://github.com/vmware/differential-datalog

New comment by pag in "Warren Abstract Machine"

pag — Thu, 12 Mar 2020 21:55:21 +0000

Datalog and variants of it come up a bunch in program analysis and software security. For example, GitHub/Microsoft acquired a company, Semmle [1], that has an object-oriented query language that compiles down into Datalog, and can be used to query source code in interesting ways. Souffle [2] comes up in static analysis as well, and is used in systems such as Doop [3]. A kind of predecessor to Doop is bddbddb [4].

Souffle has been used in Ddisasm [5] for disassembling binaries. XSB Prolog has also been used in OOAnalyzer [6] for inferring class hierarchies and virtual tables in binaries.
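For readers new to Datalog, the flavor of analysis these tools run can be sketched with graph reachability (for example, over a call graph). The C++ below hand-evaluates two rules with semi-naive evaluation, joining only the facts discovered in the previous round so no derivation is redone from scratch. It is an illustrative sketch: `reachable` is a hypothetical name, and real engines use indexed data structures rather than this nested loop.

```cpp
#include <set>
#include <utility>
#include <vector>

using Fact = std::pair<int, int>;  // (from, to)

// Semi-naive evaluation of:
//   reach(X, Y) :- edge(X, Y).
//   reach(X, Z) :- reach(X, Y), edge(Y, Z).
// Each round only joins the facts derived in the previous round (`delta`).
std::set<Fact> reachable(const std::vector<Fact>& edges) {
    std::set<Fact> reach(edges.begin(), edges.end());  // first rule
    std::set<Fact> delta = reach;
    while (!delta.empty()) {  // stop at the fixpoint: nothing new derived
        std::set<Fact> next;
        for (const auto& [x, y] : delta)
            for (const auto& [y2, z] : edges)
                if (y == y2 && !reach.count({x, z})) next.insert({x, z});
        reach.insert(next.begin(), next.end());
        delta = std::move(next);
    }
    return reach;
}
```

Cycles are handled for free: once no new `(X, Z)` pair appears, `delta` is empty and evaluation terminates.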

I myself spend some time working on datalog compilation for program analysis and decompilation :-)

[1] https://semmle.com

[2] https://souffle-lang.github.io/index.html

[3] https://bitbucket.org/yanniss/doop/src/master

[4] https://suif.stanford.edu/bddbddb

[5] https://github.com/GrammaTech/ddisasm

[6] https://resources.sei.cmu.edu/library/asset-view.cfm?assetid...

New comment by pag in "Binary Symbolic Execution with KLEE-Native"

pag — Fri, 30 Aug 2019 20:28:36 +0000

If you can modify the kernel, you could modify binfmt_elf.c in the case of the Linux kernel, and log out the load address to dmesg or something like that.

Another alternative, which I believe is available with both PIN and DynamoRIO, is to implement or use an existing ELF loader: fork yourself, and load the target binary of interest at a known location.

New comment by pag in "Binary Symbolic Execution with KLEE-Native"

pag — Fri, 30 Aug 2019 20:26:09 +0000

If you're inside the process, as the LD_PRELOAD interceptor mentioned in the article is, then in theory you could take the address of the binary's main function, or of some other symbol that you expect to be exported, and scan downward until you find the lowest mapped page, then store that address somewhere whose location you know in advance. You can detect the first unmapped page while scanning down by cleverly using system calls that can fail with EFAULT when handed a bad pointer.
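A minimal Linux/POSIX sketch of the scan-down idea, under a few assumptions. The probe uses `write(2)` into a pipe, which fails with `EFAULT` (rather than crashing) when the source pointer is unmapped. The starting address is any object known to live in the binary; C++ forbids taking the address of `main`, so a caller would pass some other symbol. The scan stops at the first unreadable page, so it finds the bottom of the contiguous mapped run, which may not be the image base if the loader left a `PROT_NONE` gap between segments. `page_is_mapped` and `lowest_mapped_page` are hypothetical names.

```cpp
#include <cstdint>
#include <sys/types.h>
#include <unistd.h>

// Returns true if the byte at `addr` is readable, probed by asking the
// kernel to copy it into a pipe: write(2) fails with EFAULT on a bad pointer.
static bool page_is_mapped(int pipe_fds[2], std::uintptr_t addr) {
    ssize_t n = write(pipe_fds[1], reinterpret_cast<const void*>(addr), 1);
    if (n != 1) return false;  // EFAULT (or another error): treat as unmapped
    // Drain the byte so the pipe never fills up and blocks a later write.
    char sink;
    ssize_t drained = read(pipe_fds[0], &sink, 1);
    return drained == 1;
}

// Starting from the page containing `known_addr`, walk downward one page at
// a time and return the lowest page of the contiguous mapped run (0 on error).
std::uintptr_t lowest_mapped_page(const void* known_addr) {
    const std::uintptr_t page = static_cast<std::uintptr_t>(sysconf(_SC_PAGESIZE));
    int fds[2];
    if (pipe(fds) != 0) return 0;
    std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(known_addr) & ~(page - 1);
    while (addr >= page && page_is_mapped(fds, addr - page)) addr -= page;
    close(fds[0]);
    close(fds[1]);
    return addr;
}
```

The result could then be stashed in a file, an environment variable, or shared memory at a pre-agreed location, which is the "somewhere you know in advance" part.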

New comment by pag in "Wrapper’s Delight"

pag — Mon, 26 Aug 2019 18:18:43 +0000

This is a really cool SQLite wrapper. One of the things this wrapper lets you do is bind C++ functions/lambdas to functions in SQLite. You can use that functionality in things like SQLite expression indexes, or for customizing a full-text indexer.

One cool example of expression indexes is this: you want to store structured data (e.g. a protobuf) 'natively' in an SQL database, but still give the database visibility into the data stored in that serialized protobuf. You can create an expression index with functions that read into the protobuf data and give SQLite that visibility. Expression indexes also help with storing compressed data: for example, you may want to store a big blob of compressed data and give SQLite access to some of it, but not all of it.

In the near future, we'll be transitioning another open-source Trail of Bits project, McSema [1], to using this wrapper. McSema currently stores lots of data in protocol buffers, which isn't very convenient for a number of reasons. Transitioning over to an SQLite database, hidden behind an API, is going to make things a lot nicer.

[1] https://github.com/trailofbits/mcsema

New comment by pag in "A Survey of Symbolic Execution Techniques (2018)"

pag — Mon, 27 May 2019 14:50:13 +0000

Hopefully we'll have a Dockerfile soon that gets things all set up :-D

New comment by pag in "A Survey of Symbolic Execution Techniques (2018)"

pag — Mon, 27 May 2019 14:49:38 +0000

Nope :-P

New comment by pag in "A Survey of Symbolic Execution Techniques (2018)"

pag — Tue, 21 May 2019 23:35:13 +0000

By the way, did you know that DeepState is actually pure C, and that all the C++ is just window dressing on top? This means that you could feasibly integrate components of it in Ada, or call into those components from Ada, assuming that you haven't already produced your own variants of things.

New comment by pag in "A Survey of Symbolic Execution Techniques (2018)"

pag — Tue, 21 May 2019 23:29:51 +0000

There are a number of lifters. The McSema repo has a table detailing the features of most of them.

Glad that DeepState had an impact on you :-D We continue to evolve DeepState, both in the direction of better fuzzing, and better test case reduction.

Remill operates at instruction granularity, and so all it requires is raw bytes. McSema uses Remill in conjunction with a disassembly frontend (IDA Pro, Binary Ninja, or Dyninst).

If you have source code you can likely be more precise/efficient. Sometimes you may have access to source but not the ability to change/influence the build.

I think there's a lot of room for improvement with KLEE. If I were to write an LLVM symbolic executor from scratch then I think I would do some things differently.

New comment by pag in "A Survey of Symbolic Execution Techniques (2018)"

pag — Tue, 21 May 2019 17:05:17 +0000

We at Trail of Bits are also working on "klee-native," a port of KLEE using Remill to dynamically lift and emulate binaries. This project is very early on, though.

New comment by pag in "An optimization guide for assembly programmers and compiler makers (2018) [pdf]"

pag — Tue, 12 Feb 2019 02:54:46 +0000

I found an old paper I wrote about my first DBT, and it has a nice figure about how I implemented indirect branch resolution: https://imgur.com/a/dCpY7V2. Feel free to ping me at pag on the Empire Hacking slack if you like talking about this stuff :-D

New comment by pag in "An optimization guide for assembly programmers and compiler makers (2018) [pdf]"

pag — Tue, 12 Feb 2019 02:42:14 +0000

Take a look at the various block chaining approaches of dynamic binary translators (DynamoRIO, PIN, etc.). There is a lot of work done in this space.
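A toy C++ model of block chaining, assuming a simplified code cache in which each "translated block" is a callable that returns the next guest PC. Without chaining, every control transfer goes back through the dispatcher's hash-table lookup. With chaining, a block remembers the successor it last jumped to and goes there directly, falling back to the lookup (and re-chaining) when the target differs, which is roughly how an inline cache handles indirect branches. `CodeCache`, `TranslatedBlock`, and the lookup counter are hypothetical; real DBTs patch machine code in the cache rather than storing pointers.

```cpp
#include <cstdint>
#include <functional>
#include <memory>
#include <unordered_map>
#include <utility>

using GuestPc = std::uint64_t;

// A toy "translated block": running it performs the block's work and yields
// the next guest PC. `chained` caches the translated successor, standing in
// for patching a direct jump into the code cache.
struct TranslatedBlock {
    GuestPc pc;
    std::function<GuestPc()> run;
    TranslatedBlock* chained = nullptr;
};

class CodeCache {
public:
    using Translator = std::function<std::function<GuestPc()>(GuestPc)>;
    explicit CodeCache(Translator translate) : translate_(std::move(translate)) {}

    // Slow path: the dispatcher's hash-table lookup, translating on a miss.
    TranslatedBlock* lookup(GuestPc pc) {
        ++dispatcher_lookups;
        auto& slot = cache_[pc];
        if (!slot) slot = std::make_unique<TranslatedBlock>(TranslatedBlock{pc, translate_(pc)});
        return slot.get();
    }

    // Run guest code until it jumps to `halt_pc`.
    void execute(GuestPc entry, GuestPc halt_pc, bool chaining) {
        TranslatedBlock* block = lookup(entry);
        for (;;) {
            GuestPc next = block->run();
            if (next == halt_pc) return;
            if (chaining && block->chained && block->chained->pc == next) {
                block = block->chained;  // fast path: never leave the cache
            } else {
                TranslatedBlock* succ = lookup(next);
                if (chaining) block->chained = succ;  // (re-)chain to this target
                block = succ;
            }
        }
    }

    int dispatcher_lookups = 0;

private:
    Translator translate_;
    std::unordered_map<GuestPc, std::unique_ptr<TranslatedBlock>> cache_;
};
```

For a two-block loop, the unchained run pays a lookup on every transfer, while the chained run only pays for the first visit to each edge.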

New comment by pag in "Fuzzing an API with DeepState"

pag — Tue, 22 Jan 2019 17:34:38 +0000

DeepState provides a Google Test-compatible interface to writing C++ unit tests; however, underneath it all, it is really a C unit testing framework. That is the reason for some of the strange naming of functions like DeepState_Int: these are the underlying C interfaces. If you're using C++, you can choose to use Symbolic or symbolic_int. However, if your codebase is pure C, then have no fear, DeepState can still help you!
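The layering described here, a C core with a thin C++ veneer, looks roughly like the following sketch. The names `Demo_Int` and `demo::symbolic_int` are hypothetical stand-ins for DeepState's real `DeepState_Int` and `symbolic_int`, and the value source is a simple counter so the example is self-contained; in the real framework the value would come from a fuzzer or symbolic executor.

```cpp
// --- C layer: a function with C linkage that a C program (or another
// language, through its C FFI) can call directly. Hypothetical name. ---
extern "C" int Demo_Int(void) {
    // Stand-in value source: a counter instead of fuzzer/solver input.
    static int next = 0;
    return next++;
}

// --- C++ "window dressing": a type whose construction calls the C API. ---
namespace demo {
class symbolic_int {
public:
    symbolic_int() : value_(Demo_Int()) {}
    operator int() const { return value_; }  // use it like a plain int

private:
    int value_;
};
}  // namespace demo
```

Since every C++ convenience bottoms out in a C call, a pure-C codebase loses nothing but the syntax sugar.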

New comment by pag in "Stensal SDK: Retrofitting C/C++ code with quasi-memory-safety"

pag — Mon, 14 Jan 2019 05:47:33 +0000

I think you could produce much better machine code by "templatizing" the stack allocations into function-specific pattern variables, passed to a single alloca-like function which uses the template to figure out how much stack space to allocate, displaces the stack pointer, and sets up all your metadata in one swoop. Also, this would improve the upgradability of the runtime/metadata, as the metadata layout would be decoupled from the generated code.
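A minimal sketch of this proposal, under stated assumptions: the instrumenting compiler emits, per function, a hypothetical `FrameTemplate` listing each stack slot's size and alignment, and one runtime entry point (`frame_enter`, also hypothetical) lays out all slots, displaces the stack pointer once, and records bounds metadata in the same pass. The stack is simulated with an integer so the code stays portable; a real runtime would adjust the hardware stack pointer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-function "template": size and alignment of every stack
// allocation in the function, emitted once by the instrumenting compiler.
struct SlotDesc {
    std::size_t size;
    std::size_t align;  // must be a power of two
};

struct FrameTemplate {
    const SlotDesc* slots;
    std::size_t count;
};

// Bounds metadata recorded for each slot: [lo, hi).
struct SlotBounds {
    std::uintptr_t lo, hi;
};

// Simulated downward-growing stack plus its metadata shadow.
struct SimStack {
    std::uintptr_t sp;
    std::vector<SlotBounds> metadata;
};

static std::uintptr_t align_down(std::uintptr_t x, std::size_t a) {
    return x & ~static_cast<std::uintptr_t>(a - 1);
}

// One call per function entry: lay out every slot from the template, record
// all bounds metadata, and displace the stack pointer exactly once.
std::uintptr_t frame_enter(SimStack& stack, const FrameTemplate& tmpl) {
    std::uintptr_t cursor = stack.sp;
    for (std::size_t i = 0; i < tmpl.count; ++i) {
        const SlotDesc& s = tmpl.slots[i];
        assert(s.align != 0 && (s.align & (s.align - 1)) == 0);
        cursor = align_down(cursor - s.size, s.align);
        stack.metadata.push_back({cursor, cursor + s.size});
    }
    stack.sp = cursor;  // the single stack-pointer displacement
    return cursor;
}

// Matching exit: drop this frame's metadata and restore the stack pointer.
void frame_exit(SimStack& stack, const FrameTemplate& tmpl, std::uintptr_t old_sp) {
    stack.metadata.resize(stack.metadata.size() - tmpl.count);
    stack.sp = old_sp;
}
```

Because the layout lives in a data table rather than in inlined instrumentation, the runtime could change how it records metadata without regenerating every instrumented function.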