Hacker News: felixge

New comment by felixge in "OpenTelemetry profiles enters public alpha"

felixge — Fri, 27 Mar 2026 19:54:12 +0000

We had reached out to y'all last year to explore taking ideas from your format. It definitely looks interesting! But IIRC nobody from your team ended up making it to one of our SIG meetings?

https://github.com/yandex/perforator/issues/13

New comment by felixge in "OpenTelemetry profiles enters public alpha"

felixge — Thu, 26 Mar 2026 17:13:55 +0000

OTel Profiling SIG maintainer here: I understand your concern, but we’ve tried our best to make things efficient across the protocol and all involved components.

Please let us know if you find any issues with what we are shipping right now.

Optimizing Ruby performance: Observations from real-world services

felixge — Thu, 20 Nov 2025 18:38:14 +0000

Article URL: https://www.datadoghq.com/blog/ruby-performance-optimization/

Comments URL: https://news.ycombinator.com/item?id=45996036

Points: 3

# Comments: 1

New comment by felixge in "Go has added Valgrind support"

felixge — Tue, 23 Sep 2025 14:22:13 +0000

Do you think the GC roots alone (goroutine stacks with goroutine id, package globals) would be enough?

I think in many cases you'd want the reference chains.

The GC could certainly keep track of those, but at the expense of making things slower. My colleagues Nick and Daniel prototyped this at some point [1].

Alternatively the tracing of reference chains can be done on heap dumps, but it requires maintaining a partial replica of the GC in user space, see goref [2] for that approach.

So it's not entirely trivial, but rest assured that it's definitely being considered by the Go project. You can see some discussions related to it here [3].

Disclaimer: I contribute to the Go runtime as part of my job at Datadog. I can't speak on behalf of the Go team.

[1] https://go-review.googlesource.com/c/go/+/552736

[2] https://github.com/cloudwego/goref/blob/main/docs/principle....

[3] https://github.com/golang/go/issues/57175

New comment by felixge in "Go has added Valgrind support"

felixge — Tue, 23 Sep 2025 11:37:06 +0000

I'd love to hear more! What kind of profiling issues are you running into? I'm assuming the inuse memory profiles are sometimes not good enough to track down leaks since they only show the allocation stack traces? Have you tried goref [1]? What kind of memory pressure issues are you dealing with?

[1] https://github.com/cloudwego/goref

Disclaimer: I work on continuous profiling for Datadog and contribute to the profiling features in the runtime.

New comment by felixge in "Go allocation probe"

felixge — Tue, 22 Jul 2025 18:34:08 +0000

Hacking into the Go runtime with eBPF is definitely fun.

But for a more long term solution in terms of reliability and overhead, it might be worth raising this as a feature request for the Go runtime itself. Type information could be provided via pprof labels on the allocation profiles.

New comment by felixge in "Go Optimization Guide"

felixge — Tue, 01 Apr 2025 14:19:23 +0000

+1. In particular []byte slice allocations are often a significant driver of GC pace while also being relatively easy to optimize (e.g. via sync.Pool reuse).
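A minimal sketch of the sync.Pool reuse mentioned above, assuming a hypothetical `encode` function and an arbitrary 4 KiB starting capacity. The pool stores `*[]byte` rather than `[]byte` so that putting a buffer back does not itself allocate.

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable byte slices. The 4096 capacity is an arbitrary
// assumption; pick one that fits your typical payload.
var bufPool = sync.Pool{
	New: func() any {
		b := make([]byte, 0, 4096)
		return &b
	},
}

// encode builds a message in a pooled buffer instead of allocating a fresh
// []byte on every call.
func encode(name string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reuse the backing array, drop old contents
	buf = append(buf, "hello, "...)
	buf = append(buf, name...)
	out := string(buf) // copy the result out before releasing the buffer
	*bp = buf          // keep any growth for the next user
	bufPool.Put(bp)
	return out
}

func main() {
	fmt.Println(encode("gopher"))
}
```

The important part is that nothing keeps a reference to the pooled slice after `Put`, which is why the result is copied out first.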

Don't Clobber the Frame Pointer

felixge — Fri, 03 Jan 2025 06:57:26 +0000

Article URL: https://nsrip.com/posts/clobberfp.html

Comments URL: https://news.ycombinator.com/item?id=42583246

Points: 77

# Comments: 42

Orchestrion: Compile-time auto-instrumentation for Go

felixge — Mon, 09 Dec 2024 16:13:13 +0000

Article URL: https://www.datadoghq.com/blog/go-instrumentation-orchestrion/

Comments URL: https://news.ycombinator.com/item?id=42367528

Points: 3

# Comments: 0

We used Datadog to save $17.5M annually

felixge — Thu, 03 Oct 2024 16:03:34 +0000

Article URL: https://www.datadoghq.com/blog/how-datadog-saved-17-million-dollars/

Comments URL: https://news.ycombinator.com/item?id=41731962

Points: 3

# Comments: 0

New comment by felixge in "Go 1.23 Released"

felixge — Tue, 13 Aug 2024 19:56:09 +0000

This is great, thank you.

The One-Instruction Window

felixge — Mon, 12 Aug 2024 15:48:06 +0000

Article URL: https://nsrip.com/posts/oneinstruction.html

Comments URL: https://news.ycombinator.com/item?id=41225804

Points: 2

# Comments: 0

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 21:06:19 +0000

I'm skeptical that it's worth it myself, this was just a fun research project for me. But once hardware shadow stacks are available, I think this could be great.

To answer your first question: For most Go applications, the average stack trace depth for profiling/execution tracing is below 32 frames. But some applications use heavy middleware layers that can push the average above this limit.

That being said, I think this technique will amortize much earlier when the fixed cost per frame walk is higher, e.g. when using DWARF or gopclntab unwinding. For Go that doesn't really matter while the compiler emits frame pointers. But it's always good to have options when it comes to evolving the compiler and runtime ...
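For anyone who wants to see where their own code sits relative to the 32-frame figure, here is a rough sketch that measures stack depth with runtime.Callers. `stackDepth`, `recurse` and the 256-entry buffer are assumptions made up for this example; a real measurement would sample depths at the places where profiles or traces are actually taken.

```go
package main

import (
	"fmt"
	"runtime"
)

// stackDepth reports how many frames are on the calling goroutine's stack,
// capped by the size of the buffer (256 is an arbitrary choice).
func stackDepth() int {
	var pcs [256]uintptr
	// Skip 2 frames: runtime.Callers itself and stackDepth.
	return runtime.Callers(2, pcs[:])
}

// recurse adds n extra frames before measuring, standing in for deep
// middleware chains.
func recurse(n int) int {
	if n == 0 {
		return stackDepth()
	}
	return recurse(n - 1)
}

func main() {
	for _, n := range []int{3, 100} {
		d := recurse(n)
		fmt.Printf("extra frames: %d, measured depth: %d, above 32: %v\n", n, d, d > 32)
	}
}
```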

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 15:47:12 +0000

That seems to be Windows-only? My main target OS is Linux.

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 12:53:44 +0000

Thank you so much, this is very helpful and interesting. I'll try to experiment with this at some point.

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 11:46:26 +0000

Thanks for the reply!

What does the API for accessing the shadow stack from user space look like? I didn't see anything for it in the kernel docs [1].

I agree about the need for switching the shadow stacks in the Go scheduler. But this would probably require an API that is a bit at odds with the security goals of the kernel feature.

I'm not sure I follow your thoughts on CGO and how this would work on older CPUs and kernels.

[1] https://docs.kernel.org/next/x86/shstk.html

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 09:33:53 +0000

I don't think any obvious 10%+ opportunities have been overlooked. Go is optimizing for fast and simple builds, which is a bit at odds with optimal code gen. So I think the biggest opportunity is to use Go implementations that are based on aggressively optimizing compilers such as LLVM and GCC. But those implementations tend to be a few major versions behind and are likely to be less stable than the official compiler.

That being said, I'm sure there are a lot of remaining incremental optimization opportunities that could add up to 10% over time. For example a faster map implementation [1]. I'm sure there is more.

Another recent perf opportunity is using PGO [2], which can get you 10% in some cases. Shameless plug: We recently GA'ed our support for it at Datadog [3].

[1] https://github.com/golang/go/issues/54766

[2] https://go.dev/doc/pgo

[3] https://www.datadoghq.com/blog/datadog-pgo-go/

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 07:46:53 +0000

That's what hardware shadow stacks in modern Intel/Arm CPUs can do! It just needs to be exposed to user space and become widely available.

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 07:25:44 +0000

I know that at least two engineers from the runtime team have seen the post in the #darkarts channel of Gophers Slack. One of them left a fire emoji :).

I'll probably bring it up in the biweekly Go runtime diagnostics sync [1] next Thursday, but my guess is that they'll have the same conclusion as me: Neat trick, but not a good idea for the runtime until hardware shadow stacks become widely available and accessible.

[1] https://github.com/golang/go/issues/57175

New comment by felixge in "Fast Shadow Stacks for Go"

felixge — Thu, 30 May 2024 07:10:25 +0000

Thanks! And to answer your question: No, it won't speed up Go programs for now. This was mostly a fun research project for me.

The low-hanging fruit for speeding up stack unwinding in the Go runtime is to switch to frame pointer unwinding in more places. In go1.21 we contributed patches to do this for the execution tracer. For the upcoming go1.23 release, my colleague Nick contributed patches to upgrade the block and mutex profiler. Once the go1.24 tree opens, we're hoping to tackle the memory profiler as well as copystack. The latter would benefit all Go programs, even those not using profiling. But it's likely going to be a relatively small win (<= 1%).
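For readers unfamiliar with frame pointer unwinding, a simulated sketch of the idea: each frame stores the caller's frame pointer next to the return address, so capturing a stack is just following a linked list. The word-indexed `mem` slice, the frame layout and the addresses are invented for illustration; the real runtime does this on actual stack memory with architecture-specific offsets.

```go
package main

import "fmt"

// unwind follows the chain of saved frame pointers in a simulated stack.
// Assumed layout: mem[fp] holds the caller's frame pointer and mem[fp+1]
// holds the return address. A frame pointer of 0 ends the chain.
func unwind(mem []uint64, fp uint64, max int) []uint64 {
	var pcs []uint64
	for fp != 0 && len(pcs) < max {
		pcs = append(pcs, mem[fp+1]) // record the return address
		fp = mem[fp]                 // hop to the caller's frame
	}
	return pcs
}

func main() {
	// Three frames; index 0 is reserved to act as the nil frame pointer.
	mem := make([]uint64, 16)
	mem[10], mem[11] = 0, 0x1000 // outermost frame, no caller
	mem[6], mem[7] = 10, 0x2000
	mem[2], mem[3] = 6, 0x3000 // innermost frame
	fmt.Printf("%#x\n", unwind(mem, 2, 32))
}
```

Each step is two loads and no table lookups, which is where the low fixed cost per frame comes from compared to gopclntab or DWARF unwinding.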

Once all of this is done, shadow stacks have the potential to make things even faster. But the problem is that we'll be deeply in diminishing returns territory at that point. Speeding up stack capturing is great when it makes up 80-90% of your overhead (this was the case for the execution tracer before frame pointers). But once we're down to 1-2% (the current situation for the execution tracer), another 8x speedup is not going to buy us much, especially when it has downsides.
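The diminishing-returns argument can be put into numbers with Amdahl's law. This sketch treats the percentages as the share of the measured time spent capturing stacks, which is an assumption: the comment applies 80-90% to the tracer's overhead, while 1-2% may refer to overhead relative to the whole program. `overallSpeedup` is a hypothetical helper.

```go
package main

import "fmt"

// overallSpeedup applies Amdahl's law: a fraction f of the total time is
// made s times faster, and the remaining 1-f is unchanged.
func overallSpeedup(f, s float64) float64 {
	return 1 / ((1 - f) + f/s)
}

func main() {
	for _, f := range []float64{0.90, 0.02} {
		fmt.Printf("stack capture share %.0f%%, 8x faster unwinding: overall %.2fx\n",
			f*100, overallSpeedup(f, 8))
	}
}
```

With a 90% share the 8x speedup is worth roughly 4.7x overall; with a 2% share it is worth under 2%.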

The only future in which shadow stacks could speed up real Go programs is one where we decide to drop frame pointer support in the compiler, which could provide 1-2% speedup for all Go programs. Once hardware shadow stacks become widely available and accessible, I think that would be worth considering. But that's likely to be a few years down the road from now.