<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: felixge</title><link>https://news.ycombinator.com/user?id=felixge</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 25 Jun 2026 10:38:40 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=felixge" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by felixge in "OpenTelemetry profiles enters public alpha"]]></title><description><![CDATA[
<p>We had reached out to y'all last year to explore taking ideas from your format. It definitely looks interesting! But IIRC nobody from your team ended up making it to one of our SIG meetings?<p><a href="https://github.com/yandex/perforator/issues/13" rel="nofollow">https://github.com/yandex/perforator/issues/13</a></p>
]]></description><pubDate>Fri, 27 Mar 2026 19:54:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47547408</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=47547408</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47547408</guid></item><item><title><![CDATA[New comment by felixge in "OpenTelemetry profiles enters public alpha"]]></title><description><![CDATA[
<p>OTel Profiling SIG maintainer here: I understand your concern, but we’ve tried our best to make things efficient across the protocol and all involved components.<p>Please let us know if you find any issues with what we are shipping right now.</p>
]]></description><pubDate>Thu, 26 Mar 2026 17:13:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47533056</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=47533056</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47533056</guid></item><item><title><![CDATA[Optimizing Ruby performance: Observations from real-world services]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.datadoghq.com/blog/ruby-performance-optimization/">https://www.datadoghq.com/blog/ruby-performance-optimization/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45996036">https://news.ycombinator.com/item?id=45996036</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 20 Nov 2025 18:38:14 +0000</pubDate><link>https://www.datadoghq.com/blog/ruby-performance-optimization/</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=45996036</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45996036</guid></item><item><title><![CDATA[New comment by felixge in "Go has added Valgrind support"]]></title><description><![CDATA[
<p>Do you think the GC roots alone (goroutine stacks with goroutine id, package globals) would be enough?<p>I think in many cases you'd want the reference chains.<p>The GC could certainly keep track of those, but at the expense of making things slower. My colleagues Nick and Daniel prototyped this at some point [1].<p>Alternatively the tracing of reference chains can be done on heap dumps, but it requires maintaining a partial replica of the GC in user space, see goref [2] for that approach.<p>So it's not entirely trivial, but rest assured that it's definitely being considered by the Go project. You can see some discussions related to it here [3].<p>Disclaimer: I contribute to the Go runtime as part of my job at Datadog. I can't speak on behalf of the Go team.<p>[1] <a href="https://go-review.googlesource.com/c/go/+/552736" rel="nofollow">https://go-review.googlesource.com/c/go/+/552736</a><p>[2] <a href="https://github.com/cloudwego/goref/blob/main/docs/principle.md" rel="nofollow">https://github.com/cloudwego/goref/blob/main/docs/principle....</a><p>[3] <a href="https://github.com/golang/go/issues/57175" rel="nofollow">https://github.com/golang/go/issues/57175</a></p>
]]></description><pubDate>Tue, 23 Sep 2025 14:22:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=45347449</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=45347449</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45347449</guid></item><item><title><![CDATA[New comment by felixge in "Go has added Valgrind support"]]></title><description><![CDATA[
<p>I'd love to hear more! What kind of profiling issues are you running into? I'm assuming the inuse memory profiles are sometimes not good enough to track down leaks, since they only show the allocation stack traces? Have you tried goref [1]? What kind of memory pressure issues are you dealing with?<p>[1] <a href="https://github.com/cloudwego/goref" rel="nofollow">https://github.com/cloudwego/goref</a><p>Disclaimer: I work on continuous profiling for Datadog and contribute to the profiling features in the runtime.</p>
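<p>A minimal sketch of capturing the inuse heap profile discussed above with the standard runtime/pprof package. The retained slice and the leak function are hypothetical stand-ins for a real leak. The resulting profile records where live memory was allocated, not what still references it, which is the limitation the comment points at.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// retained is a hypothetical leak: it keeps every allocation reachable.
var retained [][]byte

// leak allocates n buffers of 1 MiB each and never releases them.
func leak(n int) {
	for i := 0; i < n; i++ {
		retained = append(retained, make([]byte, 1<<20))
	}
}

// heapProfile returns the heap profile in pprof's gzip-compressed
// protobuf format.
func heapProfile() ([]byte, error) {
	runtime.GC() // heap profiles reflect the state as of the last completed GC
	var buf bytes.Buffer
	if err := pprof.WriteHeapProfile(&buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	leak(8)
	p, err := heapProfile()
	if err != nil {
		panic(err)
	}
	fmt.Printf("heap profile: %d bytes\n", len(p))
}
```

<p>Written to a file, such a profile can be opened with go tool pprof -sample_index=inuse_space, which attributes the live bytes to the stack that allocated them (here, leak).</p>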
]]></description><pubDate>Tue, 23 Sep 2025 11:37:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=45345569</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=45345569</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45345569</guid></item><item><title><![CDATA[New comment by felixge in "Go allocation probe"]]></title><description><![CDATA[
<p>Hacking into the Go runtime with eBPF is definitely fun.<p>But for a more long-term solution in terms of reliability and overhead, it might be worth raising this as a feature request for the Go runtime itself. Type information could be provided via pprof labels on the allocation profiles.</p>
]]></description><pubDate>Tue, 22 Jul 2025 18:34:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44651227</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=44651227</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44651227</guid></item><item><title><![CDATA[New comment by felixge in "Go Optimization Guide"]]></title><description><![CDATA[
<p>+1. In particular []byte slice allocations are often a significant driver of GC pace while also being relatively easy to optimize (e.g. via sync.Pool reuse).</p>
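<p>A minimal sketch of the sync.Pool reuse mentioned above. The names bufPool and render and the 4096-byte starting capacity are assumptions made for illustration. The pool stores *[]byte rather than []byte so that Put does not allocate when the slice header is converted to an interface value.</p>

```go
package main

import (
	"fmt"
	"sync"
)

// bufPool hands out reusable byte slices (hypothetical name).
var bufPool = sync.Pool{
	New: func() any {
		bp := new([]byte)
		*bp = make([]byte, 0, 4096) // assumed starting capacity
		return bp
	},
}

// render builds a greeting in a pooled buffer and returns it as a string.
func render(name string) string {
	bp := bufPool.Get().(*[]byte)
	buf := (*bp)[:0] // reset the length, keep the capacity
	buf = append(buf, "hello, "...)
	buf = append(buf, name...)
	out := string(buf) // copy the bytes out before the buffer is returned
	*bp = buf          // keep any growth for the next user
	bufPool.Put(bp)
	return out
}

func main() {
	fmt.Println(render("gopher"))
}
```

<p>The important detail is that nothing may keep a reference to the buffer after Put, which is why render copies the result into a string first.</p>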
]]></description><pubDate>Tue, 01 Apr 2025 14:19:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43547083</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=43547083</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43547083</guid></item><item><title><![CDATA[Don't Clobber the Frame Pointer]]></title><description><![CDATA[
<p>Article URL: <a href="https://nsrip.com/posts/clobberfp.html">https://nsrip.com/posts/clobberfp.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42583246">https://news.ycombinator.com/item?id=42583246</a></p>
<p>Points: 77</p>
<p># Comments: 42</p>
]]></description><pubDate>Fri, 03 Jan 2025 06:57:26 +0000</pubDate><link>https://nsrip.com/posts/clobberfp.html</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=42583246</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42583246</guid></item><item><title><![CDATA[Orchestrion: Compile-time auto-instrumentation for Go]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.datadoghq.com/blog/go-instrumentation-orchestrion/">https://www.datadoghq.com/blog/go-instrumentation-orchestrion/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42367528">https://news.ycombinator.com/item?id=42367528</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 09 Dec 2024 16:13:13 +0000</pubDate><link>https://www.datadoghq.com/blog/go-instrumentation-orchestrion/</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=42367528</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42367528</guid></item><item><title><![CDATA[We used Datadog to save $17.5M annually]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.datadoghq.com/blog/how-datadog-saved-17-million-dollars/">https://www.datadoghq.com/blog/how-datadog-saved-17-million-dollars/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41731962">https://news.ycombinator.com/item?id=41731962</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 03 Oct 2024 16:03:34 +0000</pubDate><link>https://www.datadoghq.com/blog/how-datadog-saved-17-million-dollars/</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=41731962</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41731962</guid></item><item><title><![CDATA[New comment by felixge in "Go 1.23 Released"]]></title><description><![CDATA[
<p>This is great, thank you.</p>
]]></description><pubDate>Tue, 13 Aug 2024 19:56:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=41239107</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=41239107</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41239107</guid></item><item><title><![CDATA[The One-Instruction Window]]></title><description><![CDATA[
<p>Article URL: <a href="https://nsrip.com/posts/oneinstruction.html">https://nsrip.com/posts/oneinstruction.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41225804">https://news.ycombinator.com/item?id=41225804</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 12 Aug 2024 15:48:06 +0000</pubDate><link>https://nsrip.com/posts/oneinstruction.html</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=41225804</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41225804</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>I'm skeptical that it's worth it myself; this was just a fun research project for me. But once hardware shadow stacks are available, I think this could be great.<p>To answer your first question: For most Go applications, the average stack trace depth for profiling/execution tracing is below 32 frames. But some applications use heavy middleware layers that can push the average above this limit.<p>That being said, I think this technique will amortize much earlier when the fixed cost per frame walk is higher, e.g. when using DWARF or gopclntab unwinding. For Go that doesn't really matter while the compiler emits frame pointers. But it's always good to have options when it comes to evolving the compiler and runtime ...</p>
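<p>A rough sketch of how stack depth relates to a fixed-size capture buffer, using the standard runtime.Callers API. The 32-frame figure comes from the comment above; using it as the buffer size (maxDepth) and the capture and recurse helpers are assumptions of this sketch, not how the runtime's profilers are implemented.</p>

```go
package main

import (
	"fmt"
	"runtime"
)

// maxDepth is the assumed capture limit, borrowed from the 32-frame figure.
const maxDepth = 32

// capture records up to maxDepth return addresses of the calling goroutine
// and reports how many were written.
func capture() int {
	var pcs [maxDepth]uintptr
	return runtime.Callers(1, pcs[:])
}

// recurse nests n extra calls before capturing, standing in for deep
// middleware layers.
func recurse(n int) int {
	if n == 0 {
		return capture()
	}
	return recurse(n - 1)
}

func main() {
	fmt.Println("shallow stack frames:", capture())
	fmt.Println("deep stack frames (truncated):", recurse(100))
}
```

<p>A stack deeper than the buffer is truncated at maxDepth, so applications with heavy middleware layers would hit the limit on most samples.</p>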
]]></description><pubDate>Thu, 30 May 2024 21:06:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=40528716</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40528716</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40528716</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>That seems to be Windows-only? My main target OS is Linux.</p>
]]></description><pubDate>Thu, 30 May 2024 15:47:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=40525074</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40525074</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40525074</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>Thank you so much, this is very helpful and interesting. I'll try to experiment with this at some point.</p>
]]></description><pubDate>Thu, 30 May 2024 12:53:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=40523077</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40523077</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40523077</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>Thanks for the reply!<p>What does the API for accessing the shadow stack from user space look like? I didn't see anything for it in the kernel docs [1].<p>I agree about the need for switching the shadow stacks in the Go scheduler. But this would probably require an API that is a bit at odds with the security goals of the kernel feature.<p>I'm not sure I follow your thoughts on CGO and how this would work on older CPUs and kernels.<p>[1] <a href="https://docs.kernel.org/next/x86/shstk.html" rel="nofollow">https://docs.kernel.org/next/x86/shstk.html</a></p>
]]></description><pubDate>Thu, 30 May 2024 11:46:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=40522586</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40522586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40522586</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>I don't think any obvious 10%+ opportunities have been overlooked. Go is optimizing for fast and simple builds, which is a bit at odds with optimal code generation. So I think the biggest opportunity is to use Go implementations that are based on aggressively optimizing compilers such as LLVM and GCC. But those implementations tend to be a few major versions behind and are likely to be less stable than the official compiler.<p>That being said, I'm sure there are a lot of remaining incremental optimization opportunities that could add up to 10% over time. For example, a faster map implementation [1]. I'm sure there is more.<p>Another recent perf opportunity is using PGO [2], which can get you 10% in some cases. Shameless plug: We recently GA'ed our support for it at Datadog [3].<p>[1] <a href="https://github.com/golang/go/issues/54766">https://github.com/golang/go/issues/54766</a>
[2] <a href="https://go.dev/doc/pgo" rel="nofollow">https://go.dev/doc/pgo</a>
[3] <a href="https://www.datadoghq.com/blog/datadog-pgo-go/" rel="nofollow">https://www.datadoghq.com/blog/datadog-pgo-go/</a></p>
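<p>A minimal sketch of collecting the CPU profile that PGO consumes, following the workflow documented in [2]: the profile is saved as default.pgo in the main package directory, where Go 1.21 and later pick it up automatically on go build. The work and collectProfile functions are hypothetical; in practice the profile should come from a representative production workload rather than a synthetic loop.</p>

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"runtime/pprof"
)

// work is a hypothetical stand-in for the application's hot path.
func work(n int) int {
	sum := 0
	for i := 0; i < n; i++ {
		sum += i % 7
	}
	return sum
}

// collectProfile runs f under the CPU profiler and returns the profile in
// pprof's gzip-compressed protobuf format.
func collectProfile(f func()) ([]byte, error) {
	var buf bytes.Buffer
	if err := pprof.StartCPUProfile(&buf); err != nil {
		return nil, err
	}
	f()
	pprof.StopCPUProfile() // flushes the profile into buf
	return buf.Bytes(), nil
}

func main() {
	p, err := collectProfile(func() { work(200_000_000) })
	if err != nil {
		panic(err)
	}
	// default.pgo is the file name the go command looks for.
	if err := os.WriteFile("default.pgo", p, 0o644); err != nil {
		panic(err)
	}
	fmt.Printf("wrote default.pgo (%d bytes)\n", len(p))
}
```

<p>With default.pgo in place, a plain go build applies the profile; how much it helps depends on how well the profile matches real traffic.</p>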
]]></description><pubDate>Thu, 30 May 2024 09:33:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40521737</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40521737</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40521737</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>That's what hardware shadow stacks in modern Intel/Arm CPUs can do! It just needs to be exposed to user space and become widely available.</p>
]]></description><pubDate>Thu, 30 May 2024 07:46:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40521192</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40521192</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40521192</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>I know that at least two engineers from the runtime team have seen the post in the #darkarts channel of gopher slack. One of them left a fire emoji :).<p>I'll probably bring it up in the biweekly Go runtime diagnostics sync [1] next Thursday, but my guess is that they'll come to the same conclusion as me: Neat trick, but not a good idea for the runtime until hardware shadow stacks become widely available and accessible.<p>[1] <a href="https://github.com/golang/go/issues/57175">https://github.com/golang/go/issues/57175</a></p>
]]></description><pubDate>Thu, 30 May 2024 07:25:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=40521074</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40521074</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40521074</guid></item><item><title><![CDATA[New comment by felixge in "Fast Shadow Stacks for Go"]]></title><description><![CDATA[
<p>Thanks! And to answer your question: No, it won't speed up Go programs for now. This was mostly a fun research project for me.<p>The low-hanging fruit for speeding up stack unwinding in the Go runtime is to switch to frame pointer unwinding in more places. In go1.21 we contributed patches to do this for the execution tracer. For the upcoming go1.23 release, my colleague Nick contributed patches to upgrade the block and mutex profilers. Once the go1.24 tree opens, we're hoping to tackle the memory profiler as well as copystack. The latter would benefit all Go programs, even those not using profiling. But it's likely going to be a relatively small win (<= 1%).<p>Once all of this is done, shadow stacks have the potential to make things even faster. But the problem is that we'll be deep in diminishing-returns territory at that point. Speeding up stack capturing is great when it makes up 80-90% of your overhead (this was the case for the execution tracer before frame pointers). But once we're down to 1-2% (the current situation for the execution tracer), another 8x speedup is not going to buy us much, especially when it has downsides.<p>The only future in which shadow stacks could speed up real Go programs is one where we decide to drop frame pointer support in the compiler, which could provide a 1-2% speedup for all Go programs. Once hardware shadow stacks become widely available and accessible, I think that would be worth considering. But that's likely to be a few years down the road from now.</p>
]]></description><pubDate>Thu, 30 May 2024 07:10:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=40520999</link><dc:creator>felixge</dc:creator><comments>https://news.ycombinator.com/item?id=40520999</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40520999</guid></item></channel></rss>