Hacker News: etep

New comment by etep in "Spade Hardware Description Language"

etep — Mon, 12 May 2025 17:44:20 +0000

Surfer deserves to hit the front page also. Much better than GTKWave. Nice work Spade & Surfer!

New comment by etep in "Ask HN: Who is hiring? (May 2025)"

etep — Thu, 01 May 2025 22:16:49 +0000

PAX Markets (YC W25) | Founding SW or HW Engineer | Menlo Park, CA | Full-time

We're building a crypto exchange, on a chip. We sell ultra-low latency market access to HFT participants through our on-chip λ API. For all other participants, we offer free trading with cash rebates for making and taking.

PAX is the fastest exchange from crypto to Wall Street.

Apply here: https://app.dover.com/jobs/pax

New comment by etep in "Pprof++: A Go Profiler with Hardware Performance Monitoring"

etep — Tue, 11 May 2021 19:51:00 +0000

I'm curious about the results under the section titled "Demonstration."

The author claims the ground truth is that each goroutine uses 10% of the CPU time (it is stipulated that this should be the case). But what if the results shown are accurate, i.e. they reflect the actual CPU time (because of idiosyncrasies of scheduling between the OS, the Go runtime, and anything else happening on that system)?

Does running the new profiler show less variance in the results from that initial experiment? Showing this result would strengthen the claim that the "out of the box" solution is inaccurate.

New comment by etep in "The Urgent Quest for Slower, Better News"

etep — Wed, 10 Apr 2019 19:49:20 +0000

Wasn't aware of it, will try it - thank you!

New comment by etep in "The Urgent Quest for Slower, Better News"

etep — Wed, 10 Apr 2019 19:40:04 +0000

I have a theory that news occurs with a far lower frequency than people think, far lower even than when people try to account for their knowledge of the 24 hr. news cycle. In my theory, most of what counts as news is either a story update (737 Max stories) or not news (say Trump "news").

My hypothetical news organization would strive to identify the unifying element of a given news item, and then keep one article about it. The article would include the following elements: a summary, a timeline, a fact set, and a commentary or critique. All subsections would be allowed to evolve, but the "story" would be one thing.

I've considered trying to self fund this somehow, but I've never convinced myself that it would really get traction.

New comment by etep in "A 50-year-old design came back to haunt Boeing with its troubled 737 Max jet"

etep — Sun, 17 Mar 2019 15:42:12 +0000

Thank you for the insight! Just for clarity, I'm using "autopilot" as a catch-all term that refers to the automated systems that plausibly seem to have caused these crashes.

New comment by etep in "A 50-year-old design came back to haunt Boeing with its troubled 737 Max jet"

etep — Sun, 17 Mar 2019 15:23:41 +0000

It's not nice that so many writers parrot that the solution is "more pilot training." Under the hypothesis that there needs to be a solution (one that I think is correct), then the answer of more training amounts to mere hope. One hopes that the pilot a) knows and b) remembers to turn off the autopilot when the emergency starts.

New comment by etep in "Dark Site Finder: tracking light pollution to find locations for stargazing"

etep — Fri, 26 Jan 2018 17:46:35 +0000

https://xkcd.com/1138/

New comment by etep in "Firefox’s new streaming and tiering compiler"

etep — Thu, 18 Jan 2018 03:20:16 +0000

For chips, power scales with voltage squared. It is also true that P=IV (since both are true, these observations cannot be in contradiction). Apparently, for chips, the current must be proportional to voltage also. Glossing over some details, turning on (off) a transistor is the same as charging (discharging) a capacitor. The energy stored on a capacitor is 1/2 C V^2. If you turn the transistor on and off periodically (say with frequency f), you use 1/2 C V^2 energy f times per second (energy per unit time is power). Normally the capacitance is ignored when discussing how power changes because for a given design the capacitance is a fixed quantity.
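To make the scaling concrete, here is a rough sketch of the switching-power relation described above. The function name, the `activity` factor, and the example numbers are assumptions for illustration only, not measurements of any real chip.

```python
def dynamic_power(freq_hz, cap_farads, volts, activity=1.0):
    """Switching power in watts: 1/2 * C * V^2 of energy, spent f times per second.

    `activity` (the fraction of the capacitance that switches each cycle) is an
    illustrative extra knob, not something taken from the comment above.
    """
    return 0.5 * activity * cap_farads * volts ** 2 * freq_hz


# Hypothetical operating point: 1 nF of switched capacitance, 1.0 V, 1 GHz.
base = dynamic_power(1e9, 1e-9, 1.0)
double_freq = dynamic_power(2e9, 1e-9, 1.0)
double_volts = dynamic_power(1e9, 1e-9, 2.0)

print(double_freq / base)   # ratio when only the frequency doubles (linear)
print(double_volts / base)  # ratio when only the voltage doubles (quadratic)
```

With C held fixed, doubling f doubles the power while doubling V quadruples it, which is the "voltage squared" behavior.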

New comment by etep in "Firefox’s new streaming and tiering compiler"

etep — Thu, 18 Jan 2018 02:38:54 +0000

I think it is linear for frequency and non-linear for voltage, i.e. P~fCV^2. But in many current CPUs, the feature that adjusts frequency also adjusts voltage. That's why I stipulated that, for my comments to be true, such shenanigans as dynamic frequency (and voltage) scaling must be "turned off." I think the OP was asking what happens to CPU energy if you load the web page with and without the optimized compilation. The OP was interested in core sleep states, but I think that dynamic frequency scaling is a confounding factor. It would be interesting to see the measurements w/ and w/out that feature perhaps.

New comment by etep in "Firefox’s new streaming and tiering compiler"

etep — Wed, 17 Jan 2018 21:21:42 +0000

It should be the case that if the same amount of work is done, then the energy used will be the same. If it takes less work to compile the WebAssembly, then less energy is used (holding all other parameters the same). If you have to idle a CPU, then you probably use more energy (holding all other parameters the same), because you will spend more time to accomplish the same amount of real work, and the idled core wastes energy (albeit very little) on extra, non-productive work. To run this experiment, you cannot let some other CPU parameter change as a result of cores being idled (e.g. frequency getting boosted on the non-idle cores because dynamic frequency scaling sees idle cores). Thinking about CPU energy use is interesting :)
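A toy model of that argument, under loudly stated assumptions: switching energy follows 1/2 C V^2 per cycle, the voltage is held fixed (no dynamic voltage/frequency scaling), and an idled core burns a small constant `idle_watts` for as long as the job runs. All names and numbers here are hypothetical.

```python
def switching_energy(cycles, cap_farads, volts):
    """Energy for a fixed amount of work (a fixed cycle count).

    power * time = (1/2 C V^2 f) * (cycles / f), so the frequency cancels out.
    """
    return 0.5 * cap_farads * volts ** 2 * cycles


def total_energy(cycles, cap_farads, volts, freq_hz, idle_watts=0.0):
    """Switching energy plus whatever an idled core wastes while the job runs."""
    runtime_s = cycles / freq_hz
    return switching_energy(cycles, cap_farads, volts) + idle_watts * runtime_s


# Same work at two frequencies, same voltage, no idle core: identical energy.
fast = total_energy(1e9, 1e-9, 1.0, freq_hz=2e9)
slow = total_energy(1e9, 1e-9, 1.0, freq_hz=1e9)

# Same work, but a hypothetical idle core draws 0.1 W for the whole runtime:
# the slower run now costs more energy because it takes longer.
fast_with_idle = total_energy(1e9, 1e-9, 1.0, freq_hz=2e9, idle_watts=0.1)
slow_with_idle = total_energy(1e9, 1e-9, 1.0, freq_hz=1e9, idle_watts=0.1)

print(fast, slow, fast_with_idle, slow_with_idle)
```

The model ignores leakage on the active core and any voltage change, which is exactly why the experiment needs frequency and voltage scaling turned off.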

New comment by etep in "Jeff Bezos Surpasses Bill Gates as World's Richest Person"

etep — Thu, 27 Jul 2017 17:03:55 +0000

If Bezos gave all his money to everyone in the world we would all get maybe $13 or so. If all the billionaires did the same we would all get $1000 or so.
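The back-of-the-envelope division, with every input a rough assumption on my part rather than something stated above: about 7.5 billion people, about $90 billion for Bezos at the time, and about $7.67 trillion for all billionaires combined (the Forbes 2017 list total).

```python
WORLD_POPULATION = 7.5e9    # assumed: rough mid-2017 world population
BEZOS_NET_WORTH = 90e9      # assumed: rough figure reported at the time
ALL_BILLIONAIRES = 7.67e12  # assumed: Forbes 2017 list, combined net worth


def per_person(total_dollars, population=WORLD_POPULATION):
    """Dollars each person would get if the total were split evenly."""
    return total_dollars / population


bezos_share = per_person(BEZOS_NET_WORTH)
billionaire_share = per_person(ALL_BILLIONAIRES)

print(f"Bezos only: about ${bezos_share:.0f} each")
print(f"All billionaires: about ${billionaire_share:.0f} each")
```

Both results land in the same ballpark as the figures quoted above, and small changes to the assumed inputs shift them by a dollar or two.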

New comment by etep in "Pattern-defeating quicksort"

etep — Thu, 29 Jun 2017 22:24:02 +0000

Hi stjepang,

If the following criteria are met, then perhaps the branch mis-predict penalty is less of a problem:

1. you are sorting a large amount of data, much bigger than the CPU LLC
2. you can effectively utilize all cores, i.e. your sort algorithm can parallelize

Perhaps in this case you are memory bandwidth limited. If so, you are probably spending more time waiting on data than waiting on pipe flushes (i.e. the consequence of mis-predicts).

New comment by etep in "Cache Organization in Intel CPUs (2009)"

etep — Thu, 08 Jun 2017 14:58:52 +0000

I think maybe we were talking past each other. Yes there is more than one reason.

It's far easier to add capacity by adding sets, as opposed to ways. But they can't add sets in the L1 because of the aliasing problem. When they do increase L1 capacity, if nothing else has changed, then it will be by adding ways.

New comment by etep in "Cache Organization in Intel CPUs (2009)"

etep — Wed, 07 Jun 2017 18:49:46 +0000

Hi, it is the main reason L1 hasn't grown.

By your reasoning, no cache should be able to grow, because then their latency would increase too much. But instead, all other CPU caches are growing basically with iso-latency. The reason this is possible is technology scaling. Anyway...

But yes, the L1 does have to be small and fast, but it doesn't have to be that small to be that fast. It has to be that small because of virtual indexing combined with the cost of adding ways breaking other design constraints (possibly a latency constraint, fine). But you could grow the L1 by adding sets and get your required latency.

New comment by etep in "Cache Organization in Intel CPUs (2009)"

etep — Wed, 07 Jun 2017 18:45:55 +0000

At a high level it's true that smaller is faster, but it's also true that those L1s could have grown by adding sets (not ways) and achieved the same latency. L2 has grown, but stayed iso-latency. This seems to say that "smaller is faster" does not always hold.

Always impressed that Agner Fog takes the time to publish his results. Pretty amazing. But I think focusing your thinking on the register count in MIPS or the uarch for some random opcode does not get into the real constraints on L1 cache design at all. One could say that x86 should be even faster, because hey, far fewer than 32 registers (or historically at least).

My response is like this: yes, the L1 has to be small to be fast, but it has been stuck at 32KB forever now. It could have grown! So it's not as simple as small is fast.

New comment by etep in "Cache Organization in Intel CPUs (2009)"

etep — Wed, 07 Jun 2017 18:34:53 +0000

If it was easy to build high performance caches with high associativity, we would certainly see higher associativity. Ideally, you want a fully associative cache, but it's too expensive. In CPU caches, once a set is selected, all N associative ways are compared simultaneously. So growing associativity costs area and power for extra comparators. This growth could cause timing issues, i.e. add latency to the memory access or cause CPU freq. to be lowered.
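A toy software model of that lookup, just to show where the cost comes from. The class name, the LRU replacement policy, and the `tag_compares` counter are assumptions made for illustration; real hardware does the per-way comparisons in parallel rather than in a loop.

```python
class SetAssociativeCache:
    """Toy model: `num_sets` sets, `ways` lines per set, LRU replacement (assumed)."""

    def __init__(self, num_sets, ways, line_bytes=64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_bytes = line_bytes
        # One list of tags per set, least recently used first.
        self.sets = [[] for _ in range(num_sets)]
        self.tag_compares = 0

    def access(self, addr):
        """Return True on a hit, False on a miss (the line is then filled)."""
        line = addr // self.line_bytes
        index = line % self.num_sets   # selects the set
        tag = line // self.num_sets    # identifies the line within that set
        tags = self.sets[index]
        # Hardware checks every way of the selected set at once, so the
        # comparator count (and its area and power) grows with associativity.
        self.tag_compares += self.ways
        if tag in tags:
            tags.remove(tag)
            tags.append(tag)           # mark as most recently used
            return True
        if len(tags) == self.ways:
            tags.pop(0)                # evict the least recently used line
        tags.append(tag)
        return False


cache = SetAssociativeCache(num_sets=2, ways=2)
results = [cache.access(addr) for addr in (0, 0, 128, 256, 0)]
print(results, cache.tag_compares)
```

Addresses 0, 128 and 256 all map to set 0 here, so the third distinct line evicts the first one. A fully associative cache is the `num_sets=1` case, where every access pays for a comparison against every line.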

New comment by etep in "Cache Organization in Intel CPUs (2009)"

etep — Wed, 07 Jun 2017 14:15:19 +0000

I had wondered why the L1 caches are not growing while L2 and L3 capacities continue to grow: a significant limitation on L1 cache size is actually the fixed tradeoff between associativity and page size (i.e. the 4KB pages allocated by the OS to processes).

Because a 4 KB page holds 64 cache lines (64 bytes each), you can have at most 64 cache sets. With an 8-way associative cache this works out to 32 KB. Using 128 sets would cause aliasing, but with 64 sets the cache index is built from the LSBs that just index into the page (i.e. not used in the TLB lookup). Thus, the only ways to increase L1 capacity are to:

- totally abandon 4KB pages in favor of (e.g.) 2MB pages (not likely)
- increase cache associativity (likely imo)
- stop using virtual index+physical tag (not likely imo)
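A quick sketch of that arithmetic, assuming 64-byte lines, power-of-two sizes, and a virtually indexed, physically tagged L1. The function name is made up for illustration.

```python
def vipt_l1_limit(ways, page_bytes=4096, line_bytes=64):
    """Largest L1 whose set index fits entirely inside the page offset.

    Returns (max_sets, untranslated_address_bits, capacity_bytes).
    Assumes page_bytes and line_bytes are powers of two.
    """
    max_sets = page_bytes // line_bytes        # 64 sets for 4 KB pages
    offset_bits = line_bytes.bit_length() - 1  # bits selecting a byte in the line
    index_bits = max_sets.bit_length() - 1     # bits selecting the set
    capacity = max_sets * ways * line_bytes
    return max_sets, offset_bits + index_bits, capacity


for ways in (8, 12, 16):
    sets, bits, capacity = vipt_l1_limit(ways)
    print(f"{ways}-way: {sets} sets, {bits} untranslated bits, {capacity // 1024} KB")
```

The capacity limit reduces to ways times page size, so with 4 KB pages the only knob left is associativity. Passing `page_bytes=2 * 1024 * 1024` shows how much room larger pages would open up.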

New comment by etep in "Python For Finance: Algorithmic Trading"

etep — Fri, 02 Jun 2017 22:13:05 +0000

My point exactly, not the most lucrative. So what remains is the implicit assertion they are fleecing the big time guys. And somehow the big time guys remain in business... so what's going on?

New comment by etep in "Python For Finance: Algorithmic Trading"

etep — Fri, 02 Jun 2017 22:03:32 +0000

People like to say the market is a zero sum game, but I have always been suspicious of this truism.

It's fair to say that participants can leave the market, but in practice that isn't what we see. In practice there are firms doing this trading, and they are staying in business, and obviously making money.

Are they fleecing the little guy then? This explanation falls flat for me, i.e. for the amount of money they seem to be making, it would take a lot of small time participants losing everything every day. Most people I know aren't even active traders.

So why is it an accepted truism that the market is zero sum?