Hacker News: vsrinivas

New comment by vsrinivas in "I ran Gemma 4 as a local model in Codex CLI"

vsrinivas — Mon, 13 Apr 2026 02:42:59 +0000

Hey - I use the same, w/ both gemma4 and gpt-oss-*; some things I have to do for a good experience:

1) Pin to an earlier version of codex (sorry) - 0.55 is the best experience IME, but YMMV (see https://github.com/openai/codex/issues/11940, https://github.com/openai/codex/issues/8272).

2) Use the older completions endpoint (llama.cpp's responses support is incomplete - https://github.com/ggml-org/llama.cpp/issues/19138)

New comment by vsrinivas in "Remembering Bell Labs as legendary idea factory prepares to leave N.J. home"

vsrinivas — Mon, 22 Jan 2024 02:26:23 +0000

FYI, Lumon HQ / the Eero Saarinen masterpiece is Holmdel Bell Labs; this article is about the Murray Hill building.

New comment by vsrinivas in "Chevy Bolt EV and Bolt EUV production is ending this year"

vsrinivas — Wed, 26 Apr 2023 16:03:29 +0000

Don't forget BECM on Gen2

New comment by vsrinivas in "Lanai, the mystery CPU architecture in LLVM"

vsrinivas — Tue, 22 Mar 2022 17:30:02 +0000

I'm sorry to hear Jakov passed away - he was a brilliant, kind engineer.

New comment by vsrinivas in "Intel Software Defined Silicon: additional CPU features after license activation"

vsrinivas — Mon, 22 Nov 2021 22:12:32 +0000

Intel Cooper Lake - it's a 14 nm server part that fit a particular niche (4S support, VNNI/bfloat16). It was partly due to FB, but also partly to provide a platform that was larger than ICX server could / can.

New comment by vsrinivas in "AMD lands Google, Twitter as customers with newest server chip"

vsrinivas — Fri, 09 Aug 2019 00:55:17 +0000

The TLB bug (Errata 298, doc 41322 if you really care - while the processor was attempting to set the A/D bits in a page table entry, an L2->L3 eviction of that PTE could occur) was one of a great many things wrong with that chip.

* A number of errata (not just 298) delayed full production, sapped performance, or negatively impacted idle power. Take a look at doc 41322, DR-BA step for many samples.

* It was late and didn't achieve performance targets; it missed clock rate targets and 2 MiB L3 was insufficient.

* Intel delivered a very compelling server part (Nehalem) during the lifecycle of family 10h.

New comment by vsrinivas in "Intel Analysis of Speculative Execution Side Channels [pdf]"

vsrinivas — Fri, 05 Jan 2018 17:58:01 +0000

Control Flow Enforcement (ENDBRANCH requirement at branch targets) looks like a nice feature, looking forward to it.

New comment by vsrinivas in "Ask HN: What is your favorite CS paper?"

vsrinivas — Thu, 24 Aug 2017 20:06:23 +0000

From the perspective of - 'take a fresh look at something we take for granted' - "A Preliminary Architecture for a Basic Data-Flow Processor" (Dennis & Misunas 1975)

Focusing on the flow of data between operators and greedily executing a linear program is what an out-of-order processor is.
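
To make that concrete, here is a toy sketch (mine, not from the paper): a straight-line program where each instruction fires as soon as its operands exist, regardless of program order. It assumes every destination is written once (i.e. renaming has already happened), and the names `dataflow_run` and `OPS` are made up for illustration.

```python
import operator

# Hypothetical three-operation instruction set, for illustration only.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def dataflow_run(program, inputs):
    """Execute (dest, op, src_a, src_b) tuples in dataflow order.

    Returns the final values and the 'waves' of instruction indices
    that fired together because all of their operands were available.
    Assumes each dest is written exactly once.
    """
    values = dict(inputs)
    pending = list(range(len(program)))
    waves = []
    while pending:
        # An instruction is ready when both source operands have values.
        ready = [i for i in pending
                 if program[i][2] in values and program[i][3] in values]
        if not ready:
            raise ValueError("no instruction can fire: missing operand")
        # Compute every ready instruction from the values at wave start.
        results = {}
        for i in ready:
            dest, op, a, b = program[i]
            results[dest] = OPS[op](values[a], values[b])
        values.update(results)
        pending = [i for i in pending if i not in ready]
        waves.append(ready)
    return values, waves

program = [
    ("t1", "add", "a", "b"),    # 0
    ("t2", "mul", "c", "d"),    # 1
    ("t3", "add", "t1", "t2"),  # 2: must wait for 0 and 1
    ("t4", "sub", "a", "c"),    # 3: independent, fires early
]
values, waves = dataflow_run(program, {"a": 1, "b": 2, "c": 3, "d": 4})
print(waves)
```

Instruction 3 fires in the first wave even though it comes last in program order, which is the out-of-order behavior in miniature.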

New comment by vsrinivas in "Bottleneck Bandwidth and RTT"

vsrinivas — Sun, 18 Sep 2016 17:17:15 +0000

You mean LRO instead of GRO, right?

New comment by vsrinivas in "Python 3 on Google App Engine flexible environment now in beta"

vsrinivas — Wed, 10 Aug 2016 17:38:00 +0000

Just wondering, why do you think so?

(It wasn't obvious to me why, given a first read of the docs; the fact that applications run in VMs in Flex seems more like an implementation detail. Am I missing something?)

New comment by vsrinivas in "Implementing Software Timers (1990)"

vsrinivas — Wed, 06 Jul 2016 04:16:37 +0000

Yes it is. Recommend browsing for great UNIX / TCP history.

New comment by vsrinivas in "Crowding out OpenBSD"

vsrinivas — Sun, 18 Nov 2012 07:08:23 +0000

And DragonFly's dntpd! Its time sync algorithms are actually pretty cool.

New comment by vsrinivas in "DragonFlyBSD 3.2 Release"

vsrinivas — Sun, 04 Nov 2012 18:30:53 +0000

That these benchmarks were done with 9.3-devel was important; part of what triggered this work was PostGres moving to a different way of mapping its shared memory segment (mmap vs shm), and the benchmarks were initially done to see how much that hurt.

New comment by vsrinivas in "DragonFlyBSD 3.2 Release"

vsrinivas — Sun, 04 Nov 2012 18:27:45 +0000

Still hard to say.

In the network stack, I think the DragonFly approach is definitely better (than fine-grained data structure locking). In DF, connections are routed to per-CPU network service threads based on a consistent hash; as a connection will always route to the same CPU, locking requirements are simpler. There are some places where the 'lower-half' network code does unfortunately take locks (when delivering data to socket buffers), but on the whole it's a good design. Solaris chose the same model, called it FireEngine; they have more writeups, if you're interested.
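
A rough sketch of the routing idea, under loud assumptions: this is a user-space toy, not DragonFly code. CRC32 modulo the CPU count stands in for whatever hash the kernel actually uses, and `NCPUS`, `cpu_for_connection` and `PerCpuStack` are hypothetical names.

```python
import zlib

NCPUS = 4  # assumed CPU count, for illustration only

def cpu_for_connection(src_ip, src_port, dst_ip, dst_port, ncpus=NCPUS):
    """Map a connection 4-tuple to a CPU index with a deterministic hash.

    CRC32 is a stand-in; the real kernel hash is not reproduced here.
    """
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % ncpus

class PerCpuStack:
    """One private connection table per CPU, with no shared lock."""

    def __init__(self, ncpus=NCPUS):
        self.tables = [dict() for _ in range(ncpus)]

    def deliver(self, conn, payload):
        # Every packet of a connection lands on the same CPU, so that
        # CPU's table is only ever touched by its own service thread.
        cpu = cpu_for_connection(*conn, ncpus=len(self.tables))
        self.tables[cpu].setdefault(conn, []).append(payload)
        return cpu

stack = PerCpuStack()
conn = ("10.0.0.1", 40000, "10.0.0.2", 5432)
first = stack.deliver(conn, b"query 1")
second = stack.deliver(conn, b"query 2")
print(first == second)
```

Because the mapping depends only on the 4-tuple, per-connection state needs no lock in this model; only cross-CPU handoffs (like the socket buffer delivery mentioned above) do.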

The changes in the 3.2 release that improved PostGres/pgbench performance were just good-old-fashioned lock breaking (spreading UNIX domain socket locking out), bypassing inefficient paths when possible (ex: bypass instantiating a buffercache buffer when data can be read directly from the page cache), and the new scheduler work. One other change was a neat trick, a form of page-table sharing; many of those changes would make sense on FreeBSD or similar systems.

The page-table sharing work addressed a problem seen years ago in Linux -- the cost of pv_entry manipulations. Linux moved from rmap to objrmap (http://lwn.net/Articles/75198/); FreeBSD reduced the cost of pv_entries by reducing their size and packing them densely. DFly's page-table sharing reduced the number of pv_entries (basically de-duped them) in particular situations (correct alignment/size of matched mapping; SHM segments, like in X or PostGres benefit most).

New comment by vsrinivas in "Rob Pike: Dennis Ritchie has died"

vsrinivas — Thu, 13 Oct 2011 17:11:41 +0000

The first edition has opening brackets for functions on a new line. It also used a space before parentheses for while loops and conditionals. Return types, when required, were on the same line as a function name.

New comment by vsrinivas in "Hammer2 Filesystem Design Docs (DragonflyBSD)"

vsrinivas — Sat, 14 May 2011 03:31:50 +0000

Hi,

If you're interested in reading a paper about the execution model, Jeffrey Hsu's 2004 "The DragonFlyBSD Operating System" (http://www.dragonflybsd.org/presentations/dragonflybsd.asiab...) describes the LWKT and port abstractions.

The DragonFly network stack is a pretty interesting subsystem to understand the SMP model through; it uses a form of connection-oriented parallelism rather than fine-grained locks. The same approach was taken by Solaris in their FireEngine system (they call it 'vertical partitioning'). A good reference for understanding how the netstack works is "An Evaluation of Network Stack Parallelization Strategies in Modern Operating Systems", along with the second part of the earlier paper. FreeBSD chose to build a fine-grain-locked netstack in a style the paper above called 'Message Parallelism'.

The DragonFly kernel memory allocator is another interesting subsystem to look at SMP design through. The kernel allocator (kmalloc) is a slab allocator, like most other kernel allocators. The DragonFly slab allocator differs from Bonwick's classic by using fixed size and fixed alignment slabs, so the traditional reverse-mapping hash table is not required; the slab headers are always at the slab ... head. The SMP strategy was not to put a per-CPU cache in front of each zone, however. The slab allocator itself was duplicated across each CPU (each slab for a given size is CPU-private). Remote frees were handled via passive IPIs (as described in the first paper). The only lock in the allocator is at the bottom, to allocate kernel address space and frames for slabs. Other systems (think Solaris, say) build per-CPU caches of objects in front of the allocator and lock the slab layer.
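
A small model of the two ideas above, with heavy caveats: this is a Python simulation, not kmalloc. `SLAB_SIZE` is an assumed value, the `slabs` dict only stands in for "the header lives in memory at the slab base" (Python can't dereference addresses), and the per-CPU `remote_free` queues stand in for passive IPIs. All names are hypothetical.

```python
SLAB_SIZE = 128 * 1024  # assumed power-of-two slab size, illustration only

def slab_base(addr, slab_size=SLAB_SIZE):
    """Find the slab header address for any object address by masking.

    Works only because slabs are fixed-size and aligned to their size.
    """
    return addr & ~(slab_size - 1)

class Slab:
    def __init__(self, base, owner_cpu):
        self.base = base
        self.owner_cpu = owner_cpu  # slabs are CPU-private
        self.free_list = []

class Allocator:
    def __init__(self, ncpus):
        # Simulated memory: base address -> header found at that address.
        self.slabs = {}
        # Stand-in for passive IPIs: frees queued for the owning CPU.
        self.remote_free = [[] for _ in range(ncpus)]

    def add_slab(self, base, owner_cpu):
        if base % SLAB_SIZE != 0:
            raise ValueError("slab base must be aligned to SLAB_SIZE")
        self.slabs[base] = Slab(base, owner_cpu)

    def free(self, addr, current_cpu):
        slab = self.slabs[slab_base(addr)]
        if slab.owner_cpu == current_cpu:
            slab.free_list.append(addr)  # local free, no lock needed
            return "local"
        self.remote_free[slab.owner_cpu].append(addr)
        return "remote"

    def drain(self, cpu):
        """Owner CPU processes frees that other CPUs handed to it."""
        for addr in self.remote_free[cpu]:
            self.slabs[slab_base(addr)].free_list.append(addr)
        self.remote_free[cpu].clear()
```

The point of the sketch is that `free` never searches for the owning slab and never touches another CPU's free list directly.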

The pattern you might see is that DFly chose to replicate resources across each CPU in an SMP system where reasonable...

Throughout the rest of the kernel, DFly uses a curious lock called a 'token'. A token lock is automatically released when a thread holding it stalls and is reacquired when it becomes runnable; think of a token as having the semantics of the older *BSD MPLOCK, but there can be more than one token, where there was one MPLOCK. Token locks can't deadlock (as sleep-and-hold is not possible; they do introduce a new class of error, though, where a lock "holder" slept, meaning earlier assumptions don't hold). They probably won't work as well as conventional mutexes when the token chains get larger than a few elements. Most importantly, they allowed the MPLOCK to be broken up very quickly, mostly over the 2.8 and 2.10 release cycles. Don't know that there is too much else out there to read about tokens; there might be something in the XNU kernel notes about 'funnels' (which have similar semantics).
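
The token semantics can be modeled in a few lines. This is a single-threaded simulation with made-up names (`Token`, `SimThread`), not the kernel API; it only tries to show the release-on-block / reacquire-on-resume behavior and the "holder slept" hazard.

```python
class Token:
    def __init__(self, name):
        self.name = name
        self.owner = None  # thread currently running with the token

class SimThread:
    def __init__(self, name):
        self.name = name
        self.held = []  # tokens this thread logically holds

    def acquire(self, tok):
        if tok in self.held:
            return True
        if tok.owner is not None:
            return False  # a real kernel would spin or yield here
        tok.owner = self
        self.held.append(tok)
        return True

    def release(self, tok):
        self.held.remove(tok)
        tok.owner = None

    def block(self):
        # Going to sleep: every held token is dropped, but remembered.
        for tok in self.held:
            tok.owner = None

    def resume(self):
        # Runnable again only once all remembered tokens can be retaken.
        if any(tok.owner is not None for tok in self.held):
            return False
        for tok in self.held:
            tok.owner = self
        return True

tok = Token("proc")
a, b = SimThread("A"), SimThread("B")
shared = {"count": 0}

a.acquire(tok)
seen_by_a = shared["count"]
a.block()              # A sleeps; the token is silently released
b.acquire(tok)
shared["count"] += 1   # B changes state A thought was protected
b.release(tok)
a.resume()             # A holds the token again
print(seen_by_a == shared["count"])
```

After `resume`, A owns the token again but its earlier read of `shared` is stale, which is the class of error described above.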

-- vs