Hacker News: peterstjohn

New comment by peterstjohn in "Ask HN: What was your "oh shit" moment with GenAI?"

peterstjohn — Sat, 06 Jun 2026 18:27:29 +0000

Okay, a couple of hours later…thanks for the hint as that's fucking dark magic ;) and I now have access to the entire New Yorker again after around 15 years :)

New comment by peterstjohn in "Ask HN: What was your "oh shit" moment with GenAI?"

peterstjohn — Sat, 06 Jun 2026 17:08:03 +0000

Ooooh, you don't happen to have the code for the New Yorker decryption in a form you could send, do you? Or put up on github or even just give me the starting prompt…

New comment by peterstjohn in "How we built Bluey’s world"

peterstjohn — Mon, 04 Aug 2025 18:43:17 +0000

Try Hey Duggee - it's not as explicitly British-coded, but there's a ton of stuff in there if you were watching Spaced in your late teens and now find yourself a parent…

New comment by peterstjohn in "Four Lectures on Standard ML (1989) [pdf]"

peterstjohn — Sun, 30 Mar 2025 19:58:30 +0000

For my sins, I didn't actually realise how great that was until quite a bit afterwards! ;)

New comment by peterstjohn in "Four Lectures on Standard ML (1989) [pdf]"

peterstjohn — Sun, 30 Mar 2025 18:22:30 +0000

Ha, same here! It really helped my imposter syndrome, as I overheard a couple of guys talking about the ARM assembly they were doing on their Archimedes on the first day…and I hadn't written anything fancier than QuickBASIC at the time…

New comment by peterstjohn in "How Britain got its first internet connection (2015)"

peterstjohn — Fri, 10 Jan 2025 02:19:19 +0000

I even hosted a mirror of the original Mozilla source code dump from St. Anselm Hall, and nobody ever complained ;P

New comment by peterstjohn in "Concrete clickbait: next time you share a spomenik photo (2016)"

peterstjohn — Sun, 08 Sep 2024 16:20:59 +0000

If you think that of Owen's output, for heaven's sake I fear for you if you ever read a Jonathan Meades article…

New comment by peterstjohn in "The Humanise Campaign call for an end to boring buildings"

peterstjohn — Sun, 21 Apr 2024 13:03:59 +0000

You would just film almost _directly_ across the river and shoot on the South Bank, one of the major brutalist outposts in London. It's lovely.

New comment by peterstjohn in "Are we at peak vector database?"

peterstjohn — Fri, 26 Jan 2024 13:40:57 +0000

I no longer work there, but Lucidworks has had embedding training as a first-class feature in Fusion since January 2020 (I know because I wrapped up adding it just as COVID became a thing). We definitely saw that even with just slightly out-of-band use of language - e.g. in e-commerce, things like "RD TSHRT XS", embedding search with open (and closed) models would fall below bog-standard* BM25 lexical search. Once you trained a model, performance would kick up above lexical search…and if you combined lexical _and_ vector search, things were great.

Also, a member on our team developed an amazing RNN-based model that still today beats the pants off most embedding models when it comes to speed, and is no slouch on CPU either…

(* I'm being harsh on BM25 - it is a baseline that people often forget in vector search, but it can be a tough one to beat at times)
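
For anyone wondering what "combining lexical and vector search" can look like in practice, here is a minimal sketch using reciprocal rank fusion (RRF). This is an assumption for illustration only: the comment above does not say how the results were combined, and the function name, the `k=60` constant, and the SKU ids are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by both retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists for a query like "RD TSHRT XS".
lexical_hits = ["sku-red-tee-xs", "sku-red-tee-s", "sku-blue-tee-xs"]
vector_hits = ["sku-red-polo-xs", "sku-red-tee-xs", "sku-red-tee-s"]

fused = reciprocal_rank_fusion([lexical_hits, vector_hits])
print(fused)
```

RRF only needs the rank positions from each retriever, which sidesteps the problem that BM25 scores and vector similarities live on different scales.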

New comment by peterstjohn in "Hasbro laying off Wizards of the Coast staff is baffling"

peterstjohn — Sun, 17 Dec 2023 15:58:14 +0000

Well, why wouldn't they sell (license) the rights to make Transformers films (which as far as I know is just extending their existing contract with Paramount)?

They still own the underlying IP[1], so as long as the contract is a decent one, Paramount has to deal with actually making and distributing the film, and Hasbro just gets the money, and a toy line off the back of the film. Feels like an easier setup than taking the risk on movie-making yourself (which they did attempt with eOne for other properties, but seemingly have decided that it's probably not a good deal with them).

[1] yes, yes, it's a bit more complicated with Takara in the mix too, but you can essentially view it as a Hasbro-owned property

New comment by peterstjohn in "Vespa.ai is spinning out of Yahoo as a separate company"

peterstjohn — Wed, 04 Oct 2023 21:40:56 +0000

+1 to everybody that mentioned that Vespa has great vector support _and_ lexical filtering. And you likely will end up needing both.

Don't sleep on some of its newer features like multi-vector document fields, either…

New comment by peterstjohn in "Do we think about vector storage wrong?"

peterstjohn — Tue, 05 Sep 2023 11:50:26 +0000

That paper does a terrible job of making Lucene look useful, though. 10qps from a server with 1TB of RAM is not great (and I know Lucene HNSW can perform better than that in the real world, so I am somewhat mystified that this paper is being pushed by the community).

New comment by peterstjohn in "What is a Vector Database? (2021)"

peterstjohn — Fri, 05 May 2023 16:09:01 +0000

It definitely depends on your use case. If you are just searching through the entire array at all times, then this is certainly an acceptable option (you could even flip it all onto a GPU too).

But when you start to require filtering or combining the vector search with a lexical search, then something like Pinecone, Vespa, Qdrant, Lucene-based options (e.g. Solr and ES) etc. become a lot more practical than you building all that functionality yourself.
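
To make the trade-off concrete, here is a rough sketch of the "just search the entire array" approach in plain Python, with a naive filter bolted on. Everything here (function names, the `keep` predicate, the toy vectors and the `in_stock` flags) is hypothetical and only meant to show the shape of the problem, not how any of the systems named above work.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def brute_force_search(query, vectors, top_k=3, keep=None):
    """Score every stored vector against the query and return the best row indices.

    `keep` is an optional predicate on the row index, standing in for a filter.
    """
    scored = [
        (cosine(query, vector), index)
        for index, vector in enumerate(vectors)
        if keep is None or keep(index)
    ]
    # Best similarity first; ties broken by row index for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [index for _, index in scored[:top_k]]


vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
in_stock = [True, False, True, True]

nearest = brute_force_search([1.0, 0.0], vectors, top_k=2)
nearest_in_stock = brute_force_search(
    [1.0, 0.0], vectors, top_k=2, keep=lambda index: in_stock[index]
)
print(nearest, nearest_in_stock)
```

This is fine while it stays a full scan with a simple predicate. Once the filter is selective, or you also want lexical scoring alongside the vector score, you end up rebuilding index structures that the dedicated engines already provide.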

New comment by peterstjohn in "What is a Vector Database? (2021)"

peterstjohn — Fri, 05 May 2023 16:04:19 +0000

Yes! We've been running Milvus in production for about three years now, powering some customers that do have queries at that scale. It has its foibles like all of these systems (the lack of non-int id fields in the 1.x line is maddening and has required a bunch of additional engineering by us to work with our other systems), but it has held up pretty well in our experience.

(I can't speak to Milvus 2.x as we are probably not going to upgrade to that for a number of non-performance reasons)

New comment by peterstjohn in "What is a Vector Database? (2021)"

peterstjohn — Fri, 05 May 2023 15:56:22 +0000

Are they forking Lucene or somehow getting the Lucene devs to increase that limit? Because this issue has been open for over a year now: https://github.com/apache/lucene/issues/11507

New comment by peterstjohn in "A Low Cost Approach to Improving Pedestrian Safety with Deep Learning"

peterstjohn — Thu, 27 Apr 2023 17:17:45 +0000

Fun project, with a bit of a kicker as I see the words "Colerain Avenue" and realize it was literally across the road from me.

New comment by peterstjohn in "StableLM: A new open-source language model"

peterstjohn — Wed, 19 Apr 2023 22:46:27 +0000

So just use their base model and fine-tune with a non-restrictive dataset (e.g. Databricks' Dolly 2.0 instructions)? You can get a decent LoRA fine-tune done in a day or so on consumer GPU hardware, I would imagine.

The point here is that you can use their bases in place of LLaMA and not have to jump through the hoops, so the fine-tuned models are really just there for a bit of flash…

New comment by peterstjohn in "UK airport scraps 100ml liquid rule with new scanners"

peterstjohn — Tue, 04 Apr 2023 12:32:44 +0000

I once travelled with a 5kg vat of fondant icing on a transatlantic flight. "Yes, it looks very much like Semtex, but it's fine!" Still not exactly sure how I got away with it…

New comment by peterstjohn in "Algolia Acquires Search.io"

peterstjohn — Thu, 15 Sep 2022 14:05:07 +0000

Heh, my eyes did pop at that one, considering we've also been doing that over here since 2020 at least ;)

New comment by peterstjohn in "Algolia Acquires Search.io"

peterstjohn — Thu, 15 Sep 2022 13:37:08 +0000

It really does give you the best of both worlds - resistant to typos, handling synonyms without all the usual hand-written rules, but still able to handle direct searches like ISBNs.

(disclaimer: I work on Semantic Search at Lucidworks)