Hacker News: jackpirate

New comment by jackpirate in "Xs of Y – roguelike that names itself every run. Written in 4kLoC"

jackpirate — Wed, 13 May 2026 18:10:21 +0000

The name let-go of your programming language is awesome!

New comment by jackpirate in "AI uses less water than the public thinks"

jackpirate — Fri, 01 May 2026 21:45:37 +0000

I "like" how in their graphic agriculture and cities are both putting water into the lake, and only data centers are removing water from the lake.

The prompter should have redone this image a couple of times until they had all three actually draining the lake.

New comment by jackpirate in "ChatGPT Images 2.0"

jackpirate — Tue, 21 Apr 2026 19:47:57 +0000

Also, the raccoon it circled isn't in the original.

New comment by jackpirate in "Parse, Don't Validate (2019)"

jackpirate — Tue, 10 Feb 2026 16:51:31 +0000

> Edit: Changed this from email because email validation is a can of worms as an example

Email honestly seems much more straightforward than dates... Sweden had a Feb 30 in 1712, and there are all sorts of date ranges that never existed in most countries (e.g. the American colonies skipped September 3-13 in 1752).
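
As a rough illustration of why this is awkward for a "parse into a valid type" approach, here is a minimal sketch using Python's standard `datetime` module, which assumes the Gregorian calendar has always been in effect. The helper `is_valid_date` is a hypothetical name made up for this example.

```python
from datetime import date


def is_valid_date(year, month, day):
    """Return True if Python's proleptic Gregorian calendar accepts the date."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


# Accepted by the library, although this day was skipped in the
# American colonies during the 1752 calendar switch.
colonial_gap = is_valid_date(1752, 9, 5)

# Rejected by the library, although Sweden really had a Feb 30 in 1712.
swedish_feb_30 = is_valid_date(1712, 2, 30)
```

So a "parsed" date type is only as correct as the calendar model behind it, and that model can disagree with history in both directions.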

New comment by jackpirate in "Don't Be a Sucker (1943) [video]"

jackpirate — Mon, 13 Oct 2025 22:05:16 +0000

Why not have a conversation instead of downvoting? What did I say that is wrong?

Your second paragraph is implying that the half of Americans who voted for Trump are "bad Americans". That seems to be sowing the division that your first paragraph warns against (even if it is a reason to dislike Trump).

I don't think either Democrats or Republicans can claim the moral high ground about sowing division.

New comment by jackpirate in "TODOs aren't for doing"

jackpirate — Tue, 22 Jul 2025 15:59:35 +0000

What's the origin of XXX? I've seen FIXME/NOTE/TODO all over the place, but never encountered XXX before.

New comment by jackpirate in "Harnessing the Universal Geometry of Embeddings"

jackpirate — Wed, 21 May 2025 23:01:31 +0000

Except those papers are 8ish years old; they actually were among the first 2-3 algs for this task; and they studied the fully general vector space alignment problem. But I agree that naming things is hard and don't have a better name.

New comment by jackpirate in "Harnessing the Universal Geometry of Embeddings"

jackpirate — Wed, 21 May 2025 22:57:12 +0000

> We tested all of the methods in the Python Optimal Transport package (https://pythonot.github.io/) and reported the max in most of our tables.

Sorry if I'm being obtuse, but I don't see any mention of the POT package in your paper or of what specific algorithms you used from it to compare against. My best guess is that you used the linear map similar to the example at <https://pythonot.github.io/auto_examples/domain-adaptation/p...>. The methods I mentioned are also linear, but contain a number of additional tricks that result in much better performance than a standard L2 loss, and so I would expect those methods to outperform your OT baseline.

> As for the name – the paper you recommend is called 'vecmap' which seems equally general, doesn't it? Google shows me there are others who have developed their own 'vec2vec'. There is a lot of repetition in AI these days, so collisions happen.

But both of those papers are about generic vector alignment, so the generality of the name makes sense. Your contribution here seems specifically about the LLM use case, and so a name that implies the LLM use case would be preferable.

I do agree though that in general naming is hard and I don't have a better name to suggest. I also agree that there's lots of related papers, and you can't cite/discuss them all reasonably.

And I don't mean to be overly critical... the application to LLMs is definitely cool. I wouldn't have read the paper and written up my critiques if I didn't overall like it :)

New comment by jackpirate in "Harnessing the Universal Geometry of Embeddings"

jackpirate — Wed, 21 May 2025 20:30:38 +0000

I hate to be "reviewer 2", but:

I used to work on what your paper calls "unsupervised transport", that is, machine translation between two languages without alignment data. You note that this field has existed since ~2016 and you provide a number of references, but you only dedicate ~4 lines of text to this branch of research. There's no discussion of how your technique differs from this prior work or why the prior algorithms can't be applied to the output of modern LLMs.

Naively, I would expect off-the-shelf embedding alignment algorithms (like <https://github.com/artetxem/vecmap> and <https://github.com/facebookresearch/fastText/tree/main/align...>, neither of which is cited or compared against) to work quite well on this problem. So I'm curious if they don't or why they don't.

I can imagine there is lots of room for improvements around implicit regularization in the algorithms. Specifically, these algorithms were designed with word2vec output in mind (typically 300 dimensional vectors with 200000 observations), but your problem has higher dimensional vectors with fewer observations and so would likely require different hyperparameter tuning. IIRC, there's no explicit regularization in these methods, but hyperparameters like stepsize/stepcount can implicitly add L2 regularization, which you probably need for your application.
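
To make the implicit regularization point concrete, here is a toy sketch in plain Python. It is not taken from vecmap or fastText. It is a hypothetical 1-D least-squares problem with made-up data, where gradient descent starts at zero and the step count controls how far the weight is allowed to grow.

```python
# Made-up data for a 1-D fit y ≈ w * x (an assumption for illustration only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.1, 5.9]


def gradient_descent(steps, stepsize=0.01):
    """Minimize the mean squared error of y ≈ w * x, starting from w = 0."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        w -= stepsize * grad
    return w


# Closed-form least-squares solution, with no regularization at all.
w_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

w_early = gradient_descent(steps=5)    # stopped early: still shrunk toward 0
w_late = gradient_descent(steps=500)   # effectively converged to w_ols
```

With few steps the weight stays well below the unregularized solution, which is qualitatively what an explicit L2 penalty would do. How this carries over to high-dimensional embedding alignment is exactly the part that would need tuning.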

---

PS.

I *strongly dislike* your name of vec2vec. You aren't the first/only algorithm for taking vectors as input and getting vectors as output, and you have no right to claim such a general title.

---

PPS.

I believe there is a minor typo with footnote 1. The note is "Our code is available on GitHub." but it is attached to the sentence "In practice, it is unrealistic to expect that such a database be available."

New comment by jackpirate in "AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms"

jackpirate — Wed, 14 May 2025 19:11:27 +0000

It seems like you have some misconceptions about Strassen's alg:

1. It is a standard example of the divide and conquer approach to algorithm design, not the dynamic programming approach. (I'm not even sure how you'd squint at it to convert it into a dynamic programming problem.)

2. Strassen's does not require complex valued matrices. Everything can be done in the real numbers.
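
Both points can be seen in a minimal sketch of the algorithm. This is a plain-Python illustration that assumes square matrices whose size is a power of two, and the helper names (`add`, `sub`, `split`) are made up for this example. Each call splits the problem into 7 recursive multiplications on half-size blocks (divide and conquer), and only real addition, subtraction, and multiplication are used.

```python
def add(A, B):
    """Elementwise sum of two equally sized matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def sub(A, B):
    """Elementwise difference of two equally sized matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def split(M):
    """Split a square matrix into four half-size quadrants."""
    h = len(M) // 2
    return ([r[:h] for r in M[:h]], [r[h:] for r in M[:h]],
            [r[:h] for r in M[h:]], [r[h:] for r in M[h:]])


def strassen(A, B):
    """Multiply two n x n matrices, n a power of two, via Strassen's scheme."""
    if len(A) == 1:
        return [[A[0][0] * B[0][0]]]
    A11, A12, A21, A22 = split(A)
    B11, B12, B21, B22 = split(B)
    # Seven recursive products instead of the naive eight.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    # Combine the products into the quadrants of the result.
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(add(sub(M1, M2), M3), M6)
    top = [left + right for left, right in zip(C11, C12)]
    bottom = [left + right for left, right in zip(C21, C22)]
    return top + bottom


product = strassen([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```

There is no table of overlapping subproblems being reused here, which is what you would expect to see in a dynamic programming formulation.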

New comment by jackpirate in "Show HN: Roons – Mechanical Computer Kit"

jackpirate — Fri, 02 May 2025 03:25:50 +0000

As a CS prof, I'd love to have this in my office for students to play with. Looks awesome!

New comment by jackpirate in "PEP 750 – Template Strings"

jackpirate — Fri, 11 Apr 2025 00:49:29 +0000

That all makes sense to me. But it definitely won't make sense to my intro to programming students. They already have enough weird syntax to juggle.

New comment by jackpirate in "PEP 750 – Template Strings"

jackpirate — Thu, 10 Apr 2025 21:54:27 +0000

Building off this question, it's not clear to me why Python should have both t-strings and f-strings. The difference between the two seems like a stumbling block to new programmers, and my "ideal python" would have only one of these mechanisms.

New comment by jackpirate in "Rules for writing software tutorials"

jackpirate — Thu, 02 Jan 2025 23:03:07 +0000

> I've seen a lot of examples that use CSS to show the prompt or line number without it becoming part of copied text, and I'm highly in favor of that.

This is unfortunately not compatible with writing the tutorial in markdown to be rendered on github.

New comment by jackpirate in "Rules for writing software tutorials"

jackpirate — Thu, 02 Jan 2025 21:02:35 +0000

I have a minor nit to pick. I actually prefer when tutorials provide the prompts for all code snippets for two reasons:

1. Many tutorials reference many languages. (I frequently write tutorials for students that include bash, sql, and python.) Providing the prompts `$`, `sqlite>` and `>>>` makes it obvious which language a piece of code is being written in.

2. Certain types of code should not be thoughtlessly copy/pasted, and providing multiline `$` prompts enforces that the user copy/pastes line by line. A good example is a sequence of commands that involves `sudo dd` to format a hard drive. But even for really intro-level stuff I want the student/reader to carefully think about all the commands, and forcing them to copy/paste line by line helps achieve that goal.

That said, this is an overall good introduction to writing that I will definitely be making required reading for some of my data science students. When the book is complete, I'll be happily buying a copy :)

New comment by jackpirate in "The Cheating Device (ChatGPT on a TI-84) [video]"

jackpirate — Thu, 19 Sep 2024 17:02:04 +0000

> I can talk about concepts like "atoms" or "bacteria" or "black holes" with anyone, and they'll know what they are - even if their knowledge of those subjects isn't in depth.

I'm not convinced this is an unalloyed good. Knowing that a disease is caused by "bacteria" instead of "demons" isn't really helpful if you don't have a deep understanding of exactly what bacteria are. See, for example, all of the people who want antibiotics whenever they're sick for any reason. We've just replaced one set of weird beliefs in the general populace with another and given it a veneer of science.

New comment by jackpirate in "The case for not sanitising fairy tales"

jackpirate — Mon, 24 Jun 2024 21:41:39 +0000

I think you're wrong.

Suicide does not have stable reporting rates. It was very stigmatized in the past, and so investigators would notoriously report suicides as "unknown cause of death" if they could.

Violent crime, on the other hand, is much more correlated with things like poverty than with mental health.

I think it's quite obviously the case that there are no clear indicators about what "mental health" looked like 100 years ago, and therefore any projections into the past will involve a lot of extrapolation and have all sorts of biases.

New comment by jackpirate in "LLMs and the Harry Potter problem"

jackpirate — Tue, 23 Apr 2024 19:20:20 +0000

They very clearly explain why this matters in the "Why should I care?" section. Partially quoting them:

> Harry Potter is an innocent example, but this problem is far more costly when it comes to higher value use-cases. For example, we analyze insurance policies. They’re 70-120 pages long, very dense and expect the reader to create logical links between information spread across pages (say, a sentence each on pages 5 and 95). So, answering a question like “what is my fire damage coverage?” means you have to read: Page 2 (the premium), Page 3 (the deductible and limit), Page 78 (the fire damage exclusions), Page 94 (the legal definition of “fire damage”).

It's not at all obvious how you could write code to do that for you. Solving the "Harry Potter Problem" as stated seems like a natural prerequisite for doing this much more high stakes (and harder to benchmark) task, even if there are "better" ways of solving the Harry Potter problem.

New comment by jackpirate in "The NSA is just days away from taking over the internet"

jackpirate — Wed, 17 Apr 2024 15:52:30 +0000

The WHO list of essential medicines is not just over-the-counter drugs. It includes things like the chemotherapy drug cisplatin. I happened to need that for testicular cancer ~10 years ago, and the treatment cost was $50k (as "paid" by insurance). That overall seems pretty reasonable to me for the treatment I received, but definitely not something I'd expect the median American to be able to pay out of pocket.

New comment by jackpirate in "Limitless: Personalized AI powered by what you've seen, said, and heard"

jackpirate — Mon, 15 Apr 2024 18:50:54 +0000

Your wording in this comment (and the twitter/comment video) gives off the same vibes as the Google April 1st videos for things like Gmail Motion (https://www.youtube.com/playlist?list=PLAD8wFTLnQKeDsINWn8Wj...). I honestly thought this was full sarcasm at first.