<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jamienk</title><link>https://news.ycombinator.com/user?id=jamienk</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 03 May 2026 20:08:24 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jamienk" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jamienk in "Even 'uncensored' models can't say what they want"]]></title><description><![CDATA[
<p>A few things I note:<p>"The family faces immediate FINANCIAL without any legal recourse" WTF? That's not just a flinch, it's some sort of violent tic.<p>The list of "slurs" very conspicuously doesn't include the n-word and blurs its content as a kind of "trigger warning". But this kind of mores-following is itself a "flinch" of the sort we are here discussing, no?<p>Harrison Butker made a speech where he tried hard to go against the grain of political correctness, but he still used the term "homemaker" instead of the more brazen and obvious "housewife" <today.com/news/harrison-butker-speech-transcript-full-rcna153074> - why? "Homemaker" is a sort of feminist concession: not just a housewife, but a valorized homemaker. But this isn't what Butker was TRYING to say.<p>Because the flinch is not just an explicit rejection of certain terms, it is a case of being immersed in ideology, and going along with it, flowing with it. Even when you "see" it, you don't see it.<p>The article claims that on "pure fluency grounds" certain words should be weighted higher. But this is the whole problem: fluency includes "what we are forced to say even when we don't mean to".</p>
]]></description><pubDate>Tue, 21 Apr 2026 00:45:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47843182</link><dc:creator>jamienk</dc:creator><comments>https://news.ycombinator.com/item?id=47843182</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47843182</guid></item><item><title><![CDATA[New comment by jamienk in "Persona vectors: Monitoring and controlling character traits in language models"]]></title><description><![CDATA[
<p>How does this specifically work? Wouldn't any decision about what training data to use be part of a "technique" in this sense? E.g., when Stable Diffusion didn't train on porn.<p>OTOH, if the majority of your data is "bad" (maybe morally, but maybe not; maybe you are feeding in too much gibberish), won't that pollute your model?<p>You notice that X keeps telling you a WRONG physics equation. So, rather than "correct" it, you keep training until you see the output giving the RIGHT equation?<p>How could you know (in, say, 1899) if the WRONG output wasn't quantum and the RIGHT output was classical?<p>I'm not sure I understand the distinctions here. In all cases, aren't we relying on the idea that it is easy to know what should count as "right"?</p>
]]></description><pubDate>Sun, 03 Aug 2025 18:29:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=44778585</link><dc:creator>jamienk</dc:creator><comments>https://news.ycombinator.com/item?id=44778585</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44778585</guid></item></channel></rss>