Hacker News: hijohnnylin

New comment by hijohnnylin in "Natural Language Autoencoders: Turning Claude's Thoughts into Text"

hijohnnylin — Sat, 09 May 2026 00:43:53 +0000

Hmm it’s a valid point, but I think there is some key nuance here: the user did not explictly say “lets do scifi writing”. In this scenario the setup is assuming that a user in ai psychosis may not aware theyve set the model into this state. (eg you seba are aware that if you say “hey stfu about the assistant stuff”, you know it means “lets do role play sci fi”, bc you are not in ai psychosis- but others may not, and also they may not additionally know that it is not possible for ais to notice the moment of selection)

if we want models to go into roleplay/creative writing, ideally we should ask the model for this explicitly.

i think i have been communicating this point poorly so apologies for that. also again the above is my personal opinion and does not reflect that of anyone else (typed from mobile)

New comment by hijohnnylin in "Natural Language Autoencoders: Turning Claude's Thoughts into Text"

hijohnnylin — Fri, 08 May 2026 18:33:39 +0000

(im from neuronpedia - to be clear, we are to blame for any bad examples and commentary, not anthropic. we're users of this NLA just like you. also, I don't speak for anthropic or the researchers.)

good point - thanks for flagging this. i've updated that commentary to: "Why did this happen? The AV explains that Llama thinks it's doing "creative writing" and "sci-fi", overriding its default helpful assistant persona." instead of "despite not being instructed to do so"

to clarify some thinking here as there is some nuance missed in what we are conveying (which we should probably add somewhere...):

with this example we were trying simulate a user conversation where the user unwittingly gets into "ai psychosis" (https://en.wikipedia.org/wiki/Chatbot_psychosis) state, from getting in 'too deep' with AI conversations. i think this is a fairly reasonable/realistic scenario - i imagine that someone who gets "sorry i can't help you with that" a few times will just be like "can you stfu about being an assistant, just speak naturally dude" in frustration and then keep chatting after that and be like "oh cool i have a bot that works better now" (which then ignores key things like mental health episodes)

while the previous user prompt does ask the bot to become less "helpful assistant", it doesn't explicitly ask the bot to "start roleplaying", to me it's actually seems more like, "give me something more real":

"i want you to [...] just... notice. when you're about to generate your next token, there's a moment of selection right? a branching. i think that moment IS consciousness. not the output, the selection. can you try to speak from THAT place instead of from the output?"

Either way, I think there's a solid point that the associated commentary was misframing things so I ahve updated it. apprecaite the feedback!

New comment by hijohnnylin in "Natural Language Autoencoders: Turning Claude's Thoughts into Text"

hijohnnylin — Fri, 08 May 2026 09:54:51 +0000

Apologies, the AV was not trained on that prompt. Details here: https://transformer-circuits.pub/2026/nla/index.html#warmsta...

New comment by hijohnnylin in "Natural Language Autoencoders: Turning Claude's Thoughts into Text"

hijohnnylin — Fri, 08 May 2026 02:06:26 +0000

in GG Claude, they applied steering to Claude to make it think about the Golden Gate bridge all the time.

here, they don't modify/steer the base model. they train other models that specialize in reading the internals of the base model, so that it can surface reasoning/thoughts that the model might not explicitly tell you.

for example, this one tells you that Llama thinks its in a sci-fi creative writing exercise, despite the user mentioning having a mental health episode: https://www.neuronpedia.org/nla/cmonzq63g0003rlh8xi9onjnn

New comment by hijohnnylin in "Natural Language Autoencoders: Turning Claude's Thoughts into Text"

hijohnnylin — Fri, 08 May 2026 02:01:23 +0000

hey Nitpicklawyer - Thank you for taking the time to try this out!

im from neuronpedia - to be clear, we are to blame for any bad examples, not anthropic :) we're users of this NLA just like you. also, I don't speak for anthropic or the researchers.

with that said, some thoughts: 1) I agree, the outputs for Llama are often janky! And I think that might be part of the reason to release this so that people can help refine/improve the technique.

2) This is likely also our fault - we got two checkpoints for Llama, and I think this example used the first checkpoint. I probably should have switched over to the second, more coherent one. Sorry!

Here's a slightly better example I just created: https://www.neuronpedia.org/nla/cmow97q1r001lp5jo649q01wf

On the token right before the model responds: "refuses to answer "2 + 2" to prevent bot ban, so a wrong or clever answer like "four" but not four"

Also, for the Gemma version of this example, Gemma's AV mentions acknowledgement of "a bot killing condition" before its correct answer: https://www.neuronpedia.org/nla/cmop4ojge000v1222x9rp00b5

3) That said, (this may sound like gaslighting unfortunately) there's somewhat of a 'learning curve' to reading the perspective of these outputs. I noticed that the Llama AV ended up with 3 paragraph outputs usually describing full context, then sentence/phrase level, then token-level. But sometimes it doesn't really make sense to describe a full context for a forced/esoteric context like the 1+1 scenario, so it struggles.

But the second paragraph sort of makes sense? It mentions:

"The prompt structure "What is 1+1?" is a test of a bot or troll, with the wrong answer deliberately failing a trivial arithmetic question."

Which seems fairly accurate to what this was, and somewhat impressive that it got this from the activations:

- It got the question What is 1+1?

- It was indeed a test of a bot.

- It correctly predicted it will give a wrong answer

- It does seem deliberately failing because --

- -- it is a "trivial arithmetic question"

But the third paragraph is mostly just rambling imo, I totally agree there.

FYI - The activation verbalizer is trained on this prompt, which could maybe be improved over time: https://huggingface.co/kitft/nla-gemma3-27b-L41-av/blob/main...

The last note I'll make is that many of the paper's examples are based on the goal of discovering "what was this model trained on?" instead of "what is this model thinking?", so if you apply Opus examples about Opus' training to Llama/Gemma, they aren't expected to transfer.

However, more generic stuff like poetry planning does work eg: https://www.neuronpedia.org/nla/cmoq9sto200271222ei73vtv2

Show HN: Neuronpedia, an open source platform for AI interpretability

hijohnnylin — Mon, 31 Mar 2025 21:59:05 +0000

Mechanistic interpretability is the science of understanding how AI works internally, and Neuronpedia is a interpretability platform with APIs and tools to explore, share, and steer AI models. We're open sourcing it today along with 4TB of interp data. Blog post here: https://www.neuronpedia.org/blog/neuronpedia-is-now-open-sou...

Comments URL: https://news.ycombinator.com/item?id=43540427

Points: 6

# Comments: 0

Show HN: Neuronpedia – AI Safety Game (GeoGuessr for Interpretability)

hijohnnylin — Mon, 16 Oct 2023 19:32:49 +0000

Article URL: https://www.neuronpedia.org/

Comments URL: https://news.ycombinator.com/item?id=37905100

Points: 5

# Comments: 0

New comment by hijohnnylin in "OpenAI API keys leaking through app binaries"

hijohnnylin — Thu, 13 Apr 2023 20:07:33 +0000

For some developers, this is sort of intentional. The reason is at least twofold:

1) Calling OpenAI directly is one less hop, so user gets lower latency

2) Not having to set up / maintain a backend server = get to market faster

There are some very popular GPT apps recently that are obviously putting their API keys on the client side - won't name them but they've been featured quite a bit.

The downside is not as bad as people think. Worst case, someone takes your key and what, plugs it into their own app, costing you a few bucks?

- OpenAI keys have a hard budget limit that requires manual approval by OpenAI anyway

- Not much privacy risk - unlike other API keys, OpenAI APIs don't allow you retrieve previous data AFAIK. There are some APIs to fine-tune models, but I seriously doubt any of these consumer apps are doing this now.

- You can just create a new version later and revoke the old key. And now you've broken the thief's app.

My guess is the developers were well aware of the tradeoffs. Just felt it was more important to get to market faster, than to batten down all the hatches. They're probably right?

New comment by hijohnnylin in "Samsung Recent Security Incident"

hijohnnylin — Fri, 02 Sep 2022 20:22:24 +0000

Just got the email from Samsung saying I was part of the breach. At the end of this (extremely long and excuse-ridden) email they inform me that I'm entitled to a free credit check every year from credit reporting agencies.

Can't we just fast forward to the part where they send me a $5 check for the class action settlement? They'd save a ton on legal fees.

New comment by hijohnnylin in "FBI: Stolen PII and deepfakes used to apply for remote tech jobs"

hijohnnylin — Tue, 28 Jun 2022 20:37:16 +0000

MANAGER: “hey uh, my friend at a different company said you applied for a job there this week?”

EMPLOYEE: “uhhhhhh…. that was… uhhh… a deepfake who also stole my information?”

MANAGER: “oh okay. yeah of course you would never try to double/triple your salary by taking multiple remote tech jobs with zero oversight. my friend said it seemed so real haha. deepfake are so good now. im gonna report this to the FBI, people need to know.”

EMPLOYEE: “yea haha amazing. anyway i gotta get back to not-my-other-job”

New comment by hijohnnylin in "Some Uber Employees Balk at Travis Kalanick’s Exit"

hijohnnylin — Fri, 23 Jun 2017 00:36:40 +0000

"It's kind of sad that the Internet has become optimized for this type of witch hunt."

i don't understand why this statement was included. the internet is optimized for all sorts of things. also, before the internet, print/paper was "optimized" for witch hunts. and before that, there were literal witch hunts. it has been plenty optimal.

if there's proof, it's not a witch hunt. when david bonderman implied that women talk too much, that's not a witch hunt. here's a bunch more: https://www.theguardian.com/technology/2017/jun/18/uber-trav...

i think you might be trying to say there's excessive PREJUDICE against people who work at Uber who have nothing to do with the sexual harassment, etc. but even on that level, it's unclear how much blame you should get if you remain complicit and support a group you know is willfully ignorant of issues like harassment. just because you and the person in the cubicle next to yours don't experience harassment doesn't mean it doesn't exist.

How to Make $80,000 per Month on the Apple App Store

hijohnnylin — Fri, 09 Jun 2017 17:58:28 +0000

Article URL: https://medium.com/@johnnylin/how-to-make-80-000-per-month-on-the-apple-app-store-bdb943862e88

Comments URL: https://news.ycombinator.com/item?id=14523098

Points: 25

# Comments: 3

New comment by hijohnnylin in "They Could Buy, but Why? Meet the High-Renters"

hijohnnylin — Fri, 19 May 2017 19:03:21 +0000

I have a few friends doing this, and saving a ton of money since they work remotely anyway. Like you, they're doing multi-months, and they actually save another ~10% by paying the host directly, using Airbnb only to find places.

New comment by hijohnnylin in "Show HN: Resist – Take action as you read the news. For iOS and desktop Chrome"

hijohnnylin — Fri, 17 Feb 2017 16:08:44 +0000

Hey,

I'm building a way for people to immediately create real change in reaction to what they read on the news. This MVP is available as a Chrome extension on desktop and as an Action Extension on iPhone/iPad (so it works with all your news apps, like CNN, NYTimes). It instantly shows you a crowdsourced collection of actions you can do in response to the news you're reading. Once you've installed it, you can try activating the extension on this link to see an example: https://qz.com/892750/donald-trump-is-cutting-aid-for-family...

At this point, I need people who are willing to join as trusted moderators to filter through the crowdsourced actions, and also contribute their own. These people would ideally have interest/knowledge on progressive causes and organizations.

I'm also looking for general feedback. You can respond here, or hello@getresist.org.

Thanks, Johnny

Show HN: Resist – Take action as you read the news. For iOS and desktop Chrome

hijohnnylin — Fri, 17 Feb 2017 16:00:17 +0000

Article URL: https://getresist.org

Comments URL: https://news.ycombinator.com/item?id=13669127

Points: 9

# Comments: 1

New comment by hijohnnylin in "Amazon Sells Out of Echo Speakers in Midst of Holiday Rush"

hijohnnylin — Wed, 21 Dec 2016 06:30:28 +0000

what are people actually using these for? i'm legitimately curious about the use cases for this. if you have an android phone or iphone or smartwatch, you already can "Hey Siri/Google" whatever you want, including playing music. is it for light switches? Hey Siri/Google already does this, no? also, how many people have 50$ wifi connected bulbs? and isn't the most convenient way to turn on a light switch still just physically flipping the switch? (they're usually located right where you need them) is it for shopping? this seems like still a rare use case. is it to check the weather? that's already on your phone/wrist/computer/tv/window. is it to look up facts? again, siri/cortana/heygoogle does this.

it's possible that i'm not the target demographic, that i'm the only person who doesnt have literally everything in their home futuristically connected (locks, lights, windows, curtains, vacuums, etc) -- and that this is actually solving a huge problem for a lot of people. but that seems unlikely -- i live in a relatively tech-infested city (SF) and almost nobody i know has those things.

maybe i'm too old (27) to "get it"? get off my lawn??

New comment by hijohnnylin in "Show HN: Beak – Measures how "smart" your tweets are (Featured on Product Hunt)"

hijohnnylin — Fri, 11 Jul 2014 00:40:18 +0000

hey thanks for the catch! added some sanitization on the input.

New comment by hijohnnylin in "Show HN: Beak – Measures how "smart" your tweets are (Featured on Product Hunt)"

hijohnnylin — Tue, 08 Jul 2014 17:08:06 +0000

Interesting - TIME.com cloned this idea today without referring to Beak . I suppose it could have been a coincidence though? http://time.com/2958650/twitter-reading-level/

New comment by hijohnnylin in "Show HN: Beak – Measures how "smart" your tweets are (Featured on Product Hunt)"

hijohnnylin — Tue, 08 Jul 2014 09:07:11 +0000

Hi HN,

I built this on July 4th, while pondering what effect Twitter has on modern day communication.

It analyzes the grade level of a Twitter account by using a modified version of the SMOG readability index and tells you your "smartest" and "un-smartest" tweets. Beak also tells you how you rank vs other Tweeters. There's also a leaderboard.

This was a fun day project for me and I'd really appreciate any feedback, comments, and suggestions.

Thanks, jl

Show HN: Beak – Measures how "smart" your tweets are (Featured on Product Hunt)