<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: trq_</title><link>https://news.ycombinator.com/user?id=trq_</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 07 Apr 2026 08:07:36 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=trq_" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by trq_ in "OpenCode – Open source AI coding agent"]]></title><description><![CDATA[
<p>Claude Code is not an Electron app.</p>
]]></description><pubDate>Sat, 21 Mar 2026 14:43:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47467512</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=47467512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47467512</guid></item><item><title><![CDATA[New comment by trq_ in "Claude Code daily benchmarks for degradation tracking"]]></title><description><![CDATA[
<p>Yes, we do, but harnesses are hard to eval: people use them across a huge variety of tasks, and sometimes different behaviors trade off against each other. We have added some evals to catch this one in particular.</p>
]]></description><pubDate>Fri, 30 Jan 2026 01:30:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46819524</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=46819524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46819524</guid></item><item><title><![CDATA[New comment by trq_ in "Claude Code daily benchmarks for degradation tracking"]]></title><description><![CDATA[
<p>Hi everyone, Thariq from the Claude Code team here.<p>Thanks for reporting this. We fixed a Claude Code harness issue that was introduced on 1/26; the change was rolled back on 1/28, as soon as we found it.<p>Run `claude update` to make sure you're on the latest version.</p>
]]></description><pubDate>Thu, 29 Jan 2026 19:12:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=46815013</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=46815013</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46815013</guid></item><item><title><![CDATA[New comment by trq_ in "Claude Code gets native LSP support"]]></title><description><![CDATA[
<p>Hi, I work on Claude Code! Let me know if you have any feedback!</p>
]]></description><pubDate>Tue, 23 Dec 2025 07:21:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=46363176</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=46363176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46363176</guid></item><item><title><![CDATA[New comment by trq_ in "Claude Is Down"]]></title><description><![CDATA[
<p>We're back up! It was about 30 minutes of downtime this morning; our apologies if it interrupted your work.</p>
]]></description><pubDate>Fri, 07 Nov 2025 17:24:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45848677</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=45848677</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45848677</guid></item><item><title><![CDATA[Show HN: Write Stories by Steering a LLM]]></title><description><![CDATA[
<p>Article URL: <a href="https://latentlit.goodfire.ai/">https://latentlit.goodfire.ai/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43355208">https://news.ycombinator.com/item?id=43355208</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 13 Mar 2025 17:03:24 +0000</pubDate><link>https://latentlit.goodfire.ai/</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=43355208</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43355208</guid></item><item><title><![CDATA[LLM-Powered Sorting with TrueSkill]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.thariq.io/blog/sorting/">https://www.thariq.io/blog/sorting/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43016272">https://news.ycombinator.com/item?id=43016272</a></p>
<p>Points: 5</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 11 Feb 2025 18:24:30 +0000</pubDate><link>https://www.thariq.io/blog/sorting/</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=43016272</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43016272</guid></item><item><title><![CDATA[Show HN: Opensourcing Sparse Autoencoders for Llama 3.3 70B]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l50">https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l50</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42658491">https://news.ycombinator.com/item?id=42658491</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 10 Jan 2025 18:32:43 +0000</pubDate><link>https://huggingface.co/Goodfire/Llama-3.3-70B-Instruct-SAE-l50</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=42658491</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42658491</guid></item><item><title><![CDATA[New comment by trq_ in "Show HN: Llama 3.3 70B Sparse Autoencoders with API access"]]></title><description><![CDATA[
<p>Hmm, the hallucination would happen in the auto-labelling, but we review and test our labels, and they seem correct!</p>
]]></description><pubDate>Tue, 24 Dec 2024 02:34:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=42499262</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=42499262</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42499262</guid></item><item><title><![CDATA[New comment by trq_ in "Show HN: Llama 3.3 70B Sparse Autoencoders with API access"]]></title><description><![CDATA[
<p>If you're hacking on this and have questions, please join us on Discord: <a href="https://discord.gg/vhT9Chrt" rel="nofollow">https://discord.gg/vhT9Chrt</a></p>
]]></description><pubDate>Mon, 23 Dec 2024 19:00:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=42496745</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=42496745</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42496745</guid></item><item><title><![CDATA[New comment by trq_ in "Show HN: Llama 3.3 70B Sparse Autoencoders with API access"]]></title><description><![CDATA[
<p>We haven't yet found generalizable "make this model smarter" features, but there is a tradeoff in what you put in the system prompt: e.g., if you have a chatbot that sometimes generates code, you can give it very specific instructions when it's coding and leave those out of the system prompt otherwise.<p>We have a notebook about that here: <a href="https://docs.goodfire.ai/notebooks/dynamicprompts" rel="nofollow">https://docs.goodfire.ai/notebooks/dynamicprompts</a></p>
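<p>(For concreteness, here's a minimal sketch of that pattern. The keyword classifier below is a hypothetical stand-in; in the notebook linked above, a learned feature plays this role.)</p>
<pre><code>BASE_PROMPT = "You are a helpful assistant."
CODING_PROMPT = BASE_PROMPT + " When writing code, include type hints and docstrings."

def looks_like_coding_request(message: str) -> bool:
    # Hypothetical heuristic; a feature activation would replace this.
    keywords = ("code", "function", "bug", "compile", "traceback")
    return any(k in message.lower() for k in keywords)

def build_messages(user_message: str) -> list[dict]:
    # Swap in the coding-specific system prompt only when needed.
    system = CODING_PROMPT if looks_like_coding_request(user_message) else BASE_PROMPT
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_message},
    ]
</code></pre>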
]]></description><pubDate>Mon, 23 Dec 2024 18:58:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=42496731</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=42496731</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42496731</guid></item><item><title><![CDATA[Show HN: Llama 3.3 70B Sparse Autoencoders with API access]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.goodfire.ai/papers/mapping-latent-spaces-llama/">https://www.goodfire.ai/papers/mapping-latent-spaces-llama/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42495936">https://news.ycombinator.com/item?id=42495936</a></p>
<p>Points: 201</p>
<p># Comments: 51</p>
]]></description><pubDate>Mon, 23 Dec 2024 17:18:17 +0000</pubDate><link>https://www.goodfire.ai/papers/mapping-latent-spaces-llama/</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=42495936</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42495936</guid></item><item><title><![CDATA[Should Developers care about AI Interpretability?]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.thariq.io/blog/interpretability/">https://www.thariq.io/blog/interpretability/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42053912">https://news.ycombinator.com/item?id=42053912</a></p>
<p>Points: 10</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 05 Nov 2024 18:19:52 +0000</pubDate><link>https://www.thariq.io/blog/interpretability/</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=42053912</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42053912</guid></item><item><title><![CDATA[New comment by trq_ in "Detecting when LLMs are uncertain"]]></title><description><![CDATA[
<p>This is incredible! I haven't seen that repo yet; thank you for pointing it out, and for the write-up.</p>
]]></description><pubDate>Sat, 26 Oct 2024 08:19:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=41953392</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=41953392</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41953392</guid></item><item><title><![CDATA[New comment by trq_ in "Detecting when LLMs are uncertain"]]></title><description><![CDATA[
<p>Yeah, I think the idea of finding out what flavor of uncertainty you have is very interesting.</p>
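<p>(A minimal sketch of one way to separate flavors, using entropy and varentropy of a next-token distribution; the interpretation notes are illustrative, not a recipe.)</p>
<pre><code>import math

def entropy_and_varentropy(logprobs: list[float]) -> tuple[float, float]:
    """Entropy and variance-of-surprisal for one next-token distribution."""
    probs = [math.exp(lp) for lp in logprobs]
    h = -sum(p * lp for p, lp in zip(probs, logprobs))             # entropy
    v = sum(p * (-lp - h) ** 2 for p, lp in zip(probs, logprobs))  # varentropy
    return h, v

# Roughly: low entropy = confident; high entropy with low varentropy = many
# tokens about equally plausible; high varentropy = a few strong candidates
# disagree, which is a fork worth treating differently.
</code></pre>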
]]></description><pubDate>Fri, 25 Oct 2024 20:04:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=41949115</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=41949115</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41949115</guid></item><item><title><![CDATA[New comment by trq_ in "OmniParser for Pure Vision Based GUI Agent"]]></title><description><![CDATA[
<p>This is awesome, can't wait for evals against Claude Computer Use!</p>
]]></description><pubDate>Fri, 25 Oct 2024 19:59:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=41949055</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=41949055</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41949055</guid></item><item><title><![CDATA[New comment by trq_ in "Detecting when LLMs are uncertain"]]></title><description><![CDATA[
<p>Yeah! I want to use the logprobs API, but you can't, for example:<p>- sample multiple logits and branch (we might have been able to with the old text completion API, but it no longer exists)<p>- add in a reasoning token on the fly<p>- stop execution, ask the user, etc.<p>But a visualization of logprobs in a query seems like it might be useful.</p>
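<p>(For reference, a minimal sketch of what the logprobs API does give you today, using the OpenAI Python client; the model name is a placeholder. You can inspect alternatives after the fact, but not branch on them mid-generation.)</p>
<pre><code>from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model
    messages=[{"role": "user", "content": "Is 9.9 greater than 9.11?"}],
    logprobs=True,
    top_logprobs=5,  # up to 5 alternatives per sampled token
)

for tok in resp.choices[0].logprobs.content:
    alts = {t.token: round(t.logprob, 2) for t in tok.top_logprobs}
    print(repr(tok.token), alts)
</code></pre>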
]]></description><pubDate>Fri, 25 Oct 2024 19:48:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=41948952</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=41948952</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41948952</guid></item><item><title><![CDATA[New comment by trq_ in "Detecting when LLMs are uncertain"]]></title><description><![CDATA[
<p>I want to build intuition on this by writing a logit visualizer for OpenAI outputs. But from what I've seen so far, you can often trace down a hallucination.<p>Here's an example of someone doing that for 9.9 > 9.11: <a href="https://x.com/mengk20/status/1849213929924513905" rel="nofollow">https://x.com/mengk20/status/1849213929924513905</a></p>
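<p>(A rough sketch of what such a visualizer might print, given per-token logprobs like those returned above; the sample tokens and numbers are made up for illustration.)</p>
<pre><code>import math

def render_confidence(tokens: list[tuple[str, float]], width: int = 20) -> None:
    """Print each sampled token with a bar proportional to its probability."""
    for token, logprob in tokens:
        p = math.exp(logprob)
        bar = "#" * max(1, round(p * width))
        print(f"{token!r:>12} {p:6.1%} {bar}")

# Hypothetical output for the 9.9 vs 9.11 case: confidence collapses
# exactly at the comparison token.
render_confidence([("9", -0.01), (".", -0.02), ("11", -0.65),
                   (" is", -0.05), (" larger", -1.10)])
</code></pre>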
]]></description><pubDate>Fri, 25 Oct 2024 18:43:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=41948310</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=41948310</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41948310</guid></item><item><title><![CDATA[New comment by trq_ in "Detecting when LLMs are uncertain"]]></title><description><![CDATA[
<p>I mean, LLMs certainly have representations of what words mean and how they relate to each other; that's what the Key and Query matrices capture, for example.<p>But in this case, it means that the underlying point in embedding space doesn't map clearly to only one specific token. That's not too different from when you have an idea in your head but can't think of the word.</p>
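<p>(A toy numpy sketch of that last point: a hidden state sitting between two rows of the unembedding matrix spreads probability across both tokens instead of picking one.)</p>
<pre><code>import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 8, 5
W_unembed = rng.normal(size=(vocab, d_model))  # one row per token

def next_token_probs(hidden: np.ndarray) -> np.ndarray:
    logits = W_unembed @ hidden
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

print(next_token_probs(W_unembed[2]))                       # peaked on token 2
print(next_token_probs((W_unembed[2] + W_unembed[4]) / 2))  # split between 2 and 4
</code></pre>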
]]></description><pubDate>Fri, 25 Oct 2024 18:41:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=41948283</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=41948283</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41948283</guid></item><item><title><![CDATA[New comment by trq_ in "Detecting when LLMs are uncertain"]]></title><description><![CDATA[
<p>Yeah, I wouldn't be surprised if the big labs are doing more than just argmax in the sampling.</p>
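<p>(For anyone curious, a minimal sketch of the standard knobs beyond argmax: temperature plus top-p/nucleus sampling. Whatever the labs actually run is not public.)</p>
<pre><code>import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 0.8,
                 top_p: float = 0.9, rng=np.random.default_rng()) -> int:
    """Temperature + nucleus sampling instead of plain argmax."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                  # most likely first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest set covering top_p
    kept = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=kept))

# Plain argmax for comparison: int(np.argmax(logits))
</code></pre>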
]]></description><pubDate>Fri, 25 Oct 2024 18:39:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=41948259</link><dc:creator>trq_</dc:creator><comments>https://news.ycombinator.com/item?id=41948259</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41948259</guid></item></channel></rss>