<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: cfcf14</title><link>https://news.ycombinator.com/user?id=cfcf14</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 22 Apr 2026 11:48:18 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=cfcf14" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by cfcf14 in "Changes in the system prompt between Claude Opus 4.6 and 4.7"]]></title><description><![CDATA[
<p>I would assume so too, in which case the costs to Anthropic would not be so substantial.</p>
]]></description><pubDate>Sun, 19 Apr 2026 12:57:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47823973</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=47823973</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47823973</guid></item><item><title><![CDATA[New comment by cfcf14 in "Changes in the system prompt between Claude Opus 4.6 and 4.7"]]></title><description><![CDATA[
<p>I'm curious as to why 4.7 seems obsessed with avoiding any actions that could help the user create or enhance malware. The system prompts seem similar on the matter, so I wonder if this is an early attempt by Anthropic to use steering vector injection?<p>The malware paranoia is so strong that my company has had to temporarily block use of 4.7 in our IDE of choice, as the model was behaving in a concerningly unaligned way, as well as spending a large share of its token budget contemplating whether any particular code or task was related to malware development (we are a relatively boring financial services entity - the jokes write themselves).<p>In one case I actually encountered a situation where I felt the model was deliberately failing to execute a particular task, and when queried, the tool output said it was trying to abide by directives about malware. I know that model introspection reporting is of poor quality and unreliable, but in this specific case I did not 'hint' it in any way. This feels qualitatively like Claude Golden Gate Bridge territory, hence my earlier speculation about steering vectors. I've seen many other people online complaining about the malware paranoia too, especially on reddit, so I don't think it's just me!</p>
]]></description><pubDate>Sun, 19 Apr 2026 11:42:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47823597</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=47823597</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47823597</guid></item><item><title><![CDATA[New comment by cfcf14 in "Scaling Latent Reasoning via Looped Language Models"]]></title><description><![CDATA[
<p>This makes me think it would be nice to see some kind of hybrid of the modern transformer architecture and neural ODEs. There was such interesting work a few years ago on how neural ODEs/PDEs could be seen as a sort of continuous limit of layer depth. Maybe models could learn interesting things if the embeddings were somehow solutions of a dynamical model.</p>
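<p>The "continuous limit of layer depth" idea has a concrete toy form: a residual stack h_{k+1} = h_k + f(h_k) is exactly an explicit Euler step of dh/dt = f(h) with step size 1. A minimal sketch (the shared weight matrix W and the tanh nonlinearity are illustrative choices, not anything from a specific paper):</p>

```python
import numpy as np

def f(h, W):
    # shared "vector field" applied at every layer / time step
    return np.tanh(W @ h)

def residual_net(h, W, depth):
    # standard residual stack: h_{k+1} = h_k + f(h_k)
    for _ in range(depth):
        h = h + f(h, W)
    return h

def euler_ode(h, W, t_end, steps):
    # explicit Euler integration of dh/dt = f(h)
    dt = t_end / steps
    for _ in range(steps):
        h = h + dt * f(h, W)
    return h

rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((4, 4))
h0 = rng.standard_normal(4)

# with dt = 1 per layer, the residual net is exactly Euler with steps == depth;
# taking more, smaller steps over the same interval approaches the continuous ODE
a = residual_net(h0.copy(), W, depth=8)
b = euler_ode(h0.copy(), W, t_end=8.0, steps=8)
assert np.allclose(a, b)
```

<p>Refining the step size (more steps over the same interval) is what takes you from discrete layers toward the continuous-depth picture.</p>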
]]></description><pubDate>Sun, 04 Jan 2026 11:23:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=46486981</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=46486981</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46486981</guid></item><item><title><![CDATA[New comment by cfcf14 in "O3-mini simulated scikit calculations"]]></title><description><![CDATA[
<p>The obvious next step here is to see how well this generalises to arbitrary inputs :)</p>
]]></description><pubDate>Mon, 24 Feb 2025 07:49:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=43156835</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=43156835</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43156835</guid></item><item><title><![CDATA[New comment by cfcf14 in "Internal representations of LLMs encode information about truthfulness"]]></title><description><![CDATA[
<p>Did you read the paper? Do you have specific criticisms of their problem statement, methodology, or results? There is a growing body of research indicating that, in fact, there _is_ a taxonomy of 'hallucinations', that they might have different causes and representations, and that there are technical mitigations with varying levels of effectiveness.</p>
]]></description><pubDate>Wed, 30 Oct 2024 15:28:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=41996102</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=41996102</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41996102</guid></item><item><title><![CDATA[New comment by cfcf14 in "Do AI detectors work? Students face false cheating accusations"]]></title><description><![CDATA[
<p>AI detectors do not work. I have spoken with many people who think that the particular writing style of commercial LLMs (ChatGPT, Gemini, Claude) is the result of some intrinsic characteristic of LLMs - either the data or the architecture.
The belief is that this particular tone of 'voice' (chirpy sycophant), textual structure (bullet lists and verbosity), and vocab ('delve', et al) serves and will continue to serve as an easy identifier of generated content.<p>Unfortunately, this is not the case. You can detect only the most obvious cases of the output from these tools. The distinctive presentation of these tools is a very intentional design choice - partly by the construction of the RLHF process, partly through the incentives given to and selection of human feedback agents, and in the case of Claude, partly through direct steering via SAEs (sparse autoencoder activation manipulation). This is done for mostly obvious reasons: it's inoffensive, 'seems' to be truth-y and informative (qualities selected for in the RLHF process), and doesn't ask much of the user. The models are also steered to avoid having a clear 'point of view', agenda, point-to-make, and so on, characteristics which tend to identify a human writer. They are steered away from highly persuasive behaviour, although there is evidence that they are extremely effective at writing this way (<a href="https://www.anthropic.com/news/measuring-model-persuasiveness" rel="nofollow">https://www.anthropic.com/news/measuring-model-persuasivenes...</a>). The same arguments apply to spelling and grammar errors, and so on. These are <i>design choices for public facing, commercial products</i> with no particular audience.<p>An AI detector may be able to identify that a text has some of these properties in cases where they are exceptionally obvious, but it fails in the general case. Worse still, students will begin to naturally write like these tools because they are continually exposed to text produced by them!<p>You can easily get an LLM to produce text in a variety of styles, some entirely dissimilar to normal human writing, such as unique ones which are the amalgamation of many different and discordant styles.
You can get the models to produce highly coherent text which is indistinguishable from that of any individual person, with any particular agenda and tone of voice that you want. You can get the models to produce text with varying cadence, with incredible cleverness of diction and structure, with intermittent errors and backtracking, and anything else you can imagine. It's not super easy to get the commercial products to do this, but trivial to get an open source model to behave this way. So you can guarantee that there are a million open source solutions for students and working professionals that will pop up to produce 'undetectable' AI output. This battle is lost, and there is no closing Pandora's box. My earlier point about students slowly adopting the style of the commercial LLMs really frightens me in particular, because it is a shallow, pointless way of writing which demands little to no interaction with the text, tends to be devoid of questions or rhetorical devices, and in my opinion, makes us worse at thinking.<p>We need to search for new solutions and new approaches for education.</p>
]]></description><pubDate>Mon, 21 Oct 2024 07:39:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=41901653</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=41901653</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41901653</guid></item><item><title><![CDATA[New comment by cfcf14 in "Nobel Prize in Physics awarded to John Hopfield and Geoffrey Hinton [pdf]"]]></title><description><![CDATA[
<p>So uh, things are not looking so good for actual physics these days, I gather?</p>
]]></description><pubDate>Tue, 08 Oct 2024 10:17:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=41775644</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=41775644</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41775644</guid></item><item><title><![CDATA[New comment by cfcf14 in "Self-experiment with L-theanine: effect on sleep and cognition"]]></title><description><![CDATA[
<p>L-theanine (200mg) with around 100-150mg of caffeine has an extremely noticeable, positive effect on my ability to focus, feeling of "well-situatedness", and overall calmness. L-theanine by itself doesn't seem to do much. Caffeine on its own wakes me but makes me feel jittery and anxious, so it's definitely an interaction effect. Taurine has a much smaller effect on calmness, sans interactions - often indistinguishable from any other mild focus exercise like box breathing or stretching.</p>
]]></description><pubDate>Tue, 08 Oct 2024 07:00:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=41774475</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=41774475</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41774475</guid></item><item><title><![CDATA[New comment by cfcf14 in "San Francisco seeks ban of software critics say is used to inflate rents"]]></title><description><![CDATA[
<p>It's not - the FTC released a statement on this very topic a few months ago: <a href="https://www.ftc.gov/business-guidance/blog/2024/03/price-fixing-algorithm-still-price-fixing" rel="nofollow">https://www.ftc.gov/business-guidance/blog/2024/03/price-fix...</a></p>
]]></description><pubDate>Tue, 13 Aug 2024 14:46:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=41235997</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=41235997</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41235997</guid></item><item><title><![CDATA[New comment by cfcf14 in "Room inspections at Resorts World confuse, annoy DEF CON attendees"]]></title><description><![CDATA[
<p>Police in large American cities are not likely to be of much assistance in this situation. Assuming they attend at all, I would expect them to not understand the nature of the issue and probably proceed to make it much worse.</p>
]]></description><pubDate>Tue, 13 Aug 2024 12:43:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=41234903</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=41234903</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41234903</guid></item><item><title><![CDATA[New comment by cfcf14 in "Antifragility in complex dynamical systems"]]></title><description><![CDATA[
<p>After reading the paper, I'm really unsure what the novel contribution is. It feels like they're attempting to rebrand well-understood concepts from various fields (control systems theory, etc). The provided mathematical definition of antifragility is somewhat unconvincing too: it's not that it's wrong, per se, but in the effort to find something sufficiently broad to apply to many different fields of applied dynamical theory, they've had to adopt a definition which is a bit unintuitive and overly general.</p>
]]></description><pubDate>Tue, 13 Aug 2024 09:22:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=41233746</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=41233746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41233746</guid></item><item><title><![CDATA[New comment by cfcf14 in "Using GPT as an HVAC control system"]]></title><description><![CDATA[
<p>This is really funny - it's bordering on truly absurd, almost incomprehensible madness to consider doing this seriously. I can't think of a single property you'd desire in a control system (state observability, auditability, guarantees on out-of-band input behaviour, stability under shocks, etc, etc) that would be present in an LLM control model.<p>I don't want to be disrespectful to the authors, and it's (vaguely) interesting to see how far they've been able to go with this, but this idea is still an abomination.</p>
]]></description><pubDate>Wed, 09 Aug 2023 09:13:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=37060304</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=37060304</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37060304</guid></item><item><title><![CDATA[New comment by cfcf14 in "We now believe that the game is over. LK-99 is not a superconductor"]]></title><description><![CDATA[
<p>Was fun while it lasted! Will be interesting watching the internal story of the original lab unfold as it all becomes public eventually.</p>
]]></description><pubDate>Tue, 08 Aug 2023 11:45:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=37047494</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=37047494</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37047494</guid></item><item><title><![CDATA[New comment by cfcf14 in "Bard’s latest update: more features, languages and countries"]]></title><description><![CDATA[
<p>Yeah - more or less.<p>I still use it sometimes for trivial stuff like giving me recipe or travel inspiration (using the web search API), but I haven't been using it for any sort of algorithms/coding stuff.<p>It's useful for tasks which have a high tolerance for not being completely correct. Which is definitely a subset of all tasks that interest me, but it's a smaller subset than I originally thought it would be. It's just really hard to find out where the models are wrong when it's a complex question. If it wasn't, then I wouldn't need the tool, I guess.</p>
]]></description><pubDate>Thu, 13 Jul 2023 14:54:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=36710161</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=36710161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36710161</guid></item><item><title><![CDATA[New comment by cfcf14 in "Reddit is OpenAI’s moat"]]></title><description><![CDATA[
<p>Some strange claims in this post. The reddit post datasets are already 'out there' in the wild, and I'm fairly certain every other major LLM release has used their data. Also - did Midjourney "steal" DALLE-2's lunch? It's a restrictive service with what is essentially a Discord-only CLI.</p>
]]></description><pubDate>Wed, 14 Jun 2023 14:20:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=36326407</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=36326407</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36326407</guid></item><item><title><![CDATA[New comment by cfcf14 in "Ask HN: Is it just me or GPT-4's quality has significantly deteriorated lately?"]]></title><description><![CDATA[
<p>Yeah, definitely. Combination of expert-system gating (some requests probably get routed to weaker models), distillation (for performance/cost), and RLHF lobotomization.</p>
]]></description><pubDate>Wed, 31 May 2023 08:31:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=36135913</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=36135913</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36135913</guid></item><item><title><![CDATA[New comment by cfcf14 in "SciPy: Interested in adopting PRIMA, but little appetite for more Fortran code"]]></title><description><![CDATA[
<p>It's turtles all the way down, except for the final turtle, which is Fortran...</p>
]]></description><pubDate>Thu, 18 May 2023 15:06:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=35989013</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=35989013</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35989013</guid></item><item><title><![CDATA[New comment by cfcf14 in "Prompt Engineering: Steer a large pretrained language model to do what you want"]]></title><description><![CDATA[
<p>Lilian Weng's blog is my go-to example for an extremely high quality tech blog, it's truly remarkable how consistently excellent each post is. The only downside is the sadness I feel for being incapable of producing content even remotely near that level of quality myself.</p>
]]></description><pubDate>Mon, 20 Mar 2023 14:43:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=35232225</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=35232225</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35232225</guid></item><item><title><![CDATA[New comment by cfcf14 in "Tell HN: Chase shadow banned and closed my bank accounts"]]></title><description><![CDATA[
<p>This is 100% related to (suspected) fraud, anti money laundering, or other types of financial/political sanctions. You may be 100% innocent, but they will never disclose any information to you about their reasoning (and in fact it is illegal for them to do so). Sorry this has happened.</p>
]]></description><pubDate>Thu, 16 Mar 2023 09:11:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=35179745</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=35179745</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35179745</guid></item><item><title><![CDATA[New comment by cfcf14 in "Urgent: Sign the petition now"]]></title><description><![CDATA[
<p>Absolutely not.</p>
]]></description><pubDate>Sat, 11 Mar 2023 23:55:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=35114132</link><dc:creator>cfcf14</dc:creator><comments>https://news.ycombinator.com/item?id=35114132</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35114132</guid></item></channel></rss>