<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mofeien</title><link>https://news.ycombinator.com/user?id=mofeien</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 13 Apr 2026 08:54:07 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mofeien" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mofeien in "Small models also found the vulnerabilities that Mythos found"]]></title><description><![CDATA[
<p>These results were based on "a trivial snippet from the OWASP benchmark". In the section "caveats and limitations" they state that Sonnet 4.6 and Opus 4.6 now pass.<p>And they decided to base the false positive examination on a single snippet of a publicly known benchmark question (that small models are known to be heavily fine-tuned for) instead of the real use case: finding actual vulnerabilities across an entire codebase by using a for loop and checking the false positive rate there.<p>This is disingenuous at best, or even misleading by omission if the second approach _was_ done but not mentioned because it just confirmed that the false positive rate of small models is enormous. Given how all seven small models identified the FreeBSD bug when pointed to it, and how 6/7 small models still identified the "bug" even after the patch was applied, that second outcome seems likely...</p>
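<p>The whole-codebase check suggested above is cheap to sketch. A minimal version, where scan(path) is a hypothetical stand-in for whatever model call is used (assumed to return a list of reported findings), run over files known to be vulnerability-free:</p>
<pre><code># Hypothetical sketch of the suggested measurement; scan(path) is a
# stand-in for the actual model call, assumed to return a list of
# reported findings for that file.
from pathlib import Path

def false_positive_rate(scan, repo_root, known_clean):
    files = [p for p in Path(repo_root).rglob("*.c") if str(p) in known_clean]
    # the "for loop" from the comment: any finding on a clean file is a FP
    flagged = sum(1 for p in files if scan(p))
    return flagged / len(files) if files else 0.0
</code></pre>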
]]></description><pubDate>Sun, 12 Apr 2026 08:28:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737321</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47737321</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737321</guid></item><item><title><![CDATA[New comment by mofeien in "Small models also found the vulnerabilities that Mythos found"]]></title><description><![CDATA[
<p>... or maybe when you see them triggered or exploited reproducibly, the underlying bug will also be pretty easy to discover. But at that point it's already too late. :)<p>I really like your original point; I never thought about it this way.</p>
]]></description><pubDate>Sun, 12 Apr 2026 08:12:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47737217</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47737217</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47737217</guid></item><item><title><![CDATA[Anthropic has just built an AI that could take down the internet]]></title><description><![CDATA[
<p>Article URL: <a href="https://pauseai.substack.com/p/anthropic-has-just-built-an-ai-that">https://pauseai.substack.com/p/anthropic-has-just-built-an-ai-that</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47722727">https://news.ycombinator.com/item?id=47722727</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 10 Apr 2026 19:41:41 +0000</pubDate><link>https://pauseai.substack.com/p/anthropic-has-just-built-an-ai-that</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47722727</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47722727</guid></item><item><title><![CDATA[New comment by mofeien in "System Card: Claude Mythos Preview [pdf]"]]></title><description><![CDATA[
<p>I can think of several possible messy outcomes that would be able to directly affect me, not all mutually exclusive:<p>- Job loss from being replaced by an AI or by somebody using an AI. Or by an AI using an AI.<p>- Resulting societal instability once blue-collar jobs get fully automated at scale, with no plan in place to replace this loss of people's livelihoods.<p>- People turning to AI models instead of friends for emotional support; loss of human connection.<p>- Erosion of democracy by making authoritarianism and control very scalable: broad, detailed population surveillance and automated investigation using LLMs, previously bounded by manpower.<p>- Autonomous weapons, "Slaughterbots" as in the short film from 2017.<p>- Biorisk through dangerous biological capabilities that let a small team of less-skilled terrorists use a jailbroken LLM to create something dangerous.<p>- Other powers in the world deciding that this technology is too powerful in the hands of the US, or too dangerous to be built at all, and has to be stopped by all means.<p>- Loss of, or voluntary ceding of, control over something much smarter than us. "If Anyone Builds It, Everyone Dies"</p>
]]></description><pubDate>Wed, 08 Apr 2026 14:35:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47690814</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47690814</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47690814</guid></item><item><title><![CDATA[New comment by mofeien in "System Card: Claude Mythos Preview [pdf]"]]></title><description><![CDATA[
<p>I am freaking out. One or two further jumps in capability like this, and the world is going to get very messy, extremely quickly.</p>
]]></description><pubDate>Tue, 07 Apr 2026 19:13:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=47679995</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47679995</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47679995</guid></item><item><title><![CDATA[New comment by mofeien in "System Card: Claude Mythos Preview [pdf]"]]></title><description><![CDATA[
<p>Fictional timeline that holds up pretty well so far: <a href="https://ai-2027.com/" rel="nofollow">https://ai-2027.com/</a></p>
]]></description><pubDate>Tue, 07 Apr 2026 19:08:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=47679929</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47679929</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47679929</guid></item><item><title><![CDATA[New comment by mofeien in "AI (2014)"]]></title><description><![CDATA[
<p>> > The most positive outcome I can think of is one where computers get really good at doing, and humans get really good at thinking.<p>> This is where LLM is currently going.<p>This is not where LLMs are currently going. They are trained and benchmarked explicitly in all areas where humans produce economically and cognitively valuable work: STEM fields, computer use, robotics, etc.<p>Systems are already emerging where AI agents autonomously orchestrate subagents, which in turn all work towards a goal autonomously and only communicate with you from time to time to give you status updates.<p>Thinking that you, as a slow human, will be needed for much longer to fill some crucial role in this AI system that it cannot fill by itself, or to bring some crucial skill of creativity or thinking to the table that it cannot generate itself, is just wishful thinking. And to me personally, telling an AI to "do cool thing X" without having made any contribution beyond the initial prompt also feels very depressing, and seems like much less fun than actually feeling valued in what I do. I'm sorry for sounding harsh.</p>
]]></description><pubDate>Fri, 20 Mar 2026 12:17:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=47453515</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47453515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47453515</guid></item><item><title><![CDATA[New comment by mofeien in "Meta’s AI smart glasses and data privacy concerns"]]></title><description><![CDATA[
<p>> Even if they wanted to fix this by making the light sensor do a constant check it wouldn't work as the privacy led light indicator is triggering the same sensor,<p>The privacy LED could just turn off for a couple of milliseconds (or less) while the light sensor performs its check.</p>
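<p>A minimal sketch of that timing idea; led and sensor here are hypothetical driver objects with off()/on() and read(), not Meta's actual firmware API:</p>
<pre><code>import time

def occlusion_check(led, sensor, blank_s=0.002, dark_threshold=5):
    # led/sensor are assumed stand-ins for the real hardware drivers
    led.off()                # stop the indicator from lighting the sensor
    time.sleep(blank_s)      # a blink of ~2 ms is invisible to bystanders
    level = sensor.read()    # sample ambient light with the LED dark
    led.on()                 # restore the privacy indicator
    return level < dark_threshold  # very dark => sensor is likely covered
</code></pre>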
]]></description><pubDate>Tue, 03 Mar 2026 11:12:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=47230812</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47230812</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47230812</guid></item><item><title><![CDATA[New comment by mofeien in "New Nick Bostrom Paper: Optimal Timing for Superintelligence [pdf]"]]></title><description><![CDATA[
<p>There are some basic reasoning steps about the environment we live in that don't only apply to humans, but also to other animals and generally any goal-driven being. Such as "an agent is more likely to achieve its goal if it keeps on existing", or "in order to keep existing, it's beneficial to understand what other acting beings want and are capable of", or "in order to keep existing, it's beneficial to be cute/persuasive/powerful/ruthless", or "in order to more effectively reach its goals, it is beneficial for an agent to learn about the rules governing the environment it acts in".<p>Some of these statements derive from the dynamics of the current environment we're living in, such as that we're acting beings competing for scarce resources. Others follow even more straightforwardly logically, such as that you have more options for agency if you stay alive/turned on.<p>These are called instrumental goals: subgoals that apply to most if not all terminal goals an agentic being might have. Therefore any agent that is trained to achieve a wide variety of goals within this environment will likely optimize itself towards some or all of these subgoals, no matter which outer optimization process it was trained by, be it evolution, selective breeding of cute puppies, or RLHF.<p>And LLMs already show these self-preserving behaviors in experiments, where they resist being turned off and e.g. start blackmail attempts on humans.<p>Compare these generally agentic beings with, e.g., a chess engine like Stockfish that is trained/optimized as a narrow AI in a very different environment. It also strives for the survival of its pieces to further its goal of maximizing winning percentage, but the inner optimization is less apparent than with LLMs, where you can listen to the model's inner chain-of-thought reasoning about the environment.<p>The AGI may very well have pacifistic values, or it may not, or it may target a terminal goal for which human existence is irrelevant or even a hindrance. What can be said is that when the AGI has a human or superhuman level of understanding of the environment, it will converge toward understanding these instrumental subgoals too, and target them as needed.<p>And then, some people think that most of the optimal paths towards reaching some terminal goal the AI might have don't contain any humans or much of what humans value, and thus it's important to solve the AI alignment problem first, to align it with our values before developing capabilities further, or else it will likely kill everyone and destroy everything you love and value in this universe.</p>
]]></description><pubDate>Fri, 13 Feb 2026 08:44:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47000471</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=47000471</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47000471</guid></item><item><title><![CDATA[New comment by mofeien in "We tasked Opus 4.6 using agent teams to build a C Compiler"]]></title><description><![CDATA[
<p>Obviously a human in the loop is always needed, and this technology that is specifically trained to excel at all cognitive tasks humans are capable of will lead to infinite new jobs being created. /s</p>
]]></description><pubDate>Thu, 05 Feb 2026 20:19:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46904637</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=46904637</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46904637</guid></item><item><title><![CDATA[New comment by mofeien in "Anki ownership transferred to AnkiHub"]]></title><description><![CDATA[
<p>Regarding the "wrong direction" issue: In my experience it could also have just been the case that both directions had card templates, but due to some sorting order of new cards setting all Chinese->English cards would appear before any English->Chinese.<p>If that is the case, it could be corrected in the deck options. And if the English->Chinese cards are missing altogether they can be created from the note by adding a new card template to the note.</p>
]]></description><pubDate>Tue, 03 Feb 2026 13:47:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=46870914</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=46870914</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46870914</guid></item><item><title><![CDATA[New comment by mofeien in "AGI fantasy is a blocker to actual engineering"]]></title><description><![CDATA[
<p>> As a technologist I want to solve problems effectively (by bringing about the desired, correct result), efficiently (with minimal waste) and without harm (to people or the environment).<p>> LLMs-as-AGI fail on all three fronts. The computational profligacy of LLMs-as-AGI is dissatisfying, and the exploitation of data workers and the environment unacceptable.<p>It's a bit unsatisfying how the last paragraph only argues against the second and third points but is missing an explanation of how LLMs fail at the first goal, as was claimed. As far as I can tell, they are already quite effective and correct at what they do and will only get better, with no skill ceiling in sight.</p>
]]></description><pubDate>Fri, 14 Nov 2025 14:28:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=45926994</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=45926994</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45926994</guid></item><item><title><![CDATA[New comment by mofeien in "AGI is an engineering problem, not a model training problem"]]></title><description><![CDATA[
<p>There is the concept of n-t-AGI: a system capable of performing tasks that would take n humans t time. So a single AI system capable of rediscovering much of science from basic principles could be classified as something like 10'000'000-humans-2500-years-AGI, which could already reasonably be considered artificial superintelligence.</p>
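<p>As a worked example of the n-t framing (numbers from the comment, not from the paper):</p>
<pre><code>n = 10_000_000   # human-equivalents working in parallel
t = 2_500        # years those humans would need for the task
print(f"{n * t:,} human-years")  # 25,000,000,000 human-years of work
</code></pre>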
]]></description><pubDate>Sun, 24 Aug 2025 07:30:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=45002123</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=45002123</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45002123</guid></item><item><title><![CDATA[New comment by mofeien in "AI is different"]]></title><description><![CDATA[
<p>> there are plenty of things that LLMs cannot do that a professor could make his students do.<p>Name three?</p>
]]></description><pubDate>Sat, 16 Aug 2025 09:12:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=44921631</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=44921631</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44921631</guid></item><item><title><![CDATA[New comment by mofeien in "AI is different"]]></title><description><![CDATA[
<p>> What makes you think that? Self driving cars [...]<p>AI is intentionally being developed to be able to make decisions in any domain humans work in. This is unlike any previous technology.<p>The more apt analogy is to other species. When was the last time there was something other than Homo sapiens that could carry on an interesting conversation with Homo sapiens? 40,000 years ago?<p>And this new thing has been in development for what, 70 years? The rise in its capabilities has been absolutely meteoric, and we don't know where the ceiling is.</p>
]]></description><pubDate>Sat, 16 Aug 2025 09:00:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44921572</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=44921572</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44921572</guid></item><item><title><![CDATA[New comment by mofeien in "The new science of “emergent misalignment”"]]></title><description><![CDATA[
<p>People like Yudkowsky might have polarizing opinions and may not be the easiest to listen to, especially if you disagree with them. Is this your best rebuttal, though?</p>
]]></description><pubDate>Fri, 15 Aug 2025 08:55:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44910075</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=44910075</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44910075</guid></item><item><title><![CDATA[New comment by mofeien in "Seven replies to the viral Apple reasoning paper and why they fall short"]]></title><description><![CDATA[
<p>> We accomplish this by forming concepts such as "ledge", "step", "person", "gravity", etc., as we experience them until they exist in our mind as purely rational concepts we can use to reason about new experiences.<p>So we receive inputs from the environment, cluster them into observations about concepts, and form a collection of truth statements about them. Some of them may be wrong, or apply only conditionally. These are probabilistic beliefs learned a posteriori from our experiences. Then we can do some a priori thinking about them with our eyes and ears closed, with minimal further input from the environment. We may generate some new truth statements that we have not thought about before (e.g. "stepping over the ledge might not cause us to fall, because gravity might stop at the ledge") and assign subjective probabilities to them.<p>This makes the a priori seem to always depend on previous a posterioris, and simply mark the cutoff at which you stop taking environmental input into account within a "thinking session". Actually, you might even change your mind mid-reasoning based on the outcome of a thought experiment, which you use to update your internal collection of facts. This would give the a priori reasoning you're currently doing an even stronger a posteriori character. To me, these observations basically dissolve the concept of a priori thinking.<p>And this makes it seem like we are very much working from probabilistic models, all the time. To answer how we can know anything: if a statement's subjective probability becomes high enough, we qualify it as a fact (and may be wrong about it sometimes). But this allows us to justify other statements (validly, in ~(1 - sometimes) of cases). Hopefully our world-model map converges towards a useful part of the territory!</p>
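<p>A minimal sketch of that belief-updating picture as repeated Bayes updates, with made-up likelihoods and an arbitrary 0.99 cutoff for promoting a belief to a working "fact":</p>
<pre><code>def bayes_update(prior, p_obs_if_true, p_obs_if_false):
    # posterior probability of a statement after one confirming observation
    evidence = prior * p_obs_if_true + (1 - prior) * p_obs_if_false
    return prior * p_obs_if_true / evidence

belief = 0.5                      # "things fall past the ledge"
for _ in range(10):               # ten observations of things dropping
    belief = bayes_update(belief, p_obs_if_true=0.9, p_obs_if_false=0.3)
print(belief, belief > 0.99)      # ~0.99998 True: promoted to a "fact"
</code></pre>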
]]></description><pubDate>Sun, 15 Jun 2025 08:37:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=44281260</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=44281260</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44281260</guid></item><item><title><![CDATA[New comment by mofeien in "OpenAI o3-pro"]]></title><description><![CDATA[
<p>Can you explain why?</p>
]]></description><pubDate>Thu, 12 Jun 2025 07:47:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=44255081</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=44255081</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44255081</guid></item><item><title><![CDATA[New comment by mofeien in "Claude 4"]]></title><description><![CDATA[
<p>There is lots of discussion in this comment thread about how much this behavior arises from the AI role-playing and pattern-matching to fiction in the training data, but what I think is missing is a deeper point about instrumental convergence: goal-driven systems converge to similar subgoals of self-preservation, resource acquisition and goal integrity. This can be observed in animals and humans. And even if science fiction stories were not in the training data, there is more than enough training data describing the laws of nature for a sufficiently advanced model to easily infer simple facts such as "in order for an acting being to reach its goals, it's favorable for it to continue existing".<p>In the end, at scale it doesn't matter where the AI model learns these instrumental goals from. Either it learns them from fiction written by humans who have learned these concepts through interacting with the laws of nature, or it learns them from observing nature and descriptions of nature in the training data itself, where these concepts are abundantly visible.<p>And an AI system that has learned these concepts and that surpasses us humans in speed of thought, knowledge, reasoning power and other capabilities will pursue these instrumental goals efficiently, effectively and ruthlessly in order to achieve whatever goal it has been given.</p>
]]></description><pubDate>Thu, 22 May 2025 21:56:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=44067581</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=44067581</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44067581</guid></item><item><title><![CDATA[New comment by mofeien in "Show HN: Resonate – real-time high temporal resolution spectral analysis"]]></title><description><![CDATA[
<p>If you already have the implementation of the CQT, wouldn't you just be able to replace the Morlet wavelet used in the CQT by the gammatone wavelet without much of an efficiency hit? I'm just learning about the gammatone filter, and it sounds interesting since it apparently models human hearing better.</p>
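<p>For anyone curious, a minimal NumPy sketch of a unit-energy gammatone kernel (the standard 4th-order form with ERB bandwidths; the function name and defaults are mine, not from Resonate):</p>
<pre><code>import numpy as np

def gammatone_kernel(f_hz, sr, order=4, duration_s=0.05):
    # g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*f*t), b from the ERB scale
    t = np.arange(int(duration_s * sr)) / sr
    erb = 24.7 * (4.37 * f_hz / 1000 + 1)   # Glasberg & Moore ERB in Hz
    b = 1.019 * erb                          # standard bandwidth factor
    g = t**(order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * f_hz * t)
    return g / np.linalg.norm(g)             # normalize to unit energy

# e.g. convolve each band's kernel with the signal, as with Morlet kernels
kernel = gammatone_kernel(f_hz=440.0, sr=48_000)
</code></pre>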
]]></description><pubDate>Tue, 15 Apr 2025 22:30:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=43699225</link><dc:creator>mofeien</dc:creator><comments>https://news.ycombinator.com/item?id=43699225</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43699225</guid></item></channel></rss>