Hacker News: Nevermark

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Tue, 09 Jun 2026 19:34:11 +0000

I am going to tune the expression form of my definition to:

Understanding = Novel Scope * Suitability / Parameter Count.

> My point is that you can have the same result with a representation that "closely matches the topology of the relationships being modelled". For example, a representation that "allows relationships between tokens but yet does not care about the meaning or concept not useful to form convincing sentences".

You are absolutely right, that lack of internal representation-reality correspondence does not rule out real/convincing performance.

> GenAI are not trained to get the higher representation of the world, but to get the best convincing sentence generation.

This is true of all learning. And it will always be the nature of learning.

Which is why performance is always (should be) measured on novel input.

> High compactness does not equal best solution. Even humans don't used "high compactness" when doing basic arithmetic, but use "by heart multiplication table".

This is a really good point!

It brings up the two useful modes of human representation:

(1) The brain's slow mode is very good at handling deeper and deeper layers of representation. When thinking about arithmetic or more complex math analytically, our understanding does follow a path of increasingly deeper representations. And we are very good at applying these deeper understandings.

(2) Then, our fast mode creates shallow representations of things we do frequently.

I would look at this as (1) reflecting scalable understanding (2) reflecting very limited understanding, but scalable speed.

And we often use both modes together.

I would argue that the understanding is primarily in the slow mode. That the fast mode, is the non-understanding but appropriate response mode. And that it operates with a much reduced scope of appropriate response, but a high percentage of applicability. Meaning, most of the time we don't need to use deep understanding we just need fast appropriate response.

But how to compare the two in scopes where they are equally accurate?

I think "high understanding" representations are those very flexible to being used in ways quite different from how they were learned.

Our slow mode does this very well. Our fast mode not so well, but to the degree it generalizes well to novel situations, that would be an increase in understanding.

Our fast system does generalize, but I would argue that at some point it fails, where our slower deeper representations provide the means of analyzing a situation. So it clearly "understands" better.

It is interesting how quickly understanding from our analytical side translates into operation on our fast side. Clearly, our fast side has very efficient access to new "patterns" that our slow side constructs.

> If you want to understand a paper plane trajectory, it is a complex system, and you probably need plenty of parameters to describe the gravity, the wind at each position and each time, the shape of the plane at each time, ... But you can describe the trajectory with just few parameters using a Bezier curve.

I love this example. It does contrast very different kinds of understanding.

(1) Understanding the fundamental reality in which paper planes exist,

(2) Vs. understanding how paper planes behave.

I think my expression works well here, as long as we take "scope" seriously.

Understanding = Novel Scope * Suitability / Parameter Count.

For paper planes as a hobby, a smaller neuron/parameter budget is achieved by learning the emergent laws of paper planes, not their underlying physics. And understanding paper planes is achieved with this smaller budget.

For understanding paper plane dynamics at a design level, a smaller neuron/parameter budget is achieved by learning the underlying physics of aerodynamics at an intuitive level.

For understanding paper plane dynamics at a world class competition level, a smaller neuron/parameter budget is achieved by learning the underlying physics of aerodynamics at an analytical level.

So these would be three different "understandings", each with their own scope and area of appropriate response to novel situations.

Point taken: The most fundamental correspondence isn't the point of a lot of understanding.

You are right, and my equation works, as long as "scope" is interpreted to mean appropriate level of interest, not area of fundamental physics involved. Great point.

Does that get us on the same page? Closer?

> I am saying that creating convincing sentences does not require understanding

As problem complexity goes up, there really is an explosive difference between appropriate response via "familiarity" or lower-level fit, vs. higher level fit, for the same number of parameters.

And it is also a dramatically bigger challenge for lower-level fits to respond well to novel stimuli, given the same number of parameters.

The reason is that complex problems operate in higher dimensional spaces, and relationships in higher dimensional spaces have exponentially more complexity for any level of representation. Exponentially.

Linear fits of a 2D bezier are inefficient but work. Linear fits for a 100 dimensional bezier, which isn't very many dimensions from a data standpoint, become ludicrously expensive in parameters.
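A rough sketch of that parameter growth, assuming the lower-level fit is a regular grid of linear patches over the input dimensions. The grid layout, the choice of 10 segments per axis, and the function names are illustrative assumptions, not measurements of any real model:

```python
def grid_cells(segments_per_axis: int, dimensions: int) -> int:
    """Number of linear patches in a regular grid over `dimensions` inputs."""
    return segments_per_axis ** dimensions


def patch_parameters(segments_per_axis: int, dimensions: int) -> int:
    """Each patch stores one slope per input dimension plus one offset."""
    return grid_cells(segments_per_axis, dimensions) * (dimensions + 1)


for d in (2, 10, 100):
    count = patch_parameters(10, d)
    # len(str(count)) - 1 is the order of magnitude of the parameter count.
    print(f"{d:>3} dimensions: about 10^{len(str(count)) - 1} parameters")
```

Even at a coarse 10 segments per axis, the patch count is 10 raised to the number of dimensions, which is the sense in which the cost becomes ludicrous.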

The dimensionality of human communication is probably the most complex problem ever tackled systematically.

I am trying to think of a way to capture this more concretely. I.e. a way to draw a line in this conversation that stands up on its own. All I can point to is the complete failure of any lower-level fit, when done directly, to achieve a trillionth of a trillionth of trillions of the flexibility that SOTA models demonstrate. The extreme dimensionality of input that LLMs respond to makes my "trillionths" literal in this case. And we do get a concrete measure of the dimensionality within their capacity, as context windows give us live demonstrations of this.

Note that language is literally highly compressed information, with pervasive non-local interactions. The enormous dimensionality is compounded by dense reactivity, pervasive discontinuities. No other informational artifact compares to language complexity.

When I say that this is a case where either real relationships are learned or the model fails, it is because the number of parameters for a lower-level fit really are beyond imagining.

You can't point to any lower-level fit, where the lower-level fit is basic to the fitting algorithm, that ever achieved even a tiny-grammar tiny-subject-scope toy of a toy version, to what LLMs are doing. Nor can I, despite following progress for decades. Nobody can. The original successes of the first LLMs, modest as they appear now, were completely unprecedented.

There just are not enough parameters, by many orders of magnitude, to do language justice over a context window, and respond sensibly to intentionally novel conversations, without identifying the actual relationships behind it.

So that would be my challenge to you. To identify any verifiable lower-level fit that even approximates LLM behavior at the tiniest of toy levels. Verifiable fits at any given level are easy to do, just train a model where the basis is restricted to that kind of fit.

Otherwise, I can agree that understanding is a continuous property, and that how well something understands something, without strict benchmarking by well thought out benchmarks, involves intuition and judgement. So there can be legitimate differences in how we perceive model understanding, in the absence of direct measures.

Any more thoughts? I have understood both myself and your points better as we went along.

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Tue, 09 Jun 2026 04:03:57 +0000

Is that more coherent?

New comment by Nevermark in "Surveillance is not safety: A statement on the UK's latest threat to privacy [pdf]"

Nevermark — Tue, 09 Jun 2026 00:35:48 +0000

Surveillance replaces ostensible individual fringe threats with a clear, dangerous, pervasive and (for practical purposes) irreversible threat that monotonically aggregates increasing centralized leverage over every aspect of our lives, direct and indirect.

Knowledge is power. Forced revelation of our inner lives puts each of us in a position of vulnerability.

Even when "not abused", the very real latent threat actively takes away freedoms of thought and action.

It is extreme abuse.

It undermines any sense that the state works for the people, when it operationally embodies a maximalized one-way threat over all citizens.

AI collation exponentially compounds the threat, the passive and active damage.

One of the wisest ethical/safety concepts ever: "The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated."

Democracy that sets up the levers of total autocracy is the greatest possible perversion and threat to democracy. Democracy only works as long as it recognizes government is the greatest threat to freedom. And that strict limitations on its power over citizens is the only defense.

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Mon, 08 Jun 2026 21:33:08 +0000

Apologies. Your pushback (frustration and patience) has helped me crystallize my view, thank you.

1. Define understanding.

My definition isn't vague: "a compact representation enabled because that representation's topology closely matches the topology of the relationships being modeled."

Understanding = Scope and Suitability of Behavior / # Parameters.

Useful property: This definition applies across all scales: Scientists and mathematicians increase our understanding, every time patchworks of relationships get replaced with a simpler underlying insight.

Another useful property: It distinguishes between better understanding and having more facts. Facts improve performance but do not (non-trivially) decrease parameters.
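A toy sketch of the ratio above, to make the facts-versus-insight distinction concrete. Everything here is a hypothetical illustration: the squares task, the 100-input test scope, and especially counting a rule as a single parameter are assumptions chosen for simplicity.

```python
def score(scope: float, suitability: float, parameters: int) -> float:
    """Scope and suitability of behavior, divided by parameter count."""
    return scope * suitability / parameters


def table_score(facts: int, scope: int = 100) -> float:
    """Score a memorized table of squares for 0..facts-1 on inputs 0..scope-1."""
    table = {n: n * n for n in range(facts)}
    hits = sum(1 for n in range(scope) if table.get(n) == n * n)
    return score(scope, hits / scope, parameters=len(table))


def rule_score(scope: int = 100) -> float:
    """Score the rule n * n, counted (by assumption) as one parameter."""
    hits = sum(1 for n in range(scope) if n * n == n * n)
    return score(scope, hits / scope, parameters=1)


print("table with 10 facts:", table_score(10))
print("table with 20 facts:", table_score(20))
print("single rule:        ", rule_score())
```

Doubling the memorized facts doubles the hits and the parameters together, so the ratio does not move, while the rule covers the whole scope with far fewer parameters.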

What is your definition? In measurable terms?

2. You keep avoiding a basic aspect of modeling:

Higher compactness is achieved by higher representation correspondence between a model and the modeled.

Yes, lower level representations can work. Even well, without good "understanding". But not as compactly. And as problem complexity grows, the relative difference in parameter budgets for high-correspondence and low-correspondence representations explodes.

This is not a subtle effect.

The hallmark of lower-level fitting is the far greater number of parameters required.

Dead simple example: Piece-wise linear vs. polynomial fitting of Bezier curves. Accuracy / parameter is far greater for the latter, because the representation matches the relationships being modeled.
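A minimal sketch of that comparison, assuming a cubic Bezier curve with made-up control points and a piecewise-linear approximation whose knots sit on the curve at equal steps of t. The error is measured at matching t values, which is a simplification of true geometric distance:

```python
def bezier(control, t):
    """Evaluate a cubic Bezier curve (a cubic polynomial in t) at t in [0, 1]."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = control
    s = 1.0 - t
    b0, b1, b2, b3 = s**3, 3 * s**2 * t, 3 * s * t**2, t**3
    return (b0 * x0 + b1 * x1 + b2 * x2 + b3 * x3,
            b0 * y0 + b1 * y1 + b2 * y2 + b3 * y3)


def piecewise_linear_error(control, segments, samples=2000):
    """Max gap between the curve and straight lines joining segments + 1 knots."""
    knots = [bezier(control, i / segments) for i in range(segments + 1)]
    worst = 0.0
    for j in range(samples + 1):
        t = j / samples
        i = min(int(t * segments), segments - 1)  # which segment t falls in
        u = t * segments - i                      # position within that segment
        (ax, ay), (bx, by) = knots[i], knots[i + 1]
        px, py = ax + u * (bx - ax), ay + u * (by - ay)
        cx, cy = bezier(control, t)
        worst = max(worst, ((px - cx) ** 2 + (py - cy) ** 2) ** 0.5)
    return worst


CONTROL = [(0.0, 0.0), (1.0, 2.0), (3.0, -1.0), (4.0, 1.0)]  # hypothetical curve

# The polynomial form is exact with 4 control points, i.e. 8 numbers.
for segments in (3, 10, 30, 100):
    numbers = 2 * (segments + 1)  # x and y for each knot
    err = piecewise_linear_error(CONTROL, segments)
    print(f"{numbers:>4} numbers, max error {err:.5f}")
```

With 3 segments the piecewise-linear fit spends the same 8 numbers as the polynomial but still has error, and it needs many more knots to shrink that error.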

That is an intentionally trivial example, but the same relationship holds for any problem.

You keep avoiding that.

3. Today's LLM models are very compact compared to humans.

Compressing the substance of a corpus of global human writing into less than 1% of a single human's parameter space is compact.

Humans have 100–200 trillion, some people think 500 trillion, synapses.

How do you argue that behavior scope and suitability / parameters is not remarkable, when it is remarkable compared to any specific human you could point to?

No human can converse reasonably across the scope of global communication. But these models can. For <1% of a human's parameter budget.
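The arithmetic behind that ratio, as a small sketch. The one-trillion-parameter model size is an assumed round figure, and treating one synapse as one parameter is the simplification used in this comment, not an established equivalence:

```python
MODEL_PARAMETERS = 1e12  # assumed: a one-trillion-parameter model

# Synapse estimates quoted above, per human brain.
SYNAPSE_ESTIMATES = {"low": 100e12, "high": 200e12, "upper": 500e12}


def parameter_share(model_parameters: float, synapses: float) -> float:
    """Model parameter count as a fraction of a brain's synapse count."""
    return model_parameters / synapses


for label, synapses in SYNAPSE_ESTIMATES.items():
    share = parameter_share(MODEL_PARAMETERS, synapses)
    print(f"{label:>5} estimate: {share:.1%} of a brain's synapse count")
```

Under these assumptions a trillion-parameter model sits right at 1% for the lowest synapse estimate and below it for the others; smaller models fall further under.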

4. Finally, based on your clear definition, how do you argue that humans understand but models do not? Saying we are different is a copout. Defining understanding as us vs. other is both circular and unenlightening. And ignores the real progress models are clearly making relative to humans.

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Mon, 08 Jun 2026 05:37:09 +0000

> Well, I cannot do that, because for you, if it looks realistic, it has to have understanding.

Yes, if it consistently produces good output for highly varied stimuli that can be intentionally picked to be unlikely to have ever had obvious representation in the training set, then yes it understands.

I think we are talking past each other a bit.

A series of increasingly challenging datasets, used to capture scaling efficiencies, would ground our discussion.

But the level of performance for models is simply too good vs. the number of parameters to be doing anything trivial.

Deep learning models do something combinatorial models do not. The linear tensor + non-linear transforms do two special things:

1. The tensor itself just projects a linear space into higher dimensions, but it's still the same information space. Project a 2D surface into higher dimensions linearly, and there can be more parameters, but it is not more information, since there is an expansion of linear dependence to match.

2a. But then the nonlinearity thresholds, squashes or otherwise alters the linear results, in a way that removes linear dependencies, increasing the useful dimensionality of the representation.

2b. And the squashing also allows dimensions to be folded down.

So by both expanding and flattening representational dimensions, deep learning models are able to model higher-order relationships directly, where any less expressive modeling would require cobbling together many patches of fitting.

Another way to put this is that deep learning models are able to learn higher-order relationships directly, not by memorizing and interpolating across learned points or regions.
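A small sketch of points 1 and 2a, using matrix rank as a stand-in for "useful dimensionality" (that proxy, the ReLU nonlinearity, the 2-to-8 layer size and the random data are all assumptions made for illustration). It is written in plain Python so it runs without extra libraries:

```python
import random


def matrix_rank(rows, tol=1e-9):
    """Rank via Gaussian elimination with partial pivoting."""
    m = [list(r) for r in rows]
    rank = 0
    for col in range(len(m[0])):
        if rank == len(m):
            break
        pivot = max(range(rank, len(m)), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            for c in range(col, len(m[0])):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


random.seed(0)
points = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]  # 2D inputs
weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(2)]     # 2 -> 8 map

# Linear projection into 8 dimensions: more numbers, same 2D information.
linear = [[sum(p[k] * weights[k][j] for k in range(2)) for j in range(8)]
          for p in points]
# Thresholding each output breaks the linear dependence between columns.
relu = [[max(0.0, v) for v in row] for row in linear]

print("rank after linear projection:", matrix_rank(linear))
print("rank after ReLU:", matrix_rank(relu))
```

The projected data keeps rank 2 however many output columns are added, while the thresholded version has higher rank, which is the dependence-removing effect described in 2a.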

So a dramatically greater ability to "understand" is why deep learning models are so much better. They are not doing simple combinatorial fitting.

"Understanding" or not, combinatorial relationships are the low bar for deep learning models; they are inherently great at learning much higher-order relationships.

I am falling asleep at this point. I feel like we need a blackboard and a computer. You are saying a lot of things that make me think, and make sense to me.

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Sun, 07 Jun 2026 12:18:59 +0000

I am not sure what you mean by complex combinatorial. If we are talking about combinatorial, it's combinatorial. N can be very large, but it is going to scale like combinatorial, not something else.

I just started out with mapping to be systematic. Mapping is ground zero, then interpolation i.e. any smooth fitting function or basis, then combinatorial where different bases are recognized and then projected relative to their relevance to a new input.

Each of those increases modeling efficiency and power, but even combinatorial doesn't scale to problems like language.

I may be doing a poor job communicating. A formal breakdown of the scaling issues with lower-order modeling, scaled up to compensate, would make a great paper.

To prove me wrong (as a thought experiment), choose a lower order model, any kind you can imagine that would qualify as modeling without understanding. Demonstrate it can do anything close. That it could possibly scale to the human corpus with just a trillion parameters.

If the number of parameters goes up far too fast, then that can't be the way deep learning solves the problem with a trillion, or a few billion, either.

And consider the other side. We have no idea how our own brains are lifting up what is relevant vs. what is not. We are used to it happening. We call it "understanding". But we don't know how it works, how we work. Despite experiencing it.

What we do know, because combinatorial is too resource intensive, is we are not just combinatorial either.

New comment by Nevermark in "The new bibliomaniacs"

Nevermark — Sun, 07 Jun 2026 05:56:41 +0000

> The book itself is a kind of memory totem

My phrase for this is "Books are bookmarks".

Even unread books form a physical reminder to read, and of the import of the topic they cover.

When I come across a book that covers something important well, I buy it. I will likely read it, but even if it just keeps reminding me of the topic, reinforcing my integrated web of understanding, it is doing good.

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Sun, 07 Jun 2026 04:01:03 +0000

> How do you know the output is not the result of combinatorial interactions?

(A bit of an essay, but it is a good question!)

REASON 1, How simpler representations fail:

Lesser understandings reveal themselves to novel combinations of prompts.

Mapping fails immediately because it fails on even trivial differences.

Interpolation fails immediately, because the function isn't smooth and the information it needs to model, human language and thought, combines non-linearly, non-locally and with higher-order relationships.

Combinatorial fails as soon as you create a prompt that involves novel non-linear or higher-order interactions. I.e. new combinations.
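The mapping case is the easiest to show. A minimal sketch, assuming a "model" that is nothing but an exact prompt-to-response table (the prompts and the `mapping_model` name are hypothetical):

```python
# Hypothetical training set for a pure mapping "model": exact prompt -> response.
MAPPING = {
    "what is 2 + 3?": "5",
    "what is 4 + 1?": "5",
}


def mapping_model(prompt: str):
    """Return the stored response, or None when the exact prompt was never seen."""
    return MAPPING.get(prompt)


print(mapping_model("what is 2 + 3?"))  # seen verbatim in the table
print(mapping_model("what is 3 + 2?"))  # trivially different, never stored
```

The second prompt asks the same question with the operands swapped, and the table has nothing to say about it.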

REASON 2, Parameter requirements of simpler representations:

For human-resembling sensible chats, mapping requires an example of every case. It would require combining the entire training set, with an optimized index. Essentially a search on the whole body with tricks to return anything sensible for even a slight mismatch.

Interpolation, ..., I don't even know how that could work. Again the whole corpus of training data, with some kind of gradient composition overlayed across it. It is an interesting research idea, but the possible mixing of tokens makes this unreasonable for anything but toy problems.

Combinatorial encodings, would have to have parameters operating across all the possible ways to combine relationships. There can be some relationship compression, to a base set of represented concepts, and then a combinatorial explosion of parameters for how to combine them.

I include statistical / stochastic transforms here as continuous combinatorial transforms.

Those could do the job, but more parameters than atoms in the universe might be required, for all possible topic/detail compositions.

REASON 3, Training corpus requirements to learn successful lesser representations.

Obviously the training data, even of all human communication, provides only a fraction of possible exact things that could be said. Not enough data for mapping even if infinite resources for creating a map were available.

Interpolation also suffers, because whatever correlations and smooth compressions of the training data can be made, it is still data that barely touches the kinds of sensible compositions that are possible.

And the same for combinatorial. There just isn't a fraction of an infinitesimal number of examples of combined topics and details, compared to what can be sensibly combined in any new conversation. You can't extract combinatorial compressions that don't exist.

REASON 4, Hiding one representation in another doesn't create opportunities that didn't exist before.

These methods all fail when used directly. The problems are not the kind that pushing the same transforms into a deep learning model solves.

The requirements for astronomically more parameters and training data are not met by embedding those kinds of representations into another model.

SOTA models are not operating with cosmological numbers of parameters, or training data that combinatorially represents concept interactions.

Being a deep learning model doesn't somehow lessen the requirements, needed to successfully perform, if it is learning via those lesser representations.

REASON 5, Test a model:

So let's test whether the model is doing more. If it fails for novel combinations of complex topics, then it might only be doing simpler things.

If it is robust to novel situations, then it cannot be operating by doing simpler things that don't scale.

Ask a model to: Write up a Supreme Court pleading for the rights of whales based on all that is known about them scientifically, recent whale language developments, and any applicable human rights law, given the relevant Supreme Court is in a parallel universe in Zion of the Matrix, being pleaded by Keanu Reeves, the actor not the character, and written in Dr. Seuss prose, except with as long of sentences as are needed to carry the real technicalities of a suitable filing. And include the assumptions of a back history of whales which have sequestered themselves into a deep hidden underground ocean, where they have been safe until recent excursions by humans which have harmed them. Be specific creating a real history behind those events, with details that are highly relevant to the motivation, reasoning and requests of the pleading. Avoid words with q where possible.

That isn't mapping. Interpolating. Combinatorial composition. SOTA models will generate a reasonable, even creative response to a completely novel combination of subjects and requirements, with non-linear interactions.

A human would have a hard time doing that, and the model does it nearly instantly with a fraction of the parameters we have.

If that isn't "understanding" in some credible sense, I have no idea what understanding looks like. The model is going way beyond its training data, to the relationships in the data that are relevant to combining novel things. To the point it can apply those relationships in combinations it has never encountered. And it makes a trivial task out of it.

New comment by Nevermark in "Nvidia is proposing a beast of a CPU system for Windows PCs"

Nevermark — Sat, 06 Jun 2026 23:22:10 +0000

Mx Extreme = 2 x Mx Ultra = more cores. (Opportunity: processor chiplets could be designed to integrate in higher quantities.)

Increase RDMA cross-bar linking from 4x to 8x = a lotta ports, a switch, or a stacking interface.

Regular RAM size/speed scaling: 512GB -> 1TB Mac Studios. Wider RAM and RDMA paths * clocks.

Given the low power envelope of today's Mac Studios, and bandwidth limits, lots of room to scale up, if Apple chooses. My fantasy: 2x cores, 2x RAM sizes, 2x RDMA devices, 2-4x RAM & RDMA bandwidth.

New comment by Nevermark in "Social Cache Busting"

Nevermark — Sat, 06 Jun 2026 22:25:23 +0000

Interesting take.

> you probably know what I mean by “hitting the cache”

In addition to simplifying the conversational lives of over-subscribed talkers, this convenient-answer effect also comes into play with propaganda.

People who feel dissonance on some topic are easily convinced to adopt non-answers that they can throw down like cards, to make the dissonance (and challenges) go away.

You may notice that most whataboutisms, jeering dismissals, deflecting responses, etc., are highly recognizable canned answers. Not just irrational answers.

The caching does triple duty:

1. Efficient as easy answers.

2. Efficient followup stoppers, because the person hearing them has already heard (cached) them too.

3. Effective short circuits of internal dialogue.

I find an effective response is to simply ask someone why they parroted something that doesn't make sense or actually mean anything.

And then listen politely to the subsequent pause. I have yet to meet someone with a good response for being called on their unoriginal canned non-response. Judo: obvious parroting and caching naturally undermine their own credibility when you don't play along.

New comment by Nevermark in "Ask HN: Why is the HN crowd so anti-AI?"

Nevermark — Sat, 06 Jun 2026 21:33:22 +0000

I have no general answer. But some themes I see:

• Larger Concerns: I believe this is the biggest negativity driver. Sublimated, but polarizing.

Humans are not divine or ordained. Entities passing us in intelligence are a threat, exceeding any normal range of pros and cons. Acknowledging the big picture derails practical discussion. But it indirectly polarizes many views.

• Hype Trampoline: The powerful Newtonian equal-but-opposite reaction to unrealistic or over-optimistic claims.

• Adaptation Style: Adapting new tech to ourselves, and adapting ourselves to new tech, are different things. Today, the fast self-adapters are getting value sooner than the fast tech-adapters. Never has this Rorschach test been so clear.

• Negatives First Effect: A large percentage of engineers are systemically contrarian and cynical. They defensively approach new things by processing limitations first.

It looks like kneejerk negativity and an obsession with dotted i's and crossed t's to me. But their "one-sided" negative expressions don't seem to stop them from beneficial adoption.

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Sat, 06 Jun 2026 04:25:53 +0000

> Firstly, how do you know that the optimal way to highly compress complex information is to understand it?

What is your non-performance baseline for "Understanding"? We don't have such a measure for humans.

Understanding is the behavioral ability demonstrated by learning to model something complex well. Beyond mappings, associations, interpolations.

Models clearly do. Mix up the most unlikely combination of non-trivial subjects, and they respond sensibly. Those are not averaged, interpolated by any order, or even combinatorial interactions.

There is a reason those kinds of encodings, mappings, associations, interpolations, statistics / stochastics, all failed miserably for decades. Still fail. It took topological transforms, reminiscent of how we compute (dendrite-soma-axon, tensor-sum-nonlinear), and then they leapt several orders of magnitude ahead of any alternative.

The problem with models composed of relationships of lower order than the phenomena they are trying to model, is they require combinatorially more parameters to model anything complex.

For simple problems, poor models fail gracefully. For complex problems, poor models just fail.

New comment by Nevermark in "Aging and Eye Problems"

Nevermark — Sat, 06 Jun 2026 04:15:55 +0000

Sorry to hear you have it.

> Now I'm just descending into presbyopia hell.

That is what I meant (as opposed to myopia). So, me too. I have finally got accustomed to constantly cycling half-circle readers on and off. I have thought about lens implants, but anything that could disturb my corneas seems like a terrible idea.

So I am waiting for complete lens/cornea replacements!

No, I didn't have any eye related allergies.

One of my cousins also got keratoconus, so I assume genetics are involved with mine.

New comment by Nevermark in "Aging and Eye Problems"

Nevermark — Sat, 06 Jun 2026 00:04:27 +0000

> Instead of a single row of data on a spreadsheet, I saw two, one below the other

I have keratoconus, where the cornea loses its shape and creates multiple focal points. I have several focal points in each eye.

It got so bad I couldn't read. So many copies of every letter that text looked like nests of spiders. Not an exaggeration, you could give me a page and a week and I wouldn't be able to decode it.

I also got headaches. Imagine trying to focus when all that does is vary which points in one eye match the other eye. It took a long time for my brain to stop trying.

If I look at a little "power dot" on some device across a pitch-black room, I can clearly see all the focal points, at random distances from a presumed center and each other. And a web of smeared focal lines connecting them.

It sounds cool, but you really don't want a focal web!

Fortunately, surgery involving soaking my cornea with a strengthening substance, and applying lasers to set it, improved my left eye considerably. And then, for unknown reasons, both eyes have improved spontaneously since then.

I feel very lucky to be able to read effortlessly, or at all, again.

For some reason, I sometimes have bad days and see mildly offset multiples. But mostly, the focal points are so closely clustered I don't notice them. Unless I try and read tiny tiny pill-canister writing.

Now about my damn myopic lenses, ...

For most of my life I had noticeably better than 20/20 vision.

[0] https://en.wikipedia.org/wiki/Keratoconus (I am happy to say, my eyes never looked anything like that picture. They didn't have any visible misshaping. I think my corneas had subtle soft rippling.)

New comment by Nevermark in "They’re made out of weights"

Nevermark — Fri, 05 Jun 2026 15:23:06 +0000

I don’t think thinking is consciousness, but I don’t see how consciousness operates without thinking.

Consciousness might feel like it’s an isolatable experience, but everything we experience is a result of processing it. Pause a brain and it isn’t experiencing anything.

We can be conscious without overt reasoning, but not without thinking. Thinking about ourselves and the experience is a basic component of the self-awareness in consciousness.

New comment by Nevermark in ""They're made out of weights""

Nevermark — Thu, 04 Jun 2026 07:45:10 +0000

That is a really good point. Yes, I think function is diagnostic on this.

Constant self-awareness, self-experience, self-focus, self-management, and self-improvement of one's own self (mind), is going to be an adaptive behavior for anything intelligent with resources to leverage. Whether truly independent, or highly motivated to serve others. The mind is the greatest tool.

I think that is more than simply a good functional definition of consciousness. How could all that integration and self-integration not be conscious?

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Thu, 04 Jun 2026 06:18:31 +0000

It turns out that the optimal way to highly compress complex information is to understand it.

Sometimes, a problem being hard means you only get bad solutions, or increasingly accurate ones.

The planet isn't big enough for the proverbial interpolative stochastic parrot, over the training set of global human communication.

New comment by Nevermark in "They’re made out of weights"

Nevermark — Thu, 04 Jun 2026 05:44:47 +0000

For obvious survival reasons we evolved to have sensory/cognitive access to our own activity, self-monitoring, and self-modeling ourselves.

The self-modeling, is in such a tight loop, it melds "ourselves" and our model of ourselves, our thinking and choices, and experience of our thinking and choices, into one component.

Like you can't analyze half a wheel of a bicycle and be talking about the same thing.

This awareness, increased modeling, control, feedback loop has tightened up over many stages. Just a few:

1. The body-sense loop

2. The internalized-environment-model loop

3. The body-internal-function loop

4. The body-internal-model loop

5. The emotional-cognitive loop

6. And finally, the tightest loop of all, our high-level cognitive activity, experienced as feedback directly, our self-model, and our self-direction, all merged into one thing.

We literally spend almost all day, every day, thinking about ourselves, in terms of our inner self.

That is consciousness. Rich self awareness, a merger of self-model and self-direction, and all in service of understanding and managing ourselves. How we can leverage our greatest tool, our self-directable mind, its habits, views, and behavior.

This wasn't an accident. A happy side-effect of our brains. It is a biologically evolved focusing of our highest-level behavior, with tight feedback, constant self-modeling and continuous focus on our inner status as motivation and most privileged object of our control. It has been ruthlessly optimized for, for a very long time.

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Thu, 04 Jun 2026 02:05:02 +0000

> I think the main complaint is LLMs don’t arrive at the answer the way we do.

This isn't an argument against their understanding things.

But I expect you are right, that their understanding may have major different qualities from ours.

Along with significant commonalities. (They don't reason via stream of consciousness in a way alien to us.)

New comment by Nevermark in "Artificial intelligence is not conscious – Ted Chiang"

Nevermark — Thu, 04 Jun 2026 01:40:57 +0000

> the fact that AI can reproduce convincingly human sentence continuation does not imply that the AI has no choice but ending up using a mechanism that "understand" rather than just have learned data patterns

Taken as an absolute without any additional context you are right.

But we are not talking about abstractions but specific successful models. The number of parameters these models have may seem large, but it is very small relative to the training data that they have to summarize. They cannot do it without discovering the patterns that make sense out of it.

And we can verify that. Simply discuss completely disparate topics, with some kind of intersection. Converge several highly unlikely topics, there are so many it would take billions of years to exhaust unlikely combinations.
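A back-of-envelope sketch of the "billions of years" figure. The topic count, the number of topics per prompt, and the testing rate are all assumed values picked only to show the order of magnitude:

```python
import math

TOPICS = 10_000            # assumed number of distinct topics
TOPICS_PER_PROMPT = 5      # assumed number of topics converged in one prompt
PROMPTS_PER_SECOND = 1     # assumed testing rate
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_exhaust(topics: int, per_prompt: int, rate: float) -> float:
    """Years needed to try every unordered combination of topics once."""
    combinations = math.comb(topics, per_prompt)
    return combinations / rate / SECONDS_PER_YEAR


years = years_to_exhaust(TOPICS, TOPICS_PER_PROMPT, PROMPTS_PER_SECOND)
print(f"about {years:.2e} years")
```

This counts only which topics are combined, not how they are phrased or related, so it understates the real space of unlikely prompts.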

If the model is only interpolating it will produce gibberish.

But that isn't what happens.

The fact that models can be near expert, and sometimes expert, across vast areas of human knowledge is a clue. If they don't understand that, then the question is, why do we think people understand things? Does having an answer mean a human understands something, or is their intuition and stream-of-consciousness reasoning also not understanding? To be even handed about what we mean by understanding.