Hacker News: skyde

New comment by skyde in "LLM Neuroanatomy II: Modern LLM Hacking and Hints of a Universal Language?"

skyde — Tue, 24 Mar 2026 15:37:18 +0000

Actually, I'm not surprised. I guess this works for the same reason “say it twice” [1] works. Because LLMs are trained as causal language models, past tokens cannot attend to future tokens. One copy of the layer set solves this. [1] https://arxiv.org/html/2512.14982v1
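To make the causal-attention point concrete, here is a minimal sketch in plain Python (no model involved). The helper names `causal_mask` and `visible_tokens` and the sample prompt are made up for illustration. It only shows which positions a token is allowed to attend to, and why a repeated prompt gives the second copy access to the whole first copy.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_tokens(tokens, position):
    """Tokens that `position` can attend to under a causal mask."""
    mask = causal_mask(len(tokens))
    return [tok for j, tok in enumerate(tokens) if mask[position][j]]


prompt = ["Alice", "has", "3", "brothers"]

# Single copy: the first token sees only itself, never the later context.
single_view = visible_tokens(prompt, 0)

# Repeated prompt: the second copy of the first token sits at index
# len(prompt), so it can attend to every token of the first copy.
doubled = prompt + prompt
second_copy_view = visible_tokens(doubled, len(prompt))

print(single_view)
print(second_copy_view)
```

In the doubled sequence, every token of the second copy has the full prompt in its past, which is the intuition behind the comparison with repeating layers.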

New comment by skyde in "Apple's MLX adding CUDA support"

skyde — Mon, 14 Jul 2025 22:42:24 +0000

This is a CUDA backend for MLX, not an MLX backend for CUDA!

New comment by skyde in "Math Academy pulled me out of the Valley of Despair"

skyde — Wed, 05 Mar 2025 19:03:51 +0000

Can you give more detail on what you mean by “it can be a valuable experience with the right people around to help”?

My son (7 years old) is gifted in math, and as a parent I find it extremely hard to decide how much I should push him (register him for math competitions, a weekend math club ...) and how much I should just let him get 100% on exams and not accelerate his learning.

New comment by skyde in "Does X cause Y? An in-depth evidence review (2021)"

skyde — Fri, 14 Feb 2025 21:33:39 +0000

Is it only me, or does this completely miss all the recent research on causal inference using causal graphical models?

New comment by skyde in "Desmos Animated Graphing Calculator"

skyde — Fri, 31 Jan 2025 16:07:53 +0000

Desmos is the best graphing calculator ever built. And it's amazing that it can run directly in your browser, or without internet on your phone.

Just wish it was open source :-)

Does anyone know of an open source library, like 3Blue1Brown's Manim library, that can run client side in the web browser like Desmos?

New comment by skyde in "The era of open voice assistants"

skyde — Fri, 20 Dec 2024 16:53:39 +0000

How does this compare to the ESP32-S3-BOX-3B?

New comment by skyde in "Math is still catching up to the genius of Ramanujan"

skyde — Wed, 23 Oct 2024 18:30:51 +0000

Thanks a lot for writing that. I agree 100% with you.

But I have always wondered how polymaths like Leonardo da Vinci and Isaac Newton, who excelled in many areas, are possible.

New comment by skyde in "Students paid thousands for a Caltech boot camp that Caltech didn't teach"

skyde — Sun, 29 Sep 2024 15:54:00 +0000

Are MIT classes also taught by outsourced instructors instead of MIT instructors?

New comment by skyde in "Distributed Locks with Redis (2014)"

skyde — Thu, 22 Aug 2024 20:50:54 +0000

Redis Sentinel provides high availability and monitoring for Redis, but it does not guarantee strong consistency.

Linearizability requires that once a write is acknowledged, all subsequent reads should reflect that write.

If min-replicas-to-write is set to the total number of Redis replicas, then as soon as a single replica goes down you won't be able to do any writes (take or release a lock).

If min-replicas-to-write is set to any number smaller than the total number of Redis replicas, some replicas could still be lagging because of asynchronous replication.

Also when a replica acknowledges a write in Redis, it means that the write has been received and logged by the replica, but it doesn’t necessarily mean that the write has been fully processed and applied to the data set.

This means that reading from a replica that has acknowledged a write from the master might still return the old value for the key.
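The trade-off around min-replicas-to-write can be sketched with a toy model. This is not Redis code: the `Replica` class and `primary_accepts_write` function are hypothetical, and replica lag is reduced to a single stored value. It only illustrates the two cases described above.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    connected: bool = True
    value: str = "old"  # last value this replica has applied


def primary_accepts_write(replicas, min_replicas_to_write):
    """Toy version of the check: enough replicas must be connected."""
    healthy = sum(1 for r in replicas if r.connected)
    return healthy >= min_replicas_to_write


# Three replicas, one of them down.
replicas = [Replica(), Replica(), Replica(connected=False)]

# Case 1: threshold equals the replica count, so the write is rejected.
strict_ok = primary_accepts_write(replicas, min_replicas_to_write=3)

# Case 2: threshold below the replica count. The write is accepted, but
# replication is asynchronous, so only some replicas have applied it.
relaxed_ok = primary_accepts_write(replicas, min_replicas_to_write=2)
if relaxed_ok:
    replicas[0].value = "new"  # this replica caught up
    # replicas[1] is connected but still lagging: it keeps "old"

stale_reads = [r.value for r in replicas if r.connected and r.value == "old"]

print(strict_ok, relaxed_ok, stale_reads)
```

With the strict setting the lock service is unavailable after one failure, and with the relaxed setting a connected replica can still serve the old value.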

New comment by skyde in "Distributed Locks with Redis (2014)"

skyde — Thu, 22 Aug 2024 04:30:46 +0000

Paxos and Raft are consensus algorithms that provide certain guarantees and capabilities that a master-slave system with synchronous replication, such as PostgreSQL, cannot offer.

These algorithms ensure that a majority of nodes (a quorum) must agree on any proposed change. This agreement guarantees that once a decision is made (e.g., to commit a transaction), it is final and consistent across all nodes. This strong consistency is critical in distributed systems to avoid split-brain scenarios.

Split-brain is easily caused by:

1-network partitions

2-latency issues

3-async failover (2 nodes think they are the master)

4-replica lag (some but not all replicas have acknowledged the write) while the master sends the confirmation to the client
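The reason a majority quorum protects against split-brain is that any two majorities of the same cluster share at least one node. The sketch below checks that property by brute force for a small cluster. It is only the counting argument, not an implementation of Paxos or Raft, and the function names are made up for illustration.

```python
from itertools import combinations


def majority_size(n):
    """Smallest group size that is more than half of n nodes."""
    return n // 2 + 1


def all_groups_intersect(n, group_size):
    """True if every pair of groups of `group_size` nodes shares a node."""
    groups = [set(c) for c in combinations(range(n), group_size)]
    return all(a & b for a, b in combinations(groups, 2))


n = 5
# Any two majorities (3 of 5) overlap, so two conflicting decisions
# cannot both be accepted by a quorum.
majorities_overlap = all_groups_intersect(n, majority_size(n))

# Groups of 2 out of 5 can be disjoint, e.g. {0, 1} and {2, 3}:
# this is the situation where two sides each believe they are in charge.
minorities_overlap = all_groups_intersect(n, 2)

print(majorities_overlap, minorities_overlap)
```

A master with replicas that only waits for "some" acknowledgements has no such overlap guarantee, which is why the failure modes listed above can leave two nodes both acting as master.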

New comment by skyde in "Distributed Locks with Redis (2014)"

skyde — Thu, 22 Aug 2024 04:27:34 +0000

Redis is a very bad store for a distributed lock but Postgres is only slightly better.

What you truly need is something like ZooKeeper or etcd, which are designed to achieve distributed consensus using algorithms like Paxos or Raft.

This ensures strong consistency and reliability in a distributed system, making them ideal for tasks like leader election, configuration management, and lease management where consistency across nodes is critical.

New comment by skyde in "IsoFLOP curves of large language models are flat"

skyde — Sat, 03 Aug 2024 01:14:52 +0000

But inside one epoch there is already a lot of duplication.

By duplication I mean that if the context length is N, there are many sequences of N words that are not unique.
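One rough way to measure this kind of duplication is to slide a window of N words over the data and count how many windows occur more than once. The sketch below does that on a made-up sentence. The function name and the corpus are illustrative only, and a real measurement would work on tokens over a full dataset.

```python
from collections import Counter


def duplicate_window_fraction(words, n):
    """Fraction of length-n windows whose content appears more than once."""
    windows = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    if not windows:
        return 0.0
    counts = Counter(windows)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(windows)


corpus = "the cat sat on the mat and the cat sat on the rug".split()

# With N = 3, the windows "the cat sat", "cat sat on" and "sat on the"
# each appear twice, so 6 of the 11 windows are not unique.
fraction = duplicate_window_fraction(corpus, 3)
print(fraction)
```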

New comment by skyde in "Launch HN: Undermind (YC S24) – AI agent for discovering scientific papers"

skyde — Fri, 26 Jul 2024 06:53:23 +0000

Could not try it. It says I need a valid institutional or company email address.

It doesn’t recognize my university.

New comment by skyde in "Large Enough"

skyde — Wed, 24 Jul 2024 18:50:27 +0000

It “works”, but the LLM having to use a calculator means the LLM doesn’t understand arithmetic well enough and doesn’t know how to follow a set of steps (an algorithm) natively to find the answer for big numbers.

I believe this could be fixed and is worth fixing, because it’s the only way LLMs will be able to help math and physics researchers write proofs and make real scientific progress.
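As an example of the kind of step-by-step procedure meant here, schoolbook addition processes one digit at a time and carries over, so it works for numbers of any length. The sketch below is only an illustration of such an algorithm, with a made-up function name. It says nothing about how an LLM would actually learn or execute it.

```python
def add_digit_strings(a, b):
    """Schoolbook addition on non-empty decimal strings, digit by digit."""
    result = []
    carry = 0
    i, j = len(a) - 1, len(b) - 1
    # Walk from the least significant digit to the most significant one.
    while i >= 0 or j >= 0 or carry:
        da = int(a[i]) if i >= 0 else 0
        db = int(b[j]) if j >= 0 else 0
        carry, digit = divmod(da + db + carry, 10)
        result.append(str(digit))
        i -= 1
        j -= 1
    return "".join(reversed(result))


big_a = "98765432109876543210987654321"
big_b = "12345678901234567890123456789"
total = add_digit_strings(big_a, big_b)
print(total)
```

Each step only needs two digits and a carry, so the difficulty does not grow with the size of the numbers, only the number of steps does.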

New comment by skyde in "FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision"

skyde — Fri, 12 Jul 2024 16:25:31 +0000

I just discovered e-graphs recently, and I have a good understanding of compilers from taking a compiler class at university.

I would like to understand why you say e-graphs would need control flow to be revamped.

Do you have anything I could read on it?

New comment by skyde in "FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-Precision"

skyde — Fri, 12 Jul 2024 16:20:48 +0000

https://github.com/uwplse/tensat

New comment by skyde in "Chebyshev approximation and how it can help (2012)"

skyde — Sat, 08 Jun 2024 16:23:44 +0000

What do you mean by close to CNN?

What is your architecture? Is it just a fully connected layer of Chebyshev polynomials?

New comment by skyde in "Simple tasks showing reasoning breakdown in state-of-the-art LLMs"

skyde — Thu, 06 Jun 2024 08:36:26 +0000

Given that Alice has 13 brothers and 31 sisters, we can update the Prolog program with this information. We need to adjust the fact about Alice's siblings and then use the rule to calculate the number of sisters her brothers have.

Here is the updated program:

```prolog
% Facts
siblings(alice, 13, 31). % Alice has 13 brothers and 31 sisters

% Rules
% Alice's brothers have M + 1 sisters, since Alice is also their sister.
sisters_of_brother(TotalSisters) :-
    siblings(alice, _, M),
    TotalSisters is M + 1.

% Initialization goal
:- initialization(main).

% Predicate to find and print the total number of sisters Alice's brothers have
main :-
    sisters_of_brother(TotalSisters),
    writeln(TotalSisters).
```

In this program, the fact `siblings(alice, 13, 31)` represents that Alice has 13 brothers and 31 sisters. The rule `sisters_of_brother(TotalSisters)` calculates the total number of sisters that Alice's brothers have by adding 1 to the number of sisters Alice has (since Alice is also their sister).

When we run this program, it will print the total number of sisters Alice's brothers have:

1. Alice has 31 sisters.
2. Each of Alice's brothers will also consider Alice as a sister.
3. Therefore, each brother has 31 + 1 = 32 sisters.

The output of the program will be:

```
32
```

Thus, Alice's brothers have 32 sisters.

New comment by skyde in "Simple tasks showing reasoning breakdown in state-of-the-art LLMs"

skyde — Thu, 06 Jun 2024 08:30:00 +0000

Asking GPT to first output a Prolog program seems to fix it 100%!

Here is the updated program:

```prolog
% Facts
siblings(alice, 13, 31). % Alice has 13 brothers and 31 sisters

% Rules
% Alice's brothers have M + 1 sisters, since Alice is also their sister.
sisters_of_brother(TotalSisters) :-
    siblings(alice, _, M),
    TotalSisters is M + 1.

% Initialization goal
:- initialization(main).

% Predicate to find and print the total number of sisters Alice's brothers have
main :-
    sisters_of_brother(TotalSisters),
    writeln(TotalSisters).
```

When we run this program, it will print the total number of sisters Alice's brothers have:

1. Alice has 31 sisters.
2. Each of Alice's brothers will also consider Alice as a sister.
3. Therefore, each brother has 31 + 1 = 32 sisters.

The output of the program will be:

```
32
```

Thus, Alice's brothers have 32 sisters.

New comment by skyde in "Transformers Represent Belief State Geometry in Their Residual Stream"

skyde — Sat, 01 Jun 2024 17:07:01 +0000

What do you mean by simplest in terms of optimization?

I get that it finds solutions that are easy for SGD or the Adam optimizer to find.

But why would such solutions be less simple than others?