<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: charlescurt123</title><link>https://news.ycombinator.com/user?id=charlescurt123</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 25 Apr 2026 19:13:39 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=charlescurt123" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by charlescurt123 in "Ask HN: SWEs how do you future-proof your career in light of LLMs?"]]></title><description><![CDATA[
<p>Companies that are already heavily entrenched and have no interest in growth have no reason to increase output, though. Will they simply fire everyone and behave the same?<p>On the other hand, that makes them weaker if competition comes along, since consumers and businesses can be expected to demand significantly more once they have something to compare against.</p>
]]></description><pubDate>Tue, 17 Dec 2024 02:17:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=42437702</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=42437702</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42437702</guid></item><item><title><![CDATA[New comment by charlescurt123 in "M4 MacBook Pro"]]></title><description><![CDATA[
<p>Comparing a laptop to an A100 (312 teraFLOPS) or H100 (~1 petaFLOP) server is a stretch, to say the least.<p>An M2 is, according to a Reddit post, around 27 TFLOPS.<p>So less than 1/10 the raw compute, let alone the memory.<p>What workflow would use something like this?</p>
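A quick back-of-the-envelope check of that ratio, using the figures quoted above (the M2 number is the Reddit estimate, not a verified spec-sheet value):

```python
# Peak-throughput figures as quoted in the comment above (assumptions,
# not values verified against vendor documentation).
a100_tflops = 312.0  # NVIDIA A100 tensor-core peak, per the comment
m2_tflops = 27.0     # Apple M2 estimate cited from a Reddit post

ratio = m2_tflops / a100_tflops
print(f"M2 ≈ {ratio:.1%} of an A100's peak compute")  # well under 1/10
```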
]]></description><pubDate>Wed, 30 Oct 2024 20:00:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=41999548</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41999548</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41999548</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Were RNNs all we needed?"]]></title><description><![CDATA[
<p>I'm not proposing a change in model size; rather, I'm suggesting a higher dimensionality within the current structure. There’s an interesting paper on LLM explainability, which found that individual neurons often represent a superposition of various data elements.<p>What I’m advocating is a substantial increase in this aspect—keeping model size the same while expanding dimensionality. The "curse of dimensionality" illustrates how a modest increase in dimensions leads to a significantly larger volume.<p>While I agree that backpropagation isn’t a complete solution, it’s ultimately just a stochastic search method. The key point here is that expanding the dimensionality of a model’s space is likely the only viable long-term direction. To achieve this, backpropagation needs to work within an increasingly multidimensional space.<p>A useful analogy is training a small model on random versus structured data. With structured data, we can learn an extensive amount, but with random data, we hit a hard limit imposed by the network. Why is that?</p>
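The volume blow-up is easy to make concrete. A minimal sketch, where the grid resolution k is an arbitrary illustrative choice:

```python
# Number of cells in a unit hypercube discretised at k cells per axis
# grows as k**d: a modest increase in dimension d yields a vastly
# larger addressable volume.
k = 10  # cells per axis (illustrative)
for d in (2, 3, 8, 16):
    print(f"d={d:>2}: {k**d:.0e} cells")
```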
]]></description><pubDate>Tue, 08 Oct 2024 18:49:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=41780491</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41780491</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41780491</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Were RNNs all we needed?"]]></title><description><![CDATA[
<p>I find the entire field lacking when it comes to long-horizon problems. Our current, widely used solution is to scale, but we're nowhere near achieving the horizon scales even small mammal brains can handle. Our models can have trillions of parameters, yet a mouse brain would still outperform them on long-horizon tasks and efficiency. It's something small, simple, and elegant—an incredible search algorithm that not only finds near-optimal routes but also continuously learns on a fixed computational budget.<p>I'm honestly a bit envious of future engineers who will be tackling these kinds of problems with a 100-line Jupyter notebook on a laptop years from now. If we discovered the right method or algorithm for these long-horizon problems, a 2B-parameter model might even outperform current models on everything except short, extreme reasoning problems.<p>The only solution I've ever considered for this is expanding a model's dimensionality over time, rather than focusing on perfect weights. The higher dimensionality you can provide to a model, the greater its theoretical storage capacity. This could resemble a two-layer model—one layer acting as a superposition of multiple ideal points, and the other layer knowing how to use them.<p>When you think about the loss landscape, imagine it with many minima for a given task. If we could create a method that navigates these minima by reconfiguring the model when needed, we could theoretically develop a single model with near-infinite local minima—and therefore, higher-dimensional memory. This may sound wild, but consider the fact that the human brain potentially creates and disconnects thousands of new connections in a single day. Could it be that these connections steer our internal loss landscape between different minima we need throughout the day?</p>
]]></description><pubDate>Thu, 03 Oct 2024 22:08:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=41735588</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41735588</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41735588</guid></item><item><title><![CDATA[New comment by charlescurt123 in "An adult fruit fly brain has been mapped"]]></title><description><![CDATA[
<p>I believe they do know this.<p>However, the real challenge would be:<p>1. Bringing this mapping into an AI framework for inference.
2. We don't know the "OS" it runs on. Simply triggering a neuron at random probably wouldn't work, as there are many other factors that determine when neurons fire.</p>
]]></description><pubDate>Wed, 02 Oct 2024 22:14:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=41725446</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41725446</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41725446</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Do AI companies work?"]]></title><description><![CDATA[
<p>Possibly, but as of now it's a completely unsolved problem, and to my knowledge nobody has shown even a tiny model performing it.<p>Based on the front page today, I could even argue we can't yet simulate the abilities of a fruit fly.<p>The absolute frontier models can perform only a fraction of a fraction of what a typical workday looks like for a human. Based on GAIA benchmarks, I calculated that a frontier model today has about a 10^-29 chance of completing a single day of connected tasks.</p>
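A number of that magnitude is what falls out of compounding per-step success rates. A minimal sketch, where the per-task success rate and task count are illustrative assumptions rather than actual GAIA figures:

```python
# Hypothetical compounding: if each sub-task succeeds independently with
# probability p_step, a chain of n_steps tasks succeeds with p_step**n_steps.
p_step = 0.5   # assumed per-task success rate (illustrative)
n_steps = 96   # assumed number of connected tasks in a workday (illustrative)

p_day = p_step ** n_steps
print(f"P(completing the whole day) ≈ {p_day:.1e}")  # on the order of 1e-29
```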
]]></description><pubDate>Wed, 02 Oct 2024 19:50:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=41724411</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41724411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41724411</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Do AI companies work?"]]></title><description><![CDATA[
<p>Continually learn with a fixed computational budget over massively extended task horizons.</p>
]]></description><pubDate>Tue, 01 Oct 2024 01:06:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=41703722</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41703722</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41703722</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Learning to Reason with LLMs"]]></title><description><![CDATA[
<p>With these methods, the issue is the log scale of compute. Say you ask it to solve fusion. It may be able to, but it's unverifiable WHICH answer was correct.<p>It might generate 10 billion answers to fusion, of which only 1-10 are correct.<p>There would be no way to know which one is correct without first knowing the answer to the question.<p>This is my main issue with these methods. They assume the future via RL, then mark the run where it happened to get it right.<p>We should really be looking at methods that measure how often a model is wrong, rather than whether it was right a single time.</p>
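The needle-in-a-haystack part can be put in numbers, using the comment's own illustrative counts:

```python
# With no verifier, the best you can do is pick uniformly at random
# from the candidate pool (counts are the illustrative figures above).
candidates = 10_000_000_000  # 10 billion generated answers
correct = 10                 # generous upper bound on correct ones

p_pick = correct / candidates
print(f"Chance a randomly chosen answer is correct: {p_pick:.0e}")
```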
]]></description><pubDate>Thu, 12 Sep 2024 18:20:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=41523893</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41523893</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41523893</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Learning to Reason with LLMs"]]></title><description><![CDATA[
<p>It's RL, so it's going to be great on the tasks they created for training, but not so much on others.<p>Impressive, but the problem with RL is that it requires knowledge of the future.</p>
]]></description><pubDate>Thu, 12 Sep 2024 18:07:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=41523740</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=41523740</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41523740</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Panic at the Job Market"]]></title><description><![CDATA[
<p>So to preface: I'm not looking for a job (I'm trying to build my own company).<p>When I do interviews (probably limited compared to you, but some), I do them the way I wish someone would interview me.<p>I focus purely on curiosity: how many disparate things are they interested in? Where those overlap with my knowledge, I probe deep. I believe in Einstein's quote:<p>"I have no special talents. I am only passionately curious."<p>If someone knows how RDMA and GPU MIGs work, they are probably pretty damn interested in how HPC clusters function. More importantly, can they compress this information and explain it so that a non-technical person could understand?<p>There are endless questions I could ask someone to probe their knowledge of a technical field; it upsets me a bit that most of the time people ask the shallowest of questions.<p>I believe this is because most people actually study their field a very limited amount, because most people are honestly not truly interested in what they do.<p>The biggest implication is that while I may be able to tell whether someone has this trait, the majority of interviewers could not, as they literally don't know the things they could ask.<p>Asking me system-design questions when you aren't knowledgeable in the field would probably be the easiest way to see the complexity of the systems I can build.</p>
]]></description><pubDate>Thu, 18 Jul 2024 04:33:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=40992536</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40992536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40992536</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Panic at the Job Market"]]></title><description><![CDATA[
<p>Self-taught, coming from an EE background.<p>I was originally building firmware and hardware for a previous company while testing DS on their systems. They liked my work, so I switched to DS and ended up a team lead.<p>Honestly, I have never had a take-home exercise, but I would love it if I could. I basically make my own work; if I'm not given work, I will build other projects and try to sell them to the company.<p>I normally build good-value projects and can sell them. It's how I went from hardware to leading a DS team in a few years.</p>
]]></description><pubDate>Thu, 18 Jul 2024 04:13:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=40992451</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40992451</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40992451</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Panic at the Job Market"]]></title><description><![CDATA[
<p>I'm not bad at getting jobs; it's just that when I am looking, I feel people think I am lying about my project experience.<p>I do these things because I see a place in the company to grow value, and I go do it.<p>I can write almost all basic code, and my true skill set is in custom, complex DS pipelines at scale.</p>
]]></description><pubDate>Thu, 18 Jul 2024 04:08:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=40992431</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40992431</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40992431</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Panic at the Job Market"]]></title><description><![CDATA[
<p>So I feel I fall strongly into the poor-performer interview category any time coding problems come up. How would I convince you I do not have a fraudulent resume?<p>I have studied for hours every day for many years now. I know many complex systems, but studying algorithms bores me to tears.<p>I've built HPC clusters, k8s clusters, custom DL methods, a custom high-performance file system, low-level complex image-analysis algorithms, firmware, UIs, and custom OS work.<p>I've done a lot of stuff because I can't help wanting to learn it. But I fail even basic leetcode questions.<p>Am I a bad engineer?<p>There seems to be no way for me to show my abilities to companies other than passing a leetcode screen, but stopping learning DL methods to grind leetcode feels painful. I only want to learn the systems that create the most value for a company.<p>I imagine if you interviewed me, you would think I wrote a fraudulent resume. I'm not sure how I am supposed to convince someone otherwise, though. Perhaps I've been dumb in not working on code that can be seen outside of a company.</p>
]]></description><pubDate>Wed, 17 Jul 2024 20:12:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=40989897</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40989897</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40989897</guid></item><item><title><![CDATA[New comment by charlescurt123 in "Karpathy: Let's reproduce GPT-2 (1.6B): one 8XH100 node 24h $672 in llm.c"]]></title><description><![CDATA[
<p>I imagine we could do this now, though not the way you think.<p>Have a human-created story and text as a guideline.<p>With that, have genAI generate the text for each stage; you would get different statements every time while staying on track.<p>It would be interesting to play a game where all players say the same information in slightly different ways every single playthrough.</p>
]]></description><pubDate>Thu, 11 Jul 2024 20:17:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=40940233</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40940233</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40940233</guid></item><item><title><![CDATA[New comment by charlescurt123 in "From GPT-4 to AGI: Counting the OOMs"]]></title><description><![CDATA[
<p>I think hallucinations are actually a sign that LLMs are far closer to a real brain than we realize.<p>I think hallucinations are a major unexplored gateway to AGI.</p>
]]></description><pubDate>Wed, 10 Jul 2024 03:59:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=40923572</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40923572</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40923572</guid></item><item><title><![CDATA[New comment by charlescurt123 in "From GPT-4 to AGI: Counting the OOMs"]]></title><description><![CDATA[
<p>Good question. I'm working on exactly this; I suppose you could call it a replacement for RAG.<p>It's actually not easy to achieve. I could give a very long-winded answer (don't tempt me), but suffice it to say it's a resolution problem.<p>Every AI has a fixed resolution at creation. Long-running tasks focus on a progressively narrowing space per step, so the resolution required for an infinite task is infinite.<p>No number of nines of reliability will ever fix this.<p>Funnily enough, small animals do this with ease, so I strongly disagree with the idea that our AI outcompetes even small mammals in every way.</p>
]]></description><pubDate>Wed, 10 Jul 2024 03:58:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=40923563</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40923563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40923563</guid></item><item><title><![CDATA[New comment by charlescurt123 in "From GPT-4 to AGI: Counting the OOMs"]]></title><description><![CDATA[
<p>Doing any job for more than an hour without completely forgetting its goals and tasks.</p>
]]></description><pubDate>Wed, 10 Jul 2024 03:41:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=40923480</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40923480</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40923480</guid></item><item><title><![CDATA[The Forever AI Agent]]></title><description><![CDATA[
<p>Article URL: <a href="https://medium.com/@charles.curt/the-forever-ai-agent-f9cc39a23f8d">https://medium.com/@charles.curt/the-forever-ai-agent-f9cc39a23f8d</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40242261">https://news.ycombinator.com/item?id=40242261</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 02 May 2024 22:54:28 +0000</pubDate><link>https://medium.com/@charles.curt/the-forever-ai-agent-f9cc39a23f8d</link><dc:creator>charlescurt123</dc:creator><comments>https://news.ycombinator.com/item?id=40242261</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40242261</guid></item></channel></rss>