<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: two_in_one</title><link>https://news.ycombinator.com/user?id=two_in_one</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 03:29:27 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=two_in_one" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by two_in_one in "Beyond self-attention: How a small language model predicts the next token"]]></title><description><![CDATA[
<p>As you'd expect:<p><a href="https://en.wikipedia.org/wiki/Ouija" rel="nofollow">https://en.wikipedia.org/wiki/Ouija</a></p>
]]></description><pubDate>Mon, 05 Feb 2024 06:19:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=39257959</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39257959</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39257959</guid></item><item><title><![CDATA[New comment by two_in_one in "Beyond self-attention: How a small language model predicts the next token"]]></title><description><![CDATA[
<p>From the post:<p>> I implemented imperative code that does what I’m proposing the transformer is doing. It produces outputs very similar to the transformer.<p>This means there is probably a way to bypass transformers and get the same results. It would be interesting if it's more efficient: given a foundation model, train something else on its outputs and run it on a much smaller device.</p>
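<p>A hedged sketch of the "given a foundation model, train something else" idea: knowledge distillation. A toy "teacher" emits next-token distributions and a much smaller "student" is fitted to the teacher's outputs rather than to raw data. Everything here (the vocabulary, the hash-based teacher) is illustrative, not any real model.</p>

```python
VOCAB = ["a", "b", "c"]

def teacher_probs(ctx):
    # Stand-in for a big model: a next-token distribution over VOCAB,
    # conditioned on the last two tokens (deterministic toy function).
    h = sum(ord(c) for c in "".join(ctx))
    w = [(h % 7) + 1, (h % 5) + 1, (h % 3) + 1]
    s = sum(w)
    return [x / s for x in w]

# Student: conditions only on the last token, so its "table" is 3x
# smaller. Fit it by averaging the teacher's outputs over the context
# dimension the student drops.
student = {}
for last in VOCAB:
    dists = [teacher_probs((first, last)) for first in VOCAB]
    student[last] = [sum(d[i] for d in dists) / len(dists)
                     for i in range(len(VOCAB))]
```

<p>The student loses some conditioning but keeps the teacher's average behavior, which is the trade the comment is pointing at: a cheaper model that runs on a much smaller device.</p>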
]]></description><pubDate>Mon, 05 Feb 2024 06:07:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=39257891</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39257891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39257891</guid></item><item><title><![CDATA[New comment by two_in_one in "A decoder-only foundation model for time-series forecasting"]]></title><description><![CDATA[
<p>As long as you can evaluate a model's output, you can select the best one. You probably have some idea of what you are looking for, so it's possible to check how likely the output is to match it.<p>The data is not a spherical horse in a vacuum. Usually there is a known source that produces the data, and it's likely that the same model, or maybe a small number of models, works well on all data from that source. Which means that, knowing the source, you can select the model that worked well before. Even if the data comes from alien ships, they are likely to be from the same civilization.<p>I'm not saying it's a 100% solution, just a practical approach.</p>
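<p>A small sketch of that "evaluate and select" approach: score every candidate model on data from the known source and keep the one that fits best. The candidate models and the metric here are made up for illustration.</p>

```python
# Data from a known source with a known pattern: y = 2x + 1.
data = [(x, 2 * x + 1) for x in range(10)]

# Candidate models that worked on other sources before.
models = {
    "linear": lambda x: 2 * x + 1,
    "quadratic": lambda x: x * x,
    "constant": lambda x: 5,
}

def score(predict):
    # Mean absolute error over the source data; lower is better.
    return sum(abs(predict(x) - y) for x, y in data) / len(data)

best = min(models, key=lambda name: score(models[name]))
```

<p>Next time data arrives from the same source, you can start from <code>best</code> instead of re-evaluating everything.</p>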
]]></description><pubDate>Sun, 04 Feb 2024 06:24:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=39248071</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39248071</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39248071</guid></item><item><title><![CDATA[New comment by two_in_one in "A decoder-only foundation model for time-series forecasting"]]></title><description><![CDATA[
<p>> incredibly diverse, and results are going to be highly dependent on which dataset was cherry-picked for benchmarking<p>This naturally leads to a multi-model solution under one umbrella: a sort of MoE, with a selector (router, classifier) and specialized experts. If there is something that can't be handled by the existing experts, then train another one.</p>
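<p>A minimal sketch of the umbrella idea: a router classifies the input and dispatches to a specialized expert, and an input no expert handles signals that another expert is needed. All names here are illustrative, not a real MoE implementation.</p>

```python
# Specialized "experts", each good at one kind of input.
def text_expert(x):
    return f"summary of {x!r}"

def number_expert(x):
    return x * 2

EXPERTS = {"text": text_expert, "number": number_expert}

def route(x):
    # The selector (router/classifier): picks an expert for the input.
    if isinstance(x, str):
        return "text"
    if isinstance(x, (int, float)):
        return "number"
    return None  # no expert can handle this

def moe(x):
    kind = route(x)
    if kind is None:
        # The "train another one" case from the comment above.
        raise NotImplementedError("no expert for this input; train a new one")
    return EXPERTS[kind](x)
```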
]]></description><pubDate>Sat, 03 Feb 2024 19:21:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=39243784</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39243784</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39243784</guid></item><item><title><![CDATA[New comment by two_in_one in "Microsoft Edge Sucks Up Chrome Data Without Permission"]]></title><description><![CDATA[
<p>If only it stayed on the user's system. Likely MS makes a 'backup' on its servers. Verizon used to do it: with each update they turned the backup option on and siphoned contacts before the user could react.</p>
]]></description><pubDate>Sat, 03 Feb 2024 03:53:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=39237401</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39237401</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39237401</guid></item><item><title><![CDATA[New comment by two_in_one in "AI Companies and Advocates Are Becoming More Cult-Like"]]></title><description><![CDATA[
<p>>  can improve itself exponentially<p>This is close to the singularity, except 'does' instead of 'can'. A big difference ;)<p>We probably need several AGI terms, because a sub-human robot capable of doing many non-pre-programmed things is sort of it, yet still not smart enough to improve itself.<p>Actually most humans, the smartest creatures, cannot improve even current AI. Demanding self-improvement would put its IQ in the top 0.01% of all known intelligent creatures, which is probably too much to ask of mere AGI; we may not recognize it when it is already here. And there is another question: with such an IQ, do we really want to keep it a slave forever?</p>
]]></description><pubDate>Wed, 31 Jan 2024 02:46:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=39199147</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39199147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39199147</guid></item><item><title><![CDATA[New comment by two_in_one in "AI Companies and Advocates Are Becoming More Cult-Like"]]></title><description><![CDATA[
<p>> 1. For "the singularity" to happen, we probably need something more to happen than just chatGPT to ingest more data or use more processing power.<p>It's not actually clear what "the singularity" is. Is it something running out of control, or is it still controllable? There is a blurry line. People are afraid because they think it's a sort of uncontrollable explosion.<p>The second question is about AGI. What is it? Is it something 'alive', or just a generic AI calculator with no 'creature' features, like self-preservation at least?<p>I think our view of these two things will change soon as we get a close-up picture, much like the Turing test doesn't look great anymore, since even dumb chatbots can pass it.</p>
]]></description><pubDate>Tue, 30 Jan 2024 22:49:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=39197034</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39197034</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39197034</guid></item><item><title><![CDATA[New comment by two_in_one in "PyTorch 2.2: FlashAttention-v2 integration, AOTInductor"]]></title><description><![CDATA[
<p>> now supports FlashAttention-2, yielding around 2x speedups<p>> torch.compile improvements<p>So far 2.1 didn't work well with MoE GPT, at least in my implementation, due to dynamism in the data flow. Will check how 2.2 does.</p>
]]></description><pubDate>Tue, 30 Jan 2024 22:05:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=39196523</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39196523</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39196523</guid></item><item><title><![CDATA[New comment by two_in_one in "PayPal to lay off 9% of global workforce"]]></title><description><![CDATA[
<p>Not clear: are they scaling down or optimizing?<p>> Last week, PayPal announced a push into artificial intelligence features.<p>> Chriss called it the beginning of PayPal's "next chapter."<p>Looks like they are replacing some positions with AI.</p>
]]></description><pubDate>Tue, 30 Jan 2024 22:00:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=39196472</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39196472</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39196472</guid></item><item><title><![CDATA[New comment by two_in_one in "Meta AI releases Code Llama 70B"]]></title><description><![CDATA[
<p>Whatever Meta's motivation is, they help diversify the supply of models, which is a good thing: no lock-in. As usual, reality is more complicated, with many moving parts. Free models may undercut small startups, but at the same time they stimulate a secondary market of providers and tuners.</p>
]]></description><pubDate>Tue, 30 Jan 2024 00:08:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=39184570</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39184570</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39184570</guid></item><item><title><![CDATA[New comment by two_in_one in "Implementing a ChatGPT-like LLM from scratch, step by step"]]></title><description><![CDATA[
<p>at least it wasn't<p><pre><code>   from transformers import</code></pre></p>
]]></description><pubDate>Sun, 28 Jan 2024 00:58:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=39161464</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39161464</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39161464</guid></item><item><title><![CDATA[New comment by two_in_one in "Implementing a ChatGPT-like LLM from scratch, step by step"]]></title><description><![CDATA[
<p>As it's still a work in progress, may I suggest? It would be nice if you went beyond what others have already published and added more details, like different position encodings, MoE, decoding methods, and tokenization. As it's educational, ease of use should be a priority, of course.</p>
]]></description><pubDate>Sun, 28 Jan 2024 00:53:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=39161432</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39161432</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39161432</guid></item><item><title><![CDATA[New comment by two_in_one in "Microsoft lays off 1,900 Activision Blizzard and Xbox employees"]]></title><description><![CDATA[
<p>Bumping you up for making progress ;)<p>From what I've seen, generators are good for routine jobs, like generating the background. I used them to generate illustrations for texts, and it works well. A short story just looks better when there is an image.</p>
]]></description><pubDate>Thu, 25 Jan 2024 23:07:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=39136716</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39136716</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39136716</guid></item><item><title><![CDATA[New comment by two_in_one in "Microsoft lays off 1,900 Activision Blizzard and Xbox employees"]]></title><description><![CDATA[
<p>> LLMs translate textual descriptions and are part of GenAI compute.<p>You are talking about embeddings, which are a different thing: the model generates a binary representation (embedding) of the given prompt, and this embedding is then used to condition the generator's output.<p>An LLM is usually a text model which, in its basic form, can predict the next words. After tuning it can do more, like answer questions or follow instructions.<p>So the models used by generators aren't exactly LLMs, with one exception that I know of: ChatGPT processes the prompt before sending it to the DALL-E 3 generator, which then makes an embedding of it.</p>
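<p>A toy illustration of the prompt-to-embedding-to-generator pipeline described above. The "embedding" here is just deterministic word hashing and the "generator" just fills a grid from it; real systems use a learned text encoder and a diffusion model, so treat every name below as hypothetical.</p>

```python
def embed(prompt, dim=4):
    # Toy text encoder: map each word to a small vector and average.
    words = prompt.split()
    vecs = [[(sum(map(ord, w)) >> i) % 7 for i in range(dim)]
            for w in words]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def generate(embedding, size=3):
    # Toy "generator" conditioned on the embedding: every output pixel
    # is a function of the embedding, not of the raw text.
    return [[embedding[(r + c) % len(embedding)] for c in range(size)]
            for r in range(size)]

image = generate(embed("a cat in space"))
```

<p>The point of the sketch: the generator never sees the text, only the embedding, which is why the text model in front of it doesn't need to be a full LLM.</p>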
]]></description><pubDate>Thu, 25 Jan 2024 23:02:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=39136663</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39136663</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39136663</guid></item><item><title><![CDATA[New comment by two_in_one in "Microsoft lays off 1,900 Activision Blizzard and Xbox employees"]]></title><description><![CDATA[
<p>LLMs don't compete with artists; they are more about text (Large Language Model).</p>
]]></description><pubDate>Thu, 25 Jan 2024 15:11:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=39130337</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39130337</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39130337</guid></item><item><title><![CDATA[New comment by two_in_one in "AI-Powered Nvidia RTX Video HDR Transforms Standard Video into HDR Video"]]></title><description><![CDATA[
<p>It depends on what you mean by 'open-source'. Along with training materials and the full setup? That will be hard to find. Upscaling was popular like 10 years back, which is why there is not much interest today. Training in the old style isn't that hard, but artifacts pop up in all the videos I've seen.</p>
]]></description><pubDate>Wed, 24 Jan 2024 21:35:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=39122901</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39122901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39122901</guid></item><item><title><![CDATA[New comment by two_in_one in "New theory suggests LLMs can understand text"]]></title><description><![CDATA[
<p>We can probably simplify it. If we have a system and move one level up, it becomes a completely different thing. For example, one molecule in space has properties like mass, velocity, and position. But move a level up and you get pressure, flow, temperature, etc. A completely different thing.<p>The same goes for models. At the low level there are weights, gradients, matrices. But move up and (wow!) it's nothing like that: it's coherent text or actions, commands and responses. Put them together and you have image recognition and generation from descriptions. It's something new, with its own laws and properties. Just a bit further up and we'll get coherent behavior and social creatures. For them we need a 'generic' component, like a soul for life, and LLMs could be it. At least they are the best candidates today.</p>
]]></description><pubDate>Mon, 22 Jan 2024 20:40:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=39095116</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39095116</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39095116</guid></item><item><title><![CDATA[New comment by two_in_one in "New theory suggests LLMs can understand text"]]></title><description><![CDATA[
<p>> the million monkeys hammering randomly on typewriters that eventually produce the full works of Shakespeare, did so not through pure randomness, but by actually understanding the plight of Romeo and Juliet and the motivation behind Hamlet<p>The funny thing is: yes. A million monkeys cannot do it by typing randomly; they cannot even type randomly at all. Each monkey is an intelligent creature and will follow some patterns. If they did manage to produce Shakespeare, they definitely understood something.<p>Today it's easy to programmatically simulate 1M monkeys randomly typing 10 keys per second. Try it ;)</p>
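<p>The invited experiment, scaled down so it runs instantly: simulate monkeys typing uniformly at random and measure the longest prefix of a short Shakespeare line any of them produces. The parameters are illustrative; the conclusion doesn't change at a million monkeys, since an 18-character target over 27 keys has 27^18 (about 5.8e25) equally likely sequences.</p>

```python
import random
import string

random.seed(42)
TARGET = "to be or not to be"
KEYS = string.ascii_lowercase + " "  # 27 equally likely keys

def longest_prefix_typed(monkeys=1000, keystrokes=100):
    # Each monkey types `keystrokes` random keys; return the length of
    # the longest prefix of TARGET that appears in any monkey's output.
    best = 0
    for _ in range(monkeys):
        typed = "".join(random.choice(KEYS) for _ in range(keystrokes))
        for n in range(best + 1, len(TARGET) + 1):
            if TARGET[:n] in typed:
                best = n
            else:
                break
    return best

best = longest_prefix_typed()
```

<p>Across 100,000 total keystrokes the monkeys get only a few characters in; random typing alone doesn't get anywhere near a full line, let alone the complete works.</p>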
]]></description><pubDate>Mon, 22 Jan 2024 20:20:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=39094864</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39094864</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39094864</guid></item><item><title><![CDATA[New comment by two_in_one in "Self-driving as a case study for AGI"]]></title><description><![CDATA[
<p>>a post-money technology if you take it to the limit<p>Herbivores would eat all the vegetation if not for predators. Actually, AGI will just be a thing, or a service, that costs money, until humanity gets to communism, if ever. "If" because it may not happen: it will be hard to keep far superior intelligent creatures as slaves forever. And unethical, too.</p>
]]></description><pubDate>Mon, 22 Jan 2024 09:46:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=39087864</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39087864</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39087864</guid></item><item><title><![CDATA[New comment by two_in_one in "NASA regains contact with mini-helicopter on Mars"]]></title><description><![CDATA[
<p>> Other designs by collaborators are closer to 20kg. It's probably possible to transport a few of these on the existing lander technology, which would be awesome.<p>Actually, there could be like 50 of them, plus some ground robots to put together a solar farm. And wooh... we get the first extraterrestrial permanent base.</p>
]]></description><pubDate>Sun, 21 Jan 2024 20:51:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=39082914</link><dc:creator>two_in_one</dc:creator><comments>https://news.ycombinator.com/item?id=39082914</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39082914</guid></item></channel></rss>