<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: JackRumford</title><link>https://news.ycombinator.com/user?id=JackRumford</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 20 Apr 2026 17:43:12 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=JackRumford" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by JackRumford in "The pro-Israel information war"]]></title><description><![CDATA[
<p>Not all other nations, just Palestine, because Israel is in a position of power.</p>
]]></description><pubDate>Fri, 08 Dec 2023 21:04:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=38574363</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=38574363</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38574363</guid></item><item><title><![CDATA[New comment by JackRumford in "The wealth of the 25 richest families in the world soared 43% in the last year"]]></title><description><![CDATA[
<p>Yes, but the size of the portfolio is the issue here.</p>
]]></description><pubDate>Fri, 08 Dec 2023 20:43:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=38574072</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=38574072</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38574072</guid></item><item><title><![CDATA[New comment by JackRumford in "Ask HN: How do you keep going without burning out?"]]></title><description><![CDATA[
<p>I solve exponentially harder problems.<p>When I’m at the peak, I will start pursuing another field that is foreign to me and, again, grind to the top.<p>A cycle like this takes me about two years, depending on the subtopic.<p>I have no interest in being one of the top 100 protein language model scientists in the world and staying there forever. I want to be a polymath.</p>
]]></description><pubDate>Sun, 19 Nov 2023 17:13:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=38334788</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=38334788</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38334788</guid></item><item><title><![CDATA[New comment by JackRumford in "TikTok says it's not the algorithm, teens are just pro-Palestine"]]></title><description><![CDATA[
<p>Concerning /s</p>
]]></description><pubDate>Sun, 19 Nov 2023 17:10:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=38334744</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=38334744</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38334744</guid></item><item><title><![CDATA[New comment by JackRumford in "I’m not a programmer, and I used AI to build my first bot"]]></title><description><![CDATA[
<p>Almost everyone I know in the dev community agrees that both generative art and generative code are fine. Most people don't care about copyright, to say the least — at least the people I know in real life and chat with online.<p>The only thing I've seen was artists on Reddit and Twitter getting angry about AI art.</p>
]]></description><pubDate>Fri, 06 Oct 2023 18:06:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=37794031</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37794031</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37794031</guid></item><item><title><![CDATA[DeepMind shows Promptbreeder, an AI that self-improves via recursive prompts]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2309.16797">https://arxiv.org/abs/2309.16797</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37779064">https://news.ycombinator.com/item?id=37779064</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 05 Oct 2023 14:29:20 +0000</pubDate><link>https://arxiv.org/abs/2309.16797</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37779064</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37779064</guid></item><item><title><![CDATA[New comment by JackRumford in "Understanding LLMs Through the Problem They Are Trained to Solve"]]></title><description><![CDATA[
<p>Abstract: The widespread adoption of large language models (LLMs) makes it important to recognize their strengths and limitations. We argue that in order to develop a holistic understanding of these systems we need to consider the problem that they were trained to solve: next-word prediction over Internet text. By recognizing the pressures that this task exerts we can make predictions about the strategies that LLMs will adopt, allowing us to reason about when they will succeed or fail. This approach - which we call the teleological approach - leads us to identify three factors that we hypothesize will influence LLM accuracy: the probability of the task to be performed, the probability of the target output, and the probability of the provided input. We predict that LLMs will achieve higher accuracy when these probabilities are high than when they are low - even in deterministic settings where probability should not matter. To test our predictions, we evaluate two LLMs (GPT-3.5 and GPT-4) on eleven tasks, and we find robust evidence that LLMs are influenced by probability in the ways that we have hypothesized. In many cases, the experiments reveal surprising failure modes. For instance, GPT-4's accuracy at decoding a simple cipher is 51% when the output is a high-probability word sequence but only 13% when it is low-probability. These results show that AI practitioners should be careful about using LLMs in low-probability situations. More broadly, we conclude that we should not evaluate LLMs as if they are humans but should instead treat them as a distinct type of system - one that has been shaped by its own particular set of pressures.</p>
]]></description><pubDate>Tue, 03 Oct 2023 12:07:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=37750766</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37750766</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37750766</guid></item><item><title><![CDATA[Understanding LLMs Through the Problem They Are Trained to Solve]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2309.13638">https://arxiv.org/abs/2309.13638</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37750765">https://news.ycombinator.com/item?id=37750765</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Tue, 03 Oct 2023 12:07:31 +0000</pubDate><link>https://arxiv.org/abs/2309.13638</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37750765</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37750765</guid></item><item><title><![CDATA[Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2309.08546">https://arxiv.org/abs/2309.08546</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37750737">https://news.ycombinator.com/item?id=37750737</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 03 Oct 2023 12:04:14 +0000</pubDate><link>https://arxiv.org/abs/2309.08546</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37750737</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37750737</guid></item><item><title><![CDATA[New comment by JackRumford in "Twitter / X is losing daily active users. CEO Linda Yaccarino confirmed it"]]></title><description><![CDATA[
<p>Criticisms of Elon Musk are not solely based on ideological differences, but also on concerns regarding his behavior, management style, and the working conditions at his companies, which some believe warrant scrutiny irrespective of his personal beliefs. He just doesn't <i>feel</i> like a good person and isn't relatable. This is why I think people hate him.</p>
]]></description><pubDate>Mon, 02 Oct 2023 09:38:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=37735934</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37735934</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37735934</guid></item><item><title><![CDATA[New comment by JackRumford in "Users report iPhone 15 is overheating"]]></title><description><![CDATA[
<p>I suspect they are trying to compete in the console market without having a console per se. AFAIK the iPhone 15 Pro has about a third of the TFLOPS of the PlayStation 5, and they already have a smooth TV-connectivity system.<p>Most people in the US who have consoles already have an iPhone in their pocket. Now you just need a controller, some kind of dock for the phone, and to pay a lot of devs to make games for the platform to get the ball rolling.<p>This aligns with the recent changes to macOS, iOS, and the Metal API: <a href="https://developer.apple.com/metal/" rel="nofollow noreferrer">https://developer.apple.com/metal/</a></p>
]]></description><pubDate>Fri, 29 Sep 2023 16:14:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=37706346</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37706346</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37706346</guid></item><item><title><![CDATA[New comment by JackRumford in "Synthia-7B-v1.3 hits 64.85 on 4-evals; LLaMA-2-70B-Chat at 66.8"]]></title><description><![CDATA[
<p>Yep, this guy keeps an up-to-date collection of quantised models: <a href="https://huggingface.co/TheBloke" rel="nofollow noreferrer">https://huggingface.co/TheBloke</a></p>
]]></description><pubDate>Fri, 29 Sep 2023 14:22:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=37704712</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37704712</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37704712</guid></item><item><title><![CDATA[New comment by JackRumford in "Llama 2 Long"]]></title><description><![CDATA[
<p>"With FLASHATTENTION (Dao et al., 2022), there is negligible GPU memory overhead as we increase the sequence length and we observe around 17% speed loss when increasing the sequence length from 4,096 to 16,384 for the 70B model."<p>"For the 7B/13B models, we use learning rate 2e−5 and a cosine learning rate schedule with 2000 warm-up steps. For the larger 34B/70B models, we find it important to set a smaller learning rate (1e−5) to get monotonically decreasing validation losses."<p>"In the training curriculum ablation study, models trained with a fixed context window of 32k from scratch required 3.783 × 10^22 FLOPs and achieved performance metrics like 18.5 F1 on NarrativeQA, 28.6 F1 on Qasper, and 37.9 EM on Quality."<p>"Continual pretraining from short context models can easily save around 40% FLOPs while imposing almost no loss on performance."<p>"Through early experiments at the 7B scale, we identified a key limitation of LLAMA 2’s positional encoding (PE) that prevents the attention module from aggregating information of distant tokens. We adopt a minimal yet necessary modification on the RoPE positional encoding (Su et al., 2022) for long-context modeling – decreasing the rotation angle."<p>Pretty exciting stuff. Getting close to GPT-4 hopefully soon!</p>
]]></description><pubDate>Fri, 29 Sep 2023 11:13:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=37702012</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37702012</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37702012</guid></item><item><title><![CDATA[New comment by JackRumford in "Synthia-7B-v1.3 hits 64.85 on 4-evals; LLaMA-2-70B-Chat at 66.8"]]></title><description><![CDATA[
<p>HuggingFace model: <a href="https://huggingface.co/migtissera/Synthia-7B-v1.3" rel="nofollow noreferrer">https://huggingface.co/migtissera/Synthia-7B-v1.3</a></p>
]]></description><pubDate>Fri, 29 Sep 2023 11:04:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=37701934</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37701934</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37701934</guid></item><item><title><![CDATA[Synthia-7B-v1.3 hits 64.85 on 4-evals; LLaMA-2-70B-Chat at 66.8]]></title><description><![CDATA[
<p>Article URL: <a href="https://twitter.com/migtissera/status/1707482374748139690">https://twitter.com/migtissera/status/1707482374748139690</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37701921">https://news.ycombinator.com/item?id=37701921</a></p>
<p>Points: 2</p>
<p># Comments: 3</p>
]]></description><pubDate>Fri, 29 Sep 2023 11:02:51 +0000</pubDate><link>https://twitter.com/migtissera/status/1707482374748139690</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37701921</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37701921</guid></item><item><title><![CDATA[New comment by JackRumford in "Fine-tune your own Llama 2 to replace GPT-3.5/4"]]></title><description><![CDATA[
<p>These sites say 154B:<p><a href="https://www.ankursnewsletter.com/p/gpt-4-gpt-3-and-gpt-35-turbo-a-review" rel="nofollow noreferrer">https://www.ankursnewsletter.com/p/gpt-4-gpt-3-and-gpt-35-tu...</a><p><a href="https://blog.wordbot.io/ai-artificial-intelligence/gpt-3-5-turbo-vs-gpt-4-whats-the-difference/" rel="nofollow noreferrer">https://blog.wordbot.io/ai-artificial-intelligence/gpt-3-5-t...</a></p>
]]></description><pubDate>Thu, 14 Sep 2023 10:52:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=37507294</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=37507294</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37507294</guid></item><item><title><![CDATA[New comment by JackRumford in "QubesOS – A reasonably secure operating system"]]></title><description><![CDATA[
<p>Ashamed to say, but I've done all these things on my main OSes for ~15 years and I've never had a problem (IIRC). Yes, I'm not big on security, and I realize what this could do, but I figure the chance is about as big as getting hit through a channel I don't expect.</p>
]]></description><pubDate>Wed, 12 Jul 2023 09:57:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=36692349</link><dc:creator>JackRumford</dc:creator><comments>https://news.ycombinator.com/item?id=36692349</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36692349</guid></item></channel></rss>