Hacker News: idiliv

New comment by idiliv in "Uber's $1,500/month AI limit is a useful signal for AI tool pricing"

idiliv — Wed, 03 Jun 2026 18:54:49 +0000

Uber is likely on an enterprise plan - these charge tokens at API cost, which can be much more expensive than the $20 flat rate.

New comment by idiliv in "GLM-4.7-Flash"

idiliv — Mon, 19 Jan 2026 15:44:55 +0000

Sometimes model developers coordinate with inference platforms to time releases in sync.

New comment by idiliv in "Replace OCR with Vision Language Models"

idiliv — Wed, 26 Feb 2025 22:20:39 +0000

Wait, but we're doing that already, and it works well (Qwen 2.5 VL)? If need be, you can always resort to structured generation to enforce schema conformity?

New comment by idiliv in "AI engineers claim new algorithm reduces AI power consumption by 95%"

idiliv — Sat, 19 Oct 2024 20:11:50 +0000

Duplicate, posted on October 9: https://news.ycombinator.com/item?id=41784591

New comment by idiliv in "Llama 3.2 released: Multimodal, 1B to 90B sizes"

idiliv — Wed, 25 Sep 2024 18:03:28 +0000

Where do you see the MMLU-Pro evaluation for Llama 3.2 90B? On the link I only see Llama 3.2 90B evaluated against multimodal benchmarks.

New comment by idiliv in "Coffee Stats – Maximize Caffeine Intake and Get to Bed at Night"

idiliv — Mon, 23 Sep 2024 15:07:03 +0000

Is the "Ultra Deep" analysis worth it over the standard "Deep" analysis?

New comment by idiliv in "Learning to Reason with LLMs"

idiliv — Thu, 12 Sep 2024 18:20:25 +0000

In the demo, O1 implements an incorrect version of the "squirrel finder" game?

The instructions state that the squirrel icon should spawn after three seconds, yet it spawns immediately in the first game (also noted by the guy doing the demo).

Edit: I'm referring to the demo video here: https://openai.com/index/introducing-openai-o1-preview/

Adversarial Perturbations Cannot Reliably Protect Artists from Generative AI

idiliv — Fri, 21 Jun 2024 10:32:02 +0000

Article URL: https://arxiv.org/abs/2406.12027

Comments URL: https://news.ycombinator.com/item?id=40748080

Points: 2

# Comments: 0

New comment by idiliv in "IKEA's retailer's solved global 'unhappy worker' crisis by raising salaries"

idiliv — Fri, 14 Jun 2024 07:59:02 +0000

How are flexible working hours equivalent to more money?

New comment by idiliv in "AMD's MI300X Outperforms Nvidia's H100 for LLM Inference"

idiliv — Thu, 13 Jun 2024 11:56:57 +0000

You can rent them online for ~$4-5 per hour per GPU. Not cheap, but definitely feasible as a weekend project.

New comment by idiliv in "Mistral AI Launches New 8x22B MOE Model"

idiliv — Wed, 10 Apr 2024 20:09:54 +0000

Just tried this again and I also arrive at 16.92B. Not sure what I did wrong the first time, thanks for double-checking this!

New comment by idiliv in "Mistral AI Launches New 8x22B MOE Model"

idiliv — Wed, 10 Apr 2024 07:55:42 +0000

Oh, and to answer your actual question: Assuming that the model is released with 16 bits per parameter, then it has 281GB / 16 bits = 140.5B parameters.

New comment by idiliv in "Mistral AI Launches New 8x22B MOE Model"

idiliv — Wed, 10 Apr 2024 07:52:42 +0000

In Mixtral 8x7B, the 8 means that the model uses Mixture-of-Experts (MoE) layers with 8 experts. The 7B means that if you were to remove 7 of the 8 experts in each layer, then you would end up with a 7B model (which would have exactly the same architecture as Mistral 7B). Therefore, a 1x7B model has 7B params. An 8x7B model has 1 * 7B + (8-1) * sz_expert params, where sz_expert is some constant value that the MoE layers increase by when adding one expert. In the case of Mixtral 8x7B the model size is 46.3GB, so, sz_expert ≈ 5.6B.

If these assumptions port over to 8x22B, then 8x22B has, at 281GB, sz_expert ≈ 13.8B.
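A quick sketch to check this arithmetic. It assumes 16-bit weights (2 bytes per parameter), decimal gigabytes, and reads the 46.3 figure for Mixtral 8x7B as billions of parameters, since that is what reproduces sz_expert ≈ 5.6B. The function names are made up for illustration. Under these assumptions the 8x22B estimate comes out near 16.9B rather than 13.8B, which matches the corrected figure posted later in the thread.

```python
def params_from_checkpoint_b(size_gb: float, bits_per_param: int = 16) -> float:
    """Parameter count in billions for a checkpoint of size_gb decimal gigabytes.

    Assumes every parameter is stored at bits_per_param bits (an assumption,
    not something the release confirms).
    """
    return size_gb * 8 / bits_per_param


def expert_size_b(total_params_b: float, base_params_b: float, n_experts: int = 8) -> float:
    """Solve total = base + (n_experts - 1) * sz_expert for sz_expert (billions)."""
    return (total_params_b - base_params_b) / (n_experts - 1)


# Mixtral 8x7B: 46.3 taken as total parameters in billions, 7B base model.
sz_8x7b = expert_size_b(46.3, 7.0)

# 8x22B: 281GB checkpoint at 16 bits per parameter, 22B base model.
total_8x22b = params_from_checkpoint_b(281)
sz_8x22b = expert_size_b(total_8x22b, 22.0)

print(f"8x7B  sz_expert: {sz_8x7b:.2f}B")
print(f"8x22B total:     {total_8x22b:.1f}B")
print(f"8x22B sz_expert: {sz_8x22b:.2f}B")
```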

New comment by idiliv in "Martin Kleppmann talk on local-first (LoFi)"

idiliv — Tue, 20 Feb 2024 20:08:34 +0000

Hi Martin! It's Robert from Cambridge (you were my DOS :)). Glad to see your name pop up on HN!

New comment by idiliv in "Sora: Creating video from text"

idiliv — Thu, 15 Feb 2024 19:08:34 +0000

People here seem mostly impressed by the high resolution of these examples.

Based on my experience doing research on Stable Diffusion, scaling up the resolution is the conceptually easy part that only requires larger models and more high-resolution training data.

The hard part is semantic alignment with the prompt. Attempts to scale Stable Diffusion, like SDXL, have resulted only in marginally better prompt understanding (likely due to the continued reliance on CLIP prompt embeddings).

So, the key question here is how well Sora does prompt alignment.

New comment by idiliv in "Huge proportion of internet is AI-generated slime, researchers find"

idiliv — Sat, 20 Jan 2024 17:50:43 +0000

Hmm, are you sure that translations by LLMs like ChatGPT are not incorporating cultural context?

New comment by idiliv in "Benchmarks and comparison of LLM AI models and API hosting providers"

idiliv — Tue, 16 Jan 2024 19:17:32 +0000

I'm curious how they evaluated model quality. The only information I could find is "Quality: Index based on several quality benchmarks".

New comment by idiliv in "OpenAI Engineers Earning $800k a Year Turn Rare Skillset into Leverage"

idiliv — Mon, 25 Dec 2023 10:38:17 +0000

They could join Mistral AI, which has published weights for at least some of its models. Another option is Meta AI, which has published weights for Llama and Llama 2.

Hugging Face releases Optimum-Nvidia to accelerate LLM inference

idiliv — Thu, 07 Dec 2023 09:38:48 +0000

Article URL: https://huggingface.co/blog/optimum-nvidia

Comments URL: https://news.ycombinator.com/item?id=38554585

Points: 2

# Comments: 0

New comment by idiliv in "Sorry, but a new prompt for GPT-4 is not a paper"

idiliv — Tue, 05 Dec 2023 14:58:59 +0000

Parent post is talking about LLMs, i.e. Large LMs. Research on LLMs is indeed in its infancy.