<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ollin</title><link>https://news.ycombinator.com/user?id=ollin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 05:25:00 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ollin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ollin in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>- The OpenBSD one is 'TCP packets with invalid SACK options could crash the kernel' <a href="https://cdn.openbsd.org/pub/OpenBSD/patches/7.8/common/025_sack.patch.sig" rel="nofollow">https://cdn.openbsd.org/pub/OpenBSD/patches/7.8/common/025_s...</a><p>- One (patched) Linux kernel bug is 'UaF
when sys_futex_requeue() is used with different flags' <a href="https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd945366151ba54fcb9580508" rel="nofollow">https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd94...</a><p>These links are from the more detailed 'Assessing Claude Mythos Preview’s cybersecurity capabilities' post released today <a href="https://red.anthropic.com/2026/mythos-preview/" rel="nofollow">https://red.anthropic.com/2026/mythos-preview/</a>, which covers some of the public/fixed issues (like the OpenBSD one) in more depth and includes hashes for several unreleased reports and PoCs.</p>
]]></description><pubDate>Tue, 07 Apr 2026 20:25:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680899</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47680899</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680899</guid></item><item><title><![CDATA[New comment by ollin in "System Card: Claude Mythos Preview [pdf]"]]></title><description><![CDATA[
<p>My impression was entirely the opposite; the unsolved subset of SWE-bench Verified problems <i>are</i> memorizable (solutions are pulled from public GitHub repos), and the evaluators are often so brittle or disconnected from the problem statement that the <i>only</i> way to pass is to regurgitate a memorized solution.<p>OpenAI had a whole post about this, where they recommended switching to SWE-bench Pro as a better (but still imperfect) benchmark:<p><a href="https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/" rel="nofollow">https://openai.com/index/why-we-no-longer-evaluate-swe-bench...</a><p>> We audited a 27.6% subset of the dataset that models often failed to solve and found that at least 59.4% of the audited problems have flawed test cases that reject functionally correct submissions<p>> SWE-bench problems are sourced from open-source repositories many model providers use for training purposes. In our analysis we found that all frontier models we tested were able to reproduce the original, human-written bug fix<p>> improvements on SWE-bench Verified no longer reflect meaningful improvements in models’ real-world software development abilities. Instead, they increasingly reflect how much the model was exposed to the benchmark at training time<p>> We’re building new, uncontaminated evaluations to better track coding capabilities, and we think this is an important area to focus on for the wider research community. Until we have those, OpenAI recommends reporting results for SWE-bench Pro.</p>
]]></description><pubDate>Tue, 07 Apr 2026 19:26:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680158</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47680158</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680158</guid></item><item><title><![CDATA[New comment by ollin in "Tell HN: Apple development certificate server seems down?"]]></title><description><![CDATA[
<p>Here's the developer thread <a href="https://developer.apple.com/forums/thread/818403" rel="nofollow">https://developer.apple.com/forums/thread/818403</a> I found, with lots of other reports of "Unable to Verify App - An internet connection is required to verify the trust of the developer".<p>Although <a href="https://developer.apple.com/system-status/" rel="nofollow">https://developer.apple.com/system-status/</a> was green for most of the 3-4 hour outage, the page now at least acknowledges two minutes of downtime:<p><pre><code>    App Store Connect - Resolved Outage
    Today, 12:04 AM - 12:06 AM
    All users were affected
    Users experienced a problem with this service.
</code></pre>
Not a great developer experience.</p>
]]></description><pubDate>Wed, 11 Mar 2026 00:28:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47330496</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47330496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47330496</guid></item><item><title><![CDATA[New comment by ollin in "Mac mini will be made at a new facility in Houston"]]></title><description><![CDATA[
<p>The still photo (with 富士康科技, i.e. Foxconn Technology, photoshopped out) is the second image of the "In Houston, workers assemble advanced AI servers" photo carousel <a href="https://www.apple.com/newsroom/images/2026/02/apple-accelerates-us-manufacturing-with-mac-mini-production/article/Apple-US-manufacturing-investment-Houston-data-center-assembly-line-02_big.jpg.large_2x.jpg" rel="nofollow">https://www.apple.com/newsroom/images/2026/02/apple-accelera...</a></p>
]]></description><pubDate>Wed, 25 Feb 2026 03:08:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47146795</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47146795</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47146795</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>A lot of people mentioned this! The "dreamlike" comparison is common as well. In both cases, you have a network of neurons rendering an image approximating the real world :) so it sort of makes sense.<p>Regarding the specific boiling-textures effect: there's a tradeoff in recurrent world models between jittering (constantly regenerating fine details to avoid accumulating error) and drifting (propagating fine details as-is, even when that leads to accumulating error and a simplified/oversaturated/implausible result). The forest trail world is tuned way towards jittering (you can pause with `p` and step frame-by-frame with `.` to see this).  So if the effect resembles LSD, it's possible that LSD applies some similar random jitter/perturbation to the neurons within your visual cortex.</p>
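<p>To make the jitter/drift tradeoff concrete, here's a minimal sketch (Python/NumPy; denoise() is a hypothetical stand-in for the learned network, not the actual forest-trail model) of how the amount of noise re-injected into the carried-over frame each step moves a rollout along that spectrum:</p><pre><code>    # renoise near 1.0: fine details get regenerated every step ("jitter",
    # boiling textures, but errors are washed out). renoise near 0.0:
    # details are carried over as-is ("drift", stable textures, but errors
    # accumulate and the world slowly simplifies).
    import numpy as np

    def denoise(noisy_frame, past_frames, controls):
        # placeholder "model": blend the noisy frame with the previous frame
        return 0.5 * noisy_frame + 0.5 * past_frames[-1]

    def rollout(first_frame, controls_per_step, renoise=0.8, rng=None):
        rng = rng or np.random.default_rng(0)
        frames = [first_frame]
        for controls in controls_per_step:
            prev = frames[-1]
            noisy = (1.0 - renoise) * prev + renoise * rng.standard_normal(prev.shape)
            frames.append(denoise(noisy, frames, controls))
        return frames

    frames = rollout(np.zeros((64, 64, 3)), [None] * 60, renoise=0.9)
</code></pre>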
]]></description><pubDate>Fri, 30 Jan 2026 16:03:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=46826014</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46826014</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46826014</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>On a technical level, this looks like the same diffusion transformer world model design that was shown in the Genie 3 post (text/memory/d-pad input, video output, 60sec max context, 720p, sub-10FPS control latency due to 4-frame temporal compression). I expect the public release uses a cheaper step-distilled / quantized version. The limitations seen in Genie 3 (high control latency, gradual loss of detail and drift towards videogamey behavior, 60s max rollout length) are still present. The editing/sharing tools, latency, cost, etc. can probably improve over time with this same model checkpoint, but new features like audio input/output, higher resolution, precise controls, etc. likely won't happen until the next major version.<p>From a product perspective, I still don't have a good sense of what the market for WMs will look like. There's a tension between serious commercial applications (robotics, VFX, gamedev, etc. where you want way, way higher fidelity and very precise controllability), vs current short-form-demos-for-consumer-entertainment application (where you want the inference to be cheap-enough-to-be-ad-supported and simple/intuitive to use). Framing Genie as a "prototype" inside their most expensive AI plan makes a lot of sense while GDM figures out how to target the product commercially.<p>On a personal level, since I'm also working on world models (albeit very small local ones <a href="https://news.ycombinator.com/item?id=43798757">https://news.ycombinator.com/item?id=43798757</a>), my main thought is "oh boy, lots of work to do". If everyone starts expecting Genie 3 quality, local WMs need to become a lot better :)</p>
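<p>The back-of-the-envelope version of the latency claim above (assuming 24fps output and the guessed 4x temporal compression; neither is a published spec):</p><pre><code>    # A new control input can only influence the next latent chunk of frames,
    # so the control rate is capped at output_fps / frames_per_chunk.
    output_fps = 24          # assumed output frame rate
    frames_per_chunk = 4     # assumed temporal compression in the video VAE
    control_rate_hz = output_fps / frames_per_chunk   # 6 control updates per second
    min_latency_s = frames_per_chunk / output_fps     # ~0.17s before any model/network time
    print(control_rate_hz, min_latency_s)
</code></pre>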
]]></description><pubDate>Thu, 29 Jan 2026 23:11:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46818196</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46818196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46818196</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>Yup, similar concepts! Just at two opposite extremes of the compute/scaling spectrum.<p>- That forest trail world is ~5 million parameters, trained on 15 minutes of video, scoped to run on a five-year-old iPhone through a twenty-year-old API (WebGL GPGPU, i.e. OpenGL fragment shaders). It's the smallest '3D' world model I'm aware of.<p>- Genie 3 is (most likely) ~100 billion parameters trained on millions of hours of video and running across multiple TPUs. I would be shocked if it's not the largest-scale world model available to the public.<p>There are lots of neat intermediate-scale world models being developed as well (e.g. LingBot-World <a href="https://github.com/robbyant/lingbot-world" rel="nofollow">https://github.com/robbyant/lingbot-world</a>, Waypoint 1 <a href="https://huggingface.co/blog/waypoint-1" rel="nofollow">https://huggingface.co/blog/waypoint-1</a>), so I expect we'll be able to play something of Genie quality locally on gaming GPUs within a year or two.</p>
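<p>A quick sense of why those two extremes need such different hardware (assuming roughly 2 bytes per parameter, i.e. fp16/bf16 weights, and taking the ~100B figure as the guess it is):</p><pre><code>    bytes_per_param = 2                              # assuming fp16/bf16 weights
    forest_trail_mb = 5e6 * bytes_per_param / 1e6    # ~10 MB, fits easily on a phone GPU
    genie3_gb = 100e9 * bytes_per_param / 1e9        # ~200 GB of weights alone, hence multiple TPUs
    print(forest_trail_mb, genie3_gb)
</code></pre>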
]]></description><pubDate>Thu, 29 Jan 2026 20:02:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46815779</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46815779</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46815779</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>Really great to see this released! Some interesting videos from early-access users:<p>- <a href="https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197" rel="nofollow">https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197</a> walking through various cities<p>- <a href="https://x.com/fofrAI/status/2016936855607136506" rel="nofollow">https://x.com/fofrAI/status/2016936855607136506</a> helicopter / flight sim<p>- <a href="https://x.com/venturetwins/status/2016919922727850333" rel="nofollow">https://x.com/venturetwins/status/2016919922727850333</a> space station, <a href="https://x.com/venturetwins/status/2016920340602278368" rel="nofollow">https://x.com/venturetwins/status/2016920340602278368</a> Dunkin' Donuts<p>- <a href="https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207" rel="nofollow">https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207</a> simulating a laptop computer, moving the mouse<p>- <a href="https://x.com/emollick/status/2016919989865840906" rel="nofollow">https://x.com/emollick/status/2016919989865840906</a> otter airline pilot with a duck on its head walking through a Rothko inspired airport</p>
]]></description><pubDate>Thu, 29 Jan 2026 18:18:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46814137</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46814137</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46814137</guid></item><item><title><![CDATA[New comment by ollin in "FLUX.2 [Klein]: Towards Interactive Visual Intelligence"]]></title><description><![CDATA[
<p>Z-Image is another open-weight image-generation model by Alibaba [1]. Z-Image Turbo was released around the same time as (non-Klein) FLUX.2 and received a generally warmer community response [2], since Z-Image Turbo was faster, also high-quality, and reportedly better at generating NSFW material. The base (non-Turbo) version of Z-Image is not yet released.<p>[1] <a href="https://tongyi-mai.github.io/Z-Image-blog/" rel="nofollow">https://tongyi-mai.github.io/Z-Image-blog/</a><p>[2] <a href="https://www.reddit.com/r/StableDiffusion/comments/1p9uu69/no_hard_feelings/" rel="nofollow">https://www.reddit.com/r/StableDiffusion/comments/1p9uu69/no...</a></p>
]]></description><pubDate>Sat, 17 Jan 2026 02:53:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=46654814</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46654814</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46654814</guid></item><item><title><![CDATA[New comment by ollin in "Ask HN: Share your personal website"]]></title><description><![CDATA[
<p><a href="https://madebyoll.in" rel="nofollow">https://madebyoll.in</a><p>I write about on-device generative models (particularly world models). Past posts have been reasonably well-received on HN (<a href="https://news.ycombinator.com/from?site=madebyoll.in">https://news.ycombinator.com/from?site=madebyoll.in</a>).</p>
]]></description><pubDate>Wed, 14 Jan 2026 18:04:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46619686</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46619686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46619686</guid></item><item><title><![CDATA[New comment by ollin in "Severe performance penalty found in VSCode rendering loop"]]></title><description><![CDATA[
<p>Yeah the issue reads as if someone asked Claude Code "find the most serious performance issue in the VSCode rendering loop" and then copied the response directly into GitHub (without profiling or testing anything).</p>
]]></description><pubDate>Mon, 27 Oct 2025 04:56:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=45717536</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=45717536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45717536</guid></item><item><title><![CDATA[New comment by ollin in "Mirage 2 – Generative World Engine"]]></title><description><![CDATA[
<p>Notes on my experience:<p>- <i>Infra/systems</i>: I was able to connect to a server within a minute or two. Once connected, the displayed RTT (roundtrip time?) was around 70ms, but actual control-to-action latency was still ~600-700ms vs the ~30ms I'd expect from an on-device model or game streaming service (a rough way to measure this is sketched at the end of this comment).<p>- <i>Image-conditioning & rendering:</i> The system did a reasonable job animating the initial (landscape photo) image I provided and extending it past the edges. However, the video rendering style drifted back to "contrast-boosted video game" within ~10s. This style drift shows up in their official examples as well (<a href="https://x.com/DynamicsLab_AI/status/1958592749378445319" rel="nofollow">https://x.com/DynamicsLab_AI/status/1958592749378445319</a>).<p>- <i>Controls</i>: Apart from the latency, control-following was relatively faithful once I started holding down Shift. I didn't notice any camera/character drift or spurious control issues, so I guess they are probably using fairly high-quality control labels.<p>- <i>Memory</i>: I did a bit of memory testing (basically, swinging the view side to side and seeing which details got regenerated) and it looks like the model can retain maybe ~3-5s of visual memory + the prompt (but not the initial image).</p>
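<p>For anyone who wants to reproduce the latency number, here's a rough sketch of the measurement (Python; capture_frame() and send_turn_input() are hypothetical hooks you'd wire up to your own screen-capture and input-injection tooling, not anything Mirage provides):</p><pre><code>    # Time the gap between sending a control and the first visibly changed frame.
    import time
    import numpy as np

    def first_reaction_ms(capture_frame, send_turn_input, threshold=5.0, max_s=3.0):
        baseline = capture_frame().astype(np.float32)
        t0 = time.monotonic()
        send_turn_input()
        while max_s > time.monotonic() - t0:
            frame = capture_frame().astype(np.float32)
            if np.abs(frame - baseline).mean() > threshold:
                return 1000.0 * (time.monotonic() - t0)   # first visible reaction
        return None   # no visible reaction within max_s
</code></pre>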
]]></description><pubDate>Fri, 22 Aug 2025 03:26:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=44980742</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44980742</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44980742</guid></item><item><title><![CDATA[New comment by ollin in "Genie 3: A new frontier for world models"]]></title><description><![CDATA[
<p>Regarding latency, I found a live video of gameplay here [1] and it looks closer to 1.1s keypress-to-photon latency (33 frames @ 30fps) based on when the onscreen keys start lighting up vs when the camera starts moving. This writeup [2] from someone who tried the Genie 3 research preview mentions that "while there is some control lag, I was told that this is due to the infrastructure used to serve the model rather than the model itself", so a lot of this latency may be added by their client/server streaming setup.<p>[1] <a href="https://x.com/holynski_/status/1952756737800651144" rel="nofollow">https://x.com/holynski_/status/1952756737800651144</a><p>[2] <a href="https://togelius.blogspot.com/2025/08/genie-3-and-future-of-neural-game.html" rel="nofollow">https://togelius.blogspot.com/2025/08/genie-3-and-future-of-...</a></p>
]]></description><pubDate>Tue, 05 Aug 2025 16:14:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=44799979</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44799979</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44799979</guid></item><item><title><![CDATA[New comment by ollin in "Genie 3: A new frontier for world models"]]></title><description><![CDATA[
<p>This is very encouraging progress, and probably what Demis was teasing [1] last month. A few speculations on technical details based on staring at the released clips:<p>1. You can see fine textures "jump" every 4 frames, which means they're most likely using a 4x-temporal-downscaling VAE with at least 4-frame interaction latency (unless the VAE is also control-conditional). Unfortunately I didn't see any real-time footage to confirm the latency (at one point they intercut screen recordings with "fingers on keyboard" b-roll? hmm).<p>2. There's some 16x16 spatial blocking during fast motion, which could mean 16x16 spatial downscaling in the VAE. Combined with 1, this would mean 24x1280x720/(4x16x16) = 21,600 tokens per second, or around 1.3 million tokens per minute (the arithmetic is spelled out at the end of this comment).<p>3. The first frame of each clip looks a bit sharper and less videogamey than later stationary frames, which suggests this could be a combined text-to-image + image-to-world system (where the t2i system is trained on general data but the i2w system is finetuned on game data with labeled controls). Noticeable in e.g. the dirt/textures in [2]. I still noticed some trend towards more contrast/saturation over time, but it's not as bad as in other autoregressive video models I've seen.<p>[1] <a href="https://x.com/demishassabis/status/1940248521111961988" rel="nofollow">https://x.com/demishassabis/status/1940248521111961988</a><p>[2] <a href="https://deepmind.google/api/blob/website/media/genie_environmental_consistency_6_96KPmd3.mp4" rel="nofollow">https://deepmind.google/api/blob/website/media/genie_environ...</a></p>
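<p>Spelling out the token-rate arithmetic from point 2 (both downscaling factors are guesses from the visible artifacts, not confirmed specs):</p><pre><code>    fps, width, height = 24, 1280, 720
    temporal_ds, spatial_ds = 4, 16   # guessed VAE downscaling factors
    tokens_per_second = fps * width * height / (temporal_ds * spatial_ds * spatial_ds)
    tokens_per_minute = 60 * tokens_per_second
    print(tokens_per_second, tokens_per_minute)   # 21600.0, 1296000.0 (~1.3M/min)
</code></pre>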
]]></description><pubDate>Tue, 05 Aug 2025 15:10:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44799022</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44799022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44799022</guid></item><item><title><![CDATA[New comment by ollin in "AI video you can watch and interact with, in real-time"]]></title><description><![CDATA[
<p>I think the most likely explanation is that they trained a diffusion WM (like DIAMOND) on video rollouts recorded from within a 3D scene representation (like NeRF/GS), with some collision detection enabled.<p>This would explain:<p>1. How collisions / teleportation work and why they're so rigid (the WM is mimicking hand-implemented scene-bounds logic)<p>2. Why the scenes are static and, in the case of should-be-dynamic elements like water/people/candles, blurred (the WM is mimicking artifacts from the 3D representation)<p>3. Why they are confident that "There's no map or explicit 3D representation in the outputs. This is a diffusion model, and video in/out" <a href="https://x.com/olivercameron/status/1927852361579647398" rel="nofollow">https://x.com/olivercameron/status/1927852361579647398</a> (the final product is indeed a diffusion WM trained on videos, they just have a complicated pipeline for getting those training videos)</p>
]]></description><pubDate>Sat, 31 May 2025 20:51:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=44146849</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44146849</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44146849</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Got it, that makes sense! In terms of raw compute capability, a Snapdragon 888's GPU should have more than enough power to run this demo smoothly. I think I just need to optimize the inference setup better (maybe switch to WebGPU if the platform supports it?) and do targeted testing on Firefox/Android.</p>
]]></description><pubDate>Tue, 29 Apr 2025 16:00:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43834437</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43834437</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43834437</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Curious, which device/OS/browser? I did all my testing on 4-year-old hardware (iPhone 13 Pro, M1 Pro MBP), and the model itself is extremely tiny (~1GFLOP), so I'm optimistic that performance issues would be solvable with a better software stack (e.g. native app).</p>
]]></description><pubDate>Mon, 28 Apr 2025 14:40:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43822024</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43822024</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43822024</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>I think <a href="https://diamond-wm.github.io" rel="nofollow">https://diamond-wm.github.io</a> is a reasonable place to start (they have public world-model training code, and people have successfully adapted their codebase to other games e.g. <a href="https://derewah.dev/projects/ai-mariokart" rel="nofollow">https://derewah.dev/projects/ai-mariokart</a>). Most modern world models are essentially image generators with additional inputs (past frames + controls) added on, so understanding how Diffusion/IADB/Flow Matching work would definitely help.</p>
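<p>To make the "image generator + extra inputs" framing concrete, here's a minimal sketch (PyTorch; the shapes and layers are illustrative only, not DIAMOND's actual architecture) of a denoiser that conditions on past frames and controls alongside the noisy next frame:</p><pre><code>    import torch
    import torch.nn as nn

    class TinyWorldModelDenoiser(nn.Module):
        def __init__(self, frame_channels=3, past_frames=4, control_dim=8, hidden=64):
            super().__init__()
            in_channels = frame_channels * (1 + past_frames)     # noisy frame + context frames
            self.control_proj = nn.Linear(control_dim, hidden)   # controls enter as a learned bias
            self.encode = nn.Conv2d(in_channels, hidden, 3, padding=1)
            self.decode = nn.Conv2d(hidden, frame_channels, 3, padding=1)

        def forward(self, noisy_next, past, controls):
            # past: (batch, past_frames, channels, H, W), stacked along channels
            x = torch.cat([noisy_next, past.flatten(1, 2)], dim=1)
            h = torch.relu(self.encode(x))
            h = h + self.control_proj(controls)[:, :, None, None]   # broadcast over H, W
            return self.decode(h)   # predicted clean next frame

    model = TinyWorldModelDenoiser()
    pred = model(torch.randn(1, 3, 64, 64), torch.randn(1, 4, 3, 64, 64), torch.randn(1, 8))
</code></pre>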
]]></description><pubDate>Sat, 26 Apr 2025 19:40:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=43806473</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43806473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43806473</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Yup, definitely similar! There are a lot of video-game-emulation World Models floating around now, <a href="https://worldarcade.gg" rel="nofollow">https://worldarcade.gg</a> had a list. In the self-driving & robotics literature there have also been many WMs created for policy training and evaluation. I don't remember a prior WM built on first-person cell-phone video, but it's a simple enough concept that someone has probably done it for a student project or something :)</p>
]]></description><pubDate>Sat, 26 Apr 2025 01:54:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43800231</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43800231</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43800231</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Mostly 1xA10 (though I switched to 1xGH200 briefly at the end, Lambda has a sale going). The network used in the post is very tiny, but I had to train for a really long time w/ a large batch to get somewhat-stable results.</p>
]]></description><pubDate>Sat, 26 Apr 2025 01:36:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=43800142</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43800142</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43800142</guid></item></channel></rss>