<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ollin</title><link>https://news.ycombinator.com/user?id=ollin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 05:25:00 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ollin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ollin in "Project Glasswing: Securing critical software for the AI era"]]></title><description><![CDATA[
<p>- The OpenBSD one is 'TCP packets with invalid SACK options could crash the kernel' <a href="https://cdn.openbsd.org/pub/OpenBSD/patches/7.8/common/025_sack.patch.sig" rel="nofollow">https://cdn.openbsd.org/pub/OpenBSD/patches/7.8/common/025_s...</a><p>- One (patched) Linux kernel bug is 'UaF
when sys_futex_requeue() is used with different flags' <a href="https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd945366151ba54fcb9580508" rel="nofollow">https://github.com/torvalds/linux/commit/e2f78c7ec1655fedd94...</a><p>These links are from the more detailed 'Assessing Claude Mythos Preview’s cybersecurity capabilities' post released today <a href="https://red.anthropic.com/2026/mythos-preview/" rel="nofollow">https://red.anthropic.com/2026/mythos-preview/</a>, which covers some of the public/fixed issues (like the OpenBSD one) in more depth and includes hashes for several unreleased reports and PoCs.</p>
]]></description><pubDate>Tue, 07 Apr 2026 20:25:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680899</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47680899</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680899</guid></item><item><title><![CDATA[New comment by ollin in "System Card: Claude Mythos Preview [pdf]"]]></title><description><![CDATA[
<p>My impression was entirely the opposite; the unsolved subset of SWE-bench Verified problems <i>are</i> memorizable (solutions are pulled from public GitHub repos), and the evaluators are often so brittle or disconnected from the problem statement that the <i>only</i> way to pass is to regurgitate a memorized solution.<p>OpenAI had a whole post about this, where they recommended switching to SWE-bench Pro as a better (but still imperfect) benchmark:<p><a href="https://openai.com/index/why-we-no-longer-evaluate-swe-bench-verified/" rel="nofollow">https://openai.com/index/why-we-no-longer-evaluate-swe-bench...</a><p>> We audited a 27.6% subset of the dataset that models often failed to solve and found that at least 59.4% of the audited problems have flawed test cases that reject functionally correct submissions<p>> SWE-bench problems are sourced from open-source repositories many model providers use for training purposes. In our analysis we found that all frontier models we tested were able to reproduce the original, human-written bug fix<p>> improvements on SWE-bench Verified no longer reflect meaningful improvements in models’ real-world software development abilities. Instead, they increasingly reflect how much the model was exposed to the benchmark at training time<p>> We’re building new, uncontaminated evaluations to better track coding capabilities, and we think this is an important area to focus on for the wider research community. Until we have those, OpenAI recommends reporting results for SWE-bench Pro.</p>
]]></description><pubDate>Tue, 07 Apr 2026 19:26:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47680158</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47680158</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47680158</guid></item><item><title><![CDATA[New comment by ollin in "Tell HN: Apple development certificate server seems down?"]]></title><description><![CDATA[
<p>Here's the developer thread <a href="https://developer.apple.com/forums/thread/818403" rel="nofollow">https://developer.apple.com/forums/thread/818403</a> I found, with lots of other reports of "Unable to Verify App - An internet connection is required to verify the trust of the developer".<p>Although <a href="https://developer.apple.com/system-status/" rel="nofollow">https://developer.apple.com/system-status/</a> was green for most of the 3-4 hour outage, the page now at least acknowledges two minutes of downtime:<p><pre><code>    App Store Connect - Resolved Outage
    Today, 12:04 AM - 12:06 AM
    All users were affected
    Users experienced a problem with this service.
</code></pre>
Not a great developer experience.</p>
]]></description><pubDate>Wed, 11 Mar 2026 00:28:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47330496</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47330496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47330496</guid></item><item><title><![CDATA[New comment by ollin in "Mac mini will be made at a new facility in Houston"]]></title><description><![CDATA[
<p>The still photo (with 富士康科技, i.e. Foxconn Technology, photoshopped out) is the second image of the "In Houston, workers assemble advanced AI servers" photo carousel <a href="https://www.apple.com/newsroom/images/2026/02/apple-accelerates-us-manufacturing-with-mac-mini-production/article/Apple-US-manufacturing-investment-Houston-data-center-assembly-line-02_big.jpg.large_2x.jpg" rel="nofollow">https://www.apple.com/newsroom/images/2026/02/apple-accelera...</a></p>
]]></description><pubDate>Wed, 25 Feb 2026 03:08:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=47146795</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=47146795</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47146795</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>A lot of people mentioned this! The "dreamlike" comparison is common as well. In both cases, you have a network of neurons rendering an image approximating the real world :) so it sort of makes sense.<p>Regarding the specific boiling-textures effect: there's a tradeoff in recurrent world models between jittering (constantly regenerating fine details to avoid accumulating error) and drifting (propagating fine details as-is, even when that leads to accumulating error and a simplified/oversaturated/implausible result). The forest trail world is tuned way towards jittering (you can pause with `p` and step frame-by-frame with `.` to see this).  So if the effect resembles LSD, it's possible that LSD applies some similar random jitter/perturbation to the neurons within your visual cortex.</p>
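<p>To make the jitter/drift tradeoff concrete, here's a minimal sketch (Python/NumPy; denoise() is a hypothetical stand-in for the learned network, not the actual forest-trail model) of how the amount of noise re-injected into the carried-over frame each step moves a rollout along that spectrum:</p><pre><code>    # renoise near 1.0: fine details get regenerated every step ("jitter",
    # boiling textures, but errors are washed out). renoise near 0.0:
    # details are carried over as-is ("drift", stable textures, but errors
    # accumulate and the world slowly simplifies).
    import numpy as np

    def denoise(noisy_frame, past_frames, controls):
        # placeholder "model": blend the noisy frame with the previous frame
        return 0.5 * noisy_frame + 0.5 * past_frames[-1]

    def rollout(first_frame, controls_per_step, renoise=0.8, rng=None):
        rng = rng or np.random.default_rng(0)
        frames = [first_frame]
        for controls in controls_per_step:
            prev = frames[-1]
            noisy = (1.0 - renoise) * prev + renoise * rng.standard_normal(prev.shape)
            frames.append(denoise(noisy, frames, controls))
        return frames

    frames = rollout(np.zeros((64, 64, 3)), [None] * 60, renoise=0.9)
</code></pre>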
]]></description><pubDate>Fri, 30 Jan 2026 16:03:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=46826014</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46826014</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46826014</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>On a technical level, this looks like the same diffusion transformer world model design that was shown in the Genie 3 post (text/memory/d-pad input, video output, 60sec max context, 720p, sub-10FPS control latency due to 4-frame temporal compression). I expect the public release uses a cheaper step-distilled / quantized version. The limitations seen in Genie 3 (high control latency, gradual loss of detail and drift towards videogamey behavior, 60s max rollout length) are still present. The editing/sharing tools, latency, cost, etc. can probably improve over time with this same model checkpoint, but new features like audio input/output, higher resolution, precise controls, etc. likely won't happen until the next major version.<p>From a product perspective, I still don't have a good sense of what the market for WMs will look like. There's a tension between serious commercial applications (robotics, VFX, gamedev, etc. where you want way, way higher fidelity and very precise controllability), vs current short-form-demos-for-consumer-entertainment application (where you want the inference to be cheap-enough-to-be-ad-supported and simple/intuitive to use). Framing Genie as a "prototype" inside their most expensive AI plan makes a lot of sense while GDM figures out how to target the product commercially.<p>On a personal level, since I'm also working on world models (albeit very small local ones <a href="https://news.ycombinator.com/item?id=43798757">https://news.ycombinator.com/item?id=43798757</a>), my main thought is "oh boy, lots of work to do". If everyone starts expecting Genie 3 quality, local WMs need to become a lot better :)</p>
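<p>The back-of-the-envelope version of the latency claim above (assuming 24fps output and the guessed 4x temporal compression; neither is a published spec):</p><pre><code>    # A new control input can only influence the next latent chunk of frames,
    # so the control rate is capped at output_fps / frames_per_chunk.
    output_fps = 24          # assumed output frame rate
    frames_per_chunk = 4     # assumed temporal compression in the video VAE
    control_rate_hz = output_fps / frames_per_chunk   # 6 control updates per second
    min_latency_s = frames_per_chunk / output_fps     # ~0.17s before any model/network time
    print(control_rate_hz, min_latency_s)
</code></pre>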
]]></description><pubDate>Thu, 29 Jan 2026 23:11:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46818196</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46818196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46818196</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>Yup, similar concepts! Just at two opposite extremes of the compute/scaling spectrum.<p>- That forest trail world is ~5 million parameters, trained on 15 minutes of video, scoped to run on a five-year-old iPhone through a twenty-year-old API (WebGL GPGPU, i.e. OpenGL fragment shaders). It's the smallest '3D' world model I'm aware of.<p>- Genie 3 is (most likely) ~100 billion parameters trained on millions of hours of video and running across multiple TPUs. I would be shocked if it's not the largest-scale world model available to the public.<p>There are lots of neat intermediate-scale world models being developed as well (e.g. LingBot-World <a href="https://github.com/robbyant/lingbot-world" rel="nofollow">https://github.com/robbyant/lingbot-world</a>, Waypoint 1 <a href="https://huggingface.co/blog/waypoint-1" rel="nofollow">https://huggingface.co/blog/waypoint-1</a>), so I expect we'll be able to play something of Genie quality locally on gaming GPUs within a year or two.</p>
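<p>A quick sense of why those two extremes need such different hardware (assuming roughly 2 bytes per parameter, i.e. fp16/bf16 weights, and taking the ~100B figure as the guess it is):</p><pre><code>    bytes_per_param = 2                              # assuming fp16/bf16 weights
    forest_trail_mb = 5e6 * bytes_per_param / 1e6    # ~10 MB, fits easily on a phone GPU
    genie3_gb = 100e9 * bytes_per_param / 1e9        # ~200 GB of weights alone, hence multiple TPUs
    print(forest_trail_mb, genie3_gb)
</code></pre>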
]]></description><pubDate>Thu, 29 Jan 2026 20:02:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=46815779</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46815779</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46815779</guid></item><item><title><![CDATA[New comment by ollin in "Project Genie: Experimenting with infinite, interactive worlds"]]></title><description><![CDATA[
<p>Really great to see this released! Some interesting videos from early-access users:<p>- <a href="https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197" rel="nofollow">https://youtu.be/15KtGNgpVnE?si=rgQ0PSRniRGcvN31&t=197</a> walking through various cities<p>- <a href="https://x.com/fofrAI/status/2016936855607136506" rel="nofollow">https://x.com/fofrAI/status/2016936855607136506</a> helicopter / flight sim<p>- <a href="https://x.com/venturetwins/status/2016919922727850333" rel="nofollow">https://x.com/venturetwins/status/2016919922727850333</a> space station, <a href="https://x.com/venturetwins/status/2016920340602278368" rel="nofollow">https://x.com/venturetwins/status/2016920340602278368</a> Dunkin' Donuts<p>- <a href="https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207" rel="nofollow">https://youtu.be/lALGud1Ynhc?si=10ERYyMFHiwL8rQ7&t=207</a> simulating a laptop computer, moving the mouse<p>- <a href="https://x.com/emollick/status/2016919989865840906" rel="nofollow">https://x.com/emollick/status/2016919989865840906</a> otter airline pilot with a duck on its head walking through a Rothko inspired airport</p>
]]></description><pubDate>Thu, 29 Jan 2026 18:18:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46814137</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46814137</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46814137</guid></item><item><title><![CDATA[New comment by ollin in "FLUX.2 [Klein]: Towards Interactive Visual Intelligence"]]></title><description><![CDATA[
<p>Z-Image is another open-weight image-generation model by Alibaba [1]. Z-Image Turbo was released around the same time as (non-Klein) FLUX.2 and received a generally warmer community response [2], since Z-Image Turbo was faster, also high-quality, and reportedly better at generating NSFW material. The base (non-Turbo) version of Z-Image is not yet released.<p>[1] <a href="https://tongyi-mai.github.io/Z-Image-blog/" rel="nofollow">https://tongyi-mai.github.io/Z-Image-blog/</a><p>[2] <a href="https://www.reddit.com/r/StableDiffusion/comments/1p9uu69/no_hard_feelings/" rel="nofollow">https://www.reddit.com/r/StableDiffusion/comments/1p9uu69/no...</a></p>
]]></description><pubDate>Sat, 17 Jan 2026 02:53:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=46654814</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46654814</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46654814</guid></item><item><title><![CDATA[New comment by ollin in "Ask HN: Share your personal website"]]></title><description><![CDATA[
<p><a href="https://madebyoll.in" rel="nofollow">https://madebyoll.in</a><p>I write about on-device generative models (particularly world models). Past posts have been reasonably well-received on HN (<a href="https://news.ycombinator.com/from?site=madebyoll.in">https://news.ycombinator.com/from?site=madebyoll.in</a>).</p>
]]></description><pubDate>Wed, 14 Jan 2026 18:04:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46619686</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=46619686</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46619686</guid></item><item><title><![CDATA[New comment by ollin in "Severe performance penalty found in VSCode rendering loop"]]></title><description><![CDATA[
<p>Yeah the issue reads as if someone asked Claude Code "find the most serious performance issue in the VSCode rendering loop" and then copied the response directly into GitHub (without profiling or testing anything).</p>
]]></description><pubDate>Mon, 27 Oct 2025 04:56:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=45717536</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=45717536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45717536</guid></item><item><title><![CDATA[New comment by ollin in "Mirage 2 – Generative World Engine"]]></title><description><![CDATA[
<p>Notes on my experience:<p>- <i>Infra/systems</i>: I was able to connect to a server within a minute or two. Once connected, the displayed RTT (roundtrip time?) was around 70ms, but actual control-to-action latency was still ~600-700ms vs the ~30ms I'd expect from an on-device model or game streaming service (a rough way to measure this is sketched at the end of this comment).<p>- <i>Image-conditioning & rendering:</i> The system did a reasonable job animating the initial (landscape photo) image I provided and extending it past the edges. However, the video rendering style drifted back to "contrast-boosted video game" within ~10s. This style drift shows up in their official examples as well (<a href="https://x.com/DynamicsLab_AI/status/1958592749378445319" rel="nofollow">https://x.com/DynamicsLab_AI/status/1958592749378445319</a>).<p>- <i>Controls</i>: Apart from the latency, control-following was relatively faithful once I started holding down Shift. I didn't notice any camera/character drift or spurious control issues, so I guess they are probably using fairly high-quality control labels.<p>- <i>Memory</i>: I did a bit of memory testing (basically, swinging the view side to side and seeing which details got regenerated) and it looks like the model can retain maybe ~3-5s of visual memory + the prompt (but not the initial image).</p>
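<p>For anyone who wants to reproduce the latency number, here's a rough sketch of the measurement (Python; capture_frame() and send_turn_input() are hypothetical hooks you'd wire up to your own screen-capture and input-injection tooling, not anything Mirage provides):</p><pre><code>    # Time the gap between sending a control and the first visibly changed frame.
    import time
    import numpy as np

    def first_reaction_ms(capture_frame, send_turn_input, threshold=5.0, max_s=3.0):
        baseline = capture_frame().astype(np.float32)
        t0 = time.monotonic()
        send_turn_input()
        while max_s > time.monotonic() - t0:
            frame = capture_frame().astype(np.float32)
            if np.abs(frame - baseline).mean() > threshold:
                return 1000.0 * (time.monotonic() - t0)   # first visible reaction
        return None   # no visible reaction within max_s
</code></pre>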
]]></description><pubDate>Fri, 22 Aug 2025 03:26:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=44980742</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44980742</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44980742</guid></item><item><title><![CDATA[New comment by ollin in "Genie 3: A new frontier for world models"]]></title><description><![CDATA[
<p>Regarding latency, I found a live video of gameplay here [1] and it looks closer to 1.1s keypress-to-photon latency (33 frames @ 30fps) based on when the onscreen keys start lighting up vs when the camera starts moving. This writeup [2] from someone who tried the Genie 3 research preview mentions that "while there is some control lag, I was told that this is due to the infrastructure used to serve the model rather than the model itself", so a lot of this latency may be added by their client/server streaming setup.<p>[1] <a href="https://x.com/holynski_/status/1952756737800651144" rel="nofollow">https://x.com/holynski_/status/1952756737800651144</a><p>[2] <a href="https://togelius.blogspot.com/2025/08/genie-3-and-future-of-neural-game.html" rel="nofollow">https://togelius.blogspot.com/2025/08/genie-3-and-future-of-...</a></p>
]]></description><pubDate>Tue, 05 Aug 2025 16:14:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=44799979</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44799979</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44799979</guid></item><item><title><![CDATA[New comment by ollin in "Genie 3: A new frontier for world models"]]></title><description><![CDATA[
<p>This is very encouraging progress, and probably what Demis was teasing [1] last month. A few speculations on technical details based on staring at the released clips:<p>1. You can see fine textures "jump" every 4 frames, which means they're most likely using a 4x-temporal-downscaling VAE with at least 4-frame interaction latency (unless the VAE is also control-conditional). Unfortunately I didn't see any real-time footage to confirm the latency (at one point they intercut screen recordings with "fingers on keyboard" b-roll? hmm).<p>2. There's some 16x16 spatial blocking during fast motion, which could mean 16x16 spatial downscaling in the VAE. Combined with 1, this would mean 24x1280x720/(4x16x16) = 21,600 tokens per second, or around 1.3 million tokens per minute (the arithmetic is spelled out at the end of this comment).<p>3. The first frame of each clip looks a bit sharper and less videogamey than later stationary frames, which suggests this could be a combined text-to-image + image-to-world system (where the t2i system is trained on general data but the i2w system is finetuned on game data with labeled controls). Noticeable in e.g. the dirt/textures in [2]. I still noticed some trend towards more contrast/saturation over time, but it's not as bad as in other autoregressive video models I've seen.<p>[1] <a href="https://x.com/demishassabis/status/1940248521111961988" rel="nofollow">https://x.com/demishassabis/status/1940248521111961988</a><p>[2] <a href="https://deepmind.google/api/blob/website/media/genie_environmental_consistency_6_96KPmd3.mp4" rel="nofollow">https://deepmind.google/api/blob/website/media/genie_environ...</a></p>
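<p>Spelling out the token-rate arithmetic from point 2 (both downscaling factors are guesses from the visible artifacts, not confirmed specs):</p><pre><code>    fps, width, height = 24, 1280, 720
    temporal_ds, spatial_ds = 4, 16   # guessed VAE downscaling factors
    tokens_per_second = fps * width * height / (temporal_ds * spatial_ds * spatial_ds)
    tokens_per_minute = 60 * tokens_per_second
    print(tokens_per_second, tokens_per_minute)   # 21600.0, 1296000.0 (~1.3M/min)
</code></pre>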
]]></description><pubDate>Tue, 05 Aug 2025 15:10:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=44799022</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44799022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44799022</guid></item><item><title><![CDATA[New comment by ollin in "AI video you can watch and interact with, in real-time"]]></title><description><![CDATA[
<p>I think the most likely explanation is that they trained a diffusion WM (like DIAMOND) on video rollouts recorded from within a 3D scene representation (like NeRF/GS), with some collision detection enabled.<p>This would explain:<p>1. How collisions / teleportation work and why they're so rigid (the WM is mimicking hand-implemented scene-bounds logic)<p>2. Why the scenes are static and, in the case of should-be-dynamic elements like water/people/candles, blurred (the WM is mimicking artifacts from the 3D representation)<p>3. Why they are confident that "There's no map or explicit 3D representation in the outputs. This is a diffusion model, and video in/out" <a href="https://x.com/olivercameron/status/1927852361579647398" rel="nofollow">https://x.com/olivercameron/status/1927852361579647398</a> (the final product is indeed a diffusion WM trained on videos, they just have a complicated pipeline for getting those training videos)</p>
]]></description><pubDate>Sat, 31 May 2025 20:51:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=44146849</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=44146849</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44146849</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Got it, that makes sense! In terms of raw compute capability, a Snapdragon 888's GPU should have more than enough power to run this demo smoothly. I think I just need to optimize the inference setup better (maybe switch to WebGPU if the platform supports it?) and do targeted testing on Firefox/Android.</p>
]]></description><pubDate>Tue, 29 Apr 2025 16:00:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43834437</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43834437</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43834437</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Curious, which device/OS/browser? I did all my testing on 4-year-old hardware (iPhone 13 Pro, M1 Pro MBP), and the model itself is extremely tiny (~1GFLOP), so I'm optimistic that performance issues would be solvable with a better software stack (e.g. native app).</p>
]]></description><pubDate>Mon, 28 Apr 2025 14:40:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=43822024</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43822024</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43822024</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>I think <a href="https://diamond-wm.github.io" rel="nofollow">https://diamond-wm.github.io</a> is a reasonable place to start (they have public world-model training code, and people have successfully adapted their codebase to other games e.g. <a href="https://derewah.dev/projects/ai-mariokart" rel="nofollow">https://derewah.dev/projects/ai-mariokart</a>). Most modern world models are essentially image generators with additional inputs (past frames + controls) added on, so understanding how Diffusion/IADB/Flow Matching work would definitely help.</p>
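<p>To make the "image generator + extra inputs" framing concrete, here's a minimal sketch (PyTorch; the shapes and layers are illustrative only, not DIAMOND's actual architecture) of a denoiser that conditions on past frames and controls alongside the noisy next frame:</p><pre><code>    import torch
    import torch.nn as nn

    class TinyWorldModelDenoiser(nn.Module):
        def __init__(self, frame_channels=3, past_frames=4, control_dim=8, hidden=64):
            super().__init__()
            in_channels = frame_channels * (1 + past_frames)     # noisy frame + context frames
            self.control_proj = nn.Linear(control_dim, hidden)   # controls enter as a learned bias
            self.encode = nn.Conv2d(in_channels, hidden, 3, padding=1)
            self.decode = nn.Conv2d(hidden, frame_channels, 3, padding=1)

        def forward(self, noisy_next, past, controls):
            # past: (batch, past_frames, channels, H, W), stacked along channels
            x = torch.cat([noisy_next, past.flatten(1, 2)], dim=1)
            h = torch.relu(self.encode(x))
            h = h + self.control_proj(controls)[:, :, None, None]   # broadcast over H, W
            return self.decode(h)   # predicted clean next frame

    model = TinyWorldModelDenoiser()
    pred = model(torch.randn(1, 3, 64, 64), torch.randn(1, 4, 3, 64, 64), torch.randn(1, 8))
</code></pre>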
]]></description><pubDate>Sat, 26 Apr 2025 19:40:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=43806473</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43806473</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43806473</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Yup, definitely similar! There are a lot of video-game-emulation World Models floating around now, <a href="https://worldarcade.gg" rel="nofollow">https://worldarcade.gg</a> had a list. In the self-driving & robotics literature there have also been many WMs created for policy training and evaluation. I don't remember a prior WM built on first-person cell-phone video, but it's a simple enough concept that someone has probably done it for a student project or something :)</p>
]]></description><pubDate>Sat, 26 Apr 2025 01:54:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=43800231</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43800231</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43800231</guid></item><item><title><![CDATA[New comment by ollin in "World Emulation via Neural Network"]]></title><description><![CDATA[
<p>Mostly 1xA10 (though I switched to 1xGH200 briefly at the end, Lambda has a sale going). The network used in the post is very tiny, but I had to train for a really long time w/ a large batch to get somewhat-stable results.</p>
]]></description><pubDate>Sat, 26 Apr 2025 01:36:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=43800142</link><dc:creator>ollin</dc:creator><comments>https://news.ycombinator.com/item?id=43800142</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43800142</guid></item></channel></rss>