<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: w1nk</title><link>https://news.ycombinator.com/user?id=w1nk</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 29 Apr 2026 05:28:17 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=w1nk" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by w1nk in "What Is ChatGPT Doing and Why Does It Work? (2023)"]]></title><description><![CDATA[
<p>For a more informed opinion than random folks on the internet, here's some work from Microsoft researchers with early internal access to GPT-4: <a href="https://arxiv.org/abs/2303.12712" rel="nofollow">https://arxiv.org/abs/2303.12712</a>.  I don't think people close to these systems share the "dumb parrot" sentiment at all.</p>
]]></description><pubDate>Tue, 18 Jun 2024 18:24:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=40720648</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=40720648</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40720648</guid></item><item><title><![CDATA[New comment by w1nk in "AzerothCore: Self-Hosted World of Warcraft 3.3.5a Server"]]></title><description><![CDATA[
<p>These projects are awesome to see; there are similar efforts for EverQuest.  Is anyone aware of attempts to build alternative clients/renderers for these MMOs?  A VR client for any of these worlds would instantly be amazing.</p>
]]></description><pubDate>Fri, 12 Apr 2024 20:11:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=40017135</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=40017135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40017135</guid></item><item><title><![CDATA[New comment by w1nk in "AI isn’t good enough"]]></title><description><![CDATA[
<p>Just to further this, it's not just 'big names' that feel this way.  Read this paper from a team at Microsoft Research: <a href="https://arxiv.org/abs/2303.12712" rel="nofollow noreferrer">https://arxiv.org/abs/2303.12712</a>.  These folks spent months studying the properties of GPT-4; the paper is ~150 pages of examples probing the boundaries of the model's world understanding.  There is obviously some emergent complexity arising from the training procedure.</p>
]]></description><pubDate>Fri, 25 Aug 2023 08:39:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=37259468</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=37259468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37259468</guid></item><item><title><![CDATA[New comment by w1nk in "Formula E team caught using RFID scanner that got live tire data from other cars"]]></title><description><![CDATA[
<p>This seems like (clearly?) bad journalism.  RFID tags on their own are certainly not taking active sensor readings; that would be left to something closer to a TPMS-style system.</p>
]]></description><pubDate>Sun, 25 Jun 2023 16:16:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=36469804</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=36469804</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36469804</guid></item><item><title><![CDATA[New comment by w1nk in "Dynamic Branch Prediction with Perceptrons (2000) [pdf]"]]></title><description><![CDATA[
<p>Yes, it did: <a href="https://chasethedevil.github.io/post/the_neural_network_in_your_cpu/" rel="nofollow">https://chasethedevil.github.io/post/the_neural_network_in_y...</a></p>
]]></description><pubDate>Wed, 10 May 2023 12:12:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=35886217</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=35886217</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35886217</guid></item><item><title><![CDATA[New comment by w1nk in "Llama.cpp 30B runs with only 6GB of RAM now"]]></title><description><![CDATA[
<p>No, your OP is mistaken.  All of the model weights have to be accessed for the forward pass.  What has happened is that using mmap changes where the memory is accounted (kernel page cache vs. process), so the numbers were being misinterpreted.  There are still 30B parameters, and you'll still need that many times the size of your floating-point representation to use the model.</p>
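Rough back-of-envelope arithmetic behind that claim, as a sketch assuming dense weights (llama.cpp's actual quantized formats use fewer bytes per parameter):

```python
# Memory footprint of a 30B-parameter model at a few common precisions.
PARAMS = 30e9

for fmt, bytes_per_param in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    gib = PARAMS * bytes_per_param / 2**30
    print(f"{fmt}: ~{gib:.0f} GiB")
```

So even at fp16 the full weight set is on the order of 56 GiB; mmap doesn't shrink that, it only changes where the pages are counted.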
]]></description><pubDate>Sat, 01 Apr 2023 11:32:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=35399403</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=35399403</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35399403</guid></item><item><title><![CDATA[New comment by w1nk in "Llama.cpp 30B runs with only 6GB of RAM now"]]></title><description><![CDATA[
<p>Ahh, this would do it, thanks :).</p>
]]></description><pubDate>Sat, 01 Apr 2023 09:48:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=35398845</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=35398845</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35398845</guid></item><item><title><![CDATA[New comment by w1nk in "Llama.cpp 30B runs with only 6GB of RAM now"]]></title><description><![CDATA[
<p>That shouldn't be the case.  30B directly describes the size of the model itself, not the size of the other components.</p>
]]></description><pubDate>Fri, 31 Mar 2023 22:54:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=35394872</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=35394872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35394872</guid></item><item><title><![CDATA[New comment by w1nk in "Llama.cpp 30B runs with only 6GB of RAM now"]]></title><description><![CDATA[
<p>I also have this question, and yes, it should be.  The forward pass requires accessing all of the weights, AFAIK.</p>
]]></description><pubDate>Fri, 31 Mar 2023 21:46:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=35394105</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=35394105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35394105</guid></item><item><title><![CDATA[New comment by w1nk in "Llama.cpp 30B runs with only 6GB of RAM now"]]></title><description><![CDATA[
<p>Does anyone know how/why this change decreases memory consumption (and isn't a bug in the inference code)?<p>From my understanding of the issue, mmap'ing the file shows that inference only accesses a fraction of the weight data.<p>Doesn't the forward pass necessitate accessing all the weights, not just a fraction of them?</p>
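A minimal illustration of the accounting effect in question, using Python's mmap on a throwaway file (not llama.cpp's actual code):

```python
import mmap
import os
import tempfile

# mmap'ing a file reserves address space but does not read the file:
# pages are faulted in from disk only when actually touched, and they
# are charged to the kernel page cache rather than counted up front
# against the process the way a plain read() into a buffer would be.
fd, path = tempfile.mkstemp()
os.write(fd, b"\0" * (16 * 1024 * 1024))  # a 16 MiB file of zeros
os.close(fd)

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    first = mm[0]  # faults in a single page, not the whole 16 MiB
    mm.close()

os.unlink(path)
print(first)
```

So a tool that reports only the process's own resident memory will undercount a mapped model file, even though every weight page still has to be faulted in for a full forward pass.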
]]></description><pubDate>Fri, 31 Mar 2023 21:43:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=35394065</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=35394065</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35394065</guid></item><item><title><![CDATA[New comment by w1nk in "‘Every Parent’s Nightmare’: TikTok Is a Venue for Child Sexual Exploitation"]]></title><description><![CDATA[
<p>So can we at least take a second to appreciate the orders of magnitude of growth between BBSes and TikTok?  With data structures and algorithms, we're very willing to accept that order-of-magnitude shifts can change our basic assumptions about things.<p>The point isn't that these things haven't existed in some form forever; it's that they're scaling to points where these effects are becoming increasingly dangerous, in proportion to the growth of these systems.</p>
]]></description><pubDate>Tue, 21 Feb 2023 23:11:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=34889080</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=34889080</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34889080</guid></item><item><title><![CDATA[New comment by w1nk in "Google “Pepper”: When a Brand Becomes More Popular Than the Meaning of the Word"]]></title><description><![CDATA[
<p>This thread is exposing one of the dirty realities of modern SEO.  Most of the time, SEO folks measure their success by having some way of determining their ranking for a given search query.<p>The problem is that in 2022, Google personalizes (i.e., re-ranks) the SERPs along enough dimensions (device type, location, user, etc.) that a single mapping of search query -> ranking fails to capture any of the nuance Google applies to its user understanding.  I'm not sure how or when the SEO space will actually reckon with this, or whether it will just keep pushing poor 'visibility' metrics that can't actually be reduced to a single dimension.</p>
]]></description><pubDate>Tue, 04 Oct 2022 08:48:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=33077880</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=33077880</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33077880</guid></item><item><title><![CDATA[New comment by w1nk in "Python type hints are Turing complete"]]></title><description><![CDATA[
<p>It's mostly a curiosity.  Occasionally things like this occur: <a href="https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-into-nso-zero-click.html" rel="nofollow">https://googleprojectzero.blogspot.com/2021/12/a-deep-dive-i...</a>.  TL;DR: an exploit used an accidentally Turing-complete file format to execute instructions inside a virtual machine.</p>
]]></description><pubDate>Fri, 09 Sep 2022 15:28:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=32780401</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=32780401</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32780401</guid></item><item><title><![CDATA[New comment by w1nk in "Apple is building a demand-side platform"]]></title><description><![CDATA[
<p>For big organizations with the capacity and data, online advertising becomes an ROI optimization game, and one that they perform quite well at.<p>A random business that wants to advertise online without the infrastructure and data capability to back it will struggle to compete unless it exists in a segment full of similar peers.  When that happens, we see articles about how PPC doesn't actually work, etc.<p>The reality is that it takes engineering work and infrastructure, coupled with some data capability, to unlock real value in the online advertising space.<p>As noted, online advertising brought all sorts of insight and visibility over traditional 'offline' marketing channels, but with that come more savvy competitors who will do all the data things you're not.</p>
]]></description><pubDate>Thu, 04 Aug 2022 19:20:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=32347145</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=32347145</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32347145</guid></item><item><title><![CDATA[New comment by w1nk in "How normal am I?"]]></title><description><![CDATA[
<p>Given the usual markers of attractiveness, wouldn't you expect age and weight to be strong predictors usually?</p>
]]></description><pubDate>Tue, 05 Jul 2022 14:02:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=31988296</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=31988296</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31988296</guid></item><item><title><![CDATA[New comment by w1nk in "How CUDA Programming Works"]]></title><description><![CDATA[
<p>There's nothing 1960s about it; that's just not well reasoned.  These computational constructs/tools simply don't yet have better abstractions that maintain the desired performance.<p>It's a strange intersection of needs where one wants or needs something like CUDA but doesn't care to ensure the computation is actually running optimally.  If you don't want to be bothered with control and granularity, why are you trying to write high-performance CUDA code in the first place?<p>Would you mind elaborating on what your hobby project was?</p>
]]></description><pubDate>Tue, 05 Jul 2022 13:49:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=31988157</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=31988157</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31988157</guid></item><item><title><![CDATA[New comment by w1nk in "How CUDA Programming Works"]]></title><description><![CDATA[
<p>What?  CUDA is intentionally granular and low level, why do you feel that is a negative thing at this level of abstraction?  Are you suggesting the tools should be better so that doesn't have to be the case?  I can't figure out what you're actually trying to express here.</p>
]]></description><pubDate>Tue, 05 Jul 2022 13:35:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=31988029</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=31988029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31988029</guid></item><item><title><![CDATA[New comment by w1nk in "Imagen, a text-to-image diffusion model"]]></title><description><![CDATA[
<p>You're not wrong that the dataset and compute are important, and if you browse the author's previous work, you'll see there are datasets available.  The reproduction of DALL-E 2 required a dataset of similar size to the one Imagen was trained on (see: <a href="https://arxiv.org/abs/2111.02114" rel="nofollow">https://arxiv.org/abs/2111.02114</a>).<p>The harder part here will be getting access to the required compute, but again, the folks involved in this project have access to lots of resources (they've already trained models of this size).  We'll likely see trained checkpoints as soon as they're done converging.</p>
]]></description><pubDate>Tue, 24 May 2022 18:20:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=31495677</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=31495677</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31495677</guid></item><item><title><![CDATA[New comment by w1nk in "Imagen, a text-to-image diffusion model"]]></title><description><![CDATA[
<p>To expand a bit for the grandparent: if you check out this author's other repos, you'll notice they have a thing for implementing these papers (multiple DALL-E 2 implementations, for instance).  I'd guess you should expect to see an implementation there pretty quickly.</p>
]]></description><pubDate>Tue, 24 May 2022 11:19:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=31490575</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=31490575</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31490575</guid></item><item><title><![CDATA[New comment by w1nk in "“Amateur” programmer fought cancer with 50 Nvidia Geforce 1080Ti"]]></title><description><![CDATA[
<p>Why does everyone assume this guy has zero business attempting this?  If you read his credentials, he should be every bit as qualified as you to attempt this kind of work while understanding the pitfalls.<p>According to his CV, he's been active in the field for quite some time.  The default assumption that he's an idiot who is going to kill people just seems too cynical here.<p>Grandparent: you specifically mention having noted methodology problems; would you mind sharing where in the methodology you think he's gone wrong?</p>
]]></description><pubDate>Sat, 21 May 2022 09:00:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=31456376</link><dc:creator>w1nk</dc:creator><comments>https://news.ycombinator.com/item?id=31456376</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=31456376</guid></item></channel></rss>