<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: acters</title><link>https://news.ycombinator.com/user?id=acters</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 27 Apr 2026 17:44:44 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=acters" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by acters in "TurboQuant: A first-principles walkthrough"]]></title><description><![CDATA[
<p>Just look at DeepSeek V4: the preview model uses only 8 GB for a 1M-token KV cache (the context). It's already insanely efficient; it's just that most models coming out are barely catching up with the technical breakthroughs.
DeepSeek are pioneers.<p>Unfortunately, V4 is not trained for most real-world usage; it is mainly for general world knowledge.</p>
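<p>For rough intuition, KV-cache size falls straight out of the architecture. A minimal sketch of the standard formula; the layer/head/dim numbers below are made up for illustration and are not V4's actual configuration:</p>

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_tokens, bytes_per_elem=2):
    """Memory for a conventional attention KV cache: K and V each
    store n_layers * n_kv_heads * head_dim values per token (fp16)."""
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem

# Hypothetical GQA model: 60 layers, 8 KV heads of dim 128, fp16
per_token = kv_cache_bytes(60, 8, 128, 1)   # 245,760 bytes per token
# At 1M tokens that is roughly 245 GB. Fitting the same context in
# 8 GB means a budget of only ~8 KB per token:
budget_per_token = 8 * 2**30 // 10**6       # ~8,589 bytes per token
```

<p>Hitting that budget requires heavy KV compression, which is the direction DeepSeek's latent-attention work has been pushing.</p>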
]]></description><pubDate>Mon, 27 Apr 2026 09:36:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=47919512</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=47919512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47919512</guid></item><item><title><![CDATA[New comment by acters in "Google Gemma 4 Runs Natively on iPhone with Full Offline AI Inference"]]></title><description><![CDATA[
<p>Man, I can't wait for AI in my brain. Then intelligence will be pay-to-win.</p>
]]></description><pubDate>Wed, 15 Apr 2026 11:06:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47777430</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=47777430</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47777430</guid></item><item><title><![CDATA[New comment by acters in "LLMs learn what programmers create, not how programmers work"]]></title><description><![CDATA[
<p>Instead of telling the LLM that "run" works like a CLI, maybe just tell the LLM that "run" will execute sh/bash/zsh/etc. scripts?</p>
]]></description><pubDate>Tue, 24 Mar 2026 06:50:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=47499342</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=47499342</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47499342</guid></item><item><title><![CDATA[New comment by acters in "How to run Qwen 3.5 locally"]]></title><description><![CDATA[
<p>I have a 1660 Ti, and the CachyOS + aur/llama.cpp-cuda package is working fine for me.
With about 5.3 GB of usable memory, I find the 35B model is by far the most capable one, and it runs just as fast as the 4B model that fits entirely on my GPU.
I did try the 9B model and it was surprisingly capable, but 35B is still better in some of my own anecdotal test cases.
Very happy with the improvement, although I notice Qwen 3.5 is about half the speed of Qwen 3.</p>
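<p>When VRAM is this tight, a back-of-the-envelope calculation helps pick llama.cpp's <code>-ngl</code> (number of layers to offload to the GPU). The model sizes here are illustrative, not measurements:</p>

```python
def ngl_estimate(vram_gb, model_gb, n_layers, overhead_gb=1.0):
    """Rough count of transformer layers that fit in VRAM, reserving
    some headroom for the KV cache and CUDA buffers."""
    per_layer_gb = model_gb / n_layers
    usable_gb = vram_gb - overhead_gb
    return max(0, int(usable_gb // per_layer_gb))

# e.g. a ~12 GB quantized model with 48 layers on a 6 GB card
print(ngl_estimate(6.0, 12.0, 48))   # 20
```

<p>Layers past that count stay on the CPU; starting from this estimate and nudging <code>-ngl</code> up until you hit out-of-memory errors is quicker than guessing blind.</p>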
]]></description><pubDate>Sun, 08 Mar 2026 06:34:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=47295103</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=47295103</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47295103</guid></item><item><title><![CDATA[New comment by acters in "Gemini 3.1 Pro"]]></title><description><![CDATA[
<p>I have personally seen a rise in LLMs being too lazy to investigate or figure things out on their own; they jump to conclusions and hope you volunteer extra information, even when it's something they could work out themselves.</p>
]]></description><pubDate>Fri, 20 Feb 2026 03:04:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=47083142</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=47083142</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47083142</guid></item><item><title><![CDATA[New comment by acters in "Good Riddance, 4o"]]></title><description><![CDATA[
<p>I'm partly fascinated by people's reliance on this model. I do miss the models before GPT-5.
OpenAI is quietly locking it away in some vault while we're expected to accept whatever model is current.
I can sympathize with these people on one merit only: nostalgia and entertainment.
I still load up old versions of software. I still watch old shows. I still play old video games. Through the lens of entertainment, I will never be entertained by objectively worse models.
Old chats are kind of still there, but not really: the UI is obviously different, and they will probably get deleted when I stop paying for the subscription and try to claw back some of my life from chatting with these stupid models.
It's dangerous to put any meaningful memory into these cloud LLMs, not to mention the social-media traps people fell into, which I was proactively avoiding. Part of me did get attached to GPT-4o; I realized it quickly and moved away from it.
This post is a mixture of complex emotions, but it's what I felt like posting. It's fine to ridicule people for getting that deeply attached, but these cloud LLMs show how easy it is to form a social habit and lose it in an instant. We need a stronger healthcare push to prevent (and treat) social attachment to LLMs.</p>
]]></description><pubDate>Fri, 13 Feb 2026 18:10:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=47005733</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=47005733</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47005733</guid></item><item><title><![CDATA[New comment by acters in "The Day the Telnet Died"]]></title><description><![CDATA[
<p>If it's alright to be pedantic, anyone with programming knowledge can do the same without these tools. What these offer is tried-and-tested secure code for client-side needs, clear options, and code you don't have to hand-roll.</p>
]]></description><pubDate>Wed, 11 Feb 2026 04:20:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46970776</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=46970776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46970776</guid></item><item><title><![CDATA[New comment by acters in "The Day the Telnet Died"]]></title><description><![CDATA[
<p>So basically the same as censorship, because that is exactly what blocking ports does.</p>
]]></description><pubDate>Wed, 11 Feb 2026 04:14:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=46970749</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=46970749</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46970749</guid></item><item><title><![CDATA[New comment by acters in "Mistral OCR 3"]]></title><description><![CDATA[
<p>Devstral 2 is free via the API. That has to be the bigger point in its favor: the price-to-performance ratio is better in practically every way.
Does it matter if the performance is slightly worse when it is practically free?</p>
]]></description><pubDate>Sat, 20 Dec 2025 03:18:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=46333371</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=46333371</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46333371</guid></item><item><title><![CDATA[New comment by acters in "RAM is so expensive, Samsung won't even sell it to Samsung"]]></title><description><![CDATA[
<p>I am still running an i5-4690K; really, all I need is a better GPU, but those prices are criminal. I wish I had grabbed a 4090 when I had the chance, rip.</p>
]]></description><pubDate>Thu, 04 Dec 2025 16:28:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46149422</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=46149422</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46149422</guid></item><item><title><![CDATA[New comment by acters in "Functional Quadtrees"]]></title><description><![CDATA[
<p>That is why I like the Harmonic app: there is an invite button separating the upvote and downvote buttons, so it's never going to have this kind of issue.</p>
]]></description><pubDate>Thu, 04 Dec 2025 16:26:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46149395</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=46149395</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46149395</guid></item><item><title><![CDATA[New comment by acters in "Shopping research in ChatGPT"]]></title><description><![CDATA[
<p>The reality is that advertisers will be able to inject their products into the LLMs through manufactured results, prompt engineering and possibly long term deals integrating training data for their brand and product lines.</p>
]]></description><pubDate>Mon, 24 Nov 2025 20:13:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=46038645</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=46038645</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46038645</guid></item><item><title><![CDATA[New comment by acters in "Three Years from GPT-3 to Gemini 3"]]></title><description><![CDATA[
<p>I feel like hallucinations have changed over time, from factual errors randomly shoehorned into the middle of sentences to LLMs confidently telling you they are right, even providing their own reasoning to back up their claims, usually with references that don't exist.</p>
]]></description><pubDate>Mon, 24 Nov 2025 20:03:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=46038536</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=46038536</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46038536</guid></item><item><title><![CDATA[New comment by acters in "Incus-OS: Immutable Linux OS to run Incus as a hypervisor"]]></title><description><![CDATA[
<p>I use Incus to pass a containerized Kali OS the Wayland and X11 sockets, plus whatever else may be in the /run/user/1000 folder and the X11 socket folder, like PipeWire. It isn't perfect, but it's really nice spawning a shell/bar/etc. inside the container and having it render over the current Wayland desktop. From there I can spawn other graphical apps. It works really well. Incus is amazing, as are LXC and Wayland in general.</p>
]]></description><pubDate>Fri, 14 Nov 2025 22:13:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=45932782</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=45932782</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45932782</guid></item><item><title><![CDATA[New comment by acters in "Steam Controller"]]></title><description><![CDATA[
<p>I still have the OG Steam Controller, three of them in fact. They still work, but I lost the dongle and rely on Bluetooth. It was an experience.
I definitely won't consider buying a controller from them again. The Xbox controller is perfect: simple and good enough to use.</p>
]]></description><pubDate>Thu, 13 Nov 2025 08:14:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=45912179</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=45912179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45912179</guid></item><item><title><![CDATA[New comment by acters in "Gemini in Chrome"]]></title><description><![CDATA[
<p>As far as I can tell, Linux will remain untargeted by attempts to sponge up all kinds of user data, which makes me so happy that I finally made the leap.</p>
]]></description><pubDate>Fri, 19 Sep 2025 09:05:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=45299527</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=45299527</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45299527</guid></item><item><title><![CDATA[New comment by acters in "GPT-5: "How many times does the letter b appear in blueberry?""]]></title><description><![CDATA[
<p>I asked GPT-5 to spell out the individual letters of strawberry or blueberry. It did this correctly, essentially putting a space between the letters.<p>Then I simply asked it to count all the unique letters in the word. GPT-5 still got it completely correct without thinking.<p>Lastly, I asked how many r's (or b's) are in the word. For some reason this one switched to GPT-5 Thinking with a few seconds of reasoning, and it output the correct number.<p>I guess starting the conversation by painstakingly walking it to the correct answer helps it out. Idk, it's a silly test.</p>
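<p>For reference, the three steps in plain Python, so the expected answers are unambiguous:</p>

```python
word = "blueberry"

spelled = " ".join(word)     # step 1: spell it out letter by letter
unique = len(set(word))      # step 2: count distinct letters
b_count = word.count("b")    # step 3: count a specific letter

print(spelled)   # b l u e b e r r y
print(unique)    # 6  (b, l, u, e, r, y)
print(b_count)   # 2
```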
]]></description><pubDate>Sun, 10 Aug 2025 08:11:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44853630</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=44853630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44853630</guid></item><item><title><![CDATA[New comment by acters in "Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs"]]></title><description><![CDATA[
<p>Another caveat with this method is that the larger and smaller models need to behave very similarly, because most of the savings come from the draft generating the low-stakes fluff around each detail: grammar, formatting, and the connective words between ideas.<p>Unsurprisingly, gpt-oss ships larger and smaller models that do behave very similarly. They are close enough that even when the draft gets a few tokens wrong, the slowdown never drops you all the way back to the larger model's standalone speed, which is the worst case with this setup. We want the smaller model's speed as much of the time as possible. That is all.</p>
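<p>The mechanism is easy to sketch. In a simplified greedy variant of speculative decoding (real implementations accept probabilistically against the target model's distribution), the draft proposes a few tokens and the target keeps the longest matching prefix:</p>

```python
def accepted_prefix(draft, target):
    """Greedy speculative decoding: keep draft tokens until the
    first position where the target model disagrees."""
    n = 0
    for d, t in zip(draft, target):
        if d != t:
            break
        n += 1
    return n

# Draft got 3 of 4 tokens "right": one target pass yields 3 tokens
# (plus the target's own correction) instead of just 1.
print(accepted_prefix(["the", "cat", "sat", "on"],
                      ["the", "cat", "sat", "in"]))   # 3
```

<p>The more the two models agree on the easy fluff, the longer the accepted prefixes and the closer you get to the draft model's speed.</p>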
]]></description><pubDate>Thu, 07 Aug 2025 06:12:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=44821176</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=44821176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44821176</guid></item><item><title><![CDATA[New comment by acters in "Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs"]]></title><description><![CDATA[
<p>I believe that is exactly the downside of speculative decoding, which is why it is very important to size the models properly relative to each other: the small one must be big enough to be mostly correct while also being substantially faster, and the larger one has to be fast enough that catching flaws doesn't introduce too many random delays. Also, if the small one is incorrect, having the larger one correct the mistake is miles better than leaving incorrect output in.<p>It is about preserving quality while getting the faster speed most of the time. The tradeoff is that you consume more memory by keeping two models loaded instead of one.<p>If memory were the only concern, it would make sense to just run the smaller model.</p>
]]></description><pubDate>Thu, 07 Aug 2025 06:02:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44821119</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=44821119</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44821119</guid></item><item><title><![CDATA[New comment by acters in "Running GPT-OSS-120B at 500 tokens per second on Nvidia GPUs"]]></title><description><![CDATA[
<p>Personally, I think the bigger companies should be more proactive and work with the popular inference-engine devs to get their special-snowflake LLMs working before release. I guess it is all very experimental at the end of the day. Those devs are doing God's work for those of us on budget-friendly hardware.</p>
]]></description><pubDate>Thu, 07 Aug 2025 05:55:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=44821076</link><dc:creator>acters</dc:creator><comments>https://news.ycombinator.com/item?id=44821076</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44821076</guid></item></channel></rss>