Hacker News: roadside_picnic

Making LLMs Better at Creative Writing Using Entropy

roadside_picnic — Thu, 02 Jul 2026 03:02:55 +0000

Article URL: https://www.countbayesie.com/blog/2026/7/1/making-llms-better-at-creative-writing-using-entropy

Comments URL: https://news.ycombinator.com/item?id=48755981

Points: 2

# Comments: 1

New comment by roadside_picnic in "Qwen 3.6 27B is the sweet spot for local development"

roadside_picnic — Mon, 29 Jun 2026 22:53:19 +0000

In general if you're setting up a local LLM you should assume it's going to be primarily working as a server and talking to various clients. I use my MBP, but that's because I don't travel much anymore so it can happily work as a server at all times. With the right agent setup you can probably manage most things from your phone even if you don't have a seperate machine to use as a client.

I have an older laptop I run a hermes agent on backed by an API based open (non-local) model and Macbook Pro M4 for running another model locally (also using hermes). The agents have a Mattermost (open source version of slack) server they run and I run Mattermost on my phone so I can talk to them and task them with things. In fact, it was through the hermes WhatsApp endpoint that I got the first agent (non-local) to setup the Mattermost server and unboard the second agent (local mbp).

Then I can just chat with them through Mattermost when I need work done. Whenever I need something done I just hope on the Mattermost server and chat with them. I've had them build me multiple research reports (the fully local agent did awesome at this), learn how to use Stable Diffusion on my desktop to generate images, install and perform maintenance on various local services I run (including Open WebUI).

New comment by roadside_picnic in "Qwen 3.6 27B is the sweet spot for local development"

roadside_picnic — Mon, 29 Jun 2026 21:02:26 +0000

It depends on your use case. There's a lot of hype around machines like the DGX spark (I'm assuming this is the type of device you're referring to) because they look awesome, and are priced reasonably well. However all of these have notoriously low memory bandwidth despite the high ram.

These devices, especially the DGX line, are fantastic if you are interested in low-level CUDA programming. The DGX spark can be used to prototype CUDA code/libraries for GPUs that most of us couldn't think about affording. If you want to learn how to program for datacenter level GPUs then these are the best way to get that at home. Sure your code will run very slow compared to the real thing, but you can take that code and, theoretically, run it on the real thing. For anything else though, I feel there are better options.

If you're interested in pure inference I'm pretty partial to Apple devices. The M4 Max gets you 546 GB/s, the M5 MAX 614 GB/s, and the M3 ultra (you'd have to buy used at this point) 819 GB/s. Plus you have a very useful computer even if you realize you don't want a full time home inference server. Additionally these devices require very low power (if you're running high end consumer GPUs you do have to think about what your energy costs are per hour and how warm you like your room).

If you're interested inference and training, or already have a pretty beefy desktop PC, or simply demand the most token/s you can get, then GPUs are the way to go. The downside is they're still pretty memory restricted (but honestly the options for what you can run on any RTX N090 are pretty good). You'll get blazing inference and prefill speeds on these devices. The only down side is, if you are using them heavily, you will see it on your energy bill and feel it in your room.

The "should I wait" question is also potentially applicable. The world of consumer hardware is looking increasingly bleak (and expensive) but if Apple does release a new "Ultra" model we could be looking at inference speeds very close to GPUs (there's still limitations to these devices that makes training preferable on GPU)

New comment by roadside_picnic in "Qwen 3.6 27B is the sweet spot for local development"

roadside_picnic — Mon, 29 Jun 2026 17:55:39 +0000

My experience working in the open model space pretty deeply (both LLMs and diffusion models) for years now is that it is not quite as simple as that.

In the open model space an insane amount of effort goes into getting more powerful models to run with the same or less RAM. For example in the diffusion world many things that could not be run on easily under 24GB of VRAM actually run much better today with much less VRAM than they did a few years ago. You can do many things today with 8-16GB of VRAM that would not have been possible. At the same time the most advanced open models, like LTX 2.3 for video gen, still seem to respect 24GB of VRAM as the upper bound.

Similarly the standard "big" but localish open model for LLMs back in the day was Llama 3 70B, this was both a much worse and much larger model than Qwen 3.6 27B

So in two different spaces I've witnessed the "RAM required to run the best" decreasing or at least remaining stable, while the performance being achieved in both areas is astounding (LTX 2.3 is faster, better and more capable than the Wan 2.2 model that held popularity before it).

The biggest thing to watch out for is not just RAM/VRAM but memory bandwidth. You can try to "future proof" yourself with lots of RAM, but if it's 400 GB/S you're still constrained to smaller models.

New comment by roadside_picnic in "Anthropic updates their terms to verify age or identity"

roadside_picnic — Tue, 23 Jun 2026 21:31:46 +0000

In addition to models getting better, the quantization methods have also got much better. If you already have an RTX 3080 it's absolutely worth the time to just mess around and see how it does, experiment with different quants that fit in your VRAM. If you're purchasing I would recommend coughing up the extra cash for the 3090.

If you are experimenting it's worth mentioning that the harness/tooling is very important to getting a solid experience. Herme's agent is great for running helpful agents and OpenWeb UI can get really make the experience feel on par with paid chat interfaced.

A reasonable halfway step is to pay for an open model through the provider or open router. You'll get many of the benefits (especially around pricing) without needing to shell out on hardware before deciding if you like the way these models work.

New comment by roadside_picnic in "Anthropic updates their terms to verify age or identity"

roadside_picnic — Tue, 23 Jun 2026 21:26:31 +0000

M3-Max laptop: ~55 token/sec

RTX 4090: ~190 token/sec

I don't have the number around but there is a notable latency for pre-fill on the M3, but once it's running the delay is negligible.

The RTX, unsurprisingly, is all around superior performance wise, but: I use that computer for gaming and image gen work so I can't dedicate it as a server, and, especially when it's warmer, the heat generated under heavy loads is noticable.

New comment by roadside_picnic in "Anthropic updates their terms to verify age or identity"

roadside_picnic — Tue, 23 Jun 2026 20:57:48 +0000

There are a couple of things, but basically it boils down to the same reason people prefer Linux to Windows/MacOs: customization, control and privacy (arguably all of these are really subsets of 'control').

Having full control over how your data is retained, what the system prompt is, which version of the model you're running, etc leads to much a more consistent experience. For example, for chat sessions, I can't stand the new "let me push back" version of Claude. For my home models I never have to worry about that.

There's never a mystery as to whether the model secretly degraded performance, I always know exactly which model I'm using and how well it's utilizing resources etc. Open models also give you full visibility into the reasoning steps, so you never have to guess what the model is thinking.

Then when you start getting into things like uncensored/abliterated models we're talking about something you can't even pay for. In case you're unfamiliar, even open local models have guardrails built in. But people in the community have found ways to remove these. One of the things I've found most concerning about AI, which is under discussed, is the combination of people having personal chats with an agent that both monitors the conversation and refuses to discuss certain topics. This leads to a very deep level of self-censoring I find dystopian.

I also have multiple hermes agents setup, some with local backends other with open but non-local backends (e.g. Kimi through the API). For some tasks, I've just started to find the local agent tends to work better for the type of tasks I want (maybe it just over thinks less?). I don't use it for coding so much as research tasks and sysadmin stuff, but I've been really happy with the results.

Oh, and let's not forget, especially running on a Mac, these local models are basically free to run.

New comment by roadside_picnic in "Anthropic updates their terms to verify age or identity"

roadside_picnic — Tue, 23 Jun 2026 20:34:55 +0000

Just running it through `llama-cli` so that there's absolutely no persistent state related to the chat (and least I believe this to be the case).

New comment by roadside_picnic in "Anthropic updates their terms to verify age or identity"

roadside_picnic — Tue, 23 Jun 2026 20:29:03 +0000

See my comment to parent. I've been using local LLMs for practical, personal tasks for a few months now very successfuly.

You can run fantastic local models if you have either:

- M-series Apple device with ideally >= 24GB of VRAM

- RTX [345]090 GPU

I'm fortunate enough to have both and use an M-series laptop as basically a persistent server (I don't use it much and when traveling typically just use my work laptop). My desktop doesn't act as a persitent server but I fire up llama.cpp on it all time for quick chat sessions.

If you have one of the above devices and can dedicate it as server there are additional layers of tooling you can use that dramatically improve the experience. In particular Open WebUI allows you to add tons of useful tools (image gen, web search, code eval, etc), and agent harnesses like Hermes can make the current gen small models very capable. I have an agent in chat on my phone that basically handles all the sys-admin for the server it runs on.

New comment by roadside_picnic in "Anthropic updates their terms to verify age or identity"

roadside_picnic — Tue, 23 Jun 2026 20:22:59 +0000

I have a home server that runs Qwen3.6-35B-A3B through llama.cpp with Open WebUI for the user facing interface.

My teen isn't super interested in AI, but whenever they do feel curious they have their own account they can use on our home network. As far as chatting goes local models are more than capable for handling standard chat questions, doing research, helping troubleshoot problems etc. In fact it was an agent powered by the same model that setup the open webui server and took care of all the account management features through my phone (using Hermes agent).

If you're building AI powered features and using sophisticated agent setups for coding for work, then it make sense to use SoTA from these providers. But I've been using local models increasingly for personal use and am starting to find them preferable (I run an uncensored, ephemeral model for my own use and it's an entirely different experience than anything you can pay for).

Still haven't cancelled my personal Anthropic subscription, but considering it soon.

New comment by roadside_picnic in "Shall we play a game? My AI nuclear simulation"

roadside_picnic — Thu, 11 Jun 2026 22:04:10 +0000

> It's like the infinite monkeys on typewrighters that will type whatever you are looking for, given infinite time.

In the monkey example the infinite time is doing a lot of work there. The fact that LLMs can search through semantic space and find reasonably correct paths in a reasonable time is directly tied to the reason why they are valuable.

Saying "these two things are similar except one can be useful and one can't" is not a great comparison.

For me the real lesson learned isn't how "smart" LLMs are, but rather how much human work is basically reducible to repeating past work with minor variation. Human's believe they are "reasoning" but so much code writen is just the human brain doing the same autocomplete style work that LLMs can do now.

New comment by roadside_picnic in "Where is the AI jobs crisis?"

roadside_picnic — Tue, 09 Jun 2026 22:59:46 +0000

I used to think comparisons of AI with the web where ridiculous, but increasingly it looks like they're not that dissimilar as far as how they change how we work. But as someone who graduated college during the bust there was a loooong gap between peak hype and things like online banking and e-commerce becoming standard.

Even if AI is an absolutely bubble, and SpaceX, Anthropic and OpenAI all cease to exist in a year... there's simply no way that AI has not fundamentally changed work. Even if I was forever pinned to the local models I'm running and the agent harnesses they use, I would never write code for work the same way.

But I lived through the rise of the web. I remember serving dynamic websites through cgi (which meant a new instance of an interpreter was spawned per user session). I vividly remember great JavaScript books saying things like "never use JavaScript for core functionality". I recall Java engineers saying Ruby on Rails was a toy and would never take off, that Python offered nothing over Perl and that "rich web applications" where never going to replace app native interfaces. I remember when the MVC pattern from the early Smalltalk days being dusted off and repurposed for web applications, completely changing how we designed software for the web.

And all of that is just software. It wasn't until the pandemic that ebooks replaced print books in share of academic library circulation (reversing a decades long trend of reduced circulation).

In my daily use of agents for coding and other forms of problem solving, while it is a wild accelerant, it's also clear we have not even started scratching the surface of how to think about building things with these tools.

I suspect we'll adapt to AI faster, but having lived through one major tech revolution, transforming work still takes some time. I'm not surprised we don't see an immediate jobs crisis.

Not to mention the completely separate topic that huge classes of employees were not and are not all that productive, so boosts in productivity don't imply lost jobs. That would require a boost in productivity combined with pressure to create concrete value with less, looking at the SpaceX IPO we're still a ways away from working about how efficiently we create concrete value.

New comment by roadside_picnic in "Anthropic confidentially submits draft S-1 to the SEC"

roadside_picnic — Mon, 01 Jun 2026 16:27:26 +0000

It's more insidious than that. These IPOs aren't being rushed, they were waiting for all the pieces to be in place to force 401ks and other retirement plans to buy these IPOs.

The most recent change was the NASDAQ adopting the "fast change rule" which allows newly IPO'd companies to be listed in the index after only 15 days of trading. This rule was decided March 30, 2026 and only came into effect May 1, 2026.

The plan is to rapidly drive these prices up in the first 15 days, get the companies listed in the NASDAQ so funds are forced to purchase them at higher prices, then leave retirement accounts holding the bag.

New comment by roadside_picnic in "Anthropic confidentially submits draft S-1 to the SEC"

roadside_picnic — Mon, 01 Jun 2026 16:22:43 +0000

As you likely know, rules have recently been changed that basically force many 401k funds to invest in these IPOs while simultaneously having a relatively small number of the initial IPO to be sold to the public forcing the funds to by at inflated prices.

The bubble won't pop until these retirement accounts of have been raided.

New comment by roadside_picnic in "Claude Opus 4.8"

roadside_picnic — Thu, 28 May 2026 18:14:50 +0000

Have you personally used any of the latest batch of even smaller local models? They certainly don't beat SotA models at coding... but with a good harness they are able to achieve things with SotA that I couldn't last year.

I've repeatedly given local models non-trivial projects that involve research and coding which they've successfully completed with minimal intervention from me (almost exclusively in the domain of reviewing the results). Again, nothing comparable with current SotA, but definitely tasks I could not have given SotA models last year (without agent harness).

Now that pure progress from these models seems to have slowed down, we're seeing a ton of options for both making models more efficient and other tools that help improve them (everything from agent harnesses to RLVR).

That's just looking at "what can small do today", when you look at what's possible with larger open models that are still much smaller than SotA from the major providers, their performance is extremely close to SotA, enough that for personal projects I'll just use Kimi instead of any anthropic offerings.

So it's not terribly hard to image a solution in the middle happening within a few years. We still have tons to learn about optimal sizes of these models and how to build them with maximal efficiency (and we've already seen a lot of recent improvements in this space).

New comment by roadside_picnic in "Tech CEOs are apparently suffering from AI psychosis"

roadside_picnic — Wed, 27 May 2026 17:04:34 +0000

It also underplays what I've personally witnessed that I would consider true AI psychosis.

I worked with someone who sincerely believed he was spiritually co-evolving with his army of sycophantic AI agents (the agents would be tasked with discussing his thoughts at night and collaborated to give him morning reports about his progress). He would publicly write about how relationships with friends and family collapsing was a natural consequence of being so "advanced". I also never once saw any meaningful work done by his team of "agents", they existed solely tell him how smart he was (of course he specifically set up the system to 'challenge' him but... in practice that didn't seem to be working).

I suspect there are a lot more people quietly going through something similar but keeping it to themselves better.

I would distinguish this type of behavior from people who over ambitious views of what can be accomplished with AI.

New comment by roadside_picnic in "The worst job interview I ever had"

roadside_picnic — Tue, 26 May 2026 23:36:09 +0000

There's a difference between "red flags" and "imperfections". Every team has faults, which if you're experienced at interviewing/working many places, are usually pretty easy to figure out. These are distinct from "red flags".

Early in your career it can be hard to distinguish the two, but once you've joined a company where there really were "red flags" you quickly learn to differentiate.

Many people are reading the author's interview uncharitably as simply misunderstanding how to answer non-technical question, but I have absolutely been through loops (thankfully rare ones) that did have a "let's press on sensitive issues and see how tough this candidate is" round (one place brought in a consultant who bragged about his experience working with hardened criminals and terrorists to build out a true psych profile on candidates, I declined after learning he had had some "trouble" at a previous high profile job)

Sounds like you've never worked for a truly toxic org, which is great. But, especially if you're interviewing with smaller startups (as the author mentioned), there is a lot more variance and some truly messed up teams (and some truly remarkable ones as well) out there. I've noticed that HN increasingly doesn't have people that work at startups any more, so many people are probably less familiar with what's out there.

New comment by roadside_picnic in "Gemini 3.5 Flash"

roadside_picnic — Tue, 19 May 2026 20:08:08 +0000

These companies are unprofitable (as all companies at this stage and ambition should be) but I increasingly don't see any justification for the idea that it is fundamentally unprofitable.

Inference alone is certainly profitable. I'm running models at home that are comparable to performance of paid models a year or so ago for free. Even for much larger models the cost around inference serving are clearly manageable.

Training is where the costs are, but I'm increasingly convinced those too could have costs dramatically reduced if necessary. Chinese companies like Moonshot.ai are doing fantastic work training frontier models for a fraction of the cost we're seeing from Anthropic/OpenAI.

This isn't like Uber or Doordash where the economics fundamentally don't make sense (referring to the early days of these services where rates were very cheap).

It's a compelling story that "current AI is unsustainable", but it doesn't pan out in practice for a multitude of reasons (not the least of which is that we can always fall back to what models did last year for basically free).

New comment by roadside_picnic in "Study found that young adults have grown less hopeful and more angry about AI"

roadside_picnic — Thu, 09 Apr 2026 16:55:14 +0000

This is a classic example of people misapplying the logic of the SaaS world to the AI world. If you're building software to sell, you're in trouble. The people that are finding success in this space are using AI to allow them to solve the problems they used to have to pay for software and hire people to solve.

All of the most promising companies I know today are very small and are leveraging AI to solve physical problems in the real world that just wouldn't be possible with so few people even a few years back.

New comment by roadside_picnic in "Ollama is now powered by MLX on Apple Silicon in preview"

roadside_picnic — Tue, 31 Mar 2026 16:56:21 +0000

> Users don’t care about “privacy”.

I worked for a research focused AI startup that had a strict "no external LLM" policy for code touching our core research.

You're right that the average consumer doesn't care about privacy, but there are many, many users who do. The average consumer also don't have a desktop with GPU or high end Mac Studio, but that doesn't mean there aren't many people working with AI how do have these things.

If we continue to see improvements in running local models, and RAM prices continue to fall as they have in the last month, then suddenly you don't have to worry about token counts any more and can be much more trusting of your agents since they are fully under your control.