Hacker News: bird0861

New comment by bird0861 in "A tool that removes censorship from open-weight LLMs"

bird0861 — Sun, 08 Mar 2026 02:57:31 +0000

Just want to add to this that with custom calibration data it's incredibly effective and surgical; you can get VERY LOW KL divergence this way. Many MoEs are supported too, and it's actively maintained.
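For anyone unfamiliar with the metric: KL divergence here measures how far the modified model's next-token distribution drifts from the original's on the same prompt. Below is a minimal sketch in plain Python, assuming you already have both models' logit vectors for one position. The function names and toy logits are made up for illustration, and this is not how any particular tool computes it.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities, subtracting the max for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p_logits, q_logits):
    """KL(P || Q) in nats between two next-token distributions given as logits."""
    p = softmax(p_logits)
    q = softmax(q_logits)
    # Terms with p_i == 0 contribute nothing, so they are skipped.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical logits over a 4-token vocabulary for a single position.
original = [2.0, 1.0, 0.1, -1.0]
modified = [2.1, 0.9, 0.1, -1.0]

print(kl_divergence(original, original))  # identical distributions give 0.0
print(kl_divergence(original, modified))  # small positive value
```

In practice you would average this over many positions and prompts; a value near zero suggests the edit left the model's general behavior mostly intact.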

New comment by bird0861 in "Training students to prove they're not robots is pushing them to use more AI"

bird0861 — Sun, 08 Mar 2026 02:11:28 +0000

A good general rule in life is that people get one chance to show why they're not worth communicating with, and that's it.

New comment by bird0861 in "Training students to prove they're not robots is pushing them to use more AI"

bird0861 — Sun, 08 Mar 2026 02:07:30 +0000

It's very easy to convince yourself it's true and then hand out punishment like it was on sale.

New comment by bird0861 in "Claude Code is being dumbed down?"

bird0861 — Fri, 13 Feb 2026 16:57:39 +0000

add that to your claude.md

New comment by bird0861 in "Anthropic's original take home assignment open sourced"

bird0861 — Thu, 22 Jan 2026 19:48:48 +0000

Hilarious that this got a downvote, hello Satya!

New comment by bird0861 in "Ask HN: How are you automating your coding work?"

bird0861 — Thu, 22 Jan 2026 18:26:48 +0000

With respect to the first issue you raise, I would perhaps start including prompts in comments. This is a little sneaky, sure, and maybe explicitly putting them in a markdown file would be better, but there's the risk that the markdown file won't be loaded. It might be possible to inject the file into context via a comment. I've never tried that though, and I doubt every assistant will act in a consistent way. The comment method is probably the best bet IMO.

Forgive me because this is a bit of a tangential rant on the second issue, but Gemini 3 Pro was absolutely heinous about this so I cancelled my sub. I'm completely puzzled about what it's supposed to be good for.

To your third issue, you should maybe consider building a dataset from those interactions... you might be able to train a LoRA on them and use it as a first pass before you lift a finger to scroll through a PR.

I think a really big issue is that there is a lack of consistency in the use of AI for SWE. There are a lot of models and poorly designed agents/assistants with really unforgivable performance, and people blindly using them without caring about the outputs amounts to something kind of Denial-of-Service-y. I keep seeing this issue raised over and over again.

At the risk of sounding elitist, the world might be a better place for project maintainers when the free money stops rolling into the frontier labs to offer anyone and everyone free use of the models... never give a baby power tools and so on.

New comment by bird0861 in "Anthropic's original take home assignment open sourced"

bird0861 — Wed, 21 Jan 2026 07:00:03 +0000

Which Gemini model did you use? My experience since the launch of G3Pro has been that it absolutely sucks dog crap through a coffee straw.

New comment by bird0861 in "The insecure evangelism of LLM maximalists"

bird0861 — Thu, 15 Jan 2026 10:03:54 +0000

Rust.

New comment by bird0861 in "The Future of Veritasium [video]"

bird0861 — Wed, 24 Dec 2025 18:37:13 +0000

Aren't they one of the worst physics channels apart from just outright fraudulent/fringe grifters like ElectricUniverse? Seems like every other week or so I see someone detail patiently why they have incorrectly explained something. I think the "[particles, like photons] take all possible paths" fiasco might be the latest one I can recall.

New comment by bird0861 in "Some Epstein file redactions are being undone"

bird0861 — Wed, 24 Dec 2025 18:25:59 +0000

Typical quality of The Guardian unfortunately. Don't read their energy reporting if you're at all literate about any of those topics. Any time they do a story on fusion I just about have an embolism.

New comment by bird0861 in "Contrails Map"

bird0861 — Sat, 20 Dec 2025 14:59:44 +0000

The water is actually ice crystals and the ice crystals form around the soot.

New comment by bird0861 in "SoundCloud has banned VPN access"

bird0861 — Mon, 15 Dec 2025 09:04:34 +0000

stares in Lidarr

New comment by bird0861 in "I ignore the spotlight as a staff engineer"

bird0861 — Sat, 06 Dec 2025 04:30:57 +0000

You seem like the type of coworker I would accept less pay to work with. Actually at a crossroads right now, did my research on my prospects and have narrowed it down to two places I most expect to be surrounded by good coworkers and managers. Cheers.

New comment by bird0861 in "Thoughts on Go vs. Rust vs. Zig"

bird0861 — Sat, 06 Dec 2025 03:11:15 +0000

I've been asking around for the last week about Go vs Elixir vs Zig, and I'd love to get feedback here too. I only have time for one and I'm looking for something that can replace a lot of the stuff I do with Python. I don't have time to wait for Mojo.

New comment by bird0861 in "Writing a good Claude.md"

bird0861 — Mon, 01 Dec 2025 06:22:19 +0000

I fully agree with this POV but for one detail: there is a problem with sunsetting frontier models. As we begin to adopt these tools and build workflows with them, they become pieces of our toolkit. We depend on them. We even take them for granted. And then the model changes (new checkpoints, maybe alignment gets fiddled with) and all of a sudden prompts we spent quite some time working on no longer yield the results we expect. I think the term for this is "prompt instability". I felt this with Gemini 3 (and some people had a less pronounced but similar experience with Sonnet releases after 3.7): for certain tasks that 2.5 Pro excelled at, it's just unusable now. I was already a local model advocate before this, but now I'm a local model zealot. I've stopped using Gemini 3 over this. Last night I used Qwen3 VL on my 4090, and although it was not perfect (sycophancy, overuse of certain cliches... nothing I can't get rid of later with some custom promptsets and a few hours in Heretic), it did a decent enough job of helping me work through my blindspots in the UI/UX for a project that I got what I needed.

If we have to retune our prompts ("skills", agents.md/claude.md, all of the stuff a coding assistant packs context with) with every model release, then I see new model releases becoming a liability more than a boon.

New comment by bird0861 in "Writing a good Claude.md"

bird0861 — Mon, 01 Dec 2025 04:49:41 +0000

That study is garbo and I suspect you didn't even read the abstract. Am I right?

New comment by bird0861 in "Gemini CLI tips and tricks for agentic coding"

bird0861 — Thu, 27 Nov 2025 07:12:06 +0000

I can't emphasize this enough: it doesn't matter how good a model is or what CLI I'm using, use git and chroot (at the least; a container is easier though).

Always make the agent write a plan first and save it to something like plan.md. Tell it to update the list of finished tasks in status.md as it finishes each task from plan.md, and to let you review the change before proceeding to the next task.
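The plan.md/status.md loop can be sketched roughly like this. It assumes plan.md lists tasks as markdown checkbox lines ("- [ ] task") and status.md records finished ones as "- [x] task". That file layout and the helper names are my own assumptions for illustration, not something any particular agent requires.

```python
import re

# Matches markdown checkbox lines such as "- [ ] task" or "- [x] task".
TASK_RE = re.compile(r"^\s*-\s*\[( |x|X)\]\s+(.*\S)\s*$")

def parse_tasks(text):
    """Return (done, title) pairs for every markdown checkbox line in text."""
    tasks = []
    for line in text.splitlines():
        m = TASK_RE.match(line)
        if m:
            tasks.append((m.group(1).lower() == "x", m.group(2)))
    return tasks

def next_task(plan_text, status_text):
    """First task in the plan that status does not record as finished, or None."""
    finished = {title for done, title in parse_tasks(status_text) if done}
    for _, title in parse_tasks(plan_text):
        if title not in finished:
            return title
    return None

# Hypothetical file contents; in real use these would be read from disk.
plan = """# Plan
- [ ] Add config loader
- [ ] Write tests for loader
- [ ] Wire loader into CLI
"""
status = """# Status
- [x] Add config loader
"""
print(next_task(plan, status))  # Write tests for loader
```

Something like this is handy as a sanity check between tasks: if the agent claims it is done but next_task still returns a title, you know status.md was not updated.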

New comment by bird0861 in "We ran over 600 image generations to compare AI image models"

bird0861 — Tue, 11 Nov 2025 23:55:01 +0000

Check out Mask Banana - you might have better luck with using masks to get image models to pay attention to what you want edited.

New comment by bird0861 in "We ran over 600 image generations to compare AI image models"

bird0861 — Tue, 11 Nov 2025 23:53:26 +0000

SDXL and FLUX models with LoRAs can and do vastly outperform at tons of things that single big models can't or won't do now. Various subreddits and Civitai blogs describe ComfyUI workflows and details on how to maximize LoRA effectiveness, and are probably all you need for a guided tour of that space.

This is not my special interest, but the DIY space is much more interesting than the SaaS offerings. That holds for generative AI more generally too: the DIY scene is going to be more interesting.

New comment by bird0861 in "Baby Shoggoth Is Listening"

bird0861 — Tue, 11 Nov 2025 22:50:06 +0000

Pretending 16 samples is authoritative is absolutely hilarious and wild, copium this pure could kill someone. Also, working on a codebase you already know biases results in the first place. They missed out on what has become a cornerstone of this stuff for AISWE people like me: repo tours. tree-sitter feeds the codebase to the LLM, and I get to find all the stuff in the code I care about either with a single well-formatted meta prompt or by just asking questions when I need to.
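To give a rough idea of what a repo tour feeds the model, here is a minimal sketch that builds a symbol outline for one file. Note the swap: it uses Python's built-in ast module as a stand-in for tree-sitter, so it only handles Python source, and the outline function and sample code are hypothetical.

```python
import ast

def outline(source, filename="<memory>"):
    """List classes and functions in Python source as 'kind name (line N)' strings."""
    tree = ast.parse(source, filename=filename)
    entries = []
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            entries.append((node.lineno, f"class {node.name}"))
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            entries.append((node.lineno, f"def {node.name}"))
    # Sort by line number so the outline reads in file order.
    return [f"{label} (line {lineno})" for lineno, label in sorted(entries)]

# Hypothetical source file contents.
sample = '''
class Cache:
    def get(self, key):
        return None

def main():
    pass
'''
for entry in outline(sample):
    print(entry)
```

Run over every file in a repo, an outline like this is small enough to paste into context, and the model can then ask for (or be given) the full text of only the files that matter.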

I'll concede one thing to the authors of the study: Claude Code is not that great. Everyone I know has moved on since before July. I personally am hacking on my own fork of Qwen CLI (which is itself a Gemini CLI fork) and it does most of what I want with the models of my choice, which I swap out depending on what I'm doing. Sometimes they're local on my 4090 and sometimes I use a frontier or larger open-weights model hosted somewhere else. If you're expecting a code assistant to drop in your lap so you can immediately experience all of its benefits, you'll be disappointed. This is not something anyone can offer without just prescribing a stack or workflow. You need to make it your own.

The study is about dropping just 16 people into tooling they're unfamiliar with, have no mechanical sympathy for, and aren't likely to shape and mold to their own needs.

You want conclusive evidence? Go make friends with people who hack their own tooling. Basically everyone I hang out with has extended BMAD, written their own agents.md for specific tasks, and made their own slash commands and "skills" (convenient name and PR hijacking of a common practice but whatever, thanks for MCP I guess). Literally what kind of dev are you if you're not hacking your own tools???

You got four ingredients here you have to keep in mind when thinking about this stuff: the model, the context, the prompt, and the tooling. If you're not intervening to set up the best combination of each for each workflow you are doing then you are just letting someone else determine how that workflow goes.

Universal function approximators that can speak English got invented and nobody wants to talk to them. That is not the sci-fi future I was hoping for when, back in 2014 as a young NLP practitioner learning Python for the first time, I was longing for statistical language modeling to lead to code generation.

If you can't make it work, fine, maybe it's not for you, but I would probably turn violent if you tried to take this stuff from me.