Hacker News: ibestvina

New comment by ibestvina in "Show HN: How I topped the HuggingFace open LLM leaderboard on two gaming GPUs"

ibestvina — Mon, 23 Mar 2026 22:08:53 +0000

On a related note - would it be easier, instead of doing a benchmark sweep across the whole NxN set of start-end pairs for which layers to modify, to instead measure cross-correlation between outputs of all layers? Shouldn't that produce similar results?
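A rough sketch of what I mean, in plain Python. Everything here is an assumption for illustration: `layer_outputs` stands for each layer's output flattened over some fixed probe set, the toy numbers are made up, and Pearson correlation is just one possible similarity measure. Whether the resulting matrix actually tracks the benchmark sweep is exactly the open question.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def layer_correlation_matrix(layer_outputs):
    """layer_outputs[i] is layer i's output flattened over a fixed probe set (assumed)."""
    n = len(layer_outputs)
    return [[pearson(layer_outputs[i], layer_outputs[j]) for j in range(n)]
            for i in range(n)]

# Toy data (made up): the second row is exactly 2x the first, the third is unrelated.
toy = [
    [0.1, 0.4, 0.9, 1.3],
    [0.2, 0.8, 1.8, 2.6],
    [1.0, -1.0, 1.0, -1.0],
]
corr = layer_correlation_matrix(toy)
```

This needs one forward pass over the probe set to collect the outputs, rather than one benchmark run per start-end pair.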

Churning butter while agents churn code

ibestvina — Fri, 13 Mar 2026 12:08:37 +0000

Article URL: https://ibestvina.substack.com/p/churning-butter-while-agents-churn

Comments URL: https://news.ycombinator.com/item?id=47363337

Points: 2

# Comments: 0

New comment by ibestvina in "Ask HN: How do you cope with the broken rythm of agentic coding?"

ibestvina — Thu, 12 Mar 2026 21:08:06 +0000

I think that "hovering the code" and "broken rhythm" are correlated, but still separate issues. There are parts of my code (e.g. vibe-coded mockups) that I never read and don't care about. There are others where I check everything in detail (as if I were reviewing someone's PR), and I think I have a very good grasp of that code.

But the broken rhythm problem persists regardless, and I find it becoming more and more serious as LLMs are able to work for longer and longer on their own.

It might be that what we're experiencing now is just an uncanny valley: they're not yet good enough for us to manage them the way we would other developers, but they are good enough to let us switch our attention away while they work. That freed attention is mostly wasted, though, as the time between interactions isn't enough to e.g. work on something else, or read a book.

It's a stupid analogy, but currently it's similar to having a bathroom break every couple of minutes, and if this continues, most developers will probably start doomscrolling more and more.

I was wondering recently if there are some productive activities that might fit well into this rhythm, but I haven't found any yet. I guess sourdough baking is one such example, but there's only so much bread you can eat...

New comment by ibestvina in "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

ibestvina — Mon, 16 Feb 2026 10:34:15 +0000

Exactly! I now feel bad for not thinking of that example, thank you.

New comment by ibestvina in "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?"

ibestvina — Mon, 16 Feb 2026 10:22:55 +0000

There's a whole industry of "illusions" humans fall for: optical, wordplay (including large parts of comedy), the Penn & Teller type, etc. Yet no one claims these are indicators that humans lack some critical capability.

The surface of "illusions" for LLMs is very different from our own, and it's very jagged: change a few words in the above prompt and you get very different results. Note that human illusions are very jagged too, especially in the optical and auditory domains.

No good reason to think "our human illusions" are fine, but "their AI illusions" make them useless. It's all about how we organize the workflows around these limitations.

New comment by ibestvina in "Lena by qntm (2021)"

ibestvina — Fri, 13 Feb 2026 17:41:28 +0000

This is a very "distant" suggestion if you enjoyed Antimemetics, but The Unconsoled by Kazuo Ishiguro is another one of my favourites, and it too explores this idea of unreliable and inconsistent memories, although from a completely different angle.

New comment by ibestvina in "I miss thinking hard"

ibestvina — Wed, 04 Feb 2026 12:09:01 +0000

First, that's quite a sad view of incentive structures. Second, you can't be serious in thinking that "worker worried they might be fired" puts the person in charge closer to the "materials" and more "hands on" with the project.

New comment by ibestvina in "I miss thinking hard"

ibestvina — Wed, 04 Feb 2026 09:45:20 +0000

I don't see why his involvement, explaining to his team how exactly to build a piece, is any different from a developer explaining to an LLM how to build a certain feature, when it comes to the level of "being hands on".

Obviously I am not comparing his final product with my code, I am simply pointing out how this metaphor is flawed. Having "workers" shape the material according to your plans does not reduce your agency.

New comment by ibestvina in "I miss thinking hard"

ibestvina — Wed, 04 Feb 2026 09:21:38 +0000

This makes no sense to me. There are plenty of artists out there (e.g. El Anatsui), not to mention whole professions such as architects, who do not interact directly with what they are building, and yet can have a profound relationship with the final product.

Discovering the right problem to solve is not necessarily coupled to being "hands on" with the "materials you're shaping".

New comment by ibestvina in "Best Documentaries You've Ever Seen"

ibestvina — Sun, 28 Jan 2024 00:21:24 +0000

Gates of Heaven (1978)

https://www.imdb.com/title/tt0077598/?ref_=ext_shr

New comment by ibestvina in "Ask HN: What apps have you created for your own use?"

ibestvina — Wed, 13 Dec 2023 19:54:22 +0000

I always send myself messages as bookmarks, notes, and general "here's an idea I might want to come back to later". It's an awful system when it comes to discoverability (I do it in Messenger so search is... bad), but I still do it as it's the most convenient option at the moment when I need to store that something somewhere.

So I built a simple Telegram bot which automatically stores anything I send as a text embedding into a vector database, and allows me to search over it in that same chat (same process that powers the AI Q&A assistants these days).

If I post a link, it automatically scrapes it and stores the text as chunks for better search, extracts text from YouTube videos (still wip), turns images into text with vision models, etc.

One thing I'm unhappy about is not being able to easily edit any notes I search for later, but it's miles ahead of my previous "system". Hopefully I can open source this when I clean it up - if anyone is interested, let me know.
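To make the store-and-search idea concrete, here is a minimal sketch under loud assumptions: `embed` is a bag-of-words stand-in for a real text embedding model, `NoteStore` is a hypothetical in-memory replacement for the vector database, and none of the Telegram plumbing is shown. It is not the bot's actual code.

```python
import math
import re
from collections import Counter

def embed(text):
    """Stand-in embedding: word-count vector. A real bot would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class NoteStore:
    """Hypothetical in-memory stand-in for the vector database."""

    def __init__(self):
        self.notes = []  # list of (text, vector) pairs

    def add(self, text):
        self.notes.append((text, embed(text)))

    def search(self, query, k=3):
        """Return the k stored notes most similar to the query."""
        q = embed(query)
        ranked = sorted(self.notes, key=lambda n: cosine(q, n[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = NoteStore()
store.add("Idea: sourdough starter feeding schedule")
store.add("Bookmark: article on vector databases and embeddings")
store.add("Todo: renew passport before summer")
top = store.search("vector databases article", k=1)
```

Every message goes through `add`, and a search message goes through `search`, so there is never a "where does this go?" step.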

New comment by ibestvina in "A new old kind of R&D lab"

ibestvina — Tue, 12 Dec 2023 19:00:00 +0000

How do you look at hiring "experienced people" vs. "enthusiastic interns" on something like this? More generally, how quickly do you think the team will grow, and what should the ratio be between the "old" and the "young"?

New comment by ibestvina in "Made an app that summarizes recent popular stories from Hacker News"

ibestvina — Thu, 16 Nov 2023 22:28:12 +0000

Could you give an example of content where such critical distillation would be useful to you?

New comment by ibestvina in "Ask HN: What are you passionate about at the moment?"

ibestvina — Mon, 06 Nov 2023 23:03:59 +0000

Interested in this as well - do you already work on this somewhere out in the open?

New comment by ibestvina in "Show HN: For people who takes notes on WhatsApp and Telegram"

ibestvina — Wed, 11 Oct 2023 22:37:53 +0000

As someone who daily sends messages to myself as a form of notetaking, for everything from todos and shopping lists to ideas and bookmarks, I've thought about this a lot. The only conclusions I came to are that I keep using messages because (1) they are convenient and synced across devices, (2) the app is already installed and frequently opened so it rarely needs to load up, making notetaking faster, and (3) I do not have to think about organization.

I believe the last point is the most important, and is where your app is misaligned with my reasons for using messages - you seem to have separate folders and notes, prompting me to think "where does this go?" before writing, which is something I really want to avoid.

New comment by ibestvina in "Show HN: KraspAI Kompass – keep up with new LLMs"

ibestvina — Tue, 12 Sep 2023 08:45:52 +0000

Great work! Any plans to open up an API?

New comment by ibestvina in "Show HN: Marsha – An LLM-Based Programming Language"

ibestvina — Thu, 27 Jul 2023 10:02:25 +0000

I'm always drawn to these types of initiatives, and your whitepaper looks (at least with my very limited knowledge of the domain) interesting.

What I am always wondering, and maybe you can give some details here, is the following: isn't the fact that regulations are in natural language, with all its ambiguity, a necessary requirement to have the system operate without being fully specified?

In other words, wouldn't any kind of strict DSL force us to think through all the edge cases that might possibly arise, instead of dealing with them when they do arise, which is basically what the judiciary is for? And isn't that a price too high to implement these kinds of systems?

New comment by ibestvina in "Show HN: Danswer – Open-source question answering across all your docs"

ibestvina — Tue, 11 Jul 2023 07:32:12 +0000

So if the product from OP used Azure OpenAI, it would be okay? You say "companies happily pay up", but the pricing is exactly the same (source: my company is paying for both APIs).

It's been quite clear for some time that, between OAI and MS, they very neatly split their market: OAI handles the early development and direct customers, and MS handles enterprises. It would require OAI to be a much bigger company than it is right now to properly handle enterprises, and MS already has all that infrastructure (legal, support, etc.). Seems like a sensible setup to me, and I don't see the need for enterprises to run open source models themselves (in this context - of course I see the value in all the other respects about lock-in and specialization), especially if they are already on Azure.

New comment by ibestvina in "Pandas AI"

ibestvina — Thu, 25 May 2023 20:27:27 +0000

That (LLM -> SQL -> Pandas using PandaSQL) was my approach with Datasloth [1] some time ago. It then grew into Qluent [2] which I now work on full time, along the lines of what the sibling comment wrote about managers being able to analyze their data without coding.

[1] https://github.com/ibestvina/datasloth

[2] https://qluent.com/
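For anyone curious what the question -> SQL -> result pipeline looks like in miniature, here is a hedged sketch. It is not Datasloth's code: `fake_llm_to_sql` is a hypothetical stub that returns canned SQL instead of calling a model, the `sales` table is made up, and the standard-library `sqlite3` module is swapped in for Pandas plus PandaSQL to keep it dependency-free.

```python
import sqlite3

def fake_llm_to_sql(question):
    """Hypothetical stand-in for the LLM call: canned SQL for one known question."""
    canned = {
        "total revenue per region": (
            "SELECT region, SUM(revenue) AS total FROM sales "
            "GROUP BY region ORDER BY region"
        ),
    }
    return canned[question]

def answer(question, rows):
    """Load rows into an in-memory table, then run the generated SQL against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    sql = fake_llm_to_sql(question)
    result = conn.execute(sql).fetchall()
    conn.close()
    return result

# Made-up example data.
rows = [("EU", 10.0), ("US", 5.0), ("EU", 2.5)]
result = answer("total revenue per region", rows)
```

In a real setup the stub would be replaced by a model prompted with the table schema, and the generated SQL would need checking before it runs on real data.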

New comment by ibestvina in "Clifford Stoll beat the Russians, then made useless, wondrous objects (2016)"

ibestvina — Tue, 21 Mar 2023 09:40:18 +0000

Thank you for the information. I'm sure there are a lot of people who'd be interested in the full 1 hour talk, if you ever find time to film it.