<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: musebox35</title><link>https://news.ycombinator.com/user?id=musebox35</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 18:33:07 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=musebox35" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by musebox35 in "Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable"]]></title><description><![CDATA[
<p>True enough, but that is true for all the products I buy. I do not expect to control every product I own. For some I prefer to have more control, for others I just need something that works out of the box. There is always an initial bias toward trust when you buy something; otherwise you would not spend your hard-earned money on it.<p>“Fool me once, shame on you. Fool me twice, shame on me. Fool me three times, shame on both of us.” -- S. King</p>
]]></description><pubDate>Thu, 11 Jun 2026 09:38:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48488195</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48488195</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48488195</guid></item><item><title><![CDATA[New comment by musebox35 in "Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable"]]></title><description><![CDATA[
<p>It is much more reasonable to do it in a visible / flagged way. At least you have visibility over the quality of service you get as a customer.<p>Silent treatment is a breach of trust: what you buy changes depending on the context, based on the goals of the producer. It is like your computer silently blocking ads from competitors at the hardware level, which is crazy. I think they erred on the wrong side of things due to IPO pressure.<p>At least there is competition from multiple companies. Still, it is best to have personal benchmarks for the domain you are working on to have a real evaluation of the value you get for the money/time you spend on these products. Without trust, that might be the only way forward to keep the companies honest.<p>This happens eventually in all sectors; a good magazine/website that does independent product evaluation is priceless. Sadly, the new ad-driven internet decimated those that worked great in the 90s/00s. Still, there are independent blogs that do some evaluation, and that is better than nothing.</p>
]]></description><pubDate>Thu, 11 Jun 2026 09:16:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=48488058</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48488058</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48488058</guid></item><item><title><![CDATA[New comment by musebox35 in "Cybersecurity researchers aren't happy about the guardrails on Anthropic's Fable"]]></title><description><![CDATA[
<p>I work on open source text-to-image finetuning of open source models like zimage/flux2 klein 4b and on inference time latency optimization. The moment I read about the silent treatment, I went ahead and cancelled my subscription too, since I would never know whether the models they launch will silently corrupt my output. This is totally unacceptable. There is a big difference between silent and flagged treatment if you are doing ml research but not at frontier capability.<p>This goes to show that
- All that interpretability / safety research they are doing can also be weaponized against customers (steering vectors, intent classification, ...) in the name of safety from malicious actors.
- If they deem it profitable, they might nerf the original model and its training data for ml research at a bulk scale, and then they won't even have to announce it so long as the overall benchmark score stays high enough.<p>As the IPOs get closer, they can do whatever they want to assure the investors that they have a moat that cannot be crossed by using their own products. Considering this affects all ML researchers/students at universities and smaller scale research labs, this is just "cutting the branch you are sitting on".</p>
]]></description><pubDate>Thu, 11 Jun 2026 06:34:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48486961</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48486961</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48486961</guid></item><item><title><![CDATA[New comment by musebox35 in "Rich Sutton on AI creativity and discovery"]]></title><description><![CDATA[
<p>The connection of SFT + RL to model/hypothesis search is insightful. Brute force / scalable search is also where Sutton's Bitter Lesson points. Once your search domain is small compared to your search budget, that makes a lot of sense.<p>If I get your meaning right, SFT creates the right inductive bias so that the RL search + reward guidance does the trick.<p>For novel discovery, the question might then be whether the inductive bias builds a strong enough prison so no new discovery is possible by RL or if the search can escape the boundaries set by SFT given enough randomization and the right reward function.<p>I know that RL is usually not performed at inference time, but in-context learning mechanisms might be developed by RL to discover at test time. Edit: I would love to hear if that actually happens or not, like new induction heads (<a href="https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html" rel="nofollow">https://transformer-circuits.pub/2022/in-context-learning-an...</a>) forming during RL. I really have no idea.</p>
]]></description><pubDate>Wed, 10 Jun 2026 16:31:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=48478813</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48478813</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48478813</guid></item><item><title><![CDATA[New comment by musebox35 in "Rich Sutton on AI creativity and discovery"]]></title><description><![CDATA[
<p>I understand the skepticism. I am worried about the implications of AI as well. The deeper issue at stake is that the depth of scientific knowledge has been increasing for a very long time. Now you get to have a PhD in esoteric subproblems, and that slows down research, especially if the discoveries require depth in multiple subdomains. Socially and economically, training people in every combination of subfields at the required depth may not be possible. I am especially interested in seeing two problems resolved and do not care if an AI scientist performs the discovery. It will be humbling, but totally worth it:<p>- Fusion (a clean sustainable form): Without this I think we are heading in a very wrong direction, whether it is conflict or climate change does not matter. Everyone is aware of this and instinctively afraid of the implied loss of quality+quantity of life.<p>- Cure for Cancer: It is a world wonder even in Civ. I and for good reason. As a father of a teenager, every time I hear a story of someone losing a parent/child I cringe. We have to accept this as a reality of life until a proper/generic cure is found that eliminates the most common offenders.<p>I am skeptical that we will have AGI anytime soon, and I think the social aspects will help balance the technical developments even if it becomes a reality (Three laws, A Butlerian uprising, you name it).<p>Chess bots can beat grandmasters, but I have a friend who takes his son to tournaments. Humans are still playing chess, with kids in the same tournaments as grandmasters. We have to have faith in humanity, or all else will not matter.<p>And I will definitely keep playing Factorio even if AGI comes to pass ;-)</p>
]]></description><pubDate>Wed, 10 Jun 2026 07:33:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48472786</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48472786</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48472786</guid></item><item><title><![CDATA[New comment by musebox35 in "Rich Sutton on AI creativity and discovery"]]></title><description><![CDATA[
<p>The most successful applications like coding are not the result of pure LLM/generative modeling. They come from closing the loop with an agentic harness. The generate-test-selectively refine loop is the core modality of scientific work. An LLM + RL with Verifiable Rewards + feedback from compiler/terminal runs mimics this process to a great extent.<p>This is the Fisher/Box feedback loop (<a href="https://www-sop.inria.fr/members/Ian.Jermyn/philosophy/writings/Boxonmaths.pdf" rel="nofollow">https://www-sop.inria.fr/members/Ian.Jermyn/philosophy/writi...</a>) implemented on a modern computational system. The LLM is just a component. I wish Sutton had commented on this fuller picture of what we have now instead of commenting just on the LLM/Backprop side of things. I am honestly curious whether such a loop can at least partially automate discovery.<p>There are more elements to discovery though. It is still not clear where the initial working model/hypothesis comes from or how the updates are selected (unless it is just parameter induction). I recently read about Hanson's Patterns of Discovery, which aims in that direction. I have still not read it, but I am curious if it has any mechanistic clues.</p>
]]></description><pubDate>Wed, 10 Jun 2026 06:48:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48472388</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48472388</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48472388</guid></item><item><title><![CDATA[New comment by musebox35 in "How LLMs work"]]></title><description><![CDATA[
<p>I was about to post your last point / quote. Going multigpu is relatively not so tough, but once you go multi-node you have a distributed storage/io/compute system, which is highly non-trivial. Add to that the long training times, and now you have robustness/fault-tolerance concerns with hardware failures and restarts. Today’s training systems are engineering marvels.</p>
]]></description><pubDate>Sat, 06 Jun 2026 12:21:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=48424311</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48424311</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48424311</guid></item><item><title><![CDATA[New comment by musebox35 in "Artificial intelligence is not conscious – Ted Chiang"]]></title><description><![CDATA[
<p>Not understanding the whole does not completely remove an ability to analyze. An interesting direction is individuality and having a notion of self. It is difficult to demarcate the individual for a model given how much the system prompt and the fine tunes / distills affect the behavior. So with computational intelligence in its current form, either we cannot talk of an individual or we can have a nearly infinite set of individuals corresponding to variations of the context window including the system prompt. So I do not think it can have the same kind of consciousness as biological embodied individuals. It might have something else, or maybe embodied robots will one day have a consciousness similar to the one we think we have.</p>
]]></description><pubDate>Thu, 04 Jun 2026 05:07:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=48394176</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48394176</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48394176</guid></item><item><title><![CDATA[New comment by musebox35 in "Nobody cracks open a programming book anymore"]]></title><description><![CDATA[
<p>I think the complexity issue in science and engineering has also been growing for some time beyond what can be analyzed/designed by a person or a group with conventional software and math. Wolfram argues that some processes are so complex, only a computational method can solve them. If that is the case, AI might be the only path to help us in designing and discovering novel tech / science. It might be the bicycle for the mind that Jobs envisioned.</p>
]]></description><pubDate>Tue, 26 May 2026 08:31:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=48276826</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48276826</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48276826</guid></item><item><title><![CDATA[New comment by musebox35 in "Gemini 3.5 Flash"]]></title><description><![CDATA[
<p>The cutoff date is early 2025 so make sure to enable web search when experimenting. I was expecting something more recent, took a while to notice this.</p>
]]></description><pubDate>Wed, 20 May 2026 16:56:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=48210657</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48210657</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48210657</guid></item><item><title><![CDATA[New comment by musebox35 in "Anthropic acquires Stainless"]]></title><description><![CDATA[
<p>Thanks, that sounds like a good direction to try.</p>
]]></description><pubDate>Tue, 19 May 2026 19:08:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=48197908</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48197908</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48197908</guid></item><item><title><![CDATA[New comment by musebox35 in "AI eats the world (Spring 26) [pdf]"]]></title><description><![CDATA[
<p>Most of your analysis I can easily relate to except “There is evidence that the Chinese models are falling further behind, not gaining.” Where is that evidence? Deepseekv4 claims to be trailing front runners by six months. I read people agreeing with this. I recently watched Eric Schmidt make similar comments. Is he just scaremongering? Why do you claim they are falling behind?</p>
]]></description><pubDate>Tue, 19 May 2026 10:11:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=48191434</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48191434</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48191434</guid></item><item><title><![CDATA[New comment by musebox35 in "The last six months in LLMs in five minutes"]]></title><description><![CDATA[
<p>I watched the last one, S5:E17 What jobs are AI jobs, and I think it gives the right framing to think about this. It is not prescriptive; it does not give a list, which is smart. The job title might be the same but the actual role might have a different context, so it is best to have the right frame to explore your particular situation.</p>
]]></description><pubDate>Tue, 19 May 2026 09:04:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=48190994</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48190994</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48190994</guid></item><item><title><![CDATA[New comment by musebox35 in "Anthropic acquires Stainless"]]></title><description><![CDATA[
<p>I am exploring ways to document the design for the agent to read and update. What makes it difficult is the lack of structure. Spec writing is not my core skill. Schemas and APIs are easier; there are declarative ways to document them. Runtime concepts and workflows have less structure, and writing prose seems too unstructured for my taste. Formal languages are too rigid. But I could not find a better way.</p>
]]></description><pubDate>Tue, 19 May 2026 08:57:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48190944</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48190944</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48190944</guid></item><item><title><![CDATA[New comment by musebox35 in "The last six months in LLMs in five minutes"]]></title><description><![CDATA[
<p>I totally agree. I loved coding because of its closed feedback loop. Since last November, I also delegated it mostly to agents. Now I concentrate more on the design part, which is not the same. However, you move with the times and hope something else will become exciting. I do not know a more worthwhile and satisfying way than computing to spend my work hours.</p>
]]></description><pubDate>Tue, 19 May 2026 08:37:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48190776</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48190776</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48190776</guid></item><item><title><![CDATA[New comment by musebox35 in "Anthropic acquires Stainless"]]></title><description><![CDATA[
<p>Could you briefly describe your workflow for doing that or give a pointer to a blog you wrote/like that aligns with the process? Thanks in any case, happy designing ;-)</p>
]]></description><pubDate>Tue, 19 May 2026 08:28:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=48190714</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48190714</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48190714</guid></item><item><title><![CDATA[New comment by musebox35 in "The last six months in LLMs in five minutes"]]></title><description><![CDATA[
<p>Please see Ben Evans’ podcast for a good take on this. Coding is just one of the tasks you do in your job; it is not the job, or at least it probably is not. You do not get paid to code, you get paid to make a set of decisions that create value for the company. If this is automated then yes, sadly, your salary is not justified.</p>
]]></description><pubDate>Tue, 19 May 2026 05:05:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48189371</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48189371</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48189371</guid></item><item><title><![CDATA[New comment by musebox35 in "The Emacsification of Software"]]></title><description><![CDATA[
<p>I wonder what the effect of this will be on open source software. I do not mean the technical aspects: ease of coding, documenting, explosion of PRs, ... Social aspects scare me a bit more. If everyone is building their own version, will people stick around to contribute to the same project for years? I guess the dedicated ones will do so, so maybe it is a good thing, a filtering out of the disinterested.</p>
]]></description><pubDate>Thu, 14 May 2026 10:43:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=48133542</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48133542</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48133542</guid></item><item><title><![CDATA[New comment by musebox35 in "Three Inverse Laws of AI"]]></title><description><![CDATA[
<p>Debating how not to use AI will not get anyone anywhere since negative framing almost never works with humans (it also does not work with llms). Let’s concentrate on how to build closed-loop systems that verify the llm output, how to manage context, and how to build failsafes around agentic systems. Then and only then might we start to make progress.</p>
]]></description><pubDate>Tue, 05 May 2026 17:07:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=48025358</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=48025358</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48025358</guid></item><item><title><![CDATA[New comment by musebox35 in "Specsmaxxing – On overcoming AI psychosis, and why I write specs in YAML"]]></title><description><![CDATA[
<p>Not all parts of the code are equal in this respect. Those parts pertaining to the user visible portion (API of a library, command args of a CLI, UI of a GUI/TUI app, endpoints in a web service, etc.) are closely related to the spec. The rest is more fluid as long as it does not change user visible behavior. The choices still affect maintenance and debugging costs, so there is some pressure to not YOLO these portions. I think the most difficult design decisions relate to how to separate the two and how to ensure a smooth evolution of both user facing and programmer facing design decisions.<p>What is different now is that maintainability and debugging design decisions were made w.r.t. human coders or teams in the past, which is not necessarily the case anymore. Should we just specify the API and let agents figure out the rest, or do we still want to control the rest to ensure maintenance and security? A year ago I definitely thought so. Now it is more murky, as the agents are faster browsers of codebases and can explore runtime effects faster than I can type and parse output. The strongest empirical observations depend on the runtime behavior, so they have an edge there.</p>
]]></description><pubDate>Sun, 03 May 2026 12:16:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47996179</link><dc:creator>musebox35</dc:creator><comments>https://news.ycombinator.com/item?id=47996179</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47996179</guid></item></channel></rss>