<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ycombiredd</title><link>https://news.ycombinator.com/user?id=ycombiredd</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 18 Apr 2026 20:54:59 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ycombiredd" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ycombiredd in "Apideck CLI – An AI-agent interface with much lower context consumption than MCP"]]></title><description><![CDATA[
<p>What's interesting to me is that while it was obvious to all of us who came up thinking in the Unix Way that the CLI is a great fit for LLM tool use - composability, usage discoverability, and the gobs of documentation in posts and man pages that are hugely represented in training corpora - acknowledging this seems to be only a recent trend (and also the next hype wave, perhaps).<p>Also interesting: while the big vendors are following this trend and now trying to take the lead in it, they still suggest things like "but use a JSON schema". The linked article does a bit of the same. It acknowledges that incremental learning via `--help` is useful AND can be token-conserving (the exception being that if the model already "knows" the correct pattern, it doesn't need to spend tokens learning it, so there is a potential trade-off), but it also suggests that LLMs would prefer to receive argument knowledge as JSON rather than in plain language, even though the entire point of an LLM is to understand and produce plain language. That seemed dubious to me, and a part of me wondered whether the advice may be nonsense motivated by a desire to sell more token use. I'm only partially kidding, and I'm still dubious of the efficacy.<p>* Here's a TL;DR for anyone who wants to skip the rest of this long message: I ran an LLM CLI eval in the form of a constructed CTF. Results and methodology are in the two links in the linked section: 
<a href="https://github.com/scottvr/jelp?tab=readme-ov-file#what-else" rel="nofollow">https://github.com/scottvr/jelp?tab=readme-ov-file#what-else</a><p>Anyhow... I had been experimenting with the idea of having --help output JSON when used by a machine, and came up with a simple module that exposes `--help` content as JSON, simply by adding a `--jelp` argument to any tool that already uses argparse.<p>In the process, I started testing to see whether all this extra machine-readable content actually improved performance, what it did to token use, etc. While I was building out the tests, trying to settle on legitimate and fair ways to reach valid conclusions, I learned of the OpenCLI schema draft, so I altered my `jelp` output to fit that schema and set about documenting the things I found lacking in the draft, meanwhile settling on including those arg-related items as metadata in the output.<p>I'll get to the point. I just finished cleaning the output up enough to put it in a public repo, because my intent is to share my findings with the OpenCLI folks, in hopes that they'll consider the gaps between their schema and what's commonly in use. But what came as a secondary thought in service of this little tool I called "jelp" is a benchmarking harness (and the first publishable results from it) that, to me, are quite interesting. I'd be happy if others found them interesting too and added to the existing results with additional runs, models, ideas for the harness, criticism of the method's validity, etc.<p>The evaluation harness uses constructed CLI fixtures arranged as little CLI CTFs, where the LLMs demonstrate their ability to use an unknown CLI by capturing a "flag" that they'll need to discover via the usage help and a trail of learned arguments.<p>My findings at first confirmed my intuitions, which was disappointing but unsurprising. 
When testing with GPT-4.1-mini, no manner of forcing it to receive info about the CLI via JSON was more effective than just letting it use the human-friendly plain-English output of --help, and in all cases the JSON versions burned more tokens. I was able to elicit better performance by some measurements from 5.1-mini, but again the trade-off was higher token burn.<p>I'll link straight to the part of the README that shows one table of results and contains links to the LLM CLI CTF part of the repo, as well as the generated report from the phase-1 runs; all the code to reproduce or run your own variation is there (as is the code for the jelp module, if there's any interest, but it's the CLI CTF eval that I expect is more interesting to most).<p><a href="https://github.com/scottvr/jelp?tab=readme-ov-file#what-else" rel="nofollow">https://github.com/scottvr/jelp?tab=readme-ov-file#what-else</a></p>
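<p>For anyone curious what the `--jelp` idea looks like mechanically, here's a minimal sketch (my own illustration, not the actual jelp code; the field names are invented here and are not the OpenCLI schema) of serializing an argparse parser's options to JSON:

```python
import argparse
import json

def parser_to_json(parser: argparse.ArgumentParser) -> str:
    """Serialize an argparse parser's options to JSON -- a rough sketch
    of what a `--jelp`-style flag might emit. Field names are invented
    for illustration, not taken from jelp or the OpenCLI draft."""
    spec = {
        "prog": parser.prog,
        "description": parser.description,
        "arguments": [
            {
                # positionals have no option strings; fall back to dest
                "flags": action.option_strings or [action.dest],
                "help": action.help,
                "required": getattr(action, "required", False),
                "default": None if action.default is argparse.SUPPRESS else action.default,
                "choices": list(action.choices) if action.choices else None,
            }
            for action in parser._actions
        ],
    }
    return json.dumps(spec, indent=2, default=str)

# Hypothetical tool, just to exercise the function:
parser = argparse.ArgumentParser(prog="greet", description="Say hello.")
parser.add_argument("name", help="who to greet")
parser.add_argument("--shout", action="store_true", help="uppercase the greeting")
print(parser_to_json(parser))
```

The benchmarking question then becomes whether handing a model this JSON beats handing it the plain `--help` text, token-for-token.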
]]></description><pubDate>Wed, 18 Mar 2026 08:28:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=47423111</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=47423111</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47423111</guid></item><item><title><![CDATA[New comment by ycombiredd in "Forget Flags and Scripts: Just Rename the File"]]></title><description><![CDATA[
<p>This just gave me a flashback to something I made a long time ago: a tool to create a file that was a named pipe, the contents of which were determined by the command in its filename. If I remember correctly (and its embedded man page would seem to validate this memory), the primary impetus for making it was to have dynamically generated file content, for the purpose of enabling remote process execution via server daemons that did not explicitly allow for it - finger, etc. - and were intended only to read a specific static file.<p><a href="https://web.archive.org/web/19991109163128/http://www.dfw.net:80/~scottvr/program-file" rel="nofollow">https://web.archive.org/web/19991109163128/http://www.dfw.ne...</a><p>Using named pipes in this manner also enabled a hackish method of creating server-side dynamic web content by symlinking index.html to a file created this way, which was a secondary motivator. It seems kinda quaint and funny now, but at the time it wasn't long after we'd finally decommed our gopher server, so fingerd was still a thing, Apache was fairly new, and I may still have been trying to convince management that the right move was not from NCSA httpd to Netscape Enterprise Server, but to Apache+mod_ssl. RSA patent licensing may still have been a thing too. Stronghold vaguely comes to mind, but I digress.<p>Yeah, programs that do stuff based on filename, like busybox. Oh, and this long-forgotten artifact the article just reminded me of, which I managed to find in the Wayback Machine: a tool to mknod a named pipe on a SunOS 4.1.4 machine, to get server-side dynamic content when remotely accessing a daemon that was supposed to return content from a single static file. Ah, memories.</p>
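<p>The trick reduces to something like this minimal sketch (a from-memory reconstruction in Python on POSIX, not the original tool; the filename and command here are made up): a FIFO sits where the daemon expects a static file, and a writer process regenerates the content by running a command whenever a reader opens the pipe.

```python
import os
import subprocess
import tempfile
import threading

def serve_once(fifo_path, argv):
    """Run the command, then open the FIFO for writing (which blocks
    until a reader appears) and stream the command's stdout into it.
    A real daemon would loop forever, regenerating per read."""
    out = subprocess.run(argv, capture_output=True, text=True).stdout
    with open(fifo_path, "w") as fifo:
        fifo.write(out)

tmpdir = tempfile.mkdtemp()
fifo_path = os.path.join(tmpdir, "plan.txt")  # hypothetical filename
os.mkfifo(fifo_path)

# Writer thread stands in for the background tool.
t = threading.Thread(target=serve_once,
                     args=(fifo_path, ["echo", "dynamic content"]))
t.start()

# Any reader -- fingerd, httpd, or plain cat -- sees freshly generated output.
with open(fifo_path) as f:
    content = f.read()
t.join()
print(content.strip())  # → dynamic content
```

Point a daemon that only reads a static file at the FIFO (or symlink index.html to it) and it unwittingly serves dynamic content.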
]]></description><pubDate>Wed, 18 Mar 2026 06:48:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47422335</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=47422335</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47422335</guid></item><item><title><![CDATA[New comment by ycombiredd in "A new Oracle Solaris Common Build Environment (CBE) release"]]></title><description><![CDATA[
<p>As a former Sun sysadmin/netadmin (from SunOS 4.1.4 days), I vaguely remember the Solaris releases after 2.5.1, up to maybe another re-versioning/branding called Solaris 7? And then I stopped paying attention after Oracle absorbed it. I was honestly surprised enough by this headline to click TFA, simply because I did not think Solaris even existed anymore.</p>
]]></description><pubDate>Tue, 10 Mar 2026 22:36:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=47329625</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=47329625</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47329625</guid></item><item><title><![CDATA[New comment by ycombiredd in "Drosophila Fly Brain Emulation"]]></title><description><![CDATA[
<p>I apologize in advance, but this is my one pre-existing contribution to the world that mentions mapping of Drosophila brains, and having just used it in its intended "copypasta" role this morning, I was excited to coincidentally see this Emulation link on HN.<p><a href="https://gist.github.com/scottvr/f968b65bedf7a4635a4cdd643628dd05" rel="nofollow">https://gist.github.com/scottvr/f968b65bedf7a4635a4cdd643628...</a><p>It is ludicrously escalating grotesque sci-fi (intended for sending over SMS), so you can safely skip it if that's not your cup of tea.</p>
]]></description><pubDate>Tue, 10 Mar 2026 16:40:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47325656</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=47325656</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47325656</guid></item><item><title><![CDATA[New comment by ycombiredd in "Show HN: Mount any OpenAPI/Swagger API (or non-API JSON) as a local filesystem"]]></title><description><![CDATA[
<p>If this sounds interesting and you have a moment and a favorite API, I'd appreciate your experience testing it out, or hearing whether the README needs more detail, etc.<p>I have been solely using an old x86 Darwin MacBook Air for this, so that's the extent of the platform(s) tested. (Writing this made me realize I might want to document the process of installing the FUSE driver on a Mac, but I do link to the macFUSE webpage, which is probably more broadly useful than my experience on this dated laptop.)<p>Anyway, I actually went searching before posting because it hadn't occurred to me that this might have already been done somewhere else (yeah, one might think searching for an existing solution would be a <i>first</i> step...) and was happy to see that it doesn't seem to have been, while also being a bit surprised at how many spiritually-related FUSE implementations have been created since I last had occasion to do anything with FUSE. If I have done this correctly, very specific niche FUSE implementations may become unneeded, as my hope is that apifusefs can handle any Swagger-type, OpenAPI-spec API.<p>(Eventually, anyway. For example, there exists a massive openapi.json for the GitHub API, but due to the way the root endpoint refers to other endpoints, each with its own context and auth requirements, this initial release of apifusefs isn't magic for api.github.com/ even with the spec file. I had to mount specific endpoints by the URL given in the response to "GET /", to varying degrees of success or failure, which is what caused me to add the --json-file mode, so I could just redirect the output of a curl request to a file and test with that.)<p>That said, it does now support a variety of ways to pass authentication tokens, so my hope is that if anyone here has an API to try it out against, it will work for you without hassle.</p>
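<p>To illustrate the core mapping idea (just a sketch of the concept, not apifusefs's actual internals or API): dicts become directories, list indices become entry names, and scalars become file contents.

```python
import json

def json_to_tree(node, prefix=""):
    """Flatten a JSON document into (path, leaf-value) pairs, the way a
    FUSE layer might expose dicts as directories and scalars as files.
    Illustrative only -- not how apifusefs is actually implemented."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        # list indices become directory entries: /repos/0, /repos/1, ...
        items = ((str(i), v) for i, v in enumerate(node))
    else:
        yield prefix or "/", node  # scalar leaf -> "file" content
        return
    for key, value in items:
        yield from json_to_tree(value, f"{prefix}/{key}")

# A tiny, made-up API response, for demonstration:
doc = json.loads('{"user": {"login": "octocat", "repos": [{"name": "hello"}]}}')
for path, value in json_to_tree(doc):
    print(path, "->", value)
# /user/login -> octocat
# /user/repos/0/name -> hello
```

Once the data looks like that, shell muscle memory (ls, cat, for loops, grep) works on API responses for free.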
]]></description><pubDate>Thu, 05 Mar 2026 02:23:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47256696</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=47256696</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47256696</guid></item><item><title><![CDATA[Show HN: Mount any OpenAPI/Swagger API (or non-API JSON) as a local filesystem]]></title><description><![CDATA[
<p>APIFUSEfs. It does what the Show HN title says it does. (I had originally named it "apifuse", but found there is a SaaS by that name, so I renamed the repo apifusefs; I still need to rename it internally and in the docs. There is no association between apifusefs and the SaaS known as "apifuse".)<p>Requires libFUSE (macOS/Linux) and Python. I made it because I am a CLI kind of guy who lives in a terminal, so I wanted to be able to fall back on muscle-memory shell instincts (for loops, piping commands and I/O, etc.) without having to do a bunch of curling, browsing, Postman, etc.<p>It occurs to me that, with all the hype lately about AI agent tool use, it could be useful for that purpose as well, since an agent would not need any special skill for it; the data just becomes navigable and readable like any file in a directory tree.<p>I'd be interested in hearing if you find this useful, dumb, broken, lacking an obvious feature, or anything else you might have to say about it.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47256567">https://news.ycombinator.com/item?id=47256567</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 05 Mar 2026 02:03:27 +0000</pubDate><link>https://github.com/scottvr/apifusefs/blob/main/README.md</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=47256567</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47256567</guid></item><item><title><![CDATA[Show HN: A GFM+GF-MathJax/Latex HTML formatting adventure]]></title><description><![CDATA[
<p>I think this is apropos of the "Show HN" tag, as the post is explanatory, and the entire codebase behind the little side-story use case discussed in TFA is in the repo and free to use. (I'd be pleased if you did!)<p>In the post, as I tried to capture in the submitted title, I outline my journey of exploration after I became determined to make GitHub-Flavored Markdown display my text with the color, style and alignment of my choosing. As I discovered after setting out, the inability to do such a thing outside of fenced blocks with pre-defined syntax highlighting is a well-known condition, met with a "works as intended" response because, well, GitHub doesn't want their repos looking like MySpace or Geocities, or presenting security risk exposure by allowing arbitrary HTML/CSS styling. Sure, I <i>should</i> have used GitHub Pages to build a page from my Markdown using Jekyll, which is a supported way to control the styling of your own documents in your repo, but where's the fun in that?<p>The linked post documents the workaround I arrived at, which became an output target format that nobody ever asked for from my ASCII line-art diagramming tool. I thought some here might appreciate the documentation of "wasting my time so you don't have to" on a technical solution to a problem I probably should have just stopped caring about and moved on from.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47203584">https://news.ycombinator.com/item?id=47203584</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Sun, 01 Mar 2026 04:05:24 +0000</pubDate><link>https://github.com/scottvr/phart/blob/main/docs/GHM-LATEX.md</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=47203584</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47203584</guid></item><item><title><![CDATA[New comment by ycombiredd in "Show HN: Multimodal perception system for real-time conversation"]]></title><description><![CDATA[
<p>"It's a test - designed to provoke an emotional response. "<p>I was going to follow this with something like "except the role of analyzing the emotional response is reversed", and then I wanted to expound with an "ooh but.. wait, there's another metaphor here since ..." but thought I've already potentially approached "spoiler alert" territory so I'll just stop there.  Those who know the reference I am replying to will know; those who don't, well, don't google any of this or its parent cuz <i>spoiler alert</i></p>
]]></description><pubDate>Wed, 11 Feb 2026 12:44:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=46974261</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46974261</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46974261</guid></item><item><title><![CDATA[New comment by ycombiredd in "Show HN: Multimodal perception system for real-time conversation"]]></title><description><![CDATA[
<p>You've caused me to have an additional thought on the topic. As much as I expressed a sense of dread at the inevitable use of this sort of tech in hiring pipelines (not by agents, necessarily - a sort of HUD overlay on a video call between humans was my initially envisioned use case), I suppose that just as the AI interviewer bots I have thus far refused to engage with will inevitably be unavoidable if one is on the job hunt, so will the use of this sort of multi-modal sentiment analysis. (Same with the justice-system use case you referenced in your metaphor, and probably therapists and such will follow.)<p>As such, I wish you the best of luck with this project - earnestly so - because if, as I suggest, it is inevitable... we want such a system to be as good as possible.<p>An aside: another inevitable use case just came to mind - the cheap, shoddily implemented and poorly tested (along with insecure, surveillance-adjacent) <i>kids' toys with embedded AI</i>, and the sardonically-humorous privacy mishaps and unintended actions from such low-quality toys being sold (see: the LLM-enabled kids' toys currently popping up routinely at retailers). Ha! Sorry I keep taking your cool demo to dystopian extremes. :)<p>Oh, one more thing... Upon re-reading my previous comment, I recognize that describing my visceral reaction as one of being "repulsed by the thought" could literally be read as me calling your system "repulsive", which was not my intent. I think your tech is cool; I was just trying to convey two conflicting feelings that occurred to me when thinking about future commercial use cases. I hope your system works great, so that if it does find market fit with such use cases - well, if it's inevitable, as the last few years of "LLMs everywhere!" have forced us all to adapt (accept it or reject it, it still requires new effort) - we should hope for a good and working system, and I hope you succeed in making one.<p>Lastly, to your self-driving/potholes analogy... I think that fits more in line with my "objective CV classification" category; a closer fit to what you're building would be "self-driving car having to handle the Trolley Problem", with the nuances of human value judgements, etc.: does the car swerve into two adults vs one child? And so on. Pothole classification is fairly objective, while driving into it, swerving to avoid it, classifying pedestrians and choosing one to possibly collide with, etc. are subjective and more complicated (as is your system and the functions it can perform).<p>Best of luck!</p>
]]></description><pubDate>Wed, 11 Feb 2026 12:41:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46974224</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46974224</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46974224</guid></item><item><title><![CDATA[New comment by ycombiredd in "Show HN: Multimodal perception system for real-time conversation"]]></title><description><![CDATA[
<p>Hmm... My first thought is: great, now not only will e.g. HR/screening/hiring hand off reading and discerning tasks to an ML model, they'll outsource the things that require any sort of emotional understanding (compassion, stress, anxiety, social awkwardness, etc.) to a model too.<p>One part of me tends to think "good, take some subjectivity away from a human with poor social skills", but another part of me is repulsed by the concept, because we see how otherwise capable humans will defer to the perceived "expertise" of an LLM, or out of laziness (see recent kerfuffles in the legal field over hallucinated citations, etc.)<p>Objective classification in CV is one thing, but subjective identification (psychology, pseudoscientific forensic sociology, etc.) via a multi-modal model triggers a sort of danger warning in me as an initial reaction.<p>Neat work, though, from a technical standpoint.</p>
]]></description><pubDate>Tue, 10 Feb 2026 22:36:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46967953</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46967953</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46967953</guid></item><item><title><![CDATA[New comment by ycombiredd in "GitHub Actions is slowly killing engineering teams"]]></title><description><![CDATA[
<p>I remember setting up CruiseControl when I was at a J2EE shop. That and Mantis, but I don't remember which came first.</p>
]]></description><pubDate>Tue, 10 Feb 2026 20:37:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46966530</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46966530</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46966530</guid></item><item><title><![CDATA[New comment by ycombiredd in "AT&T, Verizon blocking release of Salt Typhoon security assessment reports"]]></title><description><![CDATA[
<p>"Lawful Intercept".<p>Some may find this interesting: 
<a href="https://www.fcc.gov/calea" rel="nofollow">https://www.fcc.gov/calea</a></p>
]]></description><pubDate>Mon, 09 Feb 2026 23:16:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46952975</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46952975</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46952975</guid></item><item><title><![CDATA[New comment by ycombiredd in "DNS Explained – How Domain Names Get Resolved"]]></title><description><![CDATA[
<p>It might be worth mentioning the concept of a "stub resolver" and clarifying that a nameserver <i>is</i> a resolver. That might be pedantic, but it seemed worth clarifying that the conceptual difference may just be what, if anything, the particular DNS server answering the query is authoritative for.<p>One other thing that might be worth a mention is the OS resolver and "suffix search order", with an example of connecting (https, ping, ssh, whatever protocol) to a host using just the hostname, and the aforementioned mechanism that (probably) lets this connect to the FQDN you want. (Also, now that I type that: do you mention "FQDN" at all? If not, maybe you should.)<p>On that note, one final thought: the error/confound that may occur if a hostname is entered and is not resolved, but <i>does</i> resolve with one of the domain suffixes attached on a retry (which can be particularly confusing with a typo coupled with a wildcard A record in a domain, for example). I recognize that the lines that look like DNS records are not explicitly stated to be in the format of any particular DNS server software, and even if they were, they're snippets without larger context, so we don't know what the $ORIGIN for the zone might be. An adjacent concept you might want to explore, even if just for your own edification, is the effect of a terminating "." at the end of a hostname, either at resolution or configuration time.<p>Just offering feedback that might help you add to the article.</p>
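<p>To make the suffix-search behavior concrete, here's a rough sketch of the candidate-name logic a stub resolver applies (simplified, not glibc's exact algorithm; the domain names are made up for illustration):

```python
def resolution_candidates(name, search_domains, ndots=1):
    """Mimic the stub resolver's suffix-search behavior (resolv.conf
    `search` + `ndots`, roughly): a trailing dot means fully qualified,
    so try as-is only; otherwise, names with fewer than `ndots` dots
    get the search suffixes tried first, the bare name last."""
    if name.endswith("."):
        return [name[:-1]]  # explicit FQDN: no suffixing at all
    candidates = []
    if name.count(".") >= ndots:
        candidates.append(name)  # dotted enough: try as-is first
    candidates += [f"{name}.{d}" for d in search_domains]
    if name.count(".") < ndots:
        candidates.append(name)  # bare name tried last
    return candidates

print(resolution_candidates("db", ["corp.example.com", "example.com"]))
# → ['db.corp.example.com', 'db.example.com', 'db']
print(resolution_candidates("www.example.com.", ["corp.example.com"]))
# → ['www.example.com']
```

The typo-plus-wildcard confound falls straight out of this: a mistyped bare hostname fails as-is, then matches a wildcard A record under one of the search suffixes on retry.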
]]></description><pubDate>Sat, 07 Feb 2026 19:13:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46926620</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46926620</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46926620</guid></item><item><title><![CDATA[New comment by ycombiredd in "GitHub Actions is slowly killing engineering teams"]]></title><description><![CDATA[
<p>I don't care if this is an advertisement for buildkite masquerading as a blog post or if this is just an honest rant. Either way, I gotta say it speaks a lot of truth.</p>
]]></description><pubDate>Fri, 06 Feb 2026 08:55:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46910565</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46910565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46910565</guid></item><item><title><![CDATA[New comment by ycombiredd in "Mermaid ASCII: Render Mermaid diagrams in your terminal"]]></title><description><![CDATA[
<p>Tangentially related: I once wanted to render a NetworkX DAG in ASCII, and created phart to do so.<p>There's an example of a fairly complicated graph of chess grandmaster PGN data, taken from a matplotlib example on the NetworkX documentation website, among some more trivial output examples in the README at <a href="https://github.com/scottvr/phart/blob/main/README.md#examples" rel="nofollow">https://github.com/scottvr/phart/blob/main/README.md#example...</a><p>(You will need to expand the examples by tapping/clicking the rightward-facing triangle under "Examples", so that it rotates to face downward and the hidden content is displayed.)</p>
]]></description><pubDate>Thu, 29 Jan 2026 14:06:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=46810371</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46810371</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46810371</guid></item><item><title><![CDATA[New comment by ycombiredd in "What came first: the CNAME or the A record?"]]></title><description><![CDATA[
<p>Yes. This type of behavior was what I was referring to in an earlier comment mentioning flashbacks to logs from named filled with "cannot have cname and other data", and slapping my forehead asking "who keeps doing this?", in the days when editing zone files by hand was the norm. And then, of course, having repeats of this feeling as tools were built, automations became increasingly common, and large service providers "standardized" interfaces (ostensibly to ensure correctness) allowing or even encouraging the creation of bad zone configurations.<p>The more things change, the more things stay the same. :-)</p>
]]></description><pubDate>Tue, 20 Jan 2026 05:14:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46688163</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46688163</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46688163</guid></item><item><title><![CDATA[New comment by ycombiredd in "What came first: the CNAME or the A record?"]]></title><description><![CDATA[
<p>You just caused flashbacks to error messages from BIND of the sort "cannot have CNAME and other data", from this proximate cause, and to having to explain the problem many, many times. Confusion and ambiguity have existed forever among people creating domain RRs, whether by editing files or via the automated, more machined equivalents.<p>Relatedly, the phrase "CNAME chains" brings vague memories of confusion surrounding the concept of "CNAME" and the casual use of the term "alias". Without re-reading RFC 1034 today, I recall my understanding back in the day was that the "C" was for "canonical", and that the record the CNAME resolved to must itself have an A record and not be another CNAME - and I acknowledge the already-discussed point that my "must" is doing a lot of lifting there, since the RFC in question predates the normative-language keywords of RFC 2119.<p>So, I don't remember exactly the point I was trying to get at with my second paragraph; maybe that there have always been various failure modes due to varying interpretations, which have only compounded with age, new blood, and the non-standard language used in providers' self-serve DNS interfaces - which I suppose only strengthens the "ambiguity" claim. That doesn't excuse such a large critical service provider, though, at all.</p>
]]></description><pubDate>Tue, 20 Jan 2026 05:06:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=46688099</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46688099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46688099</guid></item><item><title><![CDATA[New comment by ycombiredd in "Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity"]]></title><description><![CDATA[
<p>So, I posted this link. I actually did so assuming it had likely already been submitted, and I wanted to discuss it with people more qualified and educated in the subject than I am. The authors of this paper are definitely more qualified to publish such a paper than I am; I'm not an ML scientist and I am not trying to pose as one. The paper made me feel a sort of way and raised a bunch of questions I didn't find answers to in the paper - though, as I'm willing to suppose, maybe I'm not even qualified to <i>read</i> such a paper. I considered messaging the authors someplace like Twitter, or in review/feedback on the arXiv submission (which I probably don't have access to do with my user anyway, but I digress). I decided that might make me seem like a hostile critic, or, maybe more likely, I'd just come off as an unqualified idiot.<p>So... HN quickly came to mind as a place where I can share a thought and a considered opinion, and ask questions, with the potential to have them answered by very smart and knowledgeable folks on neutral ground. If you've made it this far into my comment, I already appreciate you. :)<p>OK, so... I've already disclaimed any authority, so I will get to my point and see what you can tell me. I read the paper (it is 80+ pages, so admittedly I skimmed some math, but I also re-read some passages to feel more certain that I understood what they are saying).<p>I understand the phenomenon, and have no reason to doubt anything they put in the paper. But, as I mentioned, while reading it I had some intangible gut "feelings" that seeing they have math to back what they're saying could not resolve for me. Maybe this is just because I don't understand the proofs. Still, I realized when I stopped reading that it actually wasn't anything they said; it was what, to my naive brain, seemed <i>not</i> to be said, and I felt like it should have been.<p>I'll try to get to the point. 
I completely buy that reframing prompts can reduce mode collapse. But, as I understand it, the chat interface in front of any tested LLM's backend API has no insight into logits, probabilities, etc. The parameters passed in the request, and the probabilities returned with the generations (if asked for via the API), do not leak into the chat conversation context in any way. So when you prompt an LLM to return a probability, it responds with, essentially, the language <i>about</i> probabilities it learned during training, and it seems rather unlikely that many training datasets contain actual factual information about their own contents' distributions for the model to "learn" any useful probabilistic information about its own training data during training or RLHF.<p>So, a part of the paper I re-read more than once says at one point (in 4.2): "Our method is training-free, model-agnostic, and requires no logit access." This statement is unequivocally true and honest, but - and I'm not trying to be rude or mean; I just feel like there is something subtle I'm missing or misunderstanding - said another way, that statement could also be true and honest if it said "Our method <i>has</i> no logit access, because the chat interface isn't designed that way". And here's what immediately follows in my mind: "the model learned how humans write about probabilities and will output a number that may be near to (or far from) the actual probability of the token/word/sentence/whathaveyou, and we observed that if you prompt the model in a way that causes it to output a number that looks like a probability (some digits, a decimal somewhere), along with the requested five jokes, it has an effect on the 'creativity' of the list of five jokes it gives you."<p>So, naturally, one wonders what actual correlation, if any, there is between the numbers the LLM generates as "hallucinated" probabilities for the jokes (I'm not trying to use the word in a loaded way; it's just the term everyone understands for this meaning, with no sentiment behind my usage) and the actual probabilities thereof. I did see that they measured empirical frequencies of generated answers across runs and compared that empirical histogram to a proxy pretraining distribution, and they clearly state that they did no comparison or correlation against the "probabilities" output by the model. So, without continuing to belabor that point, this is probably core to my confusion about the <i>framing</i> of what the paper says the phenomenon indicates.<p>It is hard for me to stop asking all the slight variations on the questions that led me to write this, but I will stop and try to get to a TL;DR that I think dear HN readers may appreciate more than my exposition of befuddlement bordering on dubiousness:<p>I guess the TL;DR of my comment is that I am curious whether the authors examined any relationship between the LLM's verbalized "probabilities" and actual model sampling likelihoods (logprobs or selection frequency). I am not convinced that the verbalized "probabilities" themselves are doing any work other than functioning as token noise or prompt reframing.<p>I didn't see a control for, or even a comparison against, multi-slot prompts with arbitrary labels or non-semantic "decorative" annotation. In my experience poking and prodding LLMs as a user, desiring to influence generations in specific and sometimes unknown ways, even lightweight slotting without probability language substantially reduces repetition, which makes me wonder how much of the gain from VS is attributable to task reframing as opposed to the probability verbalization itself.<p>This may not even be a topic of interest for anyone, and maybe nobody will even see my comment/questions, so I'll stop for now... 
but if anyone has insights, clarifications, or can point out where I'm being dense, I actually have quite a bit more to say and ask about this paper.<p>I can't really explain why I just <i>had</i> to see if I could get another insightful opinion on this paper (I usually don't have such a strong reaction when reading academic papers I may not fully understand, but there's some gap in my knowledge (or less likely, there's something off about the framing of the phenomenon described), and it's causing me to really hope for discussion, so I can ask my perhaps even less-qualified questions pertaining to what boils down to mostly just my intuition (or maybe incomprehension. Heh.)<p>Thanks so much if you've read this and even more if you can talk to me about what I've used too many words to try to convey here.</p>
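For concreteness, the check I'm wishing for could look something like the sketch below. Everything in it is invented for illustration (the joke labels, the counts, and the verbalized numbers); a real run would use the model's actual repeated generations and/or API logprobs, and this only shows the rank-correlation arithmetic on toy data:

```python
# Toy sketch: do a model's verbalized "probabilities" at least rank-order
# how often it actually selects each answer across repeated runs?
# All data here is made up for illustration.

def _ranks(vals):
    """1-based ranks, averaging ties."""
    order = sorted(range(len(vals)), key=lambda i: vals[i])
    ranks = [0.0] * len(vals)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and vals[order[j + 1]] == vals[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for a tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def _pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def spearman(xs, ys):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return _pearson(_ranks(xs), _ranks(ys))

# Hypothetical data: the number the model printed next to each joke,
# and how often each joke actually appeared across 20 repeated samples.
verbalized = {"joke_a": 0.4, "joke_b": 0.3, "joke_c": 0.2, "joke_d": 0.1}
counts = {"joke_a": 10, "joke_b": 6, "joke_c": 3, "joke_d": 1}
total = sum(counts.values())
jokes = sorted(verbalized)
rho = spearman([verbalized[j] for j in jokes],
               [counts[j] / total for j in jokes])
print(rho)  # a high rho would mean the verbalized numbers at least
            # rank-order the empirical selection frequencies correctly
```

A rank correlation seems like the right first look here, since the verbalized numbers are probably better treated as ordinal claims than as calibrated probabilities.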
]]></description><pubDate>Sun, 18 Jan 2026 07:17:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46665519</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46665519</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46665519</guid></item><item><title><![CDATA[Verbalized Sampling: How to Mitigate Mode Collapse and Unlock LLM Diversity]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2510.01171">https://arxiv.org/abs/2510.01171</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46665183">https://news.ycombinator.com/item?id=46665183</a></p>
<p>Points: 1</p>
<p># Comments: 2</p>
]]></description><pubDate>Sun, 18 Jan 2026 05:59:29 +0000</pubDate><link>https://arxiv.org/abs/2510.01171</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46665183</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46665183</guid></item><item><title><![CDATA[New comment by ycombiredd in "Former NYC Mayor Eric Adams rugs his own memecoin just 30 minutes after launch"]]></title><description><![CDATA[
<p>Yeah, just following up to my grandparent comment to say "wow. Holy shit. It is how it looks." I'm not sure why I was surprised; maybe I'm an optimist, or, as I suggested in my first comment, a bit naive.<p>In my defense, I don't think I'm stupid; I just don't <i>want</i> to believe so many people in power are cartoonishly evil, so I tend to look for explanations that don't require it. I think my internal sense of the world wants there to be a distinction between, say, average crypto-scammer evil buffoonery and people in positions where, at least <i>ostensibly</i>, they try to present as good guys while keeping their evildoing secret. This story gives me some sort of cognitive dissonance, and while reflecting on that fact, I get a bit sad. This world is bonkers.</p>
]]></description><pubDate>Tue, 13 Jan 2026 23:37:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=46610095</link><dc:creator>ycombiredd</dc:creator><comments>https://news.ycombinator.com/item?id=46610095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46610095</guid></item></channel></rss>