Hacker News: capnrefsmmat

New comment by capnrefsmmat in "TerraPower in deal with Meta for eight Natrium 345 MW nuclear plants"

capnrefsmmat — Thu, 18 Jun 2026 16:27:54 +0000

From your link,

> TerraPower must still complete construction, submit an operating license application, and satisfy all applicable safety and regulatory requirements before loading fuel and beginning operations.

New comment by capnrefsmmat in "ICAO issued new power bank restriction on flight"

capnrefsmmat — Sat, 28 Mar 2026 17:03:41 +0000

Discussion is included in the Dangerous Goods Panel report, agenda item 4.3 (pages 39-41) and Appendix E (beginning page 89). https://www.icao.int/sites/default/files/DangerousGoods/DGP%...

Paragraph 4.3.3:

> While data indicated that portable electronic devices were more often the cause of fire in aircraft cabins than power banks were, the latter were a significant concern due to their increased use and a prevalence of lower-quality products with defects or vulnerabilities that were more likely to lead to thermal events. Power banks were also not offered the same level of protection that batteries installed in portable electronic devices were provided. The amendments therefore focused on power banks.

New comment by capnrefsmmat in "FreeBSD Capsicum vs. Linux Seccomp Process Sandboxing"

capnrefsmmat — Mon, 09 Mar 2026 16:28:06 +0000

Several reasons:

1. The post mainly reiterates a single idea (Capsicum enumerates what the process can do, seccomp provides a configurable filter) in many different ways. There is not much actual depth, code samples notwithstanding. Nothing on why different designs were chosen, how easy each is to use, outcomes besides the Chrome example, etc.

2. There are a lot of AI writing tells, like staccato sentences, parallelism ("Same browser. Same threat model. Same problem."), pointless summary tables, "it's not X, it's Y" contradiction ("This is not a bug. It is the original Unix security model"), etc.

3. The author has roughly a blog post a day, all in the same writing style and on widely varied topics. Unless the author has deep expertise on a remarkably wide range of topics and spends all their time writing, these can't reflect deep insight or experience, only minimal editing of AI output.

So yes, it's pretty sloppy.

New comment by capnrefsmmat in "LLM Writing Tropes.md"

capnrefsmmat — Sun, 08 Mar 2026 13:23:40 +0000

Probably. One common feature of LLM output is grammatical features that indicate information density, like nominalizations, longer words, participial clauses, and so on. Perhaps training tasks that involve asking the LLMs for concise explanations or summaries encourage the use of these features to give denser answers.

New comment by capnrefsmmat in "LLM Writing Tropes.md"

capnrefsmmat — Sun, 08 Mar 2026 13:21:37 +0000

Thanks for the links. You may be interested in the other LLM writing style studies I've been collecting: https://www.refsmmat.com/notebooks/llm-style.html

New comment by capnrefsmmat in "LLM Writing Tropes.md"

capnrefsmmat — Sun, 08 Mar 2026 13:20:39 +0000

I've heard the Kenya and Nigeria story, but has anyone backed it up with quantitative evidence that the vocabulary LLMs overuse coincides with the vocabulary that is more common in Kenyan and Nigerian English than in American English?

New comment by capnrefsmmat in "LLM Writing Tropes.md"

capnrefsmmat — Sat, 07 Mar 2026 23:46:07 +0000

I work on research studying LLM writing styles, so I am going to have to steal this. I've seen plenty of lists of LLM style features, but this is the first one I noticed that mentions "tapestry", which we found is GPT-4o's second-most-overused word (after "camaraderie", for some reason).[1] We used a set of grammatical features in our initial style comparisons (like present participles, which GPT-4o loved so much that they were a pretty accurate classifier on their own), but it shouldn't be too hard to pattern-match some of these other features and quantify them.

If anyone who works on LLMs is reading, a question: When we've tried base models (no instruction tuning/RLHF, just text completion), they show far fewer stylistic anomalies like this. So it's not that the training data is weird. It's something in instruction-tuning that's doing it. Do you ask the human raters to evaluate style? Is there a rubric? Why is the instruction tuning pushing such a noticeable style shift?

[1] https://www.pnas.org/doi/10.1073/pnas.2422455122, preprint at https://arxiv.org/abs/2410.16107. Working on extending this to more recent models and other grammatical features now
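
A rough sketch of what pattern-matching and quantifying a couple of these features could look like. The regular expression, the function names, and the three-sentence threshold are all assumptions made for illustration, not the method from the paper; a serious version would likely need a parser rather than regexes.

```python
import re

# Hypothetical pattern for the "it's not X, it's Y" contradiction trope.
NOT_X_BUT_Y = re.compile(
    r"\b(?:this|it|that)(?:'s| is) not\b[^.!?]*[.!?,;]\s+(?:it|this|that)(?:'s| is)\b",
    re.IGNORECASE,
)


def count_not_x_but_y(text: str) -> int:
    """Count matches of the contradiction pattern."""
    return len(NOT_X_BUT_Y.findall(text))


def count_repeated_openers(text: str, run: int = 3) -> int:
    """Count runs of `run` or more consecutive sentences that open with the same word."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    openers = [s.split()[0].lower() for s in sentences]
    count = 0
    streak = 1
    for prev, cur in zip(openers, openers[1:]):
        streak = streak + 1 if cur == prev else 1
        if streak == run:  # count each run once, when it first reaches the threshold
            count += 1
    return count


def per_thousand_words(count: int, text: str) -> float:
    """Normalize a raw count by text length so documents are comparable."""
    n_words = len(text.split())
    return 1000 * count / n_words if n_words else 0.0


sample = (
    "Same browser. Same threat model. Same problem. "
    "This is not a bug. It is the original Unix security model."
)
print(count_not_x_but_y(sample))       # 1
print(count_repeated_openers(sample))  # 1
print(per_thousand_words(count_not_x_but_y(sample), sample))
```

Rates per thousand words, rather than raw counts, are what would make a comparison between human and LLM text meaningful.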

New comment by capnrefsmmat in "A new bill in New York would require disclaimers on AI-generated news content"

capnrefsmmat — Fri, 06 Feb 2026 12:27:48 +0000

No, that doesn't really work so well. A lot of the LLM style hallmarks are still present when you ask them to write in another style, so a good quantitative linguist can find them: https://hdsr.mitpress.mit.edu/pub/pyo0xs3k/release/2

That was with GPT-4, but my own work with other LLMs shows they have very distinctive styles even if you specifically prompt them with a chunk of human text to imitate. I think instruction-tuning with tasks like summarization predisposes them to certain grammatical structures, so their output is always more information-dense and formal than human writing.

New comment by capnrefsmmat in "How AI assistance impacts the formation of coding skills"

capnrefsmmat — Fri, 30 Jan 2026 12:41:07 +0000

The first sentence is a reference to prior research work that has found those productivity gains, not a summary of the experiment conducted in this paper.

New comment by capnrefsmmat in "Beginning January 2026, all ACM publications will be made open access"

capnrefsmmat — Fri, 19 Dec 2025 13:45:09 +0000

Most of the tedious formatting requirements do not match what the final typeset article looks like. The requirements are instead theoretically to benefit peer reviewers, e.g., by having double-spaced lines so they can write their comments on the paper copy that was mailed to them back when the submission guidelines were written in the 1950s.

The smarter journals have started accepting submissions in any format on the first round, and then only require enough formatting for the typesetters to do their job.

New comment by capnrefsmmat in "Beginning January 2026, all ACM publications will be made open access"

capnrefsmmat — Fri, 19 Dec 2025 00:00:11 +0000

Outside of disciplines that use LaTeX, the ability of authors to do typesetting is pretty limited. And there are other typesetting requirements that no consumer tool makes particularly easy; for instance, due to funding requirements, many journals deposit biomedical papers with PubMed Central, which wants them in JATS XML. So publishers have to prepare a structured XML version of papers.

Accessibility in PDFs is also very difficult. I'm not sure any publishers are yet meeting PDF/UA-2 requirements for tagged PDFs, which include things like embedding MathML representations of all mathematics so screenreaders can parse the math. LaTeX only supports this experimentally, and few other tools support it at all.

New comment by capnrefsmmat in "State of AI-assisted software development"

capnrefsmmat — Tue, 23 Sep 2025 16:23:33 +0000

It didn't "survey" devs. It paid them to complete real tasks while they were randomly assigned to use AI or not, and measured the actual time taken to complete the tasks vs. just the perception. It is much higher quality evidence than a convenience sample of developers who just report their perceptions.

New comment by capnrefsmmat in "Researchers find evidence of ChatGPT buzzwords turning up in everyday speech"

capnrefsmmat — Wed, 27 Aug 2025 22:41:54 +0000

Sure, if you're learning to write and want lots of examples of a particular style, LLMs can generate that for you. Just don't assume that is a normal writing style, or that it matches a particular genre (say, workplace communication, or academic writing, or whatever).

Our experience (https://arxiv.org/abs/2410.16107) is that LLMs like GPT-4o have a particular writing style, including both vocabulary and distinct grammatical features, regardless of the type of text they're prompted with. The style is informationally dense, features longer words, and favors certain grammatical structures (like participles; GPT-4o loooooves participles).

With Llama we're able to compare base and instruction-tuned models, and it's the instruction-tuned models that show the biggest differences. Evidently the AI companies are (deliberately or not) introducing particular writing styles with their instruction-tuning process. I'd like to get access to more base models to compare and figure out why.

New comment by capnrefsmmat in "Don't Fall for AI: Reasons for Writers to Reject Slop"

capnrefsmmat — Thu, 17 Jul 2025 23:45:59 +0000

I don't think the AI companies are systematically working to make their models sound more human. They're working to make them better at specific tasks, but the writing styles are, if anything, even more strange as they advance.

Comparing base and instruction-tuned models, the base models are vaguely human in style, while instruction-tuned models systematically prefer certain types of grammar and style features. (For example, GPT-4o loves participial clauses and nominalizations.) https://arxiv.org/abs/2410.16107

When I've looked at more recent models like o3, there are other style shifts. The newer OpenAI models increasingly use bold, bulleted lists, and headings -- much more than, say, GPT-3.5 did.
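
For a sense of how that kind of formatting shift could be counted, here is a minimal sketch. It assumes the model output is Markdown, and the patterns and names are hypothetical simplifications (for example, only `**bold**` is counted, not `__bold__`).

```python
import re

# Hypothetical patterns for three Markdown formatting features.
FORMATTING_PATTERNS = {
    "bold": re.compile(r"\*\*[^*\n]+\*\*"),
    "bullet": re.compile(r"^[ \t]*[-*+][ \t]+\S", re.MULTILINE),
    "heading": re.compile(r"^#{1,6}[ \t]+\S", re.MULTILINE),
}


def formatting_counts(text: str) -> dict:
    """Return how many times each formatting feature appears in the text."""
    return {name: len(p.findall(text)) for name, p in FORMATTING_PATTERNS.items()}


reply = (
    "## Summary\n\n"
    "- **Fast**: low latency\n"
    "- **Cheap**: low cost\n\n"
    "Plain closing sentence."
)
print(formatting_counts(reply))  # {'bold': 2, 'bullet': 2, 'heading': 1}
```

Dividing these counts by response length would be needed before comparing models, since longer answers have more room for formatting.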

So you get what you optimize for. OpenAI wants short, punchy, bulleted answers that sound authoritative, and that's what they get. But that's not how humans write, and so it'll remain easy to spot AI writing.

New comment by capnrefsmmat in "Lightfastness Testing of Colored Pencils"

capnrefsmmat — Tue, 08 Jul 2025 12:14:04 +0000

Probably because the article uses the Unicode right single quotation mark instead of apostrophes, due to some automated smart-quote machinery. I'll have to adjust the tagger to handle those.
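
A minimal sketch of that adjustment, assuming the tagger finds contractions by looking for the ASCII apostrophe. The function names and the contraction regex are hypothetical illustrations, not the actual tagger code.

```python
import re

# Map typographic apostrophes to the ASCII apostrophe.
# U+2019 is the right single quotation mark that smart-quote tools emit;
# U+02BC (modifier letter apostrophe) is included as another common variant.
APOSTROPHE_MAP = str.maketrans({"\u2019": "'", "\u02bc": "'"})

# Rough contraction pattern, for illustration only. It also matches
# possessives like "author's", which a real tagger would separate out.
CONTRACTION_RE = re.compile(r"\b\w+'(?:s|t|re|ve|ll|d|m)\b", re.IGNORECASE)


def normalize_apostrophes(text: str) -> str:
    """Replace typographic apostrophes with the ASCII apostrophe."""
    return text.translate(APOSTROPHE_MAP)


def count_contractions(text: str) -> int:
    """Count contraction-like tokens that use the ASCII apostrophe."""
    return len(CONTRACTION_RE.findall(text))


sample = "It doesn\u2019t fade, and we\u2019ve tested it."
print(count_contractions(sample))                         # 0
print(count_contractions(normalize_apostrophes(sample)))  # 2
```

Without the normalization step, text with smart quotes would look as if it had no contractions at all.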

New comment by capnrefsmmat in "Lightfastness Testing of Colored Pencils"

capnrefsmmat — Mon, 07 Jul 2025 22:51:16 +0000

In our studies of ChatGPT's grammatical style (https://arxiv.org/abs/2410.16107), it really loves past and present participial phrases (2-5x more usage than humans). I didn't see any here in a glance through the lightfastness section, though I didn't try running the whole article through spaCy to check. In any case it doesn't trip my mental ChatGPT detector either; it reads more like classic SEO writing you'd see all over blogs in the 20-teens.

edit: yeah, ran it through our style feature tagger and nothing jumps out. Low rate of nominalizations (ChatGPT loves those), only a few present participles, "that" as subject at a usual rate, usual number of adverbs, etc. (See table 3 of the paper.) No contractions, which is unusual for normal human writing but common when assuming a more formal tone. I think the author has just affected a particular style, perhaps deliberately.

New comment by capnrefsmmat in "OpenAI slams court order to save all ChatGPT logs, including deleted chats"

capnrefsmmat — Thu, 05 Jun 2025 12:19:36 +0000

OpenAI is the custodian of the user data, so they are responsible. If you wanted the court (i.e., the plaintiffs) to find specific infringing chatters, first they'd have to get the data from OpenAI to find out who they are -- which is exactly what they're trying to do, and why OpenAI is being told to preserve the data so they can review it.

New comment by capnrefsmmat in "Differences in link hallucination and source comprehension across different LLM"

capnrefsmmat — Thu, 05 Jun 2025 12:16:20 +0000

If the output is interpreting sources rather than just regurgitating quotes from them, you need to exert judgment to verify they support its claims. When the LLM output is about some highly technical subject, it can require expert knowledge just to judge whether the source supports the claims.

New comment by capnrefsmmat in "OpenAI slams court order to save all ChatGPT logs, including deleted chats"

capnrefsmmat — Thu, 05 Jun 2025 12:13:17 +0000

Courts have always had the power to compel parties to a current case to preserve evidence. (For example, this was an issue in the Google monopoly case, since Google employees were using chats set to erase after 24 hours.) That becomes an issue in the discovery phase, well after the defendant has an opportunity to file a motion to dismiss. So a case with no specific allegation of wrongdoing would already have been dismissed.

The power does not extend to any of your hypotheticals, which are not about active cases. Courts do not accept cases on the grounds that some bad thing might happen in the future; the plaintiff must show some concrete harm has already occurred. The only thing different here is how much potential evidence OpenAI has been asked to retain.

New comment by capnrefsmmat in "My AI skeptic friends are all nuts"

capnrefsmmat — Mon, 02 Jun 2025 23:57:36 +0000

For introductory problems, the kind we use to get students to understand a concept for the first time, the AI would likely (nearly) nail it on the first try. They wouldn't have to fix any non-working code. And annotating the code likely doesn't serve the same pedagogical purpose as writing it yourself.

Students emerge from lectures with a bunch of vague, partly contradictory, partly incorrect ideas in their head. They generally aren't aware of this and think the lecture "made sense." Then they start the homework and find they must translate those vague ideas into extremely precise code so the computer can do it -- forcing them to realize they do not understand, and forcing them to make the vague understanding concrete.

If they ask an AI to write the code for them, they don't do that. Annotating has some value, but it does not give them the experience of seeing their vague understanding run headlong into reality.

I'd expect the result to be more like what happens when you show demonstrations to students in physics classes. The demonstration is supposed to illustrate some physics concept, but studies measuring whether that improves student understanding have found no effect: https://doi.org/10.1119/1.1707018

What works is asking students to predict the demonstration's results first, then showing them. Then they realize whether their understanding is right or wrong, and can ask questions to correct it.

Post-hoc rationalizing an LLM's code is like post-hoc rationalizing a physics demo. It does not test the students' internal understanding in the same way as writing the code, or predicting the results of a demo.