<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: anonymoushn</title><link>https://news.ycombinator.com/user?id=anonymoushn</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 10 Jun 2026 08:23:26 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=anonymoushn" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by anonymoushn in "Show HN: Actual Claude Tokenizer"]]></title><description><![CDATA[
<p>I couldn't reproduce this behavior with Sonnet 4, and Sonnet 3.7 has been deprecated since I messed with this stuff. You can try tokenizing the string "<hello> </hello>".<p>I think the correct tokenization of the string will not have any tokens that mix punctuation and letters, but the result of this approach does contain such claimed tokens.</p>
]]></description><pubDate>Mon, 04 May 2026 22:53:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=48016023</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=48016023</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48016023</guid></item><item><title><![CDATA[New comment by anonymoushn in "Show HN: Actual Claude Tokenizer"]]></title><description><![CDATA[
<p>That's " 'd ".strip(), an English contraction suffix. It's 1 token, but with this echo approach the apostrophe and the following letter are first served to you in different steps.</p>
]]></description><pubDate>Mon, 04 May 2026 20:03:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=48014208</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=48014208</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48014208</guid></item><item><title><![CDATA[New comment by anonymoushn in "Show HN: Actual Claude Tokenizer"]]></title><description><![CDATA[
<p>You can't reliably obtain correct token boundaries with this method. For example, "'d" is 1 token, but the API will return "d" stuck to the next token. Weirdly this seems to be specific to the letter "d". Similar stuff happens around "<". As for all-caps words, some words are in the vocab in all caps, such as MERCHANTABILITY.</p>
]]></description><pubDate>Tue, 21 Apr 2026 09:40:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47846625</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47846625</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47846625</guid></item><item><title><![CDATA[New comment by anonymoushn in "Claude Token Counter, now with model comparisons"]]></title><description><![CDATA[
<p>their old tokenizer performed some space collapsing that allowed them to use the same token id for a word with and without the leading space (in cases where the context usually implies a space and one is not present, a "no space" symbol is used).</p>
]]></description><pubDate>Mon, 20 Apr 2026 05:28:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47830680</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47830680</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47830680</guid></item><item><title><![CDATA[New comment by anonymoushn in "[dead]"]]></title><description><![CDATA[
<p>Is this the wrong URL? This seems to be a blog post from October 2025 called "Introducing: Local Browser AI"</p>
]]></description><pubDate>Mon, 20 Apr 2026 04:16:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=47830340</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47830340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47830340</guid></item><item><title><![CDATA[New comment by anonymoushn in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"]]></title><description><![CDATA[
<p>How do you guys decide which settings should be configurable via environment variables but not settings files and which settings should be configurable via settings files but not environment variables?</p>
]]></description><pubDate>Mon, 06 Apr 2026 22:31:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668209</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47668209</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668209</guid></item><item><title><![CDATA[New comment by anonymoushn in "Issue: Claude Code is unusable for complex engineering tasks with Feb updates"]]></title><description><![CDATA[
<p>> One of our product principles is to avoid changing settings on users' behalf<p>Ideally there wouldn't be silent changes that greatly reduce the utility of the user's session files until they set a newly introduced flag.<p>I happen to think this is just true in general, but another reason it might be true is that the experience the user has is identical to the experience they would have had if you first introduced the setting, defaulting it to the existing behavior, and then subsequently changed it on users' behalf.</p>
]]></description><pubDate>Mon, 06 Apr 2026 22:23:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47668126</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47668126</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47668126</guid></item><item><title><![CDATA[New comment by anonymoushn in "A Faster Alternative to Jq"]]></title><description><![CDATA[
<p>Oh, can you post some benchmarks? I didn't know that parser throughput per core would change with the amount of data like that.</p>
]]></description><pubDate>Wed, 01 Apr 2026 15:18:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47602072</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47602072</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47602072</guid></item><item><title><![CDATA[New comment by anonymoushn in "The Claude Code Source Leak: fake tools, frustration regexes, undercover mode"]]></title><description><![CDATA[
<p>why</p>
]]></description><pubDate>Tue, 31 Mar 2026 18:37:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=47591611</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47591611</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47591611</guid></item><item><title><![CDATA[New comment by anonymoushn in "A Faster Alternative to Jq"]]></title><description><![CDATA[
<p>are those tools known for their fast json parsers?</p>
]]></description><pubDate>Fri, 27 Mar 2026 12:44:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47542039</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47542039</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47542039</guid></item><item><title><![CDATA[New comment by anonymoushn in "Sub-Millisecond RAG on Apple Silicon. No Server. No API. One File"]]></title><description><![CDATA[
<p>ideally users could be banned for posting LLM outputs as if they were authored by humans <a href="https://www.pangram.com/history/49335ddf-118d-43e4-9340-a58a9068ed35/?ucc=efayZB4DPJy" rel="nofollow">https://www.pangram.com/history/49335ddf-118d-43e4-9340-a58a...</a></p>
]]></description><pubDate>Tue, 17 Feb 2026 19:54:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47052322</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=47052322</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47052322</guid></item><item><title><![CDATA[New comment by anonymoushn in "LLM Structured Outputs Handbook"]]></title><description><![CDATA[
<p>Hello, the part about canonical filtering in <a href="https://openreview.net/pdf?id=DFybOGeGDS" rel="nofollow">https://openreview.net/pdf?id=DFybOGeGDS</a> doesn't seem to try to account for pretokenization. For example, if you receive " 天天中彩票APP" in o200k, it means there has to be a lowercase letter within the span of letters, and while tokens like (4 spaces) may be pairwise compatible with tokens like "123" according to the BPE merge rules, the pretokenizer would split the span of spaces to give (3 spaces), " ", "123" instead. Are you aware of any work that does actual canonical generation for models with this kind of pretokenization regex?</p>
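<p>A minimal sketch of the whitespace-splitting point above, using Python's built-in re module. The pattern is a hypothetical, heavily simplified stand-in (digit runs of up to three, the two whitespace alternatives, then a catch-all) and is not the real o200k pretokenization regex. It only shows how a \s+(?!\S) alternative followed by a plain \s+ splits four spaces that precede "123".</p>

```python
import re

# Hypothetical simplified pattern, NOT the actual o200k regex.
# Alternatives are tried left to right at each position:
#   \d{1,3}     digits in groups of at most three
#   \s+(?!\S)   whitespace run that is not directly followed by non-whitespace
#   \s+         any remaining whitespace
#   \S+         catch-all for everything else (assumption for this sketch)
SIMPLIFIED_PRETOKENIZER = re.compile(r"\d{1,3}|\s+(?!\S)|\s+|\S+")


def pretokenize(text: str) -> list[str]:
    """Split text into pretokens using the simplified pattern above."""
    return SIMPLIFIED_PRETOKENIZER.findall(text)


# Four spaces before digits: the lookahead forces the run to stop one
# space early, so the last space becomes its own pretoken.
print(pretokenize("    123"))  # ['   ', ' ', '123']
```

<p>Since BPE merges only happen inside a pretoken, a single (4 spaces) token can never be produced directly before "123" under a rule like this, even if the merge table alone would allow the pair.</p>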
]]></description><pubDate>Sat, 17 Jan 2026 06:41:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=46655870</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46655870</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46655870</guid></item><item><title><![CDATA[New comment by anonymoushn in "Ask HN: Cursor (LLM) Costs"]]></title><description><![CDATA[
<p>use claude code if you want to use opus</p>
]]></description><pubDate>Tue, 13 Jan 2026 13:09:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=46600486</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46600486</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46600486</guid></item><item><title><![CDATA[New comment by anonymoushn in "Show HN: Create LLM-optimized random identifiers"]]></title><description><![CDATA[
<p>what does "logprobs look off" mean</p>
]]></description><pubDate>Mon, 12 Jan 2026 16:24:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46590619</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46590619</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46590619</guid></item><item><title><![CDATA[New comment by anonymoushn in "I/O is no longer the bottleneck? (2022)"]]></title><description><![CDATA[
<p>Hello, a couple years ago I participated in a contest to count word frequencies and generate a sorted histogram. There's a cool post about it featuring a video discussing the tricks used by some participants. <a href="https://easyperf.net/blog/2022/05/28/Performance-analysis-and-tuning-contest-6#upd-july-20th-2022-results" rel="nofollow">https://easyperf.net/blog/2022/05/28/Performance-analysis-an...</a><p>Some other participants said that they measured 0 difference in runtime between pshufb+eq and eqx3+orx2, but I think your problem has more classes of whitespace, and for the histogram problem, considerations about how to hash all the words in a chunk of the input dominate considerations about how to obtain the bitmasks of word-start or word-end positions.</p>
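<p>For anyone unfamiliar with the bitmask step, here is a scalar Python sketch of it. The function names and the choice of three whitespace bytes (space, newline, tab) are assumptions for illustration; a SIMD version would compute the same mask with vector compares (three equality tests OR'd together, or a pshufb table lookup plus one compare) instead of a byte loop.</p>

```python
# Assumed whitespace classes for this sketch: space, newline, tab.
WHITESPACE = b" \n\t"


def whitespace_mask(chunk: bytes) -> int:
    """Return an int whose bit i is set when chunk[i] is whitespace."""
    mask = 0
    for i, byte in enumerate(chunk):
        # Scalar equivalent of three equality compares OR'd together.
        if byte in WHITESPACE:
            mask |= 1 << i
    return mask


def word_start_mask(chunk: bytes, prev_was_space: bool = True) -> int:
    """Return an int whose bit i is set when a word starts at chunk[i].

    prev_was_space says whether the byte before this chunk was
    whitespace, so a word continuing from the previous chunk is not
    counted as a new start.
    """
    full = (1 << len(chunk)) - 1
    ws = whitespace_mask(chunk)
    non_ws = ~ws & full
    # Shift left by one so bit i describes byte i - 1, carrying in
    # the state of the byte before the chunk.
    prev_ws = ((ws << 1) | int(prev_was_space)) & full
    return non_ws & prev_ws


chunk = b"ab  cd e"
print(bin(whitespace_mask(chunk)))
print(bin(word_start_mask(chunk)))
```

<p>Word-end positions come out the same way with the shift in the other direction. Either way this part is a handful of cheap operations per chunk, which is why the hashing tends to dominate.</p>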
]]></description><pubDate>Tue, 06 Jan 2026 12:39:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46511507</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46511507</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46511507</guid></item><item><title><![CDATA[New comment by anonymoushn in "Show HN: Steganography in natural language using LLM logit-rank steering"]]></title><description><![CDATA[
<p>requires fully deterministic inference, which turns out to be unusual, but for this sort of thing it's probably fine if you do really slow inference on cpu. cool idea.</p>
]]></description><pubDate>Sat, 03 Jan 2026 16:32:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46478522</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46478522</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46478522</guid></item><item><title><![CDATA[New comment by anonymoushn in "Ask HN: Startup launch destroyed by Bolt.new's AI. 10M tokens gone, no response"]]></title><description><![CDATA[
<p>please write your own posts from now on</p>
]]></description><pubDate>Sun, 21 Dec 2025 04:50:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46342342</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46342342</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46342342</guid></item><item><title><![CDATA[New comment by anonymoushn in "From text to token: How tokenization pipelines work"]]></title><description><![CDATA[
<p>i love stemming, i love searching for "anime" and getting "animal"</p>
]]></description><pubDate>Tue, 16 Dec 2025 22:29:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46295512</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46295512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46295512</guid></item><item><title><![CDATA[New comment by anonymoushn in "SSE sucks for transporting LLM tokens"]]></title><description><![CDATA[
<p>so sad to hear that about Streaming SIMD Extensions</p>
]]></description><pubDate>Sat, 13 Dec 2025 19:05:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46257004</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46257004</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46257004</guid></item><item><title><![CDATA[New comment by anonymoushn in "Leaving Intel"]]></title><description><![CDATA[
<p>This is true economically, but in reality, if you have much larger cost savings than that for sale, these companies mostly say "we would be happy to buy that for $0 while we pay you a million a year to move to the United States"</p>
]]></description><pubDate>Sat, 06 Dec 2025 03:46:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=46170481</link><dc:creator>anonymoushn</dc:creator><comments>https://news.ycombinator.com/item?id=46170481</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46170481</guid></item></channel></rss>