<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: routerl</title><link>https://news.ycombinator.com/user?id=routerl</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 24 Apr 2026 08:27:20 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=routerl" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by routerl in "A DOGE staffer appears to be posting DOGE work on his public GitHub"]]></title><description><![CDATA[
<p>From the perspective of wanting to maintain the integrity of the American federal government, it seems like all this DOGE stuff (and the whole Trumpist movement in general) serves the purpose of a red team, in the cybersecurity sense; people with nebulous intent have gotten access to everything.<p>So now, if Americans care about the integrity of their government, there needs to be a blue team: how can this catastrophic level of access be dealt with, and how can it be safeguarded against in the future? Alas, I'm not seeing this perspective being enacted. The obvious security compromise is being allowed to stand and continue, usually on the assumption that "separation of powers" and "checks and balances" will be effective; Congress will stop this, or the courts will stop this. But we're watching these mechanisms fail.<p>So, what's the plan here? Where's the counter-offensive? We're watching a system being hacked, and I've yet to see anyone talk about a recovery plan, or a prevention plan.</p>
]]></description><pubDate>Sat, 01 Mar 2025 16:34:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=43220955</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=43220955</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43220955</guid></item><item><title><![CDATA[New comment by routerl in "The number line freaks me out (2016)"]]></title><description><![CDATA[
<p>It seems to be an article about all those "harmless" lies we tell students.<p>The vast majority of people think mathematics is about numbers, when it is actually about relations, and numbers are just some of the entities whose relations mathematics studies.<p>Nobody is born with this misconception; we teach it, and test it, and thereby ingrain it in the minds of every student, most of whom will never study mathematics at a level that makes them go "wait, what?". The overwhelming majority of people never get to this level.<p>I suspect this is also why statistics feels so counterintuitive to so many people, including me. The Monty Hall problem is only a problem to those who are naive about probability, which is most people, because most of us don't learn any of this stuff early enough to form long lasting, correct instincts.<p>It's not fair to students to bake "harmless" lies into their early education, as a way to simplify the topic such that it becomes more easily teachable. We've only done this because teaching is hard, and thus expensive. Education is expensive, at every step. It's not fair or productive to build a gate around proper education that makes it available only to those who can afford it at the level where the early misconceptions get corrected. Even those people end up spending a lot of cognitive capital on all those "wait, what?" moments, when their cognitive capital would be better spent elsewhere.</p>
]]></description><pubDate>Wed, 19 Feb 2025 03:02:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=43098089</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=43098089</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43098089</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>OP here, I'm adding a feature that will allow users to save specific words to lists, and export the lists in formats that can be imported to flashcard apps.</p>
]]></description><pubDate>Tue, 11 Feb 2025 22:07:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=43018977</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=43018977</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43018977</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>For anonymous users, I'm using OpenNMT, via Argos. Logged-in users get DeepL translations, and DeepL correctly translates 气功师.</p>
]]></description><pubDate>Sat, 08 Feb 2025 15:48:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=42983705</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42983705</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42983705</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>Thank you, and thanks for checking it out!<p>I use Pleco almost every day :)</p>
]]></description><pubDate>Sat, 08 Feb 2025 11:52:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=42982329</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42982329</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42982329</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>I did! Jieba is the first step in my segmentation pipeline. As far as I can tell, Jieba's default config tends to work <i>better</i> for simplified, but in my case the custom dictionary I feed it has significantly more traditional entries than simplified entries, especially for historical terms and slang.</p>
]]></description><pubDate>Sat, 08 Feb 2025 11:51:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=42982323</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42982323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42982323</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>It supports traditional and simplified, as well as pinyin and bopomofo :)<p>It's already possible to switch instantly between pinyin and bopomofo, and I'm working on letting users switch between simplified/traditional, but this is also a non-trivial problem. For now, the app will follow the user's lead: if you enter traditional text, it will return traditional text, and same goes for simplified.</p>
]]></description><pubDate>Sat, 08 Feb 2025 11:48:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=42982315</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42982315</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42982315</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>Thanks for the kind words, and the bug report!<p>The (awful and incorrect) translation you've pointed out comes from the segmenter being too greedy, not finding the (non-existent) word in any dictionary, and therefore dispatching the word to be machine translated, without context. This is the final fallback in the segmentation pipeline, to avoid displaying nothing at all, and my priority right now is making the segmentation pipeline more robust so this rarely (or never) happens, since it sometimes produces hilariously bad results!</p>
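<p>A minimal sketch of the fallback order described above, not the app's actual code: the names define, dictionaries, and machine_translate are all hypothetical. Each segment is looked up in the dictionaries first, and only a segment found in none of them is sent to machine translation, without context.</p>

```python
from typing import Callable, Mapping, Sequence


def define(
    word: str,
    dictionaries: Sequence[Mapping[str, str]],
    machine_translate: Callable[[str], str],
) -> tuple[str, str]:
    """Return (definition, source) for one segmented word.

    Dictionaries are tried in order. Machine translation is the final
    fallback, so the reader never sees an empty definition.
    """
    for dictionary in dictionaries:
        if word in dictionary:
            return dictionary[word], "dictionary"
    # Hypothetical fallback: the word is translated on its own, with no
    # surrounding context, which is where bad results can come from.
    return machine_translate(word), "machine"
```

<p>Returning the source alongside the definition is an assumption made here for illustration; it would let a front-end flag machine-translated entries as less trustworthy.</p>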
]]></description><pubDate>Sat, 08 Feb 2025 11:44:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=42982300</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42982300</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42982300</guid></item><item><title><![CDATA[New comment by routerl in "Nontraditional Red Teams"]]></title><description><![CDATA[
<p>'"Tradition" is a set of solutions for which we have forgotten the problems. Throw away the solution and you get the problem back.'<p>This is, by far, my most conservative opinion. Credit to Donald Kingsbury for the quote.<p>Honorable mention re: the same problem, "dogfooding"[0] is gone from the software industry, which is why users often feel like they're getting suckered by the companies they patronize; the decision makers, who don't themselves use the product, absolutely see the users as suckers.<p>[0] <a href="https://en.wikipedia.org/wiki/Eating_your_own_dog_food" rel="nofollow">https://en.wikipedia.org/wiki/Eating_your_own_dog_food</a></p>
]]></description><pubDate>Fri, 07 Feb 2025 00:48:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=42968084</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42968084</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42968084</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>Got it, thanks!</p>
]]></description><pubDate>Wed, 05 Feb 2025 21:19:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=42955315</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42955315</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42955315</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>Could you post the text you used? This kind of thing goes straight into my unit tests.<p>I'm also working on showing all the pronunciations/definitions for a given hanzi; it should be ready later this week.</p>
]]></description><pubDate>Tue, 04 Feb 2025 21:44:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=42939276</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42939276</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42939276</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: Mandarin Word Segmenter with Translation"]]></title><description><![CDATA[
<p>Thanks for the kind words!<p>I'm using Jieba[0] because it hits a nice balance of fast and accurate. But I'm initializing it with a custom dictionary (~800k entries), and have added several layers of heuristic post-segmentation. For example, Jieba tends to split up chengyu into two words, but I've decided they should be displayed as a single word, since chengyu are typically a single entry in dictionaries.<p>[0] <a href="https://github.com/fxsjy/jieba">https://github.com/fxsjy/jieba</a></p>
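<p>As a rough illustration of one such post-segmentation heuristic (a sketch under assumptions, not the app's real code): merge_chengyu and the chengyu set are hypothetical names, and the token list stands in for whatever the segmenter returns. Two adjacent tokens are rejoined when their concatenation is a known chengyu.</p>

```python
def merge_chengyu(tokens: list[str], chengyu: set[str]) -> list[str]:
    """Rejoin adjacent token pairs whose concatenation is a known chengyu."""
    merged: list[str] = []
    i = 0
    while len(tokens) > i:
        pair = "".join(tokens[i:i + 2])
        # Only merge when a second token exists and the pair is in the set.
        if len(tokens) > i + 1 and pair in chengyu:
            merged.append(pair)
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged
```

<p>For example, merge_chengyu(["他", "画蛇", "添足", "了"], {"画蛇添足"}) rejoins the middle two tokens. This only handles a chengyu split into exactly two pieces; a split into three or more would need a wider window.</p>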
]]></description><pubDate>Tue, 04 Feb 2025 19:51:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=42937651</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42937651</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42937651</guid></item><item><title><![CDATA[New comment by routerl in "Ask HN: Who wants to be hired? (February 2025)"]]></title><description><![CDATA[
<p>Location: Toronto, Canada<p>Remote: Yes<p>Willing to relocate: No<p>Technologies:<p>- Frontend: Typescript, Javascript, NextJS, React, Redux, Swift<p>- Backend: Python, Django, PostgreSQL, Docker<p>Résumé/CV: Via e-mail<p>Email: roberto.loja+hn@gmail.com<p>I've submitted my latest work as a Show HN post at <a href="https://news.ycombinator.com/item?id=42936085">https://news.ycombinator.com/item?id=42936085</a></p>
]]></description><pubDate>Tue, 04 Feb 2025 17:59:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=42936132</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42936132</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42936132</guid></item><item><title><![CDATA[Show HN: Mandarin Word Segmenter with Translation]]></title><description><![CDATA[
<p>I've built mandoBot, a web app that segments and translates Mandarin Chinese text. This is a Django API (using Django-Ninja and PostgreSQL) and a NextJS front-end (with Typescript and Chakra). For a sample of what this app does, head to <a href="https://mandobot.netlify.app/?share_id=e8PZ8KFE5Y" rel="nofollow">https://mandobot.netlify.app/?share_id=e8PZ8KFE5Y</a>. This is my presentation of the first chapter of a classic story from the Republican era of Chinese fiction, Diary of a Madman by Lu Xun. Other chapters are located in the "Reading Room" section of the app.<p>This app exists because reading Mandarin is very hard for learners (like me), since Mandarin text does not separate words using spaces in the same way Western languages do. But extensive reading is the most effective way to learn vocabulary and grammar. Thus, learning Mandarin by reading requires first memorizing hundreds or thousands of words, before you can even know where one word ends and the next word begins.<p>I'm solving this problem by allowing users to input Mandarin text, which is then computationally segmented and machine translated by my server, which also adds dictionary definitions for each word and character. The hard part is the segmentation: it turns out that "Chinese Word Segmentation"[0] is <i>the</i> central problem in Chinese Natural Language Processing; no current solutions reach 100% accuracy, whether they're from Stanford[1], Academia Sinica[2], or Tsinghua University[3]. This includes every LLM currently available.<p>I could talk about this for hours, but the bottom line is that this app is a way to develop my full-stack skills; the backend should be fast, accurate, secure, well-tested, and well-documented, and the front-end should be pretty, secure, well-tested, responsive, and accessible. 
I am the sole developer, and I'm open to any comments and suggestions: roberto.loja+hn@gmail.com<p>Thanks HN!<p>[0] <a href="https://en.wikipedia.org/wiki/Chinese_word-segmented_writing" rel="nofollow">https://en.wikipedia.org/wiki/Chinese_word-segmented_writing</a><p>[1] <a href="https://nlp.stanford.edu/software/segmenter.shtml" rel="nofollow">https://nlp.stanford.edu/software/segmenter.shtml</a><p>[2] <a href="https://ckip.iis.sinica.edu.tw/project/ws" rel="nofollow">https://ckip.iis.sinica.edu.tw/project/ws</a><p>[3] <a href="http://thulac.thunlp.org/" rel="nofollow">http://thulac.thunlp.org/</a></p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42936085">https://news.ycombinator.com/item?id=42936085</a></p>
<p>Points: 48</p>
<p># Comments: 35</p>
]]></description><pubDate>Tue, 04 Feb 2025 17:56:33 +0000</pubDate><link>https://mandobot.netlify.app/</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42936085</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42936085</guid></item><item><title><![CDATA[New comment by routerl in "AI Expert's Testimony Collapses over Fake AI Citations"]]></title><description><![CDATA[
<p>Seems like a straightforward case of malpractice. The guy had every ability to double check the hallucinated references, but didn't do so; he used AI to <i>replace</i> his own expertise, rather than augment it. I.e. he didn't "use AI to help write", he instead outsourced his work to AI with no oversight.</p>
]]></description><pubDate>Mon, 03 Feb 2025 23:49:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=42925095</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42925095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42925095</guid></item><item><title><![CDATA[New comment by routerl in "Show HN: DeepSeek Your HN Profile"]]></title><description><![CDATA[
<p>I can't get mad at this, it's so spot-on.<p>"Predictions for 2025
Personal Projects<p>Will start working on a novel semantic search engine for Chinese literature, combining their interest in both Chinese culture and search technologies"<p>Oh wow, I've been working on this for the past two months and haven't posted about it yet.</p>
]]></description><pubDate>Tue, 28 Jan 2025 21:03:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=42857960</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=42857960</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42857960</guid></item><item><title><![CDATA[New comment by routerl in "Forget ChatGPT: why researchers now run small AIs on their laptops"]]></title><description><![CDATA[
<p>Yes.</p>
]]></description><pubDate>Tue, 24 Sep 2024 13:41:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=41636360</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=41636360</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41636360</guid></item><item><title><![CDATA[New comment by routerl in "Forget ChatGPT: why researchers now run small AIs on their laptops"]]></title><description><![CDATA[
<p>Imho, and I'm no expert, but this has been working well for me:<p>Segment the texts into chunks that make sense (i.e. into the lengths of text you'll want to find, whether this means chapters, sub-chapters, paragraphs, etc.), create embeddings of each chunk, and store the resultant vectors in a vector database. Your search workflow will then be to create an embedding of your query, and perform a distance comparison (e.g. cosine similarity) which returns ranked results. This way you can now semantically search your texts.<p>Everything I've mentioned above is fairly easily doable with existing LLM libraries like LangChain or LlamaIndex. For reference, this is the retrieval half of a RAG workflow.</p>
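<p>A toy sketch of the workflow above, using only the standard library. Everything here is an assumption for illustration: embed is a bag-of-words stand-in for a real embedding model (so it matches shared words, not meaning), and a plain list stands in for the vector database. The names embed, build_index, and search are hypothetical.</p>

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real model returns dense vectors."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embed every chunk once, up front (the 'vector database' step)."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index: list[tuple[str, Counter]], query: str, top_k: int = 3):
    """Embed the query and return the top_k (score, chunk) pairs, best first."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, vec), chunk) for chunk, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

<p>Swapping embed for a real embedding model and the list for a vector store changes the quality and scale of the results, but not the shape of the workflow: chunk, embed, store, embed the query, rank by similarity.</p>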
]]></description><pubDate>Sat, 21 Sep 2024 15:07:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=41610475</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=41610475</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41610475</guid></item><item><title><![CDATA[New comment by routerl in "Steve Ballmer's incorrect binary search interview question"]]></title><description><![CDATA[
<p>This write-up makes the erroneous assumption that he's choosing randomly. He himself says, in this same write-up, that he's choosing adversarially.<p>Nice write-up anyway, and yes, Ballmer is wrong.</p>
]]></description><pubDate>Tue, 03 Sep 2024 13:48:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=41434910</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=41434910</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41434910</guid></item><item><title><![CDATA[New comment by routerl in "China's 'Wukong' Hit Sells 10M Copies in Three Days"]]></title><description><![CDATA[
<p>It's not a "Chinese mythical book", it's one of the classical novels of Chinese civilization. Think of it as a cross between Lord of the Rings and The Iliad, but containing extensive references to ancient Chinese tales, culture, religion (especially Buddhism), and history (the central monk character in the story is based on a real, and revered, historical monk).<p>It also has beloved and well known characters who have featured in all kinds of Chinese stories and media for centuries: e.g. Erlang (no relation to the programming language), who features prominently in the opening cutscene, and is often found in stories accompanied by Nezha (who is so popular that generations of Chinese kids grew up hearing this[0] song and watching that show).<p>And this is by no means just a Chinese phenomenon. This story, Journey to the West, is a cultural keystone in all of Asia: Dragon Ball is very much based on Journey to the West[1], and "Son Goku" is just the Japanese pronunciation of the name "Sun Wukong", who is the monkey protagonist of this game. The two share many of their powers and characteristics, including flying around on a magic cloud and becoming powerful enough to challenge literal gods.<p>Finally, this is perhaps the first "postmodern" retelling of this extremely popular story. The game is called "black myth" because it is clearly darker and more serious than previous retellings of this story. For someone who knows this story well (i.e. basically anyone who grew up in Asia), it is a fresh version of an old classic. In this sense, this game is the equivalent of what The Witcher was for (mainly Eastern) Europeans; it takes legends, stories, superstitions you grew up hearing about (e.g. vampires, werewolves, etc.) and breathes new life into them.<p>[0] <a href="https://m.youtube.com/watch?v=TG_KTrCetcM" rel="nofollow">https://m.youtube.com/watch?v=TG_KTrCetcM</a><p>[1] even down to having a cowardly pigman companion, Bajie, who is also in this game.</p>
]]></description><pubDate>Thu, 29 Aug 2024 00:35:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=41385985</link><dc:creator>routerl</dc:creator><comments>https://news.ycombinator.com/item?id=41385985</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41385985</guid></item></channel></rss>