<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: deusu</title><link>https://news.ycombinator.com/user?id=deusu</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 13 Apr 2026 14:02:39 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=deusu" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by deusu in "Startpage – private search engine"]]></title><description><![CDATA[
<p>You can run your own search index with 2.3 billion pages for about €300/month:<p><a href="https://deusu.org" rel="nofollow">https://deusu.org</a></p>
]]></description><pubDate>Sun, 29 Jan 2017 20:48:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=13516546</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=13516546</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=13516546</guid></item><item><title><![CDATA[Why DeuSu may succeed where Wikia Search failed]]></title><description><![CDATA[
<p>Article URL: <a href="https://deusu.org/blog/2016-11-29-why_deusu_may_succeed_where_wikia_search_failed.html">https://deusu.org/blog/2016-11-29-why_deusu_may_succeed_where_wikia_search_failed.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=13062278">https://news.ycombinator.com/item?id=13062278</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 29 Nov 2016 10:58:37 +0000</pubDate><link>https://deusu.org/blog/2016-11-29-why_deusu_may_succeed_where_wikia_search_failed.html</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=13062278</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=13062278</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>It's alive and well. The TIOBE index still lists it ahead of Ruby, Swift, Objective-C, Go...<p>And I started this software 20 years ago. Granted, a LOT of it has changed since then. But I don't see a reason to throw away existing code unless it needs so much change that rewriting from scratch would be easier. And even then I might stick to what I know best, and what fits best with the other parts of the software.</p>
]]></description><pubDate>Wed, 14 Sep 2016 09:48:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=12495317</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12495317</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12495317</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Thank you!</p>
]]></description><pubDate>Wed, 14 Sep 2016 06:27:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=12494624</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12494624</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12494624</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>I don't know. But I <i>do</i> know that the end-of-year statistics from search engines about what people searched for are complete BS. I have such a list for the German DeuSu page:<p><a href="https://deusu.de/blog/2015-12-03-alle_jahre_wieder_wonach_deutschland_wirklich_gesucht_hat.html" rel="nofollow">https://deusu.de/blog/2015-12-03-alle_jahre_wieder_wonach_de...</a><p>Warning! This is <i>definitely</i> NSFW! :)</p>
]]></description><pubDate>Tue, 13 Sep 2016 17:56:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=12490684</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12490684</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12490684</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Yes. I have downloaded several data dumps, but haven't gotten around to importing them yet.</p>
]]></description><pubDate>Tue, 13 Sep 2016 17:41:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=12490541</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12490541</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12490541</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Currently €300/month. More details on <a href="https://deusu.org/donate.html" rel="nofollow">https://deusu.org/donate.html</a></p>
]]></description><pubDate>Tue, 13 Sep 2016 17:01:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=12490155</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12490155</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12490155</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Bookmarked. Thanks!</p>
]]></description><pubDate>Tue, 13 Sep 2016 17:00:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=12490138</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12490138</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12490138</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>File formats will be documented when I publish the data files in a few weeks.<p>What do you mean by postings?<p>The main index is split into 32 shards (there is also an additional news index which is updated about every 5-10 minutes). Each shard is updated and queried separately. The query actually runs 2/3 on a Windows server and 1/3 on a Linux server, the latter in Docker containers. I want to move everything to Linux over time.<p>The query has two phases. First, only a rough (but fast) ranking is done. Then the top results of all shards are combined and completely re-ranked. This is basically a meta search engine hidden within.<p>The first query phase is in src/searchservernew.dpr, and the second phase is in src/cgi/PostProcess.pas.</p>
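<p>The two-phase scheme described above can be sketched roughly like this. This is a Python illustration with toy in-memory shards and a hypothetical scoring callback, not DeuSu's actual implementation (which is Pascal, in the files named above):</p>

```python
# Sketch of two-phase sharded ranking: cheap per-shard scoring first,
# then a full re-rank over the merged candidates. Toy data structures;
# illustration only, not DeuSu's Pascal code.

class Shard:
    """A toy shard: maps term -> {doc_id: term_frequency}."""
    def __init__(self, postings):
        self.postings = postings

    def rough_rank(self, terms, limit=100):
        # Phase 1: fast, approximate per-shard scoring (tf sum only).
        scores = {}
        for t in terms:
            for doc, tf in self.postings.get(t, {}).items():
                scores[doc] = scores.get(doc, 0) + tf
        return sorted(scores.items(), key=lambda x: -x[1])[:limit]

def search(shards, terms, full_score, limit=10):
    # Phase 2: merge the top candidates of all shards, then re-rank
    # them completely -- effectively a meta search over the shards.
    candidates = {doc for s in shards for doc, _ in s.rough_rank(terms)}
    ranked = sorted(candidates, key=lambda d: -full_score(d, terms))
    return ranked[:limit]
```

<p>The point of the split is that phase 1 only needs shard-local data and can run in parallel, while the expensive scoring in phase 2 touches just a few hundred candidates.</p>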
]]></description><pubDate>Tue, 13 Sep 2016 16:55:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=12490087</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12490087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12490087</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>I have the filter implemented now. It's not perfect yet, but it already filters out a lot of the NSFW stuff. Unless you explicitly search for it.<p>I'm going to improve this further over the next few days. Right now it's just a quick-and-dirty hack. :)</p>
]]></description><pubDate>Tue, 13 Sep 2016 16:47:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=12490022</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12490022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12490022</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>I don't know Sphinx at all, and my knowledge of Lucene is very limited, so I don't know how they would compare to DeuSu.</p>
]]></description><pubDate>Tue, 13 Sep 2016 16:04:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=12489532</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12489532</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12489532</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Thank you!<p>Depending on who you are (there were 2 bitcoin donations today), you funded either about 18 or 28 hours of operations. :)</p>
]]></description><pubDate>Tue, 13 Sep 2016 15:37:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=12489206</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12489206</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12489206</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>The software is already open-source.<p>A free search API will probably be fully available next week. It's in testing already; it's just a matter of putting the finishing touches on the documentation.<p>And the crawl and index data will be available for download in a few weeks. That too is just a matter of documenting the data format.<p>BTW: I disagree with your points about privacy. I see DeuSu as a way of fighting back.</p>
]]></description><pubDate>Tue, 13 Sep 2016 14:55:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=12488790</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12488790</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12488790</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Only ASCII and the German umlauts (äöüß) at the moment. The parser needs rewriting; it was originally written in pre-Unicode times. :)</p>
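<p>The character coverage described (ASCII plus the German umlauts and eszett) amounts to a simple per-character filter, sketched here in Python as an illustration, not the actual Pascal parser:</p>

```python
# Sketch: token-character filter matching the coverage described above
# (ASCII plus äöüÄÖÜß). Illustration only, not DeuSu's parser.
ALLOWED_EXTRA = set("äöüÄÖÜß")

def keep_char(ch):
    """True for plain ASCII or one of the German extras."""
    return ord(ch) < 128 or ch in ALLOWED_EXTRA

def normalize(token):
    """Drop every character outside the supported set."""
    return "".join(ch for ch in token if keep_char(ch))
```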
]]></description><pubDate>Tue, 13 Sep 2016 14:16:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=12488421</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12488421</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12488421</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Originally it was written in Delphi, but I now use FreePascal for development. I even compile both the Windows and Linux versions on my Linux machine.</p>
]]></description><pubDate>Tue, 13 Sep 2016 14:15:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=12488403</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12488403</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12488403</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Yes, it would be better.<p>The snippets are currently the first 255 characters of the page's text. For snippets tailored to the search term, I would have to store the full text of every page, and that would require a lot more disk space. Space that I can't afford at the moment.</p>
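<p>For contrast, this is roughly what query-tailored snippets would look like if the full page text were available. A minimal Python sketch under that assumption; DeuSu itself stores only the 255-character prefix:</p>

```python
# Sketch: query-aware snippet extraction, which requires the full page
# text to be stored (DeuSu keeps only the first 255 characters).
def make_snippet(text, terms, width=255):
    """Center the snippet on the first occurrence of any query term;
    fall back to the start of the page if no term matches."""
    low = text.lower()
    positions = [low.find(t.lower()) for t in terms]
    positions = [p for p in positions if p != -1]
    if not positions:
        return text[:width]
    start = max(0, min(positions) - width // 2)
    return text[start:start + width]
```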
]]></description><pubDate>Tue, 13 Sep 2016 14:13:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=12488382</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12488382</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12488382</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>Some issues that came up over the years:<p>Block outgoing connections to local IP ranges in your firewall. Otherwise your hosting provider might think you are trying to hack them. Apparently there are a lot of links out there that point to hosts which resolve to private IP ranges.<p>Another problem with following links is that you are bound to run across some that are malware command & control servers. I had several complaints to my ISP after authorities took over one of these and used the C&C server's domain as a honeypot. My crawler is on a whitelist now.<p>I had one person who vehemently complained that I was trying to hack him because the software downloaded his robots.txt. I'm NOT kidding! :)<p>Make sure your robots.txt parsing works correctly. At one point I had an undiscovered bug in the software which basically caused it to think everything was allowed. Luckily someone was nice enough to let me know. And he was <i>really nice</i> about it, even though he would have had every right to be angry.<p>A major bottleneck is DNS queries. Run your own DNS server and even cache the hostname/IP pairs yourself. Do not even think about using your ISP's DNS server. If you bombard them with 100+ DNS requests/s then they WILL be angry. :)</p>
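<p>The first point above (refusing hosts that resolve to private ranges) can be sketched with Python's standard-library ipaddress module. An illustration of the check, not the actual Pascal crawler:</p>

```python
# Sketch of the "don't crawl hosts that resolve to private IP ranges"
# advice above, using only the Python stdlib. Illustration only.
import ipaddress
import socket

def is_safe_to_crawl(hostname):
    """Resolve a hostname and refuse it if any of its addresses is
    private, loopback, or link-local -- plenty of links on the web
    point at such hosts, and crawling them can look like an attack."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return False  # unresolvable: skip it
    for info in infos:
        ip = ipaddress.ip_address(info[4][0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            return False
    return True
```

<p>In a real crawler this check belongs after your own caching resolver, so the same resolved addresses are used for both the check and the actual connection.</p>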
]]></description><pubDate>Tue, 13 Sep 2016 12:31:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=12487524</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12487524</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12487524</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>4 servers in total.<p>2 are used for crawling, index building and raw-data storage. Quad-core, 32 GB RAM, 4 TB HDD and a 1 Gbit/s internet connection on each of these. They are rented and in a big data center. Crawling uses "only" about 200-250 Mbit/s of bandwidth.<p>2 servers for the webserver and queries. Quad-core, 32 GB RAM. One with 2x512 GB SSD, the other with only 1x512 GB SSD. These servers are here at home. I have cable internet with 200 Mbit/s down, 20 Mbit/s up. Static IPs, obviously.<p>A full crawl currently takes about 3 months.</p>
]]></description><pubDate>Tue, 13 Sep 2016 11:13:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=12487003</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12487003</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12487003</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>It's all open-source. So, yes.</p>
]]></description><pubDate>Tue, 13 Sep 2016 10:55:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=12486931</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12486931</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12486931</guid></item><item><title><![CDATA[New comment by deusu in "Show HN: Open-source search engine with 2bn-page index"]]></title><description><![CDATA[
<p>I will publish the index for download in a few weeks; I'm currently working on the documentation. Oh, and I will publish the raw crawl data too. Everything together is about 2.5 TB.<p>There is also a free API in beta test right now. It will probably be ready for official release next week.</p>
]]></description><pubDate>Tue, 13 Sep 2016 10:32:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=12486834</link><dc:creator>deusu</dc:creator><comments>https://news.ycombinator.com/item?id=12486834</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=12486834</guid></item></channel></rss>