<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mrtimo</title><link>https://news.ycombinator.com/user?id=mrtimo</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 17:15:54 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mrtimo" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mrtimo in "Design and implementation of DuckDB internals"]]></title><description><![CDATA[
<p>If you are a data scientist or do anything with data, DuckDB is like a Swiss Army knife: it can help your workflow in so many ways. The original 2020 video from CMU [1] is a classic. Minutes 3-8 make a good argument for adding DuckDB to your data cleaning and processing workflow.<p>And if you want to add a semantic layer on top of your data, Malloy [2] is my favorite so far (it has DuckDB built in):<p>[1]: <a href="https://www.youtube.com/watch?v=PFUZlNQIndo" rel="nofollow">https://www.youtube.com/watch?v=PFUZlNQIndo</a>
[2]: <a href="https://docs.malloydata.dev/documentation/" rel="nofollow">https://docs.malloydata.dev/documentation/</a></p>
]]></description><pubDate>Tue, 14 Apr 2026 04:11:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=47761139</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=47761139</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47761139</guid></item><item><title><![CDATA[New comment by mrtimo in "Show HN: I built a RAG search engine over the Epstein court documents"]]></title><description><![CDATA[
<p>Nice work. Have you blogged about how you built it?</p>
]]></description><pubDate>Mon, 09 Feb 2026 21:06:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46951318</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=46951318</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46951318</guid></item><item><title><![CDATA[Dark Web Data Pricing 2025: Real Costs of Stolen Data and Services]]></title><description><![CDATA[
<p>Article URL: <a href="https://deepstrike.io/blog/dark-web-data-pricing-2025">https://deepstrike.io/blog/dark-web-data-pricing-2025</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46679902">https://news.ycombinator.com/item?id=46679902</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 19 Jan 2026 15:15:28 +0000</pubDate><link>https://deepstrike.io/blog/dark-web-data-pricing-2025</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=46679902</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46679902</guid></item><item><title><![CDATA[New comment by mrtimo in "Why DuckDB is my first choice for data processing"]]></title><description><![CDATA[
<p>Spotify says the export will take 30 days, but if I remember correctly it only takes about 48 hours.
While you wait for the download, here is an example exploration of listening history in Malloy (I converted the listening history to .parquet): <a href="https://github.com/mrtimo/spotify-listening-history" rel="nofollow">https://github.com/mrtimo/spotify-listening-history</a></p>
]]></description><pubDate>Sat, 17 Jan 2026 00:26:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=46654034</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=46654034</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46654034</guid></item><item><title><![CDATA[New comment by mrtimo in "Why DuckDB is my first choice for data processing"]]></title><description><![CDATA[
<p>What I love about DuckDB:<p>-- Support for .parquet, .json, and .csv (note: Spotify listening history comes as multiple .json files, which are fun to play with).<p>-- Support for glob reading, like: select * from 'tsa20*.csv', so you can read hundreds of files (in any supported format!) as if they were one file.<p>-- If the files don't share the same schema, union_by_name is amazing: it matches columns by name instead of by position.<p>-- The CSV parser is excellent and infers column types well.<p>-- It's small! The WebAssembly version is 2 MB, and the CLI is 16 MB.<p>-- Because it is small, you can embed DuckDB directly in your product, as Malloy has done: <a href="https://www.malloydata.dev/" rel="nofollow">https://www.malloydata.dev/</a>. I think of Malloy as a technical person's alternative to Power BI and Tableau, but it uses a semantic model that helps AI write excellent queries on your data. Edit: because of its semantic nature, Malloy makes queries 10x easier to write than raw SQL. Malloy transpiles to SQL, much as TypeScript transpiles to JavaScript.</p>
]]></description><pubDate>Fri, 16 Jan 2026 17:02:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=46648739</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=46648739</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46648739</guid></item><item><title><![CDATA[New comment by mrtimo in "Databases in 2025: A Year in Review"]]></title><description><![CDATA[
<p>Nice work, Andy. I'd love to hear about semantic layer developments in this space (e.g., Malloy). Something to consider for the future. Thanks.</p>
]]></description><pubDate>Mon, 05 Jan 2026 17:53:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=46502137</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=46502137</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46502137</guid></item><item><title><![CDATA[New comment by mrtimo in "Ask HN: What Are You Working On? (December 2025)"]]></title><description><![CDATA[
<p>Me too</p>
]]></description><pubDate>Mon, 15 Dec 2025 05:20:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=46270744</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=46270744</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46270744</guid></item><item><title><![CDATA[Mistral misspelled Ministral on HuggingFace and Ollama]]></title><description><![CDATA[
<p>Article URL: <a href="https://huggingface.co/collections/mistralai/ministral-3">https://huggingface.co/collections/mistralai/ministral-3</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46124284">https://news.ycombinator.com/item?id=46124284</a></p>
<p>Points: 4</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 02 Dec 2025 18:07:12 +0000</pubDate><link>https://huggingface.co/collections/mistralai/ministral-3</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=46124284</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46124284</guid></item><item><title><![CDATA[New comment by mrtimo in "Hosting SQLite Databases on GitHub Pages (2021)"]]></title><description><![CDATA[
<p>I'm using DuckDB WASM on GitHub Pages. This site [1] takes about 10 seconds to load and shows business trends in my county (Spokane County). It is built with data-explorer [2], which uses many other open-source projects, including Malloy and malloy-explorer. One cool thing: if you use the UI to build a query on the data, you can share the URL with someone and they will see the same query and result (it's all embedded in the URL).<p>[1] - <a href="https://mrtimo.github.io/spokane-co-biz/#/model/businesses/explorer/businesses?query=run:%20businesses-%3Ebig_dashboard&run=true" rel="nofollow">https://mrtimo.github.io/spokane-co-biz/#/model/businesses/e...</a>
[2] - <a href="https://github.com/aszenz/data-explorer" rel="nofollow">https://github.com/aszenz/data-explorer</a></p>
]]></description><pubDate>Wed, 29 Oct 2025 18:41:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=45751220</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45751220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45751220</guid></item><item><title><![CDATA[New comment by mrtimo in "Show HN: JSON Query"]]></title><description><![CDATA[
<p>DuckDB can read JSON, so you can query JSON with normal SQL.[1]
I prefer the Malloy data language for querying, as I find it 10x simpler than SQL.[2]<p>[1] - <a href="https://duckdb.org/docs/stable/data/json/overview" rel="nofollow">https://duckdb.org/docs/stable/data/json/overview</a>
[2] - <a href="https://www.malloydata.dev/" rel="nofollow">https://www.malloydata.dev/</a></p>
]]></description><pubDate>Mon, 27 Oct 2025 19:22:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=45725198</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45725198</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45725198</guid></item><item><title><![CDATA[New comment by mrtimo in "A sharded DuckDB on 63 nodes runs 1T row aggregation challenge in 5 sec"]]></title><description><![CDATA[
<p>I have experience with DuckDB but not Databricks. From a company's perspective, is a tool like Databricks more "secure" than DuckDB? If my company adopts DuckDB for its data lake, how do we secure it?</p>
]]></description><pubDate>Fri, 24 Oct 2025 16:44:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=45696395</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45696395</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45696395</guid></item><item><title><![CDATA[New comment by mrtimo in "Show HN: Duck-UI – Browser-Based SQL IDE for DuckDB"]]></title><description><![CDATA[
<p>Love this! Here is a similar product: <a href="https://sql-workbench.com/" rel="nofollow">https://sql-workbench.com/</a></p>
]]></description><pubDate>Mon, 20 Oct 2025 03:27:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45639968</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45639968</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45639968</guid></item><item><title><![CDATA[New comment by mrtimo in "Show HN: Duck-UI – Browser-Based SQL IDE for DuckDB"]]></title><description><![CDATA[
<p>Based on this comment, you might enjoy the Malloy data language. It compiles to SQL and also has an open-source explorer that makes filters like the ones you describe easy.</p>
]]></description><pubDate>Mon, 20 Oct 2025 02:54:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=45639854</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45639854</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45639854</guid></item><item><title><![CDATA[New comment by mrtimo in "SQLite's File Format"]]></title><description><![CDATA[
<p>It’s 2025. Let’s separate storage from processing. SQLite showed how elegant embedded databases can be, but the real win is formats like Parquet: boring, durable storage you can read with any engine. Storage stays simple, compute stays swappable. That’s the future.</p>
]]></description><pubDate>Sun, 07 Sep 2025 20:38:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=45161916</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45161916</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45161916</guid></item><item><title><![CDATA[New comment by mrtimo in "Polars Cloud and Distributed Polars now available"]]></title><description><![CDATA[
<p>Same. But I use Malloy, which uses DuckDB to query data stored in hundreds of Parquet files as if they were one big file.</p>
]]></description><pubDate>Thu, 04 Sep 2025 17:54:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=45130116</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45130116</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45130116</guid></item><item><title><![CDATA[New comment by mrtimo in "Polars Cloud and Distributed Polars now available"]]></title><description><![CDATA[
<p>I agree with this 100%. In the first 5 minutes of his talk [1], the creator of DuckDB argues that people using pandas are missing out on 50 years of progress in database research.<p>I've been using Malloy [2], which compiles to SQL (like TypeScript compiles to JavaScript), so instead of editing a 1,000-line SQL script, I edit just 18 lines of Malloy.<p>I'd love to see a blog post comparing a pandas approach to data cleaning with an SQL/Malloy approach.<p>[1] <a href="https://www.youtube.com/watch?v=PFUZlNQIndo" rel="nofollow">https://www.youtube.com/watch?v=PFUZlNQIndo</a>
[2] <a href="https://www.malloydata.dev/" rel="nofollow">https://www.malloydata.dev/</a></p>
]]></description><pubDate>Thu, 04 Sep 2025 14:01:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=45127389</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45127389</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45127389</guid></item><item><title><![CDATA[New comment by mrtimo in "Lessons from building an AI data analyst"]]></title><description><![CDATA[
<p>Very cool to see Malloy mentioned here. Great stuff. There is an MCP server built into Malloy Publisher [1], which may be useful to the author or to others trying to build something similar to what the author describes. Directions for using the MCP server are here [2].
[1] <a href="https://github.com/malloydata/publisher" rel="nofollow">https://github.com/malloydata/publisher</a>
[2] <a href="https://github.com/malloydata/publisher/blob/main/docs/ai-agents.md" rel="nofollow">https://github.com/malloydata/publisher/blob/main/docs/ai-ag...</a></p>
]]></description><pubDate>Wed, 03 Sep 2025 21:57:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=45120816</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=45120816</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45120816</guid></item><item><title><![CDATA[New comment by mrtimo in "Gemma 3 270M: Compact model for hyper-efficient AI"]]></title><description><![CDATA[
<p>I'm a business professor who teaches Python and more. I'd like to develop some simple projects to help my students fine-tune this model for a business purpose. If you have ideas (or datasets for fine-tuning), let me know!</p>
]]></description><pubDate>Thu, 14 Aug 2025 19:18:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=44904459</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=44904459</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44904459</guid></item><item><title><![CDATA[New comment by mrtimo in "Getting AI to write good SQL"]]></title><description><![CDATA[
<p>Malloy [1] has a semantic layer [2], and Model Context Protocol (MCP) support is being added through Publisher [3]. It's something to keep an eye on; it seems like a great fit for LLMs.<p>[1] <a href="https://www.malloydata.dev/" rel="nofollow">https://www.malloydata.dev/</a>
[2] <a href="https://docs.malloydata.dev/documentation/user_guides/malloy_by_example" rel="nofollow">https://docs.malloydata.dev/documentation/user_guides/malloy...</a>
[3] <a href="https://github.com/malloydata/publisher">https://github.com/malloydata/publisher</a></p>
]]></description><pubDate>Sat, 17 May 2025 01:40:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=44011396</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=44011396</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44011396</guid></item><item><title><![CDATA[New comment by mrtimo in "Launch HN: Miyagi (YC W25) turns YouTube videos into online, interactive courses"]]></title><description><![CDATA[
<p>No Python course?</p>
]]></description><pubDate>Tue, 13 May 2025 20:49:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=43977565</link><dc:creator>mrtimo</dc:creator><comments>https://news.ycombinator.com/item?id=43977565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43977565</guid></item></channel></rss>