Hacker News: mrtimo

New comment by mrtimo in "Design and implementation of DuckDB internals"

mrtimo — Tue, 14 Apr 2026 04:11:12 +0000

If you are a data scientist or do anything with data... duckdb is like a Swiss Army knife. There are so many great ways it can help your workflow. The original video from CMU in 2020 [1] is a classic. Minutes 3-8 present a good argument for adding duckdb to your data cleaning/processing workflow.

And if you want to add a semantic layer on top of data, Malloy [2] is my favorite so far (it has duckdb built in):

[1]: https://www.youtube.com/watch?v=PFUZlNQIndo [2]: https://docs.malloydata.dev/documentation/

New comment by mrtimo in "Show HN: I built a RAG search engine over the Epstein court documents"

mrtimo — Mon, 09 Feb 2026 21:06:44 +0000

Nice work. Have you blogged about how you built it?

Dark Web Data Pricing 2025: Real Costs of Stolen Data and Services

mrtimo — Mon, 19 Jan 2026 15:15:28 +0000

Article URL: https://deepstrike.io/blog/dark-web-data-pricing-2025

Comments URL: https://news.ycombinator.com/item?id=46679902

Points: 2

# Comments: 0

New comment by mrtimo in "Why DuckDB is my first choice for data processing"

mrtimo — Sat, 17 Jan 2026 00:26:24 +0000

Spotify says it will take 30 days for the export... it really only takes about 48 hours, if I remember correctly. While you wait for the download, here is an example listening history exploration in Malloy (I converted the listening history to .parquet): https://github.com/mrtimo/spotify-listening-history

New comment by mrtimo in "Why DuckDB is my first choice for data processing"

mrtimo — Fri, 16 Jan 2026 17:02:41 +0000

What I love about duckdb:

-- Support for .parquet, .json, .csv (note: Spotify listening history comes in multiple .json files, something fun to play with).

-- Support for glob reading, like: select * from 'tsa20*.csv' - so you can read hundreds of files (any type of file!) as if they were one file.

-- if the files don't have the same schema, union_by_name is amazing.

-- The .csv parser is amazing. Auto assigns types well.

-- It's small! The WebAssembly version is 2 MB! The CLI is 16 MB.

-- Because it is small you can add duckdb directly to your product, like Malloy has done: https://www.malloydata.dev/ - I think of Malloy as a technical person's alternative to PowerBI and Tableau, but it uses a semantic model that helps AI write amazing queries on your data. Edit: Malloy makes SQL 10x easier to write because of its semantic nature. Malloy transpiles to SQL, like TypeScript transpiles to JavaScript.
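To make the glob and union_by_name points concrete: in DuckDB the query would look something like `select * from read_csv('tsa20*.csv', union_by_name = true)`. The sketch below is not DuckDB code. It is a plain-Python illustration, using only the standard library, of what that read roughly amounts to: many files matched by a glob, columns lined up by name, and missing columns filled with NULL (here `None`). The function name `read_union_by_name` is made up for this example, and unlike DuckDB's CSV parser it does no type inference, so every value stays a string.

```python
import csv
import glob


def read_union_by_name(pattern):
    """Read every CSV matching `pattern` and union the rows by column name.

    Hypothetical helper for illustration only; it mimics the idea behind
    DuckDB's union_by_name option, not its implementation.
    """
    columns = []  # column names in first-seen order across all files
    rows = []
    for path in sorted(glob.glob(pattern)):
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Register any column this file introduces.
            for name in reader.fieldnames or []:
                if name not in columns:
                    columns.append(name)
            rows.extend(reader)
    # A column that a file does not have becomes None (NULL in SQL terms).
    return columns, [{c: row.get(c) for c in columns} for row in rows]
```

Calling `read_union_by_name('tsa20*.csv')` returns the combined column list and one dict per row, so files with different schemas come back as a single table.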

New comment by mrtimo in "Databases in 2025: A Year in Review"

mrtimo — Mon, 05 Jan 2026 17:53:30 +0000

Nice work Andy. I'd love to hear about semantic layer developments in this space (e.g. Malloy etc.). Something to consider for the future. Thanks.

New comment by mrtimo in "Ask HN: What Are You Working On? (December 2025)"

mrtimo — Mon, 15 Dec 2025 05:20:24 +0000

Me too

Mistral misspelled Ministral on HuggingFace and Ollama

mrtimo — Tue, 02 Dec 2025 18:07:12 +0000

Article URL: https://huggingface.co/collections/mistralai/ministral-3

Comments URL: https://news.ycombinator.com/item?id=46124284

Points: 4

# Comments: 1

New comment by mrtimo in "Hosting SQLite Databases on GitHub Pages (2021)"

mrtimo — Wed, 29 Oct 2025 18:41:59 +0000

I'm using DuckDB WASM on GitHub Pages. This will take about 10 seconds to load [1] and shows business trends in my county (Spokane County). This site is built using data-explorer [2], which uses many other open-source projects including Malloy and malloy-explorer. One cool thing... if you use the UI to make a query on the data, you can share the URL with someone and they will see the same result / query (it's all embedded in the URL).

[1] - https://mrtimo.github.io/spokane-co-biz/#/model/businesses/e... [2] - https://github.com/aszenz/data-explorer

New comment by mrtimo in "Show HN: JSON Query"

mrtimo — Mon, 27 Oct 2025 19:22:42 +0000

DuckDB can read JSON - you can query JSON with normal SQL.[1] I prefer the Malloy data language for querying, as it is 10x simpler than SQL.[2]

[1] - https://duckdb.org/docs/stable/data/json/overview [2] - https://www.malloydata.dev/
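As a rough illustration of the "query JSON with normal SQL" idea: DuckDB reads JSON files directly, so none of the loading below is needed there. This sketch uses Python's standard-library `sqlite3` as a stand-in engine so that it runs without installing anything. The helper name `query_json`, the table name `records`, and the sample fields are all made up for the example, and it assumes a non-empty JSON array of flat objects.

```python
import json
import sqlite3


def query_json(json_text, sql):
    """Load a JSON array of flat objects into a table named `records`, then run `sql`.

    Hypothetical helper: SQLite stands in for DuckDB here, which would
    query the JSON file directly without this loading step.
    """
    records = json.loads(json_text)
    # Collect column names in first-seen order across all objects.
    columns = []
    for rec in records:
        for key in rec:
            if key not in columns:
                columns.append(key)
    con = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}"' for c in columns)
    con.execute(f"CREATE TABLE records ({col_defs})")
    placeholders = ", ".join("?" for _ in columns)
    con.executemany(
        f"INSERT INTO records VALUES ({placeholders})",
        [tuple(rec.get(c) for c in columns) for rec in records],
    )
    rows = con.execute(sql).fetchall()
    con.close()
    return rows


# Made-up sample data, loosely shaped like a listening history.
plays = json.dumps([
    {"artist": "A", "ms_played": 1000},
    {"artist": "B", "ms_played": 500},
    {"artist": "A", "ms_played": 250},
])
totals = query_json(
    plays,
    "SELECT artist, SUM(ms_played) FROM records GROUP BY artist ORDER BY artist",
)
```

Here `totals` holds one `(artist, total_ms)` tuple per artist, computed by an ordinary GROUP BY over records that started out as JSON.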

New comment by mrtimo in "A sharded DuckDB on 63 nodes runs 1T row aggregation challenge in 5 sec"

mrtimo — Fri, 24 Oct 2025 16:44:19 +0000

I have experience with DuckDB but not Databricks... from the perspective of a company, is a tool like Databricks more "secure" than DuckDB? If my company adopts DuckDB as a data lake, how do we secure it?

New comment by mrtimo in "Show HN: Duck-UI – Browser-Based SQL IDE for DuckDB"

mrtimo — Mon, 20 Oct 2025 03:27:25 +0000

Love this! Here is a similar product: https://sql-workbench.com/

New comment by mrtimo in "Show HN: Duck-UI – Browser-Based SQL IDE for DuckDB"

mrtimo — Mon, 20 Oct 2025 02:54:11 +0000

Based on this comment, you might enjoy the Malloy data language. It compiles to SQL and also has an open source explorer that makes filters like the ones you describe easy.

New comment by mrtimo in "SQLite's File Format"

mrtimo — Sun, 07 Sep 2025 20:38:37 +0000

It’s 2025. Let’s separate storage from processing. SQLite showed how elegant embedded databases can be, but the real win is formats like Parquet: boring, durable storage you can read with any engine. Storage stays simple, compute stays swappable. That’s the future.

New comment by mrtimo in "Polars Cloud and Distributed Polars now available"

mrtimo — Thu, 04 Sep 2025 17:54:03 +0000

Same. But I use Malloy, which uses duckdb to query data stored in hundreds of parquet files (as if they were one big file).

New comment by mrtimo in "Polars Cloud and Distributed Polars now available"

mrtimo — Thu, 04 Sep 2025 14:01:49 +0000

I agree with this 100%. The creator of duckdb argues that people using pandas are missing out on the 50 years of progress in database research, in the first 5 minutes of his talk here [1].

I've been using Malloy [2], which compiles to SQL (like TypeScript compiles to JavaScript), so instead of editing a 1000-line SQL script, it's only 18 lines of Malloy.

I'd love to see a blog post comparing a pandas approach to data cleaning with an SQL/Malloy approach.

[1] https://www.youtube.com/watch?v=PFUZlNQIndo [2] https://www.malloydata.dev/

New comment by mrtimo in "Lessons from building an AI data analyst"

mrtimo — Wed, 03 Sep 2025 21:57:03 +0000

Very cool to see Malloy mentioned here. Great stuff. There is an MCP server built into Malloy Publisher [1]. Perhaps useful to the author or others trying to do something similar to what the author describes. Directions on how to use the MCP server are here [2].

[1] https://github.com/malloydata/publisher [2] https://github.com/malloydata/publisher/blob/main/docs/ai-ag...

New comment by mrtimo in "Gemma 3 270M: Compact model for hyper-efficient AI"

mrtimo — Thu, 14 Aug 2025 19:18:03 +0000

I'm a business professor who teaches Python and more. I'd like to develop some simple projects to help my students fine-tune this for a business purpose. If you have ideas (or datasets for fine-tuning), let me know!

New comment by mrtimo in "Getting AI to write good SQL"

mrtimo — Sat, 17 May 2025 01:40:47 +0000

Malloy [1] has a semantic layer [2]... and Model Context Protocol (MCP) support is being added through Publisher [3]. Something to keep an eye on. Seems like a great fit for LLMs.

[1] https://www.malloydata.dev/ [2] https://docs.malloydata.dev/documentation/user_guides/malloy... [3] https://github.com/malloydata/publisher

New comment by mrtimo in "Launch HN: Miyagi (YC W25) turns YouTube videos into online, interactive courses"

mrtimo — Tue, 13 May 2025 20:49:47 +0000

No Python course?