<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: szarnyasg</title><link>https://news.ycombinator.com/user?id=szarnyasg</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 14:05:32 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=szarnyasg" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by szarnyasg in "Distributed DuckDB Instance"]]></title><description><![CDATA[
<p>DuckDB devrel here. You are right. This was in the FAQ but I also added it to the DuckLake documentation's main page at <a href="https://ducklake.select/docs/stable/" rel="nofollow">https://ducklake.select/docs/stable/</a></p>
]]></description><pubDate>Wed, 15 Apr 2026 13:44:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47778883</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=47778883</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47778883</guid></item><item><title><![CDATA[New comment by szarnyasg in "Distributed DuckDB Instance"]]></title><description><![CDATA[
<p>DuckLake does not use B+ trees; it handles fragmentation with techniques such as partial files and compaction upon checkpointing. The developers of DuckLake talk about this here: <a href="https://youtu.be/7Su0aVzbb-U?t=689" rel="nofollow">https://youtu.be/7Su0aVzbb-U?t=689</a><p>(Disclaimer: I work at DuckDB Labs)</p>
]]></description><pubDate>Tue, 14 Apr 2026 19:56:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=47770643</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=47770643</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47770643</guid></item><item><title><![CDATA[New comment by szarnyasg in "Distributed DuckDB Instance"]]></title><description><![CDATA[
<p>Hi, DuckDB DevRel here. To have concurrent read-write access to a database, you can use our DuckLake lakehouse format and coordinate concurrent access through a shared Postgres catalog. We released v1.0 yesterday: <a href="https://ducklake.select/2026/04/13/ducklake-10/" rel="nofollow">https://ducklake.select/2026/04/13/ducklake-10/</a><p>I updated your reference [0] with this information.</p>
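<p>A minimal sketch of such a setup, following the documented ATTACH syntax (the database name, host, and bucket path below are placeholders, not values from this thread):<p><pre><code>INSTALL ducklake;
INSTALL postgres;
-- the Postgres database acts as the shared catalog;
-- Parquet data files live under DATA_PATH
ATTACH 'ducklake:postgres:dbname=ducklake_catalog host=localhost' AS my_ducklake
    (DATA_PATH 's3://my-bucket/data/');
USE my_ducklake;
</code></pre><p>Multiple DuckDB clients can attach with the same connection string; the Postgres catalog coordinates their concurrent writes.</p>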
]]></description><pubDate>Tue, 14 Apr 2026 07:56:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47762638</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=47762638</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47762638</guid></item><item><title><![CDATA[New comment by szarnyasg in "Grafeo – A fast, lean, embeddable graph database built in Rust"]]></title><description><![CDATA[
<p>That's a difficult question and I would like to avoid giving a direct answer (because I co-lead a nonprofit that benchmarks graph databases), but even knowing what you need from a graph database can be tricky. See my FOSDEM 2025 talk, where I tried to make sense of the field:<p><a href="https://archive.fosdem.org/2025/schedule/event/fosdem-2025-5413-graph-databases-after-15-years-where-are-they-headed-/" rel="nofollow">https://archive.fosdem.org/2025/schedule/event/fosdem-2025-5...</a></p>
]]></description><pubDate>Sat, 21 Mar 2026 19:38:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=47470498</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=47470498</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47470498</guid></item><item><title><![CDATA[New comment by szarnyasg in "Big data on the cheapest MacBook"]]></title><description><![CDATA[
<p>That's a good point. I re-ran the benchmark on two instances:<p>- c8gd.4xlarge - this has a single 950 GB NVMe SSD.<p>- c5ad.4xlarge - this has 2 x 300 GB disks, which I put in a RAID 0 array. There are no c6ad.4xlarge instances, so this is the closest NVMe-enabled approximation of ClickBench's most popular choice, the c6a.4xlarge.<p>I also added results from my local dev machine, a MacBook M1 Max with 64 GB RAM and 10 cores.<p>Here are the results:<p><pre><code>  | machine        | cold_run_avg | cold_run_sum | hot_run_avg | hot_run_sum |
  | -------------- | -----------: | -----------: | ----------: | ----------: |
  | macbook m1 max |         0.48 |        20.68 |        0.43 |       18.60 |
  | macbook neo    |         1.39 |        59.73 |        1.26 |       54.27 |
  | c8gd.4xlarge   |         0.51 |        22.04 |        0.24 |       10.36 |
  | c5ad.4xlarge   |         1.29 |        54.14 |        0.55 |       22.91 |
  | c6a.4xlarge    |         3.37 |       145.08 |        1.11 |       47.86 |
  | c8g.metal-48xl |         3.95 |       169.67 |        0.10 |        4.35 |
</code></pre>
On the cold run, the MacBook is on par with the c5ad.4xlarge, while the c8gd.4xlarge is roughly 2.5x faster.<p>I know this is moving the goalposts, but it's quite interesting that both of these cloud instances with instance-attached storage are still outperformed by the M1 Max (which is 4+ years old) on the cold run. And they would quite likely lose against the latest MacBook Pro with the M5 Pro/Max on both the cold and the hot runs. But that's an experiment for another day.</p>
]]></description><pubDate>Fri, 13 Mar 2026 07:58:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47361723</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=47361723</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47361723</guid></item><item><title><![CDATA[New comment by szarnyasg in "Big data on the cheapest MacBook"]]></title><description><![CDATA[
<p>Indeed, it would have been interesting, but I really wanted to get the blog post out on the launch day of the MacBook Neo and did not have the bandwidth to run additional cloud experiments.<p>I have now run TPC-DS SF300 on the c6a.4xlarge. It turns out that it's still quite limited by the EBS disk's I/O: while 32 GB of memory is much more than 8 GB, DuckDB needs to spill to disk a lot, and this shows in the runtimes. Running all 99 queries took 37 minutes, about half of the MacBook's 79 minutes.<p>> Command being timed: "duckdb tpcds-sf300.db -f bench.sql"<p>> Percent of CPU this job got: 250%<p>> Elapsed (wall clock) time (h:mm:ss or m:ss): 37:00.96<p>> Maximum resident set size (kbytes): 25559652</p>
]]></description><pubDate>Thu, 12 Mar 2026 20:38:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47356732</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=47356732</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47356732</guid></item><item><title><![CDATA[New comment by szarnyasg in "Big Data on the Cheapest MacBook"]]></title><description><![CDATA[
<p>You're right! I pushed an updated TL;DR block.</p>
]]></description><pubDate>Thu, 12 Mar 2026 14:30:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=47351048</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=47351048</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47351048</guid></item><item><title><![CDATA[DuckDB 1.4.3 LTS with Native Windows ARM64 Support]]></title><description><![CDATA[
<p>Article URL: <a href="https://duckdb.org/2025/12/09/announcing-duckdb-143">https://duckdb.org/2025/12/09/announcing-duckdb-143</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46204891">https://news.ycombinator.com/item?id=46204891</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 09 Dec 2025 13:48:37 +0000</pubDate><link>https://duckdb.org/2025/12/09/announcing-duckdb-143</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=46204891</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46204891</guid></item><item><title><![CDATA[DuckLake 0.3 with Iceberg Interoperability and Geometry Support]]></title><description><![CDATA[
<p>Article URL: <a href="https://ducklake.select/2025/09/17/ducklake-03/">https://ducklake.select/2025/09/17/ducklake-03/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45291313">https://news.ycombinator.com/item?id=45291313</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 18 Sep 2025 16:02:53 +0000</pubDate><link>https://ducklake.select/2025/09/17/ducklake-03/</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=45291313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45291313</guid></item><item><title><![CDATA[New comment by szarnyasg in "DuckLake is an integrated data lake and catalog format"]]></title><description><![CDATA[
<p>Yes - updates on existing rows are supported.<p>(I work at DuckDB Labs.)</p>
]]></description><pubDate>Tue, 27 May 2025 19:02:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=44109768</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=44109768</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44109768</guid></item><item><title><![CDATA[New comment by szarnyasg in "DuckLake is an integrated data lake and catalog format"]]></title><description><![CDATA[
<p>Great!<p>> About the COPY statement, it means we can drop Parquet files ourselves in the blob storage ?<p>Dropping the Parquet files into the blob storage directly will not work – you have to COPY them through DuckLake so that the catalog database is updated with the required metadata.</p>
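<p>A sketch of what loading an existing Parquet file through DuckLake looks like (the attached catalog name, table name, and file name are made up for illustration):<p><pre><code>-- runs as a transaction against the DuckLake catalog, so the new
-- data files are registered in the catalog metadata as part of the load
COPY my_ducklake.my_table FROM 'exported_rows.parquet';
</code></pre></p>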
]]></description><pubDate>Tue, 27 May 2025 15:03:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=44107670</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=44107670</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44107670</guid></item><item><title><![CDATA[New comment by szarnyasg in "DuckLake is an integrated data lake and catalog format"]]></title><description><![CDATA[
<p>The YouTube video “Apache Iceberg: What It Is and Why Everyone’s Talking About It” by Tim Berglund explains data lakes really well in the opening minutes: <a href="https://www.youtube.com/watch?v=TsmhRZElPvM" rel="nofollow">https://www.youtube.com/watch?v=TsmhRZElPvM</a></p>
]]></description><pubDate>Tue, 27 May 2025 14:49:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=44107516</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=44107516</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44107516</guid></item><item><title><![CDATA[New comment by szarnyasg in "DuckLake is an integrated data lake and catalog format"]]></title><description><![CDATA[
<p>Yes, you can use standard SQL constructs such as INSERT statements and COPY to load data into DuckLake.<p>(Disclaimer: I work at DuckDB Labs)</p>
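<p>For example, assuming a DuckLake catalog attached as <code>my_ducklake</code> (the table and file names here are illustrative placeholders):<p><pre><code>CREATE TABLE my_ducklake.events (id INTEGER, payload VARCHAR);
-- plain INSERT works against a DuckLake table
INSERT INTO my_ducklake.events VALUES (1, 'quack');
-- so does bulk loading via COPY
COPY my_ducklake.events FROM 'events.csv';
</code></pre></p>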
]]></description><pubDate>Tue, 27 May 2025 14:46:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=44107484</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=44107484</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44107484</guid></item><item><title><![CDATA[New comment by szarnyasg in "A lost decade chasing distributed architectures for data analytics?"]]></title><description><![CDATA[
<p>AWS started offering local SSD storage up to 2 TB in 2012 (HI1 instance type) and in late 2013 this went up to 6.4 TB (I2 instance type). While these amounts don't cover all customers, plenty of data fits on these machines. But the software stack to analyze it efficiently was lacking, especially in the open-source space.</p>
]]></description><pubDate>Thu, 22 May 2025 13:43:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=44061989</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=44061989</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44061989</guid></item><item><title><![CDATA[New comment by szarnyasg in "The DuckDB Local UI"]]></title><description><![CDATA[
<p>Hi, DuckDB devrel here. DuckDB is an analytical SQL database in the form factor of SQLite (i.e., in-process). This quadrant summarizes its place in the landscape:<p><a href="https://blobs.duckdb.org/slides/goto-amsterdam-2024-duckdb-gabor-szarnyas.pdf#page=19" rel="nofollow">https://blobs.duckdb.org/slides/goto-amsterdam-2024-duckdb-g...</a><p>It works as a replacement for or a complement to dataframe libraries thanks to its speed and (vertical) scalability. It's lightweight and dependency-free, so it also works well as part of data processing pipelines.</p>
]]></description><pubDate>Thu, 13 Mar 2025 10:04:21 +0000</pubDate><link>https://news.ycombinator.com/item?id=43351767</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=43351767</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43351767</guid></item><item><title><![CDATA[New comment by szarnyasg in "The DuckDB Local UI"]]></title><description><![CDATA[
<p>I'm a co-author of the blog post. I agree that the wording was confusing – apologies. I added a note at the end:<p>> The repository does not contain the source code for the frontend, which is currently not available as open-source. Releasing it as open-source is under consideration.</p>
]]></description><pubDate>Wed, 12 Mar 2025 17:31:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=43345622</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=43345622</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43345622</guid></item><item><title><![CDATA[New comment by szarnyasg in "Be Aware of the Makefile Effect"]]></title><description><![CDATA[
<p>I have observed the Makefile effect many times with LaTeX documents. Most researchers I worked with had a LaTeX file full of macros that they had been carrying from project to project for years. These were often inherited from more senior researchers and hammered into heavily modified forks of the article templates used in their field or the thesis templates used at their institution.</p>
]]></description><pubDate>Sat, 11 Jan 2025 06:50:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=42663859</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=42663859</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42663859</guid></item><item><title><![CDATA[New comment by szarnyasg in "DuckDB is faster at counting the lines of a CSV file than wc"]]></title><description><![CDATA[
<p>I am the author of the original post and I also wrote a follow-up blog post yesterday: <a href="https://szarnyasg.org/posts/duckdb-vs-coreutils/" rel="nofollow">https://szarnyasg.org/posts/duckdb-vs-coreutils/</a><p>Yes, if you break the file into parts with GNU Parallel, you can easily beat DuckDB, as I show in the blog post.<p>That said, I maintain that it's surprising that DuckDB outperforms wc (and grep) on many common setups, e.g., on a MacBook. This is not something many databases can do, and the ones that can usually don't run on a laptop.</p>
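<p>The DuckDB side of the comparison is a single SQL query; a sketch (the file name is a placeholder):<p><pre><code>-- DuckDB's parallel CSV reader scans the file with all cores,
-- which is what makes this competitive with wc -l
SELECT count(*) FROM read_csv('measurements.csv', header = false);
</code></pre></p>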
]]></description><pubDate>Thu, 05 Dec 2024 23:27:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=42334022</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=42334022</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42334022</guid></item><item><title><![CDATA[New comment by szarnyasg in "DuckDB over Pandas/Polars"]]></title><description><![CDATA[
<p>Hi – DuckDB Labs devrel here. It's great that you find DuckDB useful!<p>On the setup side, I agree that local (instance-attached) disks should be preferred, but does EBS incur an I/O fee? It certainly adds significant latency, but it doesn't have per-operation pricing:<p>> I/O is included in the price of the volumes, so you pay only for each GB of storage you provision.<p>(<a href="https://aws.amazon.com/ebs/pricing/" rel="nofollow">https://aws.amazon.com/ebs/pricing/</a>)</p>
]]></description><pubDate>Wed, 06 Nov 2024 15:30:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=42063813</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=42063813</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42063813</guid></item><item><title><![CDATA[DuckDB in Python in the Browser with Pyodide, PyScript, and JupyterLite]]></title><description><![CDATA[
<p>Article URL: <a href="https://duckdb.org/2024/10/02/pyodide.html">https://duckdb.org/2024/10/02/pyodide.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41741410">https://news.ycombinator.com/item?id=41741410</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 04 Oct 2024 13:52:08 +0000</pubDate><link>https://duckdb.org/2024/10/02/pyodide.html</link><dc:creator>szarnyasg</dc:creator><comments>https://news.ycombinator.com/item?id=41741410</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41741410</guid></item></channel></rss>