<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: tomnicholas1</title><link>https://news.ycombinator.com/user?id=tomnicholas1</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 13 Apr 2026 08:53:04 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=tomnicholas1" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by tomnicholas1 in "Matadisco – Decentralized Data Discovery"]]></title><description><![CDATA[
<p>Awesome to see this project here - it was partly inspired by my blog post (original is linked from the OP, but there's a slightly newer version on my personal site here[0]).<p>[0]: <a href="https://tom-nicholas.com/blog/2025/science-needs-a-social-network/" rel="nofollow">https://tom-nicholas.com/blog/2025/science-needs-a-social-ne...</a></p>
]]></description><pubDate>Sat, 28 Mar 2026 16:03:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=47555822</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=47555822</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47555822</guid></item><item><title><![CDATA[Periodic Labs]]></title><description><![CDATA[
<p>Article URL: <a href="https://periodic.com/">https://periodic.com/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47281266">https://news.ycombinator.com/item?id=47281266</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 06 Mar 2026 21:24:16 +0000</pubDate><link>https://periodic.com/</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=47281266</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47281266</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "A distributed queue in a single JSON file on object storage"]]></title><description><![CDATA[
<p>What you describe is very similar to how Icechunk[1] works. It works beautifully for transactional writes to "repos" containing PBs of scientific array data in object storage.<p>[1]: <a href="https://icechunk.io/en/latest/" rel="nofollow">https://icechunk.io/en/latest/</a></p>
]]></description><pubDate>Tue, 24 Feb 2026 19:10:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47141269</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=47141269</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47141269</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Show HN: Streaming gigabyte medical images from S3 without downloading them"]]></title><description><![CDATA[
<p>People have literally used Zarr for this - at one point Gemini used Zarr for checkpointing model weights. Not sure what the current fashion in that space is though.<p>It's definitely one of many fields that see convergent evolution towards something that just looks like Zarr. In fact you can use VirtualiZarr to parse HuggingFace's "SafeTensors" format [0].<p>[0]: <a href="https://github.com/zarr-developers/VirtualiZarr/pull/555" rel="nofollow">https://github.com/zarr-developers/VirtualiZarr/pull/555</a></p>
]]></description><pubDate>Sat, 17 Jan 2026 18:35:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=46660598</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=46660598</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46660598</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Show HN: Streaming gigabyte medical images from S3 without downloading them"]]></title><description><![CDATA[
<p>IMO Zarr is that newer format. It abstracts over the features of all these other formats so neatly that it can literally subsume them.<p>I feel that we no longer really need TIFF etc. - for scientific use cases in the cloud Zarr is all that's needed going forwards. The other file formats become just archival blobs that either are converted to Zarr or pointed at by virtual Zarr stores.</p>
]]></description><pubDate>Sat, 17 Jan 2026 17:53:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=46660185</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=46660185</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46660185</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Show HN: Streaming gigabyte medical images from S3 without downloading them"]]></title><description><![CDATA[
<p>The generalized form of this range-request-based streaming approach looks something like my project VirtualiZarr [0].<p>Many of these scientific file formats (HDF5, netCDF, TIFF/COG, FITS, GRIB, JPEG and more) are essentially just contiguous multidimensional array(/"tensor") chunks embedded alongside metadata about what's in the chunks. Efficiently fetching these from object storage is just about efficiently fetching the metadata up front so you know where the chunks you want are [1].<p>The data model of Zarr [2] generalizes this pattern pretty well, so that when backed by Icechunk [3], you can store a "datacube" of "virtual chunk references" that point at chunks anywhere inside the original files on S3.<p>This allows you to stream data out as fast as the S3 network connection allows [4], and then you're free to pull that directly, or build tile servers on top of it [5].<p>In the Pangeo project and at Earthmover we do all this for Weather and Climate science data. But the underlying OSS stack is domain-agnostic, so works for all sorts of multidimensional array data, and VirtualiZarr has a plugin system for parsing different scientific file formats.<p>I would love to see if someone could create a virtual Zarr store pointing at this WSI data!<p>[0]: <a href="https://virtualizarr.readthedocs.io/en/stable/" rel="nofollow">https://virtualizarr.readthedocs.io/en/stable/</a><p>[1]: <a href="https://earthmover.io/blog/fundamentals-what-is-cloud-optimized-scientific-data" rel="nofollow">https://earthmover.io/blog/fundamentals-what-is-cloud-optimi...</a><p>[2]: <a href="https://earthmover.io/blog/what-is-zarr" rel="nofollow">https://earthmover.io/blog/what-is-zarr</a><p>[3]: <a href="https://earthmover.io/blog/icechunk-1-0-production-grade-cloud-native-array-storage-is-here" rel="nofollow">https://earthmover.io/blog/icechunk-1-0-production-grade-clo...</a><p>[4]: <a href="https://earthmover.io/blog/i-o-maxing-tensors-in-the-cloud" 
rel="nofollow">https://earthmover.io/blog/i-o-maxing-tensors-in-the-cloud</a><p>[5]: <a href="https://earthmover.io/blog/announcing-flux" rel="nofollow">https://earthmover.io/blog/announcing-flux</a></p>
]]></description><pubDate>Sat, 17 Jan 2026 16:22:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46659254</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=46659254</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46659254</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Programmers and software developers lost the plot on naming their tools"]]></title><description><![CDATA[
<p>God this article is 10000% better than the posted one. This is great:<p>> Names should not describe what you currently think the thing you’re naming is for. Imagine naming your newborn child "Doctor", or "SupportsMeInMyOldAge". Poor kid.</p>
]]></description><pubDate>Fri, 12 Dec 2025 15:39:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46245135</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=46245135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46245135</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "F3: Open-source data file format for the future [pdf]"]]></title><description><![CDATA[
<p>Thank you for the explanation! But what a mess.<p>I would love to bring these benefits to the multidimensional array world, via integration with the Zarr/Icechunk formats somehow (which I work on). But this fragmentation of formats makes it very hard to know where to start.</p>
]]></description><pubDate>Thu, 02 Oct 2025 14:37:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=45450205</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=45450205</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45450205</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "F3: Open-source data file format for the future [pdf]"]]></title><description><![CDATA[
<p>The pitch for this sounds very similar to the pitch for Vortex (i.e. obviating the need to create a new format every time a shift occurs in data processing and computing by providing a data organization structure and a general-purpose API to allow developers to add new encoding schemes easily).<p>But I'm not totally clear what the relationship between F3 and Vortex is. It says their prototype uses the encoding implementation in Vortex, but does not use the Vortex type system?</p>
]]></description><pubDate>Thu, 02 Oct 2025 01:17:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=45445468</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=45445468</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45445468</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Progress toward fusion energy gain as measured against the Lawson criteria"]]></title><description><![CDATA[
<p>The really depressing part is that if you plot the rate of new delays against real time elapsed, the projected finishing date is even further away.<p>This is why much of the fusion research community feel disillusioned with ITER, and so are more interested in these smaller (and supposedly more "agile") machines with high-temperature superconductors instead.</p>
]]></description><pubDate>Thu, 08 May 2025 18:19:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=43929462</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43929462</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43929462</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Progress toward fusion energy gain as measured against the Lawson criteria"]]></title><description><![CDATA[
<p>Presumably because everyone in MCF has been waiting for ITER for decades, and JET is being decommissioned after a last gasp. Every other tokamak is considerably smaller (or similar size like DIII-D or JT-60SA).<p>Many of the interesting tokamak engineering ideas were on small (so low-power) machines or just concepts using high-temperature superconducting magnets.</p>
]]></description><pubDate>Thu, 08 May 2025 17:17:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=43928568</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43928568</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43928568</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "What Is Cloud-Optimized Scientific Data?"]]></title><description><![CDATA[
<p>I wrote the article I wish I could have read when I first heard of Zarr and cloud-native science back in 2018.<p>This explains how object storage and conventional filesystems are different, and the key properties that make Zarr work so well in cloud object storage.</p>
]]></description><pubDate>Thu, 17 Apr 2025 17:59:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=43720228</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43720228</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43720228</guid></item><item><title><![CDATA[What Is Cloud-Optimized Scientific Data?]]></title><description><![CDATA[
<p>Article URL: <a href="https://earthmover.io/blog/fundamentals-what-is-cloud-optimized-scientific-data">https://earthmover.io/blog/fundamentals-what-is-cloud-optimized-scientific-data</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43720227">https://news.ycombinator.com/item?id=43720227</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 17 Apr 2025 17:59:31 +0000</pubDate><link>https://earthmover.io/blog/fundamentals-what-is-cloud-optimized-scientific-data</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43720227</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43720227</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "What Is Entropy?"]]></title><description><![CDATA[
<p>Isn't that more about enumerating the microstates? The Pauli exclusion principle just ends up forbidding some of the microstates (forbidding a significant fraction of them if you're in the low-temperature regime).</p>
]]></description><pubDate>Mon, 14 Apr 2025 22:41:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=43687087</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43687087</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43687087</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "What Is Entropy?"]]></title><description><![CDATA[
<p>Yes, that assumption is called the Ergodic Hypothesis [1], and is generally justified in undergraduate statistical mechanics courses by proving and appealing to Liouville's theorem.<p>[1] <a href="https://en.wikipedia.org/wiki/Ergodic_hypothesis" rel="nofollow">https://en.wikipedia.org/wiki/Ergodic_hypothesis</a></p>
]]></description><pubDate>Mon, 14 Apr 2025 21:56:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=43686753</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43686753</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43686753</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Tensors vs. Tables: Why tabular tools trip over gridded data"]]></title><description><![CDATA[
<p>The scientific community works primarily with array (or "tensor") data, using tools like numpy, xarray, and zarr. People familiar with modern relational database tools such as DuckDB and Parquet often ask why we can't just use those. This article explains why: it's massively inefficient to use tabular tools on array data, and it demonstrates this with a benchmark showing a 10x difference in query speed.</p>
]]></description><pubDate>Thu, 03 Apr 2025 18:07:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=43573306</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43573306</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43573306</guid></item><item><title><![CDATA[Tensors vs. Tables: Why tabular tools trip over gridded data]]></title><description><![CDATA[
<p>Article URL: <a href="https://earthmover.io/blog/tensors-vs-tables/">https://earthmover.io/blog/tensors-vs-tables/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43573305">https://news.ycombinator.com/item?id=43573305</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Thu, 03 Apr 2025 18:07:26 +0000</pubDate><link>https://earthmover.io/blog/tensors-vs-tables/</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43573305</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43573305</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Apache iceberg the Hadoop of the modern-data-stack?"]]></title><description><![CDATA[
<p>I think the posted article was generated from this one - the structure of the content is so similar.</p>
]]></description><pubDate>Thu, 06 Mar 2025 15:16:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=43281187</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43281187</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43281187</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Building an Open, Multi-Engine Data Lakehouse with S3 and Python"]]></title><description><![CDATA[
<p>This entire stack also now exists for arrays as well as for tabular data. It's still S3 for storage, but Zarr instead of Parquet, Icechunk instead of Iceberg, and Xarray for queries in Python.</p>
]]></description><pubDate>Wed, 19 Feb 2025 05:27:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=43098878</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43098878</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43098878</guid></item><item><title><![CDATA[New comment by tomnicholas1 in "Apache Iceberg now supports geospatial data types natively"]]></title><description><![CDATA[
<p>Icechunk can handle growing dimensions with ACID transactions!<p>For irregular shapes in some cases using multiple groups + xarray.DataTree can help you, but in general yeah ragged data is hard.</p>
]]></description><pubDate>Sat, 15 Feb 2025 19:53:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=43061706</link><dc:creator>tomnicholas1</dc:creator><comments>https://news.ycombinator.com/item?id=43061706</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43061706</guid></item></channel></rss>