<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: willvarfar</title><link>https://news.ycombinator.com/user?id=willvarfar</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 17 Apr 2026 09:56:40 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=willvarfar" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by willvarfar in "The Economics of Software Teams: Why Most Engineering Orgs Are Flying Blind"]]></title><description><![CDATA[
<p>Having spent a long time in the industry and seen how so many big software companies work, I found this really, really chimed with me.  Many/most teams, projects and busy work are not actually moving the bottom line, at massive opportunity cost!  And there's so little awareness that most people in squads, and their managers, will think they are the exception.<p>Whereas WhatsApp, with its 30 software engineers, was the exception, etc.<p>A chat with friends showed parallels between how LLMs will play out in the short-term future (say, the next 5 years) and the whole MapReduce mess.  Back when Hadoop came along, you built operators and these operators communicated through disk.  It took years, even after Spark was around, for the Hadoop userbase as a whole to realise that it is orders of magnitude more efficient to only communicate through disk when two operators cannot be colocated on the same machine, and that most operators in most pipelines can be fused together.<p>So for a while LLMs will be in the Hadoop phase, where they act like junior devs and make more islands that communicate within bigger, bloated codebases.  Then there might be a realisation, around 2030, that the LLMs could actually have been used to clean up, streamline and fuse software, and approach WhatsApp-style business impact.</p>
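<p>A minimal sketch of the operator-fusion point above, assuming a toy three-operator pipeline (the operator names and the JSON-lines temp files are illustrative only, not any real Hadoop or Spark API). The first runner materialises every operator's output to disk before the next operator reads it back, Hadoop-style; the second chains Python generators so each row flows through all operators in one pass with no intermediate I/O.</p>

```python
import json
import os
import tempfile

# Toy operators (hypothetical): each takes an iterable of rows and returns an iterable.
def parse(rows):
    return (int(r) for r in rows)

def keep_even(rows):
    return (r for r in rows if r % 2 == 0)

def square(rows):
    return (r * r for r in rows)

PIPELINE = [parse, keep_even, square]

def run_through_disk(rows, operators):
    """Unfused: every operator writes its full output to a file that the next operator reads back."""
    data = list(rows)
    with tempfile.TemporaryDirectory() as tmp:
        for i, op in enumerate(operators):
            path = os.path.join(tmp, f"stage{i}.jsonl")
            with open(path, "w") as f:
                for row in op(data):
                    f.write(json.dumps(row) + "\n")
            with open(path) as f:
                data = [json.loads(line) for line in f]
    return data

def run_fused(rows, operators):
    """Fused: compose the generators so each row passes through all operators in a single pass."""
    stream = iter(rows)
    for op in operators:
        stream = op(stream)
    return list(stream)

rows = ["1", "2", "3", "4", "5", "6"]
print(run_through_disk(rows, PIPELINE))  # [4, 16, 36]
print(run_fused(rows, PIPELINE))         # [4, 16, 36]
```

<p>Both produce the same rows; the fused version simply skips the per-stage write and read-back, which is the overhead the comment describes.</p>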
]]></description><pubDate>Mon, 13 Apr 2026 08:39:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=47749385</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=47749385</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47749385</guid></item><item><title><![CDATA[New comment by willvarfar in "Challenges in join optimization"]]></title><description><![CDATA[
<p>Yeah, it's pretty obscure, sorry.<p>It's called cogroup in Spark and similar architectures.<p>It does a group-by to convert data into the format (key_col_1, ... key_col_n) -> [(other_col_1, ... other_col_n), ...]<p>This is useful and ergonomic in itself for lots of use-cases.  A lot of Spark and similar pipelines do this just to make things easier to manipulate.<p>It's also especially useful if you cogroup each side before the join, which gives you the key columns and two arrays of matching rows, one for each side of the join.<p>A quick search says it's called "group join" in academia.  I'm sure I've bumped into it under another name in other DB engines but can't remember it right now.<p>One advantage of this is bounded memory: it doesn't actually iterate over the cartesian product of non-unique keys.  In fact, the whole join can be done on pointers into the sides of the join, rather than shuffling and writing the values themselves.<p>My understanding is that a lot of big data distributed query engines do this, at least in mixer nodes.  Then the discussion becomes how late they actually expand the product: are they able to communicate the cogrouped format to the next step in the plan, or must they flatten it?  Etc.<p>(In SQL big data engines you sometimes do this optimisation explicitly, e.g. doing SELECT key, ARRAY_AGG(value) FROM ... on each side before the join.  But things are nicer when it happens transparently under the hood and users get the speedup without the boilerplate, the brittleness, and the fear that it becomes a deoptimisation when circumstances change in the future.)</p>
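<p>A minimal Python sketch of the cogroup / group-join shape described above, on plain lists of tuples. The function names are made up for illustration and this is not Spark's API. The Python lists hold references to the original rows rather than copies, loosely echoing the "pointers into the sides" point, and the cartesian product only appears if you explicitly call expand.</p>

```python
from collections import defaultdict

def cogroup(left, right, key):
    """Group both inputs by key: key -> (left_rows, right_rows); keys from either side are kept."""
    groups = defaultdict(lambda: ([], []))
    for row in left:
        groups[key(row)][0].append(row)
    for row in right:
        groups[key(row)][1].append(row)
    return dict(groups)

def group_join(left, right, key):
    """Inner group join: keep only keys with rows on both sides, without expanding the product."""
    return {k: (lhs, rhs) for k, (lhs, rhs) in cogroup(left, right, key).items() if lhs and rhs}

def expand(joined):
    """Flatten to one output row per matching pair, lazily, e.g. only at final write."""
    for k, (lhs, rhs) in joined.items():
        for a in lhs:
            for b in rhs:
                yield (k, a, b)

left = [("a", 1), ("a", 2), ("b", 3)]
right = [("a", "x"), ("a", "y"), ("c", "z")]
joined = group_join(left, right, key=lambda row: row[0])
print(sorted(joined))             # ['a']
print(len(list(expand(joined))))  # 4
```

<p>The grouped form stores 2 + 2 rows for key "a"; the expanded form would be 2 x 2 rows, and that gap grows quadratically with hot keys.</p>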
]]></description><pubDate>Thu, 22 Jan 2026 08:03:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=46716467</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46716467</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46716467</guid></item><item><title><![CDATA[New comment by willvarfar in "Significant US farm losses persist, despite federal assistance"]]></title><description><![CDATA[
<p>This video is very liberal but does a good job of explaining which companies and industries pay for breaks and which don't, and uses soybean farmers as a prominent example of a group who haven't been giving Trump bribes: <a href="https://youtu.be/RPzcGeiNYvk?si=bfy_5KEo_ZUxOBHu" rel="nofollow">https://youtu.be/RPzcGeiNYvk?si=bfy_5KEo_ZUxOBHu</a></p>
]]></description><pubDate>Thu, 22 Jan 2026 06:58:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=46716103</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46716103</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46716103</guid></item><item><title><![CDATA[New comment by willvarfar in "Challenges in join optimization"]]></title><description><![CDATA[
<p>Can join cardinality be tackled with cogroup, by not expanding the rows until the final write?</p>
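<p>On the cardinality half of the question, a small sketch that assumes only the join keys are needed: once each side is reduced to per-key counts (the cogrouped group sizes), the exact inner-join row count is the sum over shared keys of left_count * right_count, so a skewed key can be spotted before any row is expanded. The function name is hypothetical.</p>

```python
from collections import Counter

def join_cardinality(left_keys, right_keys):
    """Exact inner-join output size from per-key counts; no output rows are materialised."""
    left_counts, right_counts = Counter(left_keys), Counter(right_keys)
    # Counter returns 0 for missing keys, so keys present on only one side contribute nothing.
    return sum(n * right_counts[k] for k, n in left_counts.items())

print(join_cardinality(["a", "a", "b"], ["a", "a", "a", "c"]))  # 6
# One hot key: 1,000 rows per side collapse to one count each, but would expand to 1,000,000 rows.
print(join_cardinality(["hot"] * 1000, ["hot"] * 1000))          # 1000000
```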
]]></description><pubDate>Wed, 21 Jan 2026 23:27:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46713150</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46713150</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46713150</guid></item><item><title><![CDATA[New comment by willvarfar in "Porsche sold more electrified cars in Europe in 2025 than pure gas-powered cars"]]></title><description><![CDATA[
<p>Is everyone expecting everyone to actually go Gripen, with Rolls-Royce or MECA engines?</p>
]]></description><pubDate>Tue, 20 Jan 2026 18:27:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=46695792</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46695792</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46695792</guid></item><item><title><![CDATA[New comment by willvarfar in "Porsche sold more electrified cars in Europe in 2025 than pure gas-powered cars"]]></title><description><![CDATA[
<p>Having a pending order that can be cancelled is negotiation leverage?</p>
]]></description><pubDate>Tue, 20 Jan 2026 13:36:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=46691680</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46691680</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46691680</guid></item><item><title><![CDATA[New comment by willvarfar in "Porsche sold more electrified cars in Europe in 2025 than pure gas-powered cars"]]></title><description><![CDATA[
<p>And what about all those huge pending orders for F-35s in ... Denmark and Canada?  Etc.</p>
]]></description><pubDate>Tue, 20 Jan 2026 07:27:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=46688931</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46688931</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46688931</guid></item><item><title><![CDATA[New comment by willvarfar in "40% of Kids Can't Read and Teachers Are Quitting [video]"]]></title><description><![CDATA[
<p>The very last clip in the video says that it is kids in affluent families taking that direction.</p>
]]></description><pubDate>Mon, 19 Jan 2026 12:25:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46678218</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46678218</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46678218</guid></item><item><title><![CDATA[New comment by willvarfar in "Why DuckDB is my first choice for data processing"]]></title><description><![CDATA[
<p>(I work a lot with BigQuery's BigLake adaptor, and it basically caches the Iceberg manifest metadata and Parquet footers in Bigtable (this is Google), so query planning is super fast.  Really helps.)</p>
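<p>To illustrate why cached footers speed up planning, here is a hypothetical sketch, not BigLake's actual design: Parquet footers (and Iceberg manifests) carry per-file column statistics such as min/max, so a planner that caches them keyed by file path and version only pays for the footer reads once, and later plans can prune files straight from the cache. The class, read_footer and the sample file names are all made up for illustration.</p>

```python
class FooterStatsCache:
    """Hypothetical planner-side cache of per-file (min, max) column stats, keyed by (path, version)."""

    def __init__(self, read_footer):
        self._read_footer = read_footer  # stands in for an expensive remote footer read
        self._cache = {}
        self.footer_reads = 0

    def stats(self, path, version):
        key = (path, version)  # a rewritten file gets a new version, hence a new cache entry
        if key not in self._cache:
            self.footer_reads += 1
            self._cache[key] = self._read_footer(path)
        return self._cache[key]

    def plan(self, files, lo, hi):
        """Keep only files whose [min, max] range may overlap the predicate range [lo, hi]."""
        kept = []
        for path, version in files:
            mn, mx = self.stats(path, version)
            if mx >= lo and mn <= hi:
                kept.append(path)
        return kept

footers = {"a.parquet": (0, 99), "b.parquet": (100, 199), "c.parquet": (200, 299)}
cache = FooterStatsCache(read_footer=footers.__getitem__)
files = [("a.parquet", 1), ("b.parquet", 1), ("c.parquet", 1)]
print(cache.plan(files, 150, 250))  # ['b.parquet', 'c.parquet']
print(cache.plan(files, 150, 250))  # same files, now without any footer reads
print(cache.footer_reads)           # 3
```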
]]></description><pubDate>Fri, 16 Jan 2026 20:43:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46651956</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46651956</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46651956</guid></item><item><title><![CDATA[New comment by willvarfar in "Danish Armed Forces expand their presence and continue exercises in Greenland"]]></title><description><![CDATA[
<p>Greenland and Denmark have always been encouraging minerals deals etc, they just haven't materialized.</p>
]]></description><pubDate>Thu, 15 Jan 2026 13:29:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=46632220</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46632220</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46632220</guid></item><item><title><![CDATA[New comment by willvarfar in "Ask HN: What did you find out or explore today?"]]></title><description><![CDATA[
<p>Seriously, this is not what big data does today.  Distributed query engines don't have the primitives to zip through two tables and treat them as column groups of the same wider logical table.  There's a new kid on the block called LanceDB that has some of the same features but is aiming for different use-cases.  My trick retrofits vertical partitioning into mainstream data lake stuff.  It's generic and works on the tech stack my company uses but would also work on all the mainstream alternative stacks.  Slightly slower on AWS.  But anyway.  I guess HN just wants to see an industrial track paper.</p>
]]></description><pubDate>Thu, 15 Jan 2026 08:49:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46629873</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46629873</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46629873</guid></item><item><title><![CDATA[New comment by willvarfar in "Ask HN: What did you find out or explore today?"]]></title><description><![CDATA[
<p>Specifically, I've discovered how to 'trick' mainstream cloud storage and mainstream query engines, using mainstream table formats, into reading parallel arrays that are stored outside the table without a classic join, and treating them as new columns or as schema evolution.  It'll work on Spark, BigQuery, etc.</p>
]]></description><pubDate>Thu, 15 Jan 2026 06:57:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=46629029</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46629029</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46629029</guid></item><item><title><![CDATA[New comment by willvarfar in "Ask HN: What did you find out or explore today?"]]></title><description><![CDATA[
<p>crazy to think that soon not being able to successfully complete the captcha will be a signal that the user is human</p>
]]></description><pubDate>Thu, 15 Jan 2026 06:55:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=46629007</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46629007</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46629007</guid></item><item><title><![CDATA[New comment by willvarfar in "Ask HN: What did you find out or explore today?"]]></title><description><![CDATA[
<p>I had a great euphoric epiphany feeling today.  Doesn't come along too often, will celebrate with a nice glass of wine :)<p>Am doing data engineering for some big data (yeah, big enough) and thinking about the efficiency of data enrichment.  There's a classic trilemma with data enrichment: you can have good write efficiency, good read efficiency or low storage cost; pick two.<p>E.g. you have a 1TB table and you want to add a column that, say, will take 1GB to store.<p>You can create a new table that is 1.001TB and then delete the old table, but this is write-inefficient and often breaks how normal data lake orchestration works.<p>You can create a new wide table that is 1.001TB and keep it alongside the old table, but this is both write-inefficient and expensive to store.<p>You can create a narrow companion table that has just a join key and 1GB of data.  This is efficient to write and store, but inefficient to query when you force all users to do joins on read.<p>And I've come up with a cunning fourth way where you write a narrow table and read a wide table, so it's literally the best of all worlds!  Kinda staggering :)  Still on a high.<p>Might actually be a conference paper, which is new territory for me.  Let's see :)<p>/off dancing</p>
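<p>Putting rough numbers on the three classic options for a 1TB table plus a 1GB column (decimal units, join-key overhead ignored; the fourth approach isn't described, so it isn't modelled here):</p>

```python
TB, GB = 10**12, 10**9
BASE, NEW_COL = 1 * TB, 1 * GB

# option -> (bytes written, bytes stored afterwards, join needed at read time)
OPTIONS = {
    "rewrite wide, drop old": (BASE + NEW_COL, BASE + NEW_COL, False),
    "wide copy alongside old": (BASE + NEW_COL, 2 * BASE + NEW_COL, False),
    "narrow companion table": (NEW_COL, BASE + NEW_COL, True),
}

for name, (written, stored, needs_join) in OPTIONS.items():
    # Write amplification: bytes written per byte of genuinely new data.
    print(f"{name}: write x{written // NEW_COL}, stored {stored / TB:.3f} TB, join on read: {needs_join}")
```

<p>The two wide options write about 1001 bytes per new byte; only the narrow table avoids that, at the price of a join on every read.</p>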
]]></description><pubDate>Thu, 15 Jan 2026 06:39:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=46628896</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46628896</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46628896</guid></item><item><title><![CDATA[New comment by willvarfar in "Network of Scottish X accounts go dark amid Iran blackout"]]></title><description><![CDATA[
<p>I agree that social media is a net negative, but want to also point out that before social media, the mainstream press and TV had been shaping society for decades.  Things like buying a used car from Nixon or fighting in Vietnam etc. are all examples of mainstream press impact.</p>
]]></description><pubDate>Tue, 13 Jan 2026 13:30:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46600711</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46600711</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46600711</guid></item><item><title><![CDATA[New comment by willvarfar in "Why is there a tiny hole in the airplane window? (2023)"]]></title><description><![CDATA[
<p>I've always noticed it and wondered, so I guess it's easy to overlook, but it's there.</p>
]]></description><pubDate>Fri, 09 Jan 2026 10:40:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46552346</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46552346</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46552346</guid></item><item><title><![CDATA[New comment by willvarfar in "Anthropic blocks third-party use of Claude Code subscriptions"]]></title><description><![CDATA[
<p>Presumably there will soon be banner ads in Claude Code then? &lt;/s&gt;</p>
]]></description><pubDate>Fri, 09 Jan 2026 08:01:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46551188</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46551188</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46551188</guid></item><item><title><![CDATA[New comment by willvarfar in "I program without syntax highlighting"]]></title><description><![CDATA[
<p>I remember when syntax highlighting was introduced in Borland's Turbo Pascal editor (on DOS).  It was a very major usability improvement and put TP's IDE at the forefront of getting things done.  Fond memories :)</p>
]]></description><pubDate>Thu, 08 Jan 2026 14:23:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=46541307</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46541307</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46541307</guid></item><item><title><![CDATA[New comment by willvarfar in "Lessons from Hash Table Merging"]]></title><description><![CDATA[
<p>Kudos, neat digging and writeup that makes us think :)<p>If you merge linear-probing tables by iterating in sorted hash order, then you are matching the storage order and can congest particular parts of the table, causing linear probing's worst-case behaviour.<p>By changing the iteration order, or salting the hash, you can avoid this.<p>Of course, chained hash tables don't suffer from this particular problem.<p>My quick thought is that hash tables ought to keep an internal salt hidden away.  This seems good for avoiding 'attacks' as well as speeding up merging etc.  The only downside I can think of is that creating the table needs to fetch a random salt, which might not be quick, although that can be alleviated by allowing it to be set externally at table creation, so people who don't care can set it to 0 or whatever.  What am I missing?</p>
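<p>A small simulation of the effect, as a sketch under my own assumptions (a toy power-of-two linear-probing set, a splitmix64-style mixer, growth at 7/8 load; none of this is taken from the article): keys are copied from a source table in its storage order into a destination that either shares the source's hash (salt 0) or uses its own salt. Every slot inspected is counted as a probe, including rehashing on growth. n=2458 is picked so the source ends up about 60% full in 4096 slots.</p>

```python
MASK64 = (1 << 64) - 1

def mix(x, salt):
    """splitmix64-style finaliser; XORing in a salt gives each table its own hash function."""
    x = (x ^ salt) & MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)

class LinearProbingSet:
    """Toy open-addressing set of ints: power-of-two capacity, linear probing, grows at 7/8 load."""

    def __init__(self, salt=0):
        self.salt = salt
        self.slots = [None] * 8
        self.count = 0
        self.probes = 0  # slots inspected by every placement, including rehashes

    def _place(self, slots, key):
        mask = len(slots) - 1
        i = mix(key, self.salt) & mask
        while True:
            self.probes += 1
            if slots[i] is None:
                slots[i] = key
                return True
            if slots[i] == key:
                return False
            i = (i + 1) & mask  # linear probing, wrapping at the end of the table

    def insert(self, key):
        if (self.count + 1) * 8 > len(self.slots) * 7:
            old, self.slots = self.slots, [None] * (2 * len(self.slots))
            for k in old:  # rehash in old storage order
                if k is not None:
                    self._place(self.slots, k)
        if self._place(self.slots, key):
            self.count += 1

    def __iter__(self):
        # Storage order, i.e. roughly sorted by the low bits of the hash.
        return (k for k in self.slots if k is not None)

def merge_probes(dst_salt, n=2458):
    src = LinearProbingSet(salt=0)
    for k in range(n):
        src.insert(k)
    dst = LinearProbingSet(salt=dst_salt)
    for k in src:  # merge by iterating the source in its storage order
        dst.insert(k)
    assert sorted(dst) == list(range(n))
    return dst.probes

same_hash = merge_probes(dst_salt=0)
salted = merge_probes(dst_salt=0x9E3779B97F4A7C15)
print(same_hash, salted)  # the shared-hash merge needs several times more probes
```

<p>With the shared hash, once the growing destination is half the source's size, the second half of the source's slot order wraps onto the destination's already half-full front region and builds one long cluster; a per-table salt decorrelates the two layouts. A real table could draw its salt from something like random.getrandbits(64) at construction, or accept it as a parameter as suggested above.</p>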
]]></description><pubDate>Thu, 08 Jan 2026 09:17:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46538984</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46538984</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46538984</guid></item><item><title><![CDATA[New comment by willvarfar in "Bill to Eliminate H-1B Visa Program Introduced in Congress"]]></title><description><![CDATA[
<p>This is going to be an interesting take, but I think it is plausible that we'll see a quiet growth in American tech companies having even bigger offshore campuses instead.  Google Zurich or Google London could grow, Google does hardware in Taiwan, Apple and Intel do hardware in Israel, and pretty much all the big tech companies have the biggest chunk in Hyderabad.<p>The withdrawal of the H-1B means companies can't compete on offering them to attract talent, but that talent still wants to work somewhere, and companies can instead compete on the perks they offer at those offshore places.<p>Things will get interesting, though, if Europe can become the place that US tech companies offer visa support for people to move to.</p>
]]></description><pubDate>Wed, 07 Jan 2026 07:51:14 +0000</pubDate><link>https://news.ycombinator.com/item?id=46523720</link><dc:creator>willvarfar</dc:creator><comments>https://news.ycombinator.com/item?id=46523720</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46523720</guid></item></channel></rss>