<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: snidane</title><link>https://news.ycombinator.com/user?id=snidane</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 09 Apr 2026 13:52:39 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=snidane" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Henry George vs. Silvio Gesell – land value tax or land nationalization]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=BkwNxSkn1kk">https://www.youtube.com/watch?v=BkwNxSkn1kk</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42095668">https://news.ycombinator.com/item?id=42095668</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 09 Nov 2024 17:40:50 +0000</pubDate><link>https://www.youtube.com/watch?v=BkwNxSkn1kk</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=42095668</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42095668</guid></item><item><title><![CDATA[New comment by snidane in "Get me out of data hell"]]></title><description><![CDATA[
<p>The architecture is sound - it's typically called ELT these days. Dump the contents of upstream straight into a database and apply stateless, deterministic operations to produce the final result tables.<p>SQL Server is where this breaks down though. You'll get yelled at by DBAs for bad db practices: storing wide text fields without casting them to varchar(32) or varchar(12), primary keys on strings or no indexes at all, and most importantly taking up the majority of storage on the db host for these raw dumps. SQL Server, like any traditional database, scales by adding machines, so you end up paying compute costs for your storage.<p>If you use a shared-disk system where compute scaling is decoupled from storage, then your system is the way to go. Ideally, these days, dump your files into file storage like s3 and slap a table abstraction over it with some catalog - now you have 100x lower storage costs and about 5-10x more compute power with things like duckdb. Happy data engineering!</p>
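The ELT shape described above can be sketched end to end - here with stdlib sqlite3 standing in for the warehouse, and made-up table and field names for illustration:

```python
import json
import sqlite3

con = sqlite3.connect(":memory:")

# EL: dump raw upstream payloads as-is into one wide text column -
# no casting, no cleanup at load time.
con.execute("CREATE TABLE raw_events (payload TEXT)")
rows = [json.dumps({"user": "a", "amount": 3}),
        json.dumps({"user": "a", "amount": 4}),
        json.dumps({"user": "b", "amount": 5})]
con.executemany("INSERT INTO raw_events VALUES (?)", [(r,) for r in rows])

# T: a stateless, deterministic SQL transform over the raw dump produces
# the final result table; rerunning it always yields the same output.
result = con.execute("""
    SELECT json_extract(payload, '$.user')        AS user,
           SUM(json_extract(payload, '$.amount')) AS total
    FROM raw_events
    GROUP BY 1
    ORDER BY 1
""").fetchall()
print(result)  # [('a', 7), ('b', 5)]
```

The same transform applies unchanged whether the raw table holds ten rows or ten billion; only the engine underneath needs to scale.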
]]></description><pubDate>Sun, 03 Nov 2024 13:02:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=42032817</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=42032817</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42032817</guid></item><item><title><![CDATA[New comment by snidane in "Get me out of data hell"]]></title><description><![CDATA[
<p>Looks like the classic mistake of every data team. Every single office person works with data in one way or another. Having a team called 'data' just opens a blank check for anyone in the organization to dump every issue and every piece of garbage on this team, as long as they can identify it as data.<p>That's why you build data platforms and name your team accordingly. This is a much easier position to defend, where you and your team have a mandate to build tools for others to be efficient with data.<p>If upstream provides funky logs or jsons where you expect strings, that's for your downstream to worry about. They need the data, and they need to chase down the right people in the org to resolve it. Your responsibility should only be to provide unified access to that external data and ideally some governance around the access, like logging and lineage.<p>Tl;dr: Make your 'data' mandate too wide and vague and you won't survive as a team. Build data platforms instead.</p>
]]></description><pubDate>Sun, 03 Nov 2024 00:37:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=42030334</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=42030334</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42030334</guid></item><item><title><![CDATA[Hong Kong's Arts Hub Turns to Selling Land to Stay Afloat]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.bloomberg.com/news/articles/2024-09-03/hong-kong-s-west-kowloon-cultural-hub-turns-to-selling-land-to-stay-afloat">https://www.bloomberg.com/news/articles/2024-09-03/hong-kong-s-west-kowloon-cultural-hub-turns-to-selling-land-to-stay-afloat</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41475639">https://news.ycombinator.com/item?id=41475639</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 07 Sep 2024 18:50:59 +0000</pubDate><link>https://www.bloomberg.com/news/articles/2024-09-03/hong-kong-s-west-kowloon-cultural-hub-turns-to-selling-land-to-stay-afloat</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=41475639</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41475639</guid></item><item><title><![CDATA[New comment by snidane in "Why Americans Stopped Moving"]]></title><description><![CDATA[
<p>In the former Czechoslovakia, if you moved out of your hometown, you lost valuable networks. Healthcare and childcare were always connection-based - you needed to know the local doctor, teacher, homebuilder - to get connected with specialists, skip the queues, get preferred treatment, faster processing of documents, etc. Once you part with your network, you start building a new one from scratch, which takes many years. Unlike in the US, which is more market-based: as long as you have the money, you can recreate a similar lifestyle elsewhere in the country.<p>Second, there was never really a need to rush to another location offering better opportunities. As a consequence of the 1990s policies, local capital vanished into the hands of western entities, and with it the opportunities worth moving for. The post-2000 capital which moved to the region just found spots with cheap labor to build new factories or logistics centers to keep the German powerhouse running - with the unfair advantage of cheap eastern energy, cheap eastern workers across the border, and a cheap euro currency resulting from sharing it with the unproductive European south.</p>
]]></description><pubDate>Mon, 02 Sep 2024 06:33:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=41423110</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=41423110</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41423110</guid></item><item><title><![CDATA[New comment by snidane in "Pipe Syntax in SQL"]]></title><description><![CDATA[
<p>This style is familiar to anyone writing dataframe logic in df libraries with sql semantics - spark, polars or duckdb relational (<a href="https://duckdb.org/docs/api/python/relational_api.html" rel="nofollow">https://duckdb.org/docs/api/python/relational_api.html</a>).<p>It definitely makes things easier to follow, but only for linear, ie. single-table, transformations. The moment joins of multiple tables come into the picture, things get hairy quickly, and then you actually start to appreciate plain old sql, which accounts for exactly this and allows you to use column aliases across the entire cte clause. With this piping you lose scope of the table aliases, and then you have to use weird hacks like mangling the names of the joined-in table in polars.<p>For single-table processing the pipes are nice though. Especially eliminating the need for multiple different filter keywords chosen by order of execution (where, having, qualify (and the pre-join filter, which is missing)).<p>A missed opportunity here is the redundant [AGGREGATE sum(x) GROUP BY y]. Unless you need to specify rollups, [AGGREGATE y, sum(x)] is sufficient syntax for group bys, and the duckdb folks got it right in the relational api.</p>
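For a concrete sense of the contrast, here is the same single-table aggregation in classic SQL and in the pipe syntax from the article (pipe form per the proposal; most engines can't run it yet):

```sql
-- classic SQL: different filter keywords depending on clause position
SELECT y, sum(x) AS total
FROM t
WHERE x > 0          -- pre-aggregation filter
GROUP BY y
HAVING sum(x) > 10;  -- post-aggregation filter

-- pipe syntax: one WHERE, its meaning set by where it sits in the flow
FROM t
|> WHERE x > 0
|> AGGREGATE sum(x) AS total GROUP BY y
|> WHERE total > 10;
```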
]]></description><pubDate>Sun, 25 Aug 2024 01:53:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=41343664</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=41343664</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41343664</guid></item><item><title><![CDATA[New comment by snidane in "DuckDB Community Extensions"]]></title><description><![CDATA[
<p>What do you need a non-columnar layout for? Do you expect thousands of concurrent single-row writes at a time?<p>If you use embedded duckdb on the client, then unless the person goes crazy clicking their mouse at 60 clicks/s, duckdb should handle it fine.<p>If you run it on the backend and expect concurrent writes, you can buffer the writes in concatenated arrow tables, one per minibatch, and merge into duckdb every, say, 10 seconds. You'd just need to query the historical duckdb and realtime arrow tables separately and combine the results afterwards.<p>I agree that native support for this so-called Lambda architecture would be cool to have in duckdb. Especially when drinking fast-moving data from a firehose.</p>
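The buffering pattern described above, sketched minimally - sqlite3 stands in here for duckdb, and a plain list stands in for the concatenated arrow minibatches; all names are made up:

```python
import sqlite3
import threading

# Historical side: sqlite3 stands in for duckdb in this sketch.
hist = sqlite3.connect(":memory:", check_same_thread=False)
hist.execute("CREATE TABLE events (ts INTEGER, value REAL)")

# Realtime side: a plain list stands in for the arrow minibatch buffer.
buffer = []
lock = threading.Lock()

def write(ts, value):
    # Concurrent writers only append to the in-memory buffer.
    with lock:
        buffer.append((ts, value))

def flush():
    # Run this every ~10 seconds: merge the buffer into the store.
    with lock:
        batch = buffer[:]
        buffer.clear()
    hist.executemany("INSERT INTO events VALUES (?, ?)", batch)

def total_value():
    # Query both sides separately and combine the results afterwards.
    hist_sum = hist.execute(
        "SELECT COALESCE(SUM(value), 0) FROM events").fetchone()[0]
    with lock:
        rt_sum = sum(v for _, v in buffer)
    return hist_sum + rt_sum

write(1, 10.0)
write(2, 5.0)
flush()               # buffered rows merged into the historical store
write(3, 2.5)         # still only in the realtime buffer
print(total_value())  # 17.5
```

Queries see every write immediately, while the columnar store only ever receives sizeable batches.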
]]></description><pubDate>Sat, 06 Jul 2024 12:30:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=40890045</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=40890045</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40890045</guid></item><item><title><![CDATA[New comment by snidane in "SEQUEL: A Structured English Query Language (1974)"]]></title><description><![CDATA[
<p>Does anybody know what the relation is between SQL (as SEQUEL) and the competing language QUEL from the Ingres db?<p>It always looked to me as if somebody back in the database wars was trying to play on the other's name, one way or another.<p><a href="https://en.m.wikipedia.org/wiki/QUEL_query_languages" rel="nofollow">https://en.m.wikipedia.org/wiki/QUEL_query_languages</a></p>
]]></description><pubDate>Tue, 14 May 2024 12:24:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=40354440</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=40354440</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40354440</guid></item><item><title><![CDATA[New comment by snidane in "Rejected from YC. Reason: Because I don't have a cofounder"]]></title><description><![CDATA[
<p>Two people already form a 'company', and the hard part - convincing somebody else of your idea - has already been accomplished. You can argue that growing from 1 person to 2 is the most critical part of company growth. With 2+ cofounders this critical step is already figured out, making it a much less risky proposition than a single founder who hasn't convinced anybody else to join them yet.</p>
]]></description><pubDate>Sat, 11 May 2024 15:43:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=40328793</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=40328793</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40328793</guid></item><item><title><![CDATA[New comment by snidane in "Csvlens: Command line CSV file viewer. Like less but made for CSV"]]></title><description><![CDATA[
<p>How does one output properly CSV-quoted rows? It seems the csv flag only works for parsing inputs.</p>
]]></description><pubDate>Sat, 06 Jan 2024 18:51:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=38894070</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=38894070</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38894070</guid></item><item><title><![CDATA[New comment by snidane in "Csvlens: Command line CSV file viewer. Like less but made for CSV"]]></title><description><![CDATA[
<p>There is sc-im, which is the closest you'll get to a full spreadsheet app in the terminal, with vi controls.<p><a href="https://github.com/andmarti1424/sc-im">https://github.com/andmarti1424/sc-im</a></p>
]]></description><pubDate>Sat, 06 Jan 2024 18:49:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=38894048</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=38894048</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38894048</guid></item><item><title><![CDATA[New comment by snidane in "Did English ever have a formal version of "you"? (2011)"]]></title><description><![CDATA[
<p>As someone coming from a culture with the T-V distinction [1], I always wished we had dropped one of the branches like English did in the past.<p>The informal T vs formal V causes confusion in conversations with semi-strangers. E.g. at work you never really know which way to address someone at the watercooler. If you choose the informal T, you make them your equal, which might be perceived as insulting, since they may want to keep a perception of superiority over you for being older, more tenured, etc. Often you'd rather not engage in a conversation at all and just keep your thoughts to yourself - better than ending up in an inferior position by choosing the safe V, or risking insulting someone with the informal T form.<p>This felt to me like one of the reasons why English became the dominant language for business over time. Together with its simplified morphology (you only need to learn the plural by adding 's' at the end, vs. 5+ inflected forms of each word), it just ended up being much easier to pick up and less risky to engage in conversation with, and therefore had higher chances of adoption by non-native speakers.<p>[1] <a href="https://en.wikipedia.org/wiki/T%E2%80%93V_distinction" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/T%E2%80%93V_distinction</a></p>
]]></description><pubDate>Sun, 24 Dec 2023 20:37:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=38756612</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=38756612</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38756612</guid></item><item><title><![CDATA[Teip: CLI to apply sed and Awk over rows and columns of a file]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/greymd/teip">https://github.com/greymd/teip</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=38747186">https://news.ycombinator.com/item?id=38747186</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 23 Dec 2023 19:18:44 +0000</pubDate><link>https://github.com/greymd/teip</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=38747186</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38747186</guid></item><item><title><![CDATA[New comment by snidane in "Qsv: Efficient CSV CLI Toolkit"]]></title><description><![CDATA[
<p>This looks great!<p>Please consider removing any implicit network calls like the initial "Checking GitHub for updates...". This alone will prevent people from adopting it or even trying it any further. It's similar to gnu parallel's --citation, which - albeit a small thing - will scare many people off.<p>Consider adding pivot and unpivot operations. Mlr gets the syntax quite right, but is unusable since it doesn't work in streaming mode and tries to load everything into memory, despite claiming otherwise.<p>Consider adding a basic summing command. Sum is the most common data operation, which could warrant its own specially optimized command instead of offloading it to an external math processor like lua or python. Even better if it had group-by (-by) and window-by (-over) capability. Eg. 'qsv sum col1,col2 -by col3,col4'. Brimdata's zq utility is the only one I know that does this right, but it is quite clunky to use.<p>Consider adding a laminate command - essentially adding a new column with a constant. This could probably be achieved by joining with a single-row file, but why not make such a common operation easier to use.<p>Consider an option to concatenate csv files with mismatched headers. cat rows and cat columns complain about the mismatch. One of the most common problems with handling csvs is schema evolution. I and many others would appreciate being able to merge similar csvs together easily.<p>Conversions to and from other standard formats would be appreciated (parquet, ion, fixed-width fields, avro, etc.). Other compression formats as well - especially zstd.<p>It would be nice if the tool made it easy to embed the output of external commands. The builtin lua and python support is nice, but probably not sufficient. I'd like to be able to run a jq command on a single column and merge it back in as another column, for example.<p>Inspiration:<p><pre><code>  - csvquote: https://news.ycombinator.com/item?id=31351393
  - teip: https://github.com/greymd/teip</code></pre></p>
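The mismatched-header concatenation asked for above can be sketched in a few lines of stdlib python - the output header is the union of all input columns, with fields a row doesn't have left empty (the function name is made up):

```python
import csv
import io

def cat_rows_loose(*csv_texts):
    """Concatenate CSVs with differing headers: emit the union of all
    columns and leave missing fields empty."""
    readers = [csv.DictReader(io.StringIO(t)) for t in csv_texts]
    # Union of all headers, in first-seen order.
    header = []
    for r in readers:
        for col in (r.fieldnames or []):
            if col not in header:
                header.append(col)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            lineterminator="\n")
    writer.writeheader()
    for r in readers:
        writer.writerows(r)  # restval fills the columns a file lacks
    return out.getvalue()

old = "id,name\n1,alice\n"
new = "id,name,email\n2,bob,bob@example.com\n"
print(cat_rows_loose(old, new))
# id,name,email
# 1,alice,
# 2,bob,bob@example.com
```

This handles the common schema-evolution case where later files simply grew extra columns.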
]]></description><pubDate>Sat, 23 Dec 2023 18:45:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=38746827</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=38746827</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38746827</guid></item><item><title><![CDATA[New comment by snidane in "Qsv: Efficient CSV CLI Toolkit"]]></title><description><![CDATA[
<p>Out-of-core computation. While your python and R scripts will choke after reading a few hundred megs, my compiled binary cli will keep streaming through many such files with memory usage sitting somewhere near zero.</p>
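The streaming style contrasted here is easy to picture: process one row at a time and never hold the whole file in memory. A minimal stdlib sketch (names made up):

```python
import csv

def stream_sum(lines, column):
    """Sum one CSV column row by row: memory use stays flat no matter
    how many rows (or chained files) flow through."""
    total = 0.0
    for row in csv.DictReader(lines):  # one row in memory at a time
        total += float(row[column])
    return total

# Any iterable of lines works: an open file, sys.stdin, or a generator
# chaining many files together.
lines = iter(["value\n", "1.5\n", "2.5\n", "4.0\n"])
print(stream_sum(lines, "value"))  # 8.0
```

A load-everything dataframe script does the same arithmetic but first materializes every row, which is what blows up on large inputs.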
]]></description><pubDate>Sat, 23 Dec 2023 18:05:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=38746424</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=38746424</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38746424</guid></item><item><title><![CDATA[Put the OS in the Database]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.forbes.com/sites/johnwerner/2023/08/15/put-the-os-in-the-database-performance-cybersecurity-and-endurance-in-the-cloud/">https://www.forbes.com/sites/johnwerner/2023/08/15/put-the-os-in-the-database-performance-cybersecurity-and-endurance-in-the-cloud/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37189147">https://news.ycombinator.com/item?id=37189147</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Sat, 19 Aug 2023 14:28:43 +0000</pubDate><link>https://www.forbes.com/sites/johnwerner/2023/08/15/put-the-os-in-the-database-performance-cybersecurity-and-endurance-in-the-cloud/</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=37189147</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37189147</guid></item><item><title><![CDATA[New comment by snidane in "Miller: Like Awk, sed, cut, join, and sort for CSV, TSV, and tabular JSON"]]></title><description><![CDATA[
<p>Great tool.<p>BUT, it leaks memory like crazy - despite the documentation stating the verbs are fully streaming:<p>> Fully streaming verbs
> These don't retain any state from one record to the next. They are memory-friendly, and they don't wait for end of input to produce their output.<p><a href="https://miller.readthedocs.io/en/6.7.0/streaming-and-memory/#fully-streaming-verbs" rel="nofollow">https://miller.readthedocs.io/en/6.7.0/streaming-and-memory/...</a></p>
]]></description><pubDate>Thu, 16 Mar 2023 12:53:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=35181222</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=35181222</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=35181222</guid></item><item><title><![CDATA[New comment by snidane in "I don't read web articles anymore, but I read books"]]></title><description><![CDATA[
<p>Another trick is to search for the book title on youtube. There is a fair chance the author wants to promote the book and delivers the entire message in a one-hour talk, in their own words, instead of across hundreds of fluffed pages. You can decide whether you'd like to buy the book after watching the video.</p>
]]></description><pubDate>Mon, 09 Jan 2023 20:42:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=34315734</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=34315734</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34315734</guid></item><item><title><![CDATA[New comment by snidane in "ChatGPT won’t replace search engines any time soon"]]></title><description><![CDATA[
<p>Google made terrible mistakes with their main cash cow, search.<p>They removed the important feature of searching only forums, ie. human-generated content, and promoted SEO spam to the top instead. Public forums became undiscoverable, and people moved to the walled gardens of facebook and the like instead.<p>Then Google killed search by trying to turn it into some AI answering robot. Now it ignores what you actually ask and just returns what it thinks you want.<p>All people were asking for was a better search engine, and all we got was an inferior version of a chat bot.</p>
]]></description><pubDate>Sun, 08 Jan 2023 03:49:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=34295843</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=34295843</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34295843</guid></item><item><title><![CDATA[New comment by snidane in "Measuring an engineering organization"]]></title><description><![CDATA[
<p>You need to have a vision, then measure time spent blocked. If you're blocked, you're not progressing towards the vision. If you have a team of sufficiently competent people, you should expect to achieve your vision eventually.<p>The problem with many organizations is that they progress without a vision and have no idea when progress gets blocked. Since nobody really measures blockage, workers typically drop the thing that got blocked, go chase metrics on something else, and don't tell anyone.</p>
]]></description><pubDate>Wed, 04 Jan 2023 04:23:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=34242023</link><dc:creator>snidane</dc:creator><comments>https://news.ycombinator.com/item?id=34242023</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=34242023</guid></item></channel></rss>