Hacker News: sqlcook

New comment by sqlcook in "Craig Venter has died"

sqlcook — Thu, 30 Apr 2026 04:59:13 +0000

Sad news. I worked at HumanLongevity and got to interact with Craig several times. He was a legend and will truly be missed.

New comment by sqlcook in "Open Source Python ETL"

sqlcook — Wed, 19 Jun 2024 03:56:47 +0000

You’re missing the point of solutions like these, and of the original tools of this kind, like Informatica. Those tools come with limitations and constraints, but like a box of Legos they let you build a very powerful pipeline without having to wire up a lot of redundant code as you pass data frames between validation stages. Tools like Airflow/Spark etc. are great for what they are, but they don’t come with guidelines or best practices for reusable code at scale; your team has to establish those early on.

You can open a pretty complicated, large DAG and right away you’ll understand the data flow and processing steps. If you were to do something similar in code, it becomes a lot harder unless you comply with good modular design practices.

This is also why common game engines and 3D rendering tools come with a UI for flow-driven scripting. It’s intuitive and much easier to organize.

New comment by sqlcook in "Gallery of database schema diagrams of open-source packages"

sqlcook — Tue, 28 Apr 2020 16:26:46 +0000

This is indeed one of the nicest UIs I've seen. Is this all custom made, or are you using some open components? I wanted to get a similar effect for something completely different I had in mind (like a flow-based diagram). Any advice greatly appreciated.

New comment by sqlcook in "Dbeaver – Multi-platform database tool"

sqlcook — Fri, 18 Jan 2019 03:22:36 +0000

Nice if you’re only connecting to a single instance, a nightmare if you have multiple. DbVisualizer is better imho.

New comment by sqlcook in "How NoSQL forced the evolution of a scalable relational database"

sqlcook — Fri, 13 Jul 2018 08:59:38 +0000

It actually IS quite hard to imagine solid engineers finding it hard to learn simple SQL. It's a lot harder to write "nasty" SQL than "nasty" JS, and if you're a front-end dev struggling with SQL, then you probably should not be doing back-end work. Yes, not all databases are equal; that's why there are ORMs and ANSI standards.

From the list of bad practices you've "seen" people do in SQL, I would guess their comfort zone code is probably a hot mess as well.

New comment by sqlcook in "How NoSQL forced the evolution of a scalable relational database"

sqlcook — Fri, 13 Jul 2018 07:39:59 +0000

EE features without support. I hate to bring up Mongo as an example, but something similar... where support, additional software/plugins and cloud hosting are where the $ is made.

I did thorough testing of MemSQL two years ago but went with Aurora instead. Would love to see how the product has evolved since (Spark and Streaming integration was just being rolled out at the time), but something tells me pricing will be a deal breaker.

New comment by sqlcook in "How NoSQL forced the evolution of a scalable relational database"

sqlcook — Fri, 13 Jul 2018 07:18:04 +0000

Hi Rick, I've been following MemSQL for a few years now. Are there any plans to release a "community" edition? Last time I checked, about 1.5 years ago, JSON support was very basic and EE pricing (don't remember exact #s) was rather high. Thanks

New comment by sqlcook in "First field report of iPhone X"

sqlcook — Mon, 30 Oct 2017 12:47:45 +0000

You mean with a picture of your face....

New comment by sqlcook in "Show HN: CoinHub for iOS"

sqlcook — Thu, 05 Oct 2017 02:11:48 +0000

Nice looking app! What did you use for the charts?

The Sixense Sense was once the ultimate VR controller. Where is it now?

sqlcook — Tue, 02 May 2017 16:58:51 +0000

Article URL: https://www.theverge.com/2017/5/2/15477520/sixense-stem-oculus-rift-vr-controller-kickstarter-problems

Comments URL: https://news.ycombinator.com/item?id=14248137

Points: 1

# Comments: 0

New comment by sqlcook in "Horizon 1.0: a realtime, open-source JavaScript back end from RethinkDB"

sqlcook — Tue, 17 May 2016 21:07:50 +0000

Excellent work Slava! Have been planning to migrate to RethinkDB for one of the existing projects, great timing with Horizon :D

New comment by sqlcook in "How we get high availability with Elasticsearch and Ruby on Rails"

sqlcook — Sat, 09 Apr 2016 03:18:26 +0000

If you want the fastest ingress, disable replicas until your ingress is done; it's faster to create replicas at the end of ETL for that given index. You also want to disable auto allocation, which stops shard movement during ingress; re-enable it afterwards.
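
A minimal sketch of the toggles described above, written as the JSON bodies one might send to the index settings and cluster settings endpoints. `index.number_of_replicas` and `cluster.routing.allocation.enable` are standard Elasticsearch settings, but mapping "auto allocation" onto the latter is an assumption, as are the helper name and the replica count of 1 restored afterwards. Nothing here talks to a cluster.

```python
import json


def bulk_load_settings(loading, replicas_after=1):
    """Return (index_settings, cluster_settings) bodies for one load phase.

    loading=True  -> replicas off, shard allocation off
    loading=False -> replicas restored, shard allocation back on
    replicas_after is a hypothetical default, not a value from the comment.
    """
    index_settings = {
        "index": {"number_of_replicas": 0 if loading else replicas_after}
    }
    cluster_settings = {
        "transient": {
            "cluster.routing.allocation.enable": "none" if loading else "all"
        }
    }
    return index_settings, cluster_settings


if __name__ == "__main__":
    # Print the bodies for the "before load" and "after load" phases.
    for phase in (True, False):
        index_body, cluster_body = bulk_load_settings(phase)
        print(json.dumps(index_body), json.dumps(cluster_body))
```

The first body would go to the index's `_settings` endpoint and the second to `_cluster/settings`, once before the load and once after it.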

On a 100-node cluster I had roughly 500GB on each node. This was not a single index but multiple indexes, with roughly 8 shards per index per node. Shard count is pretty important to get right.

I did not manually control document routing (it was hard given the type of data I was ingressing), so it was set to auto, and during the load I observed hotspots in the cluster (you have to look at the BULK thread/queue length). Some nodes were getting bursts of docs while others were idle: roughly 40-50% of the nodes in the cluster were underutilized, and maybe 5-10% had hot spots from time to time.

Also, depending on what you use to push data in (I used the ES Hadoop plugin), you have to account for shard segment merges, which literally pause ingress for a brief moment while segments in a given shard are merged. You have to set retry to -1 (infinite) and retry delay to something like a second or two, otherwise you will end up with dropped documents.
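
As a rough illustration of the retry settings mentioned above, here is how they might look as ES-Hadoop configuration properties. The keys `es.batch.write.retry.count` and `es.batch.write.retry.wait` are ES-Hadoop's batch write retry options, but whether these are the exact knobs used in that setup is an assumption, and the helper name and the `2s` default are hypothetical.

```python
def es_hadoop_write_conf(retry_wait="2s"):
    """Build ES-Hadoop write properties with infinite batch retries.

    A negative retry count means "retry forever"; retry_wait is the pause
    between attempts. The "2s" default is illustrative only.
    """
    return {
        "es.batch.write.retry.count": "-1",
        "es.batch.write.retry.wait": retry_wait,
    }


if __name__ == "__main__":
    # Print the properties in key=value form.
    for key, value in sorted(es_hadoop_write_conf().items()):
        print(f"{key}={value}")
```

Infinite retries trade dropped documents for a job that can stall if the cluster never recovers, so it is worth watching the load while it runs.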

New comment by sqlcook in "How we get high availability with Elasticsearch and Ruby on Rails"

sqlcook — Sat, 09 Apr 2016 02:24:14 +0000

100 data nodes

Basically, if you want fast ingress, keep shards small; once they get past ~5-10GB, ingress slows down significantly. Also, this was on ES 1.5; I have not tested the latest 2.0+ builds.
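
A back-of-the-envelope sketch of that sizing rule: given an expected index size and a per-shard ceiling, how many primary shards keep every shard under the ceiling. The function name and the 500GB example index are hypothetical; only the ~5-10GB ceiling comes from the comment.

```python
import math


def min_primary_shards(index_size_gb, max_shard_gb=10):
    """Smallest primary shard count that keeps each shard <= max_shard_gb,
    assuming documents spread evenly across shards."""
    if index_size_gb <= 0 or max_shard_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(index_size_gb / max_shard_gb)


if __name__ == "__main__":
    # Hypothetical 500GB index at both ends of the ~5-10GB range.
    print(min_primary_shards(500, max_shard_gb=10))
    print(min_primary_shards(500, max_shard_gb=5))
```

Uneven routing makes some shards grow faster than this estimate, so the result is a lower bound rather than a recommendation.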

New comment by sqlcook in "How we get high availability with Elasticsearch and Ruby on Rails"

sqlcook — Sat, 09 Apr 2016 01:10:44 +0000

I've indexed ~1 million docs a second, and with proper routing could probably even 5x that. Total cluster size was 50 terabytes at the end.

New comment by sqlcook in "pg_rewind in PostgreSQL 9.5"

sqlcook — Mon, 23 Mar 2015 21:57:58 +0000

The post is about Postgres; you were referring to MSSQL mirroring, which can fail over from primary to secondary and back to primary almost seamlessly, depending on the type of failure and log catch-up.

Postgres has been a pain, as described in the article...

New comment by sqlcook in "pg_rewind in PostgreSQL 9.5"

sqlcook — Mon, 23 Mar 2015 21:45:00 +0000

? MSSQL mirroring failover is very easy and painless (arguably one of the best compared to other RDBMSs); just make sure not to use automatic failover, as that can cause false failovers when the network between the nodes and the witness is spotty.