Hacker News: mgachka

New comment by mgachka in "Is ClickHouse Moving Away from Open Source?"

mgachka — Fri, 22 Sep 2023 10:11:03 +0000

"A license that doesn’t allow reselling the service is good enough For example, I saw this with Tailwind UI" FYI, we discussed it in this thread https://github.com/ClickHouse/ClickHouse/issues/44767. For the moment, the only way to get the new features is to use the cloud version (which is likely to be a no-go for most companies managing their own clickhouse infrastructure).

New comment by mgachka in "Is ClickHouse Moving Away from Open Source?"

mgachka — Fri, 22 Sep 2023 08:07:20 +0000

As a user of clickhouse since 2018, I'm fully aligned with the content of this article. This technology is one of the best I've used in my career.

The choice of clickhouse for a new project in my company has always been a no-brainer, but the recent move from clickhouse.inc to a closed source version has made this choice less straightforward.

New comment by mgachka in "Decentralized cluster membership in Rust"

mgachka — Thu, 28 Apr 2022 11:41:13 +0000

Hi,

It's a good read. I have a few questions/comments:

- given the description of the Rumor-mongering approach vs the Anti-entropy approach, it looks like the Anti-entropy approach:

-- has an important overhead in terms of network/messages sent (since nodes are always chatting even when there are no changes in the cluster).

-- is slower to propagate a change of cluster state to all the nodes. Does it mean that in case of a node failure/shutdown, the cluster will be unstable for longer (since dead nodes will keep receiving queries)?

- the article mentions a "seed node" but doesn't define what this is.

- on a dynamic quickwit setup (that often downscales/upscales depending on the load), it seems that the size of the metadata in each node will keep increasing since the state of the dead nodes will be kept, unless the same node_unique_id can be reused after a downscale/upscale (but I don't know enough about how kube works to tell whether that's the case).
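To make the overhead question concrete, here is a toy sketch of one round of each approach. It assumes simplified textbook versions of both protocols: the function names, the integer "version" state and the stop probability are all illustrative assumptions, not taken from the article or from quickwit's code. It only illustrates the first point (traffic on an idle cluster), not propagation speed.

```python
import random


def anti_entropy_round(states, rng):
    """One round: every node syncs with one random peer, changed or not.

    `states` holds one integer version per node (assumption: higher wins).
    Returns the number of messages sent in the round.
    """
    n = len(states)
    messages = 0
    for node in range(n):
        peer = rng.choice([p for p in range(n) if p != node])
        merged = max(states[node], states[peer])
        states[node] = states[peer] = merged
        messages += 1
    return messages


def rumor_round(states, hot, rng, stop_probability=0.5):
    """One round: only nodes holding a 'hot' rumor contact a random peer.

    A sender may lose interest (leave `hot`) when the peer already knew
    the update. Returns the number of messages sent in the round.
    """
    n = len(states)
    messages = 0
    for node in list(hot):  # snapshot: newly infected nodes start next round
        peer = rng.choice([p for p in range(n) if p != node])
        messages += 1
        if states[peer] < states[node]:
            states[peer] = states[node]
            hot.add(peer)
        elif rng.random() < stop_probability:
            hot.discard(node)
    return messages


if __name__ == "__main__":
    rng = random.Random(0)
    idle = [1] * 5  # nothing has changed anywhere
    print("anti-entropy, idle cluster:", anti_entropy_round(idle, rng), "messages")
    print("rumor-mongering, idle cluster:", rumor_round(idle, set(), rng), "messages")
```

In this simplified model an idle cluster of 5 nodes still exchanges 5 messages per anti-entropy round, while rumor-mongering sends nothing until some node has a new update to spread.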

New comment by mgachka in "ClickHouse as an alternative to Elasticsearch for log storage and analysis"

mgachka — Tue, 02 Mar 2021 22:45:51 +0000

FYI, we've been using clickhouse since 2018 at ContentSquare.

I did a few POCs to compare clickhouse with other databases on ContentSquare's use case. One of them was memSQL. Although memSQL was very good, since we don't need to JOIN big datasets or need killer features like fulltext search, clickhouse gave a better perf/cost ratio for us (I don't remember exactly, but it was at least twice as cheap).

New comment by mgachka in "ClickHouse as an alternative to Elasticsearch for log storage and analysis"

mgachka — Tue, 02 Mar 2021 22:14:07 +0000

I stopped answering people about the release date of my next blog post because I'm always postponing it ;-).

But don't worry Paul, the day I release it you'll be one of the first to be informed.

New comment by mgachka in "ClickHouse as an alternative to Elasticsearch for log storage and analysis"

mgachka — Tue, 02 Mar 2021 22:09:29 +0000

I'm the guy who did the 2 presentations of Clickhouse at ContentSquare. There are no blog posts on the migration from ES to CH. But you can find the slides of the 2018 presentation here https://www.slideshare.net/VianneyFOUCAULT/clickhouse-meetup... And the slides of the 2019 presentation here https://www.slideshare.net/VianneyFOUCAULT/meetup-a-successf...

There is also a video recording of the 2019 presentation available here: https://www.youtube.com/watch?v=lwYSYMwpJOU nb: The video is not great because the camera often loses focus, but it's still understandable.

New comment by mgachka in "How does a relational database work?"

mgachka — Thu, 20 Aug 2015 15:31:35 +0000

Hi, I'm glad to read this comment.

If you liked this article, maybe you'll like my article on Shazam. I used the same pattern: I start from the basics of sound processing and computer science and finish with an in-depth explanation of Shazam.

New comment by mgachka in "How does a relational database work?"

mgachka — Thu, 20 Aug 2015 08:15:07 +0000

You're right. In fact, in the optimizer part I say (in a simple way) that big O (i.e. asymptotic complexity) is not the same as CPU cost, but big O is easier for me to use because the real cost of an operation depends on the CPU architecture.

Someone told me the same in the article's comments, and here is the answer I gave him:

You’re right and I agree with you. When I wrote this part, I REALLY hesitated to give the real asymptotic definition and what it means for the number of operations but I chose a simpler explanation since the aim of this post is not to become an expert but to have a good idea. I hope that this won’t mislead people but I thought the real definition was too hard for a “newcomer” and not important to understand a database. This is also why I added in this part “The time complexity doesn’t give the exact number of operations but a good idea.” and said at the end of the part “I didn’t give you the real definition of the big O notation but just the idea” with a link to the real definition.
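As a small illustration of "a good idea, not the exact number of operations", here is a sketch that counts loop iterations for a linear search and a binary search. The helper names are hypothetical and the code is not taken from the article. The counts show the growth trend that big O describes, but they say nothing about the real CPU cost of each step.

```python
def linear_search_steps(items, target):
    """Count loop iterations of a linear scan (O(n) in the worst case)."""
    steps = 0
    for item in items:
        steps += 1
        if item == target:
            break
    return steps


def binary_search_steps(sorted_items, target):
    """Count loop iterations of a binary search (O(log n)) on a sorted list."""
    lo, hi = 0, len(sorted_items) - 1
    steps = 0
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2
        if sorted_items[mid] == target:
            return steps
        if sorted_items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return steps


if __name__ == "__main__":
    for n in (1_000, 1_000_000):
        data = list(range(n))
        worst = n - 1  # last element: worst case for the linear scan
        print(n, linear_search_steps(data, worst), binary_search_steps(data, worst))
```

The iteration counts differ by orders of magnitude as n grows, which is the idea big O conveys. How many CPU cycles one iteration costs (cache misses, branch prediction, and so on) is a separate question that depends on the hardware.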

New comment by mgachka in "How does a relational database work?"

mgachka — Wed, 19 Aug 2015 19:14:14 +0000

I understand your point (and it’s a good one) and here is mine: Unless you're working in a team with a lot of good IT guys, you're likely to end up with worse performance and more problems.

For example, when I started in Big Data, in less than 3 weeks I was able to optimize some batches just because I read the documentation of the framework used (PIG in this case) and read a small part of the source code to dig deeper. And these were not tricky optimizations: I used in-memory joins and reduced the number of relations in the scripts to reduce the generation of Hadoop jobs (which made the batches 4 times faster).

There are often problems with our HBase database because it’s often overloaded (I’m not an IT operator so I can’t give more details) and no one really masters this database even though it has been in production since 2014.
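For readers unfamiliar with the term, the general idea of an in-memory join can be sketched as follows: the small relation is loaded into a hash table, and the large relation is streamed past it, so the large side never needs to be shuffled or sorted. This is a plain Python illustration with hypothetical names and data, not PIG code and not the actual scripts mentioned above.

```python
from collections import defaultdict


def in_memory_join(small, large, key_small, key_large):
    """Join two lists of dict rows by hashing the small side in memory.

    Assumption: `small` fits in memory; `large` is only iterated once.
    Yields merged rows (inner join semantics).
    """
    index = defaultdict(list)
    for row in small:
        index[row[key_small]].append(row)

    for row in large:
        # .get() avoids creating empty entries for keys with no match
        for match in index.get(row[key_large], []):
            yield {**match, **row}


if __name__ == "__main__":
    countries = [{"code": "FR", "country": "France"}]
    visits = [{"user": 1, "code": "FR"}, {"user": 2, "code": "US"}]
    for joined in in_memory_join(countries, visits, "code", "code"):
        print(joined)
```

The trade-off is memory: this only works when one side of the join is small enough to hold in RAM on every worker.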

I do understand that in some cases a NoSQL database is mandatory and like you I like to understand what I’m doing. But:

- I’m not working in Silicon Valley

- Most of my co-workers are not geeks (and I respect that)

- It's VERY hard to find guys with real Big Data or NoSQL skills (this comes from a French technical recruiter)

So, while the geek part of me loves Big Data and NoSQL, the rational part prefers using well-known technologies. If NoSQL and Big Data become mainstream and better known, then the rational part will love them too.

New comment by mgachka in "How does a relational database work?"

mgachka — Wed, 19 Aug 2015 17:40:30 +0000

(I'm the author of the article) I'm 28 and I’m currently a Big Data developer (I use Hadoop, HBase, Hive …) and I don’t understand the buzz surrounding Big Data and NoSQL.

With a relational database the complexity is hidden (more or less…) whereas with Big Data and NoSQL the developer needs to deal with this complexity himself/herself. As a result, most of the Big Data applications I’ve seen don’t work well.

I really like Big Data because it’s more complex but to be honest, most of the time my work does not require the “Big Data scale”.

How does a relational database work?

mgachka — Wed, 19 Aug 2015 09:35:11 +0000

Article URL: http://coding-geek.com/how-databases-work/

Comments URL: https://news.ycombinator.com/item?id=10084449

Points: 377

# Comments: 60

New comment by mgachka in "Let’s Build a Web Server, Part 3"

mgachka — Tue, 04 Aug 2015 00:34:50 +0000

A good and well-written article with pictures. I hope I can read more articles like this one.

New comment by mgachka in "How Shazam works"

mgachka — Sat, 11 Jul 2015 23:41:41 +0000

When I started this side project in 2012, I looked for reliable public information (especially theses or research papers) and the only useful information I found was the Shazam co-founder’s paper.

Since this paper was written in 2003, I wouldn't be surprised if they have changed their algorithms since then.

But from my understanding, the 2003 paper describes a highly scalable architecture and a noise tolerant and "time efficient" algorithm (that can be modified using thresholds) so it could still work in 2015 with a few optimizations. Still, I'm not working at Shazam and I'm not a researcher so I could be wrong.

New comment by mgachka in "How Shazam works"

mgachka — Sat, 11 Jul 2015 20:34:37 +0000

Hi, I’m the author of the article and I already know that. The big difference between Roy van Rijn and me is that I only put algorithms whereas he put “ready to use” Java code. On paper I should be bulletproof against any lawsuit since this article is nothing more than a very detailed version of the Shazam co-founder's paper (+ some unexplained algorithms).

New comment by mgachka in "How Shazam works"

mgachka — Sat, 11 Jul 2015 20:29:55 +0000

Hi, I’m the author of the article. In fact I also did the same when I did my prototype of Shazam. When I wrote the article, I hesitated to add a sub-chapter in the Shazam chapter where I would have put a well-known song and its fingerprinted version so that everyone could hear what it sounds like, but I didn’t do it because I feared a copyright lawsuit.