Hacker News: londogard

New comment by londogard in "Gzip and KNN Outperforms Transformers on Text Classification"

londogard — Thu, 13 Jul 2023 13:19:41 +0000

Yes, what I'm saying is that gzip does not perform as well when the tokens don't overlap exactly.

Gzip has no "semantic" mode, hence it won't and does not (according to the paper's metrics) perform as well.

Deep learning can capture these semantic similarities.

New comment by londogard in "Gzip and KNN Outperforms Transformers on Text Classification"

londogard — Thu, 13 Jul 2023 11:41:13 +0000

I'd like to note that this is only stronger on news.

On Yahoo Questions it is not the top performer. It's not far-fetched to think that news articles are written in a similar way, sometimes even partly copied, and therefore have a lot of words in common. Yahoo Questions is a forum, and I'd expect a greater variation of words there, even though the words themselves are semantically similar.

That is, gzip is strong when many words overlap (the size increase when gzipped is smaller), but when the similarity is semantic, DNNs win every day.
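
To make the overlap point concrete, here's a minimal sketch of the normalized compression distance (NCD), which as far as I understand is the distance the paper feeds into kNN. The helper names `clen` and `ncd` and the three example sentences are my own illustrations, not taken from the paper, and the exact numbers will vary with the text.

```python
import gzip


def clen(text: str) -> int:
    """Length in bytes of the gzip-compressed text."""
    return len(gzip.compress(text.encode("utf-8")))


def ncd(a: str, b: str) -> float:
    """Normalized compression distance: small when b adds few new bytes to a."""
    ca, cb = clen(a), clen(b)
    cab = clen(a + " " + b)  # compress the two texts concatenated
    return (cab - min(ca, cb)) / max(ca, cb)


# Hypothetical example sentences (assumption, for illustration only).
query = "The central bank raised interest rates again to fight inflation."
overlap = "The central bank raised interest rates today to fight rising inflation."
paraphrase = "Borrowing costs went up as policymakers tried to cool surging prices."

d_overlap = ncd(query, overlap)        # shares most of its words with the query
d_paraphrase = ncd(query, paraphrase)  # similar meaning, almost no shared words

print(f"overlap:    {d_overlap:.3f}")
print(f"paraphrase: {d_paraphrase:.3f}")
```

The sentence that copies most of the query's words should come out closer than the paraphrase, even though both are about the same thing. That is the gap a model with learned semantic representations can close and gzip can't.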

The results are interesting but not as interesting as it sounds IMO.

New comment by londogard in "Ask HN: Could you share your personal blog here?"

londogard — Tue, 04 Jul 2023 18:15:01 +0000

blog.londogard.com

I'm mainly blogging about Data Science or Kotlin, at least for now.

Solara – “A New Reactive Streamlit”

londogard — Sat, 01 Jul 2023 15:07:42 +0000

Article URL: https://blog.londogard.com/posts/2023-06-30-solara/

Comments URL: https://news.ycombinator.com/item?id=36550924

Points: 2

# Comments: 0

New comment by londogard in "Log and track ML metrics, parameters, models with Git and DVC"

londogard — Tue, 23 May 2023 11:32:22 +0000

Thanks!

We solved 1. the same way, but it felt "off" somehow. Perhaps it's a good solution.

2. That's a sound solution, but a tiny bit cumbersome. I have projects where we deploy both a classifier and a regressor, where it'd be nice to keep everything in main. Alas, you can't have it all.

New comment by londogard in "Log and track ML metrics, parameters, models with Git and DVC"

londogard — Sun, 21 May 2023 08:13:29 +0000

I really enjoy using DVC. I do see some drawbacks compared to other offerings like MLflow and W&B.

1. Harder to track experiments on remote VMs (e.g. Azure) as there's no server (we need to feed results back somehow).

2. Impossible (?) to track different types of experiments in the same repo. MLflow has a way to define experiments and runs, which means I can easily group Regression vs Classification, or even try a completely different task with the same data.

If anyone has a good suggestion on how to solve these two I'd love to fully commit to DVC!

New comment by londogard in "DuckDB 0.7.0"

londogard — Mon, 13 Feb 2023 10:33:55 +0000

The Polars integration is golden! Now I can use my two favorite tools without any awkward conversions via Arrow.

Really smooth, love it!

New comment by londogard in "TypeScript and Set Theory"

londogard — Fri, 22 Apr 2022 06:15:57 +0000

I think it's excalidraw

New comment by londogard in "Tribuo, a Machine Learning Library for Java"

londogard — Tue, 15 Sep 2020 20:10:36 +0000

Hi,

What would you say differentiates you from Smile, which includes a simplistic dataframe, visualisation, support for CBLAS, etc.?

Is speed on par?

New comment by londogard in "AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models"

londogard — Sun, 29 Sep 2019 06:35:48 +0000

Awesome, I love to see more and more frameworks for interpretability coming out. It's incredibly important, especially when selling your solution to higher-ups.

There's another solution named LIME which seems to take a similar but more general approach. I like this more tailored idea, as it'll probably give a better interpretation of the NLP questions.