Hacker News: kacperlukawski

New multimodal Gemini embeddings from Google (videos and PDFs supported)

kacperlukawski — Tue, 10 Mar 2026 17:34:13 +0000

Article URL: https://haystack.deepset.ai/blog/multimodal-embeddings-gemini-haystack

Comments URL: https://news.ycombinator.com/item?id=47326357

Points: 1

# Comments: 0

New comment by kacperlukawski in "Ask HN: Is building a calm, non-gamified learning app a mistake?"

kacperlukawski — Mon, 15 Dec 2025 16:26:28 +0000

Although it's in a different area, I wanted to mention https://calmcode.io/ as an excellent example of a calm learning platform.

There is a whole movement around enshittification, and I see potential in this kind of app, even though it still seems to be a niche.

Ask HN: Front end stack for a new app in 2025

kacperlukawski — Fri, 21 Mar 2025 12:49:46 +0000

I read "The Frontend Treadmill" (https://polotek.net/posts/the-frontend-treadmill/) and it made me think a lot.

I used to be a backend developer, and most of my experience is there. On the frontend, the last thing I used was jQuery, and it was OK for me: simple, and it worked fine. Now I want to build a new side project, and I wonder whether vanilla JS is the best choice. JavaScript has many good features now, and browsers can do a lot on their own.

1. Can vanilla JS work for medium-sized apps in 2025?
2. What small libraries would you add if needed?

I don't want to rewrite everything in 2-3 years when frameworks change again. That seems like a waste of time. Has anyone here stopped using big frameworks and felt better about it?

Comments URL: https://news.ycombinator.com/item?id=43435010

Points: 1

# Comments: 2

Elasticsearch Hybrid Search in Practice

kacperlukawski — Wed, 12 Feb 2025 12:37:34 +0000

Article URL: https://softwaredoug.com/blog/2025/02/08/elasticsearch-hybrid-search

Comments URL: https://news.ycombinator.com/item?id=43024764

Points: 3

# Comments: 0

New comment by kacperlukawski in "Show HN: TalkNotes – A site that turns your ideas into tasks"

kacperlukawski — Sat, 01 Feb 2025 12:42:02 +0000

Is there a free trial available? Even 24 hours should be enough to tell whether I like it, but currently I have to pay from day one. Or did I miss it?

New comment by kacperlukawski in "Show HN: I made an extension that turns Google Sheets into Google Slides"

kacperlukawski — Thu, 23 Jan 2025 13:51:04 +0000

I'm always a bit worried when an extension gets permission to do anything it wants with all my files, including deleting them. Is there a way to restrict it and allow it to modify only the files it created?

New comment by kacperlukawski in "Don't use cosine similarity carelessly"

kacperlukawski — Wed, 15 Jan 2025 09:22:35 +0000

The problem is scaling that properly. If you have millions of documents, that approach won't scale well. You are not going to prompt the LLM millions of times, are you?

Embedding models usually have fewer parameters than LLMs, and once we index the documents, their retrieval is also pretty fast. Using an LLM as a judge makes sense, but only on a limited scale.
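To make the "limited scale" point concrete, here is a minimal sketch of a two-stage setup: cheap embedding similarity ranks everything, and the expensive judge only sees the top-k candidates. The vectors are hand-written toy data, and `fake_judge` is a hypothetical stand-in for an LLM call, not a real API.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_then_judge(query_vec, doc_vecs, judge, k=3):
    """Rank all documents by embedding similarity, then call the
    expensive judge only on the top-k candidates."""
    ranked = sorted(range(len(doc_vecs)),
                    key=lambda i: cosine(query_vec, doc_vecs[i]),
                    reverse=True)
    candidates = ranked[:k]
    return [i for i in candidates if judge(i)]

# Toy data: hand-written 2-d vectors, not real embeddings.
docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.7, 0.7]]
calls = []

def fake_judge(doc_id):
    # Hypothetical stand-in for an LLM call; records how often it runs.
    calls.append(doc_id)
    return doc_id != 1

result = retrieve_then_judge([1.0, 0.0], docs, fake_judge, k=2)
```

With four documents and k=2, the judge runs twice instead of four times. With millions of documents the gap is what matters: the number of LLM calls is bounded by k, not by the corpus size.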

New comment by kacperlukawski in "Instant Video Search"

kacperlukawski — Thu, 12 Sep 2024 16:02:18 +0000

Interesting! Does it work based on speech or transcriptions?

New comment by kacperlukawski in "Automatically Detecting Under-Trained Tokens in Large Language Models"

kacperlukawski — Sun, 12 May 2024 14:48:52 +0000

Why is that an issue? Training the tokenizer seems much more straightforward than training the model, as it is based on the statistics of the input data. I guess it may take a while for massive datasets, but is it really impossible to calculate the frequencies at a larger scale?
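To illustrate what "based on the statistics of the input data" means, here is a minimal sketch of a single BPE merge step: count adjacent symbol pairs, then merge the most frequent one. The corpus and its counts are made up, and real tokenizer trainers add many details (pre-tokenization, byte fallback, efficient updates) that are left out here.

```python
from collections import Counter

def count_pairs(word_freqs):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = Counter()
    for symbols, freq in word_freqs.items():
        for left, right in zip(symbols, symbols[1:]):
            pairs[(left, right)] += freq
    return pairs

def merge_pair(word_freqs, pair):
    """Replace every occurrence of `pair` with a single merged symbol."""
    merged = {}
    for symbols, freq in word_freqs.items():
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
                out.append(symbols[i] + symbols[i + 1])
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        key = tuple(out)
        merged[key] = merged.get(key, 0) + freq
    return merged

# Toy corpus: words split into characters, with made-up counts.
corpus = {
    ("l", "o", "w"): 5,
    ("l", "o", "w", "e", "r"): 2,
    ("l", "o", "t"): 1,
    ("n", "e", "w"): 3,
}
pairs = count_pairs(corpus)
best = max(pairs, key=pairs.get)       # most frequent adjacent pair
corpus_after = merge_pair(corpus, best)
```

Each merge is just counting plus a rewrite of the corpus, so the counting part parallelizes naturally across shards of the data.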

New comment by kacperlukawski in "Automatically Detecting Under-Trained Tokens in Large Language Models"

kacperlukawski — Sun, 12 May 2024 13:06:12 +0000

Are there any specific reasons for using BPE rather than Unigram in LLMs? I've been trying to understand the impact of the tokenization algorithm, and Unigram was reported to be a better alternative (e.g., Byte Pair Encoding is Suboptimal for Language Model Pretraining: https://arxiv.org/abs/2004.03720). As I understand it, the Unigram training process should eliminate under-trained tokens if trained on the same data as the LLM itself.

New comment by kacperlukawski in "OpenAI announces GPT-4.5 Turbo"

kacperlukawski — Tue, 12 Mar 2024 18:43:05 +0000

Yeah, it seems like it got published too early.

OpenAI announces GPT-4.5 Turbo

kacperlukawski — Tue, 12 Mar 2024 18:37:19 +0000

Article URL: https://www.bing.com/search?q=https%3A%2F%2Fopenai.com%2Fblog%2Fgpt-4-5-turbo

Comments URL: https://news.ycombinator.com/item?id=39683178

Points: 5

# Comments: 2

New comment by kacperlukawski in "Qdrant 1.7.0"

kacperlukawski — Tue, 12 Dec 2023 15:12:03 +0000

I'm unsure if there is any comparison of LanceDB and Qdrant available out there, but there shouldn't be any issues with Python 3.12 and qdrant-client compatibility. Windows is also not a problem, as the typical local setup is usually based on Docker. Are there any specific features you are interested in?

New comment by kacperlukawski in "Qdrant 1.7.0"

kacperlukawski — Tue, 12 Dec 2023 14:34:41 +0000

If you will be the only user of the app, then the Python SDK's local mode might be suitable. However, in the long run, when you decide to publish the app, you will most likely have to switch to an on-premise or cloud environment. Using Qdrant from the very beginning might be a good idea, as the interfaces stay the same and the switch is seamless.

Local mode: https://github.com/qdrant/qdrant-client#local-mode
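A minimal sketch of what that switch can look like, assuming qdrant-client is installed. The helper names, the `./qdrant_data` path, and the `localhost:6333` address are illustrative assumptions, not something prescribed by the library; only the constructor arguments differ between the modes.

```python
def client_kwargs(mode):
    """Map a deployment mode to QdrantClient constructor arguments."""
    if mode == "memory":
        return {"location": ":memory:"}          # local mode, nothing persisted
    if mode == "disk":
        return {"path": "./qdrant_data"}         # local mode, hypothetical path
    if mode == "server":
        return {"url": "http://localhost:6333"}  # assumed Docker default address
    raise ValueError(f"unknown mode: {mode}")

def make_client(mode):
    # Imported lazily so the helper above works without the package installed.
    from qdrant_client import QdrantClient
    return QdrantClient(**client_kwargs(mode))
```

The rest of the application would call the same client methods either way, so moving from `make_client("memory")` to `make_client("server")` should only touch this one place.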

New comment by kacperlukawski in "Show HN: PromptTools – open-source tools for evaluating LLMs and vector DBs"

kacperlukawski — Wed, 02 Aug 2023 07:29:38 +0000

Qdrant here! We're already working on that :D

New comment by kacperlukawski in "Serverless Semantic Search, Free tier only"

kacperlukawski — Wed, 12 Jul 2023 14:00:48 +0000

How would you host a sentence-transformers model for free? You need it to vectorize each query, so it has to be hosted somewhere. Is there any way to do it for free?

New comment by kacperlukawski in "Serverless Semantic Search, Free tier only"

kacperlukawski — Wed, 12 Jul 2023 13:54:12 +0000

If you need semantic search locally, then it's fine, but serving an embedding model might still be challenging. And if you want to expose it, your laptop might not be enough.

New comment by kacperlukawski in "Introduction to vector similarity search (2022)"

kacperlukawski — Wed, 12 Jul 2023 12:44:46 +0000

This is also an interesting piece on how to do it completely for free: https://news.ycombinator.com/item?id=36693239

New comment by kacperlukawski in "Serverless Semantic Search, Free tier only"

kacperlukawski — Wed, 12 Jul 2023 12:25:52 +0000

It's a bit easier in Python if you use tools like https://www.serverless.com/. I'm not sure if Rust has something similar yet.

New comment by kacperlukawski in "MdBook – A command line tool to create books with Markdown"

kacperlukawski — Fri, 30 Jun 2023 12:26:36 +0000

It's Hugo, with custom styling. https://gohugo.io/documentation/