Hacker News: rkochanowski

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 20:46:47 +0000

Currently, only whole functions (including function-like constructs, depending on the language) are considered as a unit.

I decided to skip extracting conditional branches so as not to overcomplicate the first versions, which were intended to validate the idea. I will add this in a future version, because I agree it's needed for large functions.

I don't think it needs configurable granularity. The current version has an analogous mechanism: when functions are nested, both the outer and the inner function are embedded separately. If the two are similar to each other, that pair is excluded. Either the inner or the outer function can still appear in the results, depending on its similarity to other units.
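A minimal sketch of the nested-unit rule described above, assuming each unit is identified by its file path and line span. The `Unit` type, `candidate_pairs`, and the similarity callback are hypothetical names for illustration, not Slopo's actual API. Pairs where one unit encloses the other are skipped, while each of them can still be paired with any other unit.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Unit:
    """Hypothetical code unit: a function identified by file and line span."""
    path: str
    start: int  # first line of the unit
    end: int    # last line of the unit


def is_nested(a: Unit, b: Unit) -> bool:
    """True if one unit's line span lies entirely inside the other's, in the same file."""
    if a.path != b.path:
        return False
    return (a.start <= b.start and b.end <= a.end) or (b.start <= a.start and a.end <= b.end)


def candidate_pairs(
    units: List[Unit],
    similarity: Callable[[Unit, Unit], float],
    threshold: float,
) -> List[Tuple[Unit, Unit, float]]:
    """Return similar pairs, skipping outer/inner pairs of nested functions."""
    pairs = []
    for a, b in combinations(units, 2):
        if is_nested(a, b):
            continue  # an outer function trivially resembles its own inner function
        score = similarity(a, b)
        if score >= threshold:
            pairs.append((a, b, score))
    return pairs
```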

Regarding comments: they are currently removed, and I will think about how to handle them. The challenge is not extraction but how to present them in a report. It may be a nice addition, because coding agents often add comments.

Thanks for the feedback.

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 20:11:40 +0000

Based on your example, there is only a single function, a(), which is embedded. The rest is just code, and its dependencies are not resolved. Have you thought about adding this feature to your tool?

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 18:51:04 +0000

Generally, I chunk by function/method (not by whole class), but different languages have their own concepts and features. Nested code units, anonymous functions, lambdas, and closures are extracted as separate chunks.

The chunk size has an allowed range, and chunks outside it are simply ignored:

- The upper limit is hardcoded at a body size of 10k characters

- The lower limit is configurable, with a default of 10 AST nodes inside the body

The chunking strategy is something that can be improved in future versions.
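The chunking rules above can be sketched for Python source using the standard `ast` module. This is an illustration only, assuming Python as the input language; the constant names and `extract_chunks` are hypothetical, and Slopo's real extractor and its exact way of measuring body size may differ. Methods and nested functions are picked up as their own chunks, classes are not chunked as a whole, and units outside the size range are dropped.

```python
import ast

# Hypothetical constants mirroring the limits described above.
MAX_BODY_CHARS = 10_000  # upper limit on body size in characters (hardcoded)
MIN_BODY_NODES = 10      # lower limit on AST nodes in the body (configurable default)

# Function-like constructs in Python: functions, methods, nested functions, lambdas.
UNIT_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda)


def _body(node):
    """Return the body as a list of AST nodes (a lambda's body is a single expression)."""
    return node.body if isinstance(node.body, list) else [node.body]


def _count_nodes(stmts):
    """Count every AST node inside the body, including nested ones."""
    return sum(1 for stmt in stmts for _ in ast.walk(stmt))


def extract_chunks(source, min_nodes=MIN_BODY_NODES):
    """Return (name, line, body_text) for every function-like unit within the size range."""
    chunks = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, UNIT_TYPES):
            continue  # classes are not chunked; their methods are found separately
        stmts = _body(node)
        body_text = "\n".join(ast.get_source_segment(source, s) or "" for s in stmts)
        if len(body_text) > MAX_BODY_CHARS or _count_nodes(stmts) < min_nodes:
            continue  # outside the allowed range: ignored
        chunks.append((getattr(node, "name", "<lambda>"), node.lineno, body_text))
    return chunks
```

Note that a nested function's body is also part of its outer function's body, so both units are extracted, which matches the nested-unit behavior described in the earlier comment.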

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 18:26:09 +0000

There are good, mature tools for deterministic duplication detection, and I intentionally focused on an embedding-based approach to fill this gap (I didn't find other tools using it).

If by "more efficient" you mean avoiding embedding the same code multiple times, this optimization is already implemented internally.
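One common way to avoid embedding identical code twice is to cache vectors by a hash of the chunk text. The sketch below assumes that approach; the comment does not say how Slopo does it internally, and `EmbeddingCache` and the batch `embed_fn` callback are hypothetical stand-ins for a real embedding model.

```python
import hashlib
from typing import Callable, Dict, List, Sequence

Vector = List[float]


class EmbeddingCache:
    """Hypothetical cache: each distinct chunk text is sent to the model only once."""

    def __init__(self, embed_fn: Callable[[List[str]], List[Vector]]):
        self._embed_fn = embed_fn  # batch embedding function (e.g. a model client)
        self._cache: Dict[str, Vector] = {}

    @staticmethod
    def _key(code: str) -> str:
        # Identical code produces an identical key, regardless of where it appears.
        return hashlib.sha256(code.encode("utf-8")).hexdigest()

    def embed_all(self, chunks: Sequence[str]) -> List[Vector]:
        # Collect only texts not seen before, deduplicated within this batch too.
        missing: Dict[str, str] = {}
        for code in chunks:
            key = self._key(code)
            if key not in self._cache and key not in missing:
                missing[key] = code
        if missing:
            vectors = self._embed_fn(list(missing.values()))
            for key, vec in zip(missing.keys(), vectors):
                self._cache[key] = vec
        # Every chunk gets a vector, duplicates share the cached one.
        return [self._cache[self._key(code)] for code in chunks]
```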

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 18:03:37 +0000

Recently there was a popular article on HN arguing that code duplication is sometimes better than abstraction, so I assume this question is not a joke.

While testing this tool, one detected duplication turned out to be an interesting use case. Permission-check logic was duplicated and placed in distant parts of the codebase. The code was similar but not identical, and the logic was not the same: one version had stricter checks. I analyzed this with a coding agent, and we found that both versions are used for the same purpose, which means that in some cases validation is insufficient. With a single place for validation, this bug could have been prevented or easily detected.

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 17:42:06 +0000

No. It depends mostly on general code quality and architecture. Some implementations require more code similarity by design. Some languages, like Java, may tend toward more duplication, but that is only a theoretical guess. It also depends on what kind of software is being developed in which language.

If you are interested in data, you can check my article. The analysis was done with this tool, but with a previous version in which exact-copy duplicates were excluded from the analysis. https://rkochanowski.com/article/analysis-code-duplication/

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 15:16:22 +0000

PHP support can be added easily; I will release a new version soon.

New comment by rkochanowski in "Show HN: CLI tool for detecting non-exact code duplication with embedding models"

rkochanowski — Thu, 02 Jul 2026 14:19:57 +0000

I built Slopo to solve one specific problem: finding similar code that is the hardest to detect for other tools, coding AI agents, and humans.

It finds similar-looking code using embeddings. This detects more than just copy-paste clones or clones with minor changes. The trade-off is that similar code is often not a clone worth refactoring. Initial results need to be verified, but coding agents can do this quickly. Example prompts are available at https://slopo.dev

Additionally, similar code that is distant in the codebase is ranked higher, to focus on less obvious duplication.
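The ranking idea can be illustrated with cosine similarity plus a boost for pairs that live far apart in the directory tree. This is a hypothetical sketch: the comment only says that distant similar code is ranked higher, so the directory-distance metric, `threshold`, `distance_weight`, and the function names are all assumptions, not Slopo's actual scoring.

```python
import math
from itertools import combinations
from pathlib import PurePosixPath
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two embedding vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def path_distance(p: str, q: str) -> int:
    """Hypothetical metric: directory steps from one file's folder to the other's."""
    a, b = PurePosixPath(p).parent.parts, PurePosixPath(q).parent.parts
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return (len(a) - common) + (len(b) - common)


def rank_pairs(units: List[Tuple[str, str, List[float]]],
               threshold: float = 0.85, distance_weight: float = 0.05):
    """units: (path, name, vector). Similar pairs, farther-apart pairs boosted."""
    scored = []
    for (pa, na, va), (pb, nb, vb) in combinations(units, 2):
        sim = cosine(va, vb)
        if sim < threshold:
            continue  # not similar enough to report
        score = sim * (1 + distance_weight * path_distance(pa, pb))
        scored.append((score, sim, (pa, na), (pb, nb)))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored
```

With this weighting, a slightly less similar pair in unrelated directories can outrank an identical pair sitting next to each other, which is the intended bias toward less obvious duplication.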

The results vary a lot depending on the codebase. I noticed that sometimes most of the detected duplicates are false positives, but the remaining ones are strong candidates for refactoring, or even bugs. Other times it reveals much more real duplication.

Show HN: CLI tool for detecting non-exact code duplication with embedding models

rkochanowski — Thu, 02 Jul 2026 14:19:24 +0000

Article URL: https://github.com/rafal-qa/slopo

Comments URL: https://news.ycombinator.com/item?id=48762038

Points: 61

# Comments: 29

New comment by rkochanowski in "Ask HN: Am I missing something with AI"

rkochanowski — Thu, 25 Jun 2026 10:41:55 +0000

That's why it's so important to design a solution before letting AI implement it. I noticed that AI often can't see the clean, elegant, and simple solutions that best fit a given problem. Even when I ask whether it can be designed in a simpler way, it can't see what I can see. I think this is one of the most important points where humans should be involved.

New comment by rkochanowski in "Show HN: Recall – Local project memory for Claude Code"

rkochanowski — Mon, 22 Jun 2026 08:32:12 +0000

There are a bunch of tools to manage context or fix what Claude does wrong. They may be popular because non-software engineers want to improve their workflows, as you mentioned.

But do they really work, rather than making things worse? Are there any tests or real case studies done by users other than the tool's author? In my experience, removing things from context works more often than adding them.

New comment by rkochanowski in "Ask HN: What is your #1 practical lesson or "aha" moment from coding with AI?"

rkochanowski — Sun, 21 Jun 2026 22:02:53 +0000

I realized how easily it can be confused and, as a result, generate a bad answer. It's difficult to control context when we have instructions in AGENTS/CLAUDE.md, memory, SKILLs, and sometimes additional tools for optimization, with our prompt on top of all that. When a coding agent misbehaves, there is often some issue on my side that steered it in the wrong direction. It requires more precision than I initially thought.

New comment by rkochanowski in "Ask HN: Do you find vibe coding / agentic engineering to be fulfilling?"

rkochanowski — Sat, 20 Jun 2026 13:38:49 +0000

I think "lack of effort required" is the key.

We can just throw prompts at it without understanding or caring about the output. This way we can build something in 10 minutes that we don't own.

We can also do real engineering: designing the architecture, thinking about what exactly we are building and why, and being critical of the produced output. This way of building takes more time, but it is something we built ourselves and can potentially be more fulfilling. And it is not something anyone can copy with the right prompt. There is more nuance to it than AI-generated code.

Another way to find fulfillment is to focus more on solving a real problem and building something people will use, and less on writing code.