Hacker News: sdesol

New comment by sdesol in "Amp, Inc. – Amp is spinning out of Sourcegraph"

sdesol — Tue, 09 Dec 2025 07:24:32 +0000

I think the issue was they (the parent commenter) didn't properly convey and/or did not realize they were arguing for context. Data that is difficult to come by that can be used in a prompt is valuable. Being able to work around something with clever wording (i.e. a prompt) is not a moat.

New comment by sdesol in "ClickHouse acquires LibreChat, open-source AI chat platform"

sdesol — Mon, 10 Nov 2025 18:12:28 +0000

Full Disclosure. I am the author of https://github.com/gitsense/chat

> The idea behind the Agentic Data Stack is a higher-level integration to provide a composable software stack for agentic analytics that users can setup quicky, with room for customization.

I agree with this. For those who have been programming with LLMs, the difference between something working and not working can be a simple "sentence" conveying the required context. I strongly believe data enrichment will be one of the main ways we can make agents more effective and efficient. Data enrichment is the foundation for my personal assistant feature https://github.com/gitsense/chat/blob/main/packages/chat/wid...

Basically, instead of having agents blindly grep for things, you would provide them with analyzers that they can use to search with. By making it dead simple for domain experts to extract 'business logic' from their codebase/data, we can solve a lot of problems much more efficiently. Since data is the key, I can see why ClickHouse would make this move, since they probably want to become the storage for all business logic.

Note: I will be dropping a massive update to how my tool generates and analyzes metadata this week, so don't read too much into the demo or if you decide to play with it. I haven't really been promoting it because the flow hasn't been right, but it should be this week.

New comment by sdesol in "New York Times, AP, Newsmax and others say they won't sign new Pentagon rules"

sdesol — Tue, 14 Oct 2025 06:10:46 +0000

> all I want at this point, is my politicians to be smarter than me

I don't care if they are smarter than me. I need them to be smart enough to know they are not that smart. I don't expect politicians to be smart. I expect them to be good listeners and be the voice for the people.

New comment by sdesol in "OpenAI Is Just Another Boring, Desperate AI Startup"

sdesol — Fri, 03 Oct 2025 18:14:51 +0000

> I'm in awe they are still allowing free users at all.

I am not.

> The free tier is enough for me to use it as a helper at work, and I'd probably pay for it tomorrow if they cut off the free tier.

You are sort of proving the point that this isn't crazy. They want to be the dealer of choice and they can afford to give you the hit now for free.

New comment by sdesol in "Cerebras systems raises $1.1B Series G"

sdesol — Tue, 30 Sep 2025 23:11:54 +0000

> Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast.

Saying "technically" is really underselling the difference in intelligence in my opinion. Claude and Gemini are much, much smarter and I trust them to produce better code, but you honestly can't deny the excellent value that Qwen 3, the inference speed, and $50/month for 25M tokens per day bring to the table.

Since I paid for the Cerebras pro plan, I've decided to force myself to use it as much as possible for the duration of the month for developing my chat app (https://github.com/gitsense/chat) and here are some of my thoughts so far:

- Qwen 3 Coder is a lot dumber when it comes to prompting; Gemini and Claude are much better at reading between the lines. However, since the speed is so good, I often don't care as I can go back to the message, make some simple clarifications, and try again.

- The max context window size of 128k for Qwen 3 Coder 480B on their platform can be a serious issue if you need a lot of documentation or code in context.

- I've never come close to the 25M tokens per day limit for their Pro Plan. The max I am using is 5M/day.

- The inference speed + a capable model like Qwen 3 will open up use cases most people might not have thought of before.

I will probably continue to pay for the $50 plan for these use cases.

1. Applying LLM generated patches

Qwen 3 Coder is very much capable of applying patches generated by Sonnet and Gemini. It is slower than what https://www.morphllm.com/ provides but it is definitely fast enough for most people to not care. The cost savings can be quite significant depending on the work.

2. Building context

Since it is so fast and because the 25M token limit per day is such a high limit for me, I am finding myself loading more files into context and just asking Qwen to identify files that I will need and/or summarize things so I can feed it into Sonnet or Gemini to save me significant money.

3. AI Assistant

Due to its blazing speed, you can analyze a lot of data fast for deterministic searches, and because it can review results at such a great speed, you can do multiple search and review loops without feeling like you are waiting forever.

Given what I've experienced so far, I don't think Cerebras can be a serious platform for coding if Qwen 3 Coder is the only available model. Having said that, given the inference speed and Qwen being more than capable, I can see Cerebras becoming a massive cost savings option for many companies and developers, which is where I think they might win a lot of enterprise contracts.

New comment by sdesol in "Context is the bottleneck for coding agents now"

sdesol — Fri, 26 Sep 2025 17:32:31 +0000

> A human can effectively discard or disregard prior information as the narrow window of focus moves to a new task, LLMs seem incredibly bad at this.

This is how I designed my LLM chat app (https://github.com/gitsense/chat). I think agents have their place, but I really think if you want to solve complex problems without needlessly burning tokens, you will need a human in the loop to curate the context. I will get to it, but I believe in the same way that we developed different flows for working with Git, we will have different 'Chat Flows' for working with LLMs.

I have an interactive demo at https://chat.gitsense.com which shows how you can narrow the focus of the context for the LLM. Click "Start GitSense Chat Demos" then "Context Engineering & Management" to go through the 30 second demo.

New comment by sdesol in "Everyone's trying vectors and graphs for AI memory. We went back to SQL"

sdesol — Thu, 25 Sep 2025 01:14:08 +0000

How are you quantifying the speed at which results are reviewed?

New comment by sdesol in "Everyone's trying vectors and graphs for AI memory. We went back to SQL"

sdesol — Wed, 24 Sep 2025 17:11:50 +0000

Honestly, Gemini Flash Lite and models on Cerebras are extremely fast. I know what you are saying. If the goal is to get a lot of results where they may or may not be relevant, then yes, it is an order of magnitude slower.

If you take into consideration the post-analysis process, which is what inference is trying to solve, is it an order of magnitude slower?

New comment by sdesol in "Everyone's trying vectors and graphs for AI memory. We went back to SQL"

sdesol — Wed, 24 Sep 2025 17:05:42 +0000

I'm guessing you are referring to https://github.com/gitsense/chat/tree/main/data/analyze or https://github.com/gitsense/chat/tree/main/packages/chat/wid...

The number is actually the order in the chat so 1.md would be the first message, 2.md would be the second and so forth.

If you go to https://chat.gitsense.com and click on "Load Personal Help Guide" you can see how it is used. Since I want you to be able to chat with the document, I will create a new chat tree and use the directory structure and the 1,2,3... markdown files to determine message order.
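The ordering rule described above can be sketched in a few lines. This is only an illustration, not the code from the repository: the function name `ordered_messages` and the assumption that every message file is named with a plain integer stem (1.md, 2.md, ...) are mine.

```python
from pathlib import Path


def ordered_messages(chat_dir: Path) -> list[Path]:
    """Return the numbered markdown files in chat_dir in message order.

    Hypothetical helper: assumes message files are named 1.md, 2.md, ...
    Files whose stem is not a plain integer (e.g. README.md) are ignored.
    """
    numbered = [p for p in chat_dir.glob("*.md") if p.stem.isdecimal()]
    # Sort numerically so that 10.md comes after 2.md, not before it.
    return sorted(numbered, key=lambda p: int(p.stem))
```

Sorting on the integer value rather than the file name matters once a chat has ten or more messages, since a plain string sort would put 10.md ahead of 2.md.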

New comment by sdesol in "Everyone's trying vectors and graphs for AI memory. We went back to SQL"

sdesol — Wed, 24 Sep 2025 16:50:44 +0000

You could instruct the LLM to classify messages with high-level tags. For example: for coffee, drinks, etc., always include beverage.
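If you don't want to rely on the instruction alone, the same rule can also be enforced after the LLM responds. A minimal sketch, assuming the model returns a plain list of tags; the `PARENT_TAGS` mapping and the `expand_tags` name are made up for illustration and are not part of any tool mentioned here.

```python
# Hypothetical mapping from specific tags to the high-level tag
# that should always accompany them.
PARENT_TAGS = {"coffee": "beverage", "tea": "beverage", "drinks": "beverage"}


def expand_tags(tags: list[str]) -> list[str]:
    """Return tags plus any required high-level tag, in order, without duplicates."""
    expanded: list[str] = []
    for tag in tags:
        # Add the tag itself, then its parent (if one is defined).
        for candidate in (tag, PARENT_TAGS.get(tag)):
            if candidate and candidate not in expanded:
                expanded.append(candidate)
    return expanded
```

With this in place, a message tagged only with coffee would still be found by a search for beverage.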

Given how fast inference has become and given current supported context window sizes for most SOTA models, I think summarizing and having the LLM decide what is relevant is not that fragile at all for most use cases. This is what I do with my analyzers which I talk about at https://github.com/gitsense/chat/blob/main/packages/chat/wid...

New comment by sdesol in "Everyone's trying vectors and graphs for AI memory. We went back to SQL"

sdesol — Wed, 24 Sep 2025 16:09:10 +0000

I haven't looked at the code, but it might do what I do with my chat app which is talked about at https://github.com/gitsense/chat/blob/main/packages/chat/wid...

The basic idea is, you don't search for a single term but rather you search for many. Depending on the instructions provided in the "Query Construction" stage, you may end up with a very high level search term like beverage or you may end up with terms like 'hot-drinks', 'cold-drinks', etc.

Once you have the query, you can do a "Broad Search" which returns an overview of the message and from there the LLM can determine which messages it should analyze further if required.

Edit.

I should add, this search strategy will only work well if you have a post-message process. For example, after every message save/update, you have the LLM generate an overview. These are my instructions for my tiny overview https://github.com/gitsense/chat/blob/main/data/analyze/tiny... that is focused on generating the purpose and keywords that can be used to help the LLM define search terms.
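To make the flow concrete, here is a rough sketch of the "Broad Search" step over overviews that were generated ahead of time. It is not the implementation in my repo: the `Overview` fields, the `broad_search` name and the keyword-overlap scoring are assumptions for illustration, and in practice the search terms and the overviews would both come from the LLM.

```python
from dataclasses import dataclass


@dataclass
class Overview:
    """Hypothetical per-message overview produced by the post-message step."""
    message_id: int
    purpose: str
    keywords: set[str]


def broad_search(overviews: list[Overview], terms: set[str]) -> list[Overview]:
    """Return overviews that share at least one keyword with the search terms.

    Results are ordered by the number of matching keywords, best match first.
    """
    wanted = {t.lower() for t in terms}
    hits = []
    for overview in overviews:
        score = len(wanted & {k.lower() for k in overview.keywords})
        if score > 0:
            hits.append((score, overview))
    # Stable sort: ties keep their original (message) order.
    hits.sort(key=lambda pair: -pair[0])
    return [overview for _, overview in hits]
```

The list that comes back is small enough to hand to the LLM, which can then decide which of those messages it should read in full.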

New comment by sdesol in "Tesla market share in US drops to lowest since 2017"

sdesol — Tue, 09 Sep 2025 02:45:31 +0000

> We're putting aside the political stuff because there isn't a lot to discuss

I don't agree, as we are not quantifying the emotional aspect of the purchasing process. If people "love" the brand, they are willing to overlook a lot of things. Tesla was a status symbol and is now seen as a regret purchase and a toxic brand for many (see Europe and Canada for examples). I can't see how "politics" should not be considered as it does play a critical role in how people spend money. There is a reason why a lot of companies are not open about politics and I don't think I've ever seen a CEO that was so forthcoming with their beliefs as Elon Musk.

New comment by sdesol in "GLM 4.5 with Claude Code"

sdesol — Sat, 06 Sep 2025 06:18:44 +0000

> But in my testing, other models do not work well. It looks like prompts are either very optimized for Claude, or other models are just not great yet with such an agentic environment.

Anybody who has done any serious development with LLMs would know that prompts are not universal. The reason why Claude Code is good is because Anthropic knows Claude Sonnet is good, and that they only need to create prompts that work well with their models. They also have the ability to train their models to work with specific tools and so forth.

It really is a kind of fool's errand to try to create agents that can work well with many different models from different providers.

New comment by sdesol in "Anthropic raises $13B Series F"

sdesol — Wed, 03 Sep 2025 15:38:29 +0000

It might not be their money, but they are paid a management fee and if they cannot provide some return, people will stop using them.

New comment by sdesol in "A staff engineer's journey with Claude Code"

sdesol — Tue, 02 Sep 2025 23:45:29 +0000

It will certainly be interesting to see how businesses evolve in the upcoming years. What is written in stone is, you (employee) will be measured and I am curious to see what developers will be measured by in the future. Will you be at a greater risk of layoffs/lack of promotions/etc. if you spend more on AI? How do you as a developer prove that it is you and not the LLM that should be praised?

New comment by sdesol in "Anthropic raises $13B Series F"

sdesol — Tue, 02 Sep 2025 17:53:24 +0000

> So it is a game of being the one that is left standing

Or the last investor. When this type of money is raised, you can be sure the earlier investors are looking for ways to have a soft landing.

New comment by sdesol in "Survey: a third of senior developers say over half their code is AI-generated"

sdesol — Mon, 01 Sep 2025 07:29:55 +0000

Joking aside, if he is one of the top developers in the company and if he is "actually" a good developer, when compared to others outside of the company, then I can see this bill.

The current feature that I'm working on required 100 messages to finalize things and I would say the context window was around 35k - 50k tokens per "chat completion". My model of choice is Gemini 2.5 Flash which has an input cost of $0.30/1M tokens. Compare this to Sonnet which is $3.00/1M tokens.
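To put rough numbers on that, here is the arithmetic as a small sketch. It counts input tokens only and assumes every one of the 100 chat completions sends the full 35k to 50k tokens; output tokens and any caching discounts are ignored, and the function name is just for illustration.

```python
def input_cost_usd(messages: int, tokens_per_message: int, usd_per_million: float) -> float:
    """Estimate input-token cost: messages * tokens each * price per 1M tokens."""
    return messages * tokens_per_message * usd_per_million / 1_000_000


# Prices per 1M input tokens, taken from the figures quoted above.
for model, price in [("Gemini 2.5 Flash", 0.30), ("Sonnet", 3.00)]:
    low = input_cost_usd(100, 35_000, price)
    high = input_cost_usd(100, 50_000, price)
    print(f"{model}: ${low:.2f} to ${high:.2f}")
```

That works out to roughly $1.05 to $1.50 of input on Flash versus $10.50 to $15.00 on Sonnet for the same feature.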

If the person was properly designing and instructing the LLM to build something advanced correctly, I can see the bill being quite high. I personally don't think you need to use Sonnet 99% of the time, but if somebody else is willing to pay the bill, why not.

New comment by sdesol in "Deploying DeepSeek on 96 H100 GPUs"

sdesol — Sat, 30 Aug 2025 16:37:57 +0000

No, what I am saying is that there are more applications for batch processing that will help with utilization. I can see developers and companies using off-hour processing to prep their data for agentic coding.

New comment by sdesol in "Deploying DeepSeek on 96 H100 GPUs"

sdesol — Fri, 29 Aug 2025 21:11:55 +0000

I don't think you need to be big data to benefit.

A major issue we have right now is, we want the coding process to be more "Agentic", but we don't have an easy way for LLMs to determine what to pull into context to solve a problem. This is a problem that I am working on with my personal AI search assistant, which I talk about below:

https://github.com/gitsense/chat/blob/main/packages/chat/wid...

Analyzers are the "Brains" for my search, but generating the analysis is both tedious and can be costly. I'm working on the tedious part and with batch processing, you can probably process thousands of files for under 5 dollars with Gemini 2.5 Flash.

With batch processing and the ability to continuously analyze tens of thousands of files, I can see companies wanting to make "Agentic" coding smarter, which should help with GPU utilization and drive down the cost of software development.

New comment by sdesol in "Grok Code Fast 1"

sdesol — Fri, 29 Aug 2025 18:20:11 +0000

As a bit of a side note, I want to like Cerebras, but using any of the models through OpenRouter that use them has led to too many throttling responses. It seems like you can't even make a few calls per minute. I'm not sure if Cerebras is throttling OpenRouter or if they are throttling everybody.

If somebody from Cerebras is reading this, are you having capacity issues?