<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: nyellin</title><link>https://news.ycombinator.com/user?id=nyellin</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 10:57:22 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=nyellin" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by nyellin in "Atlassian enables default data collection to train AI"]]></title><description><![CDATA[
<p>Why does Atlassian need to train AI models?</p>
]]></description><pubDate>Mon, 20 Apr 2026 19:26:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=47839323</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=47839323</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47839323</guid></item><item><title><![CDATA[Ask HN: Agent orchestrators / UIs you use on top of Claude?]]></title><description><![CDATA[

<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47798952">https://news.ycombinator.com/item?id=47798952</a></p>
<p>Points: 2</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 16 Apr 2026 20:18:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=47798952</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=47798952</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47798952</guid></item><item><title><![CDATA[New comment by nyellin in "Launch HN: Freestyle – Sandboxes for Coding Agents"]]></title><description><![CDATA[
<p>Is it possible to run a Kubernetes cluster inside one? (E.g. via KIND.)<p>If so, we'd very much like to test this. We make extensive use of Claude Code web, but it can't effectively test our product inside the sandbox without running a K8s cluster.</p>
]]></description><pubDate>Mon, 06 Apr 2026 20:22:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47666516</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=47666516</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47666516</guid></item><item><title><![CDATA[Ask HN: Giving AI agents a credit card]]></title><description><![CDATA[
<p>Is there a safe way to give agents a credit card with usage caps? Ideally also the ability for purchases to pend until human approval.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47089014">https://news.ycombinator.com/item?id=47089014</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 20 Feb 2026 15:13:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=47089014</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=47089014</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47089014</guid></item><item><title><![CDATA[New comment by nyellin in "Benchmarking OpenTelemetry: Can AI trace your failed login?"]]></title><description><![CDATA[
<p>We publish the benchmarks for HolmesGPT (CNCF sandbox project) at <a href="https://holmesgpt.dev/development/evaluations/" rel="nofollow">https://holmesgpt.dev/development/evaluations/</a></p>
]]></description><pubDate>Thu, 29 Jan 2026 20:43:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=46816309</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=46816309</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46816309</guid></item><item><title><![CDATA[New comment by nyellin in "Benchmarking OpenTelemetry: Can AI trace your failed login?"]]></title><description><![CDATA[
<p>HolmesGPT maintainer here: our benchmarks [1] tell a very different story, as does anecdotal evidence from our customers - including Fortune 500 companies using SRE agents in incredibly complex production environments.<p>We're actually struggling a bit with benchmark saturation right now. Opus does much better in the real world than Sonnet, but it's hard to create sophisticated enough benchmarks to show that in the lab. When we run benchmarks with a small number of iterations, Sonnet even wins sometimes.<p>[1] <a href="https://holmesgpt.dev/development/evaluations/history/" rel="nofollow">https://holmesgpt.dev/development/evaluations/history/</a></p>
]]></description><pubDate>Thu, 29 Jan 2026 20:38:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=46816248</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=46816248</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46816248</guid></item><item><title><![CDATA[New comment by nyellin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>Haiku is called often, but not always in the way you'd think. E.g. every time you write something, CC invokes Haiku multiple times to generate the 'delightful 1-2 word phrase used to indicate progress to the user' (Doing Stuff, Wizarding, etc.)</p>
]]></description><pubDate>Thu, 08 Jan 2026 20:58:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=46546389</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=46546389</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46546389</guid></item><item><title><![CDATA[New comment by nyellin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>Not necessarily true. Subagents allow for parallelization, but they can decrease accuracy dramatically if you're not careful, because there are often dependencies between tasks and swapping context windows with a summary is extremely lossy.<p>For the longest time, Claude Code itself didn't really use subagents much by default, other than supporting them as a feature eager users could configure. (Source: reverse engineering we did on Claude Code using the fantastic CC tracing tool Simon Willison wrote about once. This is also no longer true on the latest versions, which have e.g. an Explore subagent that is actively used.)</p>
]]></description><pubDate>Thu, 08 Jan 2026 20:56:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46546352</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=46546352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46546352</guid></item><item><title><![CDATA[New comment by nyellin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>Forgot to address the easiest part:<p>> - how can I reliably call tools with the right schema?<p>This is typically done by enabling strict mode for tool calling, which is a hermetic solution: it makes the LLM unable to generate tokens that would violate the schema. (I.e. the LLM samples tokens only from the subset of tokens that leads to valid schema generation.)</p>
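For the curious, here's a minimal sketch of what a strict-mode tool definition can look like. OpenAI-style field names are shown purely for illustration (exact details vary by provider), and the tool itself is hypothetical:

```python
# Sketch of a strict-mode tool definition (OpenAI-style schema shown for
# illustration; field names vary by provider, and get_pod_logs is a
# made-up example tool). With "strict": True the provider constrains
# token sampling so the model can only emit arguments that validate
# against this JSON schema.

get_pod_logs_tool = {
    "type": "function",
    "function": {
        "name": "get_pod_logs",          # hypothetical tool name
        "description": "Fetch logs for a Kubernetes pod.",
        "strict": True,                  # enables constrained decoding
        "parameters": {
            "type": "object",
            "properties": {
                "namespace": {"type": "string"},
                "pod_name": {"type": "string"},
            },
            # Strict mode typically requires every property to be listed
            # in "required" and additionalProperties to be False.
            "required": ["namespace", "pod_name"],
            "additionalProperties": False,
        },
    },
}
```

The key point is that schema conformance is enforced at decode time, not checked after the fact.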
]]></description><pubDate>Thu, 08 Jan 2026 20:52:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=46546292</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=46546292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46546292</guid></item><item><title><![CDATA[New comment by nyellin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>Re (1): use a TODOs system like Claude Code does.<p>Re (2): also fairly easy! It's just a summarization prompt. E.g. this is the one we use in our agent: <a href="https://github.com/HolmesGPT/holmesgpt/blob/62c3898e4efae69b4262f4d75e3f8943507278a2/holmes/plugins/prompts/conversation_history_compaction.jinja2#L4" rel="nofollow">https://github.com/HolmesGPT/holmesgpt/blob/62c3898e4efae69b...</a><p>Or just use the Claude Code SDK, which does this all for you! (You can also use various provider-specific features for (2), like automatic compaction on the OpenAI Responses endpoint.)</p>
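A toy sketch of the compaction idea, with a stand-in summarize() in place of the real LLM call (names and thresholds are illustrative, not anyone's actual implementation):

```python
# Minimal sketch of context-window compaction: once the conversation
# grows past a budget, older messages are replaced by a single summary
# message, keeping only the most recent turns verbatim.

def summarize(messages):
    # Stand-in: a real implementation would send these messages to an
    # LLM with a compaction/summarization prompt and return its answer.
    return f"[summary of {len(messages)} earlier messages]"

def compact(history, max_messages=10, keep_recent=4):
    """history is a list of {'role': ..., 'content': ...} dicts."""
    if len(history) <= max_messages:
        return history
    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = {"role": "system", "content": summarize(old)}
    return [summary] + recent
```

Real implementations count tokens rather than messages and are smarter about what to keep, but the shape is the same.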
]]></description><pubDate>Thu, 08 Jan 2026 20:50:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=46546268</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=46546268</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46546268</guid></item><item><title><![CDATA[New comment by nyellin in "How to code Claude Code in 200 lines of code"]]></title><description><![CDATA[
<p>There's a bit more to it!<p>For example, the agent in the post will demonstrate 'early stopping', where it finishes before the task is really done. You'd think you could solve this with reasoning models, but that doesn't actually work on SOTA models.<p>To fix 'early stopping' you need extra features in the agent harness. Claude Code does this with TODOs that are injected back into every prompt to remind the LLM what tasks remain open. (If you're curious, somewhere in the public repo for HolmesGPT we have benchmarks with all the experiments we ran to solve this - from hypothesis tracking to other exotic approaches - but TODOs always performed best.)<p>Still, good article. Agents really are just tools in a loop. It's not rocket science.</p>
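A rough sketch of the TODO re-injection idea (illustrative names only, not Claude Code's actual internals):

```python
# Sketch of TODO re-injection: open tasks are appended to the prompt on
# every turn, so the model is reminded of unfinished work and is less
# likely to stop early. The structure here is a guess at the technique,
# not a reconstruction of any specific agent.

def build_prompt(user_request, todos):
    """todos is a list of {'task': str, 'done': bool} dicts."""
    open_items = [t["task"] for t in todos if not t["done"]]
    prompt = user_request
    if open_items:
        reminder = "\n\nOpen TODOs (do not stop until all are done):\n"
        reminder += "\n".join(f"- {t}" for t in open_items)
        prompt += reminder
    return prompt
```

The agent loop then re-runs build_prompt() each turn with the current TODO state, so the reminder always reflects what is still open.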
]]></description><pubDate>Thu, 08 Jan 2026 20:45:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=46546196</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=46546196</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46546196</guid></item><item><title><![CDATA[Show HN: HolmesGPT – OSS AI Agent for On-Call and Observability]]></title><description><![CDATA[
<p>Hi HN, I'm Natan from Robusta.dev.<p>We open-sourced HolmesGPT, an AI agent to help on-call engineers triage and respond to alerts. It pulls observability data from your existing tools and uses an LLM to correlate data, highlight the most important logs to look at, and (experimentally) open pull requests to fix the problem.<p>We started with integrations for:<p>* Observability tools to fetch metrics, logs, and traces<p>* Incident response (bi-directional OpsGenie/PagerDuty)<p>* ITSM (ServiceNow coming soon)<p>* Slack (start investigating by tagging @holmes on any alert)<p>We're soon adding MCP support to bring more data into investigations.<p>One unique feature is multimodal output - the ability to render graphs or diagrams related to the incident. That's only in our commercial platform right now, but we're thinking about how to bring it to the OSS in a way that makes sense.<p>We built this because we were tired of digging through dashboards while half-asleep at 3AM. We're releasing the core as open source because we think for this to succeed it has to be able to access data from all different sources - and OSS is the best way to encourage that. Too many solutions are locked into a specific vendor, which limits their usefulness. (E.g. DataDog AI can't tap data in Splunk when investigating. But in the real world everyone has multiple observability tools.)<p>We’d love feedback from the HN community - thoughts, ideas, and of course PRs.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43461114">https://news.ycombinator.com/item?id=43461114</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 24 Mar 2025 13:53:11 +0000</pubDate><link>https://github.com/robusta-dev/holmesgpt</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=43461114</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43461114</guid></item><item><title><![CDATA[New comment by nyellin in "Meta Uses LLMs to Improve Incident Response"]]></title><description><![CDATA[
<p>I know there are already a number of comments here about proprietary solutions.<p>If you're looking for something open source: <a href="https://github.com/robusta-dev/holmesgpt/">https://github.com/robusta-dev/holmesgpt/</a></p>
]]></description><pubDate>Wed, 20 Nov 2024 07:37:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=42191581</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=42191581</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42191581</guid></item><item><title><![CDATA[Show HN: We released an AI Agent for Prometheus alerts and on-call]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/robusta-dev/holmesgpt">https://github.com/robusta-dev/holmesgpt</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=42062526">https://news.ycombinator.com/item?id=42062526</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Wed, 06 Nov 2024 14:10:10 +0000</pubDate><link>https://github.com/robusta-dev/holmesgpt</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=42062526</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42062526</guid></item><item><title><![CDATA[New comment by nyellin in "Leveraging AI for efficient incident response"]]></title><description><![CDATA[
<p>Here you go: <a href="https://www.youtube.com/live/Jml1hk6I5Wo?si=YbjJKRkO4yf0bOlx" rel="nofollow">https://www.youtube.com/live/Jml1hk6I5Wo?si=YbjJKRkO4yf0bOlx</a><p>And thanks for submitting!</p>
]]></description><pubDate>Fri, 23 Aug 2024 11:03:26 +0000</pubDate><link>https://news.ycombinator.com/item?id=41327923</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=41327923</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41327923</guid></item><item><title><![CDATA[New comment by nyellin in "Leveraging AI for efficient incident response"]]></title><description><![CDATA[
<p>We've open sourced something with similar goals that you can use today: <a href="https://github.com/robusta-dev/holmesgpt/">https://github.com/robusta-dev/holmesgpt/</a><p>We're taking a slightly different angle than what Facebook published, in that we're primarily using tool calling and observability data to run investigations.<p>What we've released really shines at automatically surfacing relevant observability data, and we're soon planning to add the change-tracking elements mentioned in the Facebook post.<p>If anyone is curious, I did a webinar with PagerDuty on this recently.</p>
]]></description><pubDate>Fri, 23 Aug 2024 04:48:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=41326250</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=41326250</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41326250</guid></item><item><title><![CDATA[New comment by nyellin in "Ask HN: Who is hiring? (July 2024)"]]></title><description><![CDATA[
<p>Thank you!</p>
]]></description><pubDate>Sun, 07 Jul 2024 08:54:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=40896155</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=40896155</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40896155</guid></item><item><title><![CDATA[New comment by nyellin in "Ask HN: Who is hiring? (July 2024)"]]></title><description><![CDATA[
<p>Robusta.dev | REMOTE (EUROPE) or ONSITE (ISRAEL) | Staff Software Engineer, Backend Team Lead<p>We investigate cloud alerts with LLMs - see <a href="http://github.com/robusta-dev/holmesgpt/">http://github.com/robusta-dev/holmesgpt/</a><p>Email natan at our domain</p>
]]></description><pubDate>Tue, 02 Jul 2024 12:20:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=40855909</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=40855909</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40855909</guid></item><item><title><![CDATA[New comment by nyellin in "Claude 3.5 Sonnet"]]></title><description><![CDATA[
<p>aider is pretty good - <a href="https://github.com/paul-gauthier/aider">https://github.com/paul-gauthier/aider</a></p>
]]></description><pubDate>Thu, 27 Jun 2024 04:00:32 +0000</pubDate><link>https://news.ycombinator.com/item?id=40807063</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=40807063</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40807063</guid></item><item><title><![CDATA[New comment by nyellin in "Claude 3.5 Sonnet"]]></title><description><![CDATA[
<p>Interesting. We're finding that it's more verbose than GPT-4o (and often ignores instructions to give output in a specific format), but does better when faced with ambiguity and potentially confusing situations.<p>Here is an example using <a href="http://github.com/robusta-dev/holmesgpt/">http://github.com/robusta-dev/holmesgpt/</a> where we ask it to troubleshoot a payment processing workload in a Kubernetes cluster, with the GPT-4o output underneath:<p><pre><code>  holmes ask --model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0" "what is wrong with the payment processor?"
  AI: To investigate what's wrong with the payment processor, we'll need to gather more information about it. First, let's try to find the relevant Kubernetes resources related to the payment processor.

  Running `kubectl get -A --show-labels pod | grep payment`

  Thank you for your question about the payment processor. initial search, I've found two pods related to payment processing. Let's investigate the one that seems to be having issues:

  1. In the default namespace, there's a pod named "payment-processing-worker-5bb5bd75cd-x4g99" which is in a CrashLoopBackOff state. This indicates that the pod is repeatedly crashing and restarting.

  2. In the sock-shop namespace, there's a pod named "payment-ff86cd6f8-rs47b" which appears to be running normally.

  Let's focus on the problematic pod in the default namespace. To get more information about what's wrong, we'll need to look at its logs and description.

  Running `kubectl describe pod payment-processing-worker-5bb5bd75cd-x4g99 -n default`

  AI: Based on the information gathered, it appears that there is an issue with the payment processor in the Kubernetes cluster. Let me explain what's wrong:

  1 The payment processing worker pod is in a CrashLoopBackOff state:
    • Pod name: payment-processing-worker-5bb5bd75cd-x4g99
    • Namespace: default
    • Status: CrashLoopBackOff
  2 The pod is continuously restarting and failing:
    • Restart Count: 242
    • The container is in a "Waiting" state with the reason "CrashLoopBackOff"
  3 The root cause of the issue appears to be related to an environment variable:
    • The container is running a bash script that checks for the presence of the DEPLOY_ENV environment variable.
    • If DEPLOY_ENV is not set, the container exits immediately after printing an error message.
    • The command being run is:

       if [[ -z "${DEPLOY_ENV}" ]]; then echo Environment variable DEPLOY_ENV is undefined ; else while true; do echo hello; sleep 10;done; fi

  4 The environment section in the pod description shows:

    Environment: <none>

   This indicates that no environment variables are set for the container.
  5 The container is exiting with a status of "Completed" and an exit code of 0, which suggests that it's running the script, finding that DEPLOY_E

  To fix this issue:

  1 The DEPLOY_ENV environment variable needs to be set for the payment-processing-worker deployment.
  2 You should update the deployment configuration to include this environment variable.
  3 After setting the environment variable, the pod should stop crashing and enter a running state where it prints "hello" every 10 seconds.

  To resolve this, you'll need to update the deployment configuration for the payment-processing-worker. You can do this by editing the deployment and adding the DEPLOY_ENV environment variable with an appropriate value.

  Would you like me to show you how to update the deployment with the required environment variable?

</code></pre>
Note that it explicitly picks up on the ambiguity in the question and notes that there are 2 payment related workloads and one is healthy.<p>GPT-4o:<p><pre><code>  $ holmes ask "what is wrong with the payment processor?"

  Running `kubectl get -A --show-labels pod | grep payment-processor`                     
  Running `kubectl get -A --show-labels deployment | grep payment-processor`                    
  Running `kubectl get -A --show-labels service | grep payment-processor`              
  Running `kubectl get -A --show-labels pod | grep payment`
  Running `kubectl describe pod payment-processing-worker-5bb5bd75cd-x4g99 -n default`
  Running `kubectl logs payment-processing-worker-5bb5bd75cd-x4g99 -n default --previous`
  
  AI: payment-processing-worker-5bb5bd75cd-x4g99 in namespace default crashes due to missing DEPLOY_ENV environment variable.

  Fix: Define DEPLOY_ENV in the deployment spec.

  Example patch:

  env:
   - name: DEPLOY_ENV
     value: "production"</code></pre></p>
]]></description><pubDate>Thu, 27 Jun 2024 03:36:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=40806942</link><dc:creator>nyellin</dc:creator><comments>https://news.ycombinator.com/item?id=40806942</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40806942</guid></item></channel></rss>