<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ofermend</title><link>https://news.ycombinator.com/user?id=ofermend</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 05 Apr 2026 23:56:02 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ofermend" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Ask HN: How do you handle PR density (and slop) in open source]]></title><description><![CDATA[
<p>Given that coding agents like Claude Code or Codex are now quite good, there's of course a massive increase in PRs submitted to open source projects. Not all of them are great; some are true AI slop.<p>See, for example, Daniel Stenberg on the topic as it relates to curl:
https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/ and https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-slops/<p>I'm curious how much of a problem this is for other open source projects.<p>Committers: how much of this pain are you seeing, and are you using any AI tools to mitigate or address it?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47492078">https://news.ycombinator.com/item?id=47492078</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 23 Mar 2026 16:56:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47492078</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=47492078</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47492078</guid></item><item><title><![CDATA[New comment by ofermend in "Gemini 3 Flash: Frontier intelligence built for speed"]]></title><description><![CDATA[
<p>Gemini-3-Flash is now on the Vectara hallucination leaderboard, rated at a 13.5% grounded hallucination rate.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Thu, 18 Dec 2025 16:18:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46314655</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46314655</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46314655</guid></item><item><title><![CDATA[New comment by ofermend in "Nvidia Nemotron 3 Family of Models"]]></title><description><![CDATA[
<p>We just evaluated Nemotron-3 for Vectara's hallucination leaderboard.<p>It scores a 9.6% hallucination rate, similar to qwen3-next-80b-a3b-thinking (9.3%), but of course it is a much smaller model.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Wed, 17 Dec 2025 01:20:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46297084</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46297084</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46297084</guid></item><item><title><![CDATA[New comment by ofermend in "GPT-5.2"]]></title><description><![CDATA[
<p>GPT-5.2 has just been added to the Vectara Hallucination Leaderboard. It's definitely an improvement over GPT-5.1 - congrats to the team.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Fri, 12 Dec 2025 05:25:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46241136</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46241136</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46241136</guid></item><item><title><![CDATA[New comment by ofermend in "Claude Opus 4.5"]]></title><description><![CDATA[
<p>Can't wait to try Opus 4.5.<p>We just evaluated it for Vectara's grounded hallucination leaderboard: it scores a 10.9% hallucination rate, better than Gemini-3, GPT-5.1-high, or Grok-4.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Tue, 25 Nov 2025 01:08:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46041274</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46041274</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46041274</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>If you have built AI agents in the last 6-12 months, you know they fail a lot.<p>I built this repository as a community-curated list of failure modes, mitigation techniques, and other resources, so that we can all learn from each other and build better agents.<p>Contributions encouraged!</p>
]]></description><pubDate>Wed, 03 Sep 2025 02:48:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=45111787</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=45111787</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45111787</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>Enterprise Deep Research is like "consumer" deep research, just pointed at your private data, and I think it may become the "killer app" of agentic AI for business.<p>Lots of valuable use cases: compliance monitoring, sales enablement, onboarding, legal, and many others.<p>What use cases would you use this for?</p>
]]></description><pubDate>Thu, 17 Jul 2025 14:42:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44593970</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44593970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44593970</guid></item><item><title><![CDATA[New comment by ofermend in "About AI Evals"]]></title><description><![CDATA[
<p>One of the biggest challenges in RAG evaluation is the assumption that you can somehow generate a "source of truth" - specifically, a set of "golden answers" (or golden chunks/documents).
In practice that is extremely difficult and does not scale.
Open-RAG-Eval is a new open source project that aims to address that via reference-free evaluation such as UMBRELA and AutoNuggetizer scores.<p>Repo: <a href="https://github.com/vectara/open-rag-eval">https://github.com/vectara/open-rag-eval</a>
and a nice UI to use it with: openevaluation.ai<p>Would love to hear your feedback after you try it out, and what you'd like to see on the roadmap.</p>
]]></description><pubDate>Sat, 05 Jul 2025 18:26:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44474563</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44474563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44474563</guid></item><item><title><![CDATA[New comment by ofermend in "Trust in AI"]]></title><description><![CDATA[
<p>Well, we expect AI to become AGI sometime in the future. Some say it's already here; others say it's 5 years or 50 years away.
So imagine, for the sake of argument, that AGI is here already, really is superintelligent, and is able to act with agency. How should we treat "it"?
Over history, humans and societies created mechanisms to overcome distrust, and our ability to collaborate is what helped us thrive. Should we think about our upcoming "relationship" with AI from that perspective as well?</p>
]]></description><pubDate>Thu, 03 Jul 2025 16:32:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44456757</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44456757</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44456757</guid></item><item><title><![CDATA[Trust in AI]]></title><description><![CDATA[
<p>We often debate "can humans trust AI?"
An equally important question: can AI trust humans?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44455654">https://news.ycombinator.com/item?id=44455654</a></p>
<p>Points: 1</p>
<p># Comments: 3</p>
]]></description><pubDate>Thu, 03 Jul 2025 14:51:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44455654</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44455654</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44455654</guid></item><item><title><![CDATA[Shadow AI]]></title><description><![CDATA[
<p>Shadow AI = Shadow IT 2.0<p>Large companies implementing generative AI are seeing the re-emergence of all the issues and headaches we remember well from "shadow IT".<p>This time, they apply to RAG, LLMs, and agents.<p>Curious how people are addressing this.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44294500">https://news.ycombinator.com/item?id=44294500</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 17 Jun 2025 00:06:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=44294500</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44294500</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44294500</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>RAG evaluation is difficult, primarily because it's hard to come up with "golden answers" (or golden chunks).<p>We built Open-RAG-Eval to solve this: a RAG evaluation toolkit that requires only the questions, yet provides strong metrics for retrieval, generation, hallucination, and citations for any RAG setup.<p>This was done in collaboration with Jimmy Lin and his students at UWaterloo.<p>It has connectors to LangChain, LlamaIndex, and Vectara, and we hope others will contribute connectors for other RAG systems.<p>Repo: <a href="https://github.com/vectara/open-rag-eval">https://github.com/vectara/open-rag-eval</a><p>UI for reviewing eval results: <a href="https://openevaluation.ai/" rel="nofollow">https://openevaluation.ai/</a><p>Papers: <a href="https://arxiv.org/pdf/2406.06519" rel="nofollow">https://arxiv.org/pdf/2406.06519</a> and <a href="https://arxiv.org/abs/2504.15068" rel="nofollow">https://arxiv.org/abs/2504.15068</a></p>
]]></description><pubDate>Fri, 13 Jun 2025 04:54:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44265791</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44265791</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44265791</guid></item><item><title><![CDATA[New comment by ofermend in "The Llama 4 herd"]]></title><description><![CDATA[
<p>A great day for open source, and so glad to see Llama 4 out.
However, I'm a bit disappointed that the hallucination rates of Llama 4 are not as low as I would have liked (TL;DR: slightly higher than Llama 3).<p>Check the numbers on the hallucination leaderboard:
<a href="https://github.com/vectara/hallucination-leaderboard">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Tue, 08 Apr 2025 01:09:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43617440</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=43617440</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43617440</guid></item><item><title><![CDATA[New comment by ofermend in "Gemini 2.5"]]></title><description><![CDATA[
<p>This model is quite impressive. It's not just useful for math and research thanks to its strong reasoning; it also maintained a very low hallucination rate of 1.1% on the Vectara Hallucination Leaderboard:
<a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Wed, 26 Mar 2025 03:16:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=43478533</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=43478533</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43478533</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>It is common these days to see multiple teams in large companies developing isolated RAG applications.
This is similar to the "shadow IT" problem of the early cloud era, which caused big headaches for IT teams.<p>I work at Vectara, and we see this all the time.
I'm wondering how others are experiencing this.</p>
]]></description><pubDate>Tue, 25 Mar 2025 13:53:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43471340</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=43471340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43471340</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>DeepSeek-R1 is an amazing reasoning LLM, but it seems to hallucinate more than we might expect.</p>
]]></description><pubDate>Tue, 28 Jan 2025 16:06:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=42853901</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=42853901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42853901</guid></item><item><title><![CDATA[New comment by ofermend in "Gemini 2.0: our new AI model for the agentic era"]]></title><description><![CDATA[
<p>Gemini-2.0-Flash does extremely well on the Hallucination Evaluation Leaderboard, at a 1.3% hallucination rate:
<a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Wed, 11 Dec 2024 23:50:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=42394456</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=42394456</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42394456</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>We've done a study (see link) showing that - contrary to common belief - semantic chunking is not always the best approach.<p>Curious to hear from the YC community: has anyone else done systematic testing, and if so, what did you find?</p>
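For context, the non-semantic baseline such comparisons are usually run against is plain fixed-size chunking with overlap. A minimal sketch (illustrative only, not the study's actual code; the function name and parameters are hypothetical):

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into chunks of `chunk_size` words, each overlapping
    the previous chunk by `overlap` words. No semantics involved."""
    assert 0 <= overlap < chunk_size
    words = text.split()
    chunks = []
    i = 0
    while i < len(words):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # final chunk reached the end of the text
        i += chunk_size - overlap
    return chunks
```

The point of the study is that a baseline this dumb is a surprisingly strong competitor to embedding-based boundary detection in some settings.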
]]></description><pubDate>Wed, 27 Nov 2024 20:20:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=42259340</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=42259340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42259340</guid></item><item><title><![CDATA[New comment by ofermend in "IBM Granite 3.0: open enterprise models"]]></title><description><![CDATA[
<p>Check out Granite 3.0 on the hallucination leaderboard: 
<a href="https://github.com/vectara/hallucination-leaderboard">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Thu, 24 Oct 2024 15:48:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=41936721</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=41936721</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41936721</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>We recently launched UDF reranking as part of the RAG stack, and we think it enables a lot of interesting use cases beyond simple relevance. For example, it supports ranking by distance (geo-location), by recency, and more.<p>I wanted to ask the HN community for advice: what are some real use cases you have that could benefit from UDF reranking in RAG?</p>
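To make "reranking by recency" concrete, here is a minimal generic sketch (this is not Vectara's actual UDF API; the function shape, field names, and parameters are all hypothetical) that blends a relevance score with an exponential time decay:

```python
import math
import time

def recency_rerank(results, half_life_days=30.0, weight=0.5, now=None):
    """Reorder search results by a blend of relevance and freshness.

    `results` is a list of dicts with `score` (relevance in [0, 1]) and
    `timestamp` (Unix seconds). `weight` controls how much recency counts
    relative to relevance; a result's recency contribution halves every
    `half_life_days` days.
    """
    now = now if now is not None else time.time()
    reranked = []
    for r in results:
        age_days = max(0.0, (now - r["timestamp"]) / 86400.0)
        recency = math.exp(-math.log(2) * age_days / half_life_days)
        combined = (1 - weight) * r["score"] + weight * recency
        reranked.append({**r, "combined_score": combined})
    return sorted(reranked, key=lambda r: r["combined_score"], reverse=True)
```

The half-life form makes the decay easy to reason about, and the same blend shape works for geo-distance by swapping the age term for a distance term.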
]]></description><pubDate>Wed, 23 Oct 2024 17:59:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=41927617</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=41927617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41927617</guid></item></channel></rss>