<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ofermend</title><link>https://news.ycombinator.com/user?id=ofermend</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 05 Apr 2026 23:56:02 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ofermend" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Ask HN: How do you handle PR density (and slop) in open source]]></title><description><![CDATA[
<p>Given that coding agents like Claude Code or Codex are now quite good, there's of course a massive increase in PRs submitted to open source projects. Not all of them are great; some are true AI slop.<p>See, for example, Daniel Stenberg on the topic as it relates to curl:
https://daniel.haxx.se/blog/2026/01/26/the-end-of-the-curl-bug-bounty/ and https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-slops/<p>I'm curious how much of a problem this is for other open source projects.<p>Committers: how much of this pain are you seeing, and are you using any AI tools to mitigate or address it?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47492078">https://news.ycombinator.com/item?id=47492078</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 23 Mar 2026 16:56:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=47492078</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=47492078</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47492078</guid></item><item><title><![CDATA[New comment by ofermend in "Gemini 3 Flash: Frontier intelligence built for speed"]]></title><description><![CDATA[
<p>Gemini-3-Flash is now on the Vectara hallucination leaderboard, rated at a 13.5% grounded hallucination rate.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Thu, 18 Dec 2025 16:18:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=46314655</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46314655</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46314655</guid></item><item><title><![CDATA[New comment by ofermend in "Nvidia Nemotron 3 Family of Models"]]></title><description><![CDATA[
<p>We just evaluated Nemotron-3 for Vectara's hallucination leaderboard.<p>It scores a 9.6% hallucination rate, similar to qwen3-next-80b-a3b-thinking (9.3%), but of course it is a much smaller model.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Wed, 17 Dec 2025 01:20:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=46297084</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46297084</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46297084</guid></item><item><title><![CDATA[New comment by ofermend in "GPT-5.2"]]></title><description><![CDATA[
<p>GPT-5.2 has just been added to the Vectara Hallucination Leaderboard. It's definitely an improvement over GPT-5.1 - congrats to the team.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Fri, 12 Dec 2025 05:25:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=46241136</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46241136</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46241136</guid></item><item><title><![CDATA[New comment by ofermend in "Claude Opus 4.5"]]></title><description><![CDATA[
<p>Can't wait to try Opus 4.5.<p>We just evaluated it for Vectara's grounded hallucination leaderboard: it scores a 10.9% hallucination rate, better than Gemini-3, GPT-5.1-high, or Grok-4.<p><a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Tue, 25 Nov 2025 01:08:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=46041274</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=46041274</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46041274</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>If you have built AI agents in the last 6-12 months, you know they fail a lot.<p>I built this repository as a community-curated list of failure modes, mitigation techniques, and other resources, so that we can all learn from each other and build better agents.<p>Contributions encouraged!</p>
]]></description><pubDate>Wed, 03 Sep 2025 02:48:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=45111787</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=45111787</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45111787</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>Enterprise Deep Research is like "consumer" deep research, just pointed at your private data, and I think it may become the "killer app" of agentic AI for business.<p>Lots of valuable use cases: compliance monitoring, sales enablement, onboarding, legal, and many others.<p>What use cases would you use this for?</p>
]]></description><pubDate>Thu, 17 Jul 2025 14:42:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44593970</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44593970</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44593970</guid></item><item><title><![CDATA[New comment by ofermend in "About AI Evals"]]></title><description><![CDATA[
<p>One of the biggest challenges in RAG evaluation is the assumption that you can somehow generate a "source of truth" - specifically, a set of "golden answers" (or golden chunks/documents).
In practice that is extremely difficult and does not scale.
Open-RAG-Eval is a new open source project that aims to address that via reference-free evaluation such as UMBRELA and AutoNuggetizer scores.<p>Repo: <a href="https://github.com/vectara/open-rag-eval">https://github.com/vectara/open-rag-eval</a>
and a nice UI to use it with: openevaluation.ai<p>Would love to hear your feedback after you try it out, and what you'd like to see on the roadmap.</p>
]]></description><pubDate>Sat, 05 Jul 2025 18:26:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44474563</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44474563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44474563</guid></item><item><title><![CDATA[New comment by ofermend in "Trust in AI"]]></title><description><![CDATA[
<p>Well, we expect AI to become AGI sometime in the future. Some say it's already here; others say it's 5 years or 50 years away.
So imagine, for the sake of argument, that AGI is here already, really is superintelligent, and is able to act with agency. How should we treat "it"?
Over history, humans and societies created mechanisms to overcome distrust, and our ability to collaborate is what helped us thrive. Should we think about our upcoming "relationship" with AI from that perspective as well?</p>
]]></description><pubDate>Thu, 03 Jul 2025 16:32:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44456757</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44456757</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44456757</guid></item><item><title><![CDATA[Trust in AI]]></title><description><![CDATA[
<p>We often debate "can humans trust AI?"
An equally important question: can AI trust humans?</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44455654">https://news.ycombinator.com/item?id=44455654</a></p>
<p>Points: 1</p>
<p># Comments: 3</p>
]]></description><pubDate>Thu, 03 Jul 2025 14:51:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=44455654</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44455654</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44455654</guid></item><item><title><![CDATA[Shadow AI]]></title><description><![CDATA[
<p>Shadow AI = Shadow IT 2.0<p>Large companies implementing generative AI are seeing the re-emergence of all the issues and headaches we remember well from "shadow IT".<p>This time, they apply to RAG, LLMs, and agents.<p>Curious how people are addressing this.</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44294500">https://news.ycombinator.com/item?id=44294500</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 17 Jun 2025 00:06:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=44294500</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44294500</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44294500</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>RAG evaluation is difficult, primarily because it's hard to come up with "golden answers" (or golden chunks).<p>We built Open-RAG-Eval to solve this: a RAG evaluation toolkit that requires only the questions, yet provides strong metrics for retrieval, generation, hallucination, and citations for any RAG setup.<p>This was done in collaboration with Jimmy Lin and his students at UWaterloo.<p>It has connectors to LangChain, LlamaIndex, and Vectara, and we hope others will contribute connectors for other RAG systems.<p>Repo: <a href="https://github.com/vectara/open-rag-eval">https://github.com/vectara/open-rag-eval</a><p>UI for reviewing eval results: <a href="https://openevaluation.ai/" rel="nofollow">https://openevaluation.ai/</a><p>Papers: <a href="https://arxiv.org/pdf/2406.06519" rel="nofollow">https://arxiv.org/pdf/2406.06519</a> and <a href="https://arxiv.org/abs/2504.15068" rel="nofollow">https://arxiv.org/abs/2504.15068</a></p>
]]></description><pubDate>Fri, 13 Jun 2025 04:54:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=44265791</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=44265791</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44265791</guid></item><item><title><![CDATA[New comment by ofermend in "The Llama 4 herd"]]></title><description><![CDATA[
<p>A great day for open source, and so glad to see Llama 4 out.
However, I'm a bit disappointed that the hallucination rates of Llama 4 are not as low as I would have liked (TL;DR: slightly higher than Llama 3).<p>Check the numbers on the hallucination leaderboard:
<a href="https://github.com/vectara/hallucination-leaderboard">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Tue, 08 Apr 2025 01:09:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43617440</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=43617440</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43617440</guid></item><item><title><![CDATA[New comment by ofermend in "Gemini 2.5"]]></title><description><![CDATA[
<p>This model is quite impressive. It's not just useful for math and research thanks to its strong reasoning; it also maintained a very low hallucination rate of 1.1% on the Vectara Hallucination Leaderboard:
<a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Wed, 26 Mar 2025 03:16:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=43478533</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=43478533</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43478533</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>It is common these days to see multiple teams in large companies developing isolated RAG applications.
This is similar to the "shadow IT" problem of the early cloud era, which caused big headaches for IT teams.<p>I work at Vectara, and we see this all the time.
I'm wondering how others are experiencing this.</p>
]]></description><pubDate>Tue, 25 Mar 2025 13:53:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=43471340</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=43471340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43471340</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>DeepSeek-R1 is an amazing reasoning LLM, but it seems to hallucinate more than we might expect.</p>
]]></description><pubDate>Tue, 28 Jan 2025 16:06:03 +0000</pubDate><link>https://news.ycombinator.com/item?id=42853901</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=42853901</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42853901</guid></item><item><title><![CDATA[New comment by ofermend in "Gemini 2.0: our new AI model for the agentic era"]]></title><description><![CDATA[
<p>Gemini-2.0-Flash does extremely well on the Hallucination Evaluation Leaderboard, at a 1.3% hallucination rate:
<a href="https://github.com/vectara/hallucination-leaderboard" rel="nofollow">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Wed, 11 Dec 2024 23:50:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=42394456</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=42394456</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42394456</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>We've done a study (see link) showing that - contrary to common belief - semantic chunking is not always the best approach.<p>Curious to hear from the YC community: has anyone else done systematic testing, and if so, what did you find?</p>
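For context, the non-semantic baseline such comparisons are usually run against is plain fixed-size chunking with overlap. A minimal sketch (illustrative only, not the study's actual code; the function name and parameters are hypothetical):

```python
def fixed_size_chunks(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split `text` into chunks of `chunk_size` words, each overlapping
    the previous chunk by `overlap` words. No semantics involved."""
    assert 0 <= overlap < chunk_size
    words = text.split()
    chunks = []
    i = 0
    while i < len(words):
        chunks.append(" ".join(words[i:i + chunk_size]))
        if i + chunk_size >= len(words):
            break  # final chunk reached the end of the text
        i += chunk_size - overlap
    return chunks
```

The point of the study is that a baseline this dumb is a surprisingly strong competitor to embedding-based boundary detection in some settings.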
]]></description><pubDate>Wed, 27 Nov 2024 20:20:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=42259340</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=42259340</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42259340</guid></item><item><title><![CDATA[New comment by ofermend in "IBM Granite 3.0: open enterprise models"]]></title><description><![CDATA[
<p>Check out Granite 3.0 on the hallucination leaderboard: 
<a href="https://github.com/vectara/hallucination-leaderboard">https://github.com/vectara/hallucination-leaderboard</a></p>
]]></description><pubDate>Thu, 24 Oct 2024 15:48:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=41936721</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=41936721</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41936721</guid></item><item><title><![CDATA[New comment by ofermend in "[dead]"]]></title><description><![CDATA[
<p>We recently launched UDF reranking as part of the RAG stack, and we think it enables a lot of interesting use cases beyond simple relevance. For example, it supports ranking by distance (geo-location), by recency, and more.<p>I wanted to ask the HN community for advice: what are some real use cases you have that could benefit from UDF reranking in RAG?</p>
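To make "reranking by recency" concrete, here is a minimal generic sketch (this is not Vectara's actual UDF API; the function shape, field names, and parameters are all hypothetical) that blends a relevance score with an exponential time decay:

```python
import math
import time

def recency_rerank(results, half_life_days=30.0, weight=0.5, now=None):
    """Reorder search results by a blend of relevance and freshness.

    `results` is a list of dicts with `score` (relevance in [0, 1]) and
    `timestamp` (Unix seconds). `weight` controls how much recency counts
    relative to relevance; a result's recency contribution halves every
    `half_life_days` days.
    """
    now = now if now is not None else time.time()
    reranked = []
    for r in results:
        age_days = max(0.0, (now - r["timestamp"]) / 86400.0)
        recency = math.exp(-math.log(2) * age_days / half_life_days)
        combined = (1 - weight) * r["score"] + weight * recency
        reranked.append({**r, "combined_score": combined})
    return sorted(reranked, key=lambda r: r["combined_score"], reverse=True)
```

The half-life form makes the decay easy to reason about, and the same blend shape works for geo-distance by swapping the age term for a distance term.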
]]></description><pubDate>Wed, 23 Oct 2024 17:59:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=41927617</link><dc:creator>ofermend</dc:creator><comments>https://news.ycombinator.com/item?id=41927617</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41927617</guid></item></channel></rss>