<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mtricot</title><link>https://news.ycombinator.com/user?id=mtricot</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 06 May 2026 12:57:25 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mtricot" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mtricot in "Show HN: Airbyte Agents – context for agents across multiple data sources"]]></title><description><![CDATA[
<p>Great to see you here!</p>
]]></description><pubDate>Wed, 06 May 2026 02:53:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=48031576</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=48031576</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48031576</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Airbyte Agents – context for agents across multiple data sources"]]></title><description><![CDATA[
<p>Let me see what's up and fix that!</p>
]]></description><pubDate>Wed, 06 May 2026 00:16:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=48030542</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=48030542</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48030542</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Airbyte Agents – context for agents across multiple data sources"]]></title><description><![CDATA[
<p>Just want to call out a couple of nuances in our methodology. In general, we tried our best to do apples-to-apples comparisons where we could, and gave ourselves a discount where we couldn’t. Unsurprisingly, it’s a challenge to find MCPs for various vendors (which is another reason we are trying to solve this). Here’s a video walkthrough of the benchmark harness: <a href="https://www.loom.com/share/9d96c8c64c1a4b7fad0356774fc54acc" rel="nofollow">https://www.loom.com/share/9d96c8c64c1a4b7fad0356774fc54acc</a><p>Where the comparison wasn't valid or wasn't apples-to-apples:<p>Gong and Zendesk: no official native MCP exists, so we used the most popular community implementations we could find. We were only able to benchmark Gong Search, as the Gong MCP does not have a Get tool call.<p>While our Search testing yielded the same number of records on either path, vendor-specific search implementations mean the results aren’t identical. Contents are similar in general, so the ratios remain directionally correct.<p>The general test set:<p>2 scenarios (Retrieval and Search) across 4 connectors isn’t a huge test set. While we hope to extend this over time, we’ve made the harness public so anyone can contribute in the meantime. Let us know if you find any MCP with better results!<p>Where the vendor MCP wins or ties:<p>Salesforce showed the smallest win at 16%. This is primarily because Salesforce, unlike many vendors, provides great search support out of the box with SOQL.<p>We see identical records for Get. As noted, Search returns different sets with identical counts. Airbyte uses fewer tokens because the Salesforce records contain mandatory metadata (type and url).<p>Where the vendor MCP is costly to context:<p>Zendesk is a great example of this. The extreme gap is because the Zendesk MCP (reminder: a community alternative) returns the entire API response in search results.
This averages to 9KB per record against our production Zendesk account!<p>Airbyte’s implementation provides filtering, which allows agents to retrieve the minimal data needed to achieve the outcome, explaining the drastic gap.</p>
]]></description><pubDate>Tue, 05 May 2026 15:19:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=48023759</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=48023759</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48023759</guid></item><item><title><![CDATA[Show HN: Airbyte Agents – context for agents across multiple data sources]]></title><description><![CDATA[
<p>I’m Michel, co-founder and CEO of Airbyte (<a href="https://airbyte.com/" rel="nofollow">https://airbyte.com/</a>). We’ve spent the last six years building data connectors. Today we're launching Airbyte Agents (<a href="https://docs.airbyte.com/ai-agents/" rel="nofollow">https://docs.airbyte.com/ai-agents/</a>), a unified data layer for agents to discover information and take action across operational systems.<p>Here’s a quick walkthrough: <a href="https://www.youtube.com/watch?v=ZosDytyf1fg" rel="nofollow">https://www.youtube.com/watch?v=ZosDytyf1fg</a><p>As agents move into real workflows, they need access to more tools (e.g. Slack, Salesforce, Linear). That means a ton of API plumbing: authentication, pagination, filters, schema handling, and matching entities across systems.<p>Most MCPs don’t fix this. They’re thin wrappers over APIs, so agents inherit their weak primitives and still get it wrong most of the time, especially when working across tools.<p>An even deeper issue is that APIs assume you already know what to query (think endpoints, Object IDs, fields), whereas agents usually start one step earlier: they first need to discover what matters before they can even start reasoning.<p>So we built Airbyte Agents to be a context layer between your agents and all of your data. The core of this is something we call Context Store: a data index optimized for agentic search, populated by our replication connectors. All that work on data connectors over the last six years comes in handy here!<p>This gives agents a structured way to discover data, while still allowing them to read and write directly to the upstream system when needed.<p>What got us working on this was an insane trace from an agent we were migrating to our new SDK. It was supposed to answer "which customers are at risk of leaving this quarter?" The trace had 47 steps. Most were API calls.
The agent first had to find a bunch of accounts, then map them to the right customers, then look for tickets, bla bla... and when the agent finally responded, the answer sounded ok but was wrong. Not only that, it was excruciatingly slow. So we had to do something about it.<p>The question behind that 47-step trace is one example of where Airbyte Agents does particularly well. Other examples:
- “Show me all enterprise deals closing this month with open support tickets.”
- “Find every support ticket that doesn’t have a GitHub issue opened.”<p>Some of these might sound simple, but the quality of the answer changes dramatically when the agent doesn’t have to assemble all that context at runtime.<p>Once we had an early version of the product, I spent a weekend building a benchmark harness to see if it worked. Also for fun, I like writing benchmarks :). I compared calling the Airbyte Agent MCP vs calling a bunch of vendor MCPs directly. I tested retrieval and search.<p>For the sake of simplicity, I used token consumption as the unit of measure. I think that’s a good proxy for how well agents are working. A failing agent (like the one that took 47 steps) will churn through lots of tokens while getting nowhere, while a successful one will get straight to the point.<p>Here's what I found when measuring: for Gong, it used up to 80% fewer tokens than the vendor-specific MCP, for Zendesk up to 90% fewer, for Linear up to 75% fewer, and for Salesforce up to 16% fewer (Salesforce’s own SOQL does a good job here).<p>Of course there is the usual obvious bias: we are the builders of what we are benchmarking. So we made the test harness public: <a href="https://github.com/airbytehq/airbyte-agents-benchmarks" rel="nofollow">https://github.com/airbytehq/airbyte-agents-benchmarks</a>. Feel free to poke at it, and please tell us what you find if you do!<p>It's still early and some parts are rough, but we wanted to share this with the community asap. We'd love to hear from people building agents:
- Are you indexing data ahead of time, or letting the agent call APIs live?
- How are you matching entities across systems?<p>We would also love to hear any thoughts, comments, or ideas on how we could make this better, and whether there are obvious things we’re missing. For now, we’re excited to keep building!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48023496">https://news.ycombinator.com/item?id=48023496</a></p>
<p>Points: 123</p>
<p># Comments: 31</p>
]]></description><pubDate>Tue, 05 May 2026 15:03:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=48023496</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=48023496</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48023496</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Airbyte 1.0, Marketplace, AI Assist, GenAI Support and Enterprise GA"]]></title><description><![CDATA[
<p>Talk about going back down memory lane :) The initial name of the project was "conduit"...</p>
]]></description><pubDate>Tue, 24 Sep 2024 15:35:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=41637687</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=41637687</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41637687</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>correct</p>
]]></description><pubDate>Wed, 09 Aug 2023 15:25:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=37064136</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37064136</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37064136</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>Not at the moment but let me bring that to the team so we can brainstorm what it could look like.</p>
]]></description><pubDate>Tue, 08 Aug 2023 23:44:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056959</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056959</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056959</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>The tutorial describes one stack for building a specific app, but the stack is made of building blocks that you can replace with others if you need to.<p>- Airbyte has two self-hosted options: OSS & Enterprise<p>- LangChain: OSS<p>- OpenAI: you can host an OSS model if you want to<p>- Pinecone: there are OSS/self-hosted alternatives</p>
]]></description><pubDate>Tue, 08 Aug 2023 22:30:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056355</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056355</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056355</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>No good reason. Does "it made the post's title too long" work?</p>
]]></description><pubDate>Tue, 08 Aug 2023 22:27:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056322</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056322</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056322</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>Isn't it the dream? Today, a lot of the stack still needs to be built to enable what you're describing. That is actually what we are doing with that post: figuring out what foundations we need to build so that the UX for the end user is what you're describing. It will take some time to get there :)</p>
]]></description><pubDate>Tue, 08 Aug 2023 22:25:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056292</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056292</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056292</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>It depends.<p>Airbyte comes in 3 flavors: OSS, Cloud, Enterprise.<p>For OSS & Enterprise, data doesn't leave your infra since Airbyte is running in your infrastructure.
For Cloud, you would have to allowlist some IPs so that we can access your local db.</p>
]]></description><pubDate>Tue, 08 Aug 2023 22:20:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056256</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056256</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>For the purpose of the tutorial that we built, it really comes down to the type of data that you're using.<p>If you have data with PII:<p>One option would be to use Airbyte to bring the data into files or a local db rather than directly into the vector store, add an extra step that strips all PII from the data, and then configure Airbyte to move the clean files/records to the vector store.<p>The option that jmorgan mentioned is also relevant here: using a "self-hosted" model.</p>
]]></description><pubDate>Tue, 08 Aug 2023 22:17:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056215</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056215</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056215</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>On the roadmap! We want to get more clarity on how to fit the Embedding part in the ELT model. Once we figure it out we will add it to PG.</p>
]]></description><pubDate>Tue, 08 Aug 2023 22:11:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056143</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056143</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056143</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>Thanks! I agree with your point. There is a lot of tuning that needs to happen, including context-aware splitting and any other kind of transformation, before the unstructured data gets indexed. This is one of the big challenges of productionizing LLM apps with external data. So far we are using it internally, since the team has experience building these connectors, and it makes a great co-pilot.<p>The great thing about plugging this whole stack together is that we get all the refreshed data as more issues/connectors get created.</p>
]]></description><pubDate>Tue, 08 Aug 2023 22:07:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=37056102</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37056102</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37056102</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>I am sure we can build something around that. Going to take a look at it. Thanks for mentioning it.</p>
]]></description><pubDate>Tue, 08 Aug 2023 17:01:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=37052146</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37052146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37052146</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>Shouldn't have any limits here. Can you let us know how it goes?</p>
]]></description><pubDate>Tue, 08 Aug 2023 17:00:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=37052125</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37052125</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37052125</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>Great question :) We want to get to value as fast as possible. I am certain that at some point we will need to go deeper with those integrations, and they will likely need to be separate destinations. It will also depend on how they differentiate from each other; we will need more granularity in the configurations.</p>
]]></description><pubDate>Tue, 08 Aug 2023 16:59:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=37052114</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37052114</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37052114</guid></item><item><title><![CDATA[New comment by mtricot in "Show HN: Chat with your data using LangChain, Pinecone, and Airbyte"]]></title><description><![CDATA[
<p>Let me know how that works out for you and if you would add anything to this tutorial!</p>
]]></description><pubDate>Tue, 08 Aug 2023 15:45:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=37050767</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37050767</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37050767</guid></item><item><title><![CDATA[Show HN: Chat with your data using LangChain, Pinecone, and Airbyte]]></title><description><![CDATA[
<p>Hi HN,<p>A few of our team members at Airbyte (and Joe, who killed it!) recently played with building our own internal support chat bot, using Airbyte, LangChain, Pinecone and OpenAI, that would answer any questions we ask when developing a new connector on Airbyte.<p>As we prototyped it, we realized that it could be applied to many other use cases and sources of data, so... we created a tutorial that other community members can leverage [<a href="http://airbyte.com/tutorials/chat-with-your-data-using-openai-pinecone-airbyte-and-langchain">http://airbyte.com/tutorials/chat-with-your-data-using-opena...</a>] and a GitHub repo to run it [<a href="https://github.com/airbytehq/tutorial-connector-dev-bot">https://github.com/airbytehq/tutorial-connector-dev-bot</a>]<p>The tutorial shows:
- How to extract unstructured data from a variety of sources using Airbyte Open Source
- How to load data into a vector database (here Pinecone), preparing the data for LLM usage along the way
- How to integrate a vector database into ChatGPT to ask questions about your proprietary data<p>I hope some of it is useful, and would love your feedback!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=37050532">https://news.ycombinator.com/item?id=37050532</a></p>
<p>Points: 220</p>
<p># Comments: 59</p>
]]></description><pubDate>Tue, 08 Aug 2023 15:32:13 +0000</pubDate><link>https://airbyte.com/tutorials/chat-with-your-data-using-openai-pinecone-airbyte-and-langchain</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=37050532</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37050532</guid></item><item><title><![CDATA[New comment by mtricot in "Airbyte acquires Grouparoo to accelerate Data Movement"]]></title><description><![CDATA[
<p>Super thrilled to see two open source projects come together!</p>
]]></description><pubDate>Thu, 07 Apr 2022 20:17:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=30949828</link><dc:creator>mtricot</dc:creator><comments>https://news.ycombinator.com/item?id=30949828</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=30949828</guid></item></channel></rss>