<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: pkhodiyar</title><link>https://news.ycombinator.com/user?id=pkhodiyar</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 13 Jun 2026 12:38:49 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=pkhodiyar" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by pkhodiyar in "Microsoft builds MacBook Pro rival with NVIDIA-powered Surface Laptop Ultra"]]></title><description><![CDATA[
<p>Is this based on ARM or x64?</p>
]]></description><pubDate>Tue, 02 Jun 2026 13:06:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48369756</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=48369756</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48369756</guid></item><item><title><![CDATA[New comment by pkhodiyar in "Ask HN: Who wants to be hired? (June 2026)"]]></title><description><![CDATA[
<p>Not looking to be hired; I'm looking for funding for vaquill.ai and quilldraft.com. I'm a solo developer with a handful of paying users at $99, so it has PMF and the product is stable.</p>
]]></description><pubDate>Tue, 02 Jun 2026 06:07:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48366634</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=48366634</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48366634</guid></item><item><title><![CDATA[New comment by pkhodiyar in "macOS needs its grid back"]]></title><description><![CDATA[
<p>There is a project that makes macOS alt+tab look like the Windows grid (for anyone coming from there); it's called alt_tabs or something like that.</p>
]]></description><pubDate>Tue, 02 Jun 2026 04:59:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48366245</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=48366245</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48366245</guid></item><item><title><![CDATA[Agent Credential Brokers in 2026]]></title><description><![CDATA[
<p>Article URL: <a href="https://authsome.ai/blog/top-agent-proxy-tools-what-to-know">https://authsome.ai/blog/top-agent-proxy-tools-what-to-know</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48222665">https://news.ycombinator.com/item?id=48222665</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 21 May 2026 13:58:12 +0000</pubDate><link>https://authsome.ai/blog/top-agent-proxy-tools-what-to-know</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=48222665</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48222665</guid></item><item><title><![CDATA[Running AI agents without losing my keys]]></title><description><![CDATA[
<p>Article URL: <a href="https://zriyansh.medium.com/running-agents-without-losing-my-keys-a-month-with-authsome-039690fe5e6f">https://zriyansh.medium.com/running-agents-without-losing-my-keys-a-month-with-authsome-039690fe5e6f</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48177527">https://news.ycombinator.com/item?id=48177527</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 18 May 2026 10:18:31 +0000</pubDate><link>https://zriyansh.medium.com/running-agents-without-losing-my-keys-a-month-with-authsome-039690fe5e6f</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=48177527</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48177527</guid></item><item><title><![CDATA[New comment by pkhodiyar in "Authsome – open-source local auth proxy for AI agents"]]></title><description><![CDATA[
<p>Every agent I've built starts the same way: paste an API key into .env, export it, and hope it doesn't end up in a log or a subprocess env dump. Then the token expires and something quietly breaks. We've all been there.<p>So I wrote authsome. The bit I think is actually interesting is the run command:<p><pre><code>  authsome run -- python my_agent.py
</code></pre>
It launches the child behind a local auth proxy; the proxy intercepts outbound HTTPS and injects auth headers at request time. The child process never has the secret in its environment, so it can't leak through os.environ, ps -e, or anything that dumps a subprocess env, and the agent code doesn't change either.<p>The tokens are stored locally, encrypted, and refreshed before they expire. There are OAuth flows for interactive and headless use, plus a browser bridge for API-key providers. There is a CLI for pulling headers directly when you don't want the proxy.<p>The proxy only sees traffic that goes through it, so libraries that pin their own CA bundle slip past. Streaming uploads and long-lived connections probably also have edge cases I haven't hit. It's still alpha, v0.2.1.<p>I'm most interested in feedback on the proxy approach itself; that's the part I'm least sure about.<p><a href="https://github.com/manojbajaj95/authsome" rel="nofollow">https://github.com/manojbajaj95/authsome</a></p>
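<p>To make the "inject at request time" idea concrete, here is a minimal sketch. It is not authsome's actual code: TokenStore, Token, inject_auth, the 60-second refresh margin, and the example hostnames are all hypothetical names chosen for illustration, and the refresh step is a placeholder rather than a real OAuth call.</p>

```python
import time
from dataclasses import dataclass


@dataclass
class Token:
    value: str
    expires_at: float  # Unix timestamp


class TokenStore:
    """Hypothetical in-memory stand-in for an encrypted local token store."""

    def __init__(self, refresh_margin=60.0):
        self._tokens = {}
        self._refresh_margin = refresh_margin

    def put(self, host, token):
        self._tokens[host] = token

    def get(self, host, now=None):
        now = time.time() if now is None else now
        token = self._tokens.get(host)
        if token is None:
            return None
        # Refresh shortly before expiry so the caller never sees a stale token.
        if now + self._refresh_margin >= token.expires_at:
            token = self._refresh(host, now)
        return token

    def _refresh(self, host, now):
        # Placeholder: a real proxy would call the provider's token endpoint here.
        fresh = Token(value=f"refreshed-{host}", expires_at=now + 3600)
        self._tokens[host] = fresh
        return fresh


def inject_auth(host, headers, store, now=None):
    """Return a copy of the outbound headers with Authorization added for known hosts."""
    out = dict(headers)
    token = store.get(host, now)
    if token is not None:
        out["Authorization"] = f"Bearer {token.value}"
    return out
```

<p>The point of the design is that only the proxy process ever holds the Token values; the child only sees its own unmodified request headers.</p>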
]]></description><pubDate>Tue, 28 Apr 2026 15:52:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47936191</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=47936191</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47936191</guid></item><item><title><![CDATA[Authsome – open-source local auth proxy for AI agents]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/manojbajaj95/authsome">https://github.com/manojbajaj95/authsome</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47936190">https://news.ycombinator.com/item?id=47936190</a></p>
<p>Points: 7</p>
<p># Comments: 3</p>
]]></description><pubDate>Tue, 28 Apr 2026 15:52:37 +0000</pubDate><link>https://github.com/manojbajaj95/authsome</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=47936190</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47936190</guid></item><item><title><![CDATA[Show HN: API for 13M+ Indian court cases with citation graphs and vector search]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.vaquill.ai">https://www.vaquill.ai</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47765679">https://news.ycombinator.com/item?id=47765679</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 14 Apr 2026 13:52:55 +0000</pubDate><link>https://www.vaquill.ai</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=47765679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47765679</guid></item><item><title><![CDATA[New comment by pkhodiyar in "[dead]"]]></title><description><![CDATA[
<p>So I sat down one day thinking this sucks: there isn't any platform that solves this problem for lawyers whose work is not Supreme Court or High Court related, as most companies build for those courts (these lawyers are more like the middle kid who gets ignored).<p>So I built this. Let me know what you guys think.<p>This covers:
- ITAT (Income Tax Appellate Tribunal)
- CESTAT (Customs, Excise & Service Tax Appellate Tribunal)
- GST AAR (GST Authority for Advance Rulings)<p>- NCLT (National Company Law Tribunal)
- IBBI (Insolvency & Bankruptcy Board of India)
- DRT (Debt Recovery Tribunal)
- SAT (Securities Appellate Tribunal)
- CCI (Competition Commission of India)<p>- NGT (National Green Tribunal)
- APTEL (Appellate Tribunal for Electricity)<p>- TDSAT (Telecom Disputes Settlement & Appellate Tribunal)
- CAT (Central Administrative Tribunal)
- AFT (Armed Forces Tribunal)
- RERA (Real Estate Regulatory Authority)<p>Would love to pick your brains</p>
]]></description><pubDate>Wed, 18 Mar 2026 12:47:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47425112</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=47425112</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47425112</guid></item><item><title><![CDATA[Show HN: LegalTech – A curated list of tools and software]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/Vaquill-AI/awesome-legaltech">https://github.com/Vaquill-AI/awesome-legaltech</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47351809">https://news.ycombinator.com/item?id=47351809</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 12 Mar 2026 15:11:39 +0000</pubDate><link>https://github.com/Vaquill-AI/awesome-legaltech</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=47351809</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47351809</guid></item><item><title><![CDATA[New comment by pkhodiyar in "Ask HN: What Are You Working On? (December 2025)"]]></title><description><![CDATA[
<p>Working on <a href="https://socdefenders.ai" rel="nofollow">https://socdefenders.ai</a>, a Reddit + HN for cybersecurity.</p>
]]></description><pubDate>Sun, 14 Dec 2025 18:50:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=46265633</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=46265633</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46265633</guid></item><item><title><![CDATA[Show HN: Reddit and HN for Cybersecurity [Free]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.socdefenders.ai/">https://www.socdefenders.ai/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=46265631">https://news.ycombinator.com/item?id=46265631</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 14 Dec 2025 18:50:15 +0000</pubDate><link>https://www.socdefenders.ai/</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=46265631</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46265631</guid></item><item><title><![CDATA[New comment by pkhodiyar in "[dead]"]]></title><description><![CDATA[
<p>Quick TL;DR: we are doing a live 60-minute AMA on MCP with folks from Microsoft and Pinecone, plus Santiago and Alden (CEO of CustomGPT.ai). Sound interesting? Register.<p>The goal is to educate about MCP, answer questions, and cover use cases: RAG + MCP, IDEs + MCP, etc. We’ll have live demos, Pinecone folks talking about what they are up to, and much more fun!<p>If you got into MCP early, this should be worth your time.<p>Why might this interest you?<p>Model Context Protocol (MCP) is a low-level JSON-RPC protocol for passing structured context and tools to an LLM. Instead of gluing prompts together, you expose one JSON endpoint for a tool (and it takes care of tons of API endpoints for that tool).<p>MCP is just REST for LLMs! It really is that simple!<p>We plan to show a live demo of a working MCP server, preferably a hosted one, including setting up configs with Claude.<p>We will also answer any questions!<p>Featured Speakers:<p>1. Michael Kistler - Principal Program Manager at Microsoft<p>2. Arjun Patel - Senior Developer Advocate at Pinecone<p>3. Santiago (<a href="https://www.linkedin.com/in/svpino/" rel="nofollow">https://www.linkedin.com/in/svpino/</a>) - Computer scientist who teaches hard-core Machine Learning; will walk you through why we need MCP, before MCP vs. after MCP, architecture, primitives, and advantages.<p>4. Alden Do Rosario - will dissect the RAG + MCP pipeline we run in prod, with a live demo.<p>Format:
- 3×10 min tech talks (protocol, integration, case study)
- 10 min panel on lessons learned
- 20 min open Q&A - bring tough questions<p>When:
- Date: Sept 25, 2 PM ET<p>Registration (free, no spam): LINK <a href="http://customgpt.ai/mcp-ama-hn" rel="nofollow">http://customgpt.ai/mcp-ama-hn</a><p>Code samples and infra diagrams will be posted after the session. AMA during and after the call; hope to see HN folks there.</p>
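<p>For anyone who hasn't seen MCP on the wire, here is a small sketch of the JSON-RPC 2.0 envelope a client sends to invoke a tool. The "tools/call" method and the name/arguments params follow the MCP specification as I understand it; the tool name search_docs and its arguments are made up for illustration.</p>

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def tools_call_request(tool_name, arguments):
    """Build the JSON-RPC 2.0 request an MCP client sends to invoke a tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    # "search_docs" is a hypothetical tool exposed by some MCP server.
    request = tools_call_request("search_docs", {"query": "refund policy", "top_k": 3})
    print(json.dumps(request, indent=2))
```

<p>A client would typically send "tools/list" first to discover which tools exist, then issue calls like the one above over whichever transport the server supports.</p>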
]]></description><pubDate>Mon, 22 Sep 2025 21:37:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=45339828</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=45339828</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45339828</guid></item><item><title><![CDATA[New comment by pkhodiyar in "[dead]"]]></title><description><![CDATA[
<p>We've been using CustomGPT.ai's RAG API for client projects but kept rebuilding the same UI components. Today we're open-sourcing our complete implementation (think ChatGPT interface open sourced).<p>No vendor lock-in. No telemetry. No "premium" features.<p>We built this because we needed it. We're sharing it because you need it too.<p>What does it mean to you?
1. Free and Ready to use UI like ChatGPT with Voice
2. 100% Customizable for you to build on top of it
3. Dev community to fix bugs and feature requests
4. Why create RAG from scratch when you can use free templates?<p>Technical details:
1. Next.js 14 + TypeScript + Zustand for state
2. Proxy architecture keeps API keys server-side
3. Proper SSE streaming with cleanup and error boundaries
4. Voice: OpenAI Whisper STT + TTS (6 voices)
5. Three deployment modes: widget.js bundle, iframe, or standalone
6. PWA support with service worker
7. Dark mode + full mobile responsiveness.<p>Interesting challenges solved:
1. Concurrent message streams without memory leaks
2. Widget state isolation when multiple instances run on the same page, 100% customizable.
3. CORS handling for cross-domain embedding
4. Citation preview just like ChatGPT<p>Deployment options:
1. Vercel/Netlify (one-click)
2. Railway/Render
3. Docker
4. Google Apps Script (single file, 20k req/day free for select social RAG AI bots)<p>Also includes 9 social platform bots (Slack, Discord, Telegram, etc.) that connect to the same CustomGPT.ai backend.<p>Code: github.com/Poll-The-People/customgpt-starter-kit<p>Demo: starterkit.customgpt.ai (10-min free trial or BYO key)<p>MIT licensed. No telemetry. No premium tiers.<p>We built this for ourselves but figured others might find it useful. Feedback welcome.</p>
]]></description><pubDate>Thu, 04 Sep 2025 15:45:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=45128565</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=45128565</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45128565</guid></item><item><title><![CDATA[Awesome-RAG GitHub]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/Poll-The-People/awesome-rag">https://github.com/Poll-The-People/awesome-rag</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44525514">https://news.ycombinator.com/item?id=44525514</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 10 Jul 2025 20:56:27 +0000</pubDate><link>https://github.com/Poll-The-People/awesome-rag</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=44525514</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44525514</guid></item><item><title><![CDATA[New comment by pkhodiyar in "[dead]"]]></title><description><![CDATA[
<p>Quick TL;DR: we are doing a live 60-minute AMA on MCP with 3 industry experts. Sound interesting? Register.<p>The goal is to educate about MCP, answer questions, and cover use cases: RAG + MCP, IDEs + MCP, etc. We’ll have live demos, Pinecone folks talking about what they are up to, and much more fun!<p>If you got into MCP early, this should be worth your time.<p>Why might this interest you?<p>Model Context Protocol (MCP) is a low-level JSON-RPC protocol for passing structured context and tools to an LLM. Instead of gluing prompts together, you expose one JSON endpoint for a tool (and it takes care of tons of API endpoints for that tool).<p>MCP is just REST for LLMs! It really is that simple!<p>We plan to show a live demo of a working MCP server, preferably a hosted one, including setting up configs with Claude.<p>We will also answer any questions!<p>Featured Speakers:
1. Santiago (<a href="https://www.linkedin.com/in/svpino/" rel="nofollow">https://www.linkedin.com/in/svpino/</a>) - Computer scientist who teaches hard-core Machine Learning; will walk you through why we need MCP, before MCP vs. after MCP, architecture, primitives, and advantages.<p>2. Alden Do Rosario (CustomGPT.ai CEO) - will dissect the RAG + MCP pipeline we run in prod, with a live demo.<p>3. Roy Miara (<a href="https://www.linkedin.com/in/roy-miara-73776a56/" rel="nofollow">https://www.linkedin.com/in/roy-miara-73776a56/</a>), Director of Machine Learning at Pinecone, will talk about what Pinecone is up to with MCP.<p>Format:
- 3×10 min tech talks (protocol, integration, case study)
- 10 min panel on lessons learned
- 20 min open Q&A - bring tough questions<p>When: 
- Date: May 29, 01 PM ET | May 30 at 1:30 AM IST | Thu May 29 at 8:00 PM UTC
- Registration (free, no spam): 
LINK <a href="https://lu.ma/gr6eqznl" rel="nofollow">https://lu.ma/gr6eqznl</a><p>Code samples and infra diagrams will be posted after the session. AMA during and after the call; hope to see HN folks there.</p>
]]></description><pubDate>Thu, 22 May 2025 17:55:48 +0000</pubDate><link>https://news.ycombinator.com/item?id=44064741</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=44064741</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44064741</guid></item><item><title><![CDATA[101x Airbyte, 11x Estuary, Postgres to Iceberg]]></title><description><![CDATA[
<p>Hi HN, we've been developing OLake, an open-source connector specifically designed for replicating data from PostgreSQL into Apache Iceberg. We recently ran some detailed benchmarks comparing its performance and cost against several popular data movement tools: Fivetran, Debezium (using the memiiso setup), Estuary, and Airbyte.<p>We wanted to share the results, as they show OLake performing very competitively, often exceeding the speed of both open-source and commercial alternatives, while offering the cost advantages of a self-hosted open-source solution.<p>The benchmarks covered both full initial loads and Change Data Capture (CDC) on a large dataset (billions of rows for full load, tens of millions of changes for CDC) over a 24-hour window.<p>Link to the full Postgres benchmark: https://olake.io/docs/connectors/postgres/benchmarks<p>For full loads, OLake achieved throughput of around 46,262 rows/sec, processing over 4 billion rows in 24 hours.<p>This was essentially on par with Fivetran (46,395 RPS) and significantly faster than Debezium (14,839 RPS - 3.1x slower), Estuary (3,982 RPS - 11.6x slower on a smaller processed dataset), and Airbyte (457 RPS - 101x slower before it failed the long test).<p>The most striking results were in CDC performance.<p>For processing 50 million changes, OLake completed the task in 22.5 minutes at 36,982 rows/sec. Fivetran took 31 minutes (1.4x slower), Debezium took 60 minutes (2.7x slower), Estuary took 4.5 hours (12x slower), and Airbyte took 23 hours (63x slower).<p>This indicates OLake delivers significantly lower latency for propagating changes from PostgreSQL to Iceberg.<p>On the cost side, OLake is open source and self-hosted. The cost is simply the infrastructure.
Running the benchmarks on a substantial VM (64 vcpus, 128 GiB memory) for 24 hours cost less than $75.<p>Comparing this to the vendor list prices for the data synced in the tests: Fivetran's full load cost $7,446 ($1.86/M rows), Estuary's full load cost $4,462 ($12.97/M rows), Airbyte Cloud's partial full load cost $5,560 ($438.8/M rows).<p>For CDC, Fivetran cost $2,257 ($45.14/M rows), Estuary cost $22.72 ($0.45/M rows), and Airbyte Cloud cost $148.95 ($2.98/M rows).<p>While Estuary shows a low per-row cost for CDC in this specific test, the overall picture strongly favors the predictable, infra-based cost of self-hosted OLake, especially for large-scale replication.<p>In summary, these benchmarks suggest OLake can match or exceed the speed of leading proprietary tools for PostgreSQL to Iceberg replication, offers superior CDC latency compared to all tested alternatives, and provides a significantly lower and more predictable cost structure due to being open source and self-hosted.<p>You can find more details on the benchmarks and the tool itself in our documentation.<p>Happy to discuss the results and our approach.</p>
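<p>If you want to check the "Nx slower" figures yourself, this short sketch recomputes them from the rounded numbers quoted above. The function names are just for illustration. Note that with these rounded inputs the Airbyte CDC ratio comes out near 61x rather than the quoted 63x, presumably because the durations in the post are rounded.</p>

```python
# Full-load throughput in rows/sec, as quoted in the post.
full_load_rps = {
    "OLake": 46_262,
    "Fivetran": 46_395,
    "Debezium": 14_839,
    "Estuary": 3_982,
    "Airbyte": 457,
}

# Time to process 50 million CDC changes, in minutes, as quoted in the post.
cdc_minutes = {
    "OLake": 22.5,
    "Fivetran": 31,
    "Debezium": 60,
    "Estuary": 4.5 * 60,
    "Airbyte": 23 * 60,
}


def slowdown_by_throughput(rps, baseline="OLake"):
    """How many times slower each tool is than the baseline (higher rows/sec is better)."""
    return {name: rps[baseline] / value for name, value in rps.items() if name != baseline}


def slowdown_by_duration(minutes, baseline="OLake"):
    """How many times slower each tool is than the baseline (lower duration is better)."""
    return {name: value / minutes[baseline] for name, value in minutes.items() if name != baseline}


full = slowdown_by_throughput(full_load_rps)
cdc = slowdown_by_duration(cdc_minutes)

if __name__ == "__main__":
    for name in full:
        print(f"{name}: full load {full[name]:.1f}x, CDC {cdc[name]:.1f}x")
```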
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43925173">https://news.ycombinator.com/item?id=43925173</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 08 May 2025 11:40:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=43925173</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=43925173</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43925173</guid></item><item><title><![CDATA[Show HN: We launched hosted MCP and RAG (no infra, we host)]]></title><description><![CDATA[
<p>Hey HN,<p>We just put a fully managed Hosted MCP Server in front of our production-grade RAG stack. It solves the two things (covered at the end) that kept biting us (and most dev teams) when wiring agents to private data.<p>How does it work? (30-second flow)<p>→ CustomGPT Console → Deploy → MCP → Enable → Grab the generated endpoint + JSON schema → Add to an MCP-aware client<p>Point any MCP-aware tool (e.g. dozens of agentic AI and workflow tools like n8n and Zapier; IDEs like Cursor; ChatGPT w/ MCP plugin, Anthropic’s Claude, etc.) at the endpoint.<p>So it's basically bringing RAG to MCP.<p>Back to the two things I mentioned above:<p>1. Agent answer accuracy, a.k.a. RAG accuracy – we benchmark at the top of public leaderboards for “business-doc” retrieval & no hallucination.<p>2. Ops drag – no k8s, patch cycles, or 3 a.m. TLS renewals. We host, autoscale, and watch the graphs.<p>Included in every CustomGPT.ai plan (free-trial friendly). Happy to share perf metrics or answer architecture questions.<p>Ask me anything!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43896154">https://news.ycombinator.com/item?id=43896154</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 05 May 2025 15:30:39 +0000</pubDate><link>https://customgpt.ai/hosted-mcp-servers-for-rag-powered-agents/</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=43896154</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43896154</guid></item><item><title><![CDATA[Debezium to olake.io – PhysicsWallah switch for CDC]]></title><description><![CDATA[
<p>We recently hosted a small online meetup at OLake where a Data Engineer at PhysicsWallah walked through why his team dropped Debezium and moved to OLake’s “MongoDB → Iceberg” pipeline.<p>Video (29 min): https://www.youtube.com/watch?v=qqtE_BrjVkM<p>If you prefer text, here’s the quick TL;DR.<p>Why Debezium became a drag for them:
1. Long full loads on multi-million-row MongoDB collections, and any failure meant restarting from scratch
2. Kafka and Connect infrastructure felt heavy when the end goal was “Parquet/Iceberg on S3”
3. Handling heterogeneous arrays required custom SMTs
4. Continuous streaming only; they still had to glue together ad-hoc batch pulls for some workflows
5. Ongoing schema drift demanded extra code to keep Iceberg tables aligned<p>What changed with OLake?
-> Writes directly from MongoDB (and friends) into Apache Iceberg, no message broker in between<p>-> Two modes: full load for the initial dump, then CDC for ongoing changes — exposed by a single flag in the job config
-> Automatic schema evolution: new MongoDB fields appear as nullable columns; complex sub-docs land as JSON strings you can parse later<p>-> Resumable, chunked full loads: a pod crash resumes instead of restarting<p>-> Runs as either a Kubernetes CronJob or an Airflow task; config is one YAML/JSON file.<p>Their stack in one line: MongoDB → OLake writer → Iceberg on S3 → Spark jobs → Trino / occasional Redshift, all orchestrated by Airflow and/or K8s.<p>Posting here because many of us still bolt Kafka onto CDC just to land files. If you only need Iceberg tables, a simpler path might exist now. Curious to hear others’ experiences with broker-less CDC tools.<p>(Disclaimer: I work on OLake and hosted the meetup, but the talk is purely technical.)<p>Check out the GitHub repo: https://github.com/datazip-inc/olake</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43844411">https://news.ycombinator.com/item?id=43844411</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 30 Apr 2025 12:44:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=43844411</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=43844411</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43844411</guid></item><item><title><![CDATA[New comment by pkhodiyar in "Your OpenAI Project and RAG"]]></title><description><![CDATA[
<p>Hey folks, Priyansh here. I just put together a list of all the tools I could find that are compatible with the OpenAI endpoint, so you can take your OpenAI-based project and add RAG functionality.<p>If I missed some tools, feel free to add them below.</p>
]]></description><pubDate>Fri, 25 Apr 2025 16:43:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=43795778</link><dc:creator>pkhodiyar</dc:creator><comments>https://news.ycombinator.com/item?id=43795778</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43795778</guid></item></channel></rss>