<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: jackienotchan</title><link>https://news.ycombinator.com/user?id=jackienotchan</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 16 Apr 2026 05:28:09 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=jackienotchan" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by jackienotchan in "Show HN: Build Web Automations via Demonstration"]]></title><description><![CDATA[
<p>Why is this not a Launch YC (or at least mention it?) since you seem to be part of the current batch?<p>The record/replay is definitely an interesting direction. The browser automation space is getting super crowded though (even within YC), so curious to hear how you differentiate from:<p>- BrowserUse<p>- Browserbase<p>- BrowserBook<p>- Skyvern</p>
]]></description><pubDate>Wed, 28 Jan 2026 16:29:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=46797545</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=46797545</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46797545</guid></item><item><title><![CDATA[New comment by jackienotchan in "Launch HN: BrowserBook (YC F24) – IDE for deterministic browser automation"]]></title><description><![CDATA[
<p>Congrats! Could this also be used to generate e2e test automations?<p>For scraping, how do you handle Cloudflare and Captchas? Do you respect robots.txt instructions of websites?</p>
]]></description><pubDate>Thu, 11 Dec 2025 19:19:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=46235872</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=46235872</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46235872</guid></item><item><title><![CDATA[New comment by jackienotchan in "Launch HN: Webhound (YC S23) – Research agent that builds datasets from the web"]]></title><description><![CDATA[
<p>AI crawlers have led to a big surge in scraping activity, and most of these bots don't respect any scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits, user agents, etc.).<p>This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported here on HN (and experienced myself).<p>Does Webhound respect robots.txt directives and do you disclose the identity of your crawlers via user-agent header?</p>
]]></description><pubDate>Thu, 25 Sep 2025 16:23:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=45374848</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=45374848</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45374848</guid></item><item><title><![CDATA[New comment by jackienotchan in "996"]]></title><description><![CDATA[
<p>The first two quotes are from founders of:<p>- BrowserUse - Founded 2024<p>- Greptile - Founded 2023<p>The third quote is from a VC who has never founded a startup himself and has a clear interest in pushing founders to trade work-life balance for his own quick returns.<p>So none of these people worked on anything longer than 2 years. I wonder what will happen if we check back in 5–10 years. Will they still be doing and promoting 996, or will they be burned out and have changed their minds? Make your bets.</p>
]]></description><pubDate>Sat, 06 Sep 2025 15:20:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=45150062</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=45150062</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45150062</guid></item><item><title><![CDATA[New comment by jackienotchan in "Launch HN: Reducto Studio (YC W24) – Build accurate document pipelines, fast"]]></title><description><![CDATA[
<p>I saw your recent $24M Series A and was kind of surprised to only see you launching now, congrats!<p>YC seems to fund quite a few document extraction companies, even within the same batch:<p>- Pulse (YC W24): <a href="https://www.ycombinator.com/companies/pulse-3">https://www.ycombinator.com/companies/pulse-3</a><p>- OmniAI (YC W24): <a href="https://www.ycombinator.com/companies/omniai">https://www.ycombinator.com/companies/omniai</a><p>- Extend (YC W23): <a href="https://www.ycombinator.com/companies/extend">https://www.ycombinator.com/companies/extend</a><p>How do you differentiate from these? And how do you see the space evolving as LLMs commoditize PDF extraction?</p>
]]></description><pubDate>Mon, 23 Jun 2025 16:55:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=44357691</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=44357691</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44357691</guid></item><item><title><![CDATA[New comment by jackienotchan in "Launch HN: Exa (YC S21) – The web as a database"]]></title><description><![CDATA[
<p>AI crawlers have led to a big surge in scraping/crawling activity on the web, and many don't use proper user agents and don't stick to any scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN (and experienced myself).<p>Do you have any built-in features that address these issues?</p>
]]></description><pubDate>Tue, 06 May 2025 19:15:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=43908630</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=43908630</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43908630</guid></item><item><title><![CDATA[New comment by jackienotchan in "Launch HN: Browser Use (YC W25) – open-source web agents"]]></title><description><![CDATA[
<p>AI agents have led to a big surge in scraping/crawling activity on the web, and many don't use proper user agents and don't stick to any scraping best practices that the industry has developed over the past two decades (robots.txt, rate limits). This comes with negative side effects for website owners (costs, downtime, etc.), as repeatedly reported on HN.<p>Do you have any built-in features that address these issues?</p>
]]></description><pubDate>Tue, 25 Feb 2025 17:16:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=43174648</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=43174648</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43174648</guid></item><item><title><![CDATA[YC funds AI-powered Reddit marketing bot]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.ycombinator.com/launches/Mmb-casixty-your-reddit-marketing-agent-for-technical-audiences">https://www.ycombinator.com/launches/Mmb-casixty-your-reddit-marketing-agent-for-technical-audiences</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43027023">https://news.ycombinator.com/item?id=43027023</a></p>
<p>Points: 2</p>
<p># Comments: 5</p>
]]></description><pubDate>Wed, 12 Feb 2025 16:40:06 +0000</pubDate><link>https://www.ycombinator.com/launches/Mmb-casixty-your-reddit-marketing-agent-for-technical-audiences</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=43027023</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43027023</guid></item><item><title><![CDATA[New comment by jackienotchan in "Show HN: Simplex: Automate browser workflows using code and natural language"]]></title><description><![CDATA[
<p>Would you mind sharing the story behind your pivot from on-demand photorealistic vision datasets[0] to browser automation?<p>[0] <a href="https://www.ycombinator.com/launches/Lbx-simplex-on-demand-photorealistic-vision-datasets">https://www.ycombinator.com/launches/Lbx-simplex-on-demand-p...</a></p>
]]></description><pubDate>Wed, 15 Jan 2025 06:39:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=42708045</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=42708045</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42708045</guid></item><item><title><![CDATA[New comment by jackienotchan in "Show HN: DataFuel.dev – Turn websites into LLM-ready data"]]></title><description><![CDATA[
<p>I'm noticing a big increase in crawling activity on the sites I manage, likely from bots collecting data for LLMs. Most of them don't use proper user agents and of course don't stick to any scraping best practices that the industry has developed over the past two decades.<p>This trend is creating a lot of headaches for developers responsible for maintaining heavily scraped sites.<p>related:<p>- "Dear AI Companies, instead of scraping OpenStreetMap, how about a $10k donation?" - <a href="https://news.ycombinator.com/item?id=41109926">https://news.ycombinator.com/item?id=41109926</a><p>- "Multiple AI companies bypassing web standard to scrape publisher sites" <a href="https://news.ycombinator.com/item?id=40750182">https://news.ycombinator.com/item?id=40750182</a></p>
]]></description><pubDate>Fri, 13 Dec 2024 06:41:45 +0000</pubDate><link>https://news.ycombinator.com/item?id=42406408</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=42406408</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42406408</guid></item><item><title><![CDATA[PearAI (YC F24) forks OS repo and rebrands it with mass-replacing references]]></title><description><![CDATA[
<p>Article URL: <a href="https://twitter.com/CodeFryingPan/status/1840203597478539477">https://twitter.com/CodeFryingPan/status/1840203597478539477</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=41689217">https://news.ycombinator.com/item?id=41689217</a></p>
<p>Points: 41</p>
<p># Comments: 3</p>
]]></description><pubDate>Sun, 29 Sep 2024 18:14:35 +0000</pubDate><link>https://twitter.com/CodeFryingPan/status/1840203597478539477</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=41689217</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41689217</guid></item><item><title><![CDATA[New comment by jackienotchan in "Show HN: I'm making an AI scraper called FetchFox"]]></title><description><![CDATA[
<p>You have LinkedIn and Twitter examples, where you're very likely violating their TOS as they prohibit any scraping.<p>I also assume you don't check the robots.txt of websites?<p>I'm all for automating tedious work, but with all this (mostly AI-related) scraping, things are getting out of hand and creating a lot of headaches for developers maintaining heavily scraped sites.<p>related:<p>- "Dear AI Companies, instead of scraping OpenStreetMap, how about a $10k donation?" - <a href="https://news.ycombinator.com/item?id=41109926">https://news.ycombinator.com/item?id=41109926</a><p>- "Multiple AI companies bypassing web standard to scrape publisher sites" <a href="https://news.ycombinator.com/item?id=40750182">https://news.ycombinator.com/item?id=40750182</a></p>
]]></description><pubDate>Wed, 04 Sep 2024 05:39:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=41442254</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=41442254</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41442254</guid></item><item><title><![CDATA[New comment by jackienotchan in "Launch HN: Trellis (YC W24) – AI-powered workflows for unstructured data"]]></title><description><![CDATA[
<p>This was the top comment for quite a while but suddenly dropped to the bottom. Was it automatically downranked for mentioning an OS alternative?<p>How many upvotes does your comment have?</p>
]]></description><pubDate>Tue, 13 Aug 2024 20:48:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=41239679</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=41239679</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41239679</guid></item><item><title><![CDATA[New comment by jackienotchan in "Dear AI Companies, instead of scraping OpenStreetMap, how about a $10k donation?"]]></title><description><![CDATA[
<p>Affected companies are becoming increasingly frustrated with the army of AI crawlers out there as they won't stick to any scraping best practices (respect robots.txt, use public APIs, no peak load). 
It's not necessarily about copyright, but the heavy scraping traffic also leads to increased infra costs.<p>What's the endgame here? AI can already solve captchas, so the arms race for bot protection is pretty much lost.</p>
]]></description><pubDate>Tue, 30 Jul 2024 15:54:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=41110456</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=41110456</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41110456</guid></item><item><title><![CDATA[Ask HN: How to write effective cold emails?]]></title><description><![CDATA[
<p>I'm a tech founder looking to improve my sales skills, specifically in reaching out to potential clients. This isn't about mass-spamming people, but rather about reaching out to relevant profiles and asking for genuine feedback.<p>What kind of email would you respond to? Any tips or examples would be greatly appreciated!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40739512">https://news.ycombinator.com/item?id=40739512</a></p>
<p>Points: 1</p>
<p># Comments: 2</p>
]]></description><pubDate>Thu, 20 Jun 2024 15:02:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=40739512</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=40739512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40739512</guid></item><item><title><![CDATA[New comment by jackienotchan in "Suno has raised $125M to build a future where anyone can make music"]]></title><description><![CDATA[
<p>Random anecdote: I've created a Suno song as an anniversary gift for my girlfriend. She was absolutely mindblown by it as it's an earworm song with many of our memories.<p>Always good to be aware of our small tech bubble here and that things we take for granted already, might not even be close to adoption :)</p>
]]></description><pubDate>Tue, 21 May 2024 18:54:28 +0000</pubDate><link>https://news.ycombinator.com/item?id=40432352</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=40432352</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40432352</guid></item><item><title><![CDATA[New comment by jackienotchan in "PaliGemma: Open-Source Multimodal Model by Google"]]></title><description><![CDATA[
<p>Everybody trashed Google yesterday, but I actually think they are catching up in the AI race.<p>They will now start fully leveraging their distribution advantage across products and platforms.</p>
]]></description><pubDate>Wed, 15 May 2024 20:04:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=40371696</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=40371696</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40371696</guid></item><item><title><![CDATA[New comment by jackienotchan in "Show HN: Tarsier – Vision utilities for web interaction agents"]]></title><description><![CDATA[
<p>Looking at OpenAdapt, I'm wondering why they didn't integrate Tarsier into AgentGPT, which is their flagship GitHub repo but doesn't seem to be under active development anymore.</p>
]]></description><pubDate>Wed, 15 May 2024 20:01:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=40371651</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=40371651</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40371651</guid></item><item><title><![CDATA[New comment by jackienotchan in "Show HN: Tarsier – Vision utilities for web interaction agents"]]></title><description><![CDATA[
<p>Did you delete it, or did the mods?</p>
]]></description><pubDate>Wed, 15 May 2024 19:09:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=40371037</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=40371037</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40371037</guid></item><item><title><![CDATA[New comment by jackienotchan in "Show HN: Tarsier – Vision utilities for web interaction agents"]]></title><description><![CDATA[
<p>Why was the Show HN text removed? Too much self promotion? You're a YC company, so I'm surprised the mods would do that.<p><a href="https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=true&query=tarsier&sort=byDate&type=story" rel="nofollow">https://hn.algolia.com/?dateRange=pastYear&page=0&prefix=tru...</a><p>> Hey HN! I built a tool that gives LLMs the ability to understand the visual structure of a webpage even if they don't accept image input. We've found that unimodal GPT-4 + Tarsier's textual webpage representation consistently beats multimodal GPT-4V/4o + webpage screenshot by 10-20%, probably because multimodal LLMs still aren't as performant as they're hyped to be.
Over the course of experimenting with pruned HTML, accessibility trees, and other perception systems for web agents, we've iterated on Tarsier's components to maximize downstream agent/codegen performance.<p>Here's the Tarsier pipeline in a nutshell:<p>1. tag interactable elements with IDs for the LLM to act upon & grab a full-sized webpage screenshot<p>2. for text-only LLMs, run OCR on the screenshot & convert it to whitespace-structured text (this is the coolest part imo)<p>3. map LLM intents back to actions on elements in the browser via an ID-to-XPath dict<p>Humans interact with the web through visually-rendered pages, and agents should too. We run Tarsier in production for thousands of web data extraction agents a day at Reworkd (<a href="https://reworkd.ai">https://reworkd.ai</a>).<p>By the way, we're hiring backend/infra engineers with experience in compute-intensive distributed systems!<p>reworkd.ai/careers</p>
]]></description><pubDate>Wed, 15 May 2024 19:02:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=40370957</link><dc:creator>jackienotchan</dc:creator><comments>https://news.ycombinator.com/item?id=40370957</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40370957</guid></item></channel></rss>