Hacker News: mateuszbuda

New comment by mateuszbuda in "Show HN: ScrapingDuck a Web Scraping API"

mateuszbuda — Sat, 06 Sep 2025 10:29:12 +0000

Welcome to the zoo! ScrapingFish :)

New comment by mateuszbuda in "Ask HN: Do people actually pay for small web tools?"

mateuszbuda — Tue, 20 May 2025 07:08:57 +0000

Here you can find statistics based on data scraped from Indie Hackers, but they are not split by tool size: https://scrapingfish.com/blog/indie-hackers-revenue

New comment by mateuszbuda in "Abusive AI Web Crawlers: Get Off My Lawn"

mateuszbuda — Wed, 02 Apr 2025 16:25:52 +0000

There are many different methods used by proxy providers to unethically source their IPs: https://scrapingfish.com/how-ips-for-web-scraping-are-source...

New comment by mateuszbuda in "Ingesting PDFs and why Gemini 2.0 changes everything"

mateuszbuda — Wed, 05 Feb 2025 22:39:25 +0000

There’s AWS Bedrock Knowledge Base (Amazon's proprietary RAG solution), which can digest PDFs and, as far as I tested it on real-world documents, works pretty well and is cost-effective.

New comment by mateuszbuda in "Using eSIMs with devices that only have a physical SIM slot via a 9eSIM SIM car"

mateuszbuda — Mon, 20 Jan 2025 20:40:13 +0000

You can use a USB hub (example with 20 ports: https://www.sipolar.com/product/a-805p-20-ports-usb-2-0-hub/) and attach multiple USB dongles to it. This blog post describes a setup for web scraping: https://scrapingfish.com/blog/byo-mobile-proxy-for-web-scrap...

New comment by mateuszbuda in "SrsRAN: Open-Source 4G/5G"

mateuszbuda — Mon, 06 Jan 2025 18:35:55 +0000

Do you have an idea what my external IP address would be? On my phone connected to a mobile network, I get assigned a mobile IP address, which is my external IP address. It's not attached to the SIM card because it changes when I reconnect. Is it handled by the BTS software? Do I get assigned an IP address, with the BTS communicating on my behalf using that address, which comes from the mobile network operator's pool?

New comment by mateuszbuda in "SrsRAN: Open-Source 4G/5G"

mateuszbuda — Mon, 06 Jan 2025 11:12:12 +0000

Can I use this to run my own mobile network? Is there something like a blank SIM card which I could use for it? I don't need global coverage, but is it possible to create my own BTS on a PC (with some antenna connected to it) and then have my own SIM card which I can insert into a regular phone/device, have it connect to my BTS, and reach the Internet?

New comment by mateuszbuda in "Looking for 1M+ links/day webscraping K8s solution"

mateuszbuda — Thu, 19 Dec 2024 13:10:40 +0000

What exactly doesn't work well? Did you consider playwright?

New comment by mateuszbuda in "Ask HN: Founders, what was the major sourcing channel for your first 100 users?"

mateuszbuda — Sat, 19 Oct 2024 06:53:28 +0000

I don’t really know. I don’t write posts to optimize for SEO (include FAQ at the end or something like that) and hope it’s just good content people will share.

There are also SEO pages which do not have any useful content. I think I should have more of them because my competitors have only SEO pages but I don’t have time for it as I have to focus on the product and customer support. Probably a good mix between useful content blog posts (maybe with SEO filling) and strictly SEO pages is best to bring traffic.

New comment by mateuszbuda in "Ask HN: Founders, what was the major sourcing channel for your first 100 users?"

mateuszbuda — Sat, 19 Oct 2024 06:27:04 +0000

Content marketing - blog posts with useful content posted around the Internet. Most traffic from HN and Reddit.

New comment by mateuszbuda in "Web scraping with GPT-4o: powerful but expensive"

mateuszbuda — Tue, 03 Sep 2024 11:04:26 +0000

I think that LLM costs, even for GPT-4o, are probably lower than the proxy costs usually required for web scraping at scale. The cost of residential/mobile proxies is a few $ per GB. If I were to process cleaned data obtained using 1GB of residential/mobile proxy transfer, I wouldn't pay more for the LLM.

New comment by mateuszbuda in "[dead]"

mateuszbuda — Sun, 21 Jul 2024 12:22:13 +0000

I agree that web scraping is a shady business in many cases, but there is definitely a difference between setting up a few mobile proxies for yourself and using devices and networks which belong to other people without their knowledge, until they find they cannot access some websites because a bot was detected on their network.

New comment by mateuszbuda in "[dead]"

mateuszbuda — Sun, 21 Jul 2024 08:50:34 +0000

Here are some insights into how proxies are sourced: https://scrapingfish.com/how-ips-for-web-scraping-are-source...

There's also an option to build your own mobile proxy pool which gives you very good reputation IPs for web scraping and doesn't harm other people: https://scrapingfish.com/blog/byo-mobile-proxy-for-web-scrap...

New comment by mateuszbuda in "Collection of Dark Patterns and Unethical Design"

mateuszbuda — Thu, 18 Jul 2024 13:02:51 +0000

I would also include the deceptive credit systems used by SaaS products with usage-based-style subscriptions. It’s a bait-and-switch variant. First, you think one call to the API is one credit, but it always turns out that you need calls which consume 20 or 50 credits instead, so you have to move to a more expensive plan and buy millions of credits every month. Second, unused credits do not roll over to the next month, so your effective cost per call is orders of magnitude larger than what you expected.
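
To make the arithmetic concrete, here is a rough sketch of how the effective cost per call can drift from the advertised one. All numbers (plan price, credit count, credits per call, calls actually made) are hypothetical and not taken from any real provider, and the function name is made up for illustration.

```python
def effective_cost_per_call(plan_price, plan_credits, credits_per_call, calls_made):
    """Price actually paid per API call in one billing month.

    Unused credits are assumed to expire at the end of the month,
    so the whole plan price is spread over the calls actually made.
    """
    max_calls = plan_credits // credits_per_call
    if calls_made <= 0 or calls_made > max_calls:
        raise ValueError("calls_made must be between 1 and the plan's capacity")
    return plan_price / calls_made


# Hypothetical plan: $50 per month for 1,000,000 credits.
plan_price = 50
plan_credits = 1_000_000

# What you expect if 1 call = 1 credit and you use every credit.
advertised = plan_price / plan_credits

# What you get if each call really costs 25 credits
# and you only make 10,000 calls before the credits expire.
actual = effective_cost_per_call(plan_price, plan_credits, 25, 10_000)

print(f"advertised: ${advertised:.5f} per call")
print(f"actual:     ${actual:.5f} per call")
print(f"ratio:      {actual / advertised:.0f}x")
```

With these made-up inputs the gap comes from two multipliers stacked together: the credits consumed per call and the share of the plan left unused.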

New comment by mateuszbuda in "How I overcame my addiction to sugar"

mateuszbuda — Sat, 29 Jun 2024 13:57:14 +0000

I tried not buying food with added sugar but it’s surprisingly difficult. Here is an interesting analysis I did some time ago which shows that for half of the food items, sugar is the main ingredient: https://scrapingfish.com/blog/scraping-walmart

New comment by mateuszbuda in "Ask HN: SaaS Subscription or Usage-Based Pricing?"

mateuszbuda — Sat, 18 May 2024 21:41:24 +0000

At https://scrapingfish.com/ we have both options: usage-based ( https://scrapingfish.com/buy ) and subscriptions (a monthly unlimited requests plan, https://scrapingfish.com/unlimited ). Despite subscriptions being the cheaper option per request, usage-based is way more popular. Fewer than 10% of our users have subscribed to the unlimited monthly plan. I guess usage-based plans give users more control over how much they spend, or maybe they simply don't want to subscribe to another service.

New comment by mateuszbuda in "What is the most useful project you have worked on?"

mateuszbuda — Sat, 06 Apr 2024 20:54:01 +0000

I’m still working on a web scraping API (https://scrapingfish.com/). For some people it’s an evil bot, but for others it’s an enabler of public data access. I think it’s useful.

New comment by mateuszbuda in "Ask HN: What is the current (Apr. 2024) gold standard of running an LLM locally?"

mateuszbuda — Mon, 01 Apr 2024 12:17:56 +0000

Can anyone share their experience with https://ollama.com/ ?

New comment by mateuszbuda in "Ask HN: What non-AI products are you working on?"

mateuszbuda — Tue, 26 Mar 2024 22:45:23 +0000

We keep working on a web scraping API with a custom-made mobile proxy pool: https://scrapingfish.com/

There is no AI in it so far, but we are considering adding support for parsing the result to extract data using an LLM.

New comment by mateuszbuda in "Ask HN: How much to charge for an API call?"

mateuszbuda — Wed, 13 Mar 2024 10:16:34 +0000

Two main differentiators. 1. Pricing. We charge per request, and every successful request costs the same, as opposed to the misleading API credit systems used by others. Also, we sell request packs which are valid for up to 1 year, as opposed to monthly plans with expiring unused API credits. 2. We use our own high-quality and ethically sourced mobile proxies as opposed to shared pools from large proxy providers (https://scrapingfish.com/how-ips-for-web-scraping-are-source...).