Kadoa Interview

I recently sat down with the founders of Kadoa to learn more about the company, team, and product. Below is the interview. Be sure to check out a video demo of Kadoa here. If you’re interested in learning more after reading, book time for a demo here or let me know if you’d like a warm intro to the team.
From Scrapers to Agents: How AI Is Changing Web Scraping
The internet is one of the oldest alternative data sources. By the late 1990s, funds were already scraping the web for investment signals. A classic example was tracking retail promotions at the end of a quarter as a proxy for missed sales targets. Over time, scraping expanded to use cases such as job postings, company filings, online prices, and location data.
Two decades later, teams still write scrapers largely the same way. Engineers create bespoke, rule-based scripts for each individual source. Tools have improved, but scrapers remain fragile, costly, and slow to maintain.
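To make the fragility concrete, here is a minimal sketch of the kind of bespoke, rule-based scraper described above. The URL and CSS selectors are placeholders; real scripts hard-code source-specific rules like these, which break whenever the page layout changes.

```python
# Minimal sketch of a traditional rule-based scraper (illustrative only).
# The URL and selectors are placeholders for source-specific rules.
import requests
from bs4 import BeautifulSoup

def scrape_prices(url: str) -> list[dict]:
    html = requests.get(url, timeout=30).text
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for card in soup.select("div.product-card"):  # breaks if the site renames this class
        name = card.select_one("h2.title")
        price = card.select_one("span.price")
        if name and price:
            rows.append({
                "name": name.get_text(strip=True),
                "price": price.get_text(strip=True),
            })
    return rows

if __name__ == "__main__":
    print(scrape_prices("https://example.com/products"))
```

Every source needs its own version of this script, and every site redesign silently breaks it until someone notices and patches the selectors.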
AI is now reshaping this process. Kadoa uses AI to automate web scraping at scale for investment firms. Analysts can self-serve web datasets in minutes, while engineers keep control over quality, data schemas, and audits. Data teams can do more with less, scaling public data sourcing faster, cheaper, and with higher accuracy.
In this interview, I speak with Kadoa’s founders about the shift from scripts to agents, why reliability beats hype, and how built-in compliance is a core differentiator.
This interview has been edited for clarity and length.
Attendees
Dan Entrup, It’s Pronounced Data
Adrian Krebs, Kadoa
Tavis Lochhead, Kadoa
Transcript
Dan Entrup: I’m joined by Tavis and Adrian from Kadoa. Let’s start with your backgrounds and the genesis of the company.
Tavis Lochhead: My background is in business roles at tech companies such as Cisco and eBay. I’ve always built projects on the side, and that’s how I met Adrian and Johannes.
Adrian Krebs: Johannes and I are engineers. We worked together in Switzerland at a bank and built side projects in our free time. When GPT-3 came out, we began experimenting with it for generating and maintaining web scraper code. We built a product search engine that scraped product data (e.g., prices and reviews) from across the web. It gained significant traction on Reddit and Hacker News. Tavis reached out, and we teamed up.
Companies began asking to use our crawling and scraping technology. We realized much of the industry was still rule-based: custom scrapers for each source and constant maintenance. We interviewed people in finance and found the same pain points as 10 to 20 years ago. There hadn’t been a real leap forward in collecting public web data. That’s why we built Kadoa, sold our first licenses, and kept growing.
Dan Entrup: For readers who are new to this area, what are the layers in web data collection, and where do you sit?
Adrian Krebs: The market is old and fragmented. There are infrastructure providers such as proxies and SERP APIs. There are traditional web scraping tools that open a browser and pull raw data. But even with these, you still have to manually write a brittle script for each source.
Kadoa automates this process end to end. With AI, an analyst or researcher can build datasets in minutes. You point to a source, and we extract, transform, and load the data into a spreadsheet or a data warehouse. No coding is required.
Dan Entrup: How do you separate AI hype from production reality?
Adrian Krebs: We say, “AI gets attention, but reliability closes deals.” AI is on every agenda, but many tools fail at production scale because of non-determinism, cost, or latency. Customers don’t want generic AI. They want constrained agents that do one task exceptionally well. For us, that task is extracting data from unstructured sources. Building the AI is about 30 percent of the work. The other 70 percent is deploying to production at scale: orchestration, data validation, error handling, retry systems, and effective human-in-the-loop tooling.
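Editor’s note: as an illustration of that 70 percent, here is a minimal sketch (not Kadoa’s implementation) of what retries, schema validation, and a human-in-the-loop hook might look like around an extraction step. The schema fields and review-queue function are hypothetical.

```python
# Illustrative production scaffolding around an extraction step:
# retries with backoff, schema validation, and routing bad records to human review.
import time

REQUIRED_FIELDS = {"company", "filing_date", "revenue"}  # hypothetical schema

def validate(record: dict) -> bool:
    # Reject records with missing or empty required fields.
    return REQUIRED_FIELDS.issubset(record) and all(
        record[field] not in (None, "") for field in REQUIRED_FIELDS
    )

def send_to_review_queue(source: str, records: list) -> None:
    # Placeholder human-in-the-loop hook: in practice, a review dashboard or queue.
    print(f"{len(records)} records from {source} flagged for review")

def extract_with_retries(extract_fn, source: str, max_attempts: int = 3) -> list:
    # Run an extraction function with exponential backoff on failure.
    for attempt in range(1, max_attempts + 1):
        try:
            records = extract_fn(source)
            good = [r for r in records if validate(r)]
            bad = [r for r in records if not validate(r)]
            if bad:
                send_to_review_queue(source, bad)
            return good
        except Exception:
            if attempt == max_attempts:
                raise
            time.sleep(2 ** attempt)  # back off before retrying
```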
Dan Entrup: Do clients feed Kadoa outputs into LLMs, or do they use them directly?
Adrian Krebs: Both. Some customers use Kadoa to ground in-house LLM answers with public data, such as company filings or websites. Others integrate the data directly into models, spreadsheets, or data platforms such as Snowflake.
Dan Entrup: Let’s cover compliance. How do customers operate compliant web scrapers?
Adrian Krebs: We built compliance into the platform. Compliance teams can set automated controls such as blocking sanctioned countries, banning PII collection, enforcing robots.txt and captcha policies, and managing source allow/deny lists. Everything is audited. You can see who scraped what, when, and which fields. Logs can be exported, and regular reports scheduled. This replaces manual case-by-case reviews and approvals with automated built-in controls. At a higher level, we maintain internal policies and aim for ethical automation. We want beneficial use without overloading sites or causing harm.
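Editor’s note: a minimal sketch of the kind of pre-scrape controls described here, assuming a source allow/deny list and robots.txt enforcement. The domains, user agent, and policy are placeholders, not Kadoa’s configuration.

```python
# Illustrative pre-scrape compliance check: allow/deny lists plus robots.txt.
# The domains below are placeholders and will not resolve in a real run.
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

ALLOWED_DOMAINS = {"sec.gov", "ir.example.com"}   # hypothetical allow list
DENIED_DOMAINS = {"blocked-publisher.example"}    # hypothetical deny list
USER_AGENT = "example-compliance-bot"

def is_allowed(url: str) -> bool:
    # Check the allow/deny lists first, then defer to the site's robots.txt.
    domain = urlparse(url).netloc.lower()
    if domain in DENIED_DOMAINS:
        return False
    if ALLOWED_DOMAINS and domain not in ALLOWED_DOMAINS:
        return False
    robots = RobotFileParser()
    robots.set_url(f"https://{domain}/robots.txt")
    robots.read()  # fetches robots.txt over the network
    return robots.can_fetch(USER_AGENT, url)
```

A check like this would run before every fetch, with the decision and its inputs written to the audit log.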
Dan Entrup: What is your view on Cloudflare’s metered scraping and paying site owners?
Adrian Krebs: That debate mainly affects publishers. Sites with ad-based business models feel pressure from AI assistants, so new models are being tested. In our data, about 95 percent of what customers collect is not publisher content. It’s things like investor relations pages, macroeconomic data, locations, retail prices, or industry insights. These aren’t paywalled news sites, so the metering debate rarely applies to our use cases. The challenge isn’t stopping all bots; it’s building systems that enable beneficial automation while protecting against abusive AI crawlers.
Dan Entrup: Tavis, what do you wish customers knew up front?
Tavis Lochhead: The best fits are firms that already scrape. They know it’s strategic and understand how frustrating maintenance becomes. Most funds scrape, but few are specialists. Often, a very capable engineer with a full plate is asked to keep scrapers alive. Those teams want the data without the toil. We also enable firms that have shied away from scraping because of cost or limited engineering resources.
Dan Entrup: Anything that surprised you about the market from attending industry events?
Tavis Lochhead: Not surprised, but it stands out how smart the people are. It pushes us to be the best in the space. Unlike some markets that are price-driven, funds will pay for technology that works. It’s also a small and friendly community. We see the same faces across London, New York, Hong Kong, and other hubs.
Dan Entrup: What is on the roadmap?
Adrian Krebs: Reliability and data quality remain top priorities, with faster turnaround. We’re doubling down on agentic scraping, which lets users create scrapers in natural language. Agents will navigate sites, fetch files, extract KPIs from documents, and deliver results quickly and accurately. We apply advances from foundation models to our specific data challenges, with an emphasis on scale and accuracy.
Views here are the author’s own and not those of any employer or company. The author may be compensated for pieces in this newsletter.