API Hound was a search engine I built to let developers find APIs exposed by various websites. The gold standard for an API directory was (and still is, as far as I know) ProgrammableWeb.com, founded by my friend John Musser.
My one issue with PW is that its index of APIs is assembled manually. That makes it a hand-curated directory, just like Yahoo! in the early days. Surely we could build something better automatically – the “Google” of API directories, if you will.
The first challenge: where to get a crawl of the web? Crawling at that scale is not something you want to do yourself. Luckily, I found commoncrawl.org, which publishes a huge corpus of crawled pages in public Amazon S3 buckets.
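As a concrete illustration of how little code this takes: Common Crawl publishes, for each crawl, a listing of relative S3 keys (one WARC file per line), and each key can be fetched either from the `commoncrawl` S3 bucket or over its public HTTPS mirror. The crawl id and example key below are illustrative placeholders, not details from API Hound itself.

```python
# Sketch: turn Common Crawl's relative WARC keys into fetchable URLs.
# The bucket name and HTTPS mirror are Common Crawl conventions; the
# example key below is a made-up placeholder in the standard layout.

CRAWL_BUCKET = "commoncrawl"

def warc_urls(path_lines, scheme="https"):
    """Convert relative keys from a crawl's paths listing into URLs."""
    urls = []
    for line in path_lines:
        key = line.strip()
        if not key:
            continue  # skip blank lines in the listing
        if scheme == "s3":
            urls.append(f"s3://{CRAWL_BUCKET}/{key}")
        else:
            urls.append(f"https://data.commoncrawl.org/{key}")
    return urls

# Illustrative key in the usual crawl-data/<crawl-id>/... layout:
paths = ["crawl-data/CC-MAIN-2018-05/segments/example/warc/example-00000.warc.gz"]
print(warc_urls(paths)[0])
```

From there, each WARC file can be streamed and its records scanned one at a time, so no local copy of the whole crawl is ever needed.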
Next – the processing! I wrote some basic code to scan the crawl data and identify potential APIs (the “secret sauce”). Having found API-specific pages, I used another API, plus page metadata, to categorize them (e.g., Finance, Real Estate, Weather). I ran the pipeline on Amazon’s cloud (MapReduce) to process the millions of pages in relatively short order, and made the resulting data searchable through a simple front-end backed by Apache Solr.
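The post doesn’t disclose the actual detection logic, so the keywords, categories, and threshold below are my own stand-ins for the general shape of such a scanner: look for API-ish signals in the page text, and if enough are present, emit a document suitable for indexing.

```python
import re

# Illustrative heuristics only -- the real "secret sauce" is not public.
API_SIGNALS = re.compile(
    r"\b(REST(ful)? API|API documentation|endpoint|api[_-]?key|OAuth)\b",
    re.IGNORECASE,
)

# Hypothetical category keywords; the real system used another API
# plus page metadata for categorization.
CATEGORY_KEYWORDS = {
    "Finance": ("stock", "currency", "exchange rate"),
    "Weather": ("forecast", "temperature", "humidity"),
    "Real Estate": ("listing", "property", "mortgage"),
}

def scan_page(url, text, min_hits=2):
    """Flag a page as a potential API doc and guess a category.

    Returns a dict shaped like a search-index document, or None
    if the page does not look API-related."""
    hits = API_SIGNALS.findall(text)
    if len(hits) < min_hits:
        return None
    lowered = text.lower()
    category = next(
        (cat for cat, words in CATEGORY_KEYWORDS.items()
         if any(w in lowered for w in words)),
        "Uncategorized",
    )
    return {"id": url, "url": url, "category": category}
```

A map step runs `scan_page` over every record; a reduce step collects the surviving documents per category, and the resulting dicts map naturally onto fields in a Solr schema.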
Alas, one thing I had not fully grasped was the market for an API search engine. I talked to John about his experiences with ProgrammableWeb and began to realize the challenges. Search engines in general are hard to monetize, and a niche engine for developers looking for APIs is even harder. Such a site doesn’t draw enough pageviews for generic ads to generate meaningful income; the better path would have been direct campaigns with dev-tool providers, but that requires time and a dedicated sales team that I did not have. It was an interesting product to build technically, but financially it did not make sense, so I decommissioned the site in 2018.
Some of the basic code and experience from this project did make its way into a tool I later built at my job to monitor undocumented API usage by development teams.