OSINT: how to find anything on the internet

"The best place to hide a dead body is page two of Google."

You’ve probably seen this meme floating around the internet. It is an ironic nod to a rather surprising fact: 90 percent of clicks on search engines land on the first page of results.

Is it because we find everything immediately? (lol) Because we don't want to look any further? Lack of time, lack of curiosity? And yet, the web is full of digital rabbit holes that hold unlimited opportunities and endless discoveries that will leave you in awe. That’s where Open Source Intelligence (OSINT) comes in handy. The term refers to information collected from public sources, such as those available on the internet. But just because a source is free or publicly available doesn't mean it is easy to find.

Everyone uses Google, but few use it well

Ah, Google... the universal search tool in Western countries. But while everyone uses or has used it at some point, few know how to use it correctly, or at least efficiently. An example: looking for an email address? Like most internet users, you will opt for one of these three solutions:

1- A classic search such as "Last name + first name + email" and all its variants. For a business email, some will observe the way the address is structured in the company to determine a specific contact. Relatively effective, but not necessarily speedy.

2- A search with a little more thought like "Tool to find email addresses," and in this case, you will certainly run across a service like Hunter.

3- A smart search thanks to Google's operators/commands and wise tricks. If you type the query below and the email you are looking for has been published online, you will find it. Simple, efficient, fast. However, you must know the trick!

Small aside: Is your knowledge of search operators a little outdated? Here are two articles to catch up: this one (the entire blog is actually worth the read) or this one, aimed more at the mainstream public.
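Going back to the first option in the list above (working out how a company structures its addresses), here is a minimal sketch of the idea. The function names, the list of patterns and the example.com domain are my own assumptions for illustration; real companies may use other formats, and the script only produces candidates to search for, it does not verify that any address exists.

```python
def candidate_emails(first: str, last: str, domain: str) -> list[str]:
    """Return common first/last-name address patterns for a domain.

    The patterns below are illustrative guesses, not an exhaustive list.
    """
    f, l = first.strip().lower(), last.strip().lower()
    if not f or not l:
        raise ValueError("first and last name must be non-empty")
    local_parts = [
        f"{f}.{l}",     # jane.doe
        f"{f}{l}",      # janedoe
        f"{f[0]}{l}",   # jdoe
        f"{f[0]}.{l}",  # j.doe
        f"{l}.{f}",     # doe.jane
        f,              # jane
    ]
    return [f"{part}@{domain}" for part in local_parts]


def quoted_search(emails: list[str]) -> str:
    """Join candidates into one exact-match query using the OR operator."""
    return " OR ".join(f'"{email}"' for email in emails)


if __name__ == "__main__":
    guesses = candidate_emails("Jane", "Doe", "example.com")
    print(quoted_search(guesses))
```

Pasting the printed line into a search engine checks all the candidates at once: if one of them has been published online, it should surface.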

On a personal note, these are the operators I use regularly:

To find specific pages within a website (in other words, to use Google to search any other site):

site:reddit.com "social audio"

A variation that I like for photos (also for brands...):

site:facebook.com "image may contain dog"

To find similar sites:

related:substack.com

To target Google Docs on a particular subject (another one of my favorite hobbies):

site:docs.google.com "emerging trends"

For your information, there is a site that serves as a reference for Google Dorks; it can be found here. This term, coined in 2002 by security researcher Johnny Long, refers to queries that aim to reveal security holes.

Nothing can replace a sharp and disciplined mind

Something key to take into account: there are thousands of open-source search engines. The 2020 report produced by specialized consulting firm I-Intelligence (more than 500 pages on the subject) maps only a tiny part of this luxuriant landscape. On the "newsletter search engine" part alone, a good thirty are missing. 

Don't get me wrong: monitoring the sector is certainly a valid practice, but a good OSINT researcher is not the one who scans every tool available on the market, at the risk of drowning in them. What they need is methodological rigor, a strong analytical capacity and a form of creativity (the term "hacking" is often used too). And yes, if you search like everyone else, you will inevitably find the same thing as everyone else. In some cases, that's enough, but if you're working on emerging sectors and trends, you won't hit the bull's eye.

Let's illustrate my point. When I review pitch decks, I examine the competition/market slide thoroughly because I like to understand the specificities of each player. Very often, I dig a little deeper and discover players that have not been mentioned. That is why it is advisable to carry out your competition mapping using OSINT tools. Here again, there are several ways of doing things:

  • If there is an industry body, an association that is a reference in the sector, you can start by looking at the company directory or do some research on Google. That's already good, but it won't necessarily provide visibility for newcomers. 

  • On the OSINT tools side, there are many tools like Startengine, StartupBlink, StartupRanking, etc. Long story short: I've never found a gem this way. Of course, there can be more quality results with sites like Product Hunt, Betalist, Hype Urls, Launching Next, but this is not my first reflex. When I want to map an emerging technology sector, I still believe Twitter is the go-to place.

    I search for keywords related to the sector I'm interested in within Twitter / Instagram bios, thanks to tools like Followerwonk or Searchmybio. Why do I do this? Often, when a company is created, the founders reserve the Twitter / Instagram handle very quickly, sometimes even before the official launch, to be sure it won't be taken.

    If I know the company's website, I will type the URL as it is into the Twitter search engine, or search for the company by name. I do this for two reasons. First, if it's an emerging actor (my favorite subject, weak signals, trends...), I want to know who shared the link, to determine which profiles are interested in the subject; perhaps these people specialize in it and are worth watching closely. There are excellent tools for scraping Twitter profile data, such as Twint, and others to map the influence circles of a given account. Second, listing market players is a recurring practice on Twitter: people often tweet lists, so if you find one, you can easily identify several players at once. Example in the tweets below:

Two important points: if you spot someone who has already published a map, chances are, they are a practitioner of the technique. It is probably worth adding them to a dedicated list on Twitter because they will probably start again. For example, Selfdriving.fyi, Michael Bock’s website, presents itself as the most comprehensive database of companies working on autonomous vehicles and related technologies. Another example is Eric Peckham's public media investor database on Airtable. These initiatives should increase thanks to the development of the creator economy and the trend of low-code/no-code tools that greatly facilitate the creation and updating of databases on certain sectors.

Another very personal way to find inspiration on Twitter: I type the query "who is building" ... which brings out interesting entrepreneurial personalities and start-ups.

Some great tools worth knowing

I recently had an exchange with Stefanie Proto, who I've been following for a while, and with good reason: she's a living encyclopedia of OSINT! As she explains in her Twitter bio, she is "obsessed with niche search engines, search tools and discovering new ways to find information online." I asked her what her favorites are and why. She told me these three: 

  • Usersearch, according to her, an awesome tool to search usernames. It searches several social networking sites, dating sites and forums.  

  • Camhacker and Insecam, webcam search engines that spot unprotected security cameras online, along with many others.

  • Pimeyes and Pictriev, which use facial recognition technology to find similar "faces."

I then asked her what the most unusual niche search engines she encountered were. She mentioned two of them:

  • Lumendatabase, a searchable database of legal complaints and requests for removal of online material ... a good legal resource, but quite hysterical.

  • Millionshort, very useful because it allows the user to highlight uncommon sites that are normally buried in the middle of thousands of more popular ones.

For my part, I'm going to stick to the SOCMINT field, aka social media intelligence. 

  • I'm very interested in private messaging, and in particular in how to detect groups on Telegram and WhatsApp in some of my favorite areas (start-up/trends). I've listed more than a dozen tools for this, which I haven't explored enough yet to recommend any in particular. If the subject interests you, this video should give you some very relevant tips.

  • I recently discovered StartupLynx, which allows you to identify similar start-ups around the world. I spoke with founder Karl Verger, who explained to me that he has been working on this project for two years now (NB: StartupLynx has just opened a beta version with 100,000 companies, but should soon launch a new version with more than 400,000 companies). His goal? To create THE reference in terms of technological and economic intelligence on start-ups. But sourcing so many companies is quite a challenge; qualifying them relies on an AI engine trained on the users' experiences and based on their feedback. The whole thing is built on a Python FastAPI and MongoDB stack, heavily supported by NLP / machine learning techniques. Compared with competitors like Product Hunt, StartupLynx is unique in that it can detect similar start-ups, and so identify the actors who have had a similar idea. The service has been enriched with a presentation of companies through time, whether they are active or dead. It is always interesting to check if an idea has already been launched but not completed. A pitch line, the name of a website and a few keywords are enough to find your way around thanks to the platform's semantic analysis approach.

So, passionate about OSINT? I hope so! I'll leave it there for now and I'll catch up with you soon. Until then, stay curious!

Marie


💎 Snippets & other curiosities


  • 🌟 Post-covid education. Read here

  • 👀 Exploring the dark consequences of our dependence on the Internet: "With attention as the dominant value, all other values are in flux." Read here

  • 👀 Startups, It’s Time to Think Like Camels — Not Unicorns. Read here

  • 📺 Ahrefs, SEO Intelligence platform, offers 5 hours of free courses with all the tips to make your blog a profitable business. Watch here

    This tweet because Wow…