Finding bugs using WayBackMachine


WayBackMachine is an archive of websites containing over 330 billion web pages, all indexed for you to search through! It crawls websites and saves copies of their pages, so you can go back many years and view what they used to look like. The Internet Archive is a non-profit, so if you do discover a finding because of their data then make sure to say thanks and support them! https://archive.org/donate/

What should I look for?

My personal favourite is scraping old robots.txt data. Companies list endpoints they don't want indexed by Google inside this file, and as time goes on they may add and remove endpoints. WayBackMachine has it all archived! Old endpoints which may have been removed in 2018 may still work and contain interesting functionality to play with.

A great tool to automate this process is WayBackRobots by mhmdiaa.
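If you would rather see what is going on under the hood, here is a minimal sketch of the same idea in Python. It is not how WayBackRobots itself is written. It assumes the Wayback CDX search endpoint at web.archive.org/cdx/search/cdx and the `id_` snapshot modifier for fetching the raw capture; the function names are my own and purely illustrative.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wayback CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def extract_paths(robots_txt):
    """Return the set of paths named in Allow/Disallow rules."""
    paths = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() in ("allow", "disallow") and value.strip():
            paths.add(value.strip())
    return paths


def list_robots_snapshots(domain):
    """Ask the CDX API for every distinct robots.txt capture of a domain."""
    query = urllib.parse.urlencode({
        "url": f"{domain}/robots.txt",
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",
        "collapse": "digest",  # skip captures whose content did not change
    })
    with urllib.request.urlopen(f"{CDX_ENDPOINT}?{query}", timeout=30) as resp:
        body = resp.read()
    rows = json.loads(body) if body.strip() else []
    return rows[1:]  # the first row is the column header


def fetch_snapshot(timestamp, original_url):
    """Download one capture; "id_" requests the page without Wayback's rewriting."""
    url = f"https://web.archive.org/web/{timestamp}id_/{original_url}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def historic_robots_paths(domain):
    """Union of every path that ever appeared in the domain's robots.txt."""
    paths = set()
    for timestamp, original_url in list_robots_snapshots(domain):
        paths |= extract_paths(fetch_snapshot(timestamp, original_url))
    return sorted(paths)
```

Calling `historic_robots_paths("example.com")` should hand back a sorted list of paths to probe against the live site. There is no rate limiting or retry logic here, so add a delay between requests if the domain has a lot of captures.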

——

WayBackMachine also has a list of every URL it has archived for your chosen domain. For example, you could search for every URL WayBackMachine has archived for www.google.com. WayBackUrls, again by mhmdiaa, will extract the URLs archived by WayBackMachine for the domain you input.

An alternative written in Go is TomNomNom's WayBackUrls.
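Both tools boil down to a single query against the archive's index. As a rough sketch (my own illustration, not code taken from either tool), the following assumes the Wayback CDX search endpoint and its `url`, `output`, `fl` and `collapse` parameters; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wayback CDX search endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(domain, include_subdomains=True):
    """Build the CDX query that lists every archived URL for a domain."""
    target = f"*.{domain}/*" if include_subdomains else f"{domain}/*"
    params = {
        "url": target,
        "output": "json",
        "fl": "original",      # only return the original URL column
        "collapse": "urlkey",  # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def rows_to_urls(rows):
    """Drop the header row and de-duplicate while keeping first-seen order."""
    seen = set()
    urls = []
    for row in rows[1:]:
        url = row[0]
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


def archived_urls(domain, include_subdomains=True):
    """Fetch and return every URL the archive holds for the domain."""
    query_url = cdx_query_url(domain, include_subdomains)
    with urllib.request.urlopen(query_url, timeout=60) as resp:
        body = resp.read()
    rows = json.loads(body) if body.strip() else []
    return rows_to_urls(rows)
```

For a big target the response can be very large, so expect `archived_urls("example.com")` to take a while, and consider writing the result to a file you can grep through later.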

Manual testing

Since WayBackMachine contains the raw HTML for websites, you can hunt through it manually to extract things such as parameter names and begin testing them. They may still work! I don't know of a tool that will automatically extract parameters from WayBackMachine, but if you know of one feel free to reach out via Twitter! You could, however, have Burp running whilst browsing WayBackMachine and extract the information that way.
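In the absence of a dedicated tool, a few lines of Python get you most of the way. This is only a rough sketch: it assumes you have already saved the archived HTML as a string, and it only looks at query strings in `href`, `src` and `action` attributes plus the `name` attribute of form fields. Parameters built by JavaScript will be missed. The class and function names are made up for this example.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit


class ParamCollector(HTMLParser):
    """Collect parameter names from links, forms and form fields."""

    URL_ATTRS = {"href", "src", "action"}
    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.params = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if not value:
                continue
            if name in self.URL_ATTRS:
                # Pull the keys out of any query string in the URL.
                query = urlsplit(value).query
                for key, _ in parse_qsl(query, keep_blank_values=True):
                    self.params.add(key)
            elif name == "name" and tag in self.FIELD_TAGS:
                # Form fields become parameters when the form is submitted.
                self.params.add(value)


def extract_param_names(html):
    """Return a sorted list of parameter names found in the HTML."""
    collector = ParamCollector()
    collector.feed(html)
    return sorted(collector.params)
```

Feed it each archived page in turn and merge the results into one wordlist to try against the live site.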

Aside from trying to discover old (but new to you!) parameters, you can even try current known parameters on old endpoints archived by WayBackMachine. Remember, developers like to re-use code!
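Mixing the two lists together is easy to script. The sketch below is just one way to do it: it takes old endpoints pulled from the archive and parameter names you already know from the live site, and yields candidate URLs to send through your proxy. The `wbtest` marker value and the function name are placeholders I made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def candidate_urls(old_endpoints, known_params, marker="wbtest"):
    """Yield each old endpoint once per known parameter it does not already use."""
    for endpoint in old_endpoints:
        parts = urlsplit(endpoint)
        existing = parse_qsl(parts.query, keep_blank_values=True)
        existing_keys = {key for key, _ in existing}
        for param in known_params:
            if param in existing_keys:
                continue  # the endpoint already carries this parameter
            query = urlencode(existing + [(param, marker)])
            # Rebuild the URL with the extra parameter and no fragment.
            yield urlunsplit((parts.scheme, parts.netloc, parts.path, query, ""))
```

Using a recognisable marker value makes it easy to spot in responses whether an old endpoint actually reads the parameter.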

——

Final Words

WayBackMachine has a lot more data archived than people realise; it's just about sifting through the data to discover the important parts. To sum up, you are primarily looking for:

  • Old endpoints that may still be live
  • Old parameters that may be re-used
  • Anything old that looks like it may have been interesting! (Creds exposed? Old API keys?)
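
For that last point, a quick regex pass over saved snapshots can flag things worth a closer look. Treat this as a starting sketch only: the two patterns below (the common `AKIA` prefix format for AWS access key IDs and a generic quoted `key = "value"` assignment) are my own picks, will produce false positives, and will miss plenty.

```python
import re

# Assumption: a deliberately small, illustrative pattern set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "quoted_secret_assignment": re.compile(
        r"\b(?:api[_-]?key|secret|passwd|password|token)\b\s*[:=]\s*"
        r"['\"][^'\"]{8,}['\"]",
        re.IGNORECASE,
    ),
}


def find_secrets(text):
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.group(0)))
    return hits
```

Anything it flags still needs checking by hand, and remember that a key found in a years-old snapshot may well have been rotated since.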