Finding bugs using WayBackMachine
WayBackMachine is an archive of websites containing over 330 billion web pages, all indexed for you to search through! It crawls websites and saves copies of them, so you can go back many years and view what a site used to look like. WayBackMachine is run by the Internet Archive, a non-profit, so if their data leads you to a finding, make sure to say thanks and support them! https://archive.org/donate/
What should I look for?
My personal favourite is scraping old robots.txt data. Companies list endpoints they don't want indexed by search engines such as Google in this file, and over time they add and remove entries. WayBackMachine has every version archived! An endpoint that disappeared from robots.txt in 2018 may still work and contain interesting functionality to play with.
A great tool to automate this process is WayBackRobots by mhmdiaa.
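The process WayBackRobots automates can be sketched in Python against WayBackMachine's public CDX API (`https://web.archive.org/cdx/search/cdx`). This is a minimal sketch under stated assumptions, not the tool's actual code: the `filter=statuscode:200` and `collapse=digest` choices, the `id_` snapshot URL form (which asks for the raw archived file without the WayBackMachine toolbar) and all function names are illustrative, and `example.com` is a placeholder target. Parsing is kept separate from fetching so it can be checked offline.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def robots_snapshots_query(domain: str) -> str:
    """Build a CDX query listing archived copies of a domain's robots.txt."""
    params = {
        "url": f"{domain}/robots.txt",
        "output": "json",
        "fl": "timestamp,original",
        # Assumption: only successful captures are useful here.
        "filter": "statuscode:200",
        # Assumption: collapsing on the content digest keeps one row per distinct file body.
        "collapse": "digest",
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def snapshot_url(timestamp: str, original: str) -> str:
    """URL of one archived copy; 'id_' asks for the raw file without the Wayback toolbar."""
    return f"https://web.archive.org/web/{timestamp}id_/{original}"


def extract_robots_paths(robots_txt: str) -> set[str]:
    """Return every non-empty path listed on an Allow or Disallow line."""
    paths = set()
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() in ("allow", "disallow") and value.strip():
            paths.add(value.strip())
    return paths


def historic_robots_paths(domain: str) -> set[str]:
    """Union of paths across every archived robots.txt version (needs network access)."""
    with urllib.request.urlopen(robots_snapshots_query(domain), timeout=30) as resp:
        body = resp.read().decode("utf-8")
    rows = json.loads(body) if body.strip() else []
    paths = set()
    for timestamp, original in rows[1:]:  # rows[0] is the header row
        with urllib.request.urlopen(snapshot_url(timestamp, original), timeout=30) as snap:
            text = snap.read().decode("utf-8", errors="replace")
        paths |= extract_robots_paths(text)
    return paths
```

Calling `historic_robots_paths("example.com")` would return every path the site has ever listed, and each one is a candidate to request on the live site. The sketch fetches snapshots one after another with no rate limiting, so be gentle with a free service.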
WayBackMachine also keeps a list of every URL it has archived for a given domain. For example, you could list every URL it has archived for www.google.com. WayBackUrls, again by mhmdiaa, extracts the URLs archived by WayBackMachine for the domain you input.
An alternative written in Go is TomNomNom's WayBackUrls.
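Tools like these generally work by querying WayBackMachine's public CDX API. Below is a minimal Python sketch of such a query, not the code of either tool: the `matchType`, `fl` and `collapse` choices are assumptions about what a hunter typically wants (one row per unique URL, optionally including subdomains), the function names are hypothetical, and `example.com` is a placeholder.

```python
import json
import urllib.parse
import urllib.request

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def archived_urls_query(domain: str, include_subdomains: bool = True) -> str:
    """Build a CDX query for every URL archived for a domain."""
    params = {
        "url": domain,
        # "domain" also matches subdomains; "host" matches the exact host only.
        "matchType": "domain" if include_subdomains else "host",
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",  # one row per unique (normalised) URL
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def parse_cdx_json(body: str) -> list[str]:
    """Turn a CDX JSON response (header row first) into a flat list of URLs."""
    if not body.strip():
        return []
    rows = json.loads(body)
    return [row[0] for row in rows[1:]]


def fetch_archived_urls(domain: str, include_subdomains: bool = True) -> list[str]:
    """Single request with no paging, so very large domains may time out (needs network)."""
    query = archived_urls_query(domain, include_subdomains)
    with urllib.request.urlopen(query, timeout=120) as resp:
        return parse_cdx_json(resp.read().decode("utf-8"))
```

Keeping `parse_cdx_json` separate from the network call means the parsing can be reused on a response you saved to disk earlier.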
Since WayBackMachine stores the raw HTML of archived pages, you can hunt through it manually to extract things such as parameter names and then test them. They may still work! I don't know of a tool that automatically extracts parameter values from WayBackMachine, so if you know of one, feel free to reach out via Twitter! You could, however, keep Burp running while browsing WayBackMachine and pull the information out of Burp's proxy history.
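Part of that manual harvesting can be scripted. A minimal sketch, assuming you already have a list of archived URLs (for example gathered with WayBackUrls) and some raw archived HTML: it collects query-string keys from the URLs and the `name` attributes of `<input>`, `<select>` and `<textarea>` elements from the HTML, using only Python's standard library. The function and class names are hypothetical, it collects parameter names rather than values, and it is no substitute for reading the pages yourself.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qsl, urlsplit


def params_from_urls(urls: list[str]) -> set[str]:
    """Collect query-string parameter names seen in archived URLs."""
    names = set()
    for url in urls:
        query = urlsplit(url).query
        names.update(key for key, _ in parse_qsl(query, keep_blank_values=True))
    return names


class FormFieldCollector(HTMLParser):
    """Record the name attribute of form fields found in a page."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self) -> None:
        super().__init__()
        self.names: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; values may be None.
        if tag in self.FIELD_TAGS:
            for attr, value in attrs:
                if attr == "name" and value:
                    self.names.add(value)


def params_from_html(html: str) -> set[str]:
    """Return the form field names declared in a chunk of archived HTML."""
    collector = FormFieldCollector()
    collector.feed(html)
    collector.close()
    return collector.names
```

Merging the two sets gives a wordlist of parameter names to try against the live site.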
Aside from discovering old (but new to you!) parameters, you can also try currently known parameters on old endpoints archived by WayBackMachine. Remember, developers like to reuse code!
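Trying current parameters on old endpoints boils down to building candidate requests. A minimal sketch, assuming `old_endpoints` comes from WayBackMachine and `known_params` from your testing of the live site; the function name and the placeholder value `test` are hypothetical. It only builds the URLs, and sending them (for example through Burp) is left to you.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def candidate_requests(old_endpoints, known_params, value="test"):
    """Yield each old endpoint with one known parameter added at a time."""
    for endpoint in old_endpoints:
        parts = urlsplit(endpoint)
        existing = parse_qsl(parts.query, keep_blank_values=True)
        existing_names = {name for name, _ in existing}
        for param in sorted(known_params):
            if param in existing_names:
                continue  # the endpoint already uses this parameter
            query = urlencode(existing + [(param, value)])
            yield urlunsplit(parts._replace(query=query))
```

Adding one parameter per request keeps it obvious which parameter changed the response.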
WayBackMachine has far more data archived than people realise; it's just a matter of sifting through it to find the important parts. To sum up, you are primarily looking for:
- Old endpoints that may still be live
- Old parameters that may be re-used
- Anything old that looks interesting! (Exposed creds? Old API keys?)
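A quick first pass over a large list of archived URLs (for example from WayBackUrls) is to flag the ones that hint at something sensitive. A minimal sketch: the keyword pattern and extension list are illustrative assumptions rather than a complete set, the function name is hypothetical, and every hit still needs a manual look and a check that it is still live.

```python
import re
from urllib.parse import urlsplit

# Assumption: illustrative patterns only; extend them for your target.
SENSITIVE_QUERY = re.compile(r"(api[_-]?key|token|secret|passw(or)?d|auth)", re.IGNORECASE)
INTERESTING_EXTENSIONS = (".bak", ".old", ".sql", ".zip", ".env", ".log", ".json", ".config")


def flag_interesting(urls):
    """Return (url, reason) pairs for archived URLs worth a closer look."""
    hits = []
    for url in urls:
        parts = urlsplit(url)
        if SENSITIVE_QUERY.search(parts.query):
            hits.append((url, "sensitive-looking parameter"))
        elif parts.path.lower().endswith(INTERESTING_EXTENSIONS):
            hits.append((url, "interesting file extension"))
    return hits
```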