Finding bugs using WayBackMachine
WayBackMachine is an archive of websites containing over 330 billion web pages, all indexed for you to search through! It scrapes websites and saves copies of them, so you can go back many years and view what they used to look like. The Internet Archive is a non-profit, so if you do make a finding because of their data, make sure to say thanks and support them! https://archive.org/donate/
What should I look for?
My personal favourite is scraping old robots.txt data. Companies will list endpoints they don't want indexed by Google inside this file, and as time goes on they may add and remove endpoints. WayBackMachine has it all archived! Old endpoints which may have been removed in 2018 may still work and contain interesting functionality to play with.
One bug bounty program listed almost every endpoint on their website in robots.txt. Using WayBackMachine I was able to gather the last 7 years' worth of data and increased my recon from 400 endpoints to 7000+. This helped me net a LOT of bugs and bounties!
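Here is a minimal sketch of how gathering old robots.txt entries could be scripted. It assumes the Internet Archive's public CDX endpoint and the `id_` raw-capture URL form; the function names are my own, and a real run should add rate limiting and error handling.

```python
import json
import urllib.parse
import urllib.request

CDX_API = "https://web.archive.org/cdx/search/cdx"


def robots_snapshot_query(domain):
    """Build a CDX query URL listing archived robots.txt captures for a domain."""
    params = {
        "url": f"{domain}/robots.txt",
        "output": "json",
        "fl": "timestamp,original",
        "filter": "statuscode:200",
        "collapse": "digest",  # skip consecutive captures with identical content
    }
    return CDX_API + "?" + urllib.parse.urlencode(params)


def parse_robots_paths(robots_text):
    """Return the set of paths named in Allow/Disallow lines."""
    paths = set()
    for line in robots_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("allow", "disallow"):
            value = value.strip()
            if value:  # an empty Disallow names no path
                paths.add(value)
    return paths


def fetch_text(url):
    """Download a URL and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def collect_archived_paths(domain):
    """Merge the paths from every archived robots.txt capture (network access)."""
    body = fetch_text(robots_snapshot_query(domain))
    rows = json.loads(body) if body.strip() else []
    paths = set()
    for timestamp, original in rows[1:]:  # first row is the field header
        # "id_" after the timestamp requests the raw capture without the toolbar.
        snapshot = f"https://web.archive.org/web/{timestamp}id_/{original}"
        paths |= parse_robots_paths(fetch_text(snapshot))
    return paths
```

Calling `collect_archived_paths("example.com")` would return one merged set of paths across all captures, which you can then diff against the live robots.txt.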
WayBackMachine also has a list of every endpoint it has scraped for your chosen domain. For example you could search for every endpoint WayBackMachine has archived for www.google.com. Below are some tools to help automate this process.
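If you want to pull that list yourself, the sketch below shows one way to do it, again assuming the public CDX endpoint. `archived_urls_query` and `unique_endpoints` are illustrative names, and the response handling assumes the JSON output's first row is a header.

```python
import json
import urllib.parse


def archived_urls_query(domain, limit=None):
    """Build a CDX query URL for every archived URL under a domain."""
    params = {
        "url": domain,
        "matchType": "domain",  # include subdomains as well as the host itself
        "output": "json",
        "fl": "original",
        "collapse": "urlkey",  # one row per unique URL
    }
    if limit is not None:
        params["limit"] = str(limit)
    return "https://web.archive.org/cdx/search/cdx?" + urllib.parse.urlencode(params)


def unique_endpoints(cdx_json_text):
    """Reduce a CDX JSON response to a sorted list of unique URL paths."""
    rows = json.loads(cdx_json_text) if cdx_json_text.strip() else []
    paths = set()
    for row in rows[1:]:  # first row is the field header
        path = urllib.parse.urlsplit(row[0]).path or "/"
        paths.add(path)
    return sorted(paths)
```

Fetch the URL from `archived_urls_query("www.google.com", limit=1000)` with any HTTP client and pass the response body to `unique_endpoints`. Large domains return a lot of rows, so start with a small `limit`.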
This next approach does require some manual testing, as I don't know of a tool that does this yet. WayBackMachine contains the raw HTML for websites; for example, this means the last 7 years' worth of HTML for https://www.hackerone.com/ is available. You can begin hunting through it manually to extract things such as parameter names used on each page. For example, you may discover an old input, <input type='hidden' name='ref' value=''>, that was used 3 years ago. The parameter isn't in the HTML today, however the code for it may still be there! Try this old parameter and see if it still works.
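Part of this manual hunt can be scripted. The sketch below pulls the `name` attribute from form fields in an archived page and reports the names that no longer appear in the current page. It uses only Python's standard `html.parser`; the helper names are hypothetical, and it only sees parameters declared in form markup, not ones built in JavaScript.

```python
from html.parser import HTMLParser


class ParamNameParser(HTMLParser):
    """Collect the name attribute of form fields in an HTML document."""

    FIELD_TAGS = ("input", "select", "textarea", "button")

    def __init__(self):
        super().__init__()
        self.names = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELD_TAGS:
            name = dict(attrs).get("name")
            if name:
                self.names.add(name)


def param_names(html):
    """Return the set of form field names found in an HTML string."""
    parser = ParamNameParser()
    parser.feed(html)
    parser.close()
    return parser.names


def forgotten_params(old_html, current_html):
    """Names present in the archived page but missing from the current one."""
    return param_names(old_html) - param_names(current_html)
```

Feed it the raw HTML of an old capture and of the live page; whatever comes back is a list of parameters worth trying by hand.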
Aside from trying to discover old (but new to you!) parameters, you can even try current known parameters on old endpoints archived by WayBackMachine. Remember, developers like to re-use code!
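One rough way to line up those combinations is to generate a candidate URL for every old endpoint and known parameter pair. This is only a sketch: `candidate_urls` and the `probe_value` placeholder are my own inventions, it assumes the old paths carry no query string of their own, and you should only request the results against targets you are authorised to test.

```python
from itertools import product
from urllib.parse import urlencode, urljoin


def candidate_urls(base_url, old_paths, known_params, probe_value="test"):
    """Pair every archived path with every known parameter name."""
    urls = []
    for path, param in product(sorted(old_paths), sorted(known_params)):
        query = urlencode({param: probe_value})
        urls.append(urljoin(base_url, path) + "?" + query)
    return urls
```

For instance, `candidate_urls("https://example.com", ["/old/login"], ["ref"])` yields a single URL, `https://example.com/old/login?ref=test`.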
WayBackMachine has a lot more data archived than people realise; it's just about sifting through the data to discover the important parts. To sum up, you are primarily looking for:
- Old endpoints that may still work. Old code = bugs!
- Old parameters that may be re-used
- Old API keys referenced
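For the last item, a simple pattern scan over archived HTML or JavaScript can surface strings shaped like keys. The two patterns below are illustrative assumptions based on commonly cited key formats (Google API keys and AWS access key IDs), not a complete list, and a match is only a candidate until you verify it.

```python
import re

# Illustrative patterns only; extend the table for the providers you care about.
KEY_PATTERNS = {
    "google_api_key": re.compile(r"AIza[0-9A-Za-z_\-]{35}"),
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def find_key_candidates(text):
    """Return sorted (label, match) pairs for strings that look like API keys."""
    hits = set()
    for label, pattern in KEY_PATTERNS.items():
        for match in pattern.finditer(text):
            hits.add((label, match.group(0)))
    return sorted(hits)
```

Run `find_key_candidates` over each archived capture you download and review the hits by hand.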