In this methodology we assume that you are going to attack a single domain (or subdomain) and only that. So, you should apply this methodology to each discovered domain, subdomain or IP with an undetermined web server inside the scope.
Start by identifying the technologies used by the web server. Look for tricks to keep in mind during the rest of the test if you can successfully identify the tech.
Any known vulnerability in the version of the technology?
Using any well-known tech? Any useful trick to extract more information?
Any specialised scanner to run (like wpscan)?
Launch general-purpose scanners. You never know whether they are going to find something or some interesting information.
Start with the initial checks: robots, sitemap, 404 error and SSL/TLS scan (if HTTPS).
Start spidering the web page: It's time to find all the possible files, folders and parameters being used. Also, check for special findings.
Note that any time a new directory is discovered during brute-forcing or spidering, it should be spidered.
Directory Brute-Forcing: Try to brute force all the discovered folders searching for new files and directories.
Note that any time a new directory is discovered during brute-forcing or spidering, it should be brute-forced.
Backups checking: Test whether you can find backups of discovered files by appending common backup extensions.
Brute-Force parameters: Try to find hidden parameters.
Once you have identified all the possible endpoints accepting user input, check them for all kinds of related vulnerabilities.
Check if there are known vulnerabilities for the server version that is running.
The HTTP headers and cookies of the response can be very useful to identify the technologies and/or versions being used. An Nmap scan can identify the server version, but the tools whatweb, webtech or https://builtwith.com/ can also be useful.
Take into account that the same domain can be using different technologies on different ports, folders and subdomains.
If the web application is using any well-known tech/platform listed before or any other, don't forget to search the Internet for new tricks (and let me know!).
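As a rough illustration of header-based fingerprinting, the sketch below uses only the Python standard library to GET a URL and turn a few well-known headers and session-cookie names into hints. The header set and the cookie-to-technology table (`COOKIE_HINTS`) are assumptions chosen for illustration, not an exhaustive mapping, and `tech_hints`/`fetch_headers` are hypothetical helper names.

```python
import urllib.error
import urllib.request

# Headers that frequently disclose the server or framework (often with a version).
VERSION_HEADERS = {"server", "x-powered-by", "x-aspnet-version", "x-aspnetmvc-version", "x-generator"}

# Session cookie names conventionally set by some well-known stacks (heuristic only).
COOKIE_HINTS = {
    "PHPSESSID": "PHP",
    "JSESSIONID": "Java servlet container",
    "ASP.NET_SessionId": "ASP.NET",
    "laravel_session": "Laravel",
}


def tech_hints(headers):
    """Turn (name, value) response header pairs into human-readable technology hints."""
    hints = []
    for name, value in headers:
        lowered = name.lower()
        if lowered in VERSION_HEADERS:
            hints.append(f"{name}: {value}")
        elif lowered == "set-cookie":
            # The cookie name is everything before the first "=" of the first attribute.
            cookie_name = value.split(";", 1)[0].split("=", 1)[0].strip()
            if cookie_name in COOKIE_HINTS:
                hints.append(f"cookie {cookie_name} -> {COOKIE_HINTS[cookie_name]}")
    return hints


def fetch_headers(url):
    """GET url and return its response headers, also for 4xx/5xx responses."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.getheaders()
    except urllib.error.HTTPError as err:
        return list(err.headers.items())
```

For example, `for hint in tech_hints(fetch_headers("https://target.example/")): print(hint)` (the host is a placeholder). Treat the output as hints to confirm with whatweb or webtech, since headers can be stripped or spoofed.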
Source Code Review
If the source code of the application is available on GitHub, apart from performing a white-box test of the application yourself, there is some information that could be useful for the current black-box testing:
Is there a Change-log or Readme or Version file or anything with version info accessible via web?
How and where are the credentials saved? Is there any (accessible?) file with credentials (usernames or passwords)?
Are passwords stored in plain text, encrypted, or hashed, and with which algorithm?
Is it using any master key for encrypting something? Which algorithm is used?
Can you access any of these files exploiting some vulnerability?
Is there any interesting information in the GitHub issues (solved and unsolved)? Or in the commit history (maybe a password introduced in an old commit)?
At this point you should already have some information about the web server being used by the client (if any data is given) and some tricks to keep in mind during the test. If you are lucky, you have even found a CMS and run a scanner against it.
Step-by-step Web Application Discovery
From this point we are going to start interacting with the web application.
Default pages with interesting info:
Also check the comments in the main and secondary pages.
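For the robots.txt and sitemap.xml checks, a minimal Python sketch could pull the listed paths out of both files. The helper names (`robots_paths`, `sitemap_urls`, `fetch`) are hypothetical, and the parsing is deliberately simple: it collects Allow, Disallow and Sitemap values, and every `<loc>` element regardless of namespace. The standard ElementTree parser is not hardened against malicious XML, so something like defusedxml may be preferable against hostile targets.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET


def robots_paths(robots_txt):
    """Collect the values of Allow/Disallow/Sitemap lines from a robots.txt body."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        if field.strip().lower() in ("allow", "disallow", "sitemap") and value.strip():
            paths.append(value.strip())
    return paths


def sitemap_urls(sitemap_xml):
    """Return every <loc> value from a sitemap or sitemap index."""
    root = ET.fromstring(sitemap_xml)
    return [
        el.text.strip()
        for el in root.iter()
        if el.tag.rsplit("}", 1)[-1] == "loc" and el.text
    ]


def fetch(url):
    """Return the raw body of url, or None on an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read()
    except urllib.error.HTTPError:
        return None
```

A typical run would be `robots_paths(fetch(base + "/robots.txt").decode())` followed by `sitemap_urls(fetch(base + "/sitemap.xml"))`, checking for `None` first; disallowed entries are often the most interesting ones to visit manually.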
Web servers may behave unexpectedly when weird data is sent to them. This may open vulnerabilities or disclose sensitive information.
Access fake pages like /whatever_fake.php (.aspx, .html, etc.)
Add "[]", "]]", and "[[" to cookie values and parameter values to create errors
Generate errors by appending /~randomthing/%s to the end of the URL
Try different HTTP verbs such as PATCH or DEBUG, or invalid ones such as FAKE
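The error-provoking checks above can be turned into a list of requests to replay and compare. This Python sketch only builds the probes and sends one at a time with `http.client`; the parameter name `id`, the cookie name `session` and the helper names are hypothetical defaults you would replace with names actually seen on the target. Certificate verification is disabled in `send_probe` on the assumption that test targets often use self-signed certificates.

```python
import http.client
import ssl
import urllib.parse

BREAKERS = ["[]", "]]", "[["]            # markers that tend to break parsers
FAKE_EXTENSIONS = [".php", ".aspx", ".html"]
ODD_METHODS = ["PATCH", "DEBUG", "FAKE"]


def build_probes(path="/", param="id", cookie="session"):
    """Return (method, request_target, headers) tuples for the error checks."""
    probes = [("GET", f"/whatever_fake{ext}", {}) for ext in FAKE_EXTENSIONS]
    for marker in BREAKERS:
        query = urllib.parse.urlencode({param: marker})
        probes.append(("GET", f"{path}?{query}", {}))
        probes.append(("GET", path, {"Cookie": f"{cookie}={marker}"}))
    probes.append(("GET", path.rstrip("/") + "/~randomthing/%s", {}))
    probes.extend((method, path, {}) for method in ODD_METHODS)
    return probes


def send_probe(host, method, target, headers, port=443):
    """Send one probe over HTTPS and return (status, first 4 KiB of the body)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    conn = http.client.HTTPSConnection(host, port, timeout=10, context=context)
    try:
        conn.request(method, target, headers=headers)
        response = conn.getresponse()
        return response.status, response.read(4096)
    finally:
        conn.close()
```

Looping over `build_probes()` and printing any status codes or bodies that differ from a normal page (stack traces, framework names, full paths) is usually enough to spot verbose error handling.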
Launch some kind of spider inside the web application. The goal of the spider is to find as many paths as possible in the tested application. Therefore, both web crawling and external sources should be used to find as many valid paths as possible.
gospider (go): HTML spider, LinkFinder in JS files and external sources (Archive.org, CommonCrawl.org, VirusTotal.com, AlienVault.com).
hakrawler (go): HTML spider, with LinkFinder for JS files and Archive.org as external source.
dirhunt (python): HTML spider, also indicates "juicy files".
evine (go): Interactive CLI HTML spider. It also searches in Archive.org.
meg (go): This tool isn't a spider but it can be useful. You can just indicate a file with hosts and a file with paths and meg will fetch each path on each host and save the response.
urlgrab (go): HTML spider with JS rendering capabilities. However, it looks unmaintained: the precompiled version is old and the current code doesn't compile.
gau (go): HTML spider that uses external providers (wayback, otx, commoncrawl).
ParamSpider: This script finds URLs with parameters and lists them.
galer (go): HTML spider with JS rendering capabilities.
LinkFinder (python): HTML spider with JS beautify capabilities, capable of searching for new paths in JS files. It may also be worth taking a look at JSScanner, which is a wrapper of LinkFinder.
relative-url-extractor (ruby): Given an (HTML) file, it extracts URLs from it using nifty regular expressions to find the relative URLs in ugly (minified) files.
JSFScan (bash, several tools): Gather interesting information from JS files using several tools.
page-fetch (go): Loads a page in a headless browser and prints out all the URLs requested while loading the page.
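To show what the core of these spiders does, here is a small Python sketch that extracts same-origin links from an HTML page and candidate paths from JS source. The JS regular expression is a simplified pattern written for this example (an assumption, not LinkFinder's actual regex): it only looks at quoted strings starting with `/`, `./` or `../`, or ending in a few common extensions. The function names are hypothetical.

```python
import re
import urllib.parse
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href/src/action attribute values from an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src", "action") and value:
                self.links.append(value)


# Simplified, LinkFinder-inspired pattern: quoted absolute/relative paths or file names.
JS_PATH_RE = re.compile(
    r"""["'](/[^"'\s<>]*|\.{1,2}/[^"'\s<>]+|[\w\-/]+\.(?:php|aspx?|jsp|json|js|html?)(?:\?[^"'\s]*)?)["']"""
)


def same_origin_links(base_url, html):
    """Resolve links against base_url, drop fragments, keep only same-host URLs."""
    parser = LinkCollector()
    parser.feed(html)
    origin = urllib.parse.urlsplit(base_url).netloc
    found = set()
    for link in parser.links:
        absolute = urllib.parse.urldefrag(urllib.parse.urljoin(base_url, link))[0]
        if urllib.parse.urlsplit(absolute).netloc == origin:
            found.add(absolute)
    return sorted(found)


def js_paths(js_source):
    """Return the unique candidate paths found in quoted JS strings."""
    return sorted(set(JS_PATH_RE.findall(js_source)))
```

A real spider would feed every newly found same-origin URL back into a queue (and every JS file into `js_paths`) until nothing new appears; the dedicated tools above add external sources and JS rendering on top of this loop.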
Brute Force directories and files
Start brute-forcing from the root folder and be sure to brute-force all the directories found using this method and all the directories discovered by the spidering (you can do this brute-forcing recursively, prepending the names of the found directories to the wordlist being used).
Dirb / Dirbuster - Included in Kali, old (and slow) but functional. Allows self-signed certificates and recursive search. Too slow compared with the other options.
Dirsearch (python): It doesn't allow self-signed certificates but allows recursive search.
Gobuster (go): It allows self-signed certificates; it doesn't have recursive search.
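The recursive strategy described above can be sketched as a breadth-first loop. In this Python sketch the request function is injected (`status_of(url) -> int`) so the logic can be exercised offline; `http_status` is one possible real implementation. Treating extension-less hits as directories and using 404 as the only "not found" signal are simplifying assumptions: servers that return soft 404s need a smarter filter (for example, comparing response sizes).

```python
import urllib.error
import urllib.request
from collections import deque
from urllib.parse import urljoin


def brute_force(base_url, words, status_of, max_depth=2):
    """Breadth-first, recursive directory/file brute force; returns {url: status}."""
    found = {}
    queue = deque([(base_url.rstrip("/") + "/", 0)])
    while queue:
        directory, depth = queue.popleft()
        for word in words:
            url = urljoin(directory, word)
            status = status_of(url)
            if status == 404:
                continue
            found[url] = status
            # Heuristic: words without an extension are treated as directories.
            if "." not in word and depth < max_depth:
                queue.append((url.rstrip("/") + "/", depth + 1))
    return found


def http_status(url):
    """Real status_of implementation: GET the URL and return the final status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

For example, `brute_force("https://target.example", wordlist, http_status)` with a placeholder host and your own wordlist. Unlike the tools above it is sequential, so it is only meant to show the recursion, not to replace them.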
File Backups: Once you have found all the files, look for backups of all the executable files (".php", ".aspx"...). Common variations for naming a backup are: file.ext~, #file.ext#, ~file.ext, file.ext.bak, file.ext.tmp, file.ext.old, file.bak, file.tmp and file.old. You can also use the tool bfac.
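The backup naming patterns listed above are easy to generate for every discovered file. In this Python sketch (`backup_candidates` is a hypothetical name) the results are percent-encoded, because a literal `#` in a URL would otherwise be treated as the start of a fragment and never reach the server.

```python
import posixpath
from urllib.parse import quote


def backup_candidates(path):
    """Return URL-safe paths for the common backup variants of a discovered file."""
    directory, filename = posixpath.split(path)
    stem, _ext = posixpath.splitext(filename)
    names = [
        f"{filename}~", f"#{filename}#", f"~{filename}",
        f"{filename}.bak", f"{filename}.tmp", f"{filename}.old",
        f"{stem}.bak", f"{stem}.tmp", f"{stem}.old",
    ]
    # quote() keeps "/" and "~" but turns "#" into %23.
    return [quote(posixpath.join(directory, name)) for name in names]
```

Each returned path can then be requested against the target; any non-404 answer, especially one served as plain text or as a download, deserves a manual look.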
Discover new parameters: You can use tools like Arjun, parameth, x8 and Param Miner to discover hidden parameters. If you can, try to search for hidden parameters on each executable web file.
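The usual idea behind these tools is to send many candidate parameters at once and narrow down the batches that change the response. The Python sketch below implements that batching plus binary-search idea in a simplified form. It assumes a parameter is "interesting" when the (status, body length) signature differs from a baseline, which breaks on pages with dynamic content and misses parameters that only matter in combination. `find_hidden_params`, `make_get_fetcher` and the value `xyz123` are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def find_hidden_params(words, fetch, value="xyz123", batch_size=32):
    """fetch(params: dict) -> comparable signature; return params that change it."""
    baseline = fetch({})

    def changes(batch):
        return fetch({name: value for name in batch}) != baseline

    def narrow(batch):
        # Split a reacting batch in half until single parameters remain.
        if not changes(batch):
            return []
        if len(batch) == 1:
            return list(batch)
        mid = len(batch) // 2
        return narrow(batch[:mid]) + narrow(batch[mid:])

    found = []
    for start in range(0, len(words), batch_size):
        found.extend(narrow(words[start:start + batch_size]))
    return found


def make_get_fetcher(url):
    """Build a fetch() for a URL that has no query string yet."""
    def fetch(params):
        target = url + ("?" + urllib.parse.urlencode(params) if params else "")
        try:
            with urllib.request.urlopen(target, timeout=10) as resp:
                return resp.status, len(resp.read())
        except urllib.error.HTTPError as err:
            return err.code, len(err.read())
    return fetch
```

Usage would look like `find_hidden_params(wordlist, make_get_fetcher("https://target.example/page.php"))` with a placeholder URL; batching keeps the request count far below one request per word when most candidates have no effect.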
Comments: Check the comments of all the files, you can find credentials or hidden functionality.
If you are playing a CTF, a "common" trick is to hide information inside comments to the right of the page (using hundreds of spaces so you don't see the data if you open the source code with the browser). Another possibility is to use several new lines and hide information in a comment at the bottom of the web page.
If you find a .env file, information such as API keys, database passwords and other secrets can be found.
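Because whitespace padding only hides comments visually, extracting them programmatically defeats the trick. This Python sketch (hypothetical names) collects every HTML comment and collapses its whitespace so padded content shows up on one short line; it only covers HTML comments, not `//` or `/* */` comments inside JS or CSS.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """Collect HTML comments with their whitespace collapsed."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # Runs of spaces/newlines used to push a comment off-screen are collapsed.
        text = " ".join(data.split())
        if text:
            self.comments.append(text)


def html_comments(html):
    parser = CommentCollector()
    parser.feed(html)
    parser.close()
    return parser.comments
```

Running `html_comments()` over every page saved during spidering and grepping the result for words like "password", "todo" or "admin" is a quick way to cover this check.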
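When a .env file is exposed, a quick pass over its KEY=VALUE lines helps to spot the secrets. This Python sketch is a simplified reader (not a full dotenv parser: no multi-line values or variable expansion) and the keyword regex `SECRET_HINT` is an assumption about typical variable names, so review the whole file anyway.

```python
import re

# Key-name fragments that commonly indicate secrets (heuristic, adjust as needed).
SECRET_HINT = re.compile(r"KEY|SECRET|TOKEN|PASS|PWD|CREDENTIAL|DATABASE_URL|DSN", re.IGNORECASE)


def interesting_env_entries(env_text):
    """Return {key: value} for non-empty entries whose key looks secret-related."""
    entries = {}
    for raw in env_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, value = line.split("=", 1)
        key, value = key.strip(), value.strip().strip("'\"")
        if SECRET_HINT.search(key) and value:
            entries[key] = value
    return entries
```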
If you find API endpoints you should also test them. These aren't files, but will probably "look like" them.
JS files: The spidering section mentioned several tools that can extract paths from JS files. It would also be interesting to monitor each JS file found, as on some occasions a change may indicate that a potential vulnerability was introduced in the code. You could use, for example, JSMon.
You should also check discovered JS files with RetireJS or JSHole to find out whether they are vulnerable.
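The monitoring idea can be reduced to hashing each JS file and comparing against the previous run. This Python sketch is not how JSMon is implemented; it simply stores SHA-256 digests in a JSON state file (the file name and `changed_js_files` are hypothetical) and reports URLs that are new or whose content changed.

```python
import hashlib
import json
from pathlib import Path


def changed_js_files(contents, state_file):
    """contents: {url: body_bytes}. Return new/changed URLs and persist the digests."""
    path = Path(state_file)
    previous = json.loads(path.read_text()) if path.exists() else {}
    current = {url: hashlib.sha256(body).hexdigest() for url, body in contents.items()}
    changed = sorted(url for url, digest in current.items() if previous.get(url) != digest)
    # Merge so URLs not fetched this run keep their last known digest.
    path.write_text(json.dumps({**previous, **current}, indent=2))
    return changed
```

Run periodically (for example from cron) with freshly downloaded JS bodies, and diff any reported file against an archived copy to see what was added.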
If any page responds with that code, it's probably a misconfigured proxy. If you send an HTTP request like GET https://google.com HTTP/1.1 (with the Host header and other common headers), the proxy will try to access google.com and you will have found an SSRF.
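Browsers and most HTTP libraries never send a full URL in the request line to a normal server, so this test is easiest with a raw socket. The Python sketch below builds the absolute-form request (the form RFC 7230 section 5.3.2 reserves for requests to proxies) and sends it. The User-Agent value and the helper names are assumptions, and TLS verification is disabled on the assumption that the target may use a self-signed certificate.

```python
import socket
import ssl


def build_absolute_form_request(target_url, host_header):
    """GET request whose request-target is a full URL instead of a path."""
    return (
        f"GET {target_url} HTTP/1.1\r\n"
        f"Host: {host_header}\r\n"
        "User-Agent: Mozilla/5.0\r\n"
        "Accept: */*\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode()


def send_raw(server, port, payload, use_tls=False, timeout=10):
    """Send raw bytes to server:port and return the full response."""
    sock = socket.create_connection((server, port), timeout=timeout)
    if use_tls:
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
        sock = context.wrap_socket(sock, server_hostname=server)
    with sock:
        sock.sendall(payload)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

If `send_raw("target.example", 80, build_absolute_form_request("https://google.com", "google.com"))` (placeholder host) returns Google's content instead of the target's own page, the server is forwarding requests for you.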
NTLM Authentication - Info disclosure
If the running server asking for authentication is Windows, or you find a login asking for your credentials (and asking for a domain name), you can provoke an information disclosure.
Send the header "Authorization: NTLM TlRMTVNTUAABAAAAB4IIAAAAAAAAAAAAAAAAAAAAAAA=" and, due to how NTLM authentication works, the server will respond with internal info (IIS version, Windows version...) inside the "WWW-Authenticate" header.
You can automate this using the nmap script "http-ntlm-info.nse".
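To see where the leaked data comes from, this Python sketch rebuilds the NEGOTIATE (type 1) message sent in the header above and decodes the CHALLENGE (type 2) message returned in WWW-Authenticate, following the field layout of Microsoft's MS-NLMP specification. It only extracts the TargetInfo name fields and the OS version (major.minor.build); the helper names are hypothetical and the parser does no validation beyond the signature and message type.

```python
import base64
import re
import struct
import urllib.error
import urllib.request

SIGNATURE = b"NTLMSSP\x00"
NEGOTIATE_VERSION = 0x02000000  # NTLMSSP_NEGOTIATE_VERSION flag
# AV_PAIR ids carried in the CHALLENGE TargetInfo field.
AV_NAMES = {1: "NetBIOS computer", 2: "NetBIOS domain", 3: "DNS computer",
            4: "DNS domain", 5: "DNS tree"}


def negotiate_header(flags=0x00088207):
    """Minimal NEGOTIATE message with empty domain/workstation buffers."""
    message = SIGNATURE + struct.pack("<II", 1, flags) + b"\x00" * 16
    return "NTLM " + base64.b64encode(message).decode()


def parse_challenge(header_value):
    """Decode names and OS version from a 'NTLM <base64>' WWW-Authenticate value."""
    match = re.search(r"NTLM\s+([A-Za-z0-9+/=]+)", header_value)
    if not match:
        raise ValueError("no NTLM token in header")
    blob = base64.b64decode(match.group(1))
    if blob[:8] != SIGNATURE or struct.unpack_from("<I", blob, 8)[0] != 2:
        raise ValueError("not an NTLM CHALLENGE message")
    flags = struct.unpack_from("<I", blob, 20)[0]
    info_len, _info_max, info_off = struct.unpack_from("<HHI", blob, 40)
    info = {}
    pos, end = info_off, info_off + info_len
    while pos + 4 <= end:
        av_id, av_len = struct.unpack_from("<HH", blob, pos)
        pos += 4
        if av_id == 0:  # MsvAvEOL terminates the list
            break
        if av_id in AV_NAMES:
            info[AV_NAMES[av_id]] = blob[pos:pos + av_len].decode("utf-16-le")
        pos += av_len
    if flags & NEGOTIATE_VERSION and len(blob) >= 56:
        major, minor, build = struct.unpack_from("<BBH", blob, 48)
        info["OS version"] = f"{major}.{minor}.{build}"
    return info


def ntlm_probe(url):
    """Send the NEGOTIATE header and return the decoded CHALLENGE, if any."""
    request = urllib.request.Request(url, headers={"Authorization": negotiate_header()})
    try:
        urllib.request.urlopen(request, timeout=10).close()
    except urllib.error.HTTPError as err:
        for value in err.headers.get_all("WWW-Authenticate") or []:
            if value.startswith("NTLM "):
                return parse_challenge(value)
    return None
```

`ntlm_probe("https://target.example/")` (placeholder URL) would return something like the internal domain and computer names plus a Windows build number when the endpoint accepts NTLM.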
HTTP Redirect (CTF)
It is possible to put content inside a redirection response. This content won't be shown to the user (as the browser will follow the redirection), but something could be hidden in there.
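Since browsers (and `urllib`) follow the redirect automatically, the body has to be read with a client that does not. Python's `http.client` never follows redirects, so a minimal sketch (hypothetical `redirect_body` helper) can return the status, the Location header and the otherwise invisible body in one request.

```python
import http.client
from urllib.parse import urlsplit


def redirect_body(url):
    """Fetch url once without following redirects; return (status, Location, body)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.hostname, parts.port, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.hostname, parts.port, timeout=10)
    try:
        target = parts.path or "/"
        if parts.query:
            target += "?" + parts.query
        conn.request("GET", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location"), response.read()
    finally:
        conn.close()
```

Any 3xx response with a non-empty body is worth reading in full.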
Web Vulnerabilities Checking
Now that a comprehensive enumeration of the web application has been performed, it's time to check for a lot of possible vulnerabilities. You can find the checklist here: