If you're using wget, you can use --exclude-directories=/big-files to exclude files in the /big-files directory.

Most website authentication works more or less the same way. Users enter their username and password into a login form's text fields and click a "Login" button. The button press triggers an HTTP POST request to the server with the username and password as data. If the username and password are correct, the server's response contains a Set-Cookie header that includes an authentication cookie to be presented in subsequent requests to the website.

You can see this in action by using Firefox's Web Developer tool (Firefox > Tools > Web Developer > Network) or Chrome's DevTools (View > Developer > Developer Tools). When you view the network traffic, you should be able to see the Set-Cookie header in the response headers when you log in, and a Cookie header in the request headers of subsequent requests to the server.

Using wget To Login

First, you need to find the URL to post your credentials to. You can do this by viewing the source of the login page and looking for the login form's action, and the names of the username and password fields. For example, if the login form for looks like this:

Login

Then the endpoint you'll post to is, and the names of the username and password form fields are (unsurprisingly) username and password.

If your username is and your password is password123, the wget command to log in would be:

wget --save-cookies cookies.txt \ \

You can also save the values in a file and use the --post-file option instead:

cat > login.txt

wget --save-cookies cookies.txt --post-file=login.txt \

Warning: in both cases, the values for both username and password have to be percent-encoded. For example, the password pas$w/rd would appear as pas%24w%2Frd.

If your login was successful, your cookies.txt file should contain the cookies that will be used for subsequent GET requests. It should look something like this:

cat cookies.txt

There may be additional cookies in there, as well. You don't really need to know what these values mean, as wget will take care of that for you. You can read more about the cookie format here, if you're interested. Your values will all be different, including the cookie name.

Now that you have your authentication cookie(s), you should be able to use wget to retrieve password-protected pages:

wget --load-cookies cookies.txt

You should now be able to mirror the entire site, including pages requiring authentication, like this:

wget --mirror --load-cookies cookies.txt

The cookies will eventually expire, so if you want to mirror that site again, you might have to repeat the login step.

One gotcha that I write about elsewhere is that wget can log you out by crawling certain links. For example, loading /user/logout may end your session. wget will crawl that link like any other, and the cookies that you set when you logged in will be wiped out. The actual URL used to log out will differ by site.
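To tie the login and mirroring steps together, here is a rough sketch of how they might be wrapped in two shell functions. The URLs (https://example.com/login and https://example.com/) and the function names login and mirror_site are hypothetical placeholders, not values from this post. Two flags are my own additions rather than part of the commands above: --keep-session-cookies, which asks wget to also save cookies that have no expiry date, and --reject-regex, which is one possible way to keep wget away from a logout link such as /user/logout.

```shell
# Hypothetical values: replace with your site's login endpoint and start page.
LOGIN_URL="https://example.com/login"
SITE_URL="https://example.com/"

# Log in by POSTing the form fields, saving any cookies the server sets.
# $1 = username, $2 = password; both must already be percent-encoded.
# The field names "username" and "password" are assumed to match the form.
login() {
    wget --save-cookies cookies.txt --keep-session-cookies \
         --post-data "username=$1&password=$2" \
         --output-document=/dev/null \
         "$LOGIN_URL"
}

# Mirror the site with the saved cookies, skipping URLs that look like
# the logout link so the crawl does not end its own session.
mirror_site() {
    wget --mirror --load-cookies cookies.txt \
         --reject-regex '.*/user/logout.*' \
         "$SITE_URL"
}
```

You would then run something like login myuser password123 followed by mirror_site. The logout pattern is only a guess; adjust it to whatever link your site actually uses.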
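Since the username and password have to be percent-encoded before they go into the POST data, a small helper can save some manual lookup. This is a minimal sketch that assumes plain ASCII input; the function name urlencode is my own, and it leaves only letters, digits, and the characters . _ ~ - unencoded.

```shell
# Percent-encode an ASCII string for use in form POST data.
# Note: uses global variables (s, out, rest, c) to stay portable across sh variants.
urlencode() {
    s=$1
    out=
    while [ -n "$s" ]; do
        rest=${s#?}          # everything after the first character
        c=${s%"$rest"}       # the first character itself
        s=$rest
        case $c in
            [A-Za-z0-9._~-]) out="$out$c" ;;                    # safe as-is
            *) out="$out$(printf '%%%02X' "'$c")" ;;            # e.g. $ becomes %24
        esac
    done
    printf '%s\n' "$out"
}
```

For example, urlencode 'pas$w/rd' prints pas%24w%2Frd, which is the form the password needs to take in the login data.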
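Because the cookies eventually expire, it can be handy to check what is in cookies.txt before starting another mirror. The sketch below assumes the file is in the tab-separated Netscape cookie format (seven fields per line, with the expiry time in the fifth field and the cookie name in the sixth); the function name list_cookies is hypothetical.

```shell
# Print "name expiry" for each cookie in a Netscape-format cookie file.
# The expiry is a Unix timestamp; 0 usually indicates a session cookie.
list_cookies() {
    awk -F '\t' '
        /^#/ || NF != 7 { next }   # skip comment lines and blank or malformed lines
        { print $6, $5 }           # field 6 = cookie name, field 5 = expiry
    ' "$1"
}
```

Running list_cookies cookies.txt and comparing the timestamps against date +%s gives a quick idea of whether you need to repeat the login step.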