Infinite loop using wget
I'm working on a project where I have to simulate traffic to certain websites. The solution had to be simple, and while Python would be the obvious choice, Bash was right there with wget, needing fewer lines and libraries than Python.
Requirements
- It has to run forever
- The order of the URLs has to be random
- I want to use a different HTTP user agent on each iteration
- I want to wait a random amount of time between iterations
WGET Options
You probably know this, but wget lets you access a website without downloading its content, along with many other options. Here are the ones I'm using.
wget --spider --recursive --delete-after --no-check-certificate --timeout=30 --tries=1 --no-cache --level=3 --max-redirect=3 -nH -U "$uagent" "$line"
--spider
When invoked with this option, Wget will behave as a Web spider, which means that it will not download the pages, just check that they are there.
--recursive
This means that Wget first downloads the requested document, then the documents linked from that document, then the documents linked by them, and so on.
--delete-after
This option tells Wget to delete every single file it downloads, after having done so.
--no-check-certificate
Don’t check the server certificate against the available certificate authorities. Also don’t require the URL host name to match the common name presented by the certificate.
--timeout=30
Wait no longer than 30 seconds on the network. This sets the DNS, connect, and read timeouts to 30 seconds each (30 seconds is a lot!).
--tries=1
Try each URL only once, then move on.
--no-cache
Disable server-side cache. In this case, Wget will send the remote server an appropriate directive (‘Pragma: no-cache’) to get the file from the remote service, rather than returning the cached version.
--level=3
Specify the maximum recursion depth, in this case 3.
--max-redirect=3
Specify the maximum number of redirections to follow for a resource, in this case 3.
-nH
Disable generation of host-prefixed directories. By default, invoking Wget with ‘-r http://fly.srk.fer.hr/’ will create a structure of directories beginning with fly.srk.fer.hr/. This option disables such behavior.
-U
This option allows you to change the User-Agent line issued by Wget. Use of this option is discouraged, unless you really know what you are doing.
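A long option string can be easier to maintain as a Bash array, with one flag per line and a comment beside each. The sketch below is only one possible arrangement: the names `wget_opts` and `crawl` are made up for illustration, and the flags are the ones listed above. It prints the assembled command rather than running it.

```shell
#!/bin/bash
# Hypothetical arrangement: the same flags as above, one per line.
wget_opts=(
  --spider                 # check pages without saving them
  --recursive              # follow links found in each page
  --delete-after           # remove anything that does get downloaded
  --no-check-certificate   # skip TLS certificate validation
  --timeout=30             # network timeout in seconds
  --tries=1                # a single attempt per URL
  --no-cache               # ask the server for a fresh copy
  --level=3                # recursion depth limit
  --max-redirect=3         # redirect limit per resource
  -nH                      # no host-prefixed directories
)

# crawl USER_AGENT URL (function name is an assumption, not from the post)
crawl() {
  wget "${wget_opts[@]}" -U "$1" "$2"
}

# Print the assembled command instead of running it, to check the flags.
printf '%s ' wget "${wget_opts[@]}"
echo
```

Calling `crawl "$uagent" "$line"` would then run the same request as the one-line command, with the user agent and URL passed as quoted arguments.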
The code
The random wait time is set by this snippet, which sleeps for 1 to 10 seconds:
sleep "$(( (RANDOM % 10) + 1 ))s"
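To make the arithmetic behind the wait time explicit, here is a small sketch that wraps it in a function. The name `random_delay` and its two arguments are illustrative and not part of the original script. `RANDOM % N` yields a value from 0 to N-1, so adding the minimum shifts the result into the wanted range.

```shell
#!/bin/bash
# random_delay MIN MAX: print a random whole number of seconds in [MIN, MAX].
# The function name and its arguments are illustrative, not from the post.
random_delay() {
  local min=$1 max=$2
  echo $(( (RANDOM % (max - min + 1)) + min ))
}

# Draw a few sample delays between 1 and 10 seconds.
for _ in 1 2 3; do
  random_delay 1 10
done
```

In the loop it could be used as `sleep "$(random_delay 1 10)s"`, which keeps the range easy to change in one place.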
and the random HTTP user agent is set by
uagent=$(shuf -n 1 "$ua")
where $ua is a file listing the user agents you want, one per line. I got mine from this website: https://developers.whatismybrowser.com/useragents/explore/
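If the user-agent file might contain blank lines, a small helper can filter them out before picking one. This is a sketch under that assumption: `pick_random_line` is a hypothetical name, and the agent strings in the demo are placeholders rather than real browser strings.

```shell
#!/bin/bash
# pick_random_line FILE: print one random non-empty line from FILE.
# (Hypothetical helper; the post inlines this as a shuf call.)
pick_random_line() {
  grep -v '^[[:space:]]*$' "$1" | shuf -n 1
}

# Demo with a throwaway list; the entries are placeholder strings.
demo=$(mktemp)
printf '%s\n' 'ExampleAgent/1.0' '' 'ExampleAgent/2.0' > "$demo"
uagent=$(pick_random_line "$demo")
echo "picked: $uagent"
rm -f "$demo"
```

Skipping blank lines avoids sending an empty User-Agent header when the list has a stray empty line.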
#!/bin/bash
# Load files
ua="user-agents.txt"
urls="urls.txt"
input="shuffled.txt"

# Loop forever
while true; do
  # Reshuffle the URL list once per round, before reading it
  shuf "$urls" > "$input"
  while IFS= read -r line; do
    # Pick a different user agent for each request
    uagent=$(shuf -n 1 "$ua")
    wget --spider --recursive --delete-after --no-check-certificate --timeout=30 --tries=1 --no-cache --level=3 --max-redirect=3 -nH -U "$uagent" "$line"
    # Wait 1 to 10 seconds before the next request
    sleep "$(( (RANDOM % 10) + 1 ))s"
  done < "$input"
done
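Before pointing the loop at real sites, it can help to do a bounded dry run that only prints what it would request. The sketch below is a variant built on assumptions, not the script above: `run_rounds` and its round count are hypothetical, `printf` stands in for the wget call, the sleep is left out, and the demo data uses example.com URLs and made-up agent strings.

```shell
#!/bin/bash
# run_rounds URL_FILE UA_FILE ROUNDS: print "user-agent<TAB>url" for each
# request the loop would make. Hypothetical dry-run helper; printf replaces wget.
run_rounds() {
  local urls=$1 ua=$2 rounds=$3
  local r line uagent shuffled
  shuffled=$(mktemp)
  for (( r = 0; r < rounds; r++ )); do
    shuf "$urls" > "$shuffled"          # new random order each round
    while IFS= read -r line; do
      uagent=$(shuf -n 1 "$ua")         # new user agent each request
      printf '%s\t%s\n' "$uagent" "$line"
    done < "$shuffled"
  done
  rm -f "$shuffled"
}

# Demo with placeholder data.
tmp_urls=$(mktemp)
tmp_ua=$(mktemp)
printf '%s\n' 'https://example.com/a' 'https://example.com/b' > "$tmp_urls"
printf '%s\n' 'ExampleAgent/1.0' 'ExampleAgent/2.0' > "$tmp_ua"
run_rounds "$tmp_urls" "$tmp_ua" 2
rm -f "$tmp_urls" "$tmp_ua"
```

Each round visits every URL exactly once in a fresh order, which is the behavior the shuffle-per-round design is meant to give.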