Wednesday, July 11, 2007

leech with wget

This afternoon I mirrored a directory from an HTTP server to my external hard drive using wget. I am noting the command here on my blog so that whenever I forget it, this note can remind me. Here is my wget syntax for fetching the whole contents of a directory on a server recursively, keeping the directory structure.

wget -rv -l 10 -t0 -I start_of_remote_directory http://host_name

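Option -r downloads the content recursively. Option -l sets how many directory levels deep the recursion goes (10 here; wget's default is 5), so raise it if you need to go deeper. Option -t0 makes wget retry without limit, and -I restricts the download to the listed remote directories.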
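For reference, here is a minimal sketch of the same command spelled with wget's long options, which I find easier to read back later. The variable names (HOST, REMOTE_DIR, DEPTH, RUN) are my own and only for illustration; host_name and start_of_remote_directory are still placeholders. The script only prints the command unless RUN=1 is set.

```shell
#!/bin/sh
# Placeholder values (assumptions): replace with your own server and directory.
HOST="http://host_name"
REMOTE_DIR="start_of_remote_directory"
DEPTH=10

# Long-option spelling of:
#   wget -rv -l 10 -t0 -I start_of_remote_directory http://host_name
set -- wget --recursive --verbose --level="$DEPTH" --tries=0 \
    --include-directories="$REMOTE_DIR" "$HOST"

# Show the command for review; it only runs when RUN=1 is set.
echo "$@"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Run it once as is to check the printed command, then again with RUN=1 in the environment to start the download.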
There are many other ways to do this. One project that may suit you is Leech, which is based on PHP. I used that tool before, but I have since replaced it with plain wget.
In another case, if you have an HTML file that contains all the links you want to download, you can use this command, where -i names the file to read the links from and -F tells wget to treat that file as HTML:

wget -rv --user-agent="your user agent" -i file.html -F
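One thing to watch with a local HTML file: if its links are relative, wget has no server address to resolve them against. The sketch below adds the --base option for that case. The BASE value is a made-up placeholder built from the names used earlier, and LINKS and RUN are my own variable names; as written, the script only prints the command.

```shell
#!/bin/sh
# Placeholder values (assumptions): adjust to your own file and server.
LINKS="file.html"
BASE="http://host_name/start_of_remote_directory/"

# --force-html parses the input file as HTML;
# --base resolves the relative links found in it against BASE.
set -- wget --verbose --force-html --input-file="$LINKS" --base="$BASE"

# Show the command for review; it only runs when RUN=1 is set.
echo "$@"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```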

Some web servers are configured to accept only known user agents. In that case, you can pass the user agent string of a well-known browser, such as: "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070510 Fedora/1.5.0.10-6.fc6 Firefox/1.5.0.10".
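Since that string is long and full of spaces and semicolons, it is easier to keep it in a quoted variable than to type it inline. A small sketch, assuming the same file.html as above (UA, LINKS and RUN are my own variable names); it prints one argument per line so you can see that the user agent stays a single argument, and only runs wget when RUN=1 is set.

```shell
#!/bin/sh
# The browser user agent string quoted in the post.
UA="Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.10) Gecko/20070510 Fedora/1.5.0.10-6.fc6 Firefox/1.5.0.10"
LINKS="file.html"

# Same command as above in long-option form; the double quotes around
# $UA keep the whole string together as one argument.
set -- wget --recursive --verbose --user-agent="$UA" \
    --force-html --input-file="$LINKS"

# Print one argument per line for review; run only when RUN=1 is set.
printf '%s\n' "$@"
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```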
