2008-01-22

Using wget to download a whole site

I found the following notes and wanted to try them out.

Recursive options for wget.

wget -r -p -l 2 http://website.com


-r = download recursively
-p = download all files (including images) needed to render the HTML pages
-l 2 = descend at most 2 levels (default is 5)
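For reference, the same command can be written with wget's long option names, which are easier to read in a script. The sketch below wraps it in a small function; the name `mirror_site` and the `DRY_RUN` switch are my own additions for illustration, not part of wget.

```shell
#!/bin/sh
# mirror_site URL [LEVELS]
# Hypothetical helper: the recursive wget call from these notes,
# spelled with long option names.
#   --recursive        same as -r
#   --page-requisites  same as -p
#   --level=N          same as -l N (wget's default depth is 5)
# Set DRY_RUN=1 to print the command instead of running it.
mirror_site() {
    url=$1
    levels=${2:-2}
    set -- wget --recursive --page-requisites --level="$levels" "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

`DRY_RUN=1 mirror_site http://website.com` just prints the command it would run; `mirror_site http://website.com 3` actually fetches the site three levels deep.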

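Another useful option is "-np" or "--no-parent". It prevents wget from ascending to the parent directory, which is what you want when downloading

http://website.com/subdirectory/

but not everything on

http://website.com

Note the trailing slash on the subdirectory URL: wget only treats the last part of an HTTP path as a directory if it ends in "/". Without it, "subdirectory" is taken to be a file, its parent is the site root, and --no-parent has no effect.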
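Because the trailing slash is easy to forget, a small wrapper can add it before calling wget. This is a sketch under my own assumptions: the function name `mirror_subdir` and the `DRY_RUN` switch are hypothetical, and only `--recursive` and `--no-parent` come from the notes.

```shell
#!/bin/sh
# mirror_subdir URL
# Hypothetical helper: fetch one directory of a site and nothing above it.
# wget decides what is a "directory" from the trailing slash, so one is
# appended if missing; otherwise --no-parent would treat the last path
# component as a file and allow everything under "/".
# Set DRY_RUN=1 to print the command instead of running it.
mirror_subdir() {
    case $1 in
        */) url=$1 ;;
        *)  url=$1/ ;;
    esac
    set -- wget --recursive --no-parent "$url"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}
```

Both `mirror_subdir http://website.com/subdirectory` and `mirror_subdir http://website.com/subdirectory/` end up running the same wget command.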

So here goes my attempt to download the whole Quran recited by Muhammad Ayyoub:

wget -r -p -l 2 http://www.versebyversequran.com/data/Muhammad_Ayyoub_128kbps/

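The above method downloads everything, HTML index pages included! To keep just the MP3 files, stay one level deep (-l1), don't ascend (--no-parent), and accept only files ending in .mp3 (-A.mp3):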

wget -r -l1 --no-parent -A.mp3 http://www.versebyversequran.com/data/Muhammad_Ayyoub_128kbps/

Seems to work. It takes 24 hours just to get to Surah 26 on an 800 MHz G4 with a static IP.
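With a download running that long, it helps to check progress now and then. A minimal sketch, assuming wget's default layout where recursive downloads are saved under a directory named after the host; the function name `count_mp3s` is hypothetical.

```shell
#!/bin/sh
# count_mp3s DIR
# Hypothetical helper: count the .mp3 files saved so far under DIR.
# tr strips the leading spaces some wc implementations print.
count_mp3s() {
    find "$1" -type f -name '*.mp3' | wc -l | tr -d ' '
}
```

For the download above, run it from the directory where wget was started: `count_mp3s www.versebyversequran.com/data/Muhammad_Ayyoub_128kbps`.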
