wget mirror a website directory to a local directory

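Notes on using wget to mirror a directory of a website to a local directory.

I think that the following options are sufficient. Here /1/2/ is the remote directory; --cut-dirs=2 drops its two path components, so the files are saved directly under the local directory named by --directory-prefix (2 in this case):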
$ wget -nH --mirror --no-parent --cut-dirs=2 --page-requisites --directory-prefix=2 --convert-links http://example.com/1/2/index.html
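The --cut-dirs value is just the number of directory components in the URL path: 2 for /1/2/index.html, 3 for /search/label/linux/. A minimal POSIX sh sketch that derives it and prints the full command for review. The helper names cut_dirs_for and mirror_cmd are hypothetical (they are not part of wget), and the sketch assumes a plain URL without a query string or fragment.

```shell
# Hypothetical helpers (not part of wget). Assumes a plain URL
# without a query string or fragment.

# Print the number of directory components between the host and the
# last "/" of the URL; that is the --cut-dirs value that puts the files
# directly into the --directory-prefix directory.
cut_dirs_for() {
    path=${1#*://}                # drop the scheme: example.com/1/2/index.html
    case $path in
        */*) path=${path#*/} ;;   # drop the host: 1/2/index.html
        *)   path= ;;             # bare host, no path at all
    esac
    case $path in
        */*) dir=${path%/*} ;;    # directory part: 1/2
        *)   dir= ;;              # a file (or nothing) at the top level
    esac
    if [ -z "$dir" ]; then
        echo 0
    else
        # Count the "/"-separated components of the directory part.
        printf '%s\n' "$dir" | awk -F/ '{ print NF }'
    fi
}

# Print (do not run) the mirroring command for URL ($1) into the
# local directory PREFIX ($2).
mirror_cmd() {
    printf 'wget -nH --mirror --no-parent --cut-dirs=%s --page-requisites --directory-prefix=%s --convert-links %s\n' \
        "$(cut_dirs_for "$1")" "$2" "$1"
}
```

For instance, mirror_cmd http://alog.ipduh.com/search/label/linux/ linux prints the same wget command used in the example below.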

For example, to mirror everything in and below the http://alog.ipduh.com/search/label/linux/ 'directory' into a local directory named 'linux':
$ mkdir linux
$ wget -nH --mirror --no-parent --cut-dirs=3 --page-requisites --directory-prefix=linux --convert-links http://alog.ipduh.com/search/label/linux/
$ ls linux
index.html  robots.txt
You should get two files in the local linux directory.
Open index.html with a web browser (in Mozilla, for example: file:///path/to/linux/index.html).

Explanation of the options, from the wget manual

`-nH' `--no-host-directories' Disable generation of host-prefixed directories.  By default, invoking Wget with `-r
       http://fly.srk.fer.hr/' will create a structure of directories beginning with `fly.srk.fer.hr/'.  This option disables such
       behavior.

`-m' `--mirror' Turn on options suitable for mirroring.  This option turns on recursion and time-stamping, sets infinite
       recursion depth and keeps FTP directory listings.  It is currently equivalent to `-r -N -l inf --no-remove-listing'.

`--no-parent' Do not ever ascend to the parent directory when retrieving recursively.  This is a useful option, since it
       guarantees that only the files _below_ a certain hierarchy will be downloaded.  See "Directory-Based Limits" for more details.
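For HTTP, the wget manual points out that the trailing slash matters to --no-parent: in http://foo/bar/ wget treats bar as a directory, while in http://foo/bar it treats bar as a file whose parent is /. A rough sketch of that rule, printing the directory that --no-parent will not ascend above. no_parent_top is a hypothetical helper and a simplified model, not wget's own code.

```shell
# Hypothetical helper (a simplified model, not wget's code): print the
# directory that --no-parent will not ascend above for an HTTP URL.
# Everything after the last "/" is taken to be a file name.
no_parent_top() {
    case ${1#*://} in
        */*) printf '%s/\n' "${1%/*}" ;;  # strip the trailing file name
        *)   printf '%s/\n' "$1" ;;       # bare host: the top is "/"
    esac
}
```

So http://example.com/1/2/index.html limits the retrieval to http://example.com/1/2/, while http://alog.ipduh.com/search/label/linux without the trailing slash would allow everything under http://alog.ipduh.com/search/label/.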

`--cut-dirs=NUMBER' Ignore NUMBER directory components.  This is useful for getting a fine-grained control over the directory
       where recursive retrieval will be saved.
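To see how -nH, --cut-dirs and --directory-prefix combine, the sketch below models where a fetched URL ends up on disk. It is a simplified model written for illustration (local_path is a hypothetical name): it assumes -nH is in effect, that a URL ending in / is saved under wget's default page name index.html, and it ignores query strings and options such as --adjust-extension.

```shell
# Hypothetical model (not wget's code) of where
#   wget -nH --cut-dirs=CUT --directory-prefix=PREFIX
# saves the file fetched from URL. Ignores query strings.
local_path() {
    url=$1 cut=$2 prefix=$3
    path=${url#*://}
    case $path in
        */*) path=${path#*/} ;;   # -nH: drop the host name
        *)   path= ;;
    esac
    case $path in
        */*) dir=${path%/*} file=${path##*/} ;;
        *)   dir= file=$path ;;
    esac
    # A URL ending in "/" is saved under wget's default page name.
    [ -n "$file" ] || file=index.html
    # --cut-dirs: drop the first CUT components of the directory part.
    dir=$(printf '%s\n' "$dir" | awk -F/ -v n="$cut" '{
        out = ""
        for (i = n + 1; i <= NF; i++) out = out (out == "" ? "" : "/") $i
        print out
    }')
    if [ -n "$dir" ]; then
        printf '%s/%s/%s\n' "$prefix" "$dir" "$file"
    else
        printf '%s/%s\n' "$prefix" "$file"
    fi
}
```

For instance, local_path http://alog.ipduh.com/search/label/linux/ 3 linux prints linux/index.html, and local_path http://alog.ipduh.com/robots.txt 3 linux prints linux/robots.txt, which is why robots.txt (fetched by wget during recursive retrieval) shows up next to index.html in the example listing above.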

`-p' `--page-requisites' This option causes Wget to download all the files that are necessary to properly display a given HTML
       page.  This includes such things as inlined images, sounds, and referenced stylesheets.

`-P PREFIX' `--directory-prefix=PREFIX' Set directory prefix to PREFIX.  The "directory prefix" is the directory where all
       other files and subdirectories will be saved to, i.e. the top of the retrieval tree.  The default is `.' (the current
       directory).

`-k' `--convert-links' After the download is complete, convert the links in the document to make them suitable for local
       viewing.  This affects not only the visible hyperlinks, but any part of the document that links to external content, such as
       embedded images, links to style sheets, hyperlinks to non-HTML content, etc.
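According to the manual, --convert-links turns links to files that wget downloaded into relative links, and links to files it did not download into full URLs with the host name. So after a mirror, any link that still names the remote host points at something that was not fetched (for instance, pages outside the directory excluded by --no-parent). A rough post-check, assuming double-quoted href/src attributes; remote_links is a hypothetical helper, not a wget feature.

```shell
# Hypothetical post-check (not a wget feature): list the distinct
# double-quoted href/src targets under the mirror directory DIR that
# still start with http(s)://HOST after --convert-links.
# The dots in HOST act as regex wildcards, which is harmless here.
remote_links() {
    dir=$1 host=$2
    grep -rhoE "(href|src)=\"https?://$host[^\"]*\"" "$dir" \
        | sed -E 's/^(href|src)="//; s/"$//' \
        | sort -u
}
```

For the example above, remote_links linux alog.ipduh.com lists the pages and images that the local copy still loads from the live site.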
