PowerShell, a tool bundled with Windows, has rich functions for extracting elements from HTML pages.
I wrote a script that downloads the images linked from an HTML page.
Code
Param(
    $url
)

# Get the web page from the URL in the argument
$page = Invoke-WebRequest $url

# Download images
foreach($linkElement in $page.Links.href)
{
    # Set the file name as the last string of the path
    $fileName = Split-Path $linkElement -Leaf

    # Get extension from the file name
    $extension = [IO.Path]::GetExtension($fileName)

    # In case of the extension is ".jpg" or ".png"
    if ( $extension -eq ".jpg" -or $extension -eq ".png")
    {
        # Get the file of the link and save it
        Invoke-WebRequest $linkElement -OutFile $fileName
    }
}
Get arguments in a PowerShell script
To receive arguments in a PowerShell script, "Param($arg1, $arg2, ...)" is used.
Here, the script receives a single URL.
Param( $url )
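As a minimal sketch, Param can also declare a type and mark an argument as required. The function name Get-UrlHost and the sample URL below are hypothetical; the function only wraps the Param block so it can be tried in a console, whereas in a script file the Param block sits at the top of the file.

```powershell
# Hypothetical function used only to demonstrate a Param block.
function Get-UrlHost {
    Param(
        # Mandatory: PowerShell prompts for the value if it is omitted.
        [Parameter(Mandatory = $true)]
        [Uri] $url
    )

    # The [Uri] type converts the string argument and exposes its parts.
    $url.Host
}

$hostName = Get-UrlHost -url "https://example.com/gallery/index.html"
$hostName
```

Typing the parameter as [Uri] means a malformed URL is rejected when the argument is bound, before any request is made.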
Get an HTML page
“Invoke-WebRequest” is used to get an HTML page from a PowerShell script.
The response object for the page is stored in $page.
$page = Invoke-WebRequest $url
Check links on the page
The page fetched by the script contains links. The script needs to check whether each link points to an image.
You can visit each element of an array with foreach($variable in $array).
foreach($linkElement in $page.Links.href)
The loop ends after all the links have been checked.
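Pages often contain relative links such as "images/a.png", which cannot be requested on their own. Here is a minimal sketch of resolving each href against the page URL with the .NET Uri class. The sample URL and href values are made up for illustration; in the script they would come from $url and $page.Links.href.

```powershell
# Assumed sample values standing in for $url and $page.Links.href.
$baseUrl = [Uri]"https://example.com/gallery/index.html"
$hrefs   = @("images/a.png", "/top.jpg", "https://cdn.example.com/b.jpg", $null)

$absoluteLinks = foreach ($href in $hrefs) {
    # Skip anchors that have no href attribute.
    if ([string]::IsNullOrEmpty($href)) { continue }

    # Uri(baseUri, relativeString) resolves relative links and
    # leaves absolute links unchanged.
    (New-Object System.Uri($baseUrl, $href)).AbsoluteUri
}

$absoluteLinks
```

Each resulting string is a full URL that can be passed to Invoke-WebRequest.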
Get the file name from the link
To get the file name from the link, the script has to take the last element of the path string.
“Split-Path” is used to do it. “Split-Path” has an option (-Leaf) that returns the last element.
$fileName = Split-Path $linkElement -Leaf
More information is available on MSDN.
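If a link carries a query string, parsing it as a Uri before taking the file name gives a cleaner result. The sketch below uses made-up sample links and shows both approaches side by side; it is one possible variation, not part of the script above.

```powershell
# Assumed sample links for illustration.
$plainLink = "https://example.com/images/photo.jpg"
$queryLink = "https://example.com/images/photo.jpg?size=large"

# Split-Path returns the part after the last path separator.
$leaf = Split-Path $plainLink -Leaf

# AbsolutePath holds only the path portion of the URL, without the
# query string, so GetFileName sees just "/images/photo.jpg".
$fileName = [IO.Path]::GetFileName(([Uri]$queryLink).AbsolutePath)

$leaf
$fileName
```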
Get the extension from the file name
The Path class in the System.IO namespace is used to get the extension from the file name.
Path Class (System.IO)
https://msdn.microsoft.com/en-us/library/system.io.path(v=vs.110).aspx
[IO.Path]::GetExtension($string)
The Path class can not only get the extension but also perform other operations on a path string.
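A few of those other operations, sketched with a made-up file name:

```powershell
# Assumed sample file name for illustration.
$fileName = "photo.large.jpg"

# Extension, including the leading dot.
$extension = [IO.Path]::GetExtension($fileName)

# File name without the final extension.
$baseName = [IO.Path]::GetFileNameWithoutExtension($fileName)

# Same name with a different extension.
$renamed = [IO.Path]::ChangeExtension($fileName, ".png")

$extension   # .jpg
$baseName    # photo.large
$renamed     # photo.large.png
```

These methods only manipulate the string; they do not touch the file system.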
Check whether the extension is an image type
An if statement can be used to check the extension. I thought PowerShell would have a function to classify extension types, but there is no such function.
if ( $extension -eq ".jpg" -or $extension -eq ".png")
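When more extensions need to be accepted, one alternative to chaining -or comparisons is to keep them in an array and test membership. This is a minimal sketch; the helper name Test-ImageExtension and the extension list are my own assumptions.

```powershell
# Assumed list of image extensions; extend it as needed.
$imageExtensions = @(".jpg", ".jpeg", ".png", ".gif")

# Hypothetical helper: returns $true when the file name has an image extension.
function Test-ImageExtension {
    Param( [string] $fileName )

    $extension = [IO.Path]::GetExtension($fileName)

    # -contains compares strings case-insensitively by default,
    # so ".PNG" matches ".png".
    $imageExtensions -contains $extension
}

Test-ImageExtension "banner.PNG"
Test-ImageExtension "index.html"
```

The first call returns True and the second returns False.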
Downloading images
Invoke-WebRequest is used to download images.
Invoke-WebRequest $linkElement -OutFile $fileName
The images must be saved as files. Therefore, the -OutFile option is used, with the file name as its argument.
Save the PowerShell script and run it
A PowerShell script file has the “.ps1” extension.
I tried downloading images with the script.
Add “.\” at the head of the command to execute the script from the current directory.
As a result, the images were downloaded and saved.