How to crawl and download all PDF files linked from an HTML page?

This is my code to crawl all PDF links, but it doesn't work. How can I download the files from those links and save them to a folder on my computer?

include 'simple_html_dom.php';

$url = 'http://example.com';
$html = file_get_html($url) or die ('invalid url');

// extract pdf links
foreach($html->find('a[href=[^"]*\.pdf]') as $element)
echo $element->href.'<br>';

asked Feb 1 '12 at 22:02

It looks like you have a typo: in the foreach loop, $htnl should be $html. If that wasn't in your original code, what exactly is the error you're getting?

@ggreiner There's no typo in my original code, sorry; I mistyped it here. The result is a blank web page.

3 Answers

foreach($htnl->find('a[href=[^"]*\.pdf]') as element)
           ^---typo. should be an 'm'        ^---typo. need a $ here

How does your code "not work", other than because of the typos above?

answered Feb 2 '12 at 02:02

Oops, sorry, there's no typo in my original code. It doesn't work: I get a blank result in my web page. - puresmile

Have you looked into phpQuery? http://code.google.com/p/phpquery/

answered Feb 2 '12 at 03:02

A simpler solution here would be:

foreach ($html->find('a[href$=pdf]') as $element)


[attribute$=value] matches elements whose specified attribute ends with the given value.
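
For the download part of the question, here is a minimal sketch, not a definitive implementation. It uses PHP's built-in DOMDocument instead of simple_html_dom so that it runs without an extra include, and it filters hrefs with a regular expression in place of the a[href$=pdf] selector. The names extractPdfLinks() and downloadPdfs() are hypothetical helpers made up for this example. It assumes the hrefs are absolute URLs (relative links would first have to be resolved against the page URL) and that allow_url_fopen is enabled so file_get_contents() can read remote files.

```php
<?php
// Sketch only: extractPdfLinks() and downloadPdfs() are hypothetical helpers.

/**
 * Return the href of every <a> element whose href ends in ".pdf".
 */
function extractPdfLinks(string $html): array
{
    if ($html === '') {
        return [];
    }

    // Collect parser warnings internally instead of printing them;
    // real-world markup is rarely valid.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = $a->getAttribute('href');
        // Similar in spirit to a[href$=pdf], but requires the dot
        // and ignores case.
        if (preg_match('/\.pdf$/i', $href)) {
            $links[] = $href;
        }
    }
    return $links;
}

/**
 * Save each URL into $dir and return the paths that were written.
 * Assumes every entry in $urls is an absolute URL (or a readable path).
 */
function downloadPdfs(array $urls, string $dir): array
{
    if (!is_dir($dir)) {
        mkdir($dir, 0755, true);
    }

    $saved = [];
    foreach ($urls as $url) {
        // Use the last path segment of the URL as the local file name.
        $name = basename((string) parse_url($url, PHP_URL_PATH));
        if ($name === '') {
            continue;
        }

        $data = @file_get_contents($url);
        if ($data === false) {
            continue; // skip links that cannot be fetched
        }

        $path = $dir . '/' . $name;
        file_put_contents($path, $data);
        $saved[] = $path;
    }
    return $saved;
}
```

With these in place, something like downloadPdfs(extractPdfLinks(file_get_contents($url)), 'pdfs') would save each linked file into a pdfs folder next to the script. The folder name is a placeholder, and two links with the same file name would overwrite each other.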

answered Mar 24 '21 at 10:03

