Extract a particular part of a large block of HTML stored in a PHP variable

I have the embeddable code of a slide, shown below. The whole HTML is stored in a variable, $embed_code.

I am printing this code in PHP. Now I want a piece of code from this HTML string.

The code is shown below. I want only the <object> element, from its opening tag to its closing tag.

$embed_code = '
 <div style="width:425px" id="__ss_617490"><strong style="display:block;
 margin:12px 0 4px"><a href="http://www.slideshare.net/al.capone/funny-beer-babies-
 enginnering-rev-2-presentation" title="Funny beer babies enginnering rev. 
 2">Funny beer babies enginnering rev. 2</a></strong>


<object id="__sse617490" 
 width="425" height="355"><param name="movie" value="http://static.slidesharecdn.com
/swf/ssplayer2.swf?doc=becoming-an-engineer-1222340701618958-9&stripped_title=funny-  
 beer-babies-enginnering-rev-2-presentation&userName=al.capone" /><param  
 name="allowFullScreen" value="true"/><param name="allowScriptAccess" value="always"/>
 <embed name="__sse617490" src="http://static.slidesharecdn.com/swf/ssplayer2.swf?doc=
  becoming-an-engineer-1222340701618958-9&stripped_title=funny-beer-babies-enginnering-
  rev-2-presentation& userName=al.capone" type="application/x-shockwave-flash" 
   allowscriptaccess="always" allowfullscreen="true" width="425" height="355"></embed> 
  </object>




 <div style="padding:5px 0  12px">View more<a href="http://www.slideshare.net
  /"> presentations</a> from <a href="http://www.slideshare.net/al.capone">
  al.capone</a>.</div></div>';

Now I want only the part of this string from <object id="....." to "</embed> </object>". The whole HTML is generated dynamically, so I need an approach that does not depend on fixed values.

How can I do this? Is there a PHP function that can extract the HTML between the opening and closing tags of a given element?

asked Nov 8 '11 at 16:11

You can use a regexp or a dom parser -

@soju: I'd +1 for suggesting a dom parser, but there's no way to -99999999 for suggesting regexes. So... +0 it is. -

Well, in this particular case, a simple regexp is enough -

HTML markup and "simple regex" are mutually exclusive terms! -

3 Answers

Use the DOMDocument classes.

$dom = new DOMDocument();
$dom->loadHTML($embed_code);
$htmlObject = $dom->getElementById('__sse617490'); // Returns a DOMElement

http://www.php.net/dom

answered Nov 8, 11:20

+1; PHPQuery, which I mentioned in my answer, simply wraps this with a nicer (in my opinion) API. - Treffynnon

But I said that this HTML is generated dynamically, so the id of the div will change with every new slide. - Manish Jangir

In that case you need some way of consistently identifying the <object> for every slide. If the <object> on the page is the only object tag, then you can simply use getElementsByTagName(). If not, then you'll need to modify the code that generates the markup so that the object can be distinguished from all other markup on the page, perhaps by adding a class. - GordonM

@rajzana You want $dom->getElementsByTagName('object');. See: php.net/manual/en/domdocument.getelementsbytagname.php - Treffynnon

@GordonM He appears to be scraping Slideshare so I don't think he can change the markup. - Treffynnon
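The tag-name approach from the comments above can be sketched as follows. This is a minimal illustration, not the answerer's code: it assumes the first <object> in the string is the one wanted, the sample $embed_code is a shortened hypothetical stand-in for the question's markup, and extractFirstObject() is an illustrative helper name. Passing a node to saveHTML() requires PHP 5.3.6 or later; the type declarations assume PHP 7.1 or later.

```php
<?php
// Hypothetical sample input; a shortened stand-in for the question's $embed_code.
$embed_code = '<div id="__ss_1"><strong>Title</strong>'
    . '<object id="__sse1" width="425" height="355">'
    . '<param name="movie" value="http://example.com/player.swf" />'
    . '<embed src="http://example.com/player.swf" width="425" height="355"></embed>'
    . '</object>'
    . '<div>View more</div></div>';

/**
 * Return the outer HTML of the first <object> element, or null if there is none.
 * extractFirstObject() is an illustrative helper name, not a built-in function.
 */
function extractFirstObject(string $html): ?string
{
    $dom = new DOMDocument();

    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // No id needed: look the element up by tag name.
    $objects = $dom->getElementsByTagName('object');
    if ($objects->length === 0) {
        return null;
    }

    // Passing a node to saveHTML() serializes only that subtree.
    return $dom->saveHTML($objects->item(0));
}

$object_html = extractFirstObject($embed_code);
echo $object_html, "\n";
```

Note that the parser re-serializes the markup, so the returned string may differ from the input in details such as how empty elements are closed.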

I like to use PHPQuery to parse and extract data from HTML with PHP. It uses jQuery's simple CSS-style selectors for traversing the code.

So it would be:

require('phpQuery/phpQuery.php');
$doc = phpQuery::newDocumentHTML($embed_code);
$div = pq('div#__ss_617490'); // select a DIV with the specified ID
var_dump($div->attr('style')); //To get the style attribute
var_dump($div->html()); // To get the inner html

// now to get the object tag like you desire.
$object_tag = pq('object');

// only get the first object
$object_tag = pq('object:first');

answered Nov 8, 11:20

You could just use a regex to parse and extract it:

$embed_code = "blah blah <object ...>and other code here</object> blah blah";

$matches = array();
// [^>]* consumes any attributes; the "s" modifier lets "." match newlines as well.
preg_match('#<object\b[^>]*>(.*)</object>#isU', $embed_code, $matches);

// $matches[0] = "<object ...>and other code here</object>"
// $matches[1] = "and other code here"
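A minimal sketch of the regex approach on multi-line input like the question's. It assumes a single, non-nested <object> element per string, and the sample $embed_code is a hypothetical shortened stand-in. The pattern uses the s modifier so that "." also matches line breaks, and checks the return value of preg_match() before using the captures.

```php
<?php
// Hypothetical multi-line sample; stands in for the question's $embed_code.
$embed_code = "<div>intro\n"
    . "<object id=\"__sse1\"\n width=\"425\"><param name=\"movie\" value=\"x.swf\" />\n"
    . "<embed src=\"x.swf\"></embed>\n</object>\n"
    . "<div>View more</div></div>";

// i: case-insensitive, s: "." also matches newlines, U: quantifiers are lazy by default.
$pattern = '#<object\b[^>]*>(.*)</object>#isU';

$matches = array();
if (preg_match($pattern, $embed_code, $matches) === 1) {
    $outer = $matches[0]; // the whole <object ...> ... </object> element
    $inner = $matches[1]; // only the markup between the tags
    echo $outer, "\n";
} else {
    $outer = null;
    $inner = null;
    echo "No <object> element found\n";
}
```

Unlike a DOM parser, this returns the original text byte for byte, but it will stop at the first </object> it meets, so nested objects are not handled.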

answered Nov 8, 11:20

As discussed by @MarcB 8 minutes ago, regex isn't the best or cleanest solution to an HTML parsing problem. - Treffynnon

@Treffynnon This depends on the context - sometimes creating a whole DOM structure in memory just to extract part of the text it contains is overkill and a regex is more efficient. - daiscog

Clarity is of more importance. Memory is cheap. Time wasted debugging code is expensive. - Treffynnon

Memory may be of importance in some circumstances. And what about processing time? Sometimes a regex will be quicker than a DOM parser. Again, it all boils down to context and considerations such as the level of control over the input (user/system generated, always well-formed?) should be taken into account. Hence why my post says "could" not "should". - daiscog

To be clear I did not down vote this answer. Someone else must feel even more strongly than me! - Treffynnon
