How to extract part of a text with PHP [closed]

I have the following text and want to get 'canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8%40group.calendar.google.com', which comes after src=.

Is a regex the way to go?

$text ='<iframe src="http://www.google.com/calendar/embed?src=canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8%40group.calendar.google.com&ctz=Asia/Tokyo" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"></iframe>';

Thanks in advance.

asked Apr 23 '13 at 13:04

2 Answers

Use regular expressions.

// The value is captured in $results[1]; $results[0] holds the whole match.
preg_match("/\\?src=([^&\"]+)/i", $text, $results);
var_dump($results);

answered Apr 23 '13 at 13:04
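As a hedged sketch of how that result might be used: preg_match returns 1 on a match, and the captured value ends up in $results[1]. The captured value is still percent-encoded (%40), so the sketch also applies urldecode. That decoding step and the variable names $raw and $src are additions for illustration, not part of the answer.

```php
<?php
$text = '<iframe src="http://www.google.com/calendar/embed?src=canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8%40group.calendar.google.com&ctz=Asia/Tokyo" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"></iframe>';

// preg_match() returns 1 on a match, 0 on no match and false on error.
if (preg_match('/\?src=([^&"]+)/i', $text, $results) === 1) {
    $raw = $results[1];      // still percent-encoded: ...%40group...
    $src = urldecode($raw);  // decoded: ...@group...
} else {
    $raw = $src = null;      // pattern not found
}

var_dump($raw, $src);
```

Checking the return value avoids reading $results[1] when the pattern did not match.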

Won't get what he wants. - Loamhoof

@Loamhoof, corrected, thanks. - Michal Rus

You could include another check too, in case there are no other GET parameters: [^&"] - Loamhoof

A regular expression is probably one way:

$src = preg_replace('(.*?(?<==)([^&"]+).*)i', '\\1', $text);

However, I would give the following hint as the way: divide and conquer. Divide a problem into smaller ones and then solve the overall problem step by step. This works for many problems. As an example:

  • First: Get the SRC attribute value from the string

There are a thousand ways to do this, including regular expressions. As a regex would assume the string is always formatted this way, and extracting the URL attribute value is actually trivial, I am using a different function that supports simple format patterns (not full regular expressions): sscanf:

$url = sscanf($text, '<iframe src="%[^"]')[0];

# string(126) "http://www.google.com/calendar/embed?src=canaca.../Tokyo"

So now the URL is extracted. As this is a URL, it can be processed with standard URL functions. Let's see:

  • Second: Parse the query from the URL

To get the SRC value from the URL you could use a regular expression again. However, as PHP has functions that are specific to URL handling, I use those instead. I can say exactly what I need with parse_url. This time I first need the query part of the URL, that is, the part after the question mark that holds the query variables:

$query = parse_url($url, PHP_URL_QUERY);

# string(89) "src=canacad.ac.jp_dqrg6k9pg1s879somecodekj88.../Tokyo"

This is already one step further to the value we're looking for. So there is another step to do:

  • Third: Parse the SRC value from the query

Here again PHP has a built-in function for that. We can extract all variables in a query from a URL with the parse_str function. As it returns the results via a function parameter, this needs two lines of code:

parse_str($query, $vars);
$src = $vars['src'];

# string(68) "canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8@group.calendar.google.com"

The $src variable now holds the value you're looking for. Note that parse_str has also decoded %40 to @.

Here is the whole code from above at a glance:

$text = '<iframe src="http://www.google.com/calendar/embed?src=canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8%40group.calendar.google.com&ctz=Asia/Tokyo" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"></iframe>';


$url   = sscanf($text, '<iframe src="%[^"]')[0];
$query = parse_url($url, PHP_URL_QUERY);

parse_str($query, $vars);
$src  = $vars['src'];

var_dump($url, $query, $src);

The output is as follows, showing all three steps:

string(126) "http://www.google.com/calendar/embed?src=canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8%40group.calendar.google.com&ctz=Asia/Tokyo"
string(89) "src=canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8%40group.calendar.google.com&ctz=Asia/Tokyo"
string(68) "canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8@group.calendar.google.com"

So regardless of which functions you use in each of those steps: if you divide a problem into smaller parts, you will nearly always be able to solve larger problems. And if there is a problem in one of the sub-steps, you only need to fix a single step, not the whole operation. If you use a single regular expression to do all this work, you have a single point of failure (and crafting a good regular expression in the world of HTML and URLs is non-trivial, so it will likely break).
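To illustrate the point about fixing a single step, the three steps could be wrapped in one function that checks each step on its own. This is only a sketch: the function name extract_src_param and the convention of returning null on failure are assumptions made here, not part of the code above.

```php
<?php
// Hypothetical helper: returns the decoded src query variable, or null
// if any of the three steps fails.
function extract_src_param($html)
{
    // Step 1: get the src attribute value of the iframe.
    $parts = sscanf($html, '<iframe src="%[^"]');
    if (!is_array($parts) || !isset($parts[0])) {
        return null;
    }

    // Step 2: get the query part of the URL (null or false if absent or malformed).
    $query = parse_url($parts[0], PHP_URL_QUERY);
    if (!is_string($query)) {
        return null;
    }

    // Step 3: parse the query variables and pick src.
    parse_str($query, $vars);
    return isset($vars['src']) && is_string($vars['src']) ? $vars['src'] : null;
}

$text = '<iframe src="http://www.google.com/calendar/embed?src=canacad.ac.jp_dqrg6k9pg1s879somecodekj88c8%40group.calendar.google.com&ctz=Asia/Tokyo" style="border: 0" width="800" height="600" frameborder="0" scrolling="no"></iframe>';

var_dump(extract_src_param($text));
var_dump(extract_src_param('<p>no iframe here</p>'));
```

If the input format changes, only the step that breaks needs to be replaced, for example step 1 by an HTML parser.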

A perfect solution would use an HTML parser for the first step, for example with the Tidy extension or with the popular DOMDocument extension:

// Tidy (non error-checked):
$url = tidy_parse_string($text)->body()->child[0]->attribute['src'];

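// DOMDocument (non error-checked). loadHTML() is called on an instance
// because calling it statically is an error as of PHP 8.
$doc = new DOMDocument();
@$doc->loadHTML($text);
$url = $doc->getElementsByTagName('iframe')->item(0)->getAttribute('src');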

An HTML parser has the benefit that it understands the HTML elements. You can look for specific elements and attributes even if their position changes.

answered Apr 23 '13 at 14:04
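An error-checked variant of the DOMDocument approach might look like the sketch below. It assumes the DOM extension is available; the function name iframe_src_from_html and the null return on failure are hypothetical choices for this example.

```php
<?php
// Hypothetical helper: returns the src attribute of the first iframe,
// or null if the markup cannot be loaded or contains no such attribute.
function iframe_src_from_html($html)
{
    if ($html === '') {
        return null; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument();

    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return null;
    }

    // Find the first iframe wherever it sits in the markup.
    $iframe = $doc->getElementsByTagName('iframe')->item(0);
    if ($iframe === null || !$iframe->hasAttribute('src')) {
        return null;
    }

    return $iframe->getAttribute('src');
}

// The attribute order and the nesting differ from the original string.
$html = '<div><iframe width="800" src="http://example.com/?src=abc"></iframe></div>';
var_dump(iframe_src_from_html($html));
var_dump(iframe_src_from_html('<p>no iframe here</p>'));
```

The returned URL can then go through parse_url and parse_str exactly as in the second and third steps.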
