Using regular expressions to extract each section of an article

The article is segmented in one of two ways:


 1. < p > the first paragraph < / p > < p > the second paragraph < / p >...
 2. < p > the first paragraph < br / > < br / > the second paragraph < br / > < br / > the third paragraph < / p >

I wrote the following code:


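// Case 1: split on <p> and </p> tags.
$body_arr = preg_split('/\<\/?p\>/', $body, -1, PREG_SPLIT_NO_EMPTY);
echo count($body_arr);
if (count($body_arr) < 4)
{
    // Case 2: too few pieces, so split on two identical consecutive <br> tags.
    $body_arr = preg_split('/(\<br\/?\>)\s*\\1/', $body, -1, PREG_SPLIT_NO_EMPTY);
    $body1 = $body2 = $body3 = '';
    $total = count($body_arr);
    // Index at which the third part starts: half of the pieces, but at least 3.
    $maxed = max(floor($total / 2), 3);
    foreach ($body_arr as $k => $v)
    {
        if ($k == 0)
        {
            // The first piece becomes the first part.
            $body1 = $v . "<br><br>";
        }
        else if ($k < $maxed)
        {
            $body2 .= $v . "<br><br>";
        }
        else
        {
            $body3 .= $v . "<br><br>";
        }
    }
}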
  • It's the second.

  • The result is wrong.

asked Jul 4 '12 at 08:07

Can you please explain more precisely the issue you are encountering (see How to Ask)? What do you want to do? What is "the" article you mention in your post title? What does not work with your code? If possible, ask a specific question about what you would like help with.

1 Answer

You can split the text with a single regex using nested groups. You're starting with a p tag, followed by multiple paragraphs that end in either another close/open p tag, a pair of br tags, or a final close p tag.

The close/open p tag can be represented with the following:

<\s*//*\s*p\s*>[\s|\r|\n]*<\s*p\s*>

The double br tag can be represented with the following:

<\s*br\s*//*\s*>[\s|\r|\n]*<\s*br\s*//*\s*>

And the close p tag can be represented with the following:

<\s*//*\s*p\s*>
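To try the three separators out one at a time in PHP, something like the following minimal sketch could be used. The variable names are hypothetical, and the patterns are adapted rather than copied: `~` is assumed as the delimiter so that `/` needs no escaping, the slash in the br tag is made optional with `/?`, and `[\s|\r|\n]*` is shortened to `\s*` (inside a character class the `|` is a literal pipe, and `\s` already covers `\r` and `\n`).

```php
<?php
// Hypothetical variable names; patterns adapted from the ones above.
$closeOpenP = '~<\s*/\s*p\s*>\s*<\s*p\s*>~i';           // </p> followed by <p>
$doubleBr   = '~<\s*br\s*/?\s*>\s*<\s*br\s*/?\s*>~i';   // two consecutive <br> tags
$closeP     = '~<\s*/\s*p\s*>~i';                       // a lone </p>

// preg_match() returns 1 when the pattern matches and 0 when it does not.
$samples = [
    [$closeOpenP, '< / p > < p >'],
    [$doubleBr,   '< br / > < br / >'],
    [$doubleBr,   '<br><br>'],
    [$closeP,     '</p>'],
    [$closeP,     '<p>'],
];
foreach ($samples as [$pattern, $input]) {
    echo $input, ' => ', preg_match($pattern, $input), PHP_EOL;
}
```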

Note that I'm allowing for space between tags because you had it in your example, but remove the \s* if they're not necessary. Stitch that together using some nested groups and you end up with something like this:

<\s*p\s*>((?<Paragraph>[^<]*)((<\s*//*\s*p\s*>[\s|\r|\n]*<\s*p\s*>)|(<\s*br\s*//*\s*>[\s|\r|\n]*<\s*br\s*//*\s*>)|(<\s*//*\s*p\s*>)))*

I tested that with your examples and it works. From the examples I'm assuming that you don't have tags in the middle of the paragraphs. If you do, you'll need something fancier than [^<]* (anything that is not the start of a tag) to capture the actual text.
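Since the question uses PHP, one caveat may matter: in PCRE a repeated capture group such as (?<Paragraph>...) keeps only the text from its last iteration, so a single preg_match() with the combined pattern would not return every paragraph. A possible alternative is to split on the separators instead. The sketch below is only an illustration under assumptions: `splitParagraphs` is a hypothetical helper name, and the separator pattern is a simplified variant of the ones above (any opening or closing p tag, or two consecutive br tags).

```php
<?php
/**
 * Split an HTML fragment into paragraph texts.
 * Hypothetical helper, not part of any library.
 */
function splitParagraphs(string $body): array
{
    // A paragraph boundary is <p>, </p>, or two consecutive <br> tags,
    // each allowing optional whitespace inside the tag.
    $separator = '~<\s*/?\s*p\s*>|<\s*br\s*/?\s*>\s*<\s*br\s*/?\s*>~i';

    $parts = preg_split($separator, $body, -1, PREG_SPLIT_NO_EMPTY);

    // Trim each piece and drop the ones that held only whitespace.
    $parts = array_map('trim', $parts);
    $parts = array_filter($parts, function ($part) {
        return $part !== '';
    });

    return array_values($parts);
}

$case1 = '< p > the first paragraph < / p > < p > the second paragraph < / p >';
$case2 = '< p > the first paragraph < br / > < br / > the second paragraph'
       . ' < br / > < br / > the third paragraph < / p >';

print_r(splitParagraphs($case1));
print_r(splitParagraphs($case2));
```

Because this splits on the boundaries rather than matching [^<]*, a single br tag or other inline tag inside a paragraph stays in the returned text.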

answered Jul 23 '12 at 03:07
