PHP: precise string split by words and tags into an array

The task is to split a string into an array of chunks of about 500 characters each. I've done this with str_split, but I've run into a problem. Of course, it must be split on word boundaries, or else the text is not readable. More than that, the text contains links, and links (in fact, any HTML) will be broken if I split inside them =) So a split should only happen where a tag has already ended or has not started yet, and the same goes for words. Being off by ±100 characters is not a problem.

I would really appreciate a piece of code to do that. I'm not very good with regexps.

EDIT: Example

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec ac diam non nisl interdum tempus. Nam id ipsum id nunc tempus varius. Suspendisse ut neque a velit elementum placerat. Curabitur lobortis, lorem sit <a href="#">amet tincidunt ultricies,</a> eros ante feugiat dui, sit amet lacinia metus risus a magna. Duis velit dui, sollicitudin at aliquet et, elementum at dui. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia Curae;

Script:

<?php

$str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href=\"http://example.com\">Phasellus condimentum
facilisis ipsum</a>, quis elementum urna ornare non. Cras nisi libero, dapibus sed euismod id, pharetra eu libero.
Maecenas mi nulla, ultrices in congue in, viverra ac massa. Quisque <br/>at turpis nulla. Suspendisse semper urna eu
augue aliquet dictum. Mauris at purus in lectus varius bibendum. <em>Fusce hendrerit <strong>posuere ante</strong></em>,
at pellentesque odio lobortis at. Integer quis urna eget ipsum dictum volutpat quis et leo. Etiam hendrerit eleifend
ornare. Phasellus eget justo elit.";

$str = str_split($str, 200);

var_dump($str);

Output:

array(4) {
  [0]=>
  string(200) "Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href="http://example.com">Phasellus condimentum 
facilisis ipsum</a>, quis elementum urna ornare non. Cras nisi libero, dapibus sed euismod "
  [1]=>
  string(200) "id, pharetra eu libero. 
Maecenas mi nulla, ultrices in congue in, viverra ac massa. Quisque <br/>at turpis nulla. Suspendisse semper urna eu 
augue aliquet dictum. Mauris at purus in lectus varius bi"
  [2]=>
  string(200) "bendum. <em>Fusce hendrerit <strong>posuere ante</strong></em>, 
at pellentesque odio lobortis at. Integer quis urna eget ipsum dictum volutpat quis et leo. Etiam hendrerit eleifend 
ornare. Phasellus"
  [3]=>
  string(17) " eget justo elit."
}

It's a hard character split: half of a word ends up in $str[1] ("bibendum" is cut into "bi" and "bendum"). And if a link happened to be right at that spot, it would be corrupted.

asked Jan 9 '11 at 10:01

Have you tried explode(" ", $string)? -

I would really appreciate some sample data :) -

Do you really need to keep the HTML tags intact? -

Slightly different, but solvable with the same approach as "How to replace text URLs and exclude URLs in HTML tags?". If you can supply an example input and output string, people might be more willing to help. -

Edited. Every array element is like a "page" to display =) That's what I want: normal, readable content on every page, without broken HTML or words split in half. -

2 Answers

It would probably be best not to do this with regexes but with PHP's native XML/HTML parsing capabilities. Something like the following code may well do what you want:

<?php

$str = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. <a href=\"http://example.com\">Phasellus condimentum facilisis ipsum</a>, quis elementum urna ornare non. Cras nisi libero, dapibus sed euismod id, pharetra eu libero. Maecenas mi nulla, ultrices in congue in, viverra ac massa. Quisque <br/>at turpis nulla. Suspendisse semper urna eu augue aliquet dictum. Mauris at purus in lectus varius bibendum. <em>Fusce hendrerit <strong>posuere ante</strong></em>, at pellentesque odio lobortis at. Integer quis urna eget ipsum dictum volutpat quis et leo. Etiam hendrerit eleifend ornare. Phasellus eget justo elit.";

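// appendXML() needs well-formed XML, so every tag in $str must be closed.
$dom = new DOMDocument;

$root = $dom->createDocumentFragment();
$root->appendXML($str);

$bits = array();

foreach ($root->childNodes as $node) {
    if ($node->nodeType == XML_TEXT_NODE) {
        // Top-level text: one array entry per space-separated word.
        $bits = array_merge($bits, explode(' ', $node->nodeValue));
    } elseif ($node->nodeType == XML_ELEMENT_NODE) {
        // Top-level element: keep the whole tag, children included, as one entry
        // by attaching a copy to the document just long enough to serialize it.
        $dom->appendChild($newnode = $node->cloneNode(true));
        $bits[] = $dom->saveHTML();
        $dom->removeChild($newnode);
    }
}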

var_dump($bits);
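
The code above only produces the individual pieces; they still have to be grouped into pages. A rough sketch of one way to do that follows, not a definitive implementation. The function names tokenizeHtml and paginate are made up for this illustration, and the sketch swaps saveHTML() for saveXML($node) so each node can be serialized directly. It assumes the input is well-formed XML, treats only top-level elements as unbreakable, and measures length in bytes with strlen(). A page can exceed the limit when a single tag is longer than the limit, which fits the stated ±100 character tolerance only if tags stay short.

```php
<?php

/**
 * Split an HTML snippet into pieces that must never be cut:
 * words (with their trailing whitespace) and whole top-level elements.
 * Concatenating the pieces gives back the serialized input.
 */
function tokenizeHtml(string $html): array
{
    $dom = new DOMDocument();
    $fragment = $dom->createDocumentFragment();
    // appendXML() returns false (and warns) on malformed input.
    if (!@$fragment->appendXML($html)) {
        throw new InvalidArgumentException('Input is not well-formed XML');
    }

    $tokens = [];
    foreach ($fragment->childNodes as $node) {
        // saveXML($node) serializes just this node, re-escaping entities.
        $xml = $dom->saveXML($node);
        if ($node->nodeType === XML_TEXT_NODE) {
            // Split where whitespace is followed by non-whitespace,
            // so each piece is "word + trailing whitespace".
            $words = preg_split('/(?<=\s)(?=\S)/', $xml, -1, PREG_SPLIT_NO_EMPTY);
            $tokens = array_merge($tokens, $words);
        } elseif ($node->nodeType === XML_ELEMENT_NODE) {
            // A whole element, children included, is one piece.
            $tokens[] = $xml;
        }
    }
    return $tokens;
}

/**
 * Greedily pack tokens into pages of roughly $limit bytes.
 * A page break is only allowed after whitespace, so "</a>," stays together.
 */
function paginate(array $tokens, int $limit): array
{
    $pages = [];
    $current = '';
    foreach ($tokens as $token) {
        $canBreak = $current !== '' && preg_match('/\s$/', $current) === 1;
        if ($canBreak && strlen($current) + strlen(rtrim($token)) > $limit) {
            $pages[] = rtrim($current);
            $current = '';
        }
        $current .= $token;
    }
    if (trim($current) !== '') {
        $pages[] = rtrim($current);
    }
    return $pages;
}

// Usage with a short sample string and a deliberately small limit.
$html = 'Lorem ipsum <a href="#">dolor sit</a>, amet <br/>consectetur '
      . '<em>adipiscing <strong>elit</strong></em> donec.';
$pages = paginate(tokenizeHtml($html), 40);
print_r($pages);
```

With the question's $str and a limit of 500, paginate(tokenizeHtml($str), 500) would give one array element per "page". The greedy packing keeps the sketch short; balancing page lengths more evenly would need a second pass.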

answered Jan 9 '11 at 14:01

I've added an example; could you adjust your code to it, please? =) Thanks for this example, by the way =) - holms
