How to match exactly two empty lines

I have a question about regular expressions. I have a file that I need to parse so that I can pick out specific blocks of text in it. These blocks are separated by two empty lines (some blocks are separated by 3 or 1 empty lines, but I need exactly 2). Below is my code; I think the regular expression /^\s*$^\s*$/ should match, but it does not. What is wrong?

$filename="yu";
open($in,$filename);
open(OUT,">>out.text");
while($str=<$in>)
{
unless($str = /^\s*$^\s*$/){
print "yes";
print OUT $str;
}
}
close($in);
close(OUT);

Cheers, Yuliya

asked Jan 8 '11 at 19:01

What do you mean by blocks separated by two empty line(s)? What describes a valid block, can you give an example? -

^ and $ match the start and end of the string, not the line. To match the start/end of a line you need to add the /m regex modifier, and the newline between the two lines still has to be matched: $x =~ /^line1$\n^line2$/m -

Turned out to be trickier problem than I originally gave it credit for. Welcome to SO. -

4 Answers

New answer

After having problems excluding >2 empty lines, and a good night's sleep, here is a better method that doesn't even need to slurp.

#!/usr/bin/perl

use strict;
use warnings;    

my $file = 'yu';
my @blocks; #each element will be an arrayref, one per block
            #that referenced array will hold lines in that block

open(my $fh, '<', $file) or die "Cannot open $file: $!";

my $empty = 0;
my $block_num = 0;
while (my $line = <$fh>) {
  chomp($line);
  if ($line =~ /^\s*$/) {
    $empty++;
  } elsif ($empty == 2) { #not blank and exactly 2 previous blanks
    $block_num++; # move on to next block
    $empty = 0;
  } else {
    $empty = 0;
  }

  push @{ $blocks[$block_num] }, $line;
}

#write out each block to a new file
my $file_num = 1;
foreach my $block (@blocks) {
  open(my $out, '>', $file_num++ . ".txt");
  print $out join("\n", @$block);
}

In fact rather than store and write later, you could simply write to one file per block as you go:

#!/usr/bin/perl

use strict;
use warnings;

my $file = 'yu';

open(my $fh, '<', $file) or die "Cannot open $file: $!";

my $empty = 0;
my $block_num = 1;
open(OUT, '>', $block_num . '.txt');
while (my $line = <$fh>) {
  chomp($line);
  if ($line =~ /^\s*$/) {
    $empty++;
  } elsif ($empty == 2) { #not blank and exactly 2 previous blanks
    close(OUT); #just learned this line isn't necessary, perldoc -f close
    open(OUT, '>', ++$block_num . '.txt');
    $empty = 0;
  } else {
    $empty = 0;
  }

  print OUT "$line\n";
}

close(OUT);
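
To try the counting logic without creating any files, it can be pulled into a sub that works on a list of lines. This is only a sketch: the name split_on_two_blank_lines and the sample lines are made up for illustration, and it keeps the behaviour of the code above, where blank lines stay attached to the block that precedes them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer above): same counting logic,
# but it takes already-chomped lines and returns a list of arrayrefs.
sub split_on_two_blank_lines {
    my @lines  = @_;
    my @blocks = ([]);    # start with one (possibly empty) block
    my $empty  = 0;       # number of consecutive blank lines seen so far

    for my $line (@lines) {
        if ($line =~ /^\s*$/) {
            $empty++;
        } else {
            # non-blank line after exactly two blanks: start a new block
            push @blocks, [] if $empty == 2;
            $empty = 0;
        }
        push @{ $blocks[-1] }, $line;
    }
    return @blocks;
}

# 'a' and 'b' are separated by two blank lines; 'b'/'c' by one, 'c'/'d' by three.
my @blocks = split_on_two_blank_lines(
    'a', '', '', 'b', '', 'c', '', '', '', 'd',
);
printf "%d blocks\n", scalar @blocks;
print join('|', @$_), "\n" for @blocks;
```

With this sample only the gap between 'a' and 'b' starts a new block, so two blocks come back; the single and triple gaps stay inside the second block.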

answered Jan 9 '11 at 20:01

By default, Perl reads files a line at a time, so you won't see multiple new lines. The following code selects text terminated by a double new line.

    local $/ = "\n\n" ;

    while (<> ) {

      print "-- found $_" ;
    }
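
A small sketch for seeing which records this separator actually produces. It reads from in-memory strings instead of a file; the helper name records_for and the sample strings are assumptions made for the demonstration. Note that "\n\n" corresponds to one empty line between two lines of text; two empty lines are three consecutive newlines.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read all records from a string with $/ set to "\n\n".
sub records_for {
    my ($text) = @_;
    open(my $fh, '<', \$text) or die "Cannot open in-memory handle: $!";
    local $/ = "\n\n";          # record separator: two consecutive newlines
    my @records = <$fh>;
    close($fh);
    return @records;
}

# one empty line, two empty lines, one "empty" line that contains a space
for my $sample ("a\n\nb\n", "a\n\n\nb\n", "a\n \nb\n") {
    my @records = records_for($sample);
    (my $shown = join ' | ', @records) =~ s/\n/\\n/g;   # make newlines visible
    printf "%d record(s): %s\n", scalar @records, $shown;
}
```

The second sample splits as well, leaving a leading newline on the second record, and the third sample is not split at all because the separator has to match character for character.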

answered Jan 8 '11 at 23:01

By this example, the whole file is a bunch of valid blocks. - user557597

And yet one more related question. - yuliya

How can I write every text block to a separate text file with Perl? I mean, if I have several blocks, I would like to write each of them to another file (there could be 50 such blocks). - yuliya

Sorry, can't agree with this method anymore. This will not properly handle >2 empty lines and will not deal with lines that are "empty" but still contain whitespace, as the OP's regex indicates it might. Was a clever first thought though. - Joel Berger

use 5.012;

open my $fh,'<','1.txt';

#slurping file
local $/;
my $content = <$fh>;

close $fh;

for my $block ( split /(?<!\n)\n\n\n(?!\n)/,$content ) {
    say 'found:';
    say $block;
}

answered Jan 9 '11 at 08:01

You are right to put in the negative look ahead / behind checks. Forgot the OP needed not to have 3 blank lines. - Joel Berger

Presumably this only needs 5.012 or 5.010 for the say; just checking nothing more subtle is going on. - justintime

Deprecated in favor of new answer

justintime's answer works by telling perl that you want to call the end of a line "\n\n", which is clever and will work well. One exception is that this must match exactly. By the regex you are using it makes it seem like there might be whitespace on the "empty" lines, in which case this will not work. Also his method will split even on more than 2 linebreaks, which was not allowed in the OP.

For completeness, to do it the way you were asking, you need to slurp the whole file into a variable (if the file is not so large as to use all your memory, probably fine in most cases).

I would then probably say to use the split function to split the block of text into an array of chunks. Your code would then look something like:

#!/usr/bin/perl

use strict;
use warnings;

my $file = 'yu';
my $text;

open(my $fh, '<', $file) or die "Cannot open $file: $!";
{
  local $/; # enables slurp mode inside this block
  $text = <$fh>;
}
close($fh);

my @blocks = split( 
  /
  (?<!\n)\n #check to make sure there isn't another \n behind this one
  \s*\n #first whitespace only line
  \s*\n #second "
  (?!\n) #check to make sure there isn't another \n after this one
  /x, # x flag allows comments and whitespace in regex
  $text
);  
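
One caveat with the pattern above: \s also matches "\n", so each \s*\n can absorb extra newlines, and a run of three empty lines such as "a\n\n\n\nb" still produces a split. A possible workaround, sketched here under the assumption that whitespace-only lines may simply be emptied first, is to normalise those lines and then split on exactly three newlines. The helper name split_blocks and the sample strings are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: empty out whitespace-only lines, then split on
# exactly two empty lines (three consecutive newlines).
sub split_blocks {
    my ($text) = @_;
    # [^\S\n] is "whitespace other than newline"; /m anchors per line
    $text =~ s/^[^\S\n]+$//mg;
    # lookarounds reject runs of four or more newlines
    return split /(?<!\n)\n{3}(?!\n)/, $text;
}

my @two   = split_blocks("a\n \n\t\nb");   # two whitespace-only lines
my @three = split_blocks("a\n\n\n\nb");    # three empty lines
my @one   = split_blocks("a\n\nb");        # one empty line
printf "%d %d %d\n", scalar @two, scalar @three, scalar @one;
```

Only the first sample is split into two pieces; the other two come back as a single piece each. The resulting list can be used in the same way as @blocks.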

You can then do operations on the array. If I understand your comment to justintime's answer, you want to write each block out to a different file. That would look something like

my $file_num = 1;
foreach my $block (@blocks) {
  open(my $out, '>', $file_num++ . ".txt");
  print $out $block;
}

Notice that since you open $out lexically (with my) when it reaches the end of the foreach block, the $out variable dies (i.e. "goes out of scope"). When this happens to a lexical filehandle, the file is automatically closed. And you can do a similar thing to that with justintime's method as well:

local $/ = "\n\n" ;

my $file_num = 1;
while (<>) {
  open(my $out, '>', $file_num++ . ".txt");
  print $out $_;
}
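
To check the point about lexical filehandles, here is a minimal sketch that writes a couple of blocks into a temporary directory and reads them back without ever calling close on the output handles. The helper name write_blocks and the sample block contents are assumptions; File::Temp is a core module.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: write each block to "<dir>/<n>.txt", return the paths.
sub write_blocks {
    my ($dir, @blocks) = @_;
    my @paths;
    my $file_num = 1;
    foreach my $block (@blocks) {
        my $path = "$dir/" . $file_num++ . ".txt";
        open(my $out, '>', $path) or die "Cannot open $path: $!";
        print {$out} $block;
        push @paths, $path;
        # $out goes out of scope at the end of each iteration,
        # which flushes and closes the file
    }
    return @paths;
}

my $dir = tempdir(CLEANUP => 1);   # removed again when the program exits
for my $path (write_blocks($dir, "first block\n", "second block\n")) {
    open(my $in, '<', $path) or die "Cannot open $path: $!";
    print "$path: ", <$in>;
}
```

The implicit close is convenient, but it cannot report a failed write; where that matters, an explicit close($out) or die ... is the safer choice.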

answered Jan 10 '11 at 18:01

@gangabass, you are right of course, I will switch it to the correct form which localizes $/ first. In my haste to post, I forgot you had to do that in scalar context, this would have worked if I had instead called in list context, but then would have had to join and split again. Corrected. - Joel Berger

Also I give up on trying to make the OP's regex work, replaced with mine. - Joel Berger

So I am having problems excluding three blank lines. Can anyone figure out why that is happening? Also, I think that to simplify it, it might be necessary to s/^\s*$//. - Joel Berger

I am leaving the post to show the method, and even to show some of the confusion and pitfalls it has. I have added a new method that seems to work much more efficiently, effectively and understandably - Joel Berger
