What do we expect out of "paragraph mode"?
We want to be able to answer questions like these:
How is paragraph mode documented in the core distribution? In the Camel book?
Do we have tests for each kind of functionality so documented?
In tests in the core distribution (or other codebases surveyed, e.g., CPAN), what kind of "paragraphs" are found within the files we process in paragraph mode?
Can we imagine files whose "paragraphs" do not conform to, or are not limited to, those we are currently testing?
If so, does $/ = "" DWIM on such files and "paragraphs"?
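One concrete case behind the last two questions is a file with CRLF line endings that is read without newline translation. The sketch below uses an in-memory filehandle as a stand-in for such a file, as run on a Unix-like perl; the helper name paragraph_count and the sample strings are illustrative assumptions, not code from the core distribution.

```perl
use strict;
use warnings;

# Hypothetical helper: count the records readline returns in paragraph mode.
sub paragraph_count {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';                 # paragraph mode, restored on scope exit
    my $count = () = <$fh>;        # list-context read, then count the records
    close $fh;
    return $count;
}

my $lf_text   = "one\n\ntwo\n";          # LF line endings
my $crlf_text = "one\r\n\r\ntwo\r\n";    # CRLF line endings, untranslated

print "LF:   ", paragraph_count($lf_text),   "\n";   # 2
print "CRLF: ", paragraph_count($crlf_text), "\n";   # 1
```

The CRLF text never contains two adjacent "\n" characters, so paragraph mode sees a single record unless the carriage returns are removed first.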
Of the following, pod/perlvar.pod is the most authoritative.
Used in code sample illustrating format.
In documentation of chomp built-in:
When in paragraph mode (C<$/ = ''>), [C<chomp>] removes all trailing newlines from the string.
Used in code samples explaining the \G assertion
Main documentation for $INPUT_RECORD_SEPARATOR:
$INPUT_RECORD_SEPARATOR
$RS
$/ The input record separator, newline by default. This influences
Perl's idea of what a "line" is. Works like awk's RS variable,
including treating empty lines as a terminator if set to the
null string (an empty line cannot contain any spaces or tabs).
...
Setting to "" will treat two or more consecutive
empty lines as a single empty line.
...
However, perlvar contains no specific examples of $/ = '', and the discussion is heavily weighted toward $/ = undef (slurp mode).
The discussion in Programming Perl, 4th edition, p. 778 is very similar and, once again, contains no specific code samples for $/ = ''.
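Since neither source gives a concrete example, here is a minimal sketch of what reading with $/ = '' looks like. The helper name read_paragraphs and the sample text are made up for illustration, and an in-memory filehandle stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return every record readline yields in paragraph mode.
sub read_paragraphs {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';                 # paragraph mode, restored on scope exit
    my @paragraphs = <$fh>;
    close $fh;
    return @paragraphs;
}

my @paras = read_paragraphs("first line\nsecond line\n\n\n\nlast paragraph\n");

for my $i (0 .. $#paras) {
    (my $shown = $paras[$i]) =~ s/\n/\\n/g;    # make the newlines visible
    print "paragraph ", $i + 1, ": $shown\n";
}
# paragraph 1: first line\nsecond line\n\n
# paragraph 2: last paragraph\n
```

The run of four newlines after the first paragraph comes back as exactly two, which is the "two or more consecutive empty lines as a single empty line" behavior quoted from perlvar.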
Starting at line 1175:
How can I read in a file by paragraphs?
Use the $/ variable (see perlvar for details). You can either set it to
"" to eliminate empty paragraphs ("abc\n\n\n\ndef", for instance, gets
treated as two paragraphs and not three), or "\n\n" to accept empty
paragraphs.
Note that a blank line must have no blanks in it. Thus
"fred\n \nstuff\n\n" is one paragraph, but "fred\n\nstuff\n\n" is two.
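The two claims in this FAQ entry can be checked directly. The following is a sketch only: records_for is a hypothetical helper, and the strings are the ones quoted in the FAQ text, read through an in-memory filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: split $text into records using the given separator.
sub records_for {
    my ($separator, $text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = $separator;
    my @records = <$fh>;
    close $fh;
    return @records;
}

# "" collapses runs of blank lines; "\n\n" keeps an empty record between them.
my @collapsed = records_for('',     "abc\n\n\n\ndef");
my @literal   = records_for("\n\n", "abc\n\n\n\ndef");
print scalar(@collapsed), " vs ", scalar(@literal), "\n";      # 2 vs 3

# A line holding only a space is not a blank line in paragraph mode.
my @with_space = records_for('', "fred\n \nstuff\n\n");
my @truly_blank = records_for('', "fred\n\nstuff\n\n");
print scalar(@with_space), " vs ", scalar(@truly_blank), "\n"; # 1 vs 2
```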
Starting at line 81:
I'm having trouble matching over more than one line. What's wrong?
...
There are many ways to get multiline data into a string. If you want it
to happen automatically while reading input, you'll want to set $/
(probably to '' for paragraphs or "undef" for the whole file) to allow
you to read more than one line at a time.
Code samples starting at line 108:
$/ = ''; # read in whole paragraph, not just one line
while ( <> ) {
    while ( /\b([\w'-]+)(\s+\g1)+\b/gi ) { # word starts alpha
        print "Duplicate $1 at paragraph $.\n";
    }
}
...
$/ = ''; # read in whole paragraph, not just one line
while ( <> ) {
    while ( /^From /gm ) { # /m makes ^ match next to \n
        print "leading from in paragraph $.\n";
    }
}
rs here means "record separator".
Starting at line 193:
# Does paragraph mode work?
$/ = '';
$bar = <FH>;
if ($bar ne "1234\n12345\n\n") {print "not ";}
print "ok $test_count # \$/ = ''\n";
$test_count++;
Believe it or not, that's the full extent of formal, explicit testing of paragraph mode in the core distribution.
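As a sketch of the kind of cases that single test does not reach, the following Test::More script exercises leading blank lines, long runs of blank lines, and a final paragraph without a trailing newline. The helper paragraphs_of and the input strings are assumptions made for illustration; this is not a proposed patch to the core test suite.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: read $text in paragraph mode via an in-memory handle.
sub paragraphs_of {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = '';
    my @paragraphs = <$fh>;
    close $fh;
    return @paragraphs;
}

is_deeply( [ paragraphs_of("\n\n\nalpha\n\nbeta\n") ],
           [ "alpha\n\n", "beta\n" ],
           'leading blank lines are skipped' );

is_deeply( [ paragraphs_of("alpha\n\n\n\n\nbeta\n\n") ],
           [ "alpha\n\n", "beta\n\n" ],
           'a run of blank lines ends the record with exactly two newlines' );

is_deeply( [ paragraphs_of("alpha\n\nbeta") ],
           [ "alpha\n\n", "beta" ],
           'final paragraph may lack a trailing newline' );

is_deeply( [ paragraphs_of("") ],
           [],
           'empty input yields no paragraphs' );

done_testing();
```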
Starting at line 61:
$_ = "f\n\n\n\n\n";
$/ = "";
$got = chomp();
is ($got, 5, 'check return value when chomp in paragraph mode on string ending with 5 newlines');
is ($_, "f", 'chomp in paragraph mode on string ending with 5 newlines');
$_ = "f\n\n";
$/ = "";
$got = chomp();
is ($got, 2, 'check return value when chomp in paragraph mode on string ending with 2 newlines');
is ($_, "f", 'chomp in paragraph mode on string ending with 2 newlines');
$_ = "f\n";
$/ = "";
$got = chomp();
is ($got, 1, 'check return value when chomp in paragraph mode on string ending with 1 newline');
is ($_, "f", 'chomp in paragraph mode on string ending with 1 newlines');
$_ = "f";
$/ = "";
$got = chomp();
is ($got, 0, 'check return value when chomp in paragraph mode on string ending with no newlines');
is ($_, "f", 'chomp in paragraph mode on string lacking trailing newlines');
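For contrast with the default separator, here is a short sketch of chomp on the same string under $/ = "\n" and $/ = "". The helper chomp_with is hypothetical and exists only to keep the change to $/ local.

```perl
use strict;
use warnings;

# Hypothetical helper: chomp a copy of $string under the given separator and
# return the chomped string together with chomp's return value.
sub chomp_with {
    my ($separator, $string) = @_;
    local $/ = $separator;
    my $removed = chomp $string;
    return ($string, $removed);
}

my ($line_result, $line_count) = chomp_with("\n", "f\n\n\n");
my ($para_result, $para_count) = chomp_with('',   "f\n\n\n");

print "default separator removed $line_count newline(s)\n";   # 1
print "paragraph mode removed $para_count newline(s)\n";      # 3
```

With the default separator two newlines remain on the string; in paragraph mode none do, which matches the documentation of chomp quoted earlier.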
Starting at line 56:
open my $readme, '<', '../README' or die "Opening README failed: $!";
# The copyright message is the first paragraph:
local $/ = '';
my $copyright_msg = <$readme>;
Starting at line 1011:
if ($Opts{glossary}) {
    open(GLOS, '<', $Glossary) or die "Can't open $Glossary: $!";
}
...
$/ = '';
...
if ($Opts{glossary}) {
    <GLOS>; # Skip the "DO NOT EDIT"
    <GLOS>; # Skip the preamble
    while (<GLOS>) {
        process;
        print CONFIG_POD;
    }
...
Starting at line 239:
open(H, '<', "$file.html") ||
    die "$0: error opening $file.html for input: $!\n";
$/ = "";
my @data = ();
while (<H>) {
    last if m!<h1 id="NAME">NAME</h1>!;
    $_ =~ s{href="#(.*)">}{
        my $url = "$file/@{[anchorify(qq($1))]}.html" ;
        $url = relativize_url( $url, "$file.html" )
            if ( ! defined $Options{htmlroot} || $Options{htmlroot} eq '' );
        "href=\"$url\">" ;
    }egi;
    push @data, $_;
}
close(H);
Starting at line 404:
sub splitpod {
    my($pod, $poddir, $htmldir, $splitdirs) = @_;
    my(@poddata, @filedata, @heads);
    my($file, $i, $j, $prevsec, $section, $nextsec);
    print "splitting $pod\n" if $verbose;
    # read the file in paragraphs
    $/ = "";
    open(SPLITIN, '<', $pod) ||
        die "$0: error opening $pod for input: $!\n";
    @filedata = <SPLITIN>;
    close(SPLITIN) ||
        die "$0: error closing $pod: $!\n";
Starting at line 316:
local $/ = '';
local $_;
my $header;
my @headers;
my $for_item;
my $seen_body;
while (<POD_DIAG>) {
sub podset {
    my ($pod, $file) = @_;
    open my $fh, '<:raw', $file or my_die "Can't open file '$file' for $pod: $!";
    ...
    seek $fh, 0, 0 or my_die "Can't rewind file '$file': $!";
    local $/ = '';
    while(<$fh>) {
        tr/\015//d;
        if (s/^=head1 (NAME)\s*/=head2 /) {
            unhead1();
            $OUT .= "\n\n=head2 ";
            $_ = <$fh>;
            # Remove svn keyword expansions from the Perl FAQ
            s/ \(\$Revision: \d+ \$\)//g;
            if ( /^\s*\Q$pod\E\b/ ) {
                s/$pod\.pm/$pod/; # '.pm' in NAME !?
            } else {
                s/^/$pod, /;
            }
        }
        elsif (s/^=head1 (.*)/=item $1/) {
            unhead2();
            $OUT .= "=over 4\n\n" unless $inhead1;
            $inhead1 = 1;
            $_ .= "\n";
        }
        elsif (s/^=head2 (.*)/=item $1/) {
            unitem();
            $OUT .= "=over 4\n\n" unless $inhead2;
            $inhead2 = 1;
            $_ .= "\n";
        }
        elsif (s/^=item ([^=].*)/$1/) {
            next if $pod eq 'perldiag';
            s/^\s*\*\s*$// && next;
            s/^\s*\*\s*//;
            s/\n/ /g;
            s/\s+$//;
            next if /^[\d.]+$/;
            next if $pod eq 'perlmodlib' && /^ftp:/;
            $OUT .= ", " if $initem;
            $initem = 1;
            s/\.$//;
            s/^-X\b/-I<X>/;
        }
        else {
            unhead1() if /^=cut\s*\n/;
            next;
        }
        $OUT .= $_;
    }
}
Starting at line 46:
for my $filename (@files) {
    unless (open MOD, '<', $filename) {
        warn "Couldn't open $filename: $!";
        next;
    }
    my ($name, $thing);
    my $foundit = 0;
    {
        local $/ = "";
        while (<MOD>) {
            next unless /^=head1 NAME/;
            $foundit++;
            last;
        }
    }
    unless ($foundit) {
        next if pod_for_module_has_head1_NAME($filename);
        die "p5p-controlled module $filename missing =head1 NAME\n"
            if $filename !~ m{^(dist/|cpan/)}n # under our direct control
            && $filename !~ m{/_[^/]+\z} # not private
            && $filename ne 'lib/meta_notation.pm' # no pod
            && $filename ne 'lib/overload/numbers.pm'; # no pod
        warn "$filename missing =head1 NAME\n" unless $Quiet;
        next;
    }
    my $title = <MOD>;
    chomp $title;
    close MOD
        or die "Error closing $filename: $!";
    ($name, $thing) = split /\s+--?\s+/, $title, 2;
    unless ($name and $thing) {
        warn "$filename missing name\n" unless $name;
        warn "$filename missing thing\n" unless $thing or $Quiet;
        next;
    }
    $name =~ s/[^A-Za-z0-9_:\$<>].*//;
    $name = $exceptions{$name} || $name;
    $thing =~ s/^perl pragma to //i;
    $thing = ucfirst $thing;
    $title = "=item $name\n\n$thing\n\n";
    if ($name =~ /[A-Z]/) {
        push @mod, $title;
    } else {
        push @pragma, $title;
    }
}
Starting at line 98:
sub pod_for_module_has_head1_NAME {
    my ($filename) = @_;
    (my $pod_file = $filename) =~ s/\.pm\z/.pod/ or return 0;
    return 0 if !-e $pod_file;
    open my $fh, '<', $pod_file
        or die "Can't open $pod_file for reading: $!\n";
    local $/ = '';
    while (my $para = <$fh>) {
        return 1 if $para =~ /\A=head1 NAME$/m;
    }
    return 0;
}
Applies to almost entire file.
Applies to almost entire file.
Tie-File does not implement paragraph mode:
"An undefined value is not permitted as a record separator. Perl's special "paragraph mode" semantics (à la $/ = "") are not emulated."
Starting at line 972:
foreach my $file (@files) {
    open (my $fh, '<', $file) or die "cant open $file: $!\n";
    $/ = "";
    my @chunks = <$fh>;
    print preamble (scalar @chunks);
    foreach my $t (@chunks) {
        print "\n\n=for gentest\n\n# chunk: $t=cut\n\n";
        print OptreeCheck::gentest ($t);
    }
}
Starting at line 168:
{
    # [perl #35929] verify that works with $/ (i.e. test PerlIOScalar_unread)
    my $s = <<'EOF';
line A
line B
a third line
EOF
    open(F, '<', \$s) or die "Could not open string as a file";
    local $/ = "";
    my $ln = <F>;
    close F;
    is($ln, $s, "[perl #35929]");
}
Starting at line 8:
my @filedata;
{
    local $/ = '';
    @filedata = <DATA>;
}
Starting at line 79:
local *IN;
...
$file_is_ready = open(IN, 'eplist');
...
skip("cannot open file for reading: $!", 5) unless $file_is_ready;
my $file = do { local $/ = <IN> };
Starting at line 95:
$file_is_ready = open(IN, 'dasboot.bs');
ok( $file_is_ready, 'should have written a new .bs file' );
...
my $file = do { local $/ = <IN> };
Starting at line 129:
$file_is_ready = open(IN, 'dasboot.bs');
...
my $file = do { local $/ = <IN> };
Starting at line 697:
{
    local $/ = ""; # paragraph mode
    my $io = $UncompressClass->new($name);
    is $., 0;
    is $io->input_line_number, 0;
    ok ! $io->eof;
    my @lines = $io->getlines();
    is $., 2;
    is $io->input_line_number, 2;
    ok $io->eof;
    ok @lines == 2
        or print "# Got " . scalar(@lines) . " lines, expected 2\n" ;
    ok $lines[0] eq "This is an example\nof a paragraph\n\n\n"
        or print "# $lines[0]\n";
    ok $lines[1] eq "and a single line.\n\n";
}
Starting at line 882:
{
    local $/ = ""; # paragraph mode
    my $io = $UncompressClass->new($name);
    ok ! $io->eof;
    my @lines = $io->getlines;
    is $., 2;
    is $io->input_line_number, 2;
    ok $io->eof;
    ok @lines == 2
        or print "# expected 2 lines, got " . scalar(@lines) . "\n";
    ok $lines[0] eq "This is an example\nof a paragraph\n\n\n"
        or print "# [$lines[0]]\n" ;
    ok $lines[1] eq "and a single line.\n\n";
}
Starting at line 196:
{
    local $/ = ""; # paragraph mode
    my $io = $UncompressClass->new($name);
    ok ! $io->eof;
    my @lines = <$io>;
    ok $io->eof;
    ok @lines == 2
        or print "# Got " . scalar(@lines) . " lines, expected 2\n" ;
    ok $lines[0] eq "This is an example\nof a paragraph\n\n\n"
        or print "# $lines[0]\n";
    ok $lines[1] eq "and a single line.\n\n";
}
Starting at line 237:
{
    local $/ = ""; # paragraph mode
    my $io = $UncompressClass->new($name);
    ok ! $io->eof;
    my @lines = <$io>;
    ok $io->eof;
    ok @lines == 2
        or print "# Got " . scalar(@lines) . " lines, expected 2\n" ;
    ok $lines[0] eq "This is an example\nof a paragraph\n\n\n"
        or print "# $lines[0]\n";
    ok $lines[1] eq "and a single line.\n\n";
}
Starting at line 365:
{
    local $/ = ""; # paragraph mode
    my $io = $UncompressClass->new($name);
    ok ! $io->eof;
    my @lines = <$io>;
    ok $io->eof;
    ok @lines == 2
        or print "# expected 2 lines, got " . scalar(@lines) . "\n";
    ok $lines[0] eq "This is an example\nof a paragraph\n\n\n"
        or print "# [$lines[0]]\n" ;
    ok $lines[1] eq "and a single line.\n\n";
}
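The expected strings in these tests keep all three newlines at the end of the first paragraph. For comparison, the sketch below shows what core readline returns for the same text when read from an in-memory filehandle. The sample text is an assumption, rebuilt by concatenating the expected strings above rather than taken from the actual test fixture, and core_paragraphs is a hypothetical helper.

```perl
use strict;
use warnings;

# Assumption: text reconstructed from the expected strings in the tests above.
my $text = "This is an example\nof a paragraph\n\n\nand a single line.\n\n";

# Hypothetical helper: read $input with core readline in paragraph mode.
sub core_paragraphs {
    my ($input) = @_;
    open my $fh, '<', \$input or die "Cannot open in-memory handle: $!";
    local $/ = '';
    my @records = <$fh>;
    close $fh;
    return @records;
}

for my $record ( core_paragraphs($text) ) {
    my ($tail) = $record =~ /(\n*)\z/;     # trailing run of newlines
    print "trailing newlines: ", length($tail), "\n";
}
# trailing newlines: 2
# trailing newlines: 2
```

Core readline trims the first record to two trailing newlines, so code that moves between a plain filehandle and one of these objects may see records that differ in their trailing newlines.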
Starting at line 20:
$/ = q();
binmode(DATA, ":utf8") || die "can't binmode DATA to utf8: $!";
our @DATA = (
    [ # paragraph 0
        sub { die "there is no paragraph 0" }
    ],
    { # paragraph 1
        OLD => { BYTES => 44, CHARS => 44, CHUNKS => 44, WORDS => 7, TABS => 3, LINES => 4 },
        NEW => { BYTES => 44, CHARS => 44, CHUNKS => 44, WORDS => 7, TABS => 3, LINES => 4 },
    },
    { # paragraph 2
        OLD => { BYTES => 1766, CHARS => 1635, CHUNKS => 1507, WORDS => 275, TABS => 0, LINES => 2 },
        NEW => { BYTES => 1766, CHARS => 1635, CHUNKS => 1507, WORDS => 275, TABS => 0, LINES => 24 },
    },
    { # paragraph 3
        OLD => { BYTES => 157, CHARS => 148, CHUNKS => 139, WORDS => 27, TABS => 0, LINES => 2 },
        NEW => { BYTES => 157, CHARS => 148, CHUNKS => 139, WORDS => 27, TABS => 0, LINES => 3 },
    },
    { # paragraph 4
        OLD => { BYTES => 30, CHARS => 25, CHUNKS => 24, WORDS => 3, TABS => 4, LINES => 1 },
        NEW => { BYTES => 30, CHARS => 25, CHUNKS => 24, WORDS => 3, TABS => 4, LINES => 1 },
    },
);
Some CPAN distributions found in that search:
B-DeparseTree
BioPerl
CPAN-DistnameInfo
GnuPG
IO-Socket-SSL
IO-String
IO-stringy
Parse-RecDescent
PerlBench
Regexp-Grammars