Leaked source code of windows server 2003
You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

1242 lines
40 KiB

  1. #############################################################################
  2. # Pod/Checker.pm -- check pod documents for syntax errors
  3. #
  4. # Copyright (C) 1994-2000 by Bradford Appleton. All rights reserved.
  5. # This file is part of "PodParser". PodParser is free software;
  6. # you can redistribute it and/or modify it under the same terms
  7. # as Perl itself.
  8. #############################################################################
  9. package Pod::Checker;
  10. use vars qw($VERSION);
  11. $VERSION = 1.2; ## Current version of this package
  12. require 5.005; ## requires this Perl version or later
  13. use Pod::ParseUtils; ## for hyperlinks and lists
  14. =head1 NAME
  15. Pod::Checker, podchecker() - check pod documents for syntax errors
  16. =head1 SYNOPSIS
  17. use Pod::Checker;
  18. $syntax_okay = podchecker($filepath, $outputpath, %options);
  19. my $checker = new Pod::Checker %options;
  20. $checker->parse_from_file($filepath, \*STDERR);
  21. =head1 OPTIONS/ARGUMENTS
  22. C<$filepath> is the input POD to read and C<$outputpath> is
  23. where to write POD syntax error messages. Either argument may be a scalar
  24. indicating a file-path, or else a reference to an open filehandle.
  25. If unspecified, the input-file it defaults to C<\*STDIN>, and
  26. the output-file defaults to C<\*STDERR>.
  27. =head2 podchecker()
  28. This function can take a hash of options:
  29. =over 4
  30. =item B<-warnings> =E<gt> I<val>
  31. Turn warnings on/off. I<val> is usually 1 for on, but higher values
  32. trigger additional warnings. See L<"Warnings">.
  33. =back
  34. =head1 DESCRIPTION
  35. B<podchecker> will perform syntax checking of Perl5 POD format documentation.
  36. I<NOTE THAT THIS MODULE IS CURRENTLY IN THE BETA STAGE!>
  37. It is hoped that curious/ambitious user will help flesh out and add the
  38. additional features they wish to see in B<Pod::Checker> and B<podchecker>
  39. and verify that the checks are consistent with L<perlpod>.
  40. The following checks are currently preformed:
  41. =over 4
  42. =item *
  43. Unknown '=xxxx' commands, unknown 'XE<lt>...E<gt>' interior-sequences,
  44. and unterminated interior sequences.
  45. =item *
  46. Check for proper balancing of C<=begin> and C<=end>. The contents of such
  47. a block are generally ignored, i.e. no syntax checks are performed.
  48. =item *
  49. Check for proper nesting and balancing of C<=over>, C<=item> and C<=back>.
  50. =item *
  51. Check for same nested interior-sequences (e.g.
  52. C<LE<lt>...LE<lt>...E<gt>...E<gt>>).
  53. =item *
  54. Check for malformed or nonexisting entities C<EE<lt>...E<gt>>.
  55. =item *
  56. Check for correct syntax of hyperlinks C<LE<lt>...E<gt>>. See L<perlpod>
  57. for details.
  58. =item *
  59. Check for unresolved document-internal links. This check may also reveal
  60. misspelled links that seem to be internal links but should be links
  61. to something else.
  62. =back
  63. =head1 DIAGNOSTICS
  64. =head2 Errors
  65. =over 4
  66. =item * empty =headn
  67. A heading (C<=head1> or C<=head2>) without any text? That ain't no
  68. heading!
  69. =item * =over on line I<N> without closing =back
  70. The C<=over> command does not have a corresponding C<=back> before the
  71. next heading (C<=head1> or C<=head2>) or the end of the file.
  72. =item * =item without previous =over
  73. =item * =back without previous =over
  74. An C<=item> or C<=back> command has been found outside a
  75. C<=over>/C<=back> block.
  76. =item * No argument for =begin
  77. A C<=begin> command was found that is not followed by the formatter
  78. specification.
  79. =item * =end without =begin
  80. A standalone C<=end> command was found.
  81. =item * Nested =begin's
  82. There were at least two consecutive C<=begin> commands without
  83. the corresponding C<=end>. Only one C<=begin> may be active at
  84. a time.
  85. =item * =for without formatter specification
  86. There is no specification of the formatter after the C<=for> command.
  87. =item * unresolved internal link I<NAME>
  88. The given link to I<NAME> does not have a matching node in the current
  89. POD. This also happend when a single word node name is not enclosed in
  90. C<"">.
  91. =item * Unknown command "I<CMD>"
  92. An invalid POD command has been found. Valid are C<=head1>, C<=head2>,
  93. C<=over>, C<=item>, C<=back>, C<=begin>, C<=end>, C<=for>, C<=pod>,
  94. C<=cut>
  95. =item * Unknown interior-sequence "I<SEQ>"
  96. An invalid markup command has been encountered. Valid are:
  97. C<BE<lt>E<gt>>, C<CE<lt>E<gt>>, C<EE<lt>E<gt>>, C<FE<lt>E<gt>>,
  98. C<IE<lt>E<gt>>, C<LE<lt>E<gt>>, C<SE<lt>E<gt>>, C<XE<lt>E<gt>>,
  99. C<ZE<lt>E<gt>>
  100. =item * nested commands I<CMD>E<lt>...I<CMD>E<lt>...E<gt>...E<gt>
  101. Two nested identical markup commands have been found. Generally this
  102. does not make sense.
  103. =item * garbled entity I<STRING>
  104. The I<STRING> found cannot be interpreted as a character entity.
  105. =item * Entity number out of range
  106. An entity specified by number (dec, hex, oct) is out of range (1-255).
  107. =item * malformed link LE<lt>E<gt>
  108. The link found cannot be parsed because it does not conform to the
  109. syntax described in L<perlpod>.
  110. =item * nonempty ZE<lt>E<gt>
  111. The C<ZE<lt>E<gt>> sequence is supposed to be empty.
  112. =item * empty XE<lt>E<gt>
  113. The index entry specified contains nothing but whitespace.
  114. =item * Spurious text after =pod / =cut
  115. The commands C<=pod> and C<=cut> do not take any arguments.
  116. =item * Spurious character(s) after =back
  117. The C<=back> command does not take any arguments.
  118. =back
  119. =head2 Warnings
  120. These may not necessarily cause trouble, but indicate mediocre style.
  121. =over 4
  122. =item * multiple occurence of link target I<name>
  123. The POD file has some C<=item> and/or C<=head> commands that have
  124. the same text. Potential hyperlinks to such a text cannot be unique then.
  125. =item * line containing nothing but whitespace in paragraph
  126. There is some whitespace on a seemingly empty line. POD is very sensitive
  127. to such things, so this is flagged. B<vi> users switch on the B<list>
  128. option to avoid this problem.
  129. =begin _disabled_
  130. =item * file does not start with =head
  131. The file starts with a different POD directive than head.
  132. This is most probably something you do not want.
  133. =end _disabled_
  134. =item * previous =item has no contents
  135. There is a list C<=item> right above the flagged line that has no
  136. text contents. You probably want to delete empty items.
  137. =item * preceding non-item paragraph(s)
  138. A list introduced by C<=over> starts with a text or verbatim paragraph,
  139. but continues with C<=item>s. Move the non-item paragraph out of the
  140. C<=over>/C<=back> block.
  141. =item * =item type mismatch (I<one> vs. I<two>)
  142. A list started with e.g. a bulletted C<=item> and continued with a
  143. numbered one. This is obviously inconsistent. For most translators the
  144. type of the I<first> C<=item> determines the type of the list.
  145. =item * I<N> unescaped C<E<lt>E<gt>> in paragraph
  146. Angle brackets not written as C<E<lt>ltE<gt>> and C<E<lt>gtE<gt>>
  147. can potentially cause errors as they could be misinterpreted as
  148. markup commands. This is only printed when the -warnings level is
  149. greater than 1.
  150. =item * Unknown entity
  151. A character entity was found that does not belong to the standard
  152. ISO set or the POD specials C<verbar> and C<sol>.
  153. =item * No items in =over
  154. The list opened with C<=over> does not contain any items.
  155. =item * No argument for =item
  156. C<=item> without any parameters is deprecated. It should either be followed
  157. by C<*> to indicate an unordered list, by a number (optionally followed
  158. by a dot) to indicate an ordered (numbered) list or simple text for a
  159. definition list.
  160. =item * empty section in previous paragraph
  161. The previous section (introduced by a C<=head> command) does not contain
  162. any text. This usually indicates that something is missing. Note: A
  163. C<=head1> followed immediately by C<=head2> does not trigger this warning.
  164. =item * Verbatim paragraph in NAME section
  165. The NAME section (C<=head1 NAME>) should consist of a single paragraph
  166. with the script/module name, followed by a dash `-' and a very short
  167. description of what the thing is good for.
  168. =back
  169. =head2 Hyperlinks
  170. There are some warnings wrt. malformed hyperlinks.
  171. =over 4
  172. =item * ignoring leading/trailing whitespace in link
  173. There is whitespace at the beginning or the end of the contents of
  174. LE<lt>...E<gt>.
  175. =item * (section) in '$page' deprecated
  176. There is a section detected in the page name of LE<lt>...E<gt>, e.g.
  177. C<LE<gt>passwd(2)E<gt>>. POD hyperlinks may point to POD documents only.
  178. Please write C<CE<lt>passwd(2)E<gt>> instead. Some formatters are able
  179. to expand this to appropriate code. For links to (builtin) functions,
  180. please say C<LE<lt>perlfunc/mkdirE<gt>>, without ().
  181. =item * alternative text/node '%s' contains non-escaped | or /
  182. The characters C<|> and C</> are special in the LE<lt>...E<gt> context.
  183. Although the hyperlink parser does its best to determine which "/" is
  184. text and which is a delimiter in case of doubt, one ought to escape
  185. these literal characters like this:
  186. / E<sol>
  187. | E<verbar>
  188. =back
  189. =head1 RETURN VALUE
  190. B<podchecker> returns the number of POD syntax errors found or -1 if
  191. there were no POD commands at all found in the file.
  192. =head1 EXAMPLES
  193. I<[T.B.D.]>
  194. =head1 INTERFACE
  195. While checking, this module collects document properties, e.g. the nodes
  196. for hyperlinks (C<=headX>, C<=item>) and index entries (C<XE<lt>E<gt>>).
  197. POD translators can use this feature to syntax-check and get the nodes in
  198. a first pass before actually starting to convert. This is expensive in terms
  199. of execution time, but allows for very robust conversions.
  200. =cut
  201. #############################################################################
  202. use strict;
  203. #use diagnostics;
  204. use Carp;
  205. use Exporter;
  206. use Pod::Parser;
  207. use vars qw(@ISA @EXPORT);
  208. @ISA = qw(Pod::Parser);
  209. @EXPORT = qw(&podchecker);
  210. use vars qw(%VALID_COMMANDS %VALID_SEQUENCES);
  211. my %VALID_COMMANDS = (
  212. 'pod' => 1,
  213. 'cut' => 1,
  214. 'head1' => 1,
  215. 'head2' => 1,
  216. 'over' => 1,
  217. 'back' => 1,
  218. 'item' => 1,
  219. 'for' => 1,
  220. 'begin' => 1,
  221. 'end' => 1,
  222. );
  223. my %VALID_SEQUENCES = (
  224. 'I' => 1,
  225. 'B' => 1,
  226. 'S' => 1,
  227. 'C' => 1,
  228. 'L' => 1,
  229. 'F' => 1,
  230. 'X' => 1,
  231. 'Z' => 1,
  232. 'E' => 1,
  233. );
  234. # stolen from HTML::Entities
  235. my %ENTITIES = (
  236. # Some normal chars that have special meaning in SGML context
  237. amp => '&', # ampersand
  238. 'gt' => '>', # greater than
  239. 'lt' => '<', # less than
  240. quot => '"', # double quote
  241. # PUBLIC ISO 8879-1986//ENTITIES Added Latin 1//EN//HTML
  242. AElig => '�', # capital AE diphthong (ligature)
  243. Aacute => '�', # capital A, acute accent
  244. Acirc => '�', # capital A, circumflex accent
  245. Agrave => '�', # capital A, grave accent
  246. Aring => '�', # capital A, ring
  247. Atilde => '�', # capital A, tilde
  248. Auml => '�', # capital A, dieresis or umlaut mark
  249. Ccedil => '�', # capital C, cedilla
  250. ETH => '�', # capital Eth, Icelandic
  251. Eacute => '�', # capital E, acute accent
  252. Ecirc => '�', # capital E, circumflex accent
  253. Egrave => '�', # capital E, grave accent
  254. Euml => '�', # capital E, dieresis or umlaut mark
  255. Iacute => '�', # capital I, acute accent
  256. Icirc => '�', # capital I, circumflex accent
  257. Igrave => '�', # capital I, grave accent
  258. Iuml => '�', # capital I, dieresis or umlaut mark
  259. Ntilde => '�', # capital N, tilde
  260. Oacute => '�', # capital O, acute accent
  261. Ocirc => '�', # capital O, circumflex accent
  262. Ograve => '�', # capital O, grave accent
  263. Oslash => '�', # capital O, slash
  264. Otilde => '�', # capital O, tilde
  265. Ouml => '�', # capital O, dieresis or umlaut mark
  266. THORN => '�', # capital THORN, Icelandic
  267. Uacute => '�', # capital U, acute accent
  268. Ucirc => '�', # capital U, circumflex accent
  269. Ugrave => '�', # capital U, grave accent
  270. Uuml => '�', # capital U, dieresis or umlaut mark
  271. Yacute => '�', # capital Y, acute accent
  272. aacute => '�', # small a, acute accent
  273. acirc => '�', # small a, circumflex accent
  274. aelig => '�', # small ae diphthong (ligature)
  275. agrave => '�', # small a, grave accent
  276. aring => '�', # small a, ring
  277. atilde => '�', # small a, tilde
  278. auml => '�', # small a, dieresis or umlaut mark
  279. ccedil => '�', # small c, cedilla
  280. eacute => '�', # small e, acute accent
  281. ecirc => '�', # small e, circumflex accent
  282. egrave => '�', # small e, grave accent
  283. eth => '�', # small eth, Icelandic
  284. euml => '�', # small e, dieresis or umlaut mark
  285. iacute => '�', # small i, acute accent
  286. icirc => '�', # small i, circumflex accent
  287. igrave => '�', # small i, grave accent
  288. iuml => '�', # small i, dieresis or umlaut mark
  289. ntilde => '�', # small n, tilde
  290. oacute => '�', # small o, acute accent
  291. ocirc => '�', # small o, circumflex accent
  292. ograve => '�', # small o, grave accent
  293. oslash => '�', # small o, slash
  294. otilde => '�', # small o, tilde
  295. ouml => '�', # small o, dieresis or umlaut mark
  296. szlig => '�', # small sharp s, German (sz ligature)
  297. thorn => '�', # small thorn, Icelandic
  298. uacute => '�', # small u, acute accent
  299. ucirc => '�', # small u, circumflex accent
  300. ugrave => '�', # small u, grave accent
  301. uuml => '�', # small u, dieresis or umlaut mark
  302. yacute => '�', # small y, acute accent
  303. yuml => '�', # small y, dieresis or umlaut mark
  304. # Some extra Latin 1 chars that are listed in the HTML3.2 draft (21-May-96)
  305. copy => '�', # copyright sign
  306. reg => '�', # registered sign
  307. nbsp => "\240", # non breaking space
  308. # Additional ISO-8859/1 entities listed in rfc1866 (section 14)
  309. iexcl => '�',
  310. cent => '�',
  311. pound => '�',
  312. curren => '�',
  313. yen => '�',
  314. brvbar => '�',
  315. sect => '�',
  316. uml => '�',
  317. ordf => '�',
  318. laquo => '�',
  319. 'not' => '�', # not is a keyword in perl
  320. shy => '�',
  321. macr => '�',
  322. deg => '�',
  323. plusmn => '�',
  324. sup1 => '�',
  325. sup2 => '�',
  326. sup3 => '�',
  327. acute => '�',
  328. micro => '�',
  329. para => '�',
  330. middot => '�',
  331. cedil => '�',
  332. ordm => '�',
  333. raquo => '�',
  334. frac14 => '�',
  335. frac12 => '�',
  336. frac34 => '�',
  337. iquest => '�',
  338. 'times' => '�', # times is a keyword in perl
  339. divide => '�',
  340. # some POD special entities
  341. verbar => '|',
  342. sol => '/'
  343. );
  344. ##---------------------------------------------------------------------------
  345. ##---------------------------------
  346. ## Function definitions begin here
  347. ##---------------------------------
  348. sub podchecker( $ ; $ % ) {
  349. my ($infile, $outfile, %options) = @_;
  350. local $_;
  351. ## Set defaults
  352. $infile ||= \*STDIN;
  353. $outfile ||= \*STDERR;
  354. ## Now create a pod checker
  355. my $checker = new Pod::Checker(%options);
  356. ## Now check the pod document for errors
  357. $checker->parse_from_file($infile, $outfile);
  358. ## Return the number of errors found
  359. return $checker->num_errors();
  360. }
  361. ##---------------------------------------------------------------------------
  362. ##-------------------------------
  363. ## Method definitions begin here
  364. ##-------------------------------
  365. ##################################
  366. =over 4
  367. =item C<Pod::Checker-E<gt>new( %options )>
  368. Return a reference to a new Pod::Checker object that inherits from
  369. Pod::Parser and is used for calling the required methods later. The
  370. following options are recognized:
  371. C<-warnings =E<gt> num>
  372. Print warnings if C<num> is true. The higher the value of C<num>,
  373. the more warnings are printed. Currently there are only levels 1 and 2.
  374. C<-quiet =E<gt> num>
  375. If C<num> is true, do not print any errors/warnings. This is useful
  376. when Pod::Checker is used to munge POD code into plain text from within
  377. POD formatters.
  378. =cut
  379. ## sub new {
  380. ## my $this = shift;
  381. ## my $class = ref($this) || $this;
  382. ## my %params = @_;
  383. ## my $self = {%params};
  384. ## bless $self, $class;
  385. ## $self->initialize();
  386. ## return $self;
  387. ## }
  388. sub initialize {
  389. my $self = shift;
  390. ## Initialize number of errors, and setup an error function to
  391. ## increment this number and then print to the designated output.
  392. $self->{_NUM_ERRORS} = 0;
  393. $self->{-quiet} ||= 0;
  394. # set the error handling subroutine
  395. $self->errorsub($self->{-quiet} ? sub { 1; } : 'poderror');
  396. $self->{_commands} = 0; # total number of POD commands encountered
  397. $self->{_list_stack} = []; # stack for nested lists
  398. $self->{_have_begin} = ''; # stores =begin
  399. $self->{_links} = []; # stack for internal hyperlinks
  400. $self->{_nodes} = []; # stack for =head/=item nodes
  401. $self->{_index} = []; # text in X<>
  402. # print warnings?
  403. $self->{-warnings} = 1 unless(defined $self->{-warnings});
  404. $self->{_current_head1} = ''; # the current =head1 block
  405. $self->parseopts(-process_cut_cmd => 1, -warnings => $self->{-warnings});
  406. }
  407. ##################################
  408. =item C<$checker-E<gt>poderror( @args )>
  409. =item C<$checker-E<gt>poderror( {%opts}, @args )>
  410. Internal method for printing errors and warnings. If no options are
  411. given, simply prints "@_". The following options are recognized and used
  412. to form the output:
  413. -msg
  414. A message to print prior to C<@args>.
  415. -line
  416. The line number the error occurred in.
  417. -file
  418. The file (name) the error occurred in.
  419. -severity
  420. The error level, should be 'WARNING' or 'ERROR'.
  421. =cut
  422. # Invoked as $self->poderror( @args ), or $self->poderror( {%opts}, @args )
  423. sub poderror {
  424. my $self = shift;
  425. my %opts = (ref $_[0]) ? %{shift()} : ();
  426. ## Retrieve options
  427. chomp( my $msg = ($opts{-msg} || "")."@_" );
  428. my $line = (exists $opts{-line}) ? " at line $opts{-line}" : "";
  429. my $file = (exists $opts{-file}) ? " in file $opts{-file}" : "";
  430. unless (exists $opts{-severity}) {
  431. ## See if can find severity in message prefix
  432. $opts{-severity} = $1 if ( $msg =~ s/^\**\s*([A-Z]{3,}):\s+// );
  433. }
  434. my $severity = (exists $opts{-severity}) ? "*** $opts{-severity}: " : "";
  435. ## Increment error count and print message "
  436. ++($self->{_NUM_ERRORS})
  437. if(!%opts || ($opts{-severity} && $opts{-severity} eq 'ERROR'));
  438. my $out_fh = $self->output_handle() || \*STDERR;
  439. print $out_fh ($severity, $msg, $line, $file, "\n")
  440. if($self->{-warnings} || !%opts || $opts{-severity} ne 'WARNING');
  441. }
  442. ##################################
  443. =item C<$checker-E<gt>num_errors()>
  444. Set (if argument specified) and retrieve the number of errors found.
  445. =cut
  446. sub num_errors {
  447. return (@_ > 1) ? ($_[0]->{_NUM_ERRORS} = $_[1]) : $_[0]->{_NUM_ERRORS};
  448. }
  449. ##################################
  450. =item C<$checker-E<gt>name()>
  451. Set (if argument specified) and retrieve the canonical name of POD as
  452. found in the C<=head1 NAME> section.
  453. =cut
  454. sub name {
  455. return (@_ > 1 && $_[1]) ?
  456. ($_[0]->{-name} = $_[1]) : $_[0]->{-name};
  457. }
  458. ##################################
  459. =item C<$checker-E<gt>node()>
  460. Add (if argument specified) and retrieve the nodes (as defined by C<=headX>
  461. and C<=item>) of the current POD. The nodes are returned in the order of
  462. their occurence. They consist of plain text, each piece of whitespace is
  463. collapsed to a single blank.
  464. =cut
  465. sub node {
  466. my ($self,$text) = @_;
  467. if(defined $text) {
  468. $text =~ s/\s+$//s; # strip trailing whitespace
  469. $text =~ s/\s+/ /gs; # collapse whitespace
  470. # add node, order important!
  471. push(@{$self->{_nodes}}, $text);
  472. # keep also a uniqueness counter
  473. $self->{_unique_nodes}->{$text}++ if($text !~ /^\s*$/s);
  474. return $text;
  475. }
  476. @{$self->{_nodes}};
  477. }
  478. ##################################
  479. =item C<$checker-E<gt>idx()>
  480. Add (if argument specified) and retrieve the index entries (as defined by
  481. C<XE<lt>E<gt>>) of the current POD. They consist of plain text, each piece
  482. of whitespace is collapsed to a single blank.
  483. =cut
  484. # set/return index entries of current POD
  485. sub idx {
  486. my ($self,$text) = @_;
  487. if(defined $text) {
  488. $text =~ s/\s+$//s; # strip trailing whitespace
  489. $text =~ s/\s+/ /gs; # collapse whitespace
  490. # add node, order important!
  491. push(@{$self->{_index}}, $text);
  492. # keep also a uniqueness counter
  493. $self->{_unique_nodes}->{$text}++ if($text !~ /^\s*$/s);
  494. return $text;
  495. }
  496. @{$self->{_index}};
  497. }
  498. ##################################
  499. =item C<$checker-E<gt>hyperlink()>
  500. Add (if argument specified) and retrieve the hyperlinks (as defined by
  501. C<LE<lt>E<gt>>) of the current POD. They consist of an 2-item array: line
  502. number and C<Pod::Hyperlink> object.
  503. =back
  504. =cut
  505. # set/return hyperlinks of the current POD
  506. sub hyperlink {
  507. my $self = shift;
  508. if($_[0]) {
  509. push(@{$self->{_links}}, $_[0]);
  510. return $_[0];
  511. }
  512. @{$self->{_links}};
  513. }
  514. ## overrides for Pod::Parser
  515. sub end_pod {
  516. ## Do some final checks and
  517. ## print the number of errors found
  518. my $self = shift;
  519. my $infile = $self->input_file();
  520. my $out_fh = $self->output_handle();
  521. if(@{$self->{_list_stack}}) {
  522. # _TODO_ display, but don't count them for now
  523. my $list;
  524. while(($list = $self->_close_list('EOF',$infile)) &&
  525. $list->indent() ne 'auto') {
  526. $self->poderror({ -line => 'EOF', -file => $infile,
  527. -severity => 'ERROR', -msg => "=over on line " .
  528. $list->start() . " without closing =back" }); #"
  529. }
  530. }
  531. # check validity of document internal hyperlinks
  532. # first build the node names from the paragraph text
  533. my %nodes;
  534. foreach($self->node()) {
  535. $nodes{$_} = 1;
  536. if(/^(\S+)\s+\S/) {
  537. # we have more than one word. Use the first as a node, too.
  538. # This is used heavily in perlfunc.pod
  539. $nodes{$1} ||= 2; # derived node
  540. }
  541. }
  542. foreach($self->idx()) {
  543. $nodes{$_} = 3; # index node
  544. }
  545. foreach($self->hyperlink()) {
  546. my ($line,$link) = @$_;
  547. # _TODO_ what if there is a link to the page itself by the name,
  548. # e.g. in Tk::Pod : L<Tk::Pod/"DESCRIPTION">
  549. if($link->node() && !$link->page() && $link->type() ne 'hyperlink') {
  550. my $node = $self->_check_ptree($self->parse_text($link->node(),
  551. $line), $line, $infile, 'L');
  552. if($node && !$nodes{$node}) {
  553. $self->poderror({ -line => $line || '', -file => $infile,
  554. -severity => 'ERROR',
  555. -msg => "unresolved internal link '$node'"});
  556. }
  557. }
  558. }
  559. # check the internal nodes for uniqueness. This pertains to
  560. # =headX, =item and X<...>
  561. foreach(grep($self->{_unique_nodes}->{$_} > 1,
  562. keys %{$self->{_unique_nodes}})) {
  563. $self->poderror({ -line => '-', -file => $infile,
  564. -severity => 'WARNING',
  565. -msg => "multiple occurence of link target '$_'"});
  566. }
  567. ## Print the number of errors found
  568. my $num_errors = $self->num_errors();
  569. if ($num_errors > 0) {
  570. printf $out_fh ("$infile has $num_errors pod syntax %s.\n",
  571. ($num_errors == 1) ? "error" : "errors");
  572. }
  573. elsif($self->{_commands} == 0) {
  574. print $out_fh "$infile does not contain any pod commands.\n";
  575. $self->num_errors(-1);
  576. }
  577. else {
  578. print $out_fh "$infile pod syntax OK.\n";
  579. }
  580. }
  581. # check a POD command directive
  582. sub command {
  583. my ($self, $cmd, $paragraph, $line_num, $pod_para) = @_;
  584. my ($file, $line) = $pod_para->file_line;
  585. ## Check the command syntax
  586. my $arg; # this will hold the command argument
  587. if (! $VALID_COMMANDS{$cmd}) {
  588. $self->poderror({ -line => $line, -file => $file, -severity => 'ERROR',
  589. -msg => "Unknown command '$cmd'" });
  590. }
  591. else { # found a valid command
  592. $self->{_commands}++; # delete this line if below is enabled again
  593. ##### following check disabled due to strong request
  594. #if(!$self->{_commands}++ && $cmd !~ /^head/) {
  595. # $self->poderror({ -line => $line, -file => $file,
  596. # -severity => 'WARNING',
  597. # -msg => "file does not start with =head" });
  598. #}
  599. # check syntax of particular command
  600. if($cmd eq 'over') {
  601. # check for argument
  602. $arg = $self->interpolate_and_check($paragraph, $line,$file);
  603. my $indent = 4; # default
  604. if($arg && $arg =~ /^\s*(\d+)\s*$/) {
  605. $indent = $1;
  606. }
  607. # start a new list
  608. $self->_open_list($indent,$line,$file);
  609. }
  610. elsif($cmd eq 'item') {
  611. # are we in a list?
  612. unless(@{$self->{_list_stack}}) {
  613. $self->poderror({ -line => $line, -file => $file,
  614. -severity => 'ERROR',
  615. -msg => "=item without previous =over" });
  616. # auto-open in case we encounter many more
  617. $self->_open_list('auto',$line,$file);
  618. }
  619. my $list = $self->{_list_stack}->[0];
  620. # check whether the previous item had some contents
  621. if(defined $self->{_list_item_contents} &&
  622. $self->{_list_item_contents} == 0) {
  623. $self->poderror({ -line => $line, -file => $file,
  624. -severity => 'WARNING',
  625. -msg => "previous =item has no contents" });
  626. }
  627. if($list->{_has_par}) {
  628. $self->poderror({ -line => $line, -file => $file,
  629. -severity => 'WARNING',
  630. -msg => "preceding non-item paragraph(s)" });
  631. delete $list->{_has_par};
  632. }
  633. # check for argument
  634. $arg = $self->interpolate_and_check($paragraph, $line, $file);
  635. if($arg && $arg =~ /(\S+)/) {
  636. $arg =~ s/[\s\n]+$//;
  637. my $type;
  638. if($arg =~ /^[*]\s*(\S*.*)/) {
  639. $type = 'bullet';
  640. $self->{_list_item_contents} = $1 ? 1 : 0;
  641. $arg = $1;
  642. }
  643. elsif($arg =~ /^\d+\.?\s*(\S*)/) {
  644. $type = 'number';
  645. $self->{_list_item_contents} = $1 ? 1 : 0;
  646. $arg = $1;
  647. }
  648. else {
  649. $type = 'definition';
  650. $self->{_list_item_contents} = 1;
  651. }
  652. my $first = $list->type();
  653. if($first && $first ne $type) {
  654. $self->poderror({ -line => $line, -file => $file,
  655. -severity => 'WARNING',
  656. -msg => "=item type mismatch ('$first' vs. '$type')"});
  657. }
  658. else { # first item
  659. $list->type($type);
  660. }
  661. }
  662. else {
  663. $self->poderror({ -line => $line, -file => $file,
  664. -severity => 'WARNING',
  665. -msg => "No argument for =item" });
  666. $arg = ' '; # empty
  667. $self->{_list_item_contents} = 0;
  668. }
  669. # add this item
  670. $list->item($arg);
  671. # remember this node
  672. $self->node($arg);
  673. }
  674. elsif($cmd eq 'back') {
  675. # check if we have an open list
  676. unless(@{$self->{_list_stack}}) {
  677. $self->poderror({ -line => $line, -file => $file,
  678. -severity => 'ERROR',
  679. -msg => "=back without previous =over" });
  680. }
  681. else {
  682. # check for spurious characters
  683. $arg = $self->interpolate_and_check($paragraph, $line,$file);
  684. if($arg && $arg =~ /\S/) {
  685. $self->poderror({ -line => $line, -file => $file,
  686. -severity => 'ERROR',
  687. -msg => "Spurious character(s) after =back" });
  688. }
  689. # close list
  690. my $list = $self->_close_list($line,$file);
  691. # check for empty lists
  692. if(!$list->item() && $self->{-warnings}) {
  693. $self->poderror({ -line => $line, -file => $file,
  694. -severity => 'WARNING',
  695. -msg => "No items in =over (at line " .
  696. $list->start() . ") / =back list"}); #"
  697. }
  698. }
  699. }
  700. elsif($cmd =~ /^head(\d+)/) {
  701. # check whether the previous =head section had some contents
  702. if(defined $self->{_commands_in_head} &&
  703. $self->{_commands_in_head} == 0 &&
  704. defined $self->{_last_head} &&
  705. $self->{_last_head} >= $1) {
  706. $self->poderror({ -line => $line, -file => $file,
  707. -severity => 'WARNING',
  708. -msg => "empty section in previous paragraph"});
  709. }
  710. $self->{_commands_in_head} = -1;
  711. $self->{_last_head} = $1;
  712. # check if there is an open list
  713. if(@{$self->{_list_stack}}) {
  714. my $list;
  715. while(($list = $self->_close_list($line,$file)) &&
  716. $list->indent() ne 'auto') {
  717. $self->poderror({ -line => $line, -file => $file,
  718. -severity => 'ERROR',
  719. -msg => "=over on line ". $list->start() .
  720. " without closing =back (at $cmd)" });
  721. }
  722. }
  723. # remember this node
  724. $arg = $self->interpolate_and_check($paragraph, $line,$file);
  725. $arg =~ s/[\s\n]+$//s;
  726. $self->node($arg);
  727. unless(length($arg)) {
  728. $self->poderror({ -line => $line, -file => $file,
  729. -severity => 'ERROR',
  730. -msg => "empty =$cmd"});
  731. }
  732. if($cmd eq 'head1') {
  733. $self->{_current_head1} = $arg;
  734. } else {
  735. $self->{_current_head1} = '';
  736. }
  737. }
  738. elsif($cmd eq 'begin') {
  739. if($self->{_have_begin}) {
  740. # already have a begin
  741. $self->poderror({ -line => $line, -file => $file,
  742. -severity => 'ERROR',
  743. -msg => "Nested =begin's (first at line " .
  744. $self->{_have_begin} . ")"});
  745. }
  746. else {
  747. # check for argument
  748. $arg = $self->interpolate_and_check($paragraph, $line,$file);
  749. unless($arg && $arg =~ /(\S+)/) {
  750. $self->poderror({ -line => $line, -file => $file,
  751. -severity => 'ERROR',
  752. -msg => "No argument for =begin"});
  753. }
  754. # remember the =begin
  755. $self->{_have_begin} = "$line:$1";
  756. }
  757. }
  758. elsif($cmd eq 'end') {
  759. if($self->{_have_begin}) {
  760. # close the existing =begin
  761. $self->{_have_begin} = '';
  762. # check for spurious characters
  763. $arg = $self->interpolate_and_check($paragraph, $line,$file);
  764. # the closing argument is optional
  765. #if($arg && $arg =~ /\S/) {
  766. # $self->poderror({ -line => $line, -file => $file,
  767. # -severity => 'WARNING',
  768. # -msg => "Spurious character(s) after =end" });
  769. #}
  770. }
  771. else {
  772. # don't have a matching =begin
  773. $self->poderror({ -line => $line, -file => $file,
  774. -severity => 'ERROR',
  775. -msg => "=end without =begin" });
  776. }
  777. }
  778. elsif($cmd eq 'for') {
  779. unless($paragraph =~ /\s*(\S+)\s*/) {
  780. $self->poderror({ -line => $line, -file => $file,
  781. -severity => 'ERROR',
  782. -msg => "=for without formatter specification" });
  783. }
  784. $arg = ''; # do not expand paragraph below
  785. }
  786. elsif($cmd =~ /^(pod|cut)$/) {
  787. # check for argument
  788. $arg = $self->interpolate_and_check($paragraph, $line,$file);
  789. if($arg && $arg =~ /(\S+)/) {
  790. $self->poderror({ -line => $line, -file => $file,
  791. -severity => 'ERROR',
  792. -msg => "Spurious text after =$cmd"});
  793. }
  794. }
  795. $self->{_commands_in_head}++;
  796. ## Check the interior sequences in the command-text
  797. $self->interpolate_and_check($paragraph, $line,$file)
  798. unless(defined $arg);
  799. }
  800. }
  801. sub _open_list
  802. {
  803. my ($self,$indent,$line,$file) = @_;
  804. my $list = Pod::List->new(
  805. -indent => $indent,
  806. -start => $line,
  807. -file => $file);
  808. unshift(@{$self->{_list_stack}}, $list);
  809. undef $self->{_list_item_contents};
  810. $list;
  811. }
  812. sub _close_list
  813. {
  814. my ($self,$line,$file) = @_;
  815. my $list = shift(@{$self->{_list_stack}});
  816. if(defined $self->{_list_item_contents} &&
  817. $self->{_list_item_contents} == 0) {
  818. $self->poderror({ -line => $line, -file => $file,
  819. -severity => 'WARNING',
  820. -msg => "previous =item has no contents" });
  821. }
  822. undef $self->{_list_item_contents};
  823. $list;
  824. }
  825. # process a block of some text
  826. sub interpolate_and_check {
  827. my ($self, $paragraph, $line, $file) = @_;
  828. ## Check the interior sequences in the command-text
  829. # and return the text
  830. $self->_check_ptree(
  831. $self->parse_text($paragraph,$line), $line, $file, '');
  832. }
  833. sub _check_ptree {
  834. my ($self,$ptree,$line,$file,$nestlist) = @_;
  835. local($_);
  836. my $text = '';
  837. # process each node in the parse tree
  838. foreach(@$ptree) {
  839. # regular text chunk
  840. unless(ref) {
  841. my $count;
  842. # count the unescaped angle brackets
  843. # complain only when warning level is greater than 1
  844. my $i = $_;
  845. if($count = $i =~ tr/<>/<>/) {
  846. $self->poderror({ -line => $line, -file => $file,
  847. -severity => 'WARNING',
  848. -msg => "$count unescaped <> in paragraph" })
  849. if($self->{-warnings} && $self->{-warnings}>1);
  850. }
  851. $text .= $i;
  852. next;
  853. }
  854. # have an interior sequence
  855. my $cmd = $_->cmd_name();
  856. my $contents = $_->parse_tree();
  857. ($file,$line) = $_->file_line();
  858. # check for valid tag
  859. if (! $VALID_SEQUENCES{$cmd}) {
  860. $self->poderror({ -line => $line, -file => $file,
  861. -severity => 'ERROR',
  862. -msg => qq(Unknown interior-sequence '$cmd')});
  863. # expand it anyway
  864. $text .= $self->_check_ptree($contents, $line, $file, "$nestlist$cmd");
  865. next;
  866. }
  867. if($nestlist =~ /$cmd/) {
  868. $self->poderror({ -line => $line, -file => $file,
  869. -severity => 'ERROR',
  870. -msg => "nested commands $cmd<...$cmd<...>...>"});
  871. # _TODO_ should we add the contents anyway?
  872. # expand it anyway, see below
  873. }
  874. if($cmd eq 'E') {
  875. # preserve entities
  876. if(@$contents > 1 || ref $$contents[0] || $$contents[0] !~ /^\w+$/) {
  877. $self->poderror({ -line => $line, -file => $file,
  878. -severity => 'ERROR',
  879. -msg => "garbled entity " . $_->raw_text()});
  880. next;
  881. }
  882. my $ent = $$contents[0];
  883. my $val;
  884. if($ent =~ /^0x[0-9a-f]+$/i) {
  885. # hexadec entity
  886. $val = hex($ent);
  887. }
  888. elsif($ent =~ /^0\d+$/) {
  889. # octal
  890. $val = oct($ent);
  891. }
  892. elsif($ent =~ /^\d+$/) {
  893. # numeric entity
  894. $val = $ent;
  895. }
  896. if(defined $val) {
  897. if($val>0 && $val<256) {
  898. $text .= chr($val);
  899. }
  900. else {
  901. $self->poderror({ -line => $line, -file => $file,
  902. -severity => 'ERROR',
  903. -msg => "Entity number out of range " . $_->raw_text()});
  904. }
  905. }
  906. elsif($ENTITIES{$ent}) {
  907. # known ISO entity
  908. $text .= $ENTITIES{$ent};
  909. }
  910. else {
  911. $self->poderror({ -line => $line, -file => $file,
  912. -severity => 'WARNING',
  913. -msg => "Unknown entity " . $_->raw_text()});
  914. $text .= "E<$ent>";
  915. }
  916. }
  917. elsif($cmd eq 'L') {
  918. # try to parse the hyperlink
  919. my $link = Pod::Hyperlink->new($contents->raw_text());
  920. unless(defined $link) {
  921. $self->poderror({ -line => $line, -file => $file,
  922. -severity => 'ERROR',
  923. -msg => "malformed link " . $_->raw_text() ." : $@"});
  924. next;
  925. }
  926. $link->line($line); # remember line
  927. if($self->{-warnings}) {
  928. foreach my $w ($link->warning()) {
  929. $self->poderror({ -line => $line, -file => $file,
  930. -severity => 'WARNING',
  931. -msg => $w });
  932. }
  933. }
  934. # check the link text
  935. $text .= $self->_check_ptree($self->parse_text($link->text(),
  936. $line), $line, $file, "$nestlist$cmd");
  937. # remember link
  938. $self->hyperlink([$line,$link]);
  939. }
  940. elsif($cmd =~ /[BCFIS]/) {
  941. # add the guts
  942. $text .= $self->_check_ptree($contents, $line, $file, "$nestlist$cmd");
  943. }
  944. elsif($cmd eq 'Z') {
  945. if(length($contents->raw_text())) {
  946. $self->poderror({ -line => $line, -file => $file,
  947. -severity => 'ERROR',
  948. -msg => "Nonempty Z<>"});
  949. }
  950. }
  951. elsif($cmd eq 'X') {
  952. my $idx = $self->_check_ptree($contents, $line, $file, "$nestlist$cmd");
  953. if($idx =~ /^\s*$/s) {
  954. $self->poderror({ -line => $line, -file => $file,
  955. -severity => 'ERROR',
  956. -msg => "Empty X<>"});
  957. }
  958. else {
  959. # remember this node
  960. $self->idx($idx);
  961. }
  962. }
  963. else {
  964. # not reached
  965. die "internal error";
  966. }
  967. }
  968. $text;
  969. }
  970. # process a block of verbatim text
  971. sub verbatim {
  972. ## Nothing particular to check
  973. my ($self, $paragraph, $line_num, $pod_para) = @_;
  974. $self->_preproc_par($paragraph);
  975. if($self->{_current_head1} eq 'NAME') {
  976. my ($file, $line) = $pod_para->file_line;
  977. $self->poderror({ -line => $line, -file => $file,
  978. -severity => 'WARNING',
  979. -msg => 'Verbatim paragraph in NAME section' });
  980. }
  981. }
  982. # process a block of regular text
  983. sub textblock {
  984. my ($self, $paragraph, $line_num, $pod_para) = @_;
  985. my ($file, $line) = $pod_para->file_line;
  986. $self->_preproc_par($paragraph);
  987. # skip this paragraph if in a =begin block
  988. unless($self->{_have_begin}) {
  989. my $block = $self->interpolate_and_check($paragraph, $line,$file);
  990. if($self->{_current_head1} eq 'NAME') {
  991. if($block =~ /^\s*(\S+?)\s*[,-]/) {
  992. # this is the canonical name
  993. $self->{-name} = $1 unless(defined $self->{-name});
  994. }
  995. }
  996. }
  997. }
  998. sub _preproc_par
  999. {
  1000. my $self = shift;
  1001. $_[0] =~ s/[\s\n]+$//;
  1002. if($_[0]) {
  1003. $self->{_commands_in_head}++;
  1004. $self->{_list_item_contents}++ if(defined $self->{_list_item_contents});
  1005. if(@{$self->{_list_stack}} && !$self->{_list_stack}->[0]->item()) {
  1006. $self->{_list_stack}->[0]->{_has_par} = 1;
  1007. }
  1008. }
  1009. }
  1010. 1;
  1011. __END__
  1012. =head1 AUTHOR
  1013. Brad Appleton E<lt>bradapp@enteract.comE<gt> (initial version),
  1014. Marek Rouchal E<lt>marek@saftsack.fs.uni-bayreuth.deE<gt>
  1015. Based on code for B<Pod::Text::pod2text()> written by
  1016. Tom Christiansen E<lt>tchrist@mox.perl.comE<gt>
  1017. =cut