windows-server-2003/tools/perl/site/lib/html/element/traverse.pm


# This is a .pm just to (try to) make some CPAN document converters
#  convert it happily as part of the dist's documentation tree.
package HTML::Element::traverse;
 # Time-stamp: "2001-03-10 21:32:23 MST"
use HTML::Element ();
$VERSION = $VERSION = $HTML::Element::VERSION;
1;

__END__

=head1 NAME

HTML::Element traverse - discussion of HTML::Element's traverse method

=head1 SYNOPSIS

  # $element->traverse is unnecessary and obscure.
  #   Don't use it in new code.

=head1 DESCRIPTION

C<HTML::Element> provides a method C<traverse> that traverses the tree
and calls user-specified callbacks for each node, in pre- or
post-order.  However, use of the method is quite superfluous: if you
want to recursively visit every node in the tree, it's almost always
simpler to write a subroutine does just that, than it is to bundle up
the pre- and/or post-order code in callbacks for the C<traverse>
method.

=head1 EXAMPLES

Suppose you want to traverse at/under a node $tree and give elements
an 'id' attribute unless they already have one.

You can use the C<traverse> method:

  {
    my $counter = 'x0000';
    $start_node->traverse(
      [ # Callbacks;
        # pre-order callback:
        sub {
          my $x = $_[0];
          $x->attr('id', $counter++) unless defined $x->attr('id');
          return HTML::Element::OK; # keep traversing
        },
        # post-order callback:
        undef
      ],
      1, # don't call the callbacks for text nodes
    );
  }

or you can just be simple and clear (and not have to understand the
calling format for C<traverse>) by writing a sub that traverses the
tree by just calling itself:

  {
    my $counter = 'x0000';
    sub give_id {
      my $x = $_[0];
      $x->attr('id', $counter++) unless defined $x->attr('id');
      foreach my $c ($x->content_list) {
        give_id($c) if ref $c; # ignore text nodes
      }
    };
    give_id($start_node);
  }

See, isn't that nice and clear?

But, if you really need to know:

=head1 THE TRAVERSE METHOD

The C<traverse()> method is a general object-method for traversing a
tree or subtree and calling user-specified callbacks.  It accepts the
following syntaxes:

=over

=item $h->traverse(\&callback)

=item or $h->traverse(\&callback, $ignore_text)

=item or $h->traverse( [\&pre_callback,\&post_callback] , $ignore_text)

=back

These all mean to traverse the element and all of its children.  That
is, this method starts at node $h, "pre-order visits" $h, traverses its
children, and then will "post-order visit" $h.  "Visiting" means that
the callback routine is called, with these arguments:

    $_[0] : the node (element or text segment),
    $_[1] : a startflag, and
    $_[2] : the depth

If the $ignore_text parameter is given and true, then the pre-order
call I<will not> be happen for text content.

The startflag is 1 when we enter a node (i.e., in pre-order calls) and
0 when we leave the node (in post-order calls).

Note, however, that post-order calls don't happen for nodes that are
text segments or are elements that are prototypically empty (like "br",
"hr", etc.).

If we visit text nodes (i.e., unless $ignore_text is given and true),
then when text nodes are visited, we will also pass two extra
arguments to the callback:

    $_[3] : the element that's the parent
             of this text node
    $_[4] : the index of this text node
             in its parent's content list

Note that you can specify that the pre-order routine can
be a different routine from the post-order one:

    $h->traverse( [\&pre_callback,\&post_callback], ...);

You can also specify that no post-order calls are to be made,
by providing a false value as the post-order routine:

    $h->traverse([ \&pre_callback,0 ], ...);

And similarly for suppressing pre-order callbacks:

    $h->traverse([ 0,\&post_callback ], ...);

Note that these two syntaxes specify the same operation:

    $h->traverse([\&foo,\&foo], ...);
    $h->traverse( \&foo       , ...);

The return values from calls to your pre- or post-order 
routines are significant, and are used to control recursion
into the tree.

These are the values you can return, listed in descending order
of my estimation of their usefulness:

=over

=item HTML::Element::OK, 1, or any other true value

...to keep on traversing.

Note that C<HTML::Element::OK> et
al are constants.  So if you're running under C<use strict>
(as I hope you are), and you say:
C<return HTML::Element::PRUEN>
the compiler will flag this as an error (an unallowable
bareword, specifically), whereas if you spell PRUNE correctly,
the compiler will not complain.

=item undef, 0, '0', '', or HTML::Element::PRUNE

...to block traversing under the current element's content.
(This is ignored if received from a post-order callback,
since by then the recursion has already happened.)
If this is returned by a pre-order callback, no
post-order callback for the current node will happen.
(Recall that if your callback exits with just C<return;>,
it is returning undef -- at least in scalar context, and
C<traverse> always calls your callbacks in scalar context.)

=item HTML::Element::ABORT

...to abort the whole traversal immediately.
This is often useful when you're looking for just the first
node in the tree that meets some criterion of yours.

=item HTML::Element::PRUNE_UP

...to abort continued traversal into this node and its parent
node.  No post-order callback for the current or parent
node will happen.

=item HTML::Element::PRUNE_SOFTLY

Like PRUNE, except that the post-order call for the current
node is not blocked.

=back

Almost every task to do with extracting information from a tree can be
expressed in terms of traverse operations (usually in only one pass,
and usually paying attention to only pre-order, or to only
post-order), or operations based on traversing. (In fact, many of the
other methods in this class are basically calls to traverse() with
particular arguments.)

The source code for HTML::Element and HTML::TreeBuilder contain
several examples of the use of the "traverse" method to gather
information about the content of trees and subtrees.

(Note: you should not change the structure of a tree I<while> you are
traversing it.)

[End of documentation for the C<traverse()> method]

=head2 Traversing with Recursive Anonymous Routines

Now, if you've been reading
I<Structure and Interpretation of Computer Programs> too much, maybe
you even want a recursive lambda.  Go ahead:

  {
    my $counter = 'x0000';
    my $give_id;
    $give_id = sub {
      my $x = $_[0];
      $x->attr('id', $counter++) unless defined $x->attr('id');
      foreach my $c ($x->content_list) {
        $give_id->($c) if ref $c; # ignore text nodes
      }
    };
    $give_id->($start_node);
    undef $give_id;
  }

It's a bit nutty, and it's I<still> more concise than a call to the
C<traverse> method!

It is left as an exercise to the reader to figure out how to do the
same thing without using a C<$give_id> symbol at all.

It is also left as an exercise to the reader to figure out why I
undefine C<$give_id>, above; and why I could achieved the same effect
with any of:

    $give_id = 'I like pie!';
   # or...
    $give_id = [];
   # or even;
    $give_id = sub { print "Mmmm pie!\n" };

But not:

    $give_id = sub { print "I'm $give_id and I like pie!\n" };
   # nor...
    $give_id = \$give_id;
   # nor...
    $give_id = { 'pie' => \$give_id, 'mode' => 'a la' };

=head2 Doing Recursive Things Iteratively

Note that you may at times see an iterative implementation of
pre-order traversal, like so:

   {
     my @to_do = ($tree); # start-node
     while(@to_do) {
       my $this = shift @to_do;
       
       # "Visit" the node:
       $this->attr('id', $counter++)
        unless defined $this->attr('id');
       
       unshift @to_do, grep ref $_, $this->content_list;
        # Put children on the stack -- they'll be visited next
     }
   }

This can I<under certain circumstances> be more efficient than just a
normal recursive routine, but at the cost of being rather obscure.  It
gains efficiency by avoiding the overhead of function-calling, but
since there are several method dispatches however you do it (to
C<attr> and C<content_list>), the overhead for a simple function call
is insignificant.

=head2 Pruning and Whatnot

The C<traverse> method does have the fairly neat features of 
the C<ABORT>, C<PRUNE_UP> and C<PRUNE_SOFTLY> signals.  None of these
can be implemented I<totally> straightforwardly with recursive
routines, but it is quite possible.  C<ABORT>-like behavior can be
implemented either with using non-local returning with C<eval>/C<die>:

  my $died_on; # if you need to know where...
  sub thing {
    ... visits $_[0]...
    ... maybe set $died_on to $_[0] and die "ABORT_TRAV" ...
    ... else call thing($child) for each child...
    ...any post-order visiting $_[0]...
  }
  eval { thing($node) };
  if($@) {
    if($@ =~ m<^ABORT_TRAV>) {
      ...it died (aborted) on $died_on...
    } else {
      die $@; # some REAL error happened
    }
  }

or you can just do it with flags:

  my($abort_flag, $died_on);
  sub thing {
    ... visits $_[0]...
    ... maybe set $abort_flag = 1; $died_on = $_[0]; return;
    foreach my $c ($_[0]->content_list) {
      thing($c);
      return if $abort_flag;
    }
    ...any post-order visiting $_[0]...
    return;
  }

  $abort_flag = $died_on = undef;
  thing($node);
  ...if defined $abort_flag, it died on $died_on

=head1 SEE ALSO

L<HTML::Element>

=head1 COPYRIGHT

Copyright 2000,2001 Sean M. Burke

=head1 AUTHOR

Sean M. Burke, E<lt>sburke@cpan.orgE<gt>

=cut