More about parsing
In the previous chapter we only glanced at the parse_doc method. Here we take a closer look at the Scanner object that gets passed to parse_doc.
The Scanner is similar to Ruby's built-in StringScanner: it remembers the position of a scan pointer (a position inside the string we're parsing). Scanning is the process of advancing the scan pointer through the string one small step at a time.
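To make the scan pointer idea concrete, here is a minimal sketch that uses Ruby's own StringScanner (not the Scanner class itself) on the same example line used in this chapter. The step-by-step regexes are chosen purely for illustration.

```ruby
require 'strscan'

# Plain StringScanner, used only to illustrate how a scan pointer moves.
s = StringScanner.new("* @license GNU General Public License v3")

s.pos            # 0, the pointer starts at the beginning of the string
s.scan(/\* /)    # "* ", the pointer moves past the comment prefix
s.scan(/@\w+/)   # "@license", the pointer moves past the tag name
s.scan(/\s+/)    # " ", the pointer moves past the whitespace
s.pos            # 11, now at the beginning of "GNU"
s.rest           # "GNU General Public License v3"
```

Each successful scan call moves the pointer forward by the length of the matched text, which is the small-step advancing described above.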
Let's visualize how this works. Here's the state of the Scanner at the time parse_doc gets called.
* @license GNU General Public License v3
           ^
The scan pointer (denoted as ^) has stopped at the beginning of the word "GNU". At that point we could look ahead to see what's coming, e.g. we could check whether we're starting to parse a GNU license:
scanner.look(/GNU/) # --> true
The look method performs a simple regex match starting at the position of our scan pointer, returning true when the regex matches. But just looking doesn't advance the scan pointer; that's what the match method is for:
scanner.match(/.*$/) # --> "GNU General Public License v3"
match returns the actual string matching the regex and advances the scan pointer to the position where the match ended:
* @license GNU General Public License v3
                                        ^
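The difference between looking and matching can be reproduced with a small stand-in built on StringScanner. This is only a sketch: SketchScanner is a hypothetical class written for illustration, and it is not claimed to be how the real Scanner is implemented. It assumes that look corresponds to StringScanner#check (match at the pointer without moving it) and match corresponds to StringScanner#scan (match at the pointer and advance).

```ruby
require 'strscan'

# Hypothetical stand-in for the Scanner described in this chapter.
class SketchScanner
  def initialize(input)
    @input = StringScanner.new(input)
  end

  # Tests the regex at the scan pointer without moving the pointer.
  def look(re)
    !@input.check(re).nil?
  end

  # Returns the text matched at the scan pointer and advances past it
  # (returns nil and leaves the pointer alone when nothing matches).
  def match(re)
    @input.scan(re)
  end

  def pos
    @input.pos
  end
end

scanner = SketchScanner.new("* @license GNU General Public License v3")
scanner.match(/\* @license\s+/)   # move the pointer to the start of "GNU"

before = scanner.pos              # 11
is_gnu = scanner.look(/GNU/)      # true
after_look = scanner.pos          # still 11, looking did not move the pointer
name = scanner.match(/.*$/)       # "GNU General Public License v3"
after_match = scanner.pos         # 40, the end of the line
```

The positions before and after the look call are identical, while the match call leaves the pointer at the end of the matched text.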
The Scanner class contains a bunch of other useful methods for parsing the docs, but look and match are really at the core of it. For the purposes of our @license tag we have done enough parsing: we have successfully extracted the name of the license.