More about parsing

In the previous chapter we only glanced at the parse_doc method. Here we take a closer look at the Scanner object that gets passed to it.

The Scanner is similar to Ruby's built-in StringScanner: it remembers the position of a scan pointer (a position inside the string we're parsing). Scanning is the process of advancing the scan pointer through the string one small step at a time.

Let's visualize how this works. Here's the state of the Scanner at the time parse_doc gets called.

* @license GNU General Public License v3
           ^

The scan pointer (denoted as ^) has stopped at the beginning of the word "GNU". At that point we can look ahead to see what's coming, e.g. we can check whether we're about to parse a GNU license:

scanner.look(/GNU/)   # --> true

The look method performs a simple regex match starting at the position of the scan pointer and returns true when the regex matches. But just looking doesn't advance the scan pointer; that's what the match method is for:

scanner.match(/.*$/)  # --> "GNU General Public License v3"

match returns the actual string matching the regex and advances the scan pointer to the position where the match ended:

* @license GNU General Public License v3
                                        ^

The Scanner class contains a number of other useful methods for parsing the docs, but look and match are at the core of it. For the purposes of our @license tag we have done enough parsing: we have successfully extracted the name of the license.
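
To experiment with this behaviour outside of a custom tag, the two methods can be approximated on top of Ruby's own StringScanner. The sketch below is only an illustration: MiniScanner is a hypothetical stand-in, not the real Scanner class, and it assumes that look corresponds roughly to StringScanner#check (match without moving) and match to StringScanner#scan (match and advance).

```ruby
require 'strscan'

# Hypothetical stand-in for the Scanner described above; NOT the real class.
# It only mimics the look/match behaviour using Ruby's StringScanner.
class MiniScanner
  def initialize(input, pos = 0)
    @ss = StringScanner.new(input)
    @ss.pos = pos
  end

  # True when the regex matches at the scan pointer.
  # The scan pointer is left where it was.
  def look(re)
    !@ss.check(re).nil?
  end

  # Returns the matched string and advances the scan pointer past it,
  # or returns nil (without moving) when the regex does not match.
  def match(re)
    @ss.scan(re)
  end

  # Current position of the scan pointer.
  def pos
    @ss.pos
  end
end

line = "* @license GNU General Public License v3"

# Start with the scan pointer at the beginning of "GNU",
# as in the first diagram.
scanner = MiniScanner.new(line, line.index("GNU"))

is_gnu  = scanner.look(/GNU/)    # looking does not move the pointer
before  = scanner.pos
license = scanner.match(/.*$/)   # matching consumes the rest of the line
after   = scanner.pos

puts is_gnu
puts license
puts "pointer moved from #{before} to #{after}"
```

Running it shows the pointer staying put after look and ending up at the end of the line after match, which mirrors the two diagrams above.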
