
Jan 25  No actions in the grammar by Chris Poirier • in Discussions/Concepts

The first thing you should know about the RCC grammar description language is that action code is not embedded in the grammar.  This differs greatly from systems like yacc and ANTLR, which essentially overlay a macro layer on a source file in the output language.  I’m pretty big on separation of concerns: I think it’s a very important design strategy, especially when it comes to anything complex.  When you put too much stuff in one place, you start to lose the forest for the trees, and that hurts both comprehension and usability.

The case against action code

Have a look at a real-world ANTLR grammar (thanks to Guillaume for the link).  Go ahead — I’ll wait.

Done?

Now, let’s say you are one of the developers responsible for that grammar, and you have a source file that isn’t parsing the way you expect.  Where the hell do you start in all that?  Frankly, I’m pretty good at finding bugs, just by reading the code; but finding a bug in that grammar?  Not something I’d want to have to do.

And, worse, what if you’re just a regular user of the language, trying to figure out if it is your understanding or the compiler that is wrong?

If your grammar contains actions, then your grammar is implementation.  It’s no longer just a specification.  And that’s a very bad thing, because it means you have to maintain a specification separately.  Eventually, inevitably, the two are going to get out of synch, and then which one do you believe?

But then how do you . . . ?

Of course, systems like yacc and ANTLR have a reason for putting action code right in the grammar: it is the easiest way to get custom processing into a generated parser.  And that custom processing can be very important — it generally does useful stuff like build an AST, or manage variable names.

RCC takes a different approach.  First, it attempts to generate a lot more stuff for you — including your AST.  This eliminates one of the primary uses of action code.  And second, it generates (in any language that will support it) an OO parser.  Hooks in the base class call methods at specific points during the parse, and all you have to do to “add” action code is to subclass the parser — in your implementation language — and fill in the methods you care about.  I first saw this used in SableCC, and I loved the idea, so I’ve totally stolen it for RCC.
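To make the subclassing idea concrete, here is a minimal sketch in Python.  It is not RCC’s actual output or API: the class names, the hook names (`on_number`, `on_add`), and the toy “sums of integers” language are all made up for illustration.  The point is only the shape: the base class owns the parse and the AST, and your code lives in a subclass.

```python
class GeneratedParser:
    """Hypothetical stand-in for a generated parser; handles "1 + 2 + 3"."""

    def __init__(self, text):
        self.tokens = text.split()
        self.pos = 0

    # Hooks: empty by default, called at fixed points during the parse.
    def on_number(self, value):
        pass

    def on_add(self, left, right):
        pass

    def parse(self):
        # The base class builds the AST itself; no user code is needed for it.
        left = self._number()
        while self.pos < len(self.tokens):
            self._expect("+")
            right = self._number()
            self.on_add(left, right)
            left = ("add", left, right)
        return left

    def _number(self):
        token = self._next()
        if not token.isdigit():
            raise SyntaxError(f"expected a number, got {token!r}")
        value = int(token)
        self.on_number(value)
        return ("num", value)

    def _expect(self, expected):
        token = self._next()
        if token != expected:
            raise SyntaxError(f"expected {expected!r}, got {token!r}")

    def _next(self):
        if self.pos >= len(self.tokens):
            raise SyntaxError("unexpected end of input")
        token = self.tokens[self.pos]
        self.pos += 1
        return token


class CountingParser(GeneratedParser):
    """User code: "action code" added by overriding the hooks."""

    def __init__(self, text):
        super().__init__(text)
        self.numbers = []
        self.additions = 0

    def on_number(self, value):
        self.numbers.append(value)

    def on_add(self, left, right):
        self.additions += 1
```

Calling `CountingParser("1 + 2 + 3").parse()` returns the tree built by the base class, while the subclass quietly collects whatever it cares about.  The “grammar” part (here, the `parse` method) never mentions the custom processing.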

That said, RCC does not support (and probably can’t even be action-ed to support) changing the parse based on the variable type of an identifier.  ANTLR’s action-code-in-the-parser can do this, so this is one place where RCC’s approach may prove less than ideal.  Even if that limitation turns out to be real (I’m not presently sure, as I haven’t thought it out against the latest version of the code), it’s a tradeoff I’m willing to make, given that such a modification of the parse could significantly complicate (or just plain break) error-recovery and other useful features of the system.  In the end, I’m not convinced doing symbol table stuff during lexing and/or parsing is a great idea, but if you are, RCC may not be a good choice for you.
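For readers who haven’t run into this kind of context sensitivity, the best-known case comes from C (not from RCC): `T * x ;` declares a pointer if `T` names a type, and is a multiplication otherwise.  The sketch below, in Python, shows the symbol table deciding the parse.  The function name, the tuple shapes, and the `type_names` set are all hypothetical, and a real parser would be far more involved.

```python
def classify(tokens, type_names):
    """Classify `NAME * NAME ;` using a symbol table of known type names."""
    if len(tokens) != 4 or tokens[1] != "*" or tokens[3] != ";":
        raise SyntaxError("expected: NAME * NAME ;")
    name, operand = tokens[0], tokens[2]
    if name in type_names:
        # Same tokens, different parse: the symbol table decides.
        return ("declare_pointer", name, operand)
    return ("multiply", name, operand)
```

The same four tokens come out as a declaration with `type_names = {"T"}` and as an expression with an empty set.  That feedback from the symbol table into the parse is what RCC does not offer.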

Conclusion

However, even with that tradeoff, I think RCC is still plenty capable.  And its grammars are certainly a lot easier to read than others I’ve seen.

Okay, I think that’s it for now.  Next time, I’ll go over the basic grammar syntax . . . .


Site copyright 2007-2008 Chris Poirier.