I’m going to take the opportunity to elaborate on something I mentioned in Introduction to RCC Grammars): RCC-generated lexers are context-sensitive.
Specifically, RCC generates a whole set of lexers for you, one for each parser state.
Why should you care? Well, because it means that RCC can handle declaratively what you might have had to do with custom code in the past; things like:
In other words, RCC makes it easy to embed sub-languages in your grammar. As long as you don’t expect the same string of characters to be lexed in two ways in the same position within your grammar, RCC will ensure that it is lexed in the way that is appropriate for the context. (And you can get past that other restriction by enabling backtracking within the generated parser.)
Consider this Ruby-like expression:
puts "Confirmed guests: {confirmed_guests.join(", ")}."
An RCC grammar like this could process it:
group expression
addition_expression => expression '+' expression @associativity=left
subtraction_expression => expression '-' expression @associativity=left
# . . . additional expression rules left as an exercise for the reader . . .
string_expression
end
section strings
strings
escape_sequence => '\\' [rtn\\"{]
general_text => [\u0000-\uffff]-["\\{]+
end
string_expression => ‘”‘ (escape_sequence|general_text|nested_expression):element* ‘”‘
nested_expression => ‘{’ expression ‘}’
end
It is safe to define a string like general_text (which would match just about any text in a source file), because it will only ever be used inside a string_expression (in this grammar, anyway). And because the opening { of a nested expression is excluded from general_text, any such { inside a string_expression will trigger a shift into nested_expression.
Finally, because RCC generates an LR parser, you can still use } and even string_expressions inside expression without causing a problem, because the parser only considers reductions at the top of the stack. In other words, a } will not cause a reduce of the nested_expression until the top of the stack matches '{' expression '}' in a state where nested_expression is expected.
| in Concepts: | |
| in Discussions: | |
| on site: |
Markdown: The kinds of formatting markup you'd use in an email will probably work here. For more details on what you can do, check out the Markdown docs.