feed

May  20  Status Update by Chris Poirier • in Announcementspermalink

Well, this latest plan to integrate lexing and parsing has proven to me that I’m not nearly as smart as I like to think I am.  However, I think I’m over the hump.  I bloody well hope.

In redesigning stuff, I was forced to deal with a number of design flaws that I’d kind of hoped to leave for later.  Also, just figuring out how to write the code proved difficult. 

At this point, I think I’ve got generalized discard working, which was a major hurdle.  During the redesign, I realized that discard handling was never really working.  I mean, it worked in the core cases, but there were a bunch of edge cases where the old code just wouldn’t work (for instance, if the discard of a reduce item’s context conflicted with the discard of a shift item in the same state).  As discard processing interacts heavily with lexing, I couldn’t let it slide any longer.  I’m pretty sure it’s all resolved, now.  The fact that it takes significantly less code the new way is one of the few joys I’ve had in this process.

While trying to diagnose a massive performance loss in state table generation once I integrated lexing, I realized some of my original assumptions on how to do LALR parsing were no longer useful.  As a result, I’ve been able to eliminate the performance problem and improve streaming support in the parser.  The most major change is that “reduce” operations no longer look ahead first, wherever possible.  Before, action selection was always tied to the type of the token on lookahead.  Unfortunately, this meant that the parser could never finish with one structure until it had read the first token from the next.  This obviously is not good for streaming.  The new code performs lookahead only when necessary, and never if reduce is the only possible action.  And the benefit for state table generation is that it eliminates a lot of lexical states altogether (for the RCC grammar, it cuts the size of the state table in half).  Of course, there are costs to this approach, including a significant complication of error recovery, in that errors will not be detected as early as they used to be.  However, I’m hopeful the benefits will outweigh the costs.

While banging my head against my desk over the last couple of weeks, I did also manage to find and fix a number of significant performance issues with the underlying data structures.  This is a good thing, since adding lexical stuff into the state table and properly dealing with discard has significantly increased the amount of work the system needs to do in order to generate a parser.  I’m hopeful the new code will at least break even on parser generation time.

Anyway, I’ve still got a ways to go with it before I can say for sure if this path is the right one.  However, with all the design and performance issues that have been resolved since my last commit to the repository, I’m not sure I’m in a position to walk away from the redesign, even if it doesn’t work out as well as I’m hoping.

In terms of a new target date, I’m really not sure.  To be honest, I’m feeling rather burned out with this project, and I’m considering taking a few weeks off from it.  We’ll see.

Related Links

in Announcements:
on site:

Discussion: No comments

Jump to comment form | comments rss | trackback uri

Leave a comment

Markdown: The kinds of formatting markup you'd use in an email will probably work here. For more details on what you can do, check out the Markdown docs.

Which is not a fruit? carrot, apple, banana (required)


Site copyright 2007-2008 Chris Poirier.       Powered by Wordpress.       Entries RSS Comments RSS Validate Log in