Saturday, January 23, 2010

Getting started with Pyparsing

Pyparsing is especially useful for those of us who don't use regexp enough to get any good at it! It is a deceptively simple parser for all kinds of text.

Here are a few resources and tutorials I found useful.

And here is my first program. The requirement is to decode an instruction into two chunks: an opening "o" or "-", followed by a mix of "?", "!", "&" and "^".

from pyparsing import Word, ZeroOrMore

def matching(list1, list2):
    # Both arguments are dicts; they match when they hold the same keys and values.
    if len(list1) != len(list2):
        return False

    for key in list1.keys():
        if key not in list2 or list2[key] != list1[key]:
            return False
    return True

# Each test pairs an input string with the named results we expect back.
tests = [
    ('o?', {'status': 'o', 'stype': '?'}),
    ('-!', {'status': '-', 'stype': '!'}),
    ('-!!!', {'status': '-', 'stype': '!'}),
    ('o?!', {'status': 'o', 'stype': '?'}),
]

def handleStuff(string, location, tokens):
    # Parse action: show what was matched, then keep only the first character.
    print('string', string)
    print('tokens', tokens, tokens[0][0])
    return tokens[0][0]

status = Word("-o")
stype = Word("!?&^").setParseAction(handleStuff)

search = ZeroOrMore(status("status")) + ZeroOrMore(stype("stype"))

for (test, result) in tests:
    print('---------------')
    parsed = search.parseString(test, parseAll=True)
    return_value = {'status': parsed.status, 'stype': parsed.stype}
    print(test, return_value, result, matching(return_value, result))

The bits I had to google a while for:

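How to return named tokens.

Give the expression a results name by calling it with that name:

search = ZeroOrMore(status("myvar"))

I believe this call syntax is recommended over the equivalent setResultsName("myvar").

Then, to get the return value, read it back as an attribute of the parse results:

parsed = search.parseString("o")
print(parsed.myvar)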
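
As a small sketch of the two spellings side by side (written for Python 3, and assuming a pyparsing version that still accepts the camelCase names setResultsName and parseString; the name "myvar" and the input "o" are only illustrative):

```python
from pyparsing import Word, ZeroOrMore

status = Word("-o")

# Shortcut form: calling the expression attaches a results name to a copy of it.
short_form = ZeroOrMore(status("myvar"))

# Explicit form: setResultsName attaches the same name.
long_form = ZeroOrMore(status.setResultsName("myvar"))

parsed_short = short_form.parseString("o")
parsed_long = long_form.parseString("o")

# A named result can be read as an attribute or with dict-style indexing.
print(parsed_short.myvar)
print(parsed_long["myvar"])
```

Both forms should give the same named result, so the choice between them is mostly about readability.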

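The other question was what a parse action (the function you pass to setParseAction) should return. The answer is the updated token: whatever the function returns replaces the tokens that were matched, and returning nothing leaves them unchanged. See the example above, where handleStuff returns the first character of the first token.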
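
To make the return-value rule concrete, here is a minimal sketch (Python 3; the helper names first_char and just_looking are made up for illustration). One parse action returns a replacement value, the other returns nothing:

```python
from pyparsing import Word

def first_char(string, location, tokens):
    # Returning a value replaces the matched tokens with it.
    return tokens[0][0]

def just_looking(string, location, tokens):
    # No return statement means None, so the tokens are left unchanged.
    pass

# Two separate Word objects, so each one gets its own parse action.
shortened = Word("!?&^").setParseAction(first_char)
untouched = Word("!?&^").setParseAction(just_looking)

shortened_tokens = list(shortened.parseString("!!!"))
untouched_tokens = list(untouched.parseString("!!!"))

print(shortened_tokens)
print(untouched_tokens)
```

The first list should hold the single character "!", while the second keeps the full "!!!" match.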

Results of the program are:

---------------
string o?
tokens ['?'] ?
o? {'status': 'o', 'stype': '?'} {'status': 'o', 'stype': '?'} True
---------------
string -!
tokens ['!'] !
-! {'status': '-', 'stype': '!'} {'status': '-', 'stype': '!'} True
---------------
string -!!!
tokens ['!!!'] !
-!!! {'status': '-', 'stype': '!'} {'status': '-', 'stype': '!'} True
---------------
string o?!
tokens ['?!'] ?
o?! {'status': 'o', 'stype': '?'} {'status': 'o', 'stype': '?'} True

1 comment:

ptmcg said...

Congrats on your first steps with pyparsing! Please stop by the wiki and post if you have questions, or want to brag on your latest pyparsing conquest!

-- Paul