Monday, January 25, 2010

Using Pyparsing to extract dates from a block of text

This is very much a work in progress, but I thought I'd post it in case it helps anyone.

I'm currently using two bits of code: one to handle relative dates like "today" and "2 days ago", and the other to handle specific dates like "3rd November 2009". Both build on work already done by others. I've tweaked them and added tests for two modes: parsing with parseString, which expects the whole text to be a date, e.g. "13th December 2009", and scanning with scanString, where the date or dates are buried in the text, e.g. "project starts on 12th Nov 09 and finishes 3/2/10".
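To make the parseString / scanString distinction concrete, here is a minimal sketch. It is not the code this post builds on: the grammar, the helper names (named_to_date, numeric_to_date, _full_year) and the choices to read numeric dates as day/month/year and to map two-digit years to 20xx are all assumptions made for illustration.

```python
from datetime import date
from pyparsing import Word, nums, oneOf, Optional, Suppress

# Month names and their three-letter abbreviations ("May" is listed once).
NAMES = ("January February March April May June July August "
         "September October November December").split()
MONTHS = {name[:3].lower(): i for i, name in enumerate(NAMES, 1)}
ABBREVS = [name[:3] for name in NAMES if len(name) > 3]

day = Word(nums, max=2)("day")
ordinal = Optional(Suppress(oneOf("st nd rd th", caseless=True)))
month_name = oneOf(NAMES + ABBREVS, caseless=True)("month")
month_num = Word(nums, max=2)("month")
year = (Word(nums, exact=4) | Word(nums, exact=2))("year")


def _full_year(text):
    # Assumption: a two-digit year means 20xx.
    value = int(text)
    return value + 2000 if value < 100 else value


def named_to_date(tokens):
    # "13th December 2009" or "12th Nov 09"
    month = MONTHS[tokens["month"][:3].lower()]
    return date(_full_year(tokens["year"]), month, int(tokens["day"]))


def numeric_to_date(tokens):
    # "3/2/10", read as day/month/year (assumption)
    return date(_full_year(tokens["year"]), int(tokens["month"]), int(tokens["day"]))


named_date = (day + ordinal + month_name + year).setParseAction(named_to_date)
numeric_date = (day + Suppress("/") + month_num + Suppress("/") + year
                ).setParseAction(numeric_to_date)
date_expr = named_date | numeric_date

# parseString: the whole input has to be a date.
whole = date_expr.parseString("13th December 2009", parseAll=True)[0]

# scanString: yields (tokens, start, end) for every date found in the text.
text = "project starts on 12th Nov 09 and finishes 3/2/10"
found = [tokens[0] for tokens, start, end in date_expr.scanString(text)]
```

With parseAll=True, parseString raises a ParseException if anything other than a date is present, whereas scanString simply skips over text that does not match.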

Relative dates:

Original code is in the Examples - In development on the pyparsing site:
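As a rough illustration of the idea only (this is not the example code from the pyparsing site), relative expressions can be reduced to an offset in days and then applied to a base date. The names relative and resolve, and the small set of phrases covered, are assumptions for this sketch.

```python
from datetime import date, timedelta
from pyparsing import CaselessKeyword, Word, nums, oneOf, replaceWith

# Each alternative produces a single integer: the offset in days.
today = CaselessKeyword("today").setParseAction(replaceWith(0))
yesterday = CaselessKeyword("yesterday").setParseAction(replaceWith(-1))
tomorrow = CaselessKeyword("tomorrow").setParseAction(replaceWith(1))

number = Word(nums).setParseAction(lambda t: int(t[0]))
unit = oneOf("day days week weeks", caseless=True)


def ago_to_offset(tokens):
    # "2 days ago" -> -2, "1 week ago" -> -7
    scale = 7 if tokens["unit"].lower().startswith("week") else 1
    return -tokens["n"] * scale


ago = (number("n") + unit("unit") + CaselessKeyword("ago")
       ).setParseAction(ago_to_offset)

relative = today | yesterday | tomorrow | ago


def resolve(text, base):
    """Turn a relative phrase into a date, counted from `base`."""
    offset = relative.parseString(text, parseAll=True)[0]
    return base + timedelta(days=offset)


base = date(2010, 1, 25)
two_days_ago = resolve("2 days ago", base)
one_week_ago = resolve("1 week ago", base)
```

Passing the base date in explicitly, rather than calling date.today() inside the grammar, keeps the tests repeatable.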

Actual dates:

1 comment:

Phil said...

Actually, my code is far from perfect - there are still some test cases that won't work. Further, it is slow when skimming through large text documents, but that is another story...

Good that my snippet was of some use to you :)