ACS Solutions Limited

Technical solutions for business needs
Fasttrac Solicitors: Document Data Extract
 
Brief
The project brief was to enhance the Electronic Land Registry Request system by providing automatic processing of the response documents.
 
Findings
Each PDF response document is based on one of two templates (one English, one English and Welsh) into which data is merged. The challenge was to extract all pertinent data with a very high degree of reliability and to save that data in the case management database.
 
Solution
In this solution we designed and developed a recursive regular expression (RET) tool to extract the data from the source documents. The SQL Server 2005 database contains a set of hierarchical "trees" of regular expressions. Regular expressions are industry-standard text search patterns; the clever part is how we chain them together, so that the output of a parent expression is consumed by its child expressions. This lets us "home in" on the data we need and confidently exclude the rest of the text. The system is written in C# using Visual Studio 2008 and depends heavily on SQL Server.
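
The parent-to-child chaining described above can be illustrated with a minimal sketch. It is written in Python for brevity, not the C# and SQL Server of the actual system, and everything in it is an assumption made for illustration: the names ExprNode and apply_node, the patterns, and the sample text (which is invented and not a real Land Registry layout).

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExprNode:
    """One node of a hypothetical expression tree: a regex plus child nodes."""
    name: str
    pattern: str
    children: List["ExprNode"] = field(default_factory=list)


def apply_node(node, text):
    """Apply node.pattern to text and pass each captured span to the children."""
    results = []
    for m in re.finditer(node.pattern, text, re.DOTALL):
        # Use the first capture group if the pattern has one, else the whole match.
        matched = m.group(1) if m.groups() else m.group(0)
        results.append({
            "name": node.name,
            "text": matched,
            # Children only ever see the parent's output, never the full document.
            "children": [r for c in node.children for r in apply_node(c, matched)],
        })
    return results


# Invented sample text, for illustration only.
sample = (
    "Title Number: AB123456\n"
    "Proprietorship Register\n"
    "Proprietor: JOHN SMITH of 1 High Street\n"
    "Charges Register\n"
    "Proprietor: EXAMPLE BANK PLC\n"
)

tree = ExprNode(
    "proprietorship_section",
    r"Proprietorship Register\n(.*?)Charges Register",
    [ExprNode("proprietor", r"Proprietor: ([A-Z ]+?) of ")],
)

output = apply_node(tree, sample)
print(output[0]["children"][0]["text"])
```

Because the child expression runs only against the text captured by its parent, the second "Proprietor:" line in the charges section is never considered, which is the "homing in" effect in miniature.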
 
Text extraction from the PDF is run by a C# service which drives the Java PDFBox open-source library (www.pdfbox.org), running on the amazing IKVM Java for .NET layer (www.ikvm.net).
 
The RET service applies the expression tree to the document to produce a hierarchical output tree. The metadata in the expression tree includes constraints which must be met, and also markers indicating which data should be returned.
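
One way to picture constraints and return markers is sketched below, again in Python rather than the system's C#. The node format is hypothetical: the "count" key (a minimum and maximum number of matches) stands in for a constraint, and the "return_as" key stands in for a marker. The real metadata held in the database may look quite different, and the sample text is invented.

```python
import re


def run(node, text, values, errors):
    """Apply one node, check its constraint, collect marked data, then recurse."""
    matches = [m.group(1) for m in re.finditer(node["pattern"], text, re.DOTALL)]

    # Constraint (assumed form): the number of matches must fall in a range.
    lo, hi = node.get("count", (0, None))
    if len(matches) < lo or (hi is not None and len(matches) > hi):
        errors.append(
            f"{node['name']}: expected {lo}..{hi} matches, found {len(matches)}"
        )

    for matched in matches:
        # Marker (assumed form): only nodes with "return_as" contribute output.
        if node.get("return_as"):
            values.setdefault(node["return_as"], []).append(matched.strip())
        for child in node.get("children", []):
            run(child, matched, values, errors)


tree = {
    "name": "document",
    "pattern": r"(.+)",
    "count": (1, 1),
    "children": [
        {"name": "title_number", "pattern": r"Title Number: (\w+)",
         "count": (1, 1), "return_as": "title_number"},
        {"name": "tenure", "pattern": r"Tenure: (\w+)",
         "count": (1, 1), "return_as": "tenure"},
    ],
}

# Invented sample text with no "Tenure:" line, so one constraint fails.
text = "Title Number: AB123456\nAddress: 1 High Street\n"
values, errors = {}, []
run(tree, text, values, errors)
print(values)
print(errors)
```

Here the title number is returned because its node is marked, while the missing tenure is reported as a failed constraint instead of being silently skipped.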
 
Once the RET request is completed, the original workflow proceeds and a script running in Solcase returns the results to the Solcase system. It posts all data from the RET to the correct fields on the case and, if errors have been found, enqueues the case for manual attention.
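
The post-then-queue step above can be sketched as follows. This is a Python illustration only, not Solcase scripting: the function post_results, the dictionary standing in for the case fields, the queue, and the case identifier are all hypothetical.

```python
from collections import deque


def post_results(case_fields, extracted, errors, manual_queue, case_id):
    """Copy extracted values onto the case; queue the case if any errors exist."""
    # All extracted data is posted, whether or not errors were found.
    case_fields.update(extracted)
    if errors:
        manual_queue.append((case_id, list(errors)))
    return not errors


# Invented example data.
case_fields = {"client": "A N Other"}
manual_queue = deque()
ok = post_results(
    case_fields,
    {"title_number": "AB123456"},
    ["tenure: not found"],
    manual_queue,
    "CASE-001",
)
print(ok, case_fields, list(manual_queue))
```

The point of the design is that a failed extraction never blocks the workflow: whatever was found is still recorded, and the case is simply flagged for a person to check.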
 
Conclusion
As far as we know (and we have discussed this with the Land Registry), this is the most sophisticated and automated Land Registry document processing system in the country. It has contributed significantly not only to reductions in case processing time, and therefore cost, but also to improvements in accuracy, and therefore quality.