SPiZone is SPi Global's platform for content extraction, normalization, and transformation. It works with both native (text-based) PDF files and scanned images. SPiZone's data scraping feature can be used to digitize and extract content from a wide range of PDF and image documents, such as book and journal pages, customer invoices, and purchase orders.
Key data scraping features include:
• Coordinate extraction alongside content extraction, so that search results can be traced back to their position on the page
• Automated entity or zone extraction based on rules or content analysis
• Content analysis and QA tools layered on OCR engines to improve the accuracy of extracted content
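To illustrate why storing coordinates with the extracted text helps searching, here is a minimal, generic sketch. It is not SPiZone's API: the `Word` record, the `find_phrase` helper, the `(x0, y0, x1, y1)` bounding-box convention, and the sample invoice words are all hypothetical. Each extracted word keeps its page number and bounding box, so a phrase match can be reported as a page plus a region rather than just a yes/no hit.

```python
from dataclasses import dataclass

# Hypothetical bounding-box convention: (x0, y0, x1, y1) in page units.
BBox = tuple[float, float, float, float]


@dataclass(frozen=True)
class Word:
    """One extracted word together with where it was found (hypothetical record)."""
    text: str
    page: int
    bbox: BBox


def union(boxes: list[BBox]) -> BBox:
    """Smallest box enclosing all the given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def find_phrase(words: list[Word], phrase: str) -> list[tuple[int, BBox]]:
    """Case-insensitive phrase search over words in reading order.

    Returns (page, enclosing bbox) for every run of consecutive words on the
    same page whose texts match the phrase's terms.
    """
    terms = phrase.casefold().split()
    if not terms:
        return []
    n = len(terms)
    hits = []
    for i in range(len(words) - n + 1):
        window = words[i : i + n]
        same_page = all(w.page == window[0].page for w in window)
        if same_page and [w.text.casefold() for w in window] == terms:
            hits.append((window[0].page, union([w.bbox for w in window])))
    return hits


# Hypothetical extraction output for a one-page invoice.
words = [
    Word("Invoice", 1, (50, 40, 110, 52)),
    Word("Total", 1, (50, 700, 90, 712)),
    Word("Due", 1, (95, 700, 120, 712)),
    Word("$120.00", 1, (400, 700, 460, 712)),
]

if __name__ == "__main__":
    print(find_phrase(words, "total due"))
```

Running the script prints `[(1, (50, 700, 120, 712))]`: the phrase is located on page 1, and the returned box could be used to highlight the match on the original page image. A production system would also need to handle hyphenation, line wraps, and OCR errors, which this sketch ignores.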