SPi Global

Structured and Unstructured Data Solutions: Drawing Insights and Meaning from Data

Data has gained prominence in today’s world and plays a major role in informed business decision-making. To make sense of it, an enterprise needs to understand the forms in which data exists: structured and unstructured.

An estimated 80% of enterprise data is unstructured. Sources of unstructured data include human-generated content such as emails, PDFs, text files, social media conversations, multimedia content and, of course, business-critical reports, legal documents, and presentations.

Unstructured data analytics is reshaping the market, and enterprises need to respond quickly and accurately to market volatility. It is therefore essential to understand both the challenges that unstructured data brings and its advantages:

·  Gaining competitive advantage: Combining unstructured data with traditional structured data helps provide a single holistic view of the enterprise and its stakeholders

·  Unlock actionable insights: Analyzing unstructured data helps identify new trends and hard-to-spot patterns, improve customer satisfaction, and highlight new areas of opportunity

·  Cost efficiency: Processing unstructured data can help enterprises reduce operating costs by 30 to 60%

The key to meaningful insights: Structured and Unstructured Data Solutions

Data analytics solutions enhance understanding and support accurate, informed decisions. Unstructured data analytics is growing exponentially, at 55–60% every year. Data analytics and data science are becoming essential tools for enterprises to sift through unstructured data.

Moreover, to differentiate themselves in the digital era and stay relevant to ever-changing customer demands, enterprises need to better leverage unstructured data by building solutions that acquire, manage, and deliver data and drive actionable insights.


Ambitious to remain at the forefront of cutting-edge technology, SPi Global continues to invest heavily in Robotic Process Automation (RPA), Natural Language Processing (NLP), and Artificial Intelligence (AI). These technologies form the backbone of our platform-driven approach.

Our flagship SPARK platform is not only powered by technologies leveraging automation and Machine Learning (ML) but also combines knowledge management, business intelligence and lean operations principles to deliver productivity and efficiency improvements across your enterprise.

We follow a flexible, scalable, agile, and customer-centric engagement model, allowing SPARK to be hosted either in customers’ in-house environments or by SPi.


SPi’s flagship, intelligent, end-to-end technology platform, SPARK brings together modular AI-based components to address the diverse data management challenges faced by enterprises across multiple verticals.

SPARK is a one-stop solution addressing data acquisition, data quality, and transformation into analytics-ready data. The platform and its modules are especially powerful when processing unstructured data, an ongoing challenge for enterprises of all kinds.

The platform features three principal modules.


Extraction Module
Built on an ML engine, SPARK’s extraction module leverages cognitive models to achieve industry-leading accuracy in extracting unstructured data from PDFs, documents, maps, emails and images.

The extraction module also uses SPi’s next-generation web harvesting platform, powered by AI and NLP accelerators, to extract information and monitor websites for raw, standardized data.

  • Technology-assisted source acquisition for a more systematic and thorough approach to acquiring the right data.
  • A nimble approach allows a single script to extract similar data from multiple sites.
  • Data cleansing, structuring and noise removal at the acquisition stage save significant storage costs downstream.
  • Web page monitoring tracks and captures website changes as they happen.
  • Server-based PDF extraction and OCR reduce load on user devices.
  • Automated identification of potential issues such as font details, uncertain spacing, and soft hyphens within the extracted file reduces the need for human curation.
  • Table, image and diagram extraction included as standard.
  • Small training samples are enough to reach high levels of automation through ML.
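To make the web page monitoring idea concrete, here is a minimal change-detection sketch in Python. It is not SPARK’s implementation: it assumes page text has already been fetched, and it treats a page as changed when a SHA-256 hash of its text differs after collapsing whitespace and case. The helper names fingerprint and detect_changes are hypothetical.

```python
import hashlib
import re


def fingerprint(page_text: str) -> str:
    """Hash page text after collapsing whitespace and case, so cosmetic edits are ignored."""
    normalized = re.sub(r"\s+", " ", page_text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def detect_changes(previous: dict[str, str], current_pages: dict[str, str]) -> dict[str, list[str]]:
    """Compare stored fingerprints (url -> hash) with freshly fetched text (url -> text).

    Hypothetical helper for illustration only.
    """
    current = {url: fingerprint(text) for url, text in current_pages.items()}
    return {
        "added": sorted(current.keys() - previous.keys()),
        "removed": sorted(previous.keys() - current.keys()),
        "changed": sorted(
            url for url in current.keys() & previous.keys() if current[url] != previous[url]
        ),
    }
```

The stored fingerprints would be persisted between crawls. Hashing keeps the comparison cheap, but it only says that a page changed, not what changed, so a diff or re-extraction step would follow for the changed pages.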

Transformation Module
Leveraging a quality framework built on our custom AI and ML models, and algorithms customized for each industry, SPARK’s transformation module creates and maintains a master dataset from all data fed into it.

  • Proprietary knowledge repositories and key de-duplication and standardization suites allow SPARK’s transformation module to scrub reference data such as company and product names, geographies, phone numbers and subject matter expert (SME) terminologies.
  • SPARK’s multi-level ML-based model helps disambiguate and link people, company and other datasets to create a single source of truth.
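A simplified illustration of rule-based standardization followed by de-duplication is sketched below. The suffix list, the assumption of 10-digit national phone numbers, and the function names are all hypothetical; SPARK’s proprietary knowledge repositories and ML-based disambiguation are not represented here.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately short list of legal suffixes ignored when comparing names.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "corp", "corporation", "co"}


def company_key(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def phone_key(phone: str) -> str:
    """Keep digits only; assumes 10-digit national numbers, so a leading country code is dropped."""
    digits = re.sub(r"\D", "", phone)
    return digits[-10:]


def group_duplicates(records: list[dict[str, str]]) -> list[list[dict[str, str]]]:
    """Group records whose standardized (name, phone) keys collide."""
    groups: dict[tuple[str, str], list[dict[str, str]]] = defaultdict(list)
    for record in records:
        groups[(company_key(record["name"]), phone_key(record["phone"]))].append(record)
    return [group for group in groups.values() if len(group) > 1]
```

For example, "Acme Corp." and "ACME Corporation" both standardize to "acme", so if their phone numbers also match after stripping formatting, the two records land in the same duplicate group.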

Enrichment Module
Leveraging SME knowledge across multiple domains, SPARK’s Enrichment Module offers information providers and other enterprises a unique opportunity to leverage existing datasets to their full potential.

This module includes entity extraction, taxonomy and classification, summarization, metadata creation and more.

  • Customized rules and knowledge repositories for a wide range of domains including finance, legal, real estate, science, engineering, medicine, social sciences and humanities, and inter-disciplinary information.
  • Software and SMEs working together ensure scalability and accuracy, supporting stringent SLAs.
  • Keyword extraction, indexing and concept identification, along with SEO-optimized summaries.
  • Built-in NLP modules facilitate concept extraction, editorial and readability checks.

SPi Labs

A unique conceptualization and development hub to foster innovation and experimentation, SPi Labs leverages next-generation technologies and open source capabilities. The facility brings together software engineers, SMEs, researchers, data analysts, and technology specialists to expand SPi’s capabilities, accelerating the development of innovative solutions that enable organizations and society to make better use of big data and domain knowledge.

SPi Labs’ success stories include solutions for concept extraction and automated discovery of business relevance in the health and science domain, as well as semantic analysis and NLP research to understand terminology in legal documents.

As part of this initiative, SPi continues to invest in proprietary cognitive technology to target the complex problems commonly encountered in invoice processing and handwritten text recognition.


It is key for businesses today to become increasingly data-driven, and eventually to create data economies, enabling data-driven insights to boost revenue, accelerate time to delivery and facilitate smarter business decision-making.

However, data which is fragmented, of low quality or not curated cannot be optimally leveraged in these ways. The sheer volume of dark and untapped data held by many businesses exacerbates the challenges involved in leveraging all available information to best advantage.

We provide customers with comprehensive, accurate data to enable improved search, navigation and analysis through our Data Management and Data Enrichment services, supported by our operations and technology capabilities.

SPi’s specialists and data experts enable global enterprises to optimize their information systems in multiple languages, with data cleansing, normalization, aggregation, and abstraction capabilities. We support leading database products across multiple industries, including science, medicine, engineering, legal, financial and business information, risk and compliance, media and entertainment.

Data Management

The essential first step to good data is identifying the right data. We apply proven technology and processes in our Source Discovery and Analysis, Data Acquisition, Cleansing and Normalization, De-duplication and Disambiguation, and Metadata Management services.

Data Enrichment

If data is oil, enrichment is the refinery. SPi has wide experience in enriching data across multiple industries. Our enrichment services include Mapping and Linking, Entity Extraction (indexing), Summarization, Knowledge Modeling, and Annotation.

Platforms & Operations

Database Design & Maintenance

Our content support team handles the design of content and data structures and schemas, as well as workflows to implement new capabilities on your hosting platform.

We have rich experience in a wide range of proprietary and open source CMS platforms. Along with our in-depth knowledge of data products, this expertise enables us to provide you with a one-stop solution to optimize all your data processes.

Content Migration

SPi has teams highly experienced in migrating content from legacy platforms to new ones, with a detailed understanding of how schemas need to be set up for unstructured data. Our full-suite solution includes requirements analysis, content structure mapping, ETL process management, ingestion QA, and testing. We have migrated millions of records for leading information providers in the risk and compliance, health and science, and legal domains.

Platform Build and Deploy

Using SPARK, our workflow management system MAGNUS, and custom platforms, SPi builds, customizes, and deploys platforms that help optimize data workflows. All our platforms are built with APIs that can be customized for any CMS and deployed in the cloud or on-premises.



Without context, data is meaningless. SPi’s analytics services assist companies in the creation of meaning and insight, driving action through the existing data in their workflows. We help organizations choose the right tools, integrate them with relevant functions, and train their users to understand the insights extracted and act accordingly.

SPi takes a customized approach to understand business problems and strategic goals, setting up systems for visualization, business analytics and insights, with constant feedback to maintain high levels of accuracy.

Visualization and Reporting

SPi’s team of data experts, including data engineers and data scientists, helps customers with visualization design and implementation, integration of visualization tools with existing workflows, and report building, scheduling and distribution.

Business Analytics

Focusing on an outcome-based, analytic approach, SPi helps companies use data to solve business problems. Using an iterative process in the creation of analytic models, we work with customer analytics, marketing analytics, product and sales analytics, operations and logistics analytics, and content analytics.

Case Studies

Accurate Store Location Information for the Geo-Location Technology Platform

The client helps retailers and brands launch products, increase sales, and drive traffic by offering innovative customer rewards programs through its shopping rewards app. The geo-location technology within the app guides shoppers to the nearest store location where a partner’s products are available. With its pay-for-performance model and customizable product campaigns, retailers and brands that partner with the client have seen a significant return on investment.

SPi Global provided end-to-end data collection and maintenance, including geo-tagging the location information of retailer storefronts. SPi Global leveraged cutting-edge technology to achieve scale as well as the expertise of in-house data analysts for maximum accuracy.

For this project, we developed a custom scraping script to collect and store location information from the official websites of the retailers. To improve data quality, our team then created an address-matching algorithm to identify new, relocated, and closed stores as well as any potential duplicates in the collected data.
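The store-matching step might look roughly like the following sketch. It assumes, hypothetically, that each retailer site exposes a stable store identifier, and it uses a small illustrative abbreviation map to normalize addresses. The project’s actual address-matching algorithm is not described here, and all names are illustrative.

```python
import re

# Hypothetical abbreviation map; a real matcher would apply a fuller address standard.
ABBREVIATIONS = {"street": "st", "avenue": "ave", "road": "rd", "boulevard": "blvd", "suite": "ste"}


def normalize_address(address: str) -> str:
    """Lowercase, drop punctuation, and abbreviate common street words."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def compare_snapshots(old: dict[str, str], new: list[tuple[str, str]]) -> dict[str, list]:
    """old maps store_id -> address from the previous crawl; new holds (store_id, address) rows."""
    current: dict[str, str] = {}
    first_seen: dict[str, str] = {}  # normalized address -> first store_id listed at it
    duplicates = []
    for store_id, address in new:
        key = normalize_address(address)
        if key in first_seen:
            duplicates.append((first_seen[key], store_id))  # same address listed twice
        else:
            first_seen[key] = store_id
        current[store_id] = key
    return {
        "new": sorted(current.keys() - old.keys()),
        "closed": sorted(old.keys() - current.keys()),
        "relocated": sorted(
            s for s in current.keys() & old.keys() if current[s] != normalize_address(old[s])
        ),
        "duplicates": duplicates,
    }
```

Normalizing before comparing is what keeps "10 Main Street" and "10 main st" from being reported as a relocation; without a stable store identifier, relocation detection would need a fuzzier matching step.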

After the initial build-out, we periodically re-checked stores to keep the data current, accounting for any new or closed stores.

Ultimately SPi Global delivered a scalable and customizable solution with significant flexibility for the client to adapt data based on their product and promotion strategy, streamlining the onboarding of new retailers to the platform.

Improve Discoverability of Patents in Intellectual Property Database

The client, which operates the world’s most comprehensive patent information database, serves researchers, students, engineers, designers, technicians, scientists, and research and development (R&D) professionals who need comprehensive information on patents filed in their field.

The client wanted a scalable and accurate solution that would create value-added abstracts for improved patent discoverability within their intellectual property database.

SPi Global deployed ConSCIse™, its proprietary platform-enabled content abstracting solution for the client. ConSCIse blends proprietary software with Subject Matter Expert (SME) curation to deliver high-quality abstracts.

As part of this solution, SPi Global developed a ConSCIse module for engineering with NLP and ML models, facilitating contextual translation from 16 languages with integrated translation memory tools.

To ensure an optimal QA layer, we employed SMEs from diverse engineering domains (computers to mechanical).

The scaled solution was designed to manage data from over 31 countries and reduced turnaround time (TAT) from 21 days to 5 days.

Scaling and Streamlining Journal Transfers

A leading STM publisher was seeking a mechanism to scale journal transfers within its portfolio. Transfers were offered when authors had listed alternate journals at the time of submission, but only a small percentage of authors followed up on them. The use of multiple systems also made it difficult to build a system-independent workflow.

To support the client, SPi Global set up an initial set of pilot journals on the Transfer Desk application, then built a custom journal recommendation model based on AI that uses the content in the chapter combined with business rules to recommend potential transfer journals.
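As a rough illustration of combining content similarity with business rules, the sketch below ranks candidate journals by cosine similarity between word counts of a manuscript and each journal’s scope text, then applies two example rules. The journal schema, the rules, and the function names are assumptions; this simple bag-of-words approach is a stand-in, not the AI model described above.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Count lowercase words of three or more letters."""
    return Counter(re.findall(r"[a-z]{3,}", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def recommend_journals(manuscript: str, journals: dict, rejected_from: str, top_n: int = 3) -> list[str]:
    """journals maps name -> {"scope": str, "accepting_transfers": bool} (hypothetical schema)."""
    doc = bag_of_words(manuscript)
    scored = [
        (cosine(doc, bag_of_words(info["scope"])), name)
        for name, info in journals.items()
        # Example business rules: skip the original journal and journals closed to transfers.
        if name != rejected_from and info["accepting_transfers"]
    ]
    return [name for score, name in sorted(scored, reverse=True)[:top_n] if score > 0]
```

Keeping the rules as a separate filter means editorial policy can change without retraining or re-scoring the content model.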

A dedicated function was established within the team to facilitate author follow-through by managing transfers once an author accepted them.

SPi Global launched this system in 12 weeks and went live with 10 journals. The project has since scaled to manage 200+ journals.

Deliver high-quality business information on time

The client is a leading global publisher with over 100K publications across various content types, including News, Business, Finance, and Legal.

This publisher wanted to publish new content quickly without sacrificing quality. It was important to monitor and ensure that all publications coming into their collection and conversion system were updated with high-quality content and on time.

SPi Global had end-to-end responsibility for this engagement, from securing the content to posting it online.
Additionally, we owned the data analysis, design, and creation of conversion programs for new products added to the client’s online platform. Monitoring support ran 24/7 to troubleshoot conversion, display, and functionality issues for online products under maintenance.

We addressed content licensing compliance issues and worked with source providers to resolve relationship issues, while also maintaining direct contact with the client’s licensing unit and publishers.

Finally, we worked with an IT vendor to transition additional content management tasks previously part of the enterprise platform.

Monitor drug pricing data for information changes

The client, which provides an integrated drug database with the most current, accurate, and technologically advanced drug data and drug decision support, wanted to ensure the accuracy of its information.

A typical user relies on the client’s drug database to understand how much a drug costs across various states in the US. Therefore, having the most current drug pricing is essential for the client and their customers.

SPi Global delivered an all-encompassing automated website monitoring, data extraction, and time-saving mapping tool. The solution involved development of customized workflow applications leveraging ACQUIRE and SPiZone.

The solution actively monitored websites owned by state authorities for changes to the State Maximum Allowable Cost (SMAC) and Average Actual Cost (AAC) of drugs, then downloaded the latest pricing data. We also extracted drug and price information from the downloaded PDFs using entity recognition models.

With this tool, we achieved a high level of automation in the transformation and load process, using matching algorithms that mapped drug names extracted from the PDFs to the client’s drug databases.
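The drug-name mapping step could be approximated with fuzzy string matching. The sketch below normalizes names and then uses Python’s standard difflib.get_close_matches; the normalization rules, the 0.85 cutoff, and the function names are illustrative assumptions, not the matching algorithms used in the project.

```python
import difflib
import re
from typing import Optional


def normalize_drug_name(name: str) -> str:
    """Uppercase, turn separators into spaces, and split numbers from units ("500MG" -> "500 MG")."""
    text = re.sub(r"[^A-Z0-9.]+", " ", name.upper())
    text = re.sub(r"(\d)([A-Z])", r"\1 \2", text)
    return " ".join(text.split())


def match_drug(extracted: str, catalog: list[str], cutoff: float = 0.85) -> Optional[str]:
    """Return the closest catalog entry, or None if nothing clears the similarity cutoff."""
    by_key = {normalize_drug_name(entry): entry for entry in catalog}
    hits = difflib.get_close_matches(normalize_drug_name(extracted), list(by_key), n=1, cutoff=cutoff)
    return by_key[hits[0]] if hits else None
```

A high cutoff favors precision: unmatched names return None and can be routed to an analyst rather than mapped to the wrong product.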

Finally, to optimize client utility and savings, we developed a monitoring schedule based on predictive modeling that maintains the required update frequency without driving up infrastructure costs.

Vehicle accident reporting automation saves time and money

The client has a leading platform that streamlines and secures the entire accident reporting process from data capture, storage, and access to analysis and distribution. The platform primarily serves law enforcement agencies, individuals, insurers, and authorized parties.

The client was looking to reduce costs by optimizing the current process through automation without sacrificing high data quality.


SPi Global created a solution that captured critical data elements from images of police auto-accident reports, sourced from 64 state agencies with 100 to 400 data fields per report, and optimized the reporting workflow.

By replacing the traditional double-keying method with a two-phase iterative automation approach, we delivered a functional technology solution that could manage the variable workload. Our approach included:

  • Phase 1: Implementing an auto-extraction service powered by OCR plus business rules, then designing an intuitive UI layer with a workflow, dashboard, and reporting mechanism to create a golden set for training
  • Phase 2: Building on Phase 1 automation to increase the efficiency and accuracy of both extraction and QC through state-specific ML models

Scaling trademark and brand protection and search

A leading provider of brand protection services to IP firms, brand managers, and legal firms needed a scalable and cost-effective solution to enable seamless discovery and management of customer searches.


Combining technology and SME knowledge, SPi Global created a tailor-made trademark database watch and management solution to deliver efficient and effective brand monitoring.

The solution used the latest computer vision technology to index and annotate trademarks using Vienna code classifications, and created a knowledge repository of frequently requested trademarks for quick processing of trademark monitoring. It also included an intelligent workflow management module that routed work based on SME specialization.

The system monitored nearly 400,000 trademarks annually for potential infringements with 99.985% accuracy, and progressively improved turnaround time from 14 to 3 days.

Pendo Systems


What do we do?

The Pendo Systems data platform enables organizations to quickly turn unstructured documents into structured, addressable data at machine scale.

Built specifically for financial services, the platform combines a set of proprietary algorithms with repeatable, controlled analysis of documents that enables classification of unstructured data and unlocks the insights trapped in millions of mission-critical documents.

The Pendo Systems Platform has processed over 250 million unstructured documents to locate and extract key terms from contracts. It is evidence-based, provides full data lineage, and has been validated repeatedly across a variety of use cases, including critical regulatory challenges, specifically Matters Requiring Attention.

How do we do it?


Rapidly parse a wide variety of document types and instantly search and group them


Review your documents quickly and use pre-trained NLP and a query language


Constantly iterate, refine and improve your data prior to publishing

Managing LIBOR Transition Assessment Using Pendo’s LIBOR Fallback Engine

The Problem:

LIBOR serves, directly or indirectly, as the reference rate for millions of contracts worth more than $400 trillion, ranging from mortgage products to derivatives. The upcoming cessation of LIBOR poses a serious challenge for banks, investment firms, and asset management companies.

One of the first steps financial institutions must take to prepare for the LIBOR transition is to identify all contracts that reference LIBOR and review their fallback language, flagging those with inadequate fallback provisions.

These LIBOR fallback terms are buried in unstructured documents that are not machine-readable, impeding companies’ ability to identify, prioritize, and execute the actions needed to manage this transition with the transparency, speed, and confidence required to minimize operational risk and expense.
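A naive first-pass triage, far simpler than Pendo’s approach, can still show the shape of the task: flag documents that mention LIBOR, then check for cue phrases that suggest fallback language. The cue list and the triage_contract name below are hypothetical examples, and keyword matching cannot judge whether a provision is adequate; that still requires expert review.

```python
import re

LIBOR_PATTERN = re.compile(r"\b(?:LIBOR|London Inter-?bank Offered Rate)\b", re.IGNORECASE)

# Hypothetical cue phrases; a real review would need far richer, domain-specific rules.
FALLBACK_CUES = [
    r"fall[- ]?back",
    r"successor rate",
    r"replacement (?:benchmark|rate)",
    r"\bSOFR\b",
    r"Secured Overnight Financing Rate",
]
FALLBACK_PATTERN = re.compile("|".join(FALLBACK_CUES), re.IGNORECASE)


def triage_contract(text: str) -> str:
    """Sort a contract's text into a coarse review bucket."""
    if not LIBOR_PATTERN.search(text):
        return "no LIBOR reference"
    if not FALLBACK_PATTERN.search(text):
        return "LIBOR without fallback language"
    return "LIBOR with fallback language (needs review)"
```

Contracts in the "LIBOR without fallback language" bucket would be the natural first priority, since they are the most likely to lack any provision for LIBOR’s cessation.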

How can Pendo help?

The Pendo LIBOR Fallback Engine is a unique, automated capability that quickly and accurately digitizes unstructured source documents and surfaces key LIBOR fallback terms and conditions. The engine uses a proprietary domain-specific language that enables it to navigate thousands of different LIBOR contract types, along with a set of custom-designed utilities and functions that make rules and scripts more flexible and accurate.

Why Pendo?

The Pendo Systems platform has been deployed in multiple LIBOR-specific engagements and has digitized over 70 different contract types, including complex derivative products containing LIBOR fallback language, helping consultants, law firms, and financial institutions prepare for the LIBOR transition.
