SPi Global

Data Solutions

SPi Global takes a holistic, consultancy-based approach to data solutions delivery. We work closely with you to understand your immediate and future needs, gaining an in-depth understanding of your workflows and processes to identify potential scope for optimization.

We build solutions to meet your needs across data acquisition, management, enrichment, delivery and analytics. Bringing together technology components and expert intervention from our global team, we deliver significant, tangible benefits across our customers’ organizations.


Ambitious to remain at the forefront of cutting-edge technology, SPi Global continues to invest heavily in Robotic Process Automation (RPA), Natural Language Processing (NLP) and Artificial Intelligence (AI). These technologies form the backbone of our platform-driven approach.

Our flagship SPARK platform is not only powered by technologies leveraging automation and Machine Learning (ML) but also combines knowledge management, business intelligence and lean operations principles to deliver productivity and efficiency improvements across your enterprise.

We follow a flexible, scalable, agile and customer-centric engagement model, allowing SPARK to be hosted either in customers’ in-house environments or by SPi.


SPi’s flagship, intelligent, end-to-end technology platform, SPARK brings together modular AI-based components to address the diverse data management challenges faced by enterprises across multiple verticals.

SPARK is a one-stop solution, addressing data acquisition, data quality, and the transformation of raw data into analytics-ready data. The platform and its modules are especially powerful when processing unstructured data, an ongoing challenge for enterprises of all kinds.

The platform features three principal modules.

Extraction Module
Built on an ML engine, SPARK’s extraction module leverages cognitive models to achieve industry-leading accuracy in extracting unstructured data from PDFs, documents, maps, emails and images.

The extraction module also uses SPi’s next-generation web harvesting platform, powered by AI and NLP accelerators, to extract information and monitor websites for raw, standardized data.

  • Technology-assisted source acquisition for a more systematic and thorough approach to acquiring the right data.
  • Nimble approach allows a single script to extract similar data from multiple sites.
  • Data cleansing, structuring and noise removal at the acquisition stage save significant storage costs downstream.
  • Web page monitoring tracks and captures website changes as they happen.
  • Server-based PDF extraction and OCR reduce load on user devices.
  • Automated identification of potential issues within the extracted file, such as font details, uncertain spacing and soft hyphens, reduces the need for human curation.
  • Table, image and diagram extraction included as standard.
  • Small training samples are sufficient to reach high levels of automation through ML.

Transformation Module
Leveraging a quality framework built on our custom AI and ML models, and algorithms customized for each industry, SPARK’s transformation module creates and maintains a master dataset from all data fed into it.

  • Proprietary knowledge repositories and key de-duplication and standardization suites allow SPARK’s transformation module to scrub reference data such as company and product names, geographies, phone numbers and subject matter expert (SME) terminologies.
  • SPARK’s multi-level ML-based model helps disambiguate and link people, company and other datasets to create a single source of truth.
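
As a rough illustration of what standardizing and de-duplicating reference data can involve, the sketch below groups company-name variants under a normalized key. It is a minimal example under stated assumptions, not SPARK’s actual method: the suffix list and the function names `normalize_company` and `deduplicate` are hypothetical, and a production system would rely on far richer knowledge repositories and ML-based matching.

```python
import re
from collections import defaultdict

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {
    "inc", "incorporated", "ltd", "limited", "llc",
    "corp", "corporation", "co", "plc", "gmbh",
}


def normalize_company(name: str) -> str:
    """Build a comparison key: lowercase, no punctuation, no trailing legal suffix."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    # Drop trailing suffixes, but always keep at least one token.
    while len(tokens) > 1 and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def deduplicate(names):
    """Group raw names that share the same normalized key."""
    groups = defaultdict(list)
    for name in names:
        groups[normalize_company(name)].append(name)
    return dict(groups)


records = ["Acme Corp.", "ACME Corporation", "Acme, Inc.", "Globex Ltd", "Globex Limited"]
master = deduplicate(records)
for key, variants in master.items():
    print(key, "<-", variants)
```

Exact-key matching like this only catches formatting variants; misspellings or renamed entities would need fuzzy matching or curated reference lists on top.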

Enrichment Module
Leveraging SME knowledge across multiple domains, SPARK’s Enrichment Module offers information providers and other enterprises a unique opportunity to leverage existing datasets to their full potential.

This module includes entity extraction, taxonomy and classification, summarization, metadata creation and more.

  • Customized rules and knowledge repositories for a wide range of domains including finance, legal, real estate, science, engineering, medicine, social sciences and humanities, and inter-disciplinary information.
  • Software and SMEs working together ensure scalability and accuracy, supporting stringent SLAs.
  • Keyword extraction, indexing and concept identification, along with SEO-optimized summaries.
  • In-built NLP modules facilitate concept extraction, editorial and readability checks.

SPi Labs

A unique conceptualization and development hub to foster innovation and experimentation, SPi Labs leverages next-generation technologies and open source capabilities. The facility brings together software engineers, SMEs, researchers, data analysts and technology specialists to expand SPi’s capabilities, accelerating the development of innovative solutions that enable organizations and society to make better use of big data and domain knowledge.

SPi Labs’ success stories include solutions developed for concept extraction and automated business relevancy discovery in the health and science domain, as well as semantic analysis and NLP research to understand terminology in legal documents.

As part of this initiative, SPi continues to invest in proprietary cognitive technology to target the complex problems commonly encountered in invoice processing and handwritten text recognition.


It is key for businesses today to become increasingly data-driven, and eventually to create data economies, enabling data-driven insights to boost revenue, accelerate time to delivery and facilitate smarter business decision-making.

However, data which is fragmented, of low quality or not curated cannot be optimally leveraged in these ways. The sheer volume of dark and untapped data held by many businesses exacerbates the challenges involved in leveraging all available information to best advantage.

We provide customers with comprehensive, accurate data to enable improved search, navigation and analysis through our Data Management and Data Enrichment services, supported by our operations and technology capabilities.

SPi’s specialists and data experts enable global enterprises to optimize their information systems in multiple languages, with data cleansing, normalization, aggregation, and abstraction capabilities. We support leading database products across multiple industries, including science, medicine, engineering, legal, financial and business information, risk and compliance, media and entertainment.

Data Management

The essential first step to good data is identifying the right data. We apply proven technology and processes in our Source Discovery and Analysis, Data Acquisition, Cleansing and Normalization, De-duplication and Disambiguation, and Metadata Management services.

Data Enrichment

If data is oil, enrichment is the refinery. SPi has wide experience in enriching data across multiple industries. Our enrichment services include Mapping and Linking, Entity Extraction (indexing), Summarization, Knowledge Modeling and Annotation.

Platforms & Operations

Database Design & Maintenance

Our content support team handles the design of content and data structures and schemas, as well as workflows to implement new capabilities on your hosting platform.

We have rich experience in a wide range of proprietary and open source CMS platforms. Along with our in-depth knowledge of data products, this expertise enables us to provide you with a one-stop solution to optimize all your data processes.

Content Migration

SPi has teams highly experienced in migrating content from legacy platforms to new ones, with a detailed understanding of how schemas need to be set up for unstructured data. Our full-suite solution includes requirements analysis, content structure mapping, ETL process management, ingestion QA and testing. We have migrated millions of records for leading information providers in the risk and compliance, health and science, and legal domains.

Platform Build and Deploy

With SPARK, our MAGNUS workflow management system and custom platforms, SPi builds, customizes and deploys various platforms to help optimize data workflows. All our platforms are built with APIs which can be customized for any CMS and deployed in the cloud or on-premises.



Without context, data is meaningless. SPi’s analytics services assist companies in the creation of meaning and insight, driving action through the existing data in their workflows. We help organizations choose the right tools, integrate them with relevant functions, and train their users to understand the insights extracted and act accordingly.

SPi takes a customized approach to understand business problems and strategic goals, setting up systems for visualization, business analytics and insights, with constant feedback to maintain high levels of accuracy.

Visualization and Reporting

SPi’s team of data experts, including data engineers and data scientists, helps customers with visualization design and implementation, integration of visualization tools with existing workflows, and report building, scheduling and distribution.

Business Analytics

Focusing on an outcome-based, analytic approach, SPi helps companies use data to solve business problems. Using an iterative process in the creation of analytic models, we work with customer analytics, marketing analytics, product and sales analytics, operations and logistics analytics, and content analytics.

Case Studies

Scaling Trademark and Brand Protection and Search

A leading provider of brand protection services to IP firms, brand managers, and legal firms needed a scalable and cost-effective solution to enable seamless discovery and management of customer searches.


Combining technology and SME knowledge, SPi Global created a tailor-made trademark database watch and management solution to deliver efficient and effective brand monitoring.

The solution used the latest computer vision technology to index and annotate the trademarks using Vienna code classifications, and created a knowledge repository of frequently requested trademarks for quick, effortless processing of trademark monitoring requests. It also included an intelligent workflow management module to route work based on SME specialization.

The system monitored nearly 400,000 trademarks annually for potential infringements with 99.985% accuracy, and progressively improved turnaround time from 14 to 3 days.

Pendo Systems


What do we do?

The Pendo Systems data platform enables organizations to quickly turn unstructured documents into structured, addressable data at machine scale.

Built specifically for financial services, the platform combines a set of proprietary algorithms with repeatable, controlled analysis of documents, enabling the classification of unstructured data and unlocking the insights trapped in millions of mission-critical documents.

The Pendo Platform has processed over 250 million unstructured documents to locate and extract the key terms from contracts. It is evidence-based, provides full data lineage and has been validated time and again for a variety of use cases, including critical regulatory challenges, specifically Matters Requiring Attention.

How do we do it?


  • Rapidly parse a wide variety of document types and instantly search and group them.
  • Review your documents quickly and use pre-trained NLP and a query language.
  • Constantly iterate, refine and improve your data prior to publishing.

Managing LIBOR Transition Assessment using Pendo’s LIBOR Fallback Engine

The Problem:

LIBOR is, directly or indirectly, the reference rate for millions of contracts worth more than $400 trillion, ranging from mortgage products to derivatives. The upcoming cessation of LIBOR poses a serious challenge for banks, investment firms and asset management companies.

One of the first steps financial institutions must undertake to prepare for LIBOR transition is to identify all the contracts which reference LIBOR and review the fallback language, highlighting those which contain inadequate fallback provisions.

These LIBOR fallback terms are buried in unstructured documents that are not machine-readable, impeding companies’ ability to identify, prioritize and execute the actions needed to manage this transition with the transparency, speed and confidence required to minimize operational risk and expense.
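
To make the first step above concrete (finding contracts that reference LIBOR and flagging those with no visible fallback language), here is a minimal sketch. It assumes the contract text has already been extracted to plain text. The patterns, labels, sample contracts and the function name `assess_contract` are all hypothetical and purely illustrative; this is not the Pendo engine or its domain-specific language, and keyword matching cannot judge whether a fallback provision is legally adequate.

```python
import re

# Matches the abbreviation or the spelled-out name, case-insensitively.
LIBOR_PATTERN = re.compile(r"\bLIBOR\b|London Interbank Offered Rate", re.IGNORECASE)

# Hypothetical phrases suggesting that some fallback language is present.
FALLBACK_PATTERN = re.compile(
    r"fallback|successor rate|replacement rate|benchmark replacement",
    re.IGNORECASE,
)


def assess_contract(text: str) -> str:
    """Return a coarse triage label for one contract's plain text."""
    if not LIBOR_PATTERN.search(text):
        return "no_libor_reference"
    if FALLBACK_PATTERN.search(text):
        # Fallback wording exists; a reviewer still has to judge its adequacy.
        return "libor_fallback_found_review_adequacy"
    return "libor_without_fallback"


# Invented sample texts, for illustration only.
contracts = {
    "loan_a.txt": "Interest accrues at three-month LIBOR plus 2.00%.",
    "loan_b.txt": (
        "Interest accrues at LIBOR plus 1.50%. If LIBOR is unavailable, "
        "a successor rate selected by the Agent applies."
    ),
    "lease_c.txt": "Rent is fixed at 5,000 USD per month.",
}

for name, text in contracts.items():
    print(name, "->", assess_contract(text))
```

A triage like this only prioritizes documents for review; contracts labelled `libor_without_fallback` would typically be examined first.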

How can Pendo help?

The Pendo LIBOR Fallback Engine is a unique, automated capability that quickly and accurately digitizes unstructured source documents and surfaces key LIBOR fallback terms and conditions. Our fallback engine uses a proprietary, domain-specific language that enables the engine to navigate thousands of different LIBOR contract types, using a set of custom-designed utilities and functions that make rules and scripts more flexible and accurate.

Why Pendo?

The Pendo Systems platform has been deployed for multiple LIBOR-specific engagements and has digitized over 70 different contract types, including complex derivative products that included LIBOR fallback language, helping consultants, law firms and financial institutions prepare for the LIBOR transition process.

Contact Us

Any questions? Get in touch with us.