SPi Global

Data Solutions

SPi Global takes a holistic, consultancy-based approach to data solutions delivery. We work closely with you to understand your immediate and future needs, gaining an in-depth understanding of your workflows and processes to identify potential scope for optimization.

We build solutions to meet your needs across data acquisition, management, enrichment, delivery and analytics. Bringing together technology components and expert intervention from our global team, we deliver significant, tangible benefits across our customers’ organizations.

Technology

Ambitious to remain at the forefront of cutting-edge technology, SPi Global continues to invest heavily in Robotic Process Automation (RPA), Natural Language Processing (NLP) and Artificial Intelligence (AI). These technologies form the backbone of our platform-driven approach.

Our flagship SPARK platform is not only powered by technologies leveraging automation and Machine Learning (ML) but also combines knowledge management, business intelligence and lean operations principles to deliver productivity and efficiency improvements across your enterprise.

We follow a flexible, scalable, agile and customer-centric engagement model allowing SPARK to be hosted on customers’ in-house environments, or by SPi.

SPARK

SPi’s flagship, intelligent, end-to-end technology platform, SPARK brings together modular AI-based components to address the diverse data management challenges faced by enterprises across multiple verticals.

SPARK is a one-stop solution, addressing data acquisition, data quality, and the transformation of data into analytics-ready form. The platform and its modules are especially powerful when processing unstructured data, an ongoing challenge for enterprises of all kinds.

The platform features three principal modules.

Extraction Module
Built on an ML engine, SPARK’s extraction module leverages cognitive models to achieve industry-leading accuracy in extracting unstructured data from PDFs, documents, maps, emails and images.

The extraction module also uses SPi’s next-generation web harvesting platform, powered by AI and NLP accelerators, to extract information and monitor websites for raw, standardized data.

  • Technology-assisted source acquisition for a more systematic and thorough approach to acquiring the right data.
  • Nimble approach allows a single script to extract similar data from multiple sites.
  • Data cleansing, structuring and noise removal at the acquisition stage save significant storage costs downstream.
  • Web page monitoring tracks and captures website changes as they happen.
  • Server-based PDF extraction and OCR reduce load on user devices.
  • Automated identification of potential issues within the extracted file, such as font details, uncertain spacing and soft hyphens, reduces the need for human curation.
  • Table, image and diagram extraction included as standard.
  • Small training samples are enough to reach high levels of automation through ML.
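
To illustrate the web page monitoring point above, a change can be detected by comparing a fingerprint of the page text between visits. This is a minimal sketch under stated assumptions, not SPARK's implementation: the function names `fingerprint` and `has_changed` are hypothetical, and collapsing whitespace is an assumed noise-removal step.

```python
import hashlib
import re


def fingerprint(page_text: str) -> str:
    """Return a SHA-256 digest of the text with whitespace runs collapsed."""
    # Assumed noise removal: layout-only whitespace changes should not count.
    normalized = re.sub(r"\s+", " ", page_text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def has_changed(previous_text: str, current_text: str) -> bool:
    """Report whether two snapshots differ after normalization."""
    return fingerprint(previous_text) != fingerprint(current_text)
```

Storing only the digest of the previous snapshot, rather than the full page, is one way to keep monitoring storage small; a production monitor would also need to decide which page regions are worth tracking.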

Transformation Module
Leveraging a quality framework built on our custom AI and ML models, and algorithms customized for each industry, SPARK’s transformation module creates and maintains a master dataset from all data fed into it.

  • Proprietary knowledge repositories and key de-duplication and standardization suites allow SPARK’s transformation module to scrub reference data such as company and product names, geographies, phone numbers and subject matter expert (SME) terminologies.
  • SPARK’s multi-level ML-based model helps disambiguate and link people, company and other datasets to create a single source of truth.
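
The standardization and de-duplication of reference data described above can be sketched with simple normalization rules. This example is illustrative only and does not reflect SPARK's proprietary suites: the suffix list, the record fields `company` and `phone`, and all function names are assumptions.

```python
import re

# Hypothetical list of legal suffixes to drop when comparing company names.
LEGAL_SUFFIXES = {"inc", "incorporated", "ltd", "limited", "llc", "corp", "corporation", "co"}


def normalize_company(name: str) -> str:
    """Lowercase, strip punctuation, and drop trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


def normalize_phone(phone: str) -> str:
    """Keep digits only so differently formatted numbers compare equal."""
    return re.sub(r"\D", "", phone)


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized (company, phone) key."""
    seen = set()
    unique = []
    for record in records:
        key = (normalize_company(record["company"]), normalize_phone(record["phone"]))
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique
```

With these rules, "Acme, Inc." with phone "(555) 010-2000" and "ACME Inc" with phone "555-010-2000" collapse to a single record. Exact-key matching of this kind only catches formatting variants; linking genuinely ambiguous entities needs a learned model, as the second bullet notes.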

Enrichment Module
Leveraging SME knowledge across multiple domains, SPARK’s Enrichment Module offers information providers and other enterprises a unique opportunity to leverage existing datasets to their full potential.

This module includes entity extraction, taxonomy and classification, summarization, metadata creation and more.

  • Customized rules and knowledge repositories for a wide range of domains including finance, legal, real estate, science, engineering, medicine, social sciences and humanities, and inter-disciplinary information.
  • Software and SMEs working together ensure scalability and accuracy, supporting stringent SLAs.
  • Keyword extraction, indexing and concept identification, along with SEO-optimized summaries.
  • In-built NLP modules facilitate concept extraction, editorial and readability checks.

SPi Labs

A unique conceptualization and development hub to foster innovation and experimentation, SPi Labs leverages next-generation technologies and open source capabilities. The facility brings together software engineers, SMEs, researchers, data analysts and technology specialists to expand SPi’s capabilities, accelerating the development of innovative solutions that enable organizations and society to make better use of big data and domain knowledge.

SPi Labs success stories include solutions developed for concept extraction and automated business relevancy discovery in the health and science domain, and semantic analysis and NLP research to understand terminology in legal documents.

As part of this initiative, SPi continues to invest in proprietary cognitive technology to target the complex problems commonly encountered in invoice processing and handwritten text recognition.

Services

It is key for businesses today to become increasingly data-driven, and eventually to create data economies, enabling data-driven insights to boost revenue, accelerate time to delivery and facilitate smarter business decision-making.

However, data which is fragmented, of low quality or not curated cannot be optimally leveraged in these ways. The sheer volume of dark and untapped data held by many businesses exacerbates the challenges involved in leveraging all available information to best advantage.

We provide customers with comprehensive, accurate data to enable improved search, navigation and analysis through our Data Management and Data Enrichment services, supported by our operations and technology capabilities.

SPi’s specialists and data experts enable global enterprises to optimize their information systems in multiple languages, with data cleansing, normalization, aggregation, and abstraction capabilities. We support leading database products across multiple industries, including science, medicine, engineering, legal, financial and business information, risk and compliance, media and entertainment.

Data Management

The essential first step to good data is identifying the right data. We apply proven technology and processes in our Source Discovery and Analysis, Data Acquisition, Cleansing and Normalization, De-duplication and Disambiguation, and Metadata Management services.

Data Enrichment

If data is oil, enrichment is the refinery. SPi has wide experience in enriching data across multiple industries. Our enrichment services include Mapping and Linking, Entity Extraction (indexing), Summarization, Knowledge Modeling and Annotation.

Platforms & Operations

Database Design & Maintenance

Our content support team handles the design of content and data structures and schemas, as well as workflows to implement new capabilities on your hosting platform.

We have rich experience in a wide range of proprietary and open source CMS platforms. Along with our in-depth knowledge of data products, this expertise enables us to provide you with a one-stop solution to optimize all your data processes.

Content Migration

SPi has teams highly experienced in migrating content from legacy platforms to new ones, with a detailed understanding of how schemas need to be set up for unstructured data. Our full suite solution includes requirements analysis, content structure mapping, ETL process management, ingestion QA and testing. We have migrated millions of records for leading information providers in the risk and compliance, health and science, and legal domains.

Platform Build and Deploy

With SPARK, our workflow management system, MAGNUS, and custom platforms, SPi builds, customizes and deploys various platforms to help optimize data workflows. All our platforms are built with APIs which can be customized for any CMS and deployed in the cloud or on-premise.

 

Analytics

Without context, data is meaningless. SPi’s analytics services assist companies in the creation of meaning and insight, driving action through the existing data in their workflows. We help organizations choose the right tools, integrate them with relevant functions, and train their users to understand the insights extracted and act accordingly.

SPi takes a customized approach to understand business problems and strategic goals, setting up systems for visualization, business analytics and insights, with constant feedback to maintain high levels of accuracy.

Visualization and Reporting

SPi’s team of data experts, including data engineers and data scientists, helps customers with visualization design and implementation, integration of visualization tools with existing workflows, and report building, scheduling and distribution.

Business Analytics

Focusing on an outcome-based, analytic approach, SPi helps companies use data to solve business problems. Using an iterative process in the creation of analytic models, we work with customer analytics, marketing analytics, product and sales analytics, operations and logistics analytics, and content analytics.

Case Studies

Improve Discoverability of Patents in Intellectual Property Database

As the world’s most comprehensive patent information database, the client serves researchers, students, engineers, designers, technicians, scientists, and research and development (R&D) professionals who need comprehensive information on patents filed in their field.

The client wanted a scalable and accurate solution that would create value-added abstracts for improved patent discoverability within their intellectual property database.

SOLUTION
SPi Global deployed ConSCIse™, its proprietary platform-enabled content abstracting solution for the client. ConSCIse blends proprietary software with Subject Matter Expert (SME) curation to deliver high-quality abstracts.

As part of this solution, SPi Global developed a ConSCIse module for engineering with NLP and ML models, facilitating contextual translation from 16 languages with integrated translation memory tools.

To ensure an optimal QA layer, we employed SMEs from diverse engineering domains (computers to mechanical).

The scaled solution was designed to manage data from over 31 countries and reduced turnaround time (TAT) from 21 days to 5 days.

Scaling and Streamlining Journal Transfers

A leading STM publisher was seeking a mechanism to scale journal transfers within their portfolio. Transfers were offered when authors had listed alternate journals at the time of submission, but only a small percentage of authors followed up on them. The use of multiple systems also created a challenge in building a system-independent workflow.

SOLUTION
To support the client, SPi Global set up an initial set of pilot journals on the Transfer Desk application, then built a custom journal recommendation model based on AI that uses the content in the chapter combined with business rules to recommend potential transfer journals.

A function was established within the team to facilitate author follow-through by managing each transfer once the author has accepted it.

SPi Global launched this system in 12 weeks and went live with 10 journals. The project scaled to the current management of 200+ journals.

Deliver high-quality business information on time

The client is a leading global publisher, featuring over 100K publications across various content types including News, Business, Finance, and Legal.

This publisher wanted to publish new content quickly without sacrificing quality. It was important to monitor and ensure that all publications coming into their collection and conversion system were updated with high-quality content and on time.

SOLUTION
SPi Global had end-to-end responsibility for this offer, from securing the content to posting it on the internet.
Additionally, we owned the data analysis, design, and creation of conversion programs for new products included in the client’s online platform. Monitoring support was run 24/7 for troubleshooting conversion, display, and functionality issues for online products under maintenance.

We addressed content licensing compliance issues and worked with source providers to resolve relationship issues, while also maintaining direct contact with the client’s licensing unit and publishers.

Finally, we worked with an IT vendor to transition additional content management tasks previously part of the enterprise platform.

Monitor drug pricing data for information changes

An integrated drug database with the most current, accurate, and technologically advanced drug data and drug decision support wanted to ensure the accuracy of their information.

A typical user relies on the client’s drug database to understand how much a drug costs across various states in the US. Therefore, having the most current drug pricing is essential for the client and their customers.

SOLUTION
SPi Global delivered an all-encompassing automated website monitoring, data extraction, and time-saving mapping tool. The solution involved development of customized workflow applications leveraging ACQUIRE and SPiZone.

This offer actively monitored changes in websites owned by state authorities for State Maximum Allowable Costs (SMAC) and Average Actual Cost (AAC) of drugs, then downloaded the latest pricing data. Additionally, we extracted drug and price information from downloaded PDFs using entity recognition models.

With this tool, we achieved a high level of automated transformation and load process with matching algorithms that mapped the drug names extracted from the PDFs to drug databases owned by the client.
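
As a rough illustration of mapping extracted drug names to a reference database, the sketch below uses approximate string matching from Python's standard `difflib` module. The source does not say which matching algorithms were used, so this is a stand-in technique; the function name `map_drug_names`, the `cutoff` value, and the sample names are all assumptions.

```python
import difflib


def map_drug_names(extracted, reference, cutoff=0.8):
    """Map each extracted name to its closest reference name, or None."""
    # Compare case-insensitively but return the reference spelling.
    lookup = {name.lower(): name for name in reference}
    mapping = {}
    for raw in extracted:
        matches = difflib.get_close_matches(
            raw.lower().strip(), list(lookup), n=1, cutoff=cutoff
        )
        mapping[raw] = lookup[matches[0]] if matches else None
    return mapping
```

For example, with a reference list of "Atorvastatin", "Metformin" and "Lisinopril", the misspelled "Metformn" maps to "Metformin", while an unrelated string maps to `None` and can be routed to a reviewer. Drug names that differ by only a few characters can refer to different products, so a real mapping would likely need a stricter cutoff plus additional identifiers.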

Finally, to optimize client utility and savings, we developed a monitoring schedule based on predictive modelling that keeps the data up to date without inflating infrastructure costs.

Vehicle accident reporting automation saves time and money

The client has a leading platform that streamlines and secures the entire accident reporting process from data capture, storage, and access to analysis and distribution. The platform primarily serves law enforcement agencies, individuals, insurers, and authorized parties.

The client was looking to reduce costs by optimizing the current process through automation without sacrificing high data quality.

SOLUTION

SPi Global worked to create an offer that captured critical data elements from police auto-accident report images – sourced from 64 state agencies, each containing 100 to 400 data fields – and optimized the reporting workflow.

By redesigning the traditional double-keying method as a two-phase iterative automation approach, we were able to deliver a functional technology solution to manage the variable workload. Our approach included:

  • Phase 1: Implementation of an auto-extraction service powered by OCR plus business rules, followed by the design of an intuitive UI layer with a workflow, dashboard, and reporting mechanism to create a golden set for training
  • Phase 2: Building on Phase 1 automation, increasing efficiency and accuracy of both extraction and QC through state-specific ML models
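
The Phase 1 idea of pairing OCR output with business rules can be sketched as a triage step: fields that pass their rule are accepted automatically, and the rest go to manual review. Everything here is hypothetical, including the field names, the assumed MM/DD/YYYY date format, and the `triage` function; it is not the deployed solution.

```python
import re
from datetime import datetime


def is_valid_date(value: str) -> bool:
    """Accept dates written as MM/DD/YYYY (an assumed report format)."""
    try:
        datetime.strptime(value, "%m/%d/%Y")
        return True
    except ValueError:
        return False


# Hypothetical per-field rules; a real report carries 100 to 400 fields.
RULES = {
    "report_date": is_valid_date,
    # A VIN has 17 characters and never uses the letters I, O or Q.
    "vin": lambda v: re.fullmatch(r"[A-HJ-NPR-Z0-9]{17}", v) is not None,
    "vehicle_count": lambda v: v.isdigit() and int(v) >= 1,
}


def triage(extracted: dict) -> tuple[dict, list]:
    """Split OCR output into accepted fields and fields needing manual review."""
    accepted, needs_review = {}, []
    for field, rule in RULES.items():
        value = extracted.get(field, "").strip()
        if value and rule(value):
            accepted[field] = value
        else:
            needs_review.append(field)
    return accepted, needs_review
```

Fields that reviewers correct in this way could then feed the golden set used to train the state-specific models in Phase 2.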

Scaling trademark and brand protection and search

A leading provider of brand protection services to IP firms, brand managers, and legal firms needed a scalable and cost-effective solution to enable seamless discovery and management of customer searches.

SOLUTION

Combining technology and SME knowledge, SPi Global created a tailor-made trademark database watch and management solution to deliver efficient and effective brand monitoring.

The offer utilized the latest computer vision technology to index and annotate the trademarks using Vienna code classifications, and created a knowledge repository of frequently requested trademarks for effortless and quick processing of trademark monitoring. It also included an intelligent workflow management module to direct routing based on SME specialization.

The system monitored nearly 400,000 trademarks annually for potential infringements with 99.985% accuracy, and progressively improved turnaround time from 14 to 3 days.

Pendo Systems

UNLOCK INSIGHTS & VALUE FROM UNSTRUCTURED DATA

What do we do?

The Pendo Systems data platform enables organizations to quickly turn unstructured documents into structured, addressable data at machine scale.

Built specifically for financial services, the platform combines a set of proprietary algorithms with repeatable, controlled analysis of documents. This enables classification of unstructured data and unlocks the insights trapped in millions of mission-critical documents.

The Pendo Systems Platform has processed over 250 million unstructured documents to locate and extract the key terms from contracts. It is evidence-based, provides full data lineage and has been validated time and again for a variety of use cases, including critical regulatory challenges, specifically Matters Requiring Attention.

How do we do it?

Parse-Documents

Rapidly parse a wide variety of document types and instantly search and group them

Review-and-Query

Review your documents quickly and use pre-trained NLP and a query language

Label-and-train

Iterate-&-Learn

Constantly iterate, refine and improve your data prior to publishing

Managing LIBOR Transition Assessment using Pendo’s LIBOR Fallback Engine

The Problem:

LIBOR is the reference rate, directly or indirectly, for millions of contracts worth more than $400 trillion, ranging from mortgage products to derivatives. The upcoming cessation of LIBOR poses a serious challenge for banks, investment firms and asset management companies.

One of the first steps financial institutions must undertake to prepare for LIBOR transition is to identify all the contracts which reference LIBOR and review the fallback language, highlighting those which contain inadequate fallback provisions.

These LIBOR fallback terms are buried in unstructured documents that are not machine-readable. This impedes companies’ ability to identify, prioritize and execute the actions needed to manage the transition with the transparency, speed and confidence required to minimize operational risk and expense.
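
Once contract text has been digitized, the first identification step described above can be approximated with keyword patterns. The sketch below is a simple illustration and not Pendo's engine: the phrase list, the labels, and the function name `classify_contract` are assumptions, and it only detects the presence of fallback wording, not whether a provision is adequate.

```python
import re

LIBOR_PATTERN = re.compile(r"\bLIBOR\b|London Interbank Offered Rate", re.IGNORECASE)

# Hypothetical phrases that may signal fallback provisions; real contracts vary widely.
FALLBACK_PATTERN = re.compile(
    r"fall\s?back|successor rate|replacement rate|alternative (?:reference|benchmark) rate",
    re.IGNORECASE,
)


def classify_contract(text: str) -> str:
    """Label a contract by whether it cites LIBOR and mentions fallback wording."""
    if not LIBOR_PATTERN.search(text):
        return "no_libor_reference"
    if FALLBACK_PATTERN.search(text):
        return "libor_with_fallback_language"
    return "libor_without_fallback_language"
```

Contracts labeled `libor_without_fallback_language` would be natural candidates for priority review, while those with fallback wording still need a legal assessment of whether the provision is sufficient.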

How can Pendo help?

The Pendo LIBOR Fallback engine is a unique, automated capability that quickly and accurately digitizes unstructured source documents and surfaces key LIBOR fallback terms and conditions. Our fallback engine uses a proprietary, domain-specific language that enables the engine to navigate thousands of different LIBOR contract types, using a set of custom-designed utilities and functions that make rules and scripts more flexible and accurate.

Why Pendo?

The Pendo Systems platform has been deployed for multiple LIBOR-specific engagements and has digitized over 70 different contract types, including complex derivative products that included LIBOR fallback language, helping consultants, law firms and financial institutions prepare for the LIBOR transition process.

Contact Us

Any questions? Get in touch with us.