Case Study

Designed and Developed Data Lake Platform using AWS S3 and Apache Spark

Client Background

A Big Data Platform that centralizes data from different sources such as payers, providers, and clinicians.

Client Need

  • Acquired more than 20 companies over the last decade, leaving multiple products with the same use case running in siloed environments and data replicated in multiple locations
  • Required centralized data storage for data consumption and analysis
  • Needed to accelerate business growth by helping build new products and insights and by enabling AI and Machine Learning capabilities

Our Solution

  • Designed and developed a Data Lake Platform using AWS S3 and Apache Spark to ingest and process millions of transactions across data types such as Claims, Payments, Eligibility, Clinical, and Imaging
  • Developed a Pipeline Development Kit to create data pipelines with ease and quickly onboard tenants onto the platform
  • Created an Orchestration Development Kit using Apache Airflow for scheduling AWS EMR data pipelines
  • Developed generic data pipelines for extracting and storing data so that end users can search and retrieve their respective healthcare transactions using Elasticsearch
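
As an illustration of how a pipeline kit and an orchestration layer might fit together, the minimal sketch below describes one tenant pipeline declaratively and turns it into an EMR step that runs a Spark job. It is not the client's actual kit: the `PipelineSpec` class, the bucket name, the script location, the zone names, and the partition layout are all hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PipelineSpec:
    """Hypothetical description of one tenant pipeline (illustrative only)."""
    tenant: str
    data_type: str    # e.g. "claims", "payments", "eligibility"
    bucket: str       # assumed data lake bucket name
    script_uri: str   # assumed S3 location of the Spark job script

    def prefix(self, zone: str, run_date: date) -> str:
        """Return a partitioned S3 URI for a zone such as 'raw' or 'curated'."""
        return (
            f"s3://{self.bucket}/{zone}/tenant={self.tenant}/"
            f"type={self.data_type}/dt={run_date.isoformat()}/"
        )

    def emr_step(self, run_date: date) -> dict:
        """Build an EMR step that runs the Spark job via command-runner.jar."""
        return {
            "Name": f"{self.tenant}-{self.data_type}-{run_date.isoformat()}",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    self.script_uri,
                    "--input", self.prefix("raw", run_date),
                    "--output", self.prefix("curated", run_date),
                ],
            },
        }


# Onboarding a new tenant then amounts to declaring one more spec.
spec = PipelineSpec(
    tenant="tenant_a",
    data_type="claims",
    bucket="example-data-lake",
    script_uri="s3://example-data-lake/jobs/ingest.py",
)
step = spec.emr_step(date(2024, 1, 15))
```

A step dictionary of this shape is what an EMR cluster accepts as a job step, so a scheduler such as Apache Airflow could submit one per tenant and data type on each run.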

Key Benefits

  • Diverse and Ubiquitous Data amounting to 4 petabytes of cross-enterprise financial, operational, and clinical data
  • Cost Savings of $400K annually by using S3 Intelligent-Tiering storage options and creating object lifecycle rules
  • Build Once – Use Multiple Framework – Operational Efficiency, Rapid Dev
  • Faster On-Boarding – Reduced time to market, New Growth Opportunities
  • Foundation for Integrated Products, Linked Cross Functional Data and enablement of Machine Learning and AI
  • Authoritative source of Large Data Sets
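
The storage savings above come from tiering and lifecycle rules on S3 objects. As a rough sketch of what such a rule can look like, the helper below builds a lifecycle configuration that moves objects under a prefix to the Intelligent-Tiering storage class and optionally expires them. The function name, the `raw/` prefix, and the day counts are assumptions made for the example, not values from the project.

```python
from typing import Optional


def build_lifecycle_configuration(
    prefix: str,
    tiering_after_days: int,
    expire_after_days: Optional[int] = None,
) -> dict:
    """Build an S3 lifecycle configuration with one rule for the given prefix."""
    if expire_after_days is not None and expire_after_days <= tiering_after_days:
        raise ValueError("expiration must come after the storage class transition")

    rule = {
        "ID": f"tiering-{prefix.strip('/').replace('/', '-')}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Move objects to Intelligent-Tiering once they reach the given age.
        "Transitions": [
            {"Days": tiering_after_days, "StorageClass": "INTELLIGENT_TIERING"}
        ],
    }
    if expire_after_days is not None:
        # Optionally delete objects after a retention period.
        rule["Expiration"] = {"Days": expire_after_days}
    return {"Rules": [rule]}


# Hypothetical values: tier raw data after 30 days, expire it after 365.
config = build_lifecycle_configuration("raw/", 30, 365)
```

With boto3 installed and credentials configured, a dictionary like `config` could be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call, together with the target `Bucket` name.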

Tools & Technologies

AWS
Amazon S3 Bucket
Spark
Apache Airflow
Kafka
Elasticsearch
PostgreSQL
DevOps
GitLab
HashiCorp Terraform
Docker
