Parth Shrivastava

Senior Data Engineer

About

Senior Data Engineer with over 4 years of experience designing and optimizing scalable data pipelines in cloud-native environments. Experienced in building real-time and batch ETL frameworks using PySpark, SQL, and Delta Lake on Azure Databricks and AWS, reducing execution times by up to 60% and processing 1 TB of data daily. Focused on improving performance, controlling costs, and delivering reliable data for analytics and business intelligence.

Work Experience

Software Engineer (Data Engineering)

Mediaocean

Jul 2022 - Present

Pune, Maharashtra, IN

Built and tuned ETL pipelines across Azure and AWS, with a focus on processing efficiency and cost reduction.

  • Designed and optimized end-to-end ETL pipelines on Databricks using PySpark and Delta Lake, processing 1 TB of data daily across Azure Data Lake Storage (ADLS) and AWS S3.
  • Optimized complex SQL queries and Spark jobs through partitioning, broadcast joins, and skew mitigation, cutting execution times by up to 60%.
  • Reduced storage and compute costs by implementing partitioning, bucketing, and Z-Ordering in Databricks and by applying lifecycle policies in AWS S3 and Azure Blob Storage.
  • Collaborated with cross-functional teams to migrate critical datasets from on-premises systems to a hybrid cloud environment on Azure, ensuring compliance with data governance and security standards.
  • Partnered with business stakeholders, analysts, and data scientists to translate reporting requirements into efficient data models and pipelines.

Software Engineering Intern

Connection Loops

May 2021 - May 2022

Contributed to the development of AI-powered solutions for biomedical signal processing, focusing on arrhythmia detection and classification.

  • Preprocessed and segmented over 50K ECG signals using NumPy and SciPy to prepare them for arrhythmia detection.
  • Built deep learning models (1D CNN, Temporal Convolutional Network) that achieved a 94% F1 score on arrhythmia classification.
  • Deployed batch scoring pipelines to automate arrhythmia detection on new ECG data, improving diagnostic efficiency.

Education

Computer Engineering

Savitribai Phule Pune University

9.57/10 CGPA

May 2018 - May 2022

Pune, Maharashtra, IN

Courses

  • Advanced Data Structures
  • Embedded Systems and IoT
  • Artificial Intelligence and Robotics

Projects

AI-Powered Job Aggregator & Resume Tailoring Platform

Mar 2025 - May 2025

Developed an end-to-end data pipeline for a job aggregator and resume tailoring platform, leveraging NLP and cloud technologies to provide real-time job discovery and resume optimization.

Awards

Rising Star Award

Mediaocean Pvt. Ltd.

Jan 2023

Recognized for outstanding performance and significant contributions at Mediaocean Pvt. Ltd.

All India Rank 12, Robocon 2019

IIT Delhi

Jan 2019

Ranked 12th in the national robotics competition Robocon 2019 at IIT Delhi, competing with two robots as part of Team Automatons.

Publications

Optimization of Multi Wavelength Drone Images Using Geo Reference Model

India - Intellectual Property

Jan 2023

Secured a patent for an innovative method to optimize multi-wavelength drone images using a geo-reference model.

Disruptive Developments in Biomedical Applications

Taylor & Francis

Jan 2023

Authored a chapter in the book 'Disruptive Developments in Biomedical Applications'.

Skills

Programming Languages

  • Python
  • SQL
  • Java

Data Frameworks & Tools

  • Databricks
  • Spark (PySpark)
  • Pandas
  • Delta Lake
  • Apache Airflow

Cloud & Infrastructure

  • AWS (S3)
  • Azure (Data Lake Gen2, Event Hubs, Blob Storage)

Operating Systems

  • Windows
  • Linux

Data & Analytics

  • ETL Pipelines
  • Data Lineage
  • Data Governance
  • Data Quality
  • Machine Learning
  • Automation
  • Data Testing

DevOps & CI/CD

  • Git
  • Docker
  • Jenkins
  • CI/CD Pipelines