Sanjay Sureshkumar

Chicago

Overview

3 years of professional experience
1 Certification

Work History

Data Engineer

CCC Intelligent Solutions
Chicago
03.2023 - Current
  • Architected and optimized Amazon Redshift for BI reporting by implementing materialized views on datasets over 10 TB, using star-schema fact and dimension modeling via AWS Glue.
  • Orchestrated weekly refresh workflows with Airflow and configured Redshift query priorities, workload management (WLM), and access controls, reducing refresh time from 3 days to under 24 hours and delivering more up-to-date data to Tableau dashboards.
  • Owned and managed 4+ scalable Airflow pipelines executing Spark, Kafka, and Hudi jobs on EMR clusters across multiple Airflow and Hadoop versions, while resolving compatibility issues, optimizing performance, and monitoring workflows via Kafdrop, CloudWatch Dashboards, and Spark History UI.
  • Designed and configured an end-to-end AWS ecosystem for data processing, including Hadoop clusters, job schedulers, metadata databases, data lakes, and disaster recovery solutions, while managing cost control, network security, access management, and data resiliency in line with best practices and Infrastructure as Code (IaC) frameworks.

Data Management Intern

Swiss Re American Holding Corporation
Fort Wayne
05.2022 - 10.2022
  • Processed real industry datasets of 300 million+ rows using PySpark DataFrames, RDDs, and Spark SQL.
  • Conducted comparative analysis of Pandas and PySpark on identical datasets to evaluate performance metrics, query execution plans, and suitability for different data processing scenarios.
  • Optimized distributed data workflows under tight resource constraints to extract insights from the data.
  • Evaluated actual vs. expected values using multiple analytical techniques while gaining hands-on experience with the Palantir Foundry platform.

Education

Master of Science - Computer Science

Purdue University
Fort Wayne, IN

Bachelor of Engineering - Computer Science and Engineering

College of Engineering, Guindy (CEG), Anna University
Chennai, TN, India

Skills

Spark

Python

Pandas

SQL

Kafka

Airflow

Tableau

AWS Cloud

Terraform

Ansible

R

Bash/shell scripting

Certification

  • AWS Certified Data Engineer - Associate
  • Data Analytics - Google Professional Career Certificate

Research Projects

RNA Analysis Pipeline for Retinal Disease Detection Using R and HPC

  • Used parallel processing frameworks in R (parallel, doParallel, foreach) to distribute a time-intensive RNA gene expression correlation algorithm across Purdue University’s Gilbreth GPU cluster, reducing runtime by 90%.
  • The faster pipeline significantly accelerated the diagnosis of Retinitis Pigmentosa (a retinal degenerative disease), aiding the research efforts of biomedical scientists.

Timeline

Data Engineer

CCC Intelligent Solutions
03.2023 - Current

Data Management Intern

Swiss Re American Holding Corporation
05.2022 - 10.2022

Master of Science - Computer Science

Purdue University

Bachelor of Engineering - Computer Science and Engineering

College of Engineering, Guindy (CEG), Anna University