Raga Preethi Potu

Plano, TX

Summary

  • Experienced in the design and deployment of enterprise applications, web applications, client-server technologies, and web programming using Java and big data technologies
  • Experience in the design, development, and implementation of data warehousing technology and data analysis
  • Comprehensive experience as a Data Engineer and Hadoop/Big Data & Analytics developer; analyzed credit data and financial statements to support lending decisions
  • Expertise in the Hadoop architecture and ecosystem, including HDFS, MapReduce, Pig, Hive, Sqoop, and Flume
  • Complete understanding of Hadoop daemons (Job Tracker, Task Tracker, NameNode, DataNode) and of the MRv1 and YARN architectures
  • Experience installing, configuring, managing, supporting, and monitoring Hadoop clusters using distributions such as Apache Hadoop, Cloudera, and Hortonworks, and cloud services such as AWS and GCP
  • Experience installing and configuring Hadoop stack elements: MapReduce, HDFS, Hive, Pig, Sqoop, Flume, Oozie, and ZooKeeper
  • Expertise in writing custom Kafka consumer code and modifying existing producer code in Python to push data to Spark Streaming jobs
  • Ample knowledge of Apache Kafka and Apache Storm for building data platforms, pipelines, and storage systems, and of search technologies such as Elasticsearch
  • Experienced with new Azure features; reproduce and troubleshoot Azure end-user issues and provide solutions to mitigate them
  • Knowledge of automated deployments leveraging Azure Resource Manager templates, DevOps, and Git repositories for automation and continuous integration/continuous delivery (CI/CD)
  • Experienced in data processing and analysis using Spark, HiveQL, and SQL
  • Responsive expert experienced in monitoring database performance, troubleshooting issues, and optimizing database environments
  • Strong analytical skills, excellent problem-solving abilities, and a deep understanding of database technologies and systems
  • Equally confident working independently and collaboratively, with excellent communication skills.

Overview

6 years of professional experience

Work History

Data Engineer

Eficens Systems LLC
02.2022 - Current
  • Involved in project life cycle including design, development, and implementation of verifying data received in data lake
  • Conducting data analysis and providing actionable insights through Tableau dashboards
  • Collaborating with stakeholders to define and understand their data visualization requirements
  • Optimizing Tableau workbooks and dashboards for improved performance and responsiveness
  • Configuring and optimizing EC2 instances to meet performance and scalability requirements
  • Developing and maintaining ETL workflows and processes using Snowflake and other related tools
  • Monitoring and optimizing performance of Snowflake data warehouse queries
  • Designing and developing ETL processes in AWS Glue to migrate accident data from external sources such as S3 and text files into AWS Redshift
  • Tuning and indexing tables to enhance query speed and overall system performance
  • Setting up role-based access controls (RBAC) and managing user permissions in Snowflake
  • Working closely with cross-functional teams, such as data scientists and analysts, to understand their data requirements and provide technical solutions
  • Designing and implementing data visualizations to effectively communicate insights and trends
  • Analyzed impact changes on existing ETL/ELT processes to ensure timely completion and availability of data in data warehouse for reporting use
  • Translated data access, transformation, and movement requirements into functional requirements and mapping designs
  • Developed, tested, and tuned performance of complex mappings, transforms, aggregations, joins, enrichment, validations for target data underpinnings
  • Used ranking and aggregation functions in Spark
  • Built real time streaming pipeline utilizing Kafka, Spark Streaming and Redshift
  • Developed logical and physical data flow models for Informatica ETL applications
  • Added support for AWS S3 and RDS to host static/media files and the database in the Amazon cloud
  • Worked on creating custom Docker container images and on tagging and pushing data images
  • Implemented and analyzed SQL query performance issues in databases
  • Responsible for the design and development of Spark SQL scripts based on functional requirements
  • Environment: Snowflake, Tableau, Hadoop, Spark, Hive, Python, Kafka, AWS S3 buckets, AWS Glue, NiFi, PostgreSQL, development toolkit (JIRA, Bitbucket/Git, ServiceNow, etc.)
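The indexing and query-tuning bullets above can be sketched with a minimal, self-contained example. This uses SQLite from the Python standard library purely as an illustration (the actual work was on Snowflake and Redshift, and the table and index names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE accidents (id INTEGER PRIMARY KEY, state TEXT, severity INTEGER)")
cur.executemany("INSERT INTO accidents (state, severity) VALUES (?, ?)",
                [("TX", i % 5) for i in range(1000)])

# Without an index, filtering on `state` forces a full table scan.
plan_before = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM accidents WHERE state = 'TX'").fetchall()

# Adding an index lets the planner seek directly to matching rows.
cur.execute("CREATE INDEX idx_accidents_state ON accidents (state)")
plan_after = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM accidents WHERE state = 'TX'").fetchall()

print(plan_before)  # plan text mentions a table scan
print(plan_after)   # plan text mentions idx_accidents_state
```

Comparing the two query plans before and after index creation is the same workflow used when tuning warehouse queries, even though the specific planner output differs by engine.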

Data Engineer

Tecspirit
08.2020 - 01.2022
  • Implemented simple and complex Spark jobs in Python for data analysis across different data formats
  • Developed upgrade and downgrade scripts in SQL that filter corrupted records and records with missing values, and that identify unique records based on different criteria
  • Implemented Azure Storage: storage accounts, Blob storage, and Azure SQL Server
  • Explored Azure storage account offerings such as Blob storage
  • Knowledge of Azure DevOps and its processes for creating tasks, pull requests, and Git repositories
  • Experience building, deploying, and troubleshooting data extraction for huge volumes of records using Azure Data Factory (ADF)
  • Fluency in Python with working knowledge of ML and statistical libraries
  • Cleaned input text data using the PySpark machine learning feature extraction API
  • Used pandas DataFrames for exploratory data analysis on sample datasets
  • Worked on Microsoft Azure services such as HDInsight clusters, Blob storage, ADLS, and Data Factory
  • Environment: Spark, Scala, Hadoop, Hive, Sqoop, Play Framework, Apache Ranger.
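The exploratory data analysis with pandas mentioned above can be illustrated with a short sketch; the DataFrame contents and column names here are made up for the example:

```python
import pandas as pd

# Hypothetical sample dataset standing in for the project's real data.
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3, None],
    "amount": [10.0, 25.5, 25.5, None, 40.0],
})

# Basic profiling: shape, missing values, duplicates, summary statistics.
print(df.shape)
print(df.isna().sum())          # missing values per column
print(df.duplicated().sum())    # count of fully duplicated rows
print(df["amount"].describe())  # distribution of a numeric column
```

Profiling steps like these typically drive the upgrade/downgrade and deduplication scripts described above, since they reveal which records are corrupted, missing, or repeated.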

Graduate Research Assistant

UNCG
08.2019 - 07.2020
  • Implemented and used new tools such as the Globus API to securely transmit large volumes of data
  • Analyzed research data using reporting tools such as Tableau
  • Optimized data collection procedures and generated reports accordingly
  • Used statistical techniques for hypothesis testing to validate data and interpretations
  • Involved in communications and design discussions with the client (Gate City Research Network, gcrNet).
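The hypothesis-testing bullet above can be sketched with a simple two-sample comparison. This is an illustrative example with made-up measurements, computing Welch's t-statistic from the standard library (the actual research data and tests are not shown here):

```python
from statistics import mean, stdev
from math import sqrt

# Hypothetical measurements standing in for two groups in the research data.
control = [4.1, 3.9, 4.3, 4.0, 4.2, 3.8]
treatment = [4.6, 4.8, 4.5, 4.9, 4.7, 4.4]

def welch_t(a, b):
    """Welch's t-statistic for two independent samples (unequal variances)."""
    va, vb = stdev(a) ** 2, stdev(b) ** 2
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

t = welch_t(control, treatment)
# |t| well above ~2 suggests the group means differ at the usual 5% level
# (degrees of freedom and the exact p-value are omitted in this sketch).
print(round(t, 2))
```

In practice a library routine (e.g. a two-sample t-test from a statistics package) would also report the p-value; the sketch only shows the test statistic behind the validation step.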

Data Engineer Intern

Knowledge Matrix
05.2017 - 10.2017
  • Used cryptographic algorithms to encrypt data and transfer it with a secure key, supporting data security and mobility
  • Applied SQL for querying, data extraction, and data transformations
  • Gained solid working experience with relational databases such as Oracle and SQL Server
  • Used dual controllers on various business projects for dual data validation and data consistency
  • Interacted with users, analyzed client processes, and documented the business requirements for the project
  • Good experience identifying root causes, troubleshooting, and submitting change controls.

Education

Master’s in Computer Science

University of North Carolina
Greensboro, NC

Bachelor’s in Computer Science

GITAM University
Hyderabad, India
06.2019

Skills

  • Python
  • Java
  • Scala
  • R
  • SQL
  • PL/SQL
  • UNIX
  • Linux
  • HBase
  • Spark-Redis
  • Cloudera Manager
  • Snowflake
  • Tableau
  • AWS
  • Jenkins
  • Airflow
  • Dagger
  • Postman
  • Workflows
  • Data Pipelines
  • Data analysis
  • Warehousing expertise
  • Staging tables
  • Structure designs

Timeline

Data Engineer

Eficens Systems LLC
02.2022 - Current

Data Engineer

Tecspirit
08.2020 - 01.2022

Graduate Research Assistant

UNCG
08.2019 - 07.2020

Data Engineer Intern

Knowledge Matrix
05.2017 - 10.2017

Master’s in Computer Science

University of North Carolina

Bachelor’s in Computer Science

GITAM University