Virtual Tech Gurus
Description
Details: Sandra
As a Senior Data Engineer, you will play a hands-on role in designing, building,
and operating high-performance batch and streaming data platforms. You will:
- Design, develop, and maintain large-scale batch and streaming pipelines using
PySpark and Python.
- Build real-time and near real-time streaming applications with stateful
processing, windowing, and checkpointing.
- Develop production-grade Python microservices for complex data transformations
and business logic.
- Design and manage modern data lake architectures using Apache Iceberg on AWS
S3, implementing schema evolution, partitioning, compaction, and time travel.
- Develop and deploy pipelines across AWS services including S3, EMR, Glue,
Lambda, Athena, Redshift, and Aurora.
- Optimize Spark workloads for performance, scalability, and cost efficiency.
- Implement monitoring, logging, alerting, and recovery mechanisms for robust
production operations.
- Contribute to CI/CD pipelines, participate in architecture discussions, and
uphold engineering best practices.
What You’ll Bring
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related
discipline.
- 10+ years of experience in IT, with strong hands-on expertise in PySpark,
Spark SQL, and distributed data processing.
- Advanced proficiency in Python for building scalable, production-grade data
solutions and microservices.
- Proven experience building and running Kafka-based streaming applications in
production environments.
- Deep understanding of streaming fundamentals, including stateful processing and
fault tolerance.
- Hands-on experience with Apache Iceberg in production data lake environments.
- Solid experience with AWS data services (S3, EMR, Glue, Lambda, Redshift,
Aurora).
- Advanced SQL skills and strong knowledge of data modeling and modern data lake
architectures.
- Strong troubleshooting skills in distributed data systems with a focus on
reliability and performance.
JOBID: 12321
