Key Responsibilities
- Build and maintain cloud-based data lakes and warehouses following best practices.
- Collaborate with the development team to build and deliver back-end data pipeline components that meet industry standards and architectural guidelines.
- Design, develop, and maintain efficient ETL/ELT data pipelines from various internal and external sources.
- Gather business requirements to design data models that ensure data quality, integrity, and performance.
- Conduct comprehensive testing and validation of data pipelines to ensure accuracy and consistency.
- Work closely with data scientists and analysts to understand their needs and develop appropriate solutions.
- Troubleshoot data-related issues, perform root cause analysis, and implement preventive measures.
- Document architectures, data dictionaries, data mappings, and other technical information clearly.
Experience
- Experience in a Data Engineering role using SQL and Python.
- Strong understanding of data lake and data warehouse design principles.
- Hands-on experience with cloud-based ETL and data services (e.g., AWS Glue, EMR, Redshift) and workflow orchestration tools (e.g., Airflow).
- Experience deploying and managing ML platforms and MLOps tooling (e.g., AWS SageMaker).
- Familiarity with distributed computing systems (e.g., Spark, Hive, Hadoop).
- Experience with relational databases such as PostgreSQL, MySQL, and Oracle.
- Strong communication skills in English, both spoken and written.
Desirable
- Experience with other cloud platforms and hybrid cloud infrastructures (e.g., GCP, Azure).
- Understanding of Machine Learning and Deep Learning concepts.
- Proficiency in real-time and near real-time data streaming technologies (e.g., Kafka, Spark Streaming, Pub/Sub).