Description
• Weeks 1-2: Python and Rust for Data Engineering
– Goals: Master Python for data processing tasks and get an introduction to Rust for performance-critical data engineering work.
– Milestone: Implement a data processing script in Python and a performance-critical module in Rust (a Python sketch follows below).
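A minimal sketch of the Python half of this milestone, using only the standard library; the file name events.csv and the user_id/amount columns are illustrative assumptions, not part of the course material:

```python
# Minimal data processing sketch: sum a CSV's 'amount' column per user.
# The file name and column names are illustrative assumptions.
import csv
from collections import defaultdict

def aggregate_amounts(path: str) -> dict:
    """Sum the 'amount' column per user, skipping malformed rows."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                totals[row["user_id"]] += float(row["amount"])
            except (KeyError, ValueError):
                continue  # skip rows with missing or non-numeric fields
    return dict(totals)

if __name__ == "__main__":
    print(aggregate_amounts("events.csv"))
```

The Rust half of the milestone would typically reimplement a hot path like this and expose it to Python, for example through PyO3 bindings.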
• Weeks 3-4: Data Ingestion with Apache Kafka and Airbyte
– Goals: Learn the fundamentals of streaming data architecture with Apache Kafka and use Airbyte to ingest data from a variety of sources.
– Milestone: Set up a streaming data pipeline with Kafka and configure Airbyte to ingest data into your pipeline (see the Kafka sketch below).
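Airbyte connections are configured through its UI or API rather than hand-written code, so only the Kafka leg is sketched here, using the kafka-python client. The broker address, topic name, and payload are assumptions, and a local broker must already be running:

```python
# Tiny Kafka round trip with the kafka-python client.
# Broker ("localhost:9092") and topic ("events") are assumptions.
import json
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("events", {"user_id": "u1", "amount": 9.99})
producer.flush()  # block until the message is actually delivered

consumer = KafkaConsumer(
    "events",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
for message in consumer:
    print(message.value)  # {'user_id': 'u1', 'amount': 9.99}
    break  # stop after one message for the demo
```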
• Weeks 5-6: Data Processing with Apache Spark
– Goals: Gain expertise in big data processing with Apache Spark, including RDDs, DataFrames, and Spark SQL for batch and real-time data processing.
– Milestone: Develop a data processing application using Spark to handle large datasets efficiently (a PySpark sketch follows).
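A sketch of what this milestone could look like in PySpark, running the same aggregation through both the DataFrame API and Spark SQL; the input file and column names are assumptions:

```python
# PySpark batch job: load a CSV into a DataFrame, then aggregate
# via the DataFrame API and via Spark SQL. Names are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("batch-demo").getOrCreate()

df = spark.read.csv("events.csv", header=True, inferSchema=True)

# DataFrame API: total amount per user
df.groupBy("user_id").agg(F.sum("amount").alias("total")).show()

# Equivalent Spark SQL query over a temporary view
df.createOrReplaceTempView("events")
spark.sql(
    "SELECT user_id, SUM(amount) AS total FROM events GROUP BY user_id"
).show()

spark.stop()
```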
• Weeks 7-8: Data Storage and Caching with Redis and MongoDB
– Goals: Learn about NoSQL databases with a focus on Redis for caching and MongoDB for document-oriented storage.
– Milestone: Implement a database solution that uses Redis for fast data retrieval and MongoDB for storing processed data (see the cache-aside sketch below).
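One common shape for this milestone is the cache-aside pattern: MongoDB holds the processed records and Redis serves repeated reads. Connection URLs, database and collection names, and the 5-minute TTL below are illustrative assumptions:

```python
# Cache-aside lookup: try Redis first, fall back to MongoDB,
# then populate the cache. Names and TTL are assumptions.
import json
import redis
from pymongo import MongoClient

cache = redis.Redis(host="localhost", port=6379, decode_responses=True)
mongo = MongoClient("mongodb://localhost:27017")
users = mongo["pipeline"]["user_totals"]

def get_user_total(user_id: str):
    cached = cache.get(f"user_total:{user_id}")
    if cached is not None:
        return json.loads(cached)  # cache hit
    doc = users.find_one({"_id": user_id}, {"_id": 0})
    if doc is not None:
        # 300-second TTL keeps stale entries from lingering
        cache.setex(f"user_total:{user_id}", 300, json.dumps(doc))
    return doc
```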
• Weeks 9-10: Scheduling and Orchestration with Airflow and Kubernetes
– Goals: Master workflow scheduling with Airflow and learn the basics of container orchestration with Kubernetes.
– Milestone: Deploy a data pipeline using Airflow for task scheduling and manage containers with Kubernetes for a scalable data engineering solution (an Airflow DAG sketch follows).
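Kubernetes manifests are YAML rather than Python, so only the Airflow half is sketched: a three-task DAG with placeholder callables. The schedule parameter assumes Airflow 2.4 or later (older releases use schedule_interval), and all names are illustrative:

```python
# Airflow DAG wiring extract -> transform -> load.
# Task bodies are placeholders; schedule and dates are assumptions.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pulling raw data")

def transform():
    print("cleaning and aggregating")

def load():
    print("writing results to storage")

with DAG(
    dag_id="demo_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    t1 = PythonOperator(task_id="extract", python_callable=extract)
    t2 = PythonOperator(task_id="transform", python_callable=transform)
    t3 = PythonOperator(task_id="load", python_callable=load)
    t1 >> t2 >> t3  # linear dependency chain
```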
• Week 11: Containerization with Docker and Dashboarding with Grafana
– Goals: Learn to containerize applications with Docker for consistent deployment and use Grafana for creating interactive data visualizations.
– Milestone: Containerize your data processing application with Docker and create a dashboard in Grafana to visualize your data pipeline metrics (a metrics-export sketch follows).
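The Docker half of this milestone is a Dockerfile rather than Python, so the sketch below covers the dashboard side: exposing pipeline metrics with prometheus_client for Grafana to chart, assuming a Prometheus data source scrapes this process. The port and metric names are assumptions:

```python
# Expose pipeline metrics on an HTTP endpoint for Prometheus to
# scrape and Grafana to chart. Port and metric names are assumptions.
import random
import time
from prometheus_client import Counter, Gauge, start_http_server

RECORDS = Counter("pipeline_records_total", "Records processed")
LAG = Gauge("pipeline_lag_seconds", "Simulated processing lag")

if __name__ == "__main__":
    start_http_server(8000)  # serves http://localhost:8000/metrics
    while True:
        RECORDS.inc()            # one record per loop iteration
        LAG.set(random.random()) # stand-in for a real lag measurement
        time.sleep(1)
```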
• Week 12: Capstone Project and Extras
– Goals: Apply all learned skills to design and implement a comprehensive data engineering project. Explore additional tools and technologies such as Cassandra for wide-column storage, Talend for data integration, Collibra for data governance, and AWS services for cloud-based data engineering.
– Milestone: Present a scalable and efficient data engineering solution that incorporates ingestion, processing, storage, and visualization, showcasing the ability to work with both cloud-based and on-premise environments.
———
This curriculum is aimed at equipping students with the skills needed to build sophisticated data engineering pipelines that are scalable, efficient, and capable of handling complex data workflows. By covering a wide range of tools and technologies, students will be prepared to tackle various data engineering challenges and excel in the field.