Radek Maciaszek presents his learnings from the migration of machine learning and big data processing pipelines to Apache Airflow.
He highlights how they use Airflow to power their company's big data infrastructure, where they analyze hundreds of terabytes of data. Examples cover building an ETL pipeline and using Airflow to manage a Spark machine learning pipeline workflow.
The talk covers the basic Airflow concepts and shows real-life examples of how to define your own workflows in Python code. It finishes with more advanced Apache Airflow topics, such as adding custom task operators, sensors and plugins, as well as best practices and the pros and cons of the tool.
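For readers new to the idea of a workflow defined in Python, the following is a minimal sketch of the concept behind it: tasks arranged in a directed acyclic graph and run in dependency order. It deliberately uses only the Python standard library (`graphlib`, Python 3.9+) rather than Airflow's own API, and the task names and the `run_workflow` helper are hypothetical, chosen purely for illustration. It is not taken from the talk.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task names for a small ETL-plus-training workflow.
# Each key maps a task to the set of tasks that must finish before it.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "train_model": {"load"},
}


def run_workflow(deps, actions):
    """Run every task after all of its upstream tasks; return the run order."""
    order = []
    for task in TopologicalSorter(deps).static_order():
        actions[task]()  # call the Python callable attached to this task
        order.append(task)
    return order


# Stand-in task bodies: each one only records that it ran.
log = []
actions = {name: (lambda n=name: log.append(f"ran {n}")) for name in dependencies}

run_order = run_workflow(dependencies, actions)
print(run_order)
```

A real scheduler such as Airflow adds scheduling, retries, distributed execution and monitoring on top of this ordering idea; the sketch only shows why declaring dependencies is enough to derive a valid execution order.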
As Lyft migrated its applications to Kubernetes, assumptions baked into the networking layer were tested. This talk discusses how Lyft used Envoy’s xDS protocol to design their own flexible service mesh and handle new challenges from a multi-cluster architecture such as:
- Routing across multiple Kubernetes clusters
- Handling Deployments
- Rapid scale-in and scale-out
- Service Discovery
- Active/Passive Health Checking
- Readiness in the service mesh
This talk will also go over changes that were made in the Envoy codebase to make this work.
This talk highlights the evolution of Redis support in Envoy. Initially, the Envoy Redis proxy only supported sharding to clusters of independent Redis nodes. Recent developments have enabled support for the open source Redis Cluster protocol as well as some unique features such as multicluster routing, flexible load balancing options, and traffic shadowing.
As the usage of Redis expanded, different usage patterns emerged, requiring different availability, durability and consistency trade-offs. Henry Yang from Lyft and Mitch Sulaski from Workday discuss how the Envoy Redis proxy was extended to support these new requirements in large-scale environments (10+ million rps) at Lyft and Workday.
Lorenzo Rossi of City of Hope National Medical Center discusses supervised learning from electronic health records, covering cohort definition, data preparation and performance metrics.
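The abstract does not say which performance metrics the talk uses. As an illustration only, the sketch below computes four metrics commonly reported for binary classifiers (accuracy, precision, recall and F1) from true and predicted labels. The function name `classification_metrics` and the toy labels are assumptions made for this example and do not come from the talk or from any real patient data.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

    # Guard each ratio against an empty denominator.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels, invented for the example: 1 = outcome occurred, 0 = it did not.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In clinical cohorts the positive class is often rare, which is one reason precision and recall are usually reported alongside accuracy.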
With Kafka on the way to production: an outlook on Kafka in production at BMW.
Learn how GoDataDriven uses Apache Airflow and which best practices it follows.