How Traveloka Has Run Cloud-Scale Apache Spark in Production Since 2017

Traveloka's Data Engineering and Data Science team shares how its staff submit cloud-scale Spark jobs today. The discussion covers the pros and cons of each approach and the integration of Apache Spark with CI/CD components, schedulers, Airflow, Key Management Systems (KMS), and templates. The journey starts with a historic self-managed, on-premise Spark cluster and continues through the adoption of AWS EMR, Qubole, Databricks, and Dataproc. The talk also shows how multiple back-end data sets have helped transform Traveloka from a meta-search engine into a fully integrated online travel booking agency and one of the top Indonesian unicorn startups!

