Neville Li of Spotify describes the company's migration of its big data infrastructure to Apache Beam. Over the past year Spotify has moved to Scio and has been able to iterate much faster. The talk focuses on the technical aspects of Scio, a Scala API for Apache Beam, and how it changed the way Spotify processes data.
A discussion of how Spotify uses Google BigQuery and Scio (a Scala wrapper for Apache Beam/Cloud Dataflow) to power its data infrastructure. Scio integrates with BigQuery to stream the results of a query, implements several join optimizations, provides a REPL, and offers several other advanced features. The presentation also covers a dataset diff tool for validating pipeline changes and ML models, and Featran, a library for transforming features in a type-safe way. It wraps up with a number of challenges the team has faced along the way.
Learn how and why Spotify moved its data platform to Google Cloud, replacing Hadoop/Hive with BigQuery, Kafka with Cloud Pub/Sub, and Storm/MapReduce with Dataflow.
Emilio Del Tessandoro of Spotify analyzes how Cassandra is used at Spotify.
A comparison of Apache streaming technologies, including Flume, NiFi, Gearpump, Apex, Kafka Streams, Spark Streaming, Storm, Flink, Samza, Ignite, and Beam.