Overview for Spotify

Big Data Processing at Spotify: The Road to Scio

Neville Li of Spotify tells the story of migrating the company's big data infrastructure to Apache Beam. Over the past year they moved to Scio and were able to iterate much faster. The talk focuses on the technical aspects of Scio, a Scala API for Apache Beam, and how it changed the way Spotify processes data.


Sorry - How Bieber broke Google Cloud at Spotify

A discussion of how Spotify uses Google BigQuery and Scio (a Scala wrapper for Apache Beam/Cloud Dataflow) to power their data infrastructure. Scio integrates with BigQuery to stream the results of a query, implements several join optimizations, has a REPL, and offers several other advanced features. The presentation also covers a dataset diff tool for validating pipeline changes and ML models, and Featran, a library for transforming features in a type-safe way. It wraps up with a number of challenges they have faced along the way.


Spotify’s move to Google Cloud

Learn how and why Spotify moved its data platform to Google Cloud, replacing Hadoop/Hive with BigQuery, Kafka with Cloud Pub/Sub, and Storm/MapReduce with Dataflow.


An Overview of Apache Streaming Technologies

A comparison of Apache streaming technologies, including Flume, NiFi, Gearpump, Apex, Kafka Streams, Spark Streaming, Storm, Flink, Samza, Ignite, and Beam.