Filter (clear filters)





Overview for social

How and why we moved away from Kafka Mirror Maker to Brooklin- LinkedIn's story

See how Linkedin is using Brooklin Mirror Maker (BMM) to provide improved performance and stability at the same facilitating better management through finer control of data pipelines.


People You May Know: Fast Recommendations Over Massive Data

This discussion presents the evolution “People You May Know” (PYMK) to its current architecture. The focus is on various systems built along the way, with an emphasis on systems built for LinkedIn most recent architecture, namely Gaia, a real-time graph computing capability, and Venice an online feature store with scoring capability, and how LinkedIn integrates these individual systems to generate recommendations in a timely and agile manner, while still being cost-efficient. 


Social Media Influencers Detection, Analysis and Recommendation

Learn how Socialbakers used Databricks for innovative research and large-scale data engineering including ML and the challenges they faced while deploying Apache Spark from the scratch and onboarding the teams to their new platform.


Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3/Hadoop Continuously

This talk discusses how Pinterest designed and built a continuous database (DB) ingestion system for moving MySQL data into near-real-time computation pipelines with only 15 minutes of latency to support their dynamic personalized recommendations and search indices. Pinterest is moving towards real-time computation, they are facing a stringent service-level agreement requirement such as making the MySQL data available on S3/Hadoop within 15 minutes, and serving the DB data incrementally in stream processing. The data team has designed WaterMill: a continuous DB ingestion system to listen for MySQL binlog changes, publish the MySQL changelogs as an Apache Kafka® change stream and ingest and compact the stream into Parquet columnar tables in S3/Hadoop within 15 minutes.