Overview for linkedin

How and why we moved away from Kafka Mirror Maker to Brooklin: LinkedIn's story

See how LinkedIn is using Brooklin Mirror Maker (BMM) to provide improved performance and stability while at the same time facilitating better management through finer control of data pipelines.


People You May Know: Fast Recommendations Over Massive Data

This discussion presents the evolution of “People You May Know” (PYMK) to its current architecture. The focus is on the various systems built along the way, with an emphasis on those built for LinkedIn's most recent architecture: Gaia, a real-time graph computing capability, and Venice, an online feature store with scoring capability. It also covers how LinkedIn integrates these individual systems to generate recommendations in a timely and agile manner while still being cost-efficient.


More Data, More Problems: Scaling Kafka-Mirroring Pipelines at LinkedIn

LinkedIn has been using Kafka MirrorMaker for several years as the mirroring solution for copying data between Kafka clusters across data centers. However, as LinkedIn's data has continued to grow, mirroring trillions of Kafka messages per day across data centers has uncovered the scale limitations and operability challenges of Kafka MirrorMaker. To address these issues, LinkedIn has developed a new mirroring solution, built on top of their stream ingestion service, Brooklin. Brooklin MirrorMaker aims to provide improved performance and stability, while facilitating better management through finer control of data pipelines. In this talk you will learn about the challenges LinkedIn has faced with Kafka MirrorMaker, how they tackled them with Brooklin MirrorMaker, and their plans for iterating further on this new mirroring solution.


Metrics-Driven Tuning of Apache Spark at Scale

Tuning Apache Spark can be complex and difficult, since there are many different configuration parameters and metrics. As the Spark applications running on LinkedIn’s clusters become more diverse and numerous, it is no longer feasible for a small team of Spark experts to help individual users debug and tune their Spark applications. Users need to be able to get advice quickly and iterate on their development, and any problems need to be caught promptly to keep the cluster healthy.

In order to achieve this, LinkedIn automated the process of identifying performance issues and providing custom tuning advice to users, and made improvements for scaling to handle thousands of Spark applications per day.