Overview for Pinterest

Control Plane for Large Mesh in a Heterogeneous Environment - Fuyuan Bie & Zhimeng Shi, Pinterest

Building a service mesh in a heterogeneous environment with a large number of clusters is challenging. Pinterest has a complicated mixture of thousands of clusters, ranging from IaaS to dockerized services to Kubernetes, with services developed in C++, Java, Python, Node, Go, and Elixir. Using the open source go-control-plane as the interface to Envoy, the data engineering team at Pinterest meshed Pinterest services with Tower, a control plane they developed. From edge to backends, 100% of services are managed by Tower. They use the actor model and event sourcing to make it performant, reliable, scalable, and extensible.
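As a rough illustration of the event sourcing idea mentioned in the abstract, the sketch below keeps an append-only list of events and rebuilds the current endpoint map by replaying them. It is not Tower's implementation: the event names, the `Event` fields, and the `apply_event` and `replay` helpers are all hypothetical, chosen only to show the pattern.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape; the abstract does not describe Tower's schema.
    kind: str       # "endpoint_added" or "endpoint_removed"
    cluster: str    # logical cluster name
    endpoint: str   # "host:port" of one backend


def apply_event(state, event):
    """Return a new state (cluster -> set of endpoints) with one event applied."""
    endpoints = set(state.get(event.cluster, set()))
    if event.kind == "endpoint_added":
        endpoints.add(event.endpoint)
    elif event.kind == "endpoint_removed":
        endpoints.discard(event.endpoint)
    else:
        raise ValueError(f"unknown event kind: {event.kind}")
    new_state = dict(state)
    new_state[event.cluster] = endpoints
    return new_state


def replay(events):
    """Rebuild the current state from scratch by folding over the event log."""
    state = {}
    for event in events:
        state = apply_event(state, event)
    return state


log = [
    Event("endpoint_added", "search", "10.0.0.1:8080"),
    Event("endpoint_added", "search", "10.0.0.2:8080"),
    Event("endpoint_removed", "search", "10.0.0.1:8080"),
]
current = replay(log)
```

Because the state is derived only from the log, a restarted process can recover by replaying the same events, which is one reason the pattern is often paired with reliability goals.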



Governance on K8s: How to Solve Ownership, Metering & Capacity Planning - Micheal Benedict & Yongwen Xu, Pinterest

Pinterest is a cloud-first visual discovery engine that serves over 250 million users. To support this scale, there are thousands of services running on tens of thousands of hosts, processing 300+ PB of data. Pinterest operates large Kubernetes clusters across several availability zones and regions. The clusters are autoscaled, with support for pod-level autoscaling. Finally, to effectively utilize resources within the clusters, Pinterest operates heterogeneous workloads on a kitchen sink of instance types.




Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3/Hadoop Continuously

This talk discusses how Pinterest designed and built a continuous database (DB) ingestion system for moving MySQL data into near-real-time computation pipelines with only 15 minutes of latency, to support its dynamic personalized recommendations and search indices. As Pinterest moves towards real-time computation, it faces stringent service-level agreement requirements, such as making MySQL data available on S3/Hadoop within 15 minutes and serving the DB data incrementally for stream processing. The data team designed WaterMill, a continuous DB ingestion system that listens for MySQL binlog changes, publishes the MySQL changelogs as an Apache Kafka® change stream, and ingests and compacts the stream into Parquet columnar tables in S3/Hadoop within 15 minutes.
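To make the compaction step more concrete, here is a minimal sketch of collapsing a changelog into the latest row per primary key. It is an assumption about what compaction means in general, not WaterMill's actual logic: the record fields (`pk`, `op`, `seq`, `row`) and the `compact` function are hypothetical, and reading from Kafka or writing Parquet is left out entirely.

```python
def compact(changelog):
    """Collapse change records into a snapshot: primary key -> latest row.

    Each record is a dict with hypothetical fields:
      pk  - primary key of the affected row
      op  - "insert", "update", or "delete"
      seq - monotonically increasing position in the change stream
      row - the row contents after the change (None for deletes)
    """
    latest = {}
    for change in changelog:
        current = latest.get(change["pk"])
        # Keep only the change with the highest sequence number per key,
        # so out-of-order delivery does not resurrect stale data.
        if current is None or change["seq"] > current["seq"]:
            latest[change["pk"]] = change
    # A key whose most recent change is a delete is dropped from the snapshot.
    return {pk: c["row"] for pk, c in latest.items() if c["op"] != "delete"}


changes = [
    {"pk": 1, "op": "insert", "seq": 1, "row": {"title": "cat"}},
    {"pk": 2, "op": "insert", "seq": 2, "row": {"title": "dog"}},
    {"pk": 1, "op": "update", "seq": 3, "row": {"title": "kitten"}},
    {"pk": 2, "op": "delete", "seq": 4, "row": None},
]
snapshot = compact(changes)
```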



Image Similarity Detection at Scale Using LSH and Tensorflow

Learning over images and understanding the quality of content play an important role at Pinterest. This talk presents a Spark-based system responsible for detecting near (and far) duplicate images. The system is used to improve the accuracy of recommendations and search results across a number of production surfaces at Pinterest.
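For readers unfamiliar with locality-sensitive hashing (LSH), the sketch below shows one common variant, random hyperplane hashing, applied to small embedding vectors. The abstract does not say which hash family or embedding the talk uses, so this is purely illustrative: the function names, the vector dimensions, and the choice of random hyperplanes are assumptions, and the Spark and TensorFlow parts are omitted.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplanes in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vector, hyperplanes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    bits = []
    for plane in hyperplanes:
        dot = sum(p * x for p, x in zip(plane, vector))
        bits.append(1 if dot >= 0 else 0)
    return tuple(bits)


def bucket(embeddings, hyperplanes):
    """Group image ids by signature; images sharing a bucket are duplicate candidates."""
    buckets = defaultdict(list)
    for image_id, vector in embeddings.items():
        buckets[signature(vector, hyperplanes)].append(image_id)
    return dict(buckets)


planes = make_hyperplanes(dim=4, n_bits=8, seed=42)
original = [0.9, 0.1, -0.3, 0.5]
rescaled = [2 * x for x in original]  # same direction, different magnitude
groups = bucket({"original": original, "rescaled": rescaled}, planes)
```

Vectors that point in similar directions tend to land on the same side of most hyperplanes, so candidates can be found by comparing signatures instead of every pair of images. A production system would typically follow this with an exact similarity check on each candidate pair.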



Moving the needle of the pin: Streaming hundreds of terabytes of pins from MySQL to S3/Hadoop continuously

Learn how Pinterest solved the problem of moving hundreds of terabytes of MySQL data offline on a daily basis to power continuous computation.
