Overview for Pinterest

Pinterest’s Story of Streaming Hundreds of Terabytes of Pins from MySQL to S3/Hadoop Continuously

This talk discusses how Pinterest designed and built a continuous database (DB) ingestion system that moves MySQL data into near-real-time computation pipelines with only 15 minutes of latency, in support of its dynamic personalized recommendations and search indices. As Pinterest moves towards real-time computation, it faces stringent service-level agreement requirements, such as making MySQL data available on S3/Hadoop within 15 minutes and serving the DB data incrementally for stream processing. The data team designed WaterMill, a continuous DB ingestion system that listens for MySQL binlog changes, publishes the MySQL changelogs as an Apache Kafka® change stream, and ingests and compacts the stream into Parquet columnar tables in S3/Hadoop within 15 minutes.
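
The compaction step described above can be illustrated with a minimal sketch. It assumes a hypothetical change-event shape (a sequence number `seq`, an operation `op`, a primary key `id`, and the new `row`) and plain in-memory lists; none of this is taken from WaterMill itself, whose schema and storage layer the abstract does not describe.

```python
from typing import Any, Dict, Iterable, List

# Hypothetical event and row shapes, chosen for illustration only.
Row = Dict[str, Any]
ChangeEvent = Dict[str, Any]


def compact(snapshot: Iterable[Row], events: Iterable[ChangeEvent]) -> List[Row]:
    """Fold an ordered change stream into a snapshot keyed by primary key."""
    table: Dict[Any, Row] = {row["id"]: row for row in snapshot}
    # Apply events in sequence order so that the latest change per key wins.
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["op"] == "delete":
            table.pop(event["id"], None)
        else:  # "insert" or "update" both overwrite the row for that key
            table[event["id"]] = event["row"]
    return [table[key] for key in sorted(table)]


snapshot = [{"id": 1, "title": "a"}, {"id": 2, "title": "b"}]
events = [
    {"seq": 2, "op": "update", "id": 1, "row": {"id": 1, "title": "a2"}},
    {"seq": 1, "op": "insert", "id": 3, "row": {"id": 3, "title": "c"}},
    {"seq": 3, "op": "delete", "id": 2},
]
compacted = compact(snapshot, events)
```

In a real pipeline the snapshot and the compacted output would be columnar files rather than Python lists, but the merge rule (latest change per key, deletes removed) would be the same idea.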


Image Similarity Detection at Scale Using LSH and Tensorflow

Learning over images and understanding the quality of content play an important role at Pinterest. This talk presents a Spark-based system that detects near (and far) duplicate images. The system is used to improve the accuracy of recommendations and search results across a number of production surfaces at Pinterest.
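
As a rough illustration of how locality-sensitive hashing (LSH) can group similar images, the sketch below uses random-hyperplane hashing over embedding vectors. It assumes each image is already represented as a numeric vector, and it uses one common LSH variant; the abstract does not say which LSH scheme or embedding the talk's system uses, and all function names here are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def make_hyperplanes(dim: int, n_bits: int, seed: int = 0) -> List[List[float]]:
    """Draw n_bits random hyperplanes (Gaussian normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: Sequence[float], planes: List[List[float]]) -> Tuple[int, ...]:
    """One bit per hyperplane: 1 if vec lies on its non-negative side, else 0."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def bucket(vectors: Dict[str, Sequence[float]],
           planes: List[List[float]]) -> Dict[Tuple[int, ...], List[str]]:
    """Group item keys by signature; items sharing a bucket are duplicate candidates."""
    buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for key, vec in vectors.items():
        buckets[signature(vec, planes)].append(key)
    return buckets


planes = make_hyperplanes(dim=4, n_bits=8)
vectors = {
    "img_a": [1.0, 0.5, -0.2, 0.3],
    "img_a_scaled": [2.0, 1.0, -0.4, 0.6],   # same direction as img_a
    "img_b": [-1.0, -0.5, 0.2, -0.3],        # opposite direction
}
groups = bucket(vectors, planes)
```

Vectors pointing in similar directions tend to share a signature, so only items within the same bucket need an exact similarity comparison, which keeps the pairwise work manageable at scale.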


Moving the needle of the pin: Streaming hundreds of terabytes of pins from MySQL to S3/Hadoop continuously

Learn how Pinterest solved the problem of moving hundreds of terabytes of MySQL data offline on a daily basis to power continuous computation.