Latest entries

Petastorm: A Light-Weight Approach to Building ML Pipelines @Uber

Data produced and managed by Big Data systems like Apache Spark and Hive cannot be directly consumed by Deep Learning systems like TensorFlow and PyTorch. Petastorm bridges this gap by letting TensorFlow and PyTorch consume data stored in the Apache Parquet format directly. In this talk, Yevgeni Litvin, a senior software engineer on the Perception team at Uber Advanced Technologies Group (ATG), describes how Petastorm enables tighter integration between the Big Data and Deep Learning worlds, simplifies data management and data pipelines, and speeds up model experimentation.


People You May Know: Fast Recommendations Over Massive Data

This talk traces the evolution of “People You May Know” (PYMK) to its current architecture. It focuses on the systems built along the way, with an emphasis on those built for LinkedIn's most recent architecture: Gaia, a real-time graph computing capability, and Venice, an online feature store with scoring capability. It also covers how LinkedIn integrates these individual systems to generate recommendations in a timely and agile manner while remaining cost-efficient.
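
The talk does not spell out Gaia's algorithms, but a common starting point for PYMK-style recommendations is friends-of-friends candidate generation: collect everyone two hops away and rank them by how many mutual connections they share with the member. The sketch below illustrates that idea with an in-memory adjacency map. The function name `friends_of_friends`, the ranking by mutual-connection count, and the sample graph are assumptions for illustration, not LinkedIn's implementation or Gaia's API.

```python
from __future__ import annotations

from collections import Counter


def friends_of_friends(graph: dict[str, set[str]], member: str, k: int = 3) -> list[tuple[str, int]]:
    """Hypothetical helper: rank 2-hop candidates by number of mutual connections."""
    direct = graph.get(member, set())
    counts: Counter[str] = Counter()
    for friend in direct:
        for candidate in graph.get(friend, set()):
            # Skip the member and anyone they are already connected to.
            if candidate != member and candidate not in direct:
                counts[candidate] += 1
    # Highest mutual-connection count first; ties broken by name for stable output.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Illustrative (made-up) connection graph.
graph = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave", "erin"},
    "carol": {"alice", "dave"},
    "dave": {"bob", "carol"},
    "erin": {"bob"},
}
suggestions = friends_of_friends(graph, "alice")
```

Here `dave` ranks first because he shares two connections with `alice` (via `bob` and `carol`), while `erin` shares one. A production system would precompute or incrementally maintain these counts and then rescore candidates with richer features, which is the role the abstract assigns to an online feature store like Venice.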


Massive Scale Anomaly Detection Framework

PayPal analyzes billions of events in real time every day across a wide range of services, devices, and locations. Through a collaboration between its platform engineering and data science teams, PayPal has built a generic framework for developing robust, scalable anomaly detection streaming applications, with a focus on flexibility to support different types of statistical and machine learning models. Inspired by the design of scikit-learn and Spark MLlib, the data team designed a simple pipeline-based API on top of Spark Structured Streaming that captures common patterns in the anomaly detection domain.
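
To make the idea of a scikit-learn-style pipeline for streaming anomaly detection concrete, here is a minimal sketch in plain Python rather than Spark. It assumes each micro-batch arrives as a list of numbers, chains per-record transforms with a detector exposing `fit`/`transform`, and scores each batch against the statistics seen so far before folding the batch into the model. The class names, the running z-score detector, and the score-then-update order are all hypothetical choices for illustration, not PayPal's API.

```python
from __future__ import annotations

import math
from typing import Callable


class RunningZScoreDetector:
    """Hypothetical detector: flags values far from the running mean, in standard deviations."""

    def __init__(self, threshold: float = 3.0) -> None:
        self.threshold = threshold
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations (Welford's algorithm)

    def fit(self, batch: list[float]) -> "RunningZScoreDetector":
        # Incrementally update mean and variance, one value at a time.
        for x in batch:
            self.n += 1
            delta = x - self.mean
            self.mean += delta / self.n
            self.m2 += delta * (x - self.mean)
        return self

    def transform(self, batch: list[float]) -> list[bool]:
        if self.n < 2:  # not enough history to estimate variance yet
            return [False] * len(batch)
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return [x != self.mean for x in batch]
        return [abs(x - self.mean) / std > self.threshold for x in batch]


class StreamingPipeline:
    """Hypothetical pipeline: per-record transforms followed by a detector."""

    def __init__(self, transforms: list[Callable[[float], float]], detector: RunningZScoreDetector) -> None:
        self.transforms = transforms
        self.detector = detector

    def process_batch(self, batch: list[float]) -> list[bool]:
        features = []
        for x in batch:
            for t in self.transforms:
                x = t(x)
            features.append(x)
        flags = self.detector.transform(features)  # score against history so far
        self.detector.fit(features)  # then update the model with this batch
        return flags


# Usage with made-up micro-batches; log1p compresses heavy-tailed amounts.
pipeline = StreamingPipeline([math.log1p], RunningZScoreDetector(threshold=3.0))
batches = [[10.0, 12.0, 11.0, 9.0], [10.5, 11.5, 500.0]]
results = [pipeline.process_batch(b) for b in batches]
```

Scoring before updating keeps an outlier from inflating the very statistics it is judged against. Swapping in a different model only requires another object with the same `fit`/`transform` shape, which is the kind of flexibility the abstract describes.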


Applying Deep Learning To Airbnb Search

This talk covers the use of machine learning at Airbnb. It is a success story about how machine learning helps Airbnb's search ranking find guests the best possible options while rewarding the most deserving hosts. Ranking at Airbnb is a quest to understand the needs of guests and the quality of hosts in order to strike the best possible match. The talk discusses applying neural networks in an attempt to break through a plateau in ranking improvements, and highlights the elements the team found useful when applying neural networks to a real-life product.


How Verizon is Accelerating Cloud Adoption and Migration with the AWS Service Catalog Connector for ServiceNow

Learn how Verizon uses the AWS Service Catalog Connector for ServiceNow to create a robust self-service computing environment while meeting its governance and security controls and achieving its goal of migrating 30% of its applications onto AWS in a short timeframe.


Changing the Way the Intelligence Community Moves Data

Learn how the National Geospatial-Intelligence Agency (NGA) uses AWS Snowball Edge to support warfighters, drawing on imagery from NGA’s Open Data Store and running geospatial applications at the edge. AWS Snowball Edge allows NGA to directly support its mission of providing products and services to decision makers, warfighters, and first responders when they need them most. Enabling the edge expands NGA’s ability to share critical resources and data, facilitate user access in line with its mission needs, and support the Intelligence Community (IC) and the Department of Defense as a whole.
