Experience Of Optimizing Spark SQL When Migrating from MPP Database

Learn how eBay is migrating its 30 PB MPP database to Apache Spark. More than 15,000 ETL jobs currently run each day on a Spark cluster of 1,000+ nodes, processing petabyte-scale data, and these numbers are growing quickly. Optimization is critical during the migration because cluster resources are usually under heavy pressure, and a well-optimized system can run more jobs with limited resources. Yuming Wang and Yucai Yu, both from eBay, talk about the top performance challenges eBay encountered and how these problems were addressed.


Analytical DBMS to Apache Spark Auto Migration Framework

eBay has used an Analytical DBMS (ADBMS) data warehouse solution for over a decade. Millions of batch queries run every day against 6,000+ key data warehouse (DW) tables, which contain over 22 PB of data (compressed) and continue to grow every year. The data services and products built on this warehouse drive eBay's business decisions and site features, so the data must always be available and accurate.

Lipeng Zhu and Edward Zhang from eBay discuss how eBay has been migrating its ADBMS batch workload to Spark.


Moving eBay’s Data Warehouse Over to Apache Spark – Spark as Core ETL Platform at eBay

Learn how eBay moved its ETL computation from a conventional RDBMS environment to Spark. The journey led to a 1,000+ node Spark cluster running 10,000+ ETL jobs daily, all built in less than six months by a team with limited Spark experience.


eBay ShopBot: Graph-powered Conversational Commerce

Ajinkya Kale of eBay discusses the team's use of Neo4j as the backend for the AI technology in eBay's virtual shopping assistant, eBay ShopBot. The talk covers how the team used Neo4j as a probabilistic graph model to drive conversations based on their Knowledge Graph. It also covers key lessons on deployment and scalability on Google Cloud Platform, as well as application-level lessons from running Neo4j in production for a year.


Role of Spark in transforming eBay’s Enterprise Data Platform

eBay has one of the most mature Enterprise Data Platforms in the industry, with over 200 PB of data stored in Hadoop and Teradata warehouses. On average, 30 TB of transactional and behavioral data is extracted daily, and thousands of metrics are computed, analyzed, and monitored for decision making and anomaly detection. eBay has embarked on an ambitious project to transform its batch-oriented ETL processes, which could take 24 to 48 hours to compute metrics, into a near-real-time infrastructure built on Kafka for messaging, Spark Streaming for stream processing, and Spark SQL for data preparation.