GE Aviation Spark Application – Experience Porting Analytics into PySpark ML Pipelines

GE is a world-leading manufacturer of commercial jet engines, offering products for many of the best-selling commercial airframes. With more than 33,000 engines in service, GE Aviation has a history of developing analytics for monitoring its commercial engine fleets. In this talk you will learn how analytic tools such as SQL Server and MATLAB were used until recently, when GE’s data was moved to an Apache Spark environment. Consequently, GE Aviation's advanced analytics are now being migrated to Spark, where performance gains are also expected on bigger data sets. Dr Peter Knight, a senior data scientist with the GE Aviation UK data science team, and Honor Powrie, a Director of Data and Analytics at GE Aviation, share their experiences of converting advanced algorithms to custom Spark ML pipelines and outline various case studies.


Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Love to Scale

Luca Canali, an engineer and team lead for the Hadoop, Spark and database services at CERN, shares his experience and the lessons learned from setting up and running the Apache Spark service inside the database group at CERN. He covers the many aspects of this change with examples taken from use cases and projects at the CERN Hadoop, Spark, streaming and database services.


From “All-at-Once, Once-a-Day” to “A-Little-Each-Time, All-the-Time”

OLX produces about 50 million messages daily, delivered to 300+ million users across the globe via email, SMS or push. The majority of these notifications rely on processing the billions of events generated by OLX's web and mobile platforms to understand users' behaviour and to craft relevant messages designed to influence the customer journey positively.

In this presentation, Emanuele Bardelli discusses the approach, the challenges and the lessons learned in migrating OLX's notification platform from a monolithic batch system based on AWS Redshift, SQL and ETL pipelines to a real-time, microservice-based system developed with Apache Spark and Python.