Latest entries

Running Massively Parallel Deep-Learning Inference Pipelines on Kubernetes - Martin Abeleda & Suneeta Mall, Nearmap

Nearmap captures terabytes of aerial imagery daily. With the introduction of artificial intelligence (AI) capabilities, Nearmap has leveraged Kubernetes to generate AI content from tens of petabytes of images effectively and efficiently. Martin Abeleda and Suneeta Mall from Nearmap discuss how using Kubernetes as the backbone of their AI infrastructure allowed them to build a fully automated deep-learning inference pipeline that, despite not being embarrassingly parallel, is massively parallel. The talk also covers the architecture of this auto-scaling solution, which exhausted all K80 spot GPUs across AWS's US data centres for weeks.
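As a rough illustration of how Kubernetes can fan inference out across many GPU workers, the sketch below uses a batch Job with high parallelism; the names, image, counts and instance type are all hypothetical, not Nearmap's actual manifests:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: inference-workers          # hypothetical name
spec:
  parallelism: 500                 # run many worker pods at once
  completions: 500                 # total work items to finish
  template:
    spec:
      containers:
      - name: worker
        image: registry.example.com/inference:latest   # hypothetical image
        resources:
          limits:
            nvidia.com/gpu: 1      # one GPU per pod
      restartPolicy: OnFailure     # retry preempted spot workers
      nodeSelector:
        node.kubernetes.io/instance-type: p2.xlarge    # K80 spot instances
```

With a cluster autoscaler watching pending pods, a Job like this keeps requesting spot GPU nodes until capacity runs out, which is consistent with the capacity exhaustion the talk describes.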

Links



MetaConfig driven FeatureStore with Feature compute & Serving Platform powering Machine Learning @MakeMyTrip

MakeMyTrip, India’s #1 online travel platform with more than 70% of its traffic coming from mobile apps, embarked on a journey to revolutionise its customer experience by building a scalable, personalised, machine-learning-based platform that powers onboarding, in-funnel and post-funnel engagement flows such as ranking, dynamic pricing, persuasions, cross-sell and propensity models.

Links


Best Practices for Prototyping Machine Learning Models for Healthcare

Lorenzo Rossi of City of Hope National Medical Center discusses supervised learning from electronic health records, covering cohort definition, data preparation and performance metrics.

Links


The Benefits of Running Spark on your own Docker

In this talk, Shir Bromberg, a big data team leader at Yotpo, discusses their open-source Docker images for running Spark on Nomad servers. She highlights the following:
* The issues they had running Spark on managed clusters, and the solutions they developed.
* How to build a Spark Docker image.
* What you can achieve by running Spark on Nomad.
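For orientation, a minimal Spark Docker image might look roughly like the sketch below; the base image, Spark version and layout are assumptions for illustration, not Yotpo's actual open-source image:

```dockerfile
FROM eclipse-temurin:11-jre
# curl is needed to fetch the Spark distribution (version is an assumption)
RUN apt-get update && apt-get install -y --no-install-recommends curl \
    && rm -rf /var/lib/apt/lists/*
ARG SPARK_VERSION=3.5.1
RUN curl -fsSL "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/spark-${SPARK_VERSION}-bin-hadoop3.tgz" \
    | tar -xz -C /opt \
    && ln -s "/opt/spark-${SPARK_VERSION}-bin-hadoop3" /opt/spark
ENV SPARK_HOME=/opt/spark \
    PATH="${PATH}:/opt/spark/bin"
# The scheduler (e.g. Nomad) supplies master URL and job arguments at run time
ENTRYPOINT ["/opt/spark/bin/spark-submit"]
```

Packaging Spark this way decouples the runtime from any managed cluster, which is the kind of portability the talk's Nomad setup relies on.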

Links


Optimizing Spark-based data pipelines - are you up for it?

Nielsen Marketing Cloud needs to ingest billions of events per day into its big data stores for real-time analytics. Etti Gur, senior big data developer, and Itai Yaffe, tech lead of the Big Data group, discuss how they significantly optimised their Spark-based in-flight analytics daily pipeline, reducing its total execution time from over 20 hours to 2 hours and achieving a large cost reduction.

Links