Overview for Spark + AI Summit Europe 2018

Migrating from RDBMS Data Warehouses to Apache Spark

Learn how DBS Bank implemented a Spark-based application that helps with the migration from traditional RDBMS to big data. The application embeds the Spark engine and offers a web UI that lets users create, run, test and deploy jobs interactively. Jobs are primarily written in native SparkSQL or in other flavours of SQL (e.g. TDSQL).


Social Media Influencers Detection, Analysis and Recommendation

Learn how Socialbakers used Databricks for innovative research and large-scale data engineering, including ML, and the challenges they faced while deploying Apache Spark from scratch and onboarding their teams to the new platform.


Apache Spark Based Reliable Data Ingestion in Datalake

In this talk, Gagan Agrawal from Paytm explains how they leveraged Spark's DataFrame abstraction to create a generic ingestion platform capable of ingesting data from varied sources with reliability, consistency, automatic schema evolution and transformation support. He highlights how they developed Spark-based data sanity checks as a core component of the platform to ensure 100% correctness of ingested data and auto-recovery when inconsistencies are found. The talk also covers how Hive table creation and schema modification were built into the platform, providing read-time consistency without locking while Spark ingestion jobs write to the same Hive tables. Finally, it covers how Paytm maintains different versions of ingested data so that ingestion can be rolled back if required and users can go back in time and read a snapshot of the data as it was at a given moment.
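As a rough illustration of what a data sanity check of this kind might involve, the sketch below compares a source batch with its ingested copy using a row count and an order-independent checksum. It is plain Python rather than Spark, and the names `fingerprint` and `sanity_check` are hypothetical; nothing here is taken from Paytm's actual implementation.

```python
import hashlib

MODULUS = 1 << 256  # keeps the combined checksum at a fixed size


def fingerprint(rows):
    """Return (row_count, checksum) for an iterable of dict rows.

    The checksum sums per-row SHA-256 digests modulo 2**256, so it does
    not depend on row order but still changes if a row is duplicated.
    """
    count = 0
    combined = 0
    for row in rows:
        # Sort the items so that column order does not affect the digest.
        canonical = repr(sorted(row.items())).encode("utf-8")
        digest = hashlib.sha256(canonical).digest()
        combined = (combined + int.from_bytes(digest, "big")) % MODULUS
        count += 1
    return count, combined


def sanity_check(source_rows, ingested_rows):
    """Hypothetical check: True when both sides hold the same rows."""
    return fingerprint(source_rows) == fingerprint(ingested_rows)


source = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
ingested_ok = [{"amount": 20, "id": 2}, {"id": 1, "amount": 10}]
ingested_bad = [{"id": 1, "amount": 10}]

print(sanity_check(source, ingested_ok))
print(sanity_check(source, ingested_bad))
```

A failed check would be the point at which a platform like the one described could trigger its recovery path, for example by re-ingesting the affected batch.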


Lessons Learned Developing and Managing High Volume Apache Spark Pipelines in Production

Quby is the creator and provider of Toon, a leading European smart home platform. As a data-driven company, Quby uses machine learning algorithms to generate actionable insights for its end users. They developed data-driven services to ensure that users do not needlessly waste energy and can receive real-time alerts about problems with their heating system. In this talk, Erni Durdevic, a Machine Learning Engineer at Quby, describes their journey of productionizing data science algorithms.


Attribution Done Right

Thiago Rigo, a software engineer with GetYourGuide, takes you through how GetYourGuide developed a solution that cleans and structures logs from different data sources, applies rules for channel assignment, and finally weights each channel’s contribution to total revenue generated. He also covers the business and technical challenges solved and how the solution was implemented at GetYourGuide using Spark and Databricks.
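To make the weighting step concrete, here is a minimal sketch that splits each booking's revenue equally across the channels in the customer's journey. Equal (linear) weighting is an assumption chosen for illustration; the abstract does not say which weighting scheme GetYourGuide uses, and `linear_attribution` is a hypothetical name.

```python
from collections import defaultdict


def linear_attribution(journeys):
    """Split each journey's revenue equally across its channels.

    journeys: iterable of (channels, revenue) pairs, where channels is
    the list of marketing channels touched before the purchase.
    Returns a dict mapping channel name to attributed revenue.
    """
    totals = defaultdict(float)
    for channels, revenue in journeys:
        if not channels:
            continue  # nothing to attribute the revenue to
        share = revenue / len(channels)
        for channel in channels:
            totals[channel] += share
    return dict(totals)


journeys = [
    (["search", "email"], 100.0),  # two touchpoints, 50.0 each
    (["search"], 50.0),            # single touchpoint gets everything
]
attributed = linear_attribution(journeys)
print(attributed)
```

Other schemes, such as weighting the first or last touchpoint more heavily, would only change how `share` is computed per channel.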


FlowSpec—Apache Spark Pipelines in Production

This talk is about the use of Spark pipelines at Danske Bank. The data scientists in the organization use Spark pipelines as tools to create uniformity in the features they generate and to streamline the modelling process. Subramaniam Ramasubramanian, a software engineer with Danske Bank, focuses on how a simple prototype tool, FlowSpec, which took a couple of weeks to develop, helped reduce time to market for models, ensured data quality, created a fair and clear separation of duties, and offered a consolidated solution to recurring problem scenarios in the arduous process of moving ML models from different teams and departments in a large organization to production.