Overview for Spark

Analyzing Movie Reviews using DataStax

In this talk, Amanda Moran, Technical Evangelist at DataStax, uses sentiment analysis on Twitter data about the latest movie titles to answer that age-old question: “Is that movie any good?” She explains how they built the solution using Apache Cassandra, Apache Spark, and DataStax Enterprise Analytics.
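
To make the approach concrete, here is a minimal sketch (not the code shown in the talk) of reading tweets from a Cassandra table into a Spark DataFrame via the DataStax spark-cassandra-connector and scoring them with a toy word-list sentiment function; the keyspace, table, and column names are assumptions for illustration.

// Minimal sketch of the approach described in the talk, not DataStax's actual code.
// Assumes a hypothetical Cassandra table movies.tweets (movie_title, tweet_text)
// reachable through the DataStax spark-cassandra-connector.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MovieSentiment {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("movie-review-sentiment")
      .config("spark.cassandra.connection.host", "127.0.0.1") // assumed local DSE node
      .getOrCreate()
    import spark.implicits._

    // Read tweets about movies from Cassandra into a DataFrame.
    val tweets = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "movies", "table" -> "tweets"))
      .load()

    // Toy word-list scorer standing in for a real sentiment model.
    val positive = Set("great", "amazing", "loved", "fantastic")
    val negative = Set("boring", "awful", "hated", "terrible")
    val score = udf { text: String =>
      val words = text.toLowerCase.split("\\s+")
      words.count(w => positive.contains(w)) - words.count(w => negative.contains(w))
    }

    // Average sentiment per movie answers "Is that movie any good?"
    tweets
      .withColumn("sentiment", score($"tweet_text"))
      .groupBy($"movie_title")
      .agg(avg($"sentiment").as("avg_sentiment"))
      .orderBy(desc("avg_sentiment"))
      .show()
  }
}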


Migrating from RDBMS Data Warehouses to Apache Spark

Learn how DBS Bank implemented a Spark-based application that supports the migration from traditional RDBMS data warehouses to big data platforms. The application embeds the Spark engine and offers a web UI that lets users create, run, test, and deploy jobs interactively. Jobs are written primarily in native Spark SQL or other flavours of SQL (e.g. TDSQL).
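
The core pattern can be sketched as follows. This is not DBS Bank's application; the JDBC connection details, SQL text, and table names are illustrative assumptions. It shows how an embedded Spark engine can expose an RDBMS table as a view, run a user-authored Spark SQL statement against it, and land the result in the data lake.

// A minimal sketch of embedding the Spark engine for RDBMS-to-data-lake migration.
// Source connection, query, and target table names are hypothetical placeholders.
import org.apache.spark.sql.SparkSession

object RdbmsMigrationJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("rdbms-to-datalake")
      .enableHiveSupport()
      .getOrCreate()

    // 1. Load the source table from the warehouse over JDBC.
    val source = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//warehouse-host:1521/ORCL") // assumed source
      .option("dbtable", "SALES.TRANSACTIONS")
      .option("user", sys.env("DB_USER"))
      .option("password", sys.env("DB_PASSWORD"))
      .load()
    source.createOrReplaceTempView("transactions")

    // 2. The user-authored job is plain SQL, which a web UI could submit as text.
    val transformed = spark.sql(
      """SELECT customer_id, SUM(amount) AS total_amount
        |FROM transactions
        |GROUP BY customer_id""".stripMargin)

    // 3. Persist the result into the data lake as a Hive-managed Parquet table.
    transformed.write.mode("overwrite").saveAsTable("datalake.customer_totals")
  }
}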


Social Media Influencers Detection, Analysis and Recommendation

Learn how Socialbakers used Databricks for innovative research and large-scale data engineering, including ML, and about the challenges they faced while deploying Apache Spark from scratch and onboarding their teams to the new platform.


Apache Spark Based Reliable Data Ingestion in Datalake

In this talk, Gagan Agrawal from Paytm describes how they leveraged Spark's DataFrame abstraction to build a generic ingestion platform capable of ingesting data from varied sources with reliability, consistency, automatic schema evolution, and transformation support. He highlights how Spark-based data-sanity checks were developed as a core component of the platform to ensure 100% correctness of ingested data and to auto-recover when inconsistencies are found. The talk also covers how Hive table creation and schema modification were built into the platform, providing read-time consistency without locking while Spark ingestion jobs were writing to the same Hive tables, and how Paytm maintained different versions of ingested data so that loads can be rolled back if required and consumers can go back in time and read a snapshot of the data as of that moment.
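
A minimal sketch of these ingestion ideas, assuming hypothetical paths and table names rather than Paytm's actual platform: read a batch with schema merging enabled, run a simple data-sanity check, and publish the result as a versioned, partitioned Hive table.

// Illustrative sketch only; paths, table names, and the sanity rule are assumptions.
import org.apache.spark.sql.{SaveMode, SparkSession}
import org.apache.spark.sql.functions.lit

object GenericIngestion {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("generic-ingestion")
      .enableHiveSupport()
      .getOrCreate()

    // mergeSchema lets newly added source columns appear automatically,
    // a simple stand-in for the auto schema evolution described in the talk.
    val incoming = spark.read
      .option("mergeSchema", "true")
      .parquet("/landing/orders/dt=2019-05-01") // hypothetical landing path

    // Data-sanity step: refuse to publish if the batch is empty or has null keys.
    val total = incoming.count()
    val badKeys = incoming.filter("order_id IS NULL").count()
    require(total > 0 && badKeys == 0,
      s"sanity check failed: total=$total, null keys=$badKeys")

    // Versioned write: each batch lands in its own snapshot table so a bad load
    // can be rolled back and older snapshots remain readable.
    incoming
      .withColumn("dt", lit("2019-05-01"))
      .write
      .mode(SaveMode.Overwrite)
      .partitionBy("dt")
      .format("parquet")
      .saveAsTable("lake.orders_v20190501") // hypothetical per-version table name
  }
}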


Attribution Done Right

Thiago Rigo, a software engineer at GetYourGuide, takes you through how GetYourGuide developed a solution that cleans and structures logs from different data sources, applies rules to handle channel assignment, and finally properly weights each channel’s contribution to total revenue generated. He covers the business and technical challenges that were solved and how the solution was implemented at GetYourGuide using Spark and Databricks.
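
As a rough illustration of the weighting step only (not GetYourGuide's pipeline, and with assumed column and table names), the sketch below splits each order's revenue evenly across its channel touchpoints, a simple linear-attribution rule, and sums the contribution per channel.

// Illustrative attribution sketch; table and column names are assumptions.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

object ChannelAttribution {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("attribution").getOrCreate()
    import spark.implicits._

    // Cleaned, channel-assigned touchpoints: one row per (order, channel hit).
    val touchpoints = spark.table("marketing.touchpoints") // hypothetical table
    val orders = spark.table("sales.orders")               // order_id, revenue

    // Weight of a channel for an order = its hits / total hits for that order.
    val weighted = touchpoints
      .groupBy($"order_id", $"channel")
      .agg(count(lit(1)).as("hits"))
      .withColumn("order_hits", sum($"hits").over(Window.partitionBy($"order_id")))
      .withColumn("weight", $"hits" / $"order_hits")

    // Each channel's weighted contribution to total revenue.
    weighted
      .join(orders, "order_id")
      .withColumn("attributed_revenue", $"weight" * $"revenue")
      .groupBy($"channel")
      .agg(sum($"attributed_revenue").as("revenue_contribution"))
      .orderBy(desc("revenue_contribution"))
      .show()
  }
}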


FlowSpec—Apache Spark Pipelines in Production

This talk is about the use of Spark pipelines at Danske Bank. Data scientists in the organization use Spark pipelines to create uniformity in the features they generate and to streamline the modelling process. Subramaniam Ramasubramanian, a software engineer at Danske Bank, focuses on how FlowSpec, a simple prototype tool that took a couple of weeks to develop, helped reduce time to market for models, ensure data quality, create a fair and clear separation of duties, and offer a consolidated solution to recurring problem scenarios in the arduous process of moving ML models from different teams and departments of a large organization into production.
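
FlowSpec itself is not public, so the sketch below only illustrates the underlying idea with standard Spark ML Pipeline stages: encoding feature generation as reusable, serializable pipeline stages gives every team the same path from raw columns to a production-ready model. The table, columns, and model choice are assumptions.

// Illustrative sketch of pipeline-based feature standardization, not FlowSpec itself.
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.LogisticRegression
import org.apache.spark.ml.feature.{StringIndexer, VectorAssembler}
import org.apache.spark.sql.SparkSession

object FeaturePipeline {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder.appName("standard-feature-pipeline").getOrCreate()
    val training = spark.table("risk.training_data") // hypothetical feature source

    // Uniform feature stages that every model in the organization would share.
    val indexer = new StringIndexer()
      .setInputCol("customer_segment")
      .setOutputCol("customer_segment_idx")
      .setHandleInvalid("keep")
    val assembler = new VectorAssembler()
      .setInputCols(Array("customer_segment_idx", "balance", "num_transactions"))
      .setOutputCol("features")
    val model = new LogisticRegression()
      .setLabelCol("defaulted")
      .setFeaturesCol("features")

    val pipeline = new Pipeline().setStages(Array(indexer, assembler, model))
    val fitted = pipeline.fit(training)

    // Persisting the fitted pipeline gives downstream teams one artifact to
    // promote to production instead of re-implementing feature logic.
    fitted.write.overwrite().save("/models/credit_default/v1")
  }
}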
