Overview for cassandra

MetaConfig driven FeatureStore with Feature compute & Serving Platform powering Machine Learning @MakeMyTrip

MakeMyTrip, India’s #1 online travel platform with more than 70% of its traffic coming from mobile apps, embarked on a journey to revolutionize its customer experience by building a scalable, personalized, machine-learning-based platform. The platform powers onboarding, in-funnel and post-funnel engagement flows such as ranking, dynamic pricing, persuasions, cross-sell and propensity models.


Analyzing Movie Reviews using DataStax

In this talk, Amanda Moran, Technical Evangelist at DataStax, uses sentiment analysis on Twitter data about the latest movie titles to answer that age-old question: “Is that movie any good?” She explains how they built the solution using Apache Cassandra, Apache Spark and DataStax Enterprise Analytics.


Designing a Horizontally Scalable Event-Driven Big Data Architecture with Apache Spark

Learn how Letgo uses Kafka and Kafka Connect for streaming and batch processing with Spark. You will also learn how Letgo has used Spark Thrift Server and Hive Metastore as glue to exploit all their data sources (HDFS, S3, Cassandra, Redshift, MariaDB, …) in a unified way from any point of their ecosystem, using tools such as Jupyter, Zeppelin and Superset.

Strava Labs: Exploring a Billion Activity Dataset from Athletes with Apache Spark

Drew Robb, a Staff Infrastructure Engineer at Strava, discusses how they have extensively leveraged Apache Spark to explore data from over a billion activities recorded by tens of millions of athletes.



Large Scale Feature Aggregation Using Apache Spark

In this presentation, Pulkit Bhanot and Amit Nene from Uber discuss how, using data stored in Hive and processed with Spark, they developed a highly scalable solution for incremental feature aggregation. By dividing data aggregation responsibility between the real-time access layer and the batch computation components, they ensured that only entities whose values have actually changed are dispersed to the real-time access store (Cassandra). They share how they modeled data in Cassandra using its native capabilities, such as counters, and how they worked around some of Cassandra's limitations.


Productionizing Behavioural Features for Machine Learning with Apache Spark Streaming

Learn how Spark Streaming is used to build online machine learning (ML) features for real-time prediction of users' behaviour and preferences and of demand for hotels, and to improve processes in customer support.