Overview for spark-streaming

Journey of MIQ Towards Democratization of Data Analytics

Ramkumar Venkatesan and Manish Khandelwal from Media iQ (MiQ) discuss MiQ's journey towards the democratization of data analytics. MiQ is standardizing on the Apache Parquet format on S3 for storing most of its data. The team uses Spark SQL to support real-time, BI-style queries, and its machine learning jobs are migrating to Spark.


Real-Time Market Data Analytics Using Kafka Streams

Bloomberg is building a streaming platform with Apache Kafka, Kafka Streams and Spark Streaming to handle high-volume, real-time processing of fast-moving derivative market data. In this discussion you will learn how Bloomberg uses the Kafka Streams Processor API to build pipelines capable of handling millions of market movements per second with ultra-low latency, as well as performing complex analytics such as outlier detection, source confidence evaluation (scoring), arbitrage detection and other finance-related processing.
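To make the outlier detection mentioned above more concrete, here is a minimal sketch in plain Python. It is not Bloomberg's implementation and does not use the Kafka Streams API. It only illustrates one common approach, a rolling z-score over recent prices, applied one record at a time as a stream processor would. The class name `RollingOutlierDetector`, the helper `detect`, and the window and threshold values are all hypothetical.

```python
from collections import deque
from math import sqrt


class RollingOutlierDetector:
    """Flags a price that deviates strongly from a rolling window of recent prices.

    Hypothetical illustration only; window size and threshold are assumptions.
    """

    def __init__(self, window=20, threshold=3.0):
        self.window = deque(maxlen=window)  # keeps only the most recent prices
        self.threshold = threshold          # z-score above which a price is flagged

    def process(self, price):
        """Return True if `price` is an outlier relative to the current window."""
        is_outlier = False
        if len(self.window) >= 2:
            mean = sum(self.window) / len(self.window)
            variance = sum((p - mean) ** 2 for p in self.window) / len(self.window)
            std = sqrt(variance)
            if std > 0 and abs(price - mean) / std > self.threshold:
                is_outlier = True
        # The price is added to the window whether or not it was flagged.
        self.window.append(price)
        return is_outlier


def detect(prices, window=20, threshold=3.0):
    """Return the indices of prices flagged as outliers, processed in order."""
    detector = RollingOutlierDetector(window, threshold)
    return [i for i, price in enumerate(prices) if detector.process(price)]
```

For example, `detect([100.0, 101.0, 100.0, 101.0, 100.0, 101.0, 150.0, 100.0], window=5)` flags only the jump to 150.0. In a real pipeline the per-instrument window would live in the stream processor's state store rather than in an in-memory object.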


Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark

Learn how, at CERN, the biggest physics laboratory in the world, the large volumes of data generated every hour are stored and processed using scalable systems such as Hadoop, Spark and HBase.


Productionizing Behavioural Features for Machine Learning with Apache Spark Streaming

Learn how Spark Streaming is used to build online machine learning (ML) features for real-time prediction of users' behaviour and preferences and of demand for hotels, and to improve customer support processes.


Relationship Extraction from Unstructured Text-Based on Stanford NLP with Spark

Learn how Capgemini developed an automated solution, based on integrating Stanford NLP with Spark, that processes the semantic structure of sentences, retrieves pieces of supply chain information, matches them to pieces of the supply chain found in other sentences in other reports and, finally, presents the result to the end user in the form of a graph. Implementing the solution on Spark made it possible to process the entire collection of reports in memory and to integrate the external Stanford NLP libraries easily.


A Practical Approach to Building a Streaming Processing Pipeline for an Online Advertising Platform

Yelp’s ad platform handles millions of ad requests every day. To generate ad metrics and analytics in real time, they built their ad event tracking and analysis pipeline on top of Spark Streaming. It allows Yelp to manage a large number of active ad campaigns and greatly reduce over-delivery. It also enables them to share ad metrics with advertisers in a more timely fashion.
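The following plain-Python sketch illustrates the kind of computation such a pipeline performs. It is not Yelp's code and does not use Spark Streaming. It assumes a hypothetical event shape of `(timestamp_seconds, campaign_id, cost)` and shows two steps: summing impressions and spend per campaign in fixed (tumbling) time windows, and listing campaigns whose cumulative spend has passed a budget, which is the signal needed to limit over-delivery. The function names and the budget dictionary are assumptions made for illustration.

```python
from collections import defaultdict


def aggregate_by_window(events, window_seconds=60):
    """Sum impressions and spend per (window_start, campaign_id).

    `events` is an iterable of (timestamp_seconds, campaign_id, cost) tuples;
    this event shape is an assumption made for the example.
    """
    totals = defaultdict(lambda: {"impressions": 0, "spend": 0.0})
    for timestamp, campaign_id, cost in events:
        # Align the timestamp to the start of its tumbling window.
        window_start = int(timestamp // window_seconds) * window_seconds
        bucket = totals[(window_start, campaign_id)]
        bucket["impressions"] += 1
        bucket["spend"] += cost
    return dict(totals)


def campaigns_over_budget(events, budgets):
    """Return the sorted ids of campaigns whose total spend exceeds their budget.

    Campaigns without an entry in `budgets` are treated as unlimited.
    """
    spend = defaultdict(float)
    for _, campaign_id, cost in events:
        spend[campaign_id] += cost
    return sorted(
        campaign_id
        for campaign_id, total in spend.items()
        if total > budgets.get(campaign_id, float("inf"))
    )
```

In a streaming setting the same aggregation would run incrementally per micro-batch, so that a campaign can be paused soon after its budget is reached instead of after a nightly batch job.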