
Overview for CERN

Experience of Running Spark on Kubernetes on OpenStack for High Energy Physics Workloads

Physicists at CERN use Spark to process large physics datasets in a distributed fashion, aiming to reduce time-to-physics through increased interactivity. In this talk, Prasanth Kothuri and Piotr Mrowczynski, Big Data Engineers at CERN, focus on the design choices made and the challenges faced while developing Spark-as-a-service over Kubernetes on OpenStack to simplify provisioning, automate management, and minimize the operating burden of managing Spark clusters.



Apache Spark for RDBMS Practitioners: How I Learned to Stop Worrying and Love to Scale

Luca Canali, an engineer and team lead for the Hadoop, Spark and database services at CERN, shares the experience and lessons learned from setting up and running the Apache Spark service inside CERN’s database group. He covers the many aspects of this change, with examples taken from use cases and projects at the CERN Hadoop, Spark, streaming and database services.



CERN’s Next Generation Data Analysis Platform with Apache Spark

Learn how Spark is being used at CERN to process large physics datasets in a distributed fashion. The most widely used tool for high-energy physics analysis, ROOT, implements a layer on top of Spark in order to distribute computations across a cluster of machines. This makes it possible for physics analysis written in either C++ or Python to be parallelised on Spark clusters, while reading the input data from CERN’s mass storage system: EOS. On the other hand, another important use case of Spark at CERN has recently emerged. 



Stateful Structure Streaming and Markov Chains Join Forces to Monitor the Biggest Storage of Physics Data

Learn how CERN, the biggest physics laboratory in the world, processes and stores the large volumes of data generated every hour. The storage group, which holds more than 200 petabytes, plays an essential role in helping the organisation overcome this challenge. ExDeMon is an open-source metrics monitor in which stateful processing, implemented with Spark Structured Streaming, plays a key role by applying machine learning techniques to collected logs and metrics. One of the machine learning techniques CERN aims to apply is Markov chains, a statistical model developed by Andrey Markov in the early 20th century.
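
To give a rough idea of how a Markov chain can flag unusual behaviour in a stream of monitored states, here is a minimal sketch in plain Python. It is not taken from ExDeMon and does not use Spark; the function names, the state labels ("ok", "warn", "error") and the 0.1 threshold are all hypothetical choices made for illustration. The sketch estimates first-order transition probabilities from a history of states and then reports transitions in a new sequence that the history makes unlikely.

```python
from collections import Counter, defaultdict


def fit_transitions(states):
    """Estimate first-order transition probabilities from an observed sequence."""
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    # Normalise each row of counts into probabilities.
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }


def unlikely_transitions(model, states, threshold=0.1):
    """Return (index, current, next, probability) for transitions below threshold.

    Transitions never seen in the history get probability 0.0.
    The threshold value is an illustrative assumption.
    """
    flagged = []
    for i, (current, nxt) in enumerate(zip(states, states[1:])):
        p = model.get(current, {}).get(nxt, 0.0)
        if p < threshold:
            flagged.append((i + 1, current, nxt, p))
    return flagged


# Hypothetical history of a monitored component's state.
history = ["ok", "ok", "ok", "warn", "ok", "ok", "warn", "ok", "ok", "ok"]
model = fit_transitions(history)

# A new sequence containing a state the history never produced.
alerts = unlikely_transitions(model, ["ok", "warn", "error", "ok"])
```

In a streaming setting the transition counts would have to be kept as state between batches, which is the kind of job stateful processing is meant for; how ExDeMon actually does this is not described in the abstract.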



CERN’s Next Generation Data Analysis Platform with Apache Spark

Learn how CERN uses Spark to process large physics datasets in a distributed fashion. The most widely used tool for high-energy physics analysis, ROOT, implements a layer on top of Spark in order to distribute computations across a cluster of machines. This makes it possible for physics analysis written in either C++ or Python to be parallelised on Spark clusters, while reading the input data from CERN’s mass storage system.



Real-Time Detection of Anomalies in the Database Infrastructure using Apache Spark

Learn how, at CERN, the biggest physics laboratory in the world, large volumes of data are generated every hour, then stored and processed using scalable systems such as Hadoop, Spark and HBase.
