Overview for cloud

The Need for Speed – Data Streaming in the Cloud with Kafka®

Running Kafka on Kubernetes is becoming increasingly popular. Frank Pientka, Principal Software Architect at Materna Information & Communications SE, introduces the setup, the components used, and recommendations from his own project running Kafka on Kubernetes. He shares the lessons learned in this still-evolving field.


Apache Beam meetup 7 at Datatonic: Beam at Lyft + datalake using Beam + schemas

See how Lyft and Datatonic are using Apache Flink, Apache Beam, and Python in stream processing, machine learning, and analytics.


The Latest in Apache Hive, Spark, Druid and Impala

See how Hortonworks and Cloudera are using the latest in Apache Hive, Spark, Druid, and Impala for data warehousing, analytics, and recommendations.


How LogMeIn Automates Governance and Empowers Developers at Scale

Learn how LogMeIn moves quickly and stays secure through the power of automation on AWS. You will learn the core AWS security building blocks, such as IAM, AWS CloudTrail, AWS Config, and Amazon CloudWatch. The discussion then turns to LogMeIn’s approach for empowering developers on AWS while still meeting required security controls.


Avoiding Log Data Overload in a CI/CD System: Streaming 190 Billion Events and Batch Processing 40 TB/Hour

Learn how Pure Storage engineering manages streaming 190B log events per day and makes use of that deluge of data in its continuous integration (CI) pipeline. Pure Storage's test infrastructure runs over 70,000 tests per day, creating a large triage problem that would require at least 20 triage engineers. Spark’s flexible computing platform allows the data engineering team to write a single application for both streaming and batch jobs, which helps Pure Storage's team of 3 triage engineers understand the state of the CI pipeline. Using encoded patterns, Spark indexes log data for real-time reporting (streaming), uses machine learning for performance modeling and prediction (batch job), and finds previous matches for newly encoded patterns (batch job).


Entity Linking @ Scale Using Elasticsearch

This talk is about the use of Elasticsearch as a scalable entity linking and deduplication tool at Messagepoint Inc. Atif Khan, Vice President of AI & Data Science at Messagepoint, presents the high-level architecture and design of such a system and reviews its application in two major use cases: data deduplication and attribute-based link discovery.