Resources

Helpful resources to learn a little bit more about the HELK and its components. They all inspired me to build the HELK!!

Goals

  • Gather as many resources as I can about the components of the HELK to share them with the community all at once.
  • Share interesting and valuable resources that helped me and, hopefully, could help others learn more about ELK, Spark, Kafka, Jupyter, etc.

Kafka

Presentations

| Session Title | Description | Speaker |
| --- | --- | --- |
| ETL Is Dead, Long Live Streams: real-time streams w/ Apache Kafka | Neha Narkhede talks about the experience at LinkedIn moving from batch-oriented ETL to real-time streams using Apache Kafka, and how the design and implementation of Kafka were driven by the goal of acting as a real-time platform for event data. | @nehanarkhede |
| Building Realtime Data Pipelines with Kafka Connect and Spark Streaming | Building real-time data pipelines with Kafka and Spark. | Ewen Cheslack @confluentinc |

ElasticStack

Presentations

| Session Title | Description | Speaker |
| --- | --- | --- |
| The Quieter You Become, the More You're Able to (H)ELK | This presentation covers the importance of data transformation in your data pipeline. It goes over several challenges and quick, affordable solutions to take your Elastic Stack to the next level. | @Cyb3rWard0g & @neu5ron |
| Kibana Custom Graphs with Vega | Short demo of how Vega can be used to create interactive Kibana graphs. | @nyuriks |
| Kibana Scatter Plot Chart via Vega | Tutorial on how to create a scatter plot chart in Kibana using the Vega visualization (available since 6.2) or the Vega Kibana plugin by Yuri Astrakhan. | Tim Roes |

Blog Posts

| Name | Description | Author |
| --- | --- | --- |
| Setting up a Pentesting... I mean, a Threat Hunting Lab - Part 5 | Installation of an ELK stack. The Debian Way. | @Cyb3rWard0g |
| Building a Sysmon Dashboard with an ELK Stack | Step by step on how to create a basic dashboard with Kibana. | @Cyb3rWard0g |
| Custom Vega Visualizations in Kibana 6.2 | Step by step on how to create a basic dashboard with Kibana. | @elastic |
| Advanced Sysmon filtering using Logstash | Basic Sysmon configs and Logstash. | @PabloSyspanda |

Documentation

| Name | Description | Author |
| --- | --- | --- |
| Logstash Installation | Different ways to install Logstash. | @elastic |
| Logstash Input Plugins | An input plugin enables a specific source of events to be read by Logstash. | @elastic |
| Logstash Filter Plugins | A filter plugin performs intermediary processing on an event. Filters are often applied conditionally depending on the characteristics of the event. | @elastic |
| Logstash Output Plugins | An output plugin sends event data to a particular destination. Outputs are the final stage in the event pipeline. | @elastic |
| Deploying and Scaling Logstash | The goal of this document is to highlight the most common architecture patterns for Logstash and how to effectively scale as your deployment grows. | @elastic |
| Elasticsearch Installation | Different ways to install Elasticsearch. | @elastic |
| Elasticsearch Production Deployment | This chapter is not meant to be an exhaustive guide to running your cluster in production, but it covers the key things to consider before putting your cluster live. | @elastic |
| Kibana Installation | Different ways to install Kibana. | @elastic |
| Kibana Plugins | Add-on functionality for Kibana is implemented with plug-in modules. You use the bin/kibana-plugin command to manage these modules. | @elastic |
| Kibana Vega vs VegaLite | Details about Vega and VegaLite. | @elastic |

Others

| Name | Type |
| --- | --- |
| Kibana import/export dashboard api | Elastic Forums |
| How to pull data from 2 kafka topics using logstash and index the data in two separate index in elasticsearch | Elastic Forums |

Spark

Presentations

| Session Title | Description | Speaker |
| --- | --- | --- |
| Building Robust ETL Pipelines with Apache Spark | In this talk, we'll take a deep dive into the technical details of how Apache Spark "reads" data and discuss how Spark 2.2's flexible APIs, support for a wide variety of data sources, state-of-the-art Tungsten execution engine, and ability to provide diagnostic feedback to users make it a robust framework for building end-to-end ETL pipelines. | Xiao Li @databricks |

Blog Posts

| Name | Description | Author |
| --- | --- | --- |
| Real-Time End-to-End Integration with Apache Kafka in Apache Spark's Structured Streaming | End-to-end integration with Kafka: consuming messages from it, doing simple to complex windowing ETL, and pushing the desired output to various sinks such as memory, console, file, databases, and back to Kafka itself. | @databricks |

Documentation

| Name | Description | Author |
| --- | --- | --- |
| Spark Overview | Apache Spark Overview. | @ApacheSpark |
| Spark Standalone Mode | Apache Spark Standalone Mode. | @ApacheSpark |
| Spark SQL, DataFrames and Datasets Guide | Spark SQL, DataFrames and Datasets Guide. | @ApacheSpark |
| Spark Python API | Spark Python API Docs. | @ApacheSpark |
| Apache Arrow in Spark | Spark Python API Docs. | @ApacheSpark |
| 7 Steps for a Developer to Learn Apache Spark | 7 Steps for a Developer to Learn Apache Spark | Databricks |
| A Gentle Introduction to Apache Spark | A Gentle Introduction to Apache Spark | Databricks |
| Building Continuous Applications with Apache Spark | Building Continuous Applications with Apache Spark | Databricks |
| Data Scientists Guide to Apache Spark | Data Scientists Guide to Apache Spark | Databricks |
| Getting Started With Apache Spark On Azure Databricks | Getting Started With Apache Spark On Azure Databricks | Databricks |
| Guide to Data Science at Scale | Guide to Data Science at Scale | Databricks |

Papers

| Name | Description | Author |
| --- | --- | --- |
| Spark: Cluster Computing with Working Sets | Spark: Cluster Computing with Working Sets | Matei Zaharia, Mosharaf Chowdhury, Michael J. Franklin, Scott Shenker, Ion Stoica |

GraphFrames (Spark)

Presentations

| Session Title | Description | Speaker |
| --- | --- | --- |
| GraphFrames: Graph Queries In Spark SQL | Introduction to GraphFrames and the research behind it. | @ankurdave |
| Finding Graph Isomorphisms In GraphX And GraphFrames | Introduction to GraphFrames and the research behind it. | @michaelmalak |
| A Tale of Two Graph Frameworks on Spark: GraphFrames and Tinkerpop | Shows two frameworks for doing graph analytics with Spark as the underlying execution framework. | @__aliv & @RussSpitzer |
| GraphFrames: DataFrame-based Graphs for Apache® Spark™ | Developers of the GraphFrames package give an overview, a live demo, and a discussion of design decisions and future plans. | @databricks |
| Connecting Cassandra Data with GraphFrames | We can leverage these roots in a less complicated manner by using GraphFrames and Spark to extract maximum analytical awesomeness from our existing Cassandra data. | Jon Haddad |

Papers

| Name | Description | Author |
| --- | --- | --- |
| GraphFrames | An Integrated API for Mixing Graph and Relational Queries | Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, Matei Zaharia |