HELK/README.md

110 lines
7.9 KiB
Markdown
Raw Normal View History

# HELK [Alpha]
2018-01-06 22:14:43 +00:00
A Hunting ELK (Elasticsearch, Logstash, Kibana) with advanced analytic capabilities.
2018-01-16 01:11:13 +00:00
![alt text](resources/images/HELK_Design.png "HELK Infrastructure")
2017-04-14 05:29:04 +00:00
2017-06-29 15:21:59 +00:00
# Goals
* Provide a free hunting platform to the community and share the basics of Threat Hunting.
* Make sense of a large amount of event logs and add more context to suspicious events during hunting.
* Expedite the time it takes to deploy an ELK stack.
2018-01-06 22:14:43 +00:00
* Improve the testing of hunting use cases in an easier and more affordable way.
* Enable Data Science via Apache Spark, GraphFrames & Jupyter Notebooks.
2017-06-29 15:21:59 +00:00
# Current Status: Alpha
The project is currently in an alpha stage, which means that the code and the functionality are still changing. We haven't yet tested the system with large data sources and in many scenarios. We invite you to try it and welcome any feedback.
# HELK Features
* **Kafka:** A distributed publish-subscribe messaging system that is designed to be fast, scalable, fault-tolerant, and durable.
* **Elasticsearch:** A highly scalable open-source full-text search and analytics engine.
* **Logstash:** A data collection engine with real-time pipelining capabilities.
* **Kibana:** An open source analytics and visualization platform designed to work with Elasticsearch.
* **ES-Hadoop:** An open-source, stand-alone, self-contained, small library that allows Hadoop jobs (whether using Map/Reduce or libraries built upon it such as Hive, Pig or Cascading or new upcoming libraries like Apache Spark ) to interact with Elasticsearch.
* **Spark:** A fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
* **GraphFrames:** A package for Apache Spark which provides DataFrame-based Graphs.
* **Jupyter Notebook:** An open-source web application that allows you to create and share documents that contain live code, equations, visualizations and narrative text.
2017-06-29 15:21:59 +00:00
# Resources
2018-04-10 07:03:19 +00:00
* [Welcome to HELK! : Enabling Advanced Analytics Capabilities](https://cyberwardog.blogspot.com/2018/04/welcome-to-helk-enabling-advanced_9.html)
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames
2018-05-03 19:54:12 +00:00
* [Spark](https://spark.apache.org/docs/latest/index.html)
* [Spark Standalone Mode](https://spark.apache.org/docs/latest/spark-standalone.html)
2017-06-29 15:21:59 +00:00
* [Setting up a Pentesting.. I mean, a Threat Hunting Lab - Part 5](https://cyberwardog.blogspot.com/2017/02/setting-up-pentesting-i-mean-threat_98.html)
2018-01-08 22:58:42 +00:00
* [An Integrated API for Mixing Graph and Relational Queries](https://cs.stanford.edu/~matei/papers/2016/grades_graphframes.pdf)
* [Graph queries in Spark SQL](https://www.slideshare.net/SparkSummit/graphframes-graph-queries-in-spark-sql)
* [Graphframes Overview](http://graphframes.github.io/index.html)
2017-06-29 15:21:59 +00:00
* [Elastic Producs](https://www.elastic.co/products)
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames
2018-05-03 19:54:12 +00:00
* [Elastic Subscriptions](https://www.elastic.co/subscriptions)
2018-01-08 22:58:42 +00:00
* [Elasticsearch Guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html)
* [spujadas elk-docker](https://github.com/spujadas/elk-docker)
* [deviantony docker-elk](https://github.com/deviantony/docker-elk)
2017-06-29 15:21:59 +00:00
2017-05-26 06:11:09 +00:00
# Getting Started
HELK ELK 6.2.0 & New features Elasticsearch + Deleted Docker elasticsearch config file (Duplicate) Logstash + Adjusted Batch size to 300 (Testing) + Renamed scripts to follow a standard naming convention + Added a fingerprint filter to all logs to help reduce duplicate logs + Removed ELK Version strings from all Logstash configs so that I dont have to update every single script every time ELK gets updated. + Added Document_id to every logstash output config to take the fingerprint value. Kibana + Renamed Index Patterns to standard naming convention. + Added experimental visualization vega setting. Enabling External URLs to use D3 libraries from their repos. This is grayed out in the Kibana config so user will have to enable it. + Updated name of index patterns across all visualizations and dashboards. Kafka + Log retention is now 24 hours and not 268 Hours + added auto_offset_reset => "earliest" to beats kafka input config Spark + updated es-hadoop version to 6.2.0 and added new spark jar packages: org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1 & databricks:spark-sklearn:0.2.3 + Created an init file to run spark and jupyter all together as a service. This will allow us to restart jupyter and pyspark gracefully. Winlogbeat + Updated Winlogbeat config to take PowerShell and Microsoft-Windows-WMI-Activity/Operational logs. New Features + Cerebro + Python packages: -scipy==1.0.0 scikit-learn==0.19.1 nltk==3.2.5 matplotlib==2.1.2 seaborn==0.8.1 datasketch==1.2.5 tensorflow==1.5.0 keras==2.1.3 pyflux==0.4.15 imbalanced-learn==0.3.2 lime==0.1.1.29 Docker Hub + New HELK image available
2018-02-15 08:28:48 +00:00
## WIKI
* [Introduction](https://github.com/Cyb3rWard0g/HELK/wiki)
* [Architecture Overview](https://github.com/Cyb3rWard0g/HELK/wiki/Architecture-Overview)
* [Kafka](https://github.com/Cyb3rWard0g/HELK/wiki/Kafka)
* [Logstash](https://github.com/Cyb3rWard0g/HELK/wiki/Logstash)
* [Elasticsearch](https://github.com/Cyb3rWard0g/HELK/wiki/Elasticsearch)
* [Kibana](https://github.com/Cyb3rWard0g/HELK/wiki/Kibana)
* [Spark](https://github.com/Cyb3rWard0g/HELK/wiki/Spark)
* [Installation](https://github.com/Cyb3rWard0g/HELK/wiki/Installation)
## (Docker) Accessing the HELK's Images
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames
2018-05-03 19:54:12 +00:00
By default, the HELK's containers are run in the background (Detached). You can see all your docker containers by running the following command:
```
sudo docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a97bd895a2b3 cyb3rward0g/helk-spark-worker:2.3.0 "./spark-worker-entr…" About an hour ago Up About an hour 0.0.0.0:8082->8082/tcp helk-spark-worker2
cbb31f688e0a cyb3rward0g/helk-spark-worker:2.3.0 "./spark-worker-entr…" About an hour ago Up About an hour 0.0.0.0:8081->8081/tcp helk-spark-worker
5d58068aa7e3 cyb3rward0g/helk-kafka-broker:1.1.0 "./kafka-entrypoint.…" About an hour ago Up About an hour 0.0.0.0:9092->9092/tcp helk-kafka-broker
bdb303b09878 cyb3rward0g/helk-kafka-broker:1.1.0 "./kafka-entrypoint.…" About an hour ago Up About an hour 0.0.0.0:9093->9093/tcp helk-kafka-broker2
7761d1e43d37 cyb3rward0g/helk-nginx:0.0.2 "./nginx-entrypoint.…" About an hour ago Up About an hour 0.0.0.0:80->80/tcp helk-nginx
ede2a2503030 cyb3rward0g/helk-jupyter:0.32.1 "./jupyter-entrypoin…" About an hour ago Up About an hour 0.0.0.0:4040->4040/tcp, 0.0.0.0:8880->8880/tcp helk-jupyter
ede19510e959 cyb3rward0g/helk-logstash:6.2.4 "/usr/local/bin/dock…" About an hour ago Up About an hour 5044/tcp, 9600/tcp helk-logstash
e92823b24b2d cyb3rward0g/helk-spark-master:2.3.0 "./spark-master-entr…" About an hour ago Up About an hour 0.0.0.0:7077->7077/tcp, 0.0.0.0:8080->8080/tcp helk-spark-master
6125921b310d cyb3rward0g/helk-kibana:6.2.4 "./kibana-entrypoint…" About an hour ago Up About an hour 5601/tcp helk-kibana
4321d609ae07 cyb3rward0g/helk-zookeeper:3.4.10 "./zookeeper-entrypo…" About an hour ago Up About an hour 2888/tcp, 0.0.0.0:2181->2181/tcp, 3888/tcp helk-zookeeper
9cbca145fb3e cyb3rward0g/helk-elasticsearch:6.2.4 "/usr/local/bin/dock…" About an hour ago Up About an hour 9200/tcp, 9300/tcp helk-elasticsearch
```
Then, you will just have to pick which container you want to access and run the following following commands:
2018-01-06 22:14:43 +00:00
```
sudo docker exec -ti <image-name> bash
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames
2018-05-03 19:54:12 +00:00
root@ede2a2503030:/opt/helk/scripts#
2018-01-06 22:14:43 +00:00
```
2017-06-29 15:21:59 +00:00
# Author
2018-01-08 22:58:42 +00:00
* Roberto Rodriguez [@Cyb3rWard0g](https://twitter.com/Cyb3rWard0g) [@THE_HELK](https://twitter.com/THE_HELK)
2017-05-26 06:11:09 +00:00
2017-06-29 15:21:59 +00:00
# Contributors
* Robby Winchester [@robwinchester3](https://twitter.com/robwinchester3)
HELK ELK 6.2.0 & New features Elasticsearch + Deleted Docker elasticsearch config file (Duplicate) Logstash + Adjusted Batch size to 300 (Testing) + Renamed scripts to follow a standard naming convention + Added a fingerprint filter to all logs to help reduce duplicate logs + Removed ELK Version strings from all Logstash configs so that I dont have to update every single script every time ELK gets updated. + Added Document_id to every logstash output config to take the fingerprint value. Kibana + Renamed Index Patterns to standard naming convention. + Added experimental visualization vega setting. Enabling External URLs to use D3 libraries from their repos. This is grayed out in the Kibana config so user will have to enable it. + Updated name of index patterns across all visualizations and dashboards. Kafka + Log retention is now 24 hours and not 268 Hours + added auto_offset_reset => "earliest" to beats kafka input config Spark + updated es-hadoop version to 6.2.0 and added new spark jar packages: org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1 & databricks:spark-sklearn:0.2.3 + Created an init file to run spark and jupyter all together as a service. This will allow us to restart jupyter and pyspark gracefully. Winlogbeat + Updated Winlogbeat config to take PowerShell and Microsoft-Windows-WMI-Activity/Operational logs. New Features + Cerebro + Python packages: -scipy==1.0.0 scikit-learn==0.19.1 nltk==3.2.5 matplotlib==2.1.2 seaborn==0.8.1 datasketch==1.2.5 tensorflow==1.5.0 keras==2.1.3 pyflux==0.4.15 imbalanced-learn==0.3.2 lime==0.1.1.29 Docker Hub + New HELK image available
2018-02-15 08:28:48 +00:00
* Jared Atkinson [@jaredatkinson](https://twitter.com/jaredcatkinson)
2018-01-08 22:58:42 +00:00
* Nate Guagenti [@neu5ron](https://twitter.com/neu5ron)
* Jordan Potti [@ok_bye_now](https://twitter.com/ok_bye_now)
* Lee Christensen [@tifkin_](https://twitter.com/tifkin_)
2017-06-29 15:21:59 +00:00
# Contributing
2018-01-08 22:58:42 +00:00
There are a few things that I would like to accomplish with the HELK as shown in the To-Do list below. I would love to make the HELK a stable build for everyone in the community. If you are interested on making this build a more robust one and adding some cool features to it, PLEASE feel free to submit a pull request. #SharingIsCaring
2017-06-29 15:21:59 +00:00
# TO-Do
2018-01-08 22:58:42 +00:00
- [X] Upload basic Kibana Dashboards
- [X] Integrate Spark & Graphframes
- [X] Add Jupyter Notebook on the top of Spark
- [X] Kafka Integration
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames
2018-05-03 19:54:12 +00:00
- [X] Default X-Pack Basic - Free License Build for ELKStack
- [X] Spark Standalone Cluster Manager integration
- [X] Apache Arrow Integration for Pandas Dataframes
- [ ] Zepplin Notebook Docker option
- [ ] KSQL Client & Server Deployment (Waiting for v5.0)
- [ ] Kubernetes Cluster Migration
- [ ] OSQuery Data Ingestion
- [ ] Create Jupyter Notebooks showing how to use Spark & GraphFrames
- [ ] MITRE ATT&CK mapping to logs or dashboards
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames
2018-05-03 19:54:12 +00:00
- [ ] Cypher for Apache Spark Integration (Might have to switch from Jupyter to Zeppelin Notebook)
2018-01-08 22:58:42 +00:00
- [ ] Somehow integrate neo4j spark connectors with build
- [ ] Nxlog parsers (Logstash Filters)
- [ ] Add more network data sources (i.e Bro)
2018-03-04 04:44:09 +00:00
- [ ] Research & integrate spark structured direct streaming
More coming soon...