Todo:
* Structure (scripts in directory)
* Detect when Kibana and its index configuration become available (polling)
* Cron job for auto update
* Integration in compose file
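The polling item in the list above could be sketched roughly as follows. This is only a minimal sketch, not the HELK implementation: the helper names `wait_until_ready` and `kibana_is_up`, the URL `http://helk-kibana:5601/api/status`, and the timeout and interval values are all assumptions made for illustration.

```python
import time
import urllib.error
import urllib.request


def wait_until_ready(check, timeout=300.0, interval=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


def kibana_is_up(url="http://helk-kibana:5601/api/status"):
    """Return True if the (assumed) Kibana status URL answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, HTTP error, etc.: not ready yet.
        return False
```

A setup script might call `wait_until_ready(kibana_is_up)` and only load index patterns or dashboards once it returns `True`.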
## Overall
+ Removed the Init files dependencies on all containers
+ Added more resources to the resources folder (papers and presentations)
+ Updated to-do list on main README
+ Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43)
+ Updated Wiki and added new images to it
+ Started documenting potential error messages or bugs with a few quick fixes
## Helk Install Script
+ Script now collects information about available memory and disk size for LINUX hosts ONLY. It only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of disk available. (This can be overridden manually by editing the helk_install script before installing the HELK.)
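As an illustration of the kind of check described above, here is a minimal Python sketch. It is not the helk_install script itself (which is a bash script): the function names, the use of the `MemAvailable` field from `/proc/meminfo`, and the choice of the root filesystem are assumptions. Only the 12GB and 50GB thresholds come from the note above.

```python
import shutil

MIN_RAM_GB = 12   # threshold stated in the changelog
MIN_DISK_GB = 50  # threshold stated in the changelog


def mem_available_gb(meminfo_text):
    """Extract MemAvailable from /proc/meminfo content and convert kB to GB."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemAvailable:"):
            kib = int(line.split()[1])
            return kib / (1024 * 1024)
    raise ValueError("MemAvailable not found")


def disk_available_gb(path="/"):
    """Free space on the filesystem holding `path`, in GB."""
    return shutil.disk_usage(path).free / (1024 ** 3)


def host_meets_requirements(ram_gb, disk_gb):
    """True only if both the RAM and disk thresholds are met."""
    return ram_gb >= MIN_RAM_GB and disk_gb >= MIN_DISK_GB
```

On a Linux host this could be used as `host_meets_requirements(mem_available_gb(open("/proc/meminfo").read()), disk_available_gb("/"))`.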
## ELK Stack
+ Started using Elastic Docker Images as a base
+ Updated ELK stack to 6.2.4 version
+ X-Pack Basic Free License attached to build automatically
+ Monitoring capabilities are now enabled in the build (Reason why Cerebro went away)
## Spark
+ Integrated Spark Standalone Cluster Manager
+ Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code
+ Added Spark Master and Worker Docker Images
+ Build runs now with 2 Workers and 1 Master by default.
+ Apache Arrow is enabled for Pandas Dataframe optimization
+ Created Spark-Base Docker Image (Applied to the Jupyter Image)
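For readers who want to see what pointing a notebook at the standalone master with Arrow enabled might look like, here is a hedged sketch. The master URL is an assumption (the hostname is taken from the container name above, and 7077 is Spark's default standalone port), `build_session` and the app name are hypothetical, and `spark.sql.execution.arrow.enabled` is the setting name used by Spark 2.3 and 2.4. The actual HELK images may configure this differently.

```python
# Assumed values: hostname from the container name above, default standalone port.
SPARK_MASTER_URL = "spark://helk-spark-master:7077"

SPARK_CONF = {
    # Spark 2.3/2.4 setting that enables Arrow for pandas conversions
    # such as DataFrame.toPandas().
    "spark.sql.execution.arrow.enabled": "true",
}


def build_session(app_name="helk-example", master=SPARK_MASTER_URL, conf=SPARK_CONF):
    """Create a SparkSession against the standalone master (requires pyspark)."""
    # Imported lazily so this module can be loaded without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name).master(master)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()
```

With such a session, `spark.range(10).toPandas()` would go through Arrow when pyarrow is installed on the node.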
## Kafka
+ Kafka Container was split in Kafka Brokers and one Zookeeper
+ Helk runs with 2 Kafka Brokers and 1 Zookeeper by default
## Jupyter Container
+ Preparing to add Zeppelin Notebook. The Analytics container is now named Jupyter. It builds on top of the Spark-Base image and installs the necessary packages
+ New packages were added:
++ nxviz
++ hiveplot
++ pyarrow
+ Apache Arrow is now enabled on the Jupyter node to optimize the use of Pandas DataFrames
Removed OTX Enrichment for now to reduce the load on Logstash and keep it clean. It will be added back in the future; the implementation is already developed.
Docker-Compose File
+ Split helk-elk service in 3 (Elasticsearch, Logstash, Kibana)
HELK-base
+ New Docker Base image applied to all HELK's Docker images
HELK-analytics
+ updated file due to new helk-base image
HELK-elk
+ Removed Helk-elk folder
HELK-kafka
+ Updated it to version 1.1.0
HELK-Logstash
+ Updated all files to point to helk-kafka and helk-elasticsearch (New image after splitting helk-elk)
New Docker Images
+ helk-elasticsearch
+ helk-logstash
+ helk-kibana
+ helk-nginx
HELK-nginx
+ Removed route to elasticsearch:8082. Cerebro now can point to 172.18.0.2 (Internal Docker IP)
HELK-Install
+ Organized script a little better by creating install_docker and install_docker_compose functions
HELK-kibana
+ Updated Kibana configuration to set the Kibana server to the name of the service, helk-kibana. This allows remote connections to it (internally among Docker images)
+ Updated elasticsearch url to new docker image (helk-elasticsearch:9200)
HELK-kafka
+ updated internal listeners on each broker to helk-kafka
Docker-Compose file
+ Updated Image versions
++ helk-elk:6.2.3
++ helk-kafka:1.0.1
++ helk-analytics:0.0.2
HELK-ANALYTICS
+ Upgraded spark to version 2.3.0
++ Check release notes: https://spark.apache.org/releases/spark-release-2-3-0.html
+ Upgraded Jupyter Lab to 0.31.12
+ Downgraded Tornado to version 4.* This is due to an error in dependencies happening in version 5.0 with python 3.
+ Upgraded ES-Hadoop package to version 6.2.3
++ Check release notes:
https://www.elastic.co/guide/en/elasticsearch/hadoop/6.2/eshadoop-6.2.3.html
HELK-ELK
+ Upgraded elastic components to 6.2.3
++ Check elasticsearch release notes:
https://www.elastic.co/guide/en/elasticsearch/reference/6.2/release-notes-6.2.3.html
++ No changes for Kibana
++ Check Logstash release notes:
https://www.elastic.co/guide/en/logstash/6.2/logstash-6-2-3.html
+ Logstash kafka input now adds metadata from kafka. Topic name, etc.
+ Fingerprint plugin in logstash config 09-all-filter.conf is applied only to events with the message field.
+ logstash config 11-winevent-sysmon-filter.conf
++ removed field "user". This was causing issues when parsing events with Spark.
HELK-KAFKA
+ Upgraded Kafka to version 2.11-1.0.1
++ Check kafka release notes:
https://www.apache.org/dist/kafka/1.0.1/RELEASE_NOTES.html
+ Removed sleep time for kafka init file
+ updated kafka entrypoint updating version values
HELK helk_install main script
+ Fixed docker & docker-compose installation steps. This fixes issue https://github.com/Cyb3rWard0g/HELK/issues/33
HELK Winlogbeat install script
+ Updated beat version to 6.2.3
helk-analytics
+ Init file and Dockerfile updated with Spark version 2.3.0
+ Jupyter Notebook from getting started folder updated
+ New jupyter notebook with graphframes example presented in BSColumbus 2018
helk-elk
+ Added properties to elasticsearch config file to set it as a standalone cluster. (It helps for when elasticsearch is restarted)
+ Updated Dashboards
+ Updated Kibana timeout to 60000
+ Updated Logstash - Elasticsearch mapping templates after renaming fields.
+ Updated logstash filters renaming fields keeping a new flat schema. No more nested fields style.
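To illustrate what moving from nested fields to a flat schema means, here is a small sketch. The real renaming is done in the Logstash filters; the `flatten` helper, the underscore separator, and the example field names below are assumptions for illustration only and do not reflect the actual HELK field names.

```python
def flatten(event, parent_key="", sep="_"):
    """Collapse nested dictionaries into a single level of joined keys."""
    flat = {}
    for key, value in event.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the joined prefix along.
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat


# Hypothetical event, not taken from a real HELK index.
nested = {"process": {"name": "cmd.exe", "parent": {"name": "explorer.exe"}}, "host": "wks1"}
flat = flatten(nested)
```

Flat field names of this kind tend to be easier to query from Spark and Kibana than deeply nested objects.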
helk-kafka
+ Updated Log retention hours to 2 hours
Resources:
- Created README to share all the blog posts, documents and presentations that helped me to work on the HELK
Scripts
+ Deprecated most of the scripts used before to install ELK via TAR and DEB. Also deprecated scripts to update geoip database.
HELK Design
+ moved everything to docker-compose approach for a more modular design.
+ separated the HELK in 3 services:
++ helk-elk, helk-kafka, helk-analytics
+ Updated Design picture to show WEF ideas and also show Jupyter Lab integrations.
HELK Docker-Compose
+ Added ESDATA volume to keep logs after containers get stopped
+ Services restart automatically after reboot
+ created blank env file for Kafka service. This allows the host to pass its own local IP to Kafka. This is needed for advertised listener configs on each broker.
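The env file idea above can be sketched as follows. This is a rough illustration only: the variable name `ADVERTISED_LISTENER`, the port 9092, the `PLAINTEXT` protocol, and both helper functions are assumptions, not values confirmed by this changelog.

```python
import socket


def detect_local_ip(probe_host="192.0.2.1", probe_port=9):
    """Best-effort guess of the host's outbound IP; no packet is actually sent."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket only selects a route and local address.
        sock.connect((probe_host, probe_port))
        return sock.getsockname()[0]
    except OSError:
        return "127.0.0.1"
    finally:
        sock.close()


def advertised_listener_line(host_ip, port=9092, var_name="ADVERTISED_LISTENER"):
    """Render one env-file line carrying the listener address for a broker."""
    return f"{var_name}=PLAINTEXT://{host_ip}:{port}"
```

An install step could then write `advertised_listener_line(detect_local_ip())` into the blank env file before starting the Kafka service.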
HELK-ELK Version
- Updated to 6.2.2
Elasticsearch
- Added local docker network as part of the network.host option. This allows the HELK-ELK service to publish its docker local IP to other services/images in the docker compose environment.
Logstash
+ minimal updates to certain configs (Mainly renaming files and replacing certain strings)
Kibana
+ enableExternalUrls set to true for Vega visualizations that need external libraries.
Spark - Analytics
+ Renamed service to Analytics
+ Integrated Apache Toree to allow Scala kernel in Jupyter
+ Pyspark, Scala and SQL are now available in Jupyter
Jupyter
+ Jupyter LAB has been enabled
Elasticsearch
+ Deleted Docker elasticsearch config file (Duplicate)
Logstash
+ Adjusted Batch size to 300 (Testing)
+ Renamed scripts to follow a standard naming convention
+ Added a fingerprint filter to all logs to help reduce duplicate logs
+ Removed ELK Version strings from all Logstash configs so that I don't have to update every single script every time ELK gets updated.
+ Added Document_id to every logstash output config to take the fingerprint value.
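The reasoning behind pairing a fingerprint with the document ID can be shown with a small sketch: if the ID is derived from the event content, re-sending the same event overwrites the existing document instead of creating a duplicate. This is only an illustration of the idea, not the Logstash fingerprint plugin's implementation. SHA-256, the dictionary standing in for an index, and the function names are assumptions.

```python
import hashlib
import uuid


def fingerprint(message):
    """Stable hex digest of the raw message (SHA-256 chosen for illustration)."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest()


def index_events(events, store):
    """Write events into `store` keyed by document ID; duplicates overwrite."""
    for event in events:
        if "message" in event:
            doc_id = fingerprint(event["message"])
        else:
            # No message field: fall back to a random ID, so no deduplication.
            doc_id = uuid.uuid4().hex
        store[doc_id] = event
    return store
```

Indexing the same message twice leaves a single entry in `store`, which mirrors how a repeated event with the same document ID would replace the earlier copy.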
Kibana
+ Renamed Index Patterns to standard naming convention.
+ Added experimental visualization vega setting. Enabling External URLs to use D3 libraries from their repos. This is grayed out in the Kibana config so user will have to enable it.
+ Updated name of index patterns across all visualizations and dashboards.
Kafka
+ Log retention is now 24 hours and not 268 Hours
+ added auto_offset_reset => "earliest" to beats kafka input config
Spark
+ updated es-hadoop version to 6.2.0 and added new spark jar packages: org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1 & databricks:spark-sklearn:0.2.3
+ Created an init file to run spark and jupyter all together as a service. This will allow us to restart jupyter and pyspark gracefully.
Winlogbeat
+ Updated Winlogbeat config to take PowerShell and Microsoft-Windows-WMI-Activity/Operational logs.
New Features
+ Cerebro
+ Python packages:
-- scipy==1.0.0
-- scikit-learn==0.19.1
-- nltk==3.2.5
-- matplotlib==2.1.2
-- seaborn==0.8.1
-- datasketch==1.2.5
-- tensorflow==1.5.0
-- keras==2.1.3
-- pyflux==0.4.15
-- imbalanced-learn==0.3.2
-- lime==0.1.1.29
Docker Hub
+ New HELK image available
+ @bsisco via Issue #19 let us know that communication between systems and kafka was not working. I forgot to expose the right ports when running the HELK Docker image after being pulled.
+ ELK 6.1.3 version (Jan 30, 2018 release)
+ Kafka Integration
-- Bash, DockerFile & Docker Image
+ Replaced ELK DEB install packages with TAR packages (easier deployment and more control)
+ Logstash: JVM Heap 2GB default
+ ELK (Init Files created)
-- More control over service start
+ Left Linux DEB install bash script (deprecating it in next release)
+ ELK .yml files are now available to adjust deployment in an easier way.
+ Fixed Docker Run environment parameters so they are passed before pointing to the HELK image.
+ Edited every single file to have the right headers:
-- ELK version 6.1.3
-- Alpha Version