HELK Design
+ Moved everything to a docker-compose approach for a more modular design.
+ Separated the HELK into 3 services:
++ helk-elk, helk-kafka, helk-analytics
+ Updated the design picture to show WEF ideas and the Jupyter Lab integrations.
HELK Docker-Compose
+ Added ESDATA volume to keep logs after containers are stopped
+ Services restart automatically after reboot
+ Created a blank env file for the Kafka service. This allows the host to pass its own local IP to Kafka, which is needed for the advertised listener config on each broker.
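As a rough illustration of the env-file idea above, the sketch below shows one way a host could look up its local IP and render it as an env-file entry for the Kafka service. The variable name `ADVERTISED_LISTENER` is a hypothetical placeholder and may not match what HELK actually uses.

```python
import socket


def local_ip(fallback="127.0.0.1"):
    """Best-effort lookup of the host's primary local IP address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Connecting a UDP socket sends no packets; it only selects a route.
        s.connect(("10.255.255.255", 1))
        return s.getsockname()[0]
    except OSError:
        # No usable route (e.g. offline host): fall back to loopback.
        return fallback
    finally:
        s.close()


def render_env(ip, var="ADVERTISED_LISTENER"):
    """Render one KEY=VALUE env-file line (the variable name is hypothetical)."""
    return f"{var}={ip}\n"


if __name__ == "__main__":
    # Print the line that would be written into the blank env file.
    print(render_env(local_ip()), end="")
```

The output could be redirected into the env file before starting the services, so each broker advertises an address that other hosts can reach.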
HELK-ELK Version
- Updated to 6.2.2
Elasticsearch
- Added local docker network as part of the network.host option. This allows the HELK-ELK service to publish its docker local IP to other services/images in the docker compose environment.
Logstash
+ Minimal updates to certain configs (mainly renaming files and replacing certain strings)
Kibana
+ enableExternalUrls set to true for Vega visualizations that need external libraries.
Spark - Analytics
+ Renamed service to Analytics
+ Integrated Apache Toree to allow Scala kernel in Jupyter
+ Pyspark, Scala and SQL are now available in Jupyter
Jupyter
+ Jupyter Lab has been enabled
Elasticsearch
+ Deleted Docker elasticsearch config file (Duplicate)
Logstash
+ Adjusted Batch size to 300 (Testing)
+ Renamed scripts to follow a standard naming convention
+ Added a fingerprint filter to all logs to help reduce duplicate logs
+ Removed ELK version strings from all Logstash configs so that I don't have to update every single script every time ELK gets updated.
+ Added document_id to every Logstash output config, set to the fingerprint value.
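To show why pairing a fingerprint with the document ID reduces duplicates, here is a minimal sketch. It assumes a SHA-1 hash over the whole message; the actual hash method and hashed fields in the HELK configs are not stated here. A plain dict stands in for an index keyed by document ID.

```python
import hashlib


def fingerprint(message):
    """Return a stable SHA-1 hex digest for a log message (assumed method)."""
    return hashlib.sha1(message.encode("utf-8")).hexdigest()


def index_event(store, message):
    """Store an event under its fingerprint, mimicking an index keyed by document ID."""
    doc_id = fingerprint(message)
    # Writing to an existing ID overwrites the document instead of adding a copy.
    store[doc_id] = {"message": message}
    return doc_id


store = {}
for line in ["user logon: alice", "user logon: bob", "user logon: alice"]:
    index_event(store, line)

print(len(store))  # 2: the repeated event collapsed into one document
```

Because identical events always hash to the same ID, a log that is delivered twice replaces the earlier copy rather than creating a second document.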
Kibana
+ Renamed Index Patterns to standard naming convention.
+ Added the experimental Vega visualization setting that enables external URLs, so D3 libraries can be loaded from their repos. It is commented out in the Kibana config, so the user will have to enable it.
+ Updated name of index patterns across all visualizations and dashboards.
Kafka
+ Log retention is now 24 hours instead of 268 hours
+ Added auto_offset_reset => "earliest" to the Beats Kafka input config
Spark
+ Updated es-hadoop version to 6.2.0 and added new Spark jar packages: org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1 and databricks:spark-sklearn:0.2.3
+ Created an init file to run Spark and Jupyter together as a service. This will allow us to restart Jupyter and PySpark gracefully.
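The package coordinates listed above are typically handed to Spark through its `--packages` option as one comma-separated string. The sketch below builds that argument. The es-hadoop coordinate `org.elasticsearch:elasticsearch-hadoop:6.2.0` is an assumption about how the 6.2.0 update is referenced, and the exact launch command HELK uses is not shown here.

```python
PACKAGES = [
    "org.elasticsearch:elasticsearch-hadoop:6.2.0",  # assumed coordinate for es-hadoop 6.2.0
    "org.apache.spark:spark-sql-kafka-0-10_2.11:2.2.1",
    "databricks:spark-sklearn:0.2.3",
]


def packages_arg(packages):
    """Build the --packages argument pair from a list of package coordinates."""
    return ["--packages", ",".join(packages)]


# Show the resulting command line without launching anything.
print(" ".join(["pyspark"] + packages_arg(PACKAGES)))
```

Keeping the coordinates in one list makes a version bump a single-line change.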
Winlogbeat
+ Updated Winlogbeat config to take PowerShell and Microsoft-Windows-WMI-Activity/Operational logs.
New Features
+ Cerebro
+ Python packages:
-- scipy==1.0.0
-- scikit-learn==0.19.1
-- nltk==3.2.5
-- matplotlib==2.1.2
-- seaborn==0.8.1
-- datasketch==1.2.5
-- tensorflow==1.5.0
-- keras==2.1.3
-- pyflux==0.4.15
-- imbalanced-learn==0.3.2
-- lime==0.1.1.29
Docker Hub
+ New HELK image available
+ @bsisco reported via Issue #19 that communication between systems and Kafka was not working. I forgot to expose the right ports when running the HELK Docker image after it is pulled.
+ ELK 6.1.3 version (Jan 30, 2018 release)
+ Kafka Integration
-- Bash, Dockerfile & Docker Image
+ Replaced ELK DEB install packages with TAR packages (easier deployment and more control)
+ Logstash: JVM Heap 2GB default
+ ELK (Init Files created)
-- More control over service start
+ Left Linux DEB install bash script (deprecating it in next release)
+ ELK .yml files are now available to adjust deployment in an easier way.
+ Fixed docker run environment parameters so that they are passed before the HELK image name.
+ Edited every single file to have the right headers:
-- ELK version 6.1.3
-- Alpha Version
- Using the official Docker install script, known as the convenience script
- Saved a copy of the convenience script (Edge version) locally just in case (the script needs to be modified if it is intended for use in production).