HELK/docker/helk-spark-master/Dockerfile

# HELK script: HELK Spark Master Dockerfile
# HELK build Stage: Alpha
# Author: Roberto Rodriguez (@Cyb3rWard0g)
# License: GPL-3.0

FROM cyb3rward0g/helk-spark-base:2.4.3
LABEL maintainer="Roberto Rodriguez @Cyb3rWard0g"
LABEL description="Dockerfile base for HELK Spark Master."

ENV DEBIAN_FRONTEND noninteractive

COPY scripts/spark-master-entrypoint.sh ${SPARK_HOME}/sbin/
COPY spark-defaults.conf ${SPARK_HOME}/conf/

EXPOSE 8080 7077
WORKDIR $SPARK_HOME/sbin
ENTRYPOINT ["./spark-master-entrypoint.sh"]

USER sparkuser
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames 2018-05-03 19:54:12 +00:00			`# HELK script: HELK Spark Master Dockerfile`
HELK-07122018 License: GPL-3.0 Update ++ Updated all the local documents ++ Docker images in Dockerhub in progreess Docker-Compose ++ Created two options: basic and trial ELK Stack Docker Files ++ Created Trial Folders to make sure the configurations are set properly for when the user selects trial version of HELK. ++++ HELK trial = x-pack + trial license + security enabled ++ Deprecating the HELKs Platinum's Branch. Merging that branch with the HELKs master to allow user to select the type of license during the install process. Jupyter ++ Getting ready for Jupyterhub ++ Created two folders: basic and trial to allow elasticsearch interaciton with username and password hardcoded in the spark session. trial license requires any interaction with elasticsearch to be authenticated. Kibana ++ Added trial folder with scripts that set up security configs for the trial version of HELK. It creates users and roles to test the security features of x-pack Logstash ++ Created trial folder with another pipeline folder in it. The pipeline in trial has output configs with elasticsearch's username and password hardcoded. Ready for when the user sets the build with trial license and wants to send logs to elasticsearch. The logstash configs are the same as the ones from the defailt pipeline. They only have username and password configs on all the output configs. Nginx ++ set trial folder with the right config to allow Kibana handle the authentication process when user builds and installs HELK with a trial license. No need for nginx to handle the authentication. helk_install bash script ++ Updated script to handle license choice : basic or trial ++ basic license is selected by default. If user selects trial, it runs the specific docker-compose file needed to build and install HELK with the right trial configs. ++ Updated also the CLI options. User now will have to specify the license for HELK. Example: sudo ./helk_install.sh -i 192.168.64.131 -l basic 2018-07-12 04:29:09 +00:00			`# HELK build Stage: Alpha`
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames 2018-05-03 19:54:12 +00:00			`# Author: Roberto Rodriguez (@Cyb3rWard0g)`
HELK-07122018 License: GPL-3.0 Update ++ Updated all the local documents ++ Docker images in Dockerhub in progreess Docker-Compose ++ Created two options: basic and trial ELK Stack Docker Files ++ Created Trial Folders to make sure the configurations are set properly for when the user selects trial version of HELK. ++++ HELK trial = x-pack + trial license + security enabled ++ Deprecating the HELKs Platinum's Branch. Merging that branch with the HELKs master to allow user to select the type of license during the install process. Jupyter ++ Getting ready for Jupyterhub ++ Created two folders: basic and trial to allow elasticsearch interaciton with username and password hardcoded in the spark session. trial license requires any interaction with elasticsearch to be authenticated. Kibana ++ Added trial folder with scripts that set up security configs for the trial version of HELK. It creates users and roles to test the security features of x-pack Logstash ++ Created trial folder with another pipeline folder in it. The pipeline in trial has output configs with elasticsearch's username and password hardcoded. Ready for when the user sets the build with trial license and wants to send logs to elasticsearch. The logstash configs are the same as the ones from the defailt pipeline. They only have username and password configs on all the output configs. Nginx ++ set trial folder with the right config to allow Kibana handle the authentication process when user builds and installs HELK with a trial license. No need for nginx to handle the authentication. helk_install bash script ++ Updated script to handle license choice : basic or trial ++ basic license is selected by default. If user selects trial, it runs the specific docker-compose file needed to build and install HELK with the right trial configs. ++ Updated also the CLI options. User now will have to specify the license for HELK. Example: sudo ./helk_install.sh -i 192.168.64.131 -l basic 2018-07-12 04:29:09 +00:00			`# License: GPL-3.0`
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames 2018-05-03 19:54:12 +00:00
Towards ELK 7.0.1 2019-05-14 15:05:55 +00:00			`FROM cyb3rward0g/helk-spark-base:2.4.3`
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames 2018-05-03 19:54:12 +00:00			`LABEL maintainer="Roberto Rodriguez @Cyb3rWard0g"`
			`LABEL description="Dockerfile base for HELK Spark Master."`

			`ENV DEBIAN_FRONTEND noninteractive`

HELK v0.1.3-alpha08032018 All + Moved all to docker folder. Getting ready to start sharing other ways to deploy helk (terraform & Packer maybe) Compose-files + Basic & Trial Elastic Subscriptions available now and can be automatically managed via the helk_install script ELK Version : 6.3.2 Elasticsearch + Set 4GB for ES_JAVA_OPTS by default allowing the modification of it via docker-compose and calculating half of the host memory if it is not set + Added Entrypoint script and using docker-entrypoint to start ES Logstash + Big Pipeline Update by Nate Guagenti (@neu5ron) ++better cli & file name searching ++”dst_ip_public:true” filter out all rfc1918/non-routable ++Geo ASName ++Identification of 16+ windows IP fields ++Arrayed IPs support ++IPv6&IPv4 differentiation ++removing “-“ values and MORE!!! ++ THANK YOU SO MUCH NATE!!! ++ PR: https://github.com/Cyb3rWard0g/HELK/pull/93 + Added entrypoint script to push new output_templates straight to Elasticsearch per Nate's recommendation + Starting Logstash now with docker-entrypoint + "event_data" is now taken out of winlogbeat logs to allow integration with nxlog (sauce added by Nate Guagenti (@neu5ron) Kibana + Kibana yml file updated to allow a longer time for timeout Nginx: + it handles communications to Kibana and Jupyterhub via port 443 SSL + certificate and key get created at build time + Nate added several settings to improve the way how nginx operates Jupyterhub + Multiple users and mulitple notebooks open at the same time are possible now + Jupytehub now has 3 users hunter1,hunter2.hunter3 and password patterh is <user>P@ssw0rd! + Every notebook created is also JupyterLab + Updated ES-Hadoop 6.3.2 Kafka Update + 1.1.1 Update Spark Master + Brokers + reduce memory for brokers by default to 512m Resources: + Added new images for Wiki 2018-08-03 18:13:25 +00:00			`COPY scripts/spark-master-entrypoint.sh ${SPARK_HOME}/sbin/`
			`COPY spark-defaults.conf ${SPARK_HOME}/conf/`
HELK 6.2.4-050318 ## Overall + Removed the Init files dependencies on all containers + Added more resources to the resources folder (papers and presentations) + Updated to-do list on main README + Removed Static Network setting. Addressing overlapping network issues (https://github.com/Cyb3rWard0g/HELK/issues/43) + Updated WIki and added new images to it + Started documenting potential error messages or bugs with a few quick fixes ## Helk Install Script + Script now collects information about Available Memory and Disk size for LINUX host ONLY. it only continues if the box hosting the HELK has at least 12GB of RAM and 50GB of Disk Available. (This can be overwritten manually by just editing the helk_install script before installing the HELK) ## ELK Stack + Started using Elastic Docker Images as a base + Updated ELK stack to 6.2.4 version + X-Pack Basic Free License attached to build automatically + Monitoring capabilities are now enabled in the build (Reason why Cerebro went away) ## Spark + Integrated Spark Standalone Cluster Manager + Spark Node running with Jupyter Notebook now points to the Helk-Spark-Master container for any execution of code + Added Spark Master and Worker Docker Images + Build runs now with 2 Workers and 1 Master by default. + Apache Arrow is enabled for Pandas Dataframe optimization + Created Spark-Base Docker Image (Applied to the Jupyter Image) ## Kafka + Kafka Container was split in Kafka Brokers and one Zookeeper + Helk runs with 2 Kafka Brokers and 1 Zookeeper by default ## Jupyter Container + Preparing to add Zeppelin Notebook. the Analytics container is now named Jupyter. It uses the Spark-Base image to build on the top and install the necessary packagess + New packages were added: ++ nxviz ++ hiveplot ++ pyarrow + Apache Arrow is not enabled on the Jupyter node to be able to optimize the use of Pandas DataFrames 2018-05-03 19:54:12 +00:00
			`EXPOSE 8080 7077`
HELK v0.1.3-alpha08032018 All + Moved all to docker folder. Getting ready to start sharing other ways to deploy helk (terraform & Packer maybe) Compose-files + Basic & Trial Elastic Subscriptions available now and can be automatically managed via the helk_install script ELK Version : 6.3.2 Elasticsearch + Set 4GB for ES_JAVA_OPTS by default allowing the modification of it via docker-compose and calculating half of the host memory if it is not set + Added Entrypoint script and using docker-entrypoint to start ES Logstash + Big Pipeline Update by Nate Guagenti (@neu5ron) ++better cli & file name searching ++”dst_ip_public:true” filter out all rfc1918/non-routable ++Geo ASName ++Identification of 16+ windows IP fields ++Arrayed IPs support ++IPv6&IPv4 differentiation ++removing “-“ values and MORE!!! ++ THANK YOU SO MUCH NATE!!! ++ PR: https://github.com/Cyb3rWard0g/HELK/pull/93 + Added entrypoint script to push new output_templates straight to Elasticsearch per Nate's recommendation + Starting Logstash now with docker-entrypoint + "event_data" is now taken out of winlogbeat logs to allow integration with nxlog (sauce added by Nate Guagenti (@neu5ron) Kibana + Kibana yml file updated to allow a longer time for timeout Nginx: + it handles communications to Kibana and Jupyterhub via port 443 SSL + certificate and key get created at build time + Nate added several settings to improve the way how nginx operates Jupyterhub + Multiple users and mulitple notebooks open at the same time are possible now + Jupytehub now has 3 users hunter1,hunter2.hunter3 and password patterh is <user>P@ssw0rd! + Every notebook created is also JupyterLab + Updated ES-Hadoop 6.3.2 Kafka Update + 1.1.1 Update Spark Master + Brokers + reduce memory for brokers by default to 512m Resources: + Added new images for Wiki 2018-08-03 18:13:25 +00:00			`WORKDIR $SPARK_HOME/sbin`
v0.1.3-alpha08242018-a helk-spark-worker + set SPARK_WORKER_MEMORY to 1g + Enabled spark shuffle service to safely remove executors from apps helk-jupyter + Upgraded ES-Hadoop to 6.4.0 + Added Postgresql JAR and installed postgresql to manage the usage of multiple notebooks + Added entrypoint script to create hive user, set a password and create a hive_metastore database + Set Spark dynamic allocation settings to avoid Spark workers getting sucked on one application only 2018-08-24 22:13:13 +00:00			`ENTRYPOINT ["./spark-master-entrypoint.sh"]`

			`USER sparkuser`