Spark Optimisation Training

Spark optimisation training and workshop

Build Docker images

This builds all images needed for the setup.

cd docker; ./build.sh

Prepare environment

This script creates the directories required by the setup.

./init_env.sh

Start application

# this directory will be shared between the Spark and Jupyter services
mkdir -p ./shared-vol

# download data; pass --with-csv to also download and unzip the data in CSV format (100 GB)
cd shared-vol
../collect_data.sh
cd ..

# start the Docker Compose application (run from the repository root)
SHARED_DIR="$(pwd)/shared-vol" docker-compose -f docker/docker-compose.yml up
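Because `SHARED_DIR` is built from the current working directory, the start command only behaves as intended when it is run from the repository root. A minimal sketch of a guard for this follows; `shared_dir_abs` is a hypothetical helper and not part of this repository. It prints the absolute path of the shared directory and fails if the directory is missing.

```shell
# Hypothetical helper, not part of the repository.
# Prints the absolute path of the shared directory (default: ./shared-vol),
# or returns a non-zero status if the directory does not exist.
shared_dir_abs() {
  dir="${1:-./shared-vol}"
  if [ ! -d "$dir" ]; then
    echo "shared directory not found: $dir" >&2
    return 1
  fi
  # Resolve the path in a subshell so the caller's working directory is unchanged.
  (cd "$dir" && pwd)
}

# Example, run from the repository root:
#   SHARED_DIR="$(shared_dir_abs ./shared-vol)" || exit 1
#   export SHARED_DIR
#   docker-compose -f docker/docker-compose.yml up
```

Resolving the path before starting makes a wrong working directory fail early instead of mounting an unexpected location.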

Application URLs

Restart Sparklint to get new logs

Sparklint doesn't fetch new logs automatically. To process new logs, you can either add them manually through the UI or restart the Sparklint Docker container.

cd docker; docker-compose restart sparklint
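If you restart Sparklint often, the restart can be wrapped in a small function that also prints the service's most recent log lines. This is a sketch only: `restart_service` and the `DRY_RUN` variable are hypothetical and not part of this repository, and the sketch assumes it is run from the repository root with the compose file at `docker/docker-compose.yml`.

```shell
# Hypothetical wrapper, not part of the repository.
# Restarts one docker-compose service, then shows its last 20 log lines.
# Set DRY_RUN=1 to print the commands instead of running them.
restart_service() {
  service="${1:-sparklint}"
  compose_file="${2:-docker/docker-compose.yml}"
  run=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    run="echo"
  fi
  $run docker-compose -f "$compose_file" restart "$service" &&
    $run docker-compose -f "$compose_file" logs --tail=20 "$service"
}

# Example, run from the repository root:
#   restart_service sparklint
```

Running it with `DRY_RUN=1` first shows exactly which commands would be executed.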

Cleanup Docker env

Removes all stopped containers and deletes images with intermediate layers.

cd docker; ./cleanup.sh