ma-ts/spark-optimization-training

 
 


Spark Optimisation Training

Spark optimisation training and workshop

Build Docker images

This builds all images needed for the setup.

cd docker; ./build.sh

Prepare environment

This script creates the directories required by the setup.

./init_env.sh

Start application

# this directory is shared between the Spark and Jupyter services
mkdir -p ./shared-vol

# download the data; pass --with-csv to also download and unzip the data in CSV format (100 GB)
cd shared-vol
../collect_data.sh
cd ..

# start the Docker Compose application (run from the repository root)
SHARED_DIR="$(pwd)/shared-vol" docker-compose -f docker/docker-compose.yml up
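
The start command passes an absolute path to the shared directory in `SHARED_DIR`. The following is a minimal sketch of a helper that computes that path without changing the caller's working directory. The function name `resolve_shared_dir` is hypothetical and not part of this repository.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): print the absolute
# path of the shared directory, creating it if it does not exist yet.
resolve_shared_dir() {
    dir="${1:-./shared-vol}"
    mkdir -p "$dir" || return 1
    # cd inside a subshell so the caller's working directory is unchanged;
    # CDPATH is cleared so cd resolves the path relative to the current directory only.
    (CDPATH= cd -- "$dir" && pwd)
}
```

From the repository root it could be used as `SHARED_DIR="$(resolve_shared_dir ./shared-vol)" docker-compose -f docker/docker-compose.yml up`.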

Application URLs

Restart Sparklint to get new logs

Sparklint doesn't fetch new logs automatically. To process new logs, either add them manually through the UI or restart the Sparklint Docker component:

cd docker; docker-compose restart sparklint
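
If the restart is run often, it may be convenient to wrap it so that the shell stays in its current directory and the restart is skipped when the `docker` directory cannot be entered. This is only a sketch: the function name `restart_sparklint` and its optional repository-root argument are hypothetical, while the service name `sparklint` is taken from the command above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the repository): restart the Sparklint
# service without leaving the caller's working directory.
restart_sparklint() {
    repo_root="${1:-.}"
    # The subshell keeps the cd local; && skips the restart if cd fails.
    (CDPATH= cd -- "$repo_root/docker" && docker-compose restart sparklint)
}
```

Called as `restart_sparklint` from the repository root, or `restart_sparklint /path/to/repo` from elsewhere.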

Cleanup Docker env

Removes all stopped containers and deletes images with intermediate layers.

cd docker; ./cleanup.sh
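
The contents of `cleanup.sh` are not shown here. As a rough illustration only, a comparable cleanup using standard Docker CLI commands might look like the sketch below. The function name `cleanup_docker_env` is hypothetical, and the real script may do more or less than this.

```shell
#!/bin/sh
# Hypothetical stand-in for cleanup.sh (an assumption, not the script's real contents).
cleanup_docker_env() {
    # Remove all stopped containers without prompting for confirmation.
    docker container prune -f || return 1
    # Remove dangling (untagged) images, which are typically leftover build layers.
    docker image prune -f
}
```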
