The purpose of this course is to provide insight into the analysis of large sets of experimental data. Students will learn to use and understand basic tools and methods that are used in real searches in gravitational-wave and gamma-ray astronomy, such as those currently employed at AEI and LIGO.
- Install git.
- Create a GitHub account (if you don't have one already) and log into it.
To make your life easier when you upload the solutions to the next exercises, you should now generate an SSH key on your machine. It allows you to work with your repository without being asked for a username and password each time.
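Before generating a new key, it is worth checking whether you already have one. A minimal Python sketch of that check, assuming your keys use the default `id_<type>.pub` naming inside `~/.ssh` (the function name `existing_public_keys` is only an illustration, not part of any tool):

```python
from pathlib import Path


def existing_public_keys(ssh_dir=None) -> list:
    """List public key files named id_*.pub in ssh_dir (default: ~/.ssh)."""
    folder = Path(ssh_dir) if ssh_dir is not None else Path.home() / ".ssh"
    # No ~/.ssh folder at all means no keys yet
    if not folder.is_dir():
        return []
    # Only the .pub files: these are the ones you paste into GitHub
    return sorted(p.name for p in folder.glob("id_*.pub"))


if __name__ == "__main__":
    keys = existing_public_keys()
    print("Existing public keys:", ", ".join(keys) if keys else "none found")
```

If it lists a key such as `id_ed25519.pub`, you can reuse it and go straight to adding it to your GitHub account.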
- Generate an SSH key using the terminal/command line. Try this link first and try to figure it out; go through step 7 as well, which adds your SSH key to your GitHub account. If all else fails, here is a more detailed guide.
- Once you are logged in, fork this repository by pressing the Fork button in the upper right corner of this repository's page.
Now you should have your own repository called datalab in your namespace: <username>/datalab.
- To continue, you should have an SSH key added to your account. If not, use the 'HTTPS' link for the repository instead; you will then be prompted for a username and password every time (for the password, GitHub expects a personal access token rather than your account password). Copy the git URL of your repository by going to your GitHub page, opening the repository and clicking Code > SSH > copy.
- Open a command line/terminal and clone your repository. The command should look something like this:
git clone git@github.com:<username>/datalab.git
This will automatically create a new folder called datalab inside the folder where you ran the command, and will give you an error if such a folder already exists and is not empty. If you want the folder to have another name, run git clone git@github.com:<username>/datalab.git <new_folder_name>. If you want to move the entire folder after you have cloned it, everything will keep working, since the git references are kept in the hidden .git folder inside it.
- Create a new file datalab/solutions/exercise_1.py and push your changes to your repository.
Solution here
Go to your datalab folder and make a new folder called solutions:
$ mkdir solutions
Create a new file called exercise_1.py inside it, using any method, for example:
$ touch solutions/exercise_1.py
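Since any method works, the same two steps (mkdir and touch) can also be done from Python. A minimal sketch using the standard-library `pathlib` module; the helper name `create_solution_file` is hypothetical, and `repo_root` is assumed to be your datalab folder:

```python
from pathlib import Path


def create_solution_file(repo_root: str = ".") -> Path:
    """Create solutions/exercise_1.py under repo_root (like mkdir + touch)."""
    solutions = Path(repo_root) / "solutions"
    # exist_ok=True: do not fail if the folder is already there
    solutions.mkdir(parents=True, exist_ok=True)
    target = solutions / "exercise_1.py"
    # touch() creates an empty file, or only updates the timestamp if it exists
    target.touch()
    return target
```

Call `create_solution_file()` from inside your datalab folder. Keep in mind that git does not track empty folders, so `solutions/` only shows up in `git status` once it contains a file.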
Check the changes to your repository:
$ git status
Stage and commit the changes, check the log, and then push them:
$ git add .
$ git commit -m "Saving my changes."
$ git log
$ git push origin main
The simplest way to get new changes that are pushed to this main repository is to add an upstream remote and rebase your code onto it. Before you rebase, commit all local changes that you want to keep. Try it yourself using this link.
Solution here
Go to your datalab folder.
To see which remote repositories you are tracking, run git remote -v.
- The output will probably look like this:
$ git remote -v
origin git@github.com:<your_username>/datalab.git (fetch)
origin git@github.com:<your_username>/datalab.git (push)
Because you created the fork through the GitHub web interface, you can also get the new changes from the interface. The better way, however, is to add a 'remote' that points to the original repository you forked from, i.e. to give the main repository a name in your local clone. The conventional name for the repository you forked from is upstream.
Add a remote named upstream pointing to this repository using git remote add upstream git@github.com:alebot/datalab.git. Now when you run git remote -v you should see something like this:
$ git remote -v
origin git@github.com:<your_username>/datalab.git (fetch)
origin git@github.com:<your_username>/datalab.git (push)
upstream git@github.com:alebot/datalab.git (fetch)
upstream git@github.com:alebot/datalab.git (push)
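Each line of that output has three fields: the remote name, its URL, and whether that URL is used for fetching or pushing. A small Python sketch (the helper names `parse_remotes` and `has_upstream` are hypothetical) that parses this text and checks whether upstream has been added:

```python
import subprocess


def parse_remotes(output: str) -> dict:
    """Turn `git remote -v` text into {remote_name: {"fetch": url, "push": url}}."""
    remotes = {}
    for line in output.splitlines():
        parts = line.split()
        # Each line looks like: <name> <url> (fetch)   or   <name> <url> (push)
        if len(parts) != 3:
            continue
        name, url, kind = parts
        remotes.setdefault(name, {})[kind.strip("()")] = url
    return remotes


def has_upstream(repo_dir: str = ".") -> bool:
    """Run `git remote -v` in repo_dir and check for a remote called upstream."""
    result = subprocess.run(
        ["git", "remote", "-v"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,  # raise if repo_dir is not a git repository
    )
    return "upstream" in parse_remotes(result.stdout)
```

`has_upstream()` assumes git is installed and `repo_dir` is your datalab clone; it raises `subprocess.CalledProcessError` if the folder is not a git repository.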
The best way to pull in the new changes is to fetch them from upstream and then use the rebase command. This means that any commits you have made will be replayed ('rebased') on top of the new changes in the repository you forked from. Make sure you have committed all your changes before proceeding:
$ git status
$ git add .
$ git commit -m "Saving my changes."
$ git log
$ git fetch upstream
$ git rebase upstream/main
$ git log
The first task is to compile the two C source files. Go to your datalab/code folder and simply try running this in your command line:
./Makefile
You will probably get some errors, such as missing libraries or a missing executable. Try to solve them.
You can do this either in your local environment or in a Docker container with a C++ toolchain, for example this one.
Test that this works correctly by running:
generate_source --help
prober --help
You should get no errors, just a help message.
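If you prefer to script this check, here is a minimal Python sketch. It assumes that the two programs are either on your PATH or given with a path such as `./generate_source`, and that they exit with status 0 after printing their help; `check_tool` is a hypothetical helper name.

```python
import shutil
import subprocess


def check_tool(name: str) -> bool:
    """Return True if `name` can be found and `name --help` exits with status 0."""
    # which() searches PATH; a name containing a directory (./prober) is checked directly
    path = shutil.which(name)
    if path is None:
        return False
    result = subprocess.run([path, "--help"], capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    # Use e.g. "./generate_source" if the binaries are not on your PATH
    for tool in ("generate_source", "prober"):
        print(tool, "OK" if check_tool(tool) else "missing or failing")
```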
- Prepare your Python environment. To solve the following exercises we will need Python 3 (preferably) and at least a plotting library such as matplotlib; numpy, pandas, etc. will probably be useful as well. If you are using Anaconda or Miniconda, make a new Python environment for the datalab.
- Have an IDE or editor ready, whether it is Jupyter, PyCharm, Notepad++, etc.; the most important thing is that you can work with it easily. Try to write a script that prints "Hello World!" and run it.
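A minimal version of such a first script, which also reports your Python version and whether the packages suggested above can be found. Save it under any name (`hello.py` is just a suggestion) and run it with `python hello.py`; the package list is an assumption you can extend.

```python
import importlib.util
import sys

# Packages suggested above; add anything else you plan to use.
PACKAGES = ("numpy", "matplotlib", "pandas")


def missing_packages(names=PACKAGES) -> list:
    """Return the top-level package names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def main() -> None:
    print("Hello World!")
    print("Running Python", sys.version.split()[0])
    missing = missing_packages()
    print("Missing packages:", ", ".join(missing) if missing else "none")


# Only runs when the file is executed as a script, not when it is imported.
if __name__ == "__main__":
    main()
```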
Open Exercise_1.pdf, read the theory and solve the tasks. Complete solutions are here. Do the same for Exercise 3 and Exercise 4, which also come with solutions. The final assignment sheet and data are in the assignment folder of this repository.
Resources:
- Introduction to HPC (High Performance Computing)
- Introduction to MPI and OpenMP
- Programming with CUDA: parallel reduction in GPUs
- Patterns for Parallel Programming
- Ian Foster Book: Designing and Building Parallel Programs
- Ian Foster Course
- Online Course
- Python multiprocessing Tutorial
- pymp library
- Condor
- Coding Game: Have fun with MPI
- Video: Introduction to parallel programming with MPI
- mpi4py