reproducible-research-ideas

I'll use this repo to dump ideas about reproducible research.

My current workflow for simulations

Write a text input file for a given simulation program (e.g. Gaussian/GAMESS/NWChem input files), or use a GUI to produce a binary or text file (for example, Abaqus CAE).
Run the program
Take the textual output directly or use a script to extract relevant data from the output (which can also be binary)
Process the data to create a plot, showing either the experimental points alone or the points together with a fitted function; in some cases, compare data sets coming from multiple input files
Tweak the plotting code to get a better image
Include the image in the LaTeX source
Compile LaTeX to PDF.
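
The extraction step in the list above could look something like the sketch below. It assumes a made-up output format in which the program prints lines such as "TOTAL ENERGY = -76.0235"; this is not the real Gaussian/GAMESS/NWChem format, and extract_energies is only an illustrative name. Each real program would need its own pattern.

```python
import re

# Hypothetical output format: one "TOTAL ENERGY = <number>" line per step.
# Real simulation programs print something different; adapt the pattern.
ENERGY_RE = re.compile(
    r"^[ \t]*TOTAL ENERGY[ \t]*=[ \t]*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)[ \t]*$",
    re.MULTILINE,
)


def extract_energies(output_text):
    """Return every energy value found in the output, in order of appearance."""
    return [float(match.group(1)) for match in ENERGY_RE.finditer(output_text)]


if __name__ == "__main__":
    sample = (
        " step 1\n"
        " TOTAL ENERGY = -76.0235\n"
        " step 2\n"
        " TOTAL ENERGY = -76.0241\n"
    )
    print(extract_energies(sample))
```

Keeping the parsing in one small function means the plotting code never has to touch the raw output file.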

Problems

Because the kinds of simulations and their typical results were not known in advance, input and output files ended up scattered across the filesystem without any kind of order.
A version control system was used for the LaTeX source but not for input and output files, so it was unclear whether the graphs were up to date (I didn't know if a simulation re-run was needed)
Some team members wrote text in Microsoft Word instead of writing LaTeX in a simple editor, which caused various formatting problems during copy/paste and wasted a good amount of time
Multiple simulations were run with custom scripts, in the foreground, with no control over the process (for example, it was impossible to stop a single job without stopping everything, which forced a restart of every simulation, even those that had already run successfully; if a job failed, there was no clear indication)
Time pressure led to code duplication (for graph creation/manipulation)
Library/program versions were not tracked
Matplotlib didn't provide reasonable defaults, thus requiring constant tweaking, worsened by code duplication
Code was entangled: there wasn't a clear separation between presentation code (graphical tweaks) and logic
The team chose to produce color images, but it would have been nice to produce both B/W and color
Some code was provided in an appendix, but that's not nearly enough to make simulations reproducible
Git history was messy because commits were not squashed
PDF compilation became slow because \input{} was used instead of \include{} (which, combined with \includeonly{}, allows compiling only the parts currently being edited)
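
For the "update status" problem above, one possible approach (a sketch, not something the team actually used) is to record a hash of each input file after a successful run and compare it later. The manifest file and the function names file_digest, needs_rerun and record_run are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def needs_rerun(input_path, manifest_path):
    """True if the input file changed since its digest was last recorded."""
    manifest_file = Path(manifest_path)
    if not manifest_file.exists():
        return True  # nothing recorded yet, so the simulation never ran
    manifest = json.loads(manifest_file.read_text())
    return manifest.get(str(input_path)) != file_digest(input_path)


def record_run(input_path, manifest_path):
    """Store the current digest of the input file after a successful run."""
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text()) if manifest_file.exists() else {}
    manifest[str(input_path)] = file_digest(input_path)
    manifest_file.write_text(json.dumps(manifest, indent=2, sort_keys=True))
```

The manifest is plain JSON, so it can be committed next to the input files and diffed like any other text file.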

Workflow proposal (WIP)

Use queueing software like Grid Engine (GE). GE can be easily installed on Ubuntu via the repositories; installing from source is discouraged (it's a mess of scripts)
If needed, configure a Beowulf cluster (I already did that for fun)
If the Beowulf cluster is too slow, ask the professor for time on a university server/cluster. In any case, use a script to automate the upload and the job submission
Use Git for everything (this was easy)
Explore literate programming (e.g. Pweave)
Use tools like Vagrant to guarantee reproducible environments
Use HDF5 to save simulation output as an intermediate step. We would therefore save:
input files
output files
an HDF5 file containing the results extracted from the output (produced by a script)
Find a way to separate code regarding visual tweaking from "logic code". In addition, clearly separate the various steps (converting the output to HDF5, then processing the HDF5 data). Write functions that accept an Experiment and produce a simple graph, a graph with fitting, or a function that compares multiple experiments
Investigate the easiest way to incorporate multiple subimages (it could be a single image with subplots coming from matplotlib, or multiple images stitched together by LaTeX)
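
As a rough illustration of the separation between visual tweaking and "logic code" proposed above, here is a minimal sketch. The Experiment class, the style dictionaries and the function names are assumptions of mine, not existing code. The plotting functions only call ax.plot(), so they should work with a matplotlib Axes, but they never import matplotlib themselves.

```python
from dataclasses import dataclass


@dataclass
class Experiment:
    """Results of one simulation run (hypothetical container)."""
    label: str
    x: list
    y: list


# Presentation only: swap the dict to switch between color and B/W output.
COLOR_STYLE = {"color": "tab:blue", "marker": "o", "linestyle": "none"}
BW_STYLE = {"color": "black", "marker": "s", "linestyle": "none"}


def linear_fit(exp):
    """Logic only: least-squares slope and intercept of y against x."""
    n = len(exp.x)
    mean_x = sum(exp.x) / n
    mean_y = sum(exp.y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in exp.x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(exp.x, exp.y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def plot_experiment(ax, exp, style, with_fit=False):
    """Draw one experiment on ax, optionally with its fitted line."""
    ax.plot(exp.x, exp.y, label=exp.label, **style)
    if with_fit:
        slope, intercept = linear_fit(exp)
        fitted = [slope * xi + intercept for xi in exp.x]
        ax.plot(exp.x, fitted, label=exp.label + " (fit)",
                color=style["color"], linestyle="-")


def compare_experiments(ax, experiments, styles):
    """Draw several experiments on the same axes, one style per experiment."""
    for exp, style in zip(experiments, styles):
        plot_experiment(ax, exp, style)
```

With matplotlib installed, the axes returned by plt.subplots() can be passed as ax; producing both a color and a B/W version of a figure then means calling the same function twice with a different style dictionary.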

Interesting links

http://bkanuka.com/articles/native-latex-plots/
