Merge pull request #11 from NERC-CEH/diagram_view
Diagram view of first stage pipeline (from sampling instrument to shared storage)
Showing 10 changed files with 282 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
name: Pages and Graphviz re-render | ||
on: | ||
  push: | ||
    paths: 'docs/**/*' | ||
|
||
  # Allows you to run this workflow manually from the Actions tab | ||
  workflow_dispatch: | ||
|
||
# Sets permissions of the GITHUB_TOKEN to allow deployment to GitHub Pages | ||
permissions: | ||
  contents: read | ||
  pages: write | ||
  id-token: write | ||
|
||
jobs: | ||
  build: | ||
    name: Rebuild graphs and pages | ||
    runs-on: ubuntu-latest | ||
    defaults: | ||
      run: | ||
        working-directory: docs | ||
    steps: | ||
      - uses: actions/checkout@v4 | ||
      - name: Setup Ruby | ||
        uses: ruby/setup-ruby@v1 | ||
        with: | ||
          ruby-version: '3.3' # Not needed with a .ruby-version file | ||
          bundler-cache: true # runs 'bundle install' and caches installed gems automatically | ||
          cache-version: 0 # Increment this number if you need to re-download cached gems | ||
          working-directory: '${{ github.workspace }}/docs' | ||
      - name: Setup Pages | ||
        id: pages | ||
        uses: actions/configure-pages@v3 | ||
      - name: Build with Jekyll | ||
        # Outputs to the './_site' directory by default | ||
        # SVG diagrams are rendered afterwards by the Diagrams step, which writes them straight into '_site' | ||
        run: bundle exec jekyll build --baseurl "${{ steps.pages.outputs.base_path }}" | ||
        env: | ||
          JEKYLL_ENV: production | ||
      - uses: ts-graphviz/setup-graphviz@v2 | ||
      - name: Diagrams | ||
        run: bash ../scripts/render_diagrams.sh | ||
      - name: Upload artifact | ||
        # Uploads './_site' by default; the path is set explicitly because the site is built under 'docs/' | ||
        uses: actions/upload-pages-artifact@v1 | ||
        with: | ||
          path: "docs/_site" | ||
|
||
  # Deployment job | ||
  deploy: | ||
    environment: | ||
      name: github-pages | ||
      url: ${{ steps.deployment.outputs.page_url }} | ||
    runs-on: ubuntu-latest | ||
    needs: build | ||
    steps: | ||
      - name: Deploy to GitHub Pages | ||
        id: deployment | ||
        uses: actions/deploy-pages@v2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
source "https://rubygems.org" | ||
# Hello! This is where you manage which Jekyll version is used to run. | ||
# When you want to use a different version, change it below, save the | ||
# file and run `bundle install`. Run Jekyll with `bundle exec`, like so: | ||
# | ||
# bundle exec jekyll serve | ||
# | ||
# This will help ensure the proper Jekyll version is running. | ||
# Happy Jekylling! | ||
#gem "jekyll", "~> 4.3.3" | ||
# This is the default theme for new Jekyll sites. You may change this to anything you like. | ||
gem "minima", "~> 2.5" | ||
# If you want to use GitHub Pages, remove the "gem "jekyll"" above and | ||
# uncomment the line below. To upgrade, run `bundle update github-pages`. | ||
gem "github-pages", "~> 231", group: :jekyll_plugins | ||
gem "webrick" | ||
gem "just-the-docs" | ||
# If you have any plugins, put them here! | ||
group :jekyll_plugins do | ||
gem "jekyll-feed", "~> 0.12" | ||
end | ||
|
||
# Windows and JRuby do not include zoneinfo files, so bundle the tzinfo-data gem | ||
# and associated library. | ||
platforms :mingw, :x64_mingw, :mswin, :jruby do | ||
gem "tzinfo", ">= 1", "< 3" | ||
gem "tzinfo-data" | ||
end | ||
|
||
# Performance-booster for watching directories on Windows | ||
gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin] | ||
|
||
# Lock `http_parser.rb` gem to `v0.6.x` on JRuby builds since newer versions of the gem | ||
# do not have a Java counterpart. | ||
gem "http_parser.rb", "~> 0.6.0", :platforms => [:jruby] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
title: Plankton ML / pipelines | ||
email: [email protected] | ||
description: >- # this means to ignore newlines until "baseurl:" | ||
  This repository contains code, proofs of concept, test cases and workflows for low-investment methods to apply image machine learning to plankton characterisation. | ||
baseurl: "" # the subpath of your site, e.g. /blog | ||
url: "" # the base hostname & protocol for your site, e.g. http://example.com | ||
github_username: metazool | ||
|
||
# Build settings | ||
theme: just-the-docs | ||
plugins: | ||
- jekyll-feed |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,34 @@ | ||
# http://www.graphviz.org/content/cluster | ||
|
||
digraph G { | ||
rankdir=LR; | ||
graph [fontname = "Handlee"]; | ||
node [fontname = "Handlee"]; | ||
edge [fontname = "Handlee"]; | ||
|
||
bgcolor=transparent; | ||
|
||
scope [shape=rect label="Microscope \n(FlowCam)"]; | ||
pc [shape=rect label="Local PC"] | ||
|
||
scope2 [shape=rect label="Laser Imaging \n(Flow Cytometer)"]; | ||
pc2 [shape=rect label="Local PC"] | ||
|
||
san [shape=cylinder label="SAN \nprivate cloud"] | ||
vm [shape=rect label="VM \nprivate cloud"] | ||
store [shape=cylinder label="S3 \nobject store"] | ||
|
||
vm->store [label="triggered by app?" fontsize=10]; | ||
scope->pc | ||
scope2->pc2 | ||
|
||
pc2->san [label="physically, via USB stick", fontsize=10]; | ||
pc->san [label="physically, via USB stick", fontsize=10]; | ||
|
||
|
||
san->vm [dir=back label="manually run script" fontsize=10]; | ||
|
||
} | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
# http://www.graphviz.org/content/cluster | ||
|
||
digraph G { | ||
rankdir=LR; | ||
graph [fontname = "Handlee"]; | ||
node [fontname = "Handlee"]; | ||
edge [fontname = "Handlee"]; | ||
|
||
bgcolor=transparent; | ||
|
||
scope [shape=rect label="Microscope \n(FlowCam)"]; | ||
pc [shape=rect label="Local PC"] | ||
|
||
scope2 [shape=rect label="Laser imaging \n(Flow Cytometer)"]; | ||
pc2 [shape=rect label="Local PC"] | ||
|
||
san [shape=cylinder label="SAN \nprivate cloud"] | ||
engine [shape=rect label="Workflow engine"] | ||
tasks [label="Task graph"] | ||
store [shape=cylinder label="S3 \nobject store"] | ||
|
||
engine->tasks | ||
tasks->san; | ||
tasks->store; | ||
scope->pc | ||
scope2->pc2 | ||
|
||
pc2->san [label="pull on a schedule?", dir=back,fontsize=10]; | ||
|
||
pc->san [label="push on a schedule?", fontsize=10]; | ||
|
||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
# http://www.graphviz.org/content/cluster | ||
|
||
digraph G { | ||
rankdir=LR; | ||
|
||
edge [fontname = "Handlee"]; | ||
|
||
graph [fontsize=10 fontname="Handlee"]; | ||
node [shape=record fontsize=10 fontname="Handlee"]; | ||
|
||
bgcolor=transparent; | ||
|
||
subgraph cluster_0 { | ||
style=filled; | ||
color=lightgrey; | ||
node [color=white,style=filled]; | ||
store -> chunk -> sift -> profile -> upload; | ||
label = "Task flow"; | ||
fontsize = 20; | ||
} | ||
} | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,33 @@ | ||
--- | ||
# Feel free to add content and custom Front Matter to this file. | ||
# To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults | ||
|
||
layout: home | ||
title: Plankton ML - workflow diagrams | ||
--- | ||
|
||
# Workflow Diagrams | ||
|
||
Views of the flow of data from the imaging instrument to cloud-accessible storage. | ||
|
||
### As is | ||
|
||
Data saved during a session with the microscope is downloaded onto a USB key, then uploaded from a researcher's laptop into a shared storage area on a site-specific SAN. | ||
|
||
Later, a data scientist logs into a virtual machine in the on-premise "private cloud" and runs more than one script to read the data, process it for analysis, and then upload it to S3 storage hosted at JASMIN. Authorisation in this chain requires personal credentials. | ||
|
||
<object data="as_is/instrument_to_store.svg" type="image/svg+xml"> | ||
</object> | ||
|
||
There are file naming conventions including metadata which doesn't follow the same path as the data, and there are spatio-temporal properties of the samples which could be recorded. | ||
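If the file names do carry metadata fields, one low-cost option is to extract them into a record that travels with the data. The sketch below is illustrative only: the `<site>_<YYYYMMDD>_<sample-id>` pattern and the `parse_name` function are hypothetical and are not the project's actual convention.

```bash
#!/bin/bash
# Hypothetical convention (an assumption, not taken from the project):
#   <site>_<YYYYMMDD>_<sample-id>.<ext>

# Print the fields encoded in one file name as key=value lines.
# Returns non-zero if the name does not contain all three fields.
parse_name() {
  local base site date sample
  base="$(basename "$1")"
  base="${base%.*}"                       # drop the extension
  IFS=_ read -r site date sample <<< "$base"
  [ -n "$site" ] && [ -n "$date" ] && [ -n "$sample" ] || return 1
  printf 'site=%s\ndate=%s\nsample=%s\n' "$site" "$date" "$sample"
}
```

For example, `parse_name /data/lake1_20240115_s03.tif > lake1_20240115_s03.meta` would write a small sidecar file next to the image, so the metadata can follow the same path as the data.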
|
||
### Could be | ||
|
||
The PC that drives the instrument is connected to the storage network but not to the internet (for security standards compliance reasons). What are the current precedents for either saving output directly to shared storage, or running a watcher process that pulls or pushes data from a lab PC to networked storage? | ||
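As a rough illustration of the "push" variant, the sketch below copies files that are not yet present on the shared storage. It assumes a POSIX-like shell is available on the lab PC and that the storage is mounted as a directory; the function name `push_new_files` and any paths passed to it are hypothetical.

```bash
#!/bin/bash
# Hypothetical sketch: push files that are not yet on shared storage.
# Source and destination paths are supplied by the caller (assumptions).

# Copy every regular file under $1 into $2, preserving relative paths
# and skipping files that already exist at the destination.
push_new_files() {
  local src="${1%/}" dest="${2%/}" file rel
  find "$src" -type f -print0 | while IFS= read -r -d '' file; do
    rel="${file#"$src"/}"                 # path relative to the source root
    if [ ! -e "$dest/$rel" ]; then
      mkdir -p "$(dirname "$dest/$rel")"
      cp -p "$file" "$dest/$rel"
      echo "pushed $rel"
    fi
  done
}
```

Calling `push_new_files` from a scheduler (cron, or Task Scheduler on Windows) would give "push on a schedule". A tool such as `rsync` could replace the copy loop where it is available.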
|
||
An automated workflow (possibly based on Apache Airflow or Beam; the FDRI project is trialling components) watches for new source data, distributes the preprocessing with Dask or Spark if necessary, and continuously publishes analysis-ready data _and metadata_ to cloud storage. | ||
|
||
<object data="could_be/instrument_to_store.svg" type="image/svg+xml"> | ||
</object> | ||
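Before committing to a workflow engine, the task chain can be prototyped as a plain script that runs the stages in order and stops at the first failure. This is only a sketch: the stage names are borrowed from the task-flow diagram, their bodies are placeholders, and `run_pipeline` is a hypothetical helper rather than part of this repository.

```bash
#!/bin/bash
# Hypothetical sketch: run named stages in order, stopping at the first failure.
# Stage bodies are placeholders that only report what they would do.

chunk()   { echo "chunk $1"; }
sift()    { echo "sift $1"; }
profile() { echo "profile $1"; }
upload()  { echo "upload $1"; }

# Run each stage against one input item; return non-zero if any stage fails.
run_pipeline() {
  local item="$1" stage
  for stage in chunk sift profile upload; do
    "$stage" "$item" || { echo "stage $stage failed for $item" >&2; return 1; }
  done
}
```

A workflow engine adds what this sketch lacks: scheduling, retries, parallel execution and a record of which items have already been processed.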
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,20 @@ | ||
--- | ||
# Feel free to add content and custom Front Matter to this file. | ||
# To modify the layout, see https://jekyllrb.com/docs/themes/#overriding-theme-defaults | ||
|
||
layout: home | ||
title: Plankton ML | ||
--- | ||
|
||
# Plankton ML | ||
|
||
This is a small experimental project on automating the analysis of plankton images. It aims to: | ||
|
||
* Inform related work on reproducible analytical pipelines for bioimage machine learning by grounding them in a concrete use case | ||
* Evaluate reusable components (e.g. the Cefas plankton model from scivision) and associated trade-offs | ||
* Evolve a shared template for similar smaller projects undertaken by members of the RSE group in the Environmental Data Service, UK Centre for Ecology and Hydrology | ||
|
||
Please see the associated GitHub repository, which has [outline tasks in Issues](https://github.com/NERC-CEH/plankton_ml/issues) and [prototype work in pull requests](https://github.com/NERC-CEH/plankton_ml/pulls). | ||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
#!/bin/bash | ||
# Copilot generated script to render diagrams as SVG | ||
|
||
# Set the directory paths (relative to the docs/ working directory) | ||
DIR="./diagrams/" | ||
SITE="./_site/diagrams/" | ||
|
||
# Skip a loop entirely when its glob matches nothing | ||
shopt -s nullglob | ||
|
||
# Loop through each subdirectory | ||
for sub_dir in "$DIR"*/; do | ||
  # Loop through each dot file in the subdirectory | ||
  for dotfile in "$sub_dir"*.dot; do | ||
    # Get the base name without extension | ||
    base_name=$(basename "$dotfile" .dot) | ||
    # Mirror the subdirectory under the built site | ||
    dir_path="$SITE${sub_dir#"$DIR"}" | ||
    mkdir -p "$dir_path" | ||
    output="$dir_path$base_name.svg" | ||
|
||
    # Render the dot file to SVG | ||
    dot -Tsvg "$dotfile" -o "$output" | ||
|
||
    # Print a success message | ||
    echo "Rendered $dotfile to $output" | ||
  done | ||
done | ||
|
||
|