Merge pull request #78 from simleo/more_queries

Add SPARQL queries for CQ7, CQ8, CQ9
ResearchObject · May 31, 2024 · a6a6350
2 parents 3285eb1 + 7187c02

Showing 5 changed files with 182 additions and 2 deletions.
diff --git a/docs/requirements.md b/docs/requirements.md
@@ -27,8 +27,8 @@ id | CQ description | Existing/new terms | Rationale | Profile[^1] | Issue # |
  CQ5 | How long does this workflow component take to run? | [totalTime](http://schema.org/totalTime)? Allowed on [HowTo](http://schema.org/HowTo) and [HowToDirection](http://schema.org/HowToDirection) but not on [HowToStep](http://schema.org/HowToStep). Can also get actual duration from [endTime](http://schema.org/endTime) - [startTime](http://schema.org/startTime) on the action | If a workflow step is computationally expensive, I may need to get an estimate for impatient users, or show a warning | 1, 3 | [~~13~~](https://github.com/ResearchObject/workflow-run-crate/issues/13) |
  CQ6 | How long does this workflow take to run? | [totalTime](http://schema.org/totalTime). Can also get actual duration from [endTime](http://schema.org/endTime) - [startTime](http://schema.org/startTime) on the action | Same as CQ5, but with the full workflow | 2, 3 | [~~14~~](https://github.com/ResearchObject/workflow-run-crate/issues/14) |
  CQ7 | Was the execution successful? | [actionStatus](http://schema.org/actionStatus) to [FailedActionStatus](http://schema.org/FailedActionStatus) or [CompletedActionStatus](http://schema.org/CompletedActionStatus) - can also provide [error](http://schema.org/error) | Needed to know whether or not to retrieve the results | 1, 2, 3 | [~~15~~](https://github.com/ResearchObject/workflow-run-crate/issues/15) |
- CQ8 | What are the inputs and outputs of the overall workflow (I don't care about the intermediate results) | [object](http://schema.org/object) and [result](http://schema.org/result) on the workflow run action | High level representation of the workflow execution | 2, 3 | [~~16~~](https://github.com/ResearchObject/workflow-run-crate/issues/16) |
- CQ9 | What is the source code version of the component executed in a workflow step? Is it a script? and executable? | [softwareVersion](http://schema.org/softwareVersion), though getting the version of the actual tool (e.g., `grep`) that was called by the wrapper might not be easy | Knowing which release/software version was used (reproducibility) | 1, 3 | [~~17~~](https://github.com/ResearchObject/workflow-run-crate/issues/17) |
+ CQ8 | What are the inputs and outputs of the overall workflow? | [object](http://schema.org/object) and [result](http://schema.org/result) on the workflow run action | High level representation of the workflow execution | 2, 3 | [~~16~~](https://github.com/ResearchObject/workflow-run-crate/issues/16) |
+ CQ9 | What is the source code version of the component executed in a workflow step? | [softwareVersion](http://schema.org/softwareVersion), though getting the version of the actual tool (e.g., `grep`) that was called by the wrapper might not be easy | Knowing which release/software version was used (reproducibility) | 1, 3 | [~~17~~](https://github.com/ResearchObject/workflow-run-crate/issues/17) |
  CQ10 | What is the script used to wrap up a software component? | We're mapping tool wrappers (e.g., `foo.cwl`) to [SoftwareApplication](http://schema.org/SoftwareApplication). Wrappers at lower levels can also be `SoftwareApplication`, but we need to draw the line somewhere | Many executables are complicated and need an additional script to wrap or simplify them. For example, a "run.sh" script that exposes a simpler set of parameters and fixes another set. | 3 | [~~18~~](https://github.com/ResearchObject/workflow-run-crate/issues/18) |
  CQ11 | How were workflow parameters used in tool runs? | We're linking tool params directly (with [connectedTo](http://schema.org/connectedTo)), but that's inaccurate since those links only exist within a workflow. | Knowing how workflow parameters were passed to individual tools to find out how they affected the outputs | 3 | [~~25~~](https://github.com/ResearchObject/workflow-run-crate/issues/25) |
 

diff --git a/docs/sparql/cq7.py b/docs/sparql/cq7.py
@@ -0,0 +1,30 @@
+"""\
+This script contains the SPARQL query for Competency Question 7 "Was the
+execution successful?". In the discussion on
+https://github.com/ResearchObject/workflow-run-crate/issues/15 we decided to
+represent this by adding an "actionStatus" property to actions, and consider
+an execution successful if its value is "CompletedActionStatus" and not
+successful if the value is "FailedActionStatus".
+"""
+
+import rdflib
+from pathlib import Path
+
+CRATE = Path("crate")
+
+g = rdflib.Graph()
+g.parse(CRATE / "ro-crate-metadata.json", format="json-ld")
+
+QUERY = """\
+PREFIX s: <http://schema.org/>
+
+SELECT ?action ?status
+WHERE {
+?action a s:CreateAction .
+?action s:actionStatus ?status .
+}
+"""
+
+qres = g.query(QUERY)
+for row in qres:
+    print(f"{row.action}, {row.status}")
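
The cq7.py query above prints the raw status IRI for each action and leaves the interpretation to the reader. A minimal sketch of how a caller might turn that value into a success flag, following the rule stated in the docstring. It assumes statuses arrive as full schema.org IRIs; the helper name `was_successful` and the sample rows are hypothetical and not part of this PR.

```python
from typing import Optional

SCHEMA = "http://schema.org/"


def was_successful(status_iri: str) -> Optional[bool]:
    """Map an actionStatus IRI to True (completed), False (failed) or None (other)."""
    status = str(status_iri)
    if status == SCHEMA + "CompletedActionStatus":
        return True
    if status == SCHEMA + "FailedActionStatus":
        return False
    # Any other status (e.g. ActiveActionStatus): the outcome is not known yet
    return None


# Hypothetical (action, status) pairs shaped like the rows printed by cq7.py
rows = [
    ("#run_1", SCHEMA + "CompletedActionStatus"),
    ("#run_2", SCHEMA + "FailedActionStatus"),
    ("#run_3", SCHEMA + "ActiveActionStatus"),
]
outcomes = {action: was_successful(status) for action, status in rows}
for action, ok in outcomes.items():
    print(action, ok)
```

Returning `None` for statuses other than the two named in the docstring keeps "still running" distinct from "failed", which matters when deciding whether to retrieve results.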
diff --git a/docs/sparql/cq8.py b/docs/sparql/cq8.py
@@ -0,0 +1,53 @@
+"""\
+This script contains the SPARQL query for Competency Question 8 "What are the
+inputs and outputs of the overall workflow?". In the discussion on
+https://github.com/ResearchObject/workflow-run-crate/issues/16 we identified
+them as the "object" and "result" of the action corresponding to the
+workflow's execution.
+"""
+
+import rdflib
+from pathlib import Path
+
+CRATE = Path("crate")
+
+g = rdflib.Graph()
+g.parse(CRATE / "ro-crate-metadata.json", format="json-ld")
+
+QUERY = """\
+PREFIX s: <http://schema.org/>
+PREFIX bioschemas: <https://bioschemas.org/>
+
+SELECT ?obj
+WHERE {
+?action a s:CreateAction .
+?workflow a bioschemas:ComputationalWorkflow .
+?action s:instrument ?workflow .
+OPTIONAL { ?action s:object ?obj } .
+}
+"""
+
+qres = g.query(QUERY)
+print("INPUTS")
+print("======")
+for row in qres:
+    print(row.obj)
+
+QUERY = """\
+PREFIX s: <http://schema.org/>
+PREFIX bioschemas: <https://bioschemas.org/>
+
+SELECT ?res
+WHERE {
+?action a s:CreateAction .
+?workflow a bioschemas:ComputationalWorkflow .
+?action s:instrument ?workflow .
+OPTIONAL { ?action s:result ?res } .
+}
+"""
+
+qres = g.query(QUERY)
+print("OUTPUTS")
+print("=======")
+for row in qres:
+    print(row.res)
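
The two queries in cq8.py differ only in the property they follow (`object` for inputs, `result` for outputs). A sketch of one way to build both from a single template, keeping the `OPTIONAL` pattern used in the PR. The names `QUERY_TEMPLATE`, `ALLOWED_PROPS` and `build_io_query` are hypothetical, and restricting the property to a fixed set is an assumption made here to avoid formatting arbitrary text into a query.

```python
QUERY_TEMPLATE = """\
PREFIX s: <http://schema.org/>
PREFIX bioschemas: <https://bioschemas.org/>

SELECT ?value
WHERE {{
?action a s:CreateAction .
?workflow a bioschemas:ComputationalWorkflow .
?action s:instrument ?workflow .
OPTIONAL {{ ?action s:{prop} ?value }} .
}}
"""

# Only the two properties discussed for CQ8 are accepted
ALLOWED_PROPS = ("object", "result")


def build_io_query(prop: str) -> str:
    """Return the CQ8 query text for 'object' (inputs) or 'result' (outputs)."""
    if prop not in ALLOWED_PROPS:
        raise ValueError(f"unsupported property: {prop!r}")
    return QUERY_TEMPLATE.format(prop=prop)


inputs_query = build_io_query("object")
outputs_query = build_io_query("result")
print(inputs_query)
```

Each returned string can be passed to `g.query(...)` as in cq8.py; rows where the `OPTIONAL` pattern does not match carry an unbound `?value`.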
diff --git a/docs/sparql/cq9.py b/docs/sparql/cq9.py
@@ -0,0 +1,32 @@
+"""\
+This script contains the SPARQL query for Competency Question 9 "What is the
+source code version of the component executed in a workflow step?". In
+https://github.com/ResearchObject/workflow-run-crate/pull/42 we ended up using
+"softwareVersion" with a fallback on "version" on the "SoftwareApplication"
+entity, which is used both in Process Run Crates and Provenance Run Crates for
+individual tools.
+"""
+
+import rdflib
+from pathlib import Path
+
+CRATE = Path("process_run_crate")
+
+g = rdflib.Graph()
+g.parse(CRATE / "ro-crate-metadata.json", format="json-ld")
+
+QUERY = """\
+PREFIX s: <http://schema.org/>
+
+SELECT ?name ?version
+WHERE {
+?app a s:SoftwareApplication .
+?app s:name ?name .
+OPTIONAL { ?app s:softwareVersion ?version } .
+OPTIONAL { ?app s:version ?version } .
+}
+"""
+
+qres = g.query(QUERY)
+for row in qres:
+    print(row.name, row.version)
diff --git a/docs/sparql/process_run_crate/ro-crate-metadata.json b/docs/sparql/process_run_crate/ro-crate-metadata.json
@@ -0,0 +1,65 @@
+{
+    "@context": "https://w3id.org/ro/crate/1.1/context", 
+    "@graph": [
+        {
+            "@id": "ro-crate-metadata.json",
+            "@type": "CreativeWork",
+            "conformsTo": {"@id": "https://w3id.org/ro/crate/1.1"},
+            "about": {"@id": "./"}
+        },
+        {
+            "@id": "./",
+            "@type": "Dataset",
+            "conformsTo": {"@id": "https://w3id.org/ro/wfrun/process/0.1"},
+            "hasPart": [
+                {"@id": "pics/2017-06-11%2012.56.14.jpg"},
+                {"@id": "pics/sepia_fence.jpg"}
+            ],
+            "mentions": {"@id": "#SepiaConversion_1"},
+            "name": "My Pictures"
+        },
+        {
+            "@id": "https://w3id.org/ro/wfrun/process/0.1",
+            "@type": "CreativeWork",
+            "name": "Process Run Crate",
+            "version": "0.1"
+        },
+        {
+            "@id": "https://www.imagemagick.org/",
+            "@type": "SoftwareApplication",
+            "url": "https://www.imagemagick.org/",
+            "name": "ImageMagick",
+            "softwareVersion": "6.9.7-4"
+        },
+        {
+            "@id": "#SepiaConversion_1",
+            "@type": "CreateAction",
+            "name": "Convert dog image to sepia",
+            "description": "convert -sepia-tone 80% test_data/sample/pics/2017-06-11\\ 12.56.14.jpg test_data/sample/pics/sepia_fence.jpg",
+            "endTime": "2018-09-19T17:01:07+10:00",
+            "instrument": {"@id": "https://www.imagemagick.org/"},
+            "object": {"@id": "pics/2017-06-11%2012.56.14.jpg"},
+            "result": {"@id": "pics/sepia_fence.jpg"},
+            "agent": {"@id": "https://orcid.org/0000-0001-9842-9718"}
+        },
+        {
+            "@id": "pics/2017-06-11%2012.56.14.jpg",
+            "@type": "File",
+            "description": "Original image",
+            "encodingFormat": "image/jpeg",
+            "name": "2017-06-11 12.56.14.jpg (input)"
+        },
+        {
+            "@id": "pics/sepia_fence.jpg",
+            "@type": "File",
+            "description": "The converted picture, now sepia-colored",
+            "encodingFormat": "image/jpeg",
+            "name": "sepia_fence (output)"
+        },
+        {
+            "@id": "https://orcid.org/0000-0001-9842-9718",
+            "@type": "Person",
+            "name": "Stian Soiland-Reyes"
+        }
+    ]
+}