Update ontology summary (#2329)
* enable release diff in odk config

* run release simple diff and post diff in PR

* fix release artefact uri and diff filename

* add edges, synonyms, xrefs and cl_terms reports to odk config

* update custom reports to only cl

* wip ontology content report

* add cxg and hra numbers

* revert default report sparql queries

* create custom reports for CL only

* add custom CL reports to odk config

* script to generate content summary

* add command to generate content summary as part of release

* revert changes in edges, synonyms and xrefs reports

* add instructions to add summary table in release notes

* use cl-base to run diff

* change file name and actions version

* update to commit cl-base-diff and not full release diff

* add report on diff between release and previous release

The report is generated by the OAK diff command using the base releases
for better comparison. It shows new terms, new relationships, obsolete
terms, and changes to synonyms and definitions. The report is appended
to the ontology content summary table and saved in the
reports/summary_release.md file to be used as a release note.
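
The append step described above can be sketched in Python as follows. This is only an illustration, assuming the two Markdown files already exist; the function name `build_summary` is hypothetical, and the commit itself performs this step with `cat` in `cl.Makefile`.

```python
from pathlib import Path


def build_summary(content_md: Path, diff_md: Path, out_md: Path) -> str:
    """Concatenate the content table and the diff report into one file.

    Hypothetical helper: the real release uses `cat` in the Makefile.
    """
    # Keep the order used for the release note: summary table first, diff second.
    combined = content_md.read_text(encoding="utf-8") + diff_md.read_text(encoding="utf-8")
    out_md.write_text(combined, encoding="utf-8")
    return combined
```

A call such as `build_summary(Path("reports/ontology_content.md"), Path("reports/diff_release_oak.md"), Path("reports/summary_release.md"))` would mirror the file names used in the Makefile rule.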

* update cl-release docs to reflect the new release process

The release notes need to be updated manually, and the docs now explain
how to work around a current issue in the OAK diff command when
generating the report.

* rewording explanation in readme

Co-authored-by: Aleix Puig <[email protected]>

* add missing dependency in prepare_content_summary goal

The file is used in the rule but was not declared as a dependency, so the rule could run without the file being brought up to date first.

Co-authored-by: Nico Matentzoglu <[email protected]>

* update the sparql queries for the custom reports

Filter out obsolete classes and the obsolete CP namespace from the
queries so that they are not counted in the custom reports, and
therefore not in the ontology content summary generated for releases.

* use cl-base.obo to generate robot release base diff

We need to download the cl-base.obo to generate the output for the OAK
diff command, so we can use the same artefact to generate the robot diff
instead of downloading another artefact. This also adds the two dependencies
for the `release-base-diff` target to make sure the files are updated.

* improve the documentation about CL release workflow

Update the link to the documentation on how to update the imports,
because the previous one pointed to a non-existent page.
Use inline code syntax instead of a code block for the GitHub release
link, because the code block was breaking the list and resetting its
numbering.
Finally, undo the renumbering of the last three list items that was
mistakenly made in the previous commit.

---------

Co-authored-by: Anita Caron <[email protected]>
Co-authored-by: Aleix Puig <[email protected]>
Co-authored-by: Nico Matentzoglu <[email protected]>
4 people authored Jun 12, 2024
1 parent 3d32380 commit 47e9202
Showing 14 changed files with 313 additions and 22 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/post-release-diff.yaml
@@ -5,7 +5,7 @@ on:
pull_request:
branches: [ master ]
paths:
- 'src/ontology/diffs/cl-diff.md'
- 'src/ontology/reports/cl-base-diff.md'

# Allows you to run this workflow manually from the Actions tab
workflow_dispatch:
@@ -15,15 +15,15 @@ jobs:
post_diff:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4
- name: Prepare release comment
env:
GITHUB_SHA: ${{ github.sha }}
run: "echo \"[Here's a diff of how this release impacts cl.owl](https://github.com/obophenotype/cell-ontology/blob/${{ env.GITHUB_SHA }}/src/ontology/diffs/cl-diff.md)\" >comment.md"
run: "echo \"[Here's a diff of how this release impacts cl-base.owl](https://github.com/obophenotype/cell-ontology/blob/${{ env.GITHUB_SHA }}/src/ontology/reports/cl-base-diff.md)\" >comment.md"
- name: Post reasoned comment
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
uses: NejcZdovc/comment-pr@v1.1.1
uses: NejcZdovc/comment-pr@v2
with:
github_token: ${{ env.GITHUB_TOKEN }}
file: "../../comment.md"
1 change: 1 addition & 0 deletions .gitignore
@@ -46,6 +46,7 @@ src/patterns/pattern_owl_seed.txt
src/ontology/ontologyterms.txt
src/ontology/simple_seed.txt
src/ontology/reports/*
!src/ontology/reports/cl-base-diff.md
src/ontology/cl-hipc.owl
site/
src/ontology/cl-check.obo
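
The effect of the new `!src/ontology/reports/cl-base-diff.md` line can be illustrated with a small Python sketch. It is a deliberately simplified model of gitignore matching (the last matching pattern decides, and a leading `!` re-includes a path), not Git's actual implementation; `is_ignored` is a hypothetical name, and `fnmatchcase` is only an approximation of Git's glob rules.

```python
from fnmatch import fnmatchcase

# Patterns taken from the .gitignore hunk above, in file order.
PATTERNS = [
    "src/ontology/reports/*",
    "!src/ontology/reports/cl-base-diff.md",
]


def is_ignored(path, patterns=PATTERNS):
    """Simplified model: the last pattern that matches decides the outcome."""
    ignored = False
    for pattern in patterns:
        negated = pattern.startswith("!")
        glob = pattern[1:] if negated else pattern
        if fnmatchcase(path, glob):
            # A negated pattern re-includes the path; a plain one ignores it.
            ignored = not negated
    return ignored
```

Under this model `reports/cl-base-diff.md` stays tracked while every other file in `reports/`, including `summary_release.md`, remains ignored.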
8 changes: 3 additions & 5 deletions docs/cl-release.md
@@ -41,7 +41,7 @@ Preparation:
1. Make sure that all changes to master are committed to Github (`git status` should say that there are no modified files)
1. Locally make sure you have the latest changes from master (`git pull`)
1. Checkout a new branch (e.g. `git checkout -b release-2021-01-01`)
1. You may or may not want to refresh your imports as part of your release strategy (see [here](UpdateImports.md))(Note: in CL we decouple our imports and releases - we hence advice that you do not update imports)
1. You may or may not want to refresh your imports as part of your release strategy (see second section [here](Adding_classes_from_another_ontology.md))(Note: in CL we decouple our imports and releases - we hence advice that you do not update imports)
1. Make sure you have the latest ODK installed by running `docker pull obolibrary/odkfull`

To actually run the release, you:
@@ -54,12 +54,10 @@ To actually run the release, you:
1. Deploy release on GitHub by running `make deploy_release GHVERSION="v2022-06-20"` on the release branch (DO NOTE CHANGE TO MAIN BRANCH!), replacing the date with the date of release (NOTE: no `sh run.sh`)
Editors note: ODK 1.3.2 will have a feature to run the release from inside the docker container. For now deploy_release has to be run outside.
1. This should end with a GitHub release link that looks something like:
```
https://github.com/obophenotype/cl/releases/tag/untagged-8935f3432525b27a0d84
```
`https://github.com/obophenotype/cl/releases/tag/untagged-8935f3432525b27a0d84`
Copy the link and paste it in your browser, this should show you a draft release.
1. Click the edit button (the pencil button on the top right corner) and change the tag to the GHVERSION you entered above (eg v2022-06-20)
1. Change the `TBD.` in the main text to a summary of the main changes in the release if needed.
1. Change the `TBD.` in the main text to a summary of the main changes in the release if needed. Copy and paste the text and table from the `reports/summary_release.md` file. This file is in `.gitignore` and will only be available to those who have run the release. The section `Classes added` needs to be manually amended due a [known issue](https://github.com/INCATools/ontology-access-kit/issues/732) in the OAK diff command. Remove the duplicated classes and update the number of new classes created.
1. Scroll down all the way and click the `update release` button.
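
The manual clean-up of the `Classes added` section could be assisted by a small script. The sketch below only assumes that each added class appears as one text line; the exact layout of the OAK Markdown output is not shown in this commit, and `dedupe_entries` is a hypothetical helper, not part of OAK or the release tooling.

```python
def dedupe_entries(entries):
    """Drop blank and repeated entries, keeping first-seen order.

    Returns the unique entries and their count, which is the number
    to report as new classes.
    """
    cleaned = (entry.strip() for entry in entries)
    # dict.fromkeys preserves insertion order while removing duplicates.
    unique = list(dict.fromkeys(entry for entry in cleaned if entry))
    return unique, len(unique)
```

For example, `dedupe_entries(["term A", "term B", "term A"])` yields two unique entries, so the class count in the release notes would be corrected to 2.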


6 changes: 3 additions & 3 deletions src/ontology/Makefile
@@ -10,7 +10,7 @@
# More information: https://github.com/INCATools/ontology-development-kit/

# Fingerprint of the configuration file when this Makefile was last generated
CONFIG_HASH= 8b5b779b91f8bb931caf3512d6c7fcb325ef83bafc7ba409b86058a9dae7f67f
CONFIG_HASH= b786b0d7cbd09184896d55b42fe68a40981419e0c9d848963a74348b7bb955b7


# ----------------------------------------
@@ -45,7 +45,7 @@ REPORT_LABEL =
REPORT_PROFILE_OPTS =
OBO_FORMAT_OPTIONS =
SPARQL_VALIDATION_CHECKS = equivalent-classes owldef-self-reference nolabels pmid-not-dbxref obsolete-replaced_by obsolete-alt-id orcid-contributor illegal-annotation-property label-synonym-polysemy illegal-date
SPARQL_EXPORTS = basic-report
SPARQL_EXPORTS = cl_terms cl-edges cl-synonyms cl-xrefs cl-def-xrefs
ODK_VERSION_MAKEFILE = v1.5

TODAY ?= $(shell date +%Y-%m-%d)
@@ -87,7 +87,7 @@ endif
all: all_odk

.PHONY: all_odk
all_odk: odkversion config_check test custom_reports all_assets
all_odk: odkversion config_check test custom_reports all_assets release_diff

.PHONY: test
test: odkversion dosdp_validation reason_test sparql_test robot_reports $(REPORTDIR)/validate_profile_owl2dl_$(ONT).owl.txt
9 changes: 7 additions & 2 deletions src/ontology/cl-odk.yaml
@@ -8,6 +8,7 @@ report_fail_on: None
use_dosdps: TRUE
use_mappings: True
use_edit_file_imports: FALSE
release_diff: TRUE
export_formats:
- owl
- obo
@@ -112,8 +113,12 @@ robot_report:
- illegal-annotation-property
- label-synonym-polysemy
- illegal-date
custom_sparql_exports :
- basic-report
custom_sparql_exports:
- cl_terms
- cl-edges
- cl-synonyms
- cl-xrefs
- cl-def-xrefs
components:
products:
- filename: hra_subset.owl
22 changes: 17 additions & 5 deletions src/ontology/cl.Makefile
@@ -337,13 +337,25 @@ DEPLOY_GH=true

.PHONY: cl
cl:
$(MAKE) prepare_release IMP=false PAT=false
$(MAKE) release-diff
$(MAKE) prepare_release IMP=false PAT=false MIR=false
$(MAKE) release-base-diff
$(MAKE) prepare_content_summary
if [ $(DEPLOY_GH) = true ]; then $(MAKE) deploy_release GHVERSION="v$(TODAY)"; fi

.PHONY: release-diff
release-diff:
$(ROBOT) diff --labels True -f markdown --left-iri http://purl.obolibrary.org/obo/cl.owl --right ../../cl.owl --output diffs/$(ONT)-diff.md
CURRENT_BASE_RELEASE=$(ONTBASE)/cl-base.obo

$(TMPDIR)/current-base-release.obo:
wget $(CURRENT_BASE_RELEASE) -O $@

.PHONY: release-base-diff
release-base-diff: $(TMPDIR)/current-base-release.obo $(RELEASEDIR)/cl-base.obo
$(ROBOT) diff --labels True -f markdown --left $(TMPDIR)/current-base-release.obo --right $(RELEASEDIR)/cl-base.obo --output reports/$(ONT)-base-diff.md

.PHONY: prepare_content_summary
prepare_content_summary: $(RELEASEDIR)/cl-base.owl $(RELEASEDIR)/cl-base.obo $(TMPDIR)/current-base-release.obo custom_reports
python ./$(SCRIPTSDIR)/content_summary.py --ontology_iri $< --ont_namespace "CL" > $(REPORTDIR)/ontology_content.md
runoak -i simpleobo:$(TMPDIR)/current-base-release.obo diff -X simpleobo:$(RELEASEDIR)/cl-base.obo -o $(REPORTDIR)/diff_release_oak.md --output-type md
cat $(REPORTDIR)/ontology_content.md $(REPORTDIR)/diff_release_oak.md > $(REPORTDIR)/summary_release.md

FILTER_OUT=../patterns/definitions.owl ../patterns/pattern.owl reports/cl-edit.owl-obo-report.tsv
MAIN_FILES_RELEASE = $(foreach n, $(filter-out $(FILTER_OUT), $(RELEASE_ASSETS)), ../../$(n)) \
2 changes: 1 addition & 1 deletion src/ontology/reports/edges.tsv
@@ -2775,4 +2775,4 @@
<http://purl.obolibrary.org/obo/CL_0000662> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/CL_0000468>
<http://purl.obolibrary.org/obo/CL_0000218> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/CL_0000217>
<http://purl.obolibrary.org/obo/CL_0002136> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/CL_0000460>
<http://purl.obolibrary.org/obo/CL_0002303> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/CL_0000529>
<http://purl.obolibrary.org/obo/CL_0002303> <http://www.w3.org/2000/01/rdf-schema#subClassOf> <http://purl.obolibrary.org/obo/CL_0000529>
2 changes: 1 addition & 1 deletion src/ontology/reports/synonyms.tsv
@@ -4598,4 +4598,4 @@
<http://purl.obolibrary.org/obo/CL_0000919> <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "CD8-positive, CD25-positive Treg"
<http://purl.obolibrary.org/obo/CL_0002299> <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "pale thymic epithelial cell"
<http://purl.obolibrary.org/obo/CL_0000705> <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "R6 cell"
<http://purl.obolibrary.org/obo/CL_1000323> <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "goblet cell of epithelium of pyloric gland"
<http://purl.obolibrary.org/obo/CL_1000323> <http://www.geneontology.org/formats/oboInOwl#hasExactSynonym> "goblet cell of epithelium of pyloric gland"
2 changes: 1 addition & 1 deletion src/ontology/reports/xrefs.tsv
@@ -1439,4 +1439,4 @@
<http://purl.obolibrary.org/obo/CL_1000323> "FMA:263061"
<http://purl.obolibrary.org/obo/CL_0000636> "BTO:0003064"
<http://purl.obolibrary.org/obo/CL_1000342> "FMA:263102"
<http://purl.obolibrary.org/obo/CL_1001220> "KUPO:0001086"
<http://purl.obolibrary.org/obo/CL_1001220> "KUPO:0001086"
198 changes: 198 additions & 0 deletions src/scripts/content_summary.py
@@ -0,0 +1,198 @@
""" Script to summarize content in an ontology """
import argparse
from datetime import datetime

import pandas as pd
from rdflib import Graph


class OntologyContentReport:
"""Generic class for summarizing content in an ontology"""

def __init__(self, ontology_iri, ont_namespace):
"""
Initialize the OntologyContentReport object.
Args:
ontology_iri (str): The IRI or filepath of the ontology to summarize.
ont_namespace (str): The namespace of the ontology.
"""
self.ontology_iri = ontology_iri
self.ont_namespace = ont_namespace
self.g = self._init_graph(ontology_iri)
self.date = datetime.now().strftime("%Y-%m-%d")
self.nb_subclass_root = None
self.nb_annotations = None
self.nb_synonyms = None
self.nb_references = None
self.nb_def_references = None
self.nb_relationships = None
self.nb_cxg = None
self.nb_hra = None

def _init_graph(self, ontology_iri):
"""
Load the given ontology into a Graph object.
Args:
ontology_iri (str): The IRI or filepath of the ontology.
Returns:
rdflib.Graph: The loaded ontology graph.
"""
g = Graph()
g.parse(ontology_iri, format="xml")
return g

def query(self, query):
"""
Execute a SPARQL query on the ontology graph.
Args:
query (str): The SPARQL query to execute.
Returns:
int: The count of query results.
"""
response = self.g.query(query)
return response.bindings[0]["count"]

def get_content_summary(self):
"""
Query the ontology graph to get the content summary.
"""
self.nb_subclass_root = self.query(f"""
SELECT (COUNT (DISTINCT ?class) AS ?count)
WHERE {{
?ont rdf:type owl:Ontology .
?ont <http://purl.obolibrary.org/obo/IAO_0000700> ?root .
?class rdfs:subClassOf* ?root .
FILTER (STRSTARTS(STR(?class), "http://purl.obolibrary.org/obo/{self.ont_namespace}_"))
}}
""")

self.nb_annotations = self.query(f"""
SELECT (COUNT (?annotation) AS ?count)
WHERE {{
?annotation rdf:type owl:AnnotationProperty .
?class rdf:type owl:Class .
?class ?annotation ?value .
FILTER (STRSTARTS(STR(?class), "http://purl.obolibrary.org/obo/{self.ont_namespace}_"))
}}
""")

self.nb_cxg = self.query(f"""
SELECT (COUNT (?cxg) AS ?count)
WHERE {{
?cxg rdf:type owl:Class .
?cxg <http://www.geneontology.org/formats/oboInOwl#inSubset> <http://purl.obolibrary.org/obo/cl#cellxgene_subset> .
FILTER (STRSTARTS(STR(?cxg), "http://purl.obolibrary.org/obo/{self.ont_namespace}_"))
}}
""")

self.nb_hra = self.query(f"""
SELECT (COUNT (?hra) AS ?count)
WHERE {{
?hra rdf:type owl:Class .
?hra <http://www.geneontology.org/formats/oboInOwl#inSubset> <http://purl.obolibrary.org/obo/uberon/core#human_reference_atlas> .
FILTER (STRSTARTS(STR(?hra), "http://purl.obolibrary.org/obo/{self.ont_namespace}_"))
}}
""")

self.nb_synonyms = self.count_report(
self.load_report(f"{self.ont_namespace.lower()}-synonyms")
)

self.nb_relationships = self.count_report(
self.load_report(f"{self.ont_namespace.lower()}-edges")
)

self.nb_references = self.count_report(self.load_report(
f"{self.ont_namespace.lower()}-xrefs")["?xref"].unique()
)

self.nb_def_references = self.count_report(
self.load_report(
f"{self.ont_namespace.lower()}-def-xrefs"
)["?xref"].unique()
)

def load_report(self, report_type):
"""
Load a report from a file.
Args:
report_type (str): The type of report to load.
Returns:
pandas.DataFrame: The loaded report data.
"""
return pd.read_csv(f"reports/{report_type}.tsv", sep="\t")

def count_report(self, data):
"""
Count the number of rows in a report.
Args:
data (pandas.DataFrame): The report data.
Returns:
int: The number of rows in the report.
"""
return len(data)

def prepare_report(self):
"""
Prepare the content summary report for printing.
"""
print(f"# Release Notes {self.date}")
print("## Ontology content summary")

summary_table = [
{
"Metric": "Number of subclasses of root",
"Value": self.nb_subclass_root
},
{
"Metric": f"Number of annotations on {self.ont_namespace} terms",
"Value": self.nb_annotations
},
{
"Metric": "Number of synonyms",
"Value": self.nb_synonyms
},
{
"Metric": "Number of unique references",
"Value": self.nb_references
},
{
"Metric": "Number of unique references in definitions",
"Value": self.nb_def_references
},
{
"Metric": f"Number of relationships with {self.ont_namespace} term as subject",
"Value": self.nb_relationships
},
{
"Metric": "Number of cellxgene classes",
"Value": self.nb_cxg
},
{
"Metric": "Number of HRA classes",
"Value": self.nb_hra
}
]

print(pd.DataFrame(summary_table).to_markdown(index=False))


if __name__ == "__main__":
cli = argparse.ArgumentParser()
cli.add_argument("--ontology_iri", type=str, help="IRI or filepath of ontology to summarize")
cli.add_argument("--ont_namespace", type=str, help="Ontology namespace")

args = cli.parse_args()

report = OntologyContentReport(args.ontology_iri, args.ont_namespace)
report.get_content_summary()
report.prepare_report()
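
The "unique references" metrics above are computed with pandas by loading a TSV report and taking the distinct values of the `?xref` column. As a rough, standard-library-only sketch of the same counting idea (a swapped-in technique, not what the script does), assuming a tab-separated report whose header row names the column; `count_unique` is a hypothetical helper:

```python
import csv
import io


def count_unique(tsv_text, column):
    """Count distinct values in one column of a tab-separated report."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # A set keeps each value once, like Series.unique() in the script.
    return len({row[column] for row in reader})
```

Two classes sharing the same cross-reference therefore contribute one reference, not two, which is why these metrics can be much smaller than the number of rows in the report.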
16 changes: 16 additions & 0 deletions src/sparql/cl-def-xrefs.sparql
@@ -0,0 +1,16 @@
prefix oio: <http://www.geneontology.org/formats/oboInOwl#>
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix definition: <http://purl.obolibrary.org/obo/IAO_0000115>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?cls ?xref WHERE
{
?cls definition: ?def .
?ax a owl:Axiom;
owl:annotatedSource ?cls;
owl:annotatedProperty definition:;
owl:annotatedTarget ?def;
oio:hasDbXref ?xref .
FILTER NOT EXISTS { ?cls owl:deprecated "true"^^xsd:boolean . }
FILTER(isIRI(?cls) && STRSTARTS(str(?cls), "http://purl.obolibrary.org/obo/CL_") || STRSTARTS(str(?cls), "http://purl.obolibrary.org/obo/cl#"))
}
21 changes: 21 additions & 0 deletions src/sparql/cl-edges.sparql
@@ -0,0 +1,21 @@
prefix owl: <http://www.w3.org/2002/07/owl#>
prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>
prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
prefix xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?x ?p ?y
WHERE {
{?x rdfs:subClassOf [
a owl:Restriction ;
owl:onProperty ?p ;
owl:someValuesFrom ?y ]
}
UNION {
?x rdfs:subClassOf ?y .
BIND(rdfs:subClassOf AS ?p)
}
?x a owl:Class .
?y a owl:Class .
FILTER NOT EXISTS { ?x owl:deprecated "true"^^xsd:boolean . }
FILTER(isIRI(?x) && STRSTARTS(str(?x), "http://purl.obolibrary.org/obo/CL_") || STRSTARTS(str(?x), "http://purl.obolibrary.org/obo/cl#"))
}
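
One detail of the namespace `FILTER` used in both queries is worth spelling out: in SPARQL, `&&` binds more tightly than `||`, so `isIRI(?x) && A || B` groups as `(isIRI(?x) && A) || B`. The Python sketch below mirrors that grouping, since `and` likewise binds more tightly than `or`. The function names are hypothetical, and whether the grouping changes any result depends on the data; subjects in these reports are normally IRIs, so the two forms may well agree in practice.

```python
PREFIX_CL = "http://purl.obolibrary.org/obo/CL_"
PREFIX_CL_HASH = "http://purl.obolibrary.org/obo/cl#"


def filter_as_written(term, is_iri):
    # Same grouping as the query: (is_iri and CL_ prefix) or cl# prefix.
    return is_iri and term.startswith(PREFIX_CL) or term.startswith(PREFIX_CL_HASH)


def filter_with_parentheses(term, is_iri):
    # Alternative grouping: the IRI check applies to both prefixes.
    return is_iri and (term.startswith(PREFIX_CL) or term.startswith(PREFIX_CL_HASH))
```

The two functions differ only for a non-IRI term whose string form starts with the `cl#` prefix; adding explicit parentheses in the query would make the intended grouping unambiguous either way.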