Merge branch 'database'
- Integrated event sourcing with PostgreSQL for tracking biomero
  workflows and tasks.
- Added database views for job progress, workflow statistics, and
  task-to-job mapping.
- Enhanced testing with in-memory SQLite support and custom mocks
  for event sourcing.
- Migrated to a SQLAlchemy backend with scoped sessions and
  improved configurability.
- Improved logging, documentation, and test coverage for new and
  existing components.
TorecLuik committed Dec 3, 2024
2 parents eed9fd3 + baf5ae3 commit e1494ed
Showing 14 changed files with 3,884 additions and 158 deletions.
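The headline change is the new SQLAlchemy-backed persistence layer (biomero/database.py, below). A minimal sketch of how it might be configured, assuming a hypothetical PostgreSQL target; EngineManager falls back to the SQLALCHEMY_URL environment variable when no URL is passed:

import os
from biomero.database import EngineManager

# Hypothetical PostgreSQL target for production use:
os.environ['SQLALCHEMY_URL'] = 'postgresql+psycopg2://biomero:secret@localhost:5432/biomero'
# ...or the in-memory SQLite that the test suite relies on:
# os.environ['SQLALCHEMY_URL'] = 'sqlite:///:memory:'

# Creates the view tables if needed and returns the scoped-session topic
topic = EngineManager.create_scoped_session()
session = EngineManager.get_session()
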
8 changes: 6 additions & 2 deletions .github/workflows/python-package.yml
@@ -15,7 +15,7 @@ jobs:
     strategy:
       fail-fast: false
       matrix:
-        python-version: ["3.7", "3.9", "3.10"]
+        python-version: ["3.8", "3.9", "3.10"]

     steps:
     - uses: actions/checkout@v4
@@ -36,4 +36,8 @@ jobs:
         flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
     - name: Test with pytest
       run: |
-        python -m pytest
+        python -m pytest --cov=biomero --cov-report=xml
+    - name: Coveralls GitHub Action
+      uses: coverallsapp/[email protected]


2 changes: 1 addition & 1 deletion .github/workflows/python-publish.yml
@@ -27,7 +27,7 @@ jobs:
     - name: Set up Python
       uses: actions/setup-python@v5
       with:
-        python-version: '3.7'
+        python-version: '3.8'
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip
5 changes: 5 additions & 0 deletions .github/workflows/sphinx.yml
@@ -13,6 +13,11 @@ jobs:
     - uses: actions/checkout@v4
     - name: Build HTML
       uses: ammaraskar/sphinx-action@master
+      with:
+        pre-build-command: |
+          # Install necessary dependencies
+          apt-get update --allow-releaseinfo-change -y && apt-get install -y gcc python3-dev libpq-dev postgresql-client
+          pg_config --version
       env:
         SETUPTOOLS_SCM_PRETEND_VERSION: 1
     - name: Upload artifacts
4 changes: 2 additions & 2 deletions README.md
@@ -1,5 +1,5 @@
 # BIOMERO - BioImage analysis in OMERO
-[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![DOI](https://zenodo.org/badge/638954891.svg)](https://zenodo.org/badge/latestdoi/638954891) [![PyPI - Version](https://img.shields.io/pypi/v/biomero)](https://pypi.org/project/biomero/) [![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/biomero)](https://pypi.org/project/biomero/) ![Slurm](https://img.shields.io/badge/Slurm-21.08.6-blue.svg) ![OMERO](https://img.shields.io/badge/OMERO-5.6.8-blue.svg) [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7530/badge)](https://bestpractices.coreinfrastructure.org/projects/7530) [![Sphinx build](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml) [![pages-build-deployment](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment) [![python-package build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml) [![python-publish build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml)
+[![License](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) [![DOI](https://zenodo.org/badge/638954891.svg)](https://zenodo.org/badge/latestdoi/638954891) [![PyPI - Version](https://img.shields.io/pypi/v/biomero)](https://pypi.org/project/biomero/) [![PyPI - Python Versions](https://img.shields.io/pypi/pyversions/biomero)](https://pypi.org/project/biomero/) ![Slurm](https://img.shields.io/badge/Slurm-21.08.6-blue.svg) ![OMERO](https://img.shields.io/badge/OMERO-5.6.8-blue.svg) [![fair-software.eu](https://img.shields.io/badge/fair--software.eu-%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F%20%20%E2%97%8F-green)](https://fair-software.eu) [![OpenSSF Best Practices](https://bestpractices.coreinfrastructure.org/projects/7530/badge)](https://bestpractices.coreinfrastructure.org/projects/7530) [![Sphinx build](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/sphinx.yml) [![pages-build-deployment](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/pages/pages-build-deployment) [![python-package build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml/badge.svg)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-package.yml) [![python-publish build](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml/badge.svg?branch=main)](https://github.com/NL-BioImaging/biomero/actions/workflows/python-publish.yml) [![Coverage Status](https://coveralls.io/repos/github/NL-BioImaging/biomero/badge.svg?branch=main)](https://coveralls.io/github/NL-BioImaging/biomero?branch=main)

 The **BIOMERO** framework, for **B**io**I**mage analysis in **OMERO**, allows you to run (FAIR) bioimage analysis workflows directly from OMERO on a high-performance compute (HPC) cluster, remotely through SSH.

@@ -64,7 +64,7 @@ Your Slurm cluster/login node needs to have:
 Your OMERO _processing_ node needs to have:
 1. SSH client and access to the Slurm cluster (w/ private key / headless)
 2. SCP access to the Slurm cluster
-3. Python3.7+
+3. Python3.8+
 4. This library installed
    - Latest release on PyPI `python3 -m pip install biomero`
    - or latest Github version `python3 -m pip install 'git+https://github.com/NL-BioImaging/biomero'`
19 changes: 7 additions & 12 deletions biomero/__init__.py
@@ -1,14 +1,9 @@
 from .slurm_client import SlurmClient

+import importlib.metadata
 try:
-    import importlib.metadata
-    try:
-        __version__ = importlib.metadata.version(__package__)
-    except importlib.metadata.PackageNotFoundError:
-        __version__ = "Version not found"
-except ModuleNotFoundError:  # Python 3.7
-    try:
-        import pkg_resources
-        __version__ = pkg_resources.get_distribution(__package__).version
-    except pkg_resources.DistributionNotFound:
-        __version__ = "Version not found"
+    __version__ = importlib.metadata.version(__package__)
+except importlib.metadata.PackageNotFoundError:
+    __version__ = "Version not found"
+
+from .eventsourcing import *
+from .views import *
14 changes: 13 additions & 1 deletion biomero/constants.py
@@ -16,6 +16,7 @@

 IMAGE_EXPORT_SCRIPT = "_SLURM_Image_Transfer.py"
 IMAGE_IMPORT_SCRIPT = "SLURM_Get_Results.py"
+CONVERSION_SCRIPT = "SLURM_Remote_Conversion.py"
 RUN_WF_SCRIPT = "SLURM_Run_Workflow.py"


@@ -106,4 +107,15 @@ class transfer:
     FORMAT_OMETIFF = 'OME-TIFF'
     FORMAT_ZARR = 'ZARR'
     FOLDER = "Folder_Name"
-    FOLDER_DEFAULT = 'SLURM_IMAGES_'
+    FOLDER_DEFAULT = 'SLURM_IMAGES_'
+
+
+class workflow_status:
+    INITIALIZING = "INITIALIZING"
+    TRANSFERRING = "TRANSFERRING"
+    CONVERTING = "CONVERTING"
+    RETRIEVING = "RETRIEVING"
+    DONE = "DONE"
+    FAILED = "FAILED"
+    RUNNING = "RUNNING"
+    JOB_STATUS = "JOB_"
198 changes: 198 additions & 0 deletions biomero/database.py
@@ -0,0 +1,198 @@
# -*- coding: utf-8 -*-
# Copyright 2024 Torec Luik
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from eventsourcing.utils import get_topic, clear_topic_cache
import logging
from sqlalchemy import create_engine, text, Column, Integer, String, URL, DateTime, Float
from sqlalchemy.orm import sessionmaker, declarative_base, scoped_session
from sqlalchemy.dialects.postgresql import UUID as PGUUID
import os

logger = logging.getLogger(__name__)

# --------------------- VIEWS DB tables/classes ---------------------------- #

# Base class for declarative class definitions
Base = declarative_base()


class JobView(Base):
    """
    SQLAlchemy model for the 'biomero_job_view' table.

    Attributes:
        slurm_job_id (Integer): The unique identifier for the Slurm job.
        user (Integer): The ID of the user who submitted the job.
        group (Integer): The group ID associated with the job.
        task_id (UUID): The unique identifier for the BIOMERO task.
    """
    __tablename__ = 'biomero_job_view'

    slurm_job_id = Column(Integer, primary_key=True)
    user = Column(Integer, nullable=False)
    group = Column(Integer, nullable=False)
    task_id = Column(PGUUID(as_uuid=True))


class JobProgressView(Base):
    """
    SQLAlchemy model for the 'biomero_job_progress_view' table.

    Attributes:
        slurm_job_id (Integer): The unique identifier for the Slurm job.
        status (String): The current status of the Slurm job.
        progress (String, optional): The progress status of the Slurm job.
    """
    __tablename__ = 'biomero_job_progress_view'

    slurm_job_id = Column(Integer, primary_key=True)
    status = Column(String, nullable=False)
    progress = Column(String, nullable=True)


class WorkflowProgressView(Base):
    """
    SQLAlchemy model for the 'biomero_workflow_progress_view' table.

    Attributes:
        workflow_id (PGUUID): The unique identifier for the workflow (primary key).
        status (String, optional): The current status of the workflow.
        progress (String, optional): The progress status of the workflow.
        user (Integer, optional): The ID of the user who initiated the workflow.
        group (Integer, optional): The group associated with the workflow.
        name (String, optional): The name of the workflow.
        task (String, optional): The current task of the workflow.
        start_time (DateTime): The time when the workflow started.
    """
    __tablename__ = 'biomero_workflow_progress_view'

    workflow_id = Column(PGUUID(as_uuid=True), primary_key=True)
    status = Column(String, nullable=True)
    progress = Column(String, nullable=True)
    user = Column(Integer, nullable=True)
    group = Column(Integer, nullable=True)
    name = Column(String, nullable=True)
    task = Column(String, nullable=True)
    start_time = Column(DateTime, nullable=False)


class TaskExecution(Base):
    """
    SQLAlchemy model for the 'biomero_task_execution' table.

    Attributes:
        task_id (PGUUID): The unique identifier for the task.
        task_name (String): The name of the task.
        task_version (String): The version of the task.
        user_id (Integer, optional): The ID of the user who initiated the task.
        group_id (Integer, optional): The group ID associated with the task.
        status (String): The current status of the task.
        start_time (DateTime): The time when the task started.
        end_time (DateTime, optional): The time when the task ended.
        error_type (String, optional): Type of error encountered during execution, if any.
    """
    __tablename__ = 'biomero_task_execution'

    task_id = Column(PGUUID(as_uuid=True), primary_key=True)
    task_name = Column(String, nullable=False)
    task_version = Column(String)
    user_id = Column(Integer, nullable=True)
    group_id = Column(Integer, nullable=True)
    status = Column(String, nullable=False)
    start_time = Column(DateTime, nullable=False)
    end_time = Column(DateTime, nullable=True)
    error_type = Column(String, nullable=True)


class EngineManager:
    """
    Manages the SQLAlchemy engine and session lifecycle.

    Class Attributes:
        _engine: The SQLAlchemy engine used to connect to the database.
        _scoped_session_topic: The topic of the scoped session.
        _session: The scoped session used for database operations.
    """
    _engine = None
    _scoped_session_topic = None
    _session = None

    @classmethod
    def create_scoped_session(cls, sqlalchemy_url: str = None):
        """
        Creates and returns a scoped session for interacting with the database.

        If the engine doesn't already exist, it initializes the SQLAlchemy
        engine and sets up the scoped session.

        Args:
            sqlalchemy_url (str, optional): The SQLAlchemy database URL. If not
                provided, the method will retrieve the value from the
                'SQLALCHEMY_URL' environment variable.

        Returns:
            str: The topic of the scoped session adapter class.
        """
        if cls._engine is None:
            # Note, we only allow the sqlalchemy eventsourcing module
            if not sqlalchemy_url:
                sqlalchemy_url = os.getenv('SQLALCHEMY_URL')
            cls._engine = create_engine(sqlalchemy_url)

            # Set up tables if they don't exist yet
            Base.metadata.create_all(cls._engine)

            # Create a scoped_session object.
            cls._session = scoped_session(
                sessionmaker(autocommit=False, autoflush=True, bind=cls._engine)
            )

            class MyScopedSessionAdapter:
                def __getattribute__(self, item: str) -> None:
                    return getattr(cls._session, item)

            # Produce the topic of the scoped session adapter class.
            cls._scoped_session_topic = get_topic(MyScopedSessionAdapter)

        return cls._scoped_session_topic

    @classmethod
    def get_session(cls):
        """
        Retrieves the current scoped session.

        Returns:
            Session: The SQLAlchemy session for interacting with the database.
        """
        return cls._session()

    @classmethod
    def commit(cls):
        """
        Commits the current transaction in the scoped session.
        """
        cls._session.commit()

    @classmethod
    def close_engine(cls):
        """
        Closes the database engine and cleans up the session.

        This method disposes of the SQLAlchemy engine, removes the session,
        and resets all associated class attributes to `None`.
        """
        if cls._engine is not None:
            cls._session.remove()
            cls._engine.dispose()
            cls._engine = None
            cls._session = None
            cls._scoped_session_topic = None
            clear_topic_cache()
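
A short usage sketch for the classes above, assuming create_scoped_session() has already been called (see the configuration sketch near the top of this commit) and that a Slurm job with ID 42 exists; both assumptions are illustrative:

from biomero.database import EngineManager, JobProgressView, TaskExecution

session = EngineManager.get_session()
try:
    # Look up progress for one Slurm job (ID 42 is hypothetical)
    job = session.query(JobProgressView).filter_by(slurm_job_id=42).one_or_none()
    if job is not None:
        print(job.status, job.progress)

    # List failed task executions and their error types
    for task in session.query(TaskExecution).filter_by(status="FAILED").all():
        print(task.task_name, task.error_type)
finally:
    EngineManager.close_engine()
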
