Commit 7735770

add CLI and tools for generating configs; significant refactor (#291)
* add main entry point; add config generation prompt

* clean up

* add prompt for feature descriptions to gen-target

* remove failfast

* ignore main.py in code coverage

* clean up report

* update aia test

* update target obj test

* clean up

* add factory test

* update tests

* update tests

* install libomp on macOS CI runners for xgboost

* install libomp on macOS CI runners

* update README

* clean up

* clean up worst case

* refactor attribute attack

* refactor worst case attack

* refactor structural attack

* format JSON report output

* refactor attack report creation

Move common report code to attack base class.

* refactor report generation

* update safemodel logger

* update docs

* clean up

* set target logger name to full path

* update CHANGELOG

* move user stories to examples dir

Signed-off-by: Richard Preen <[email protected]>

* update safemodel example

* add note on versions to user stories readme

* update README

* update max features to prompt

* update max feature prompt msg

* clean up

---------

Signed-off-by: Richard Preen <[email protected]>
rpreen authored Jul 4, 2024
1 parent 97cc214 commit 7735770
Showing 83 changed files with 3,914 additions and 5,956 deletions.
10 changes: 5 additions & 5 deletions .codecov.yml
```diff
@@ -1,9 +1,9 @@
 ---
 # configuration for https://codecov.io
 ignore:
-- "setup.py"
-- "aisdc/safemodel/classifiers/new_model_template.py"
+- "aisdc/config"
+- "aisdc/main.py"
 - "aisdc/preprocessing"
-- "user_stories"
+- "aisdc/safemodel/classifiers/new_model_template.py"
+- "examples"
 ...
+- "setup.py"
+- "user_stories"
```
5 changes: 5 additions & 0 deletions .github/workflows/tests.yml
```diff
@@ -22,6 +22,11 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
 
+      # xgboost requires libomp on macOS
+      - name: Install dependencies on macOS
+        if: runner.os == 'macOS'
+        run: brew install libomp
+
       - name: Install
         run: pip install .[test]
```
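Without libomp, xgboost's native library cannot load its OpenMP dependency on macOS and the package fails at import time. A minimal check of the kind this CI step guards against, assuming xgboost is installed:

```python
# On macOS without libomp this import raises an XGBoostError because the
# bundled libxgboost dylib cannot load libomp.dylib.
import platform

import xgboost

print(f"xgboost {xgboost.__version__} imported OK on {platform.system()}")
```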
3 changes: 3 additions & 0 deletions CHANGELOG.md
```diff
@@ -1,12 +1,15 @@
 # Changelog
 
 ## Version 1.2.0 (under development)
 
 Changes:
 * Add support for scikit-learn MLPClassifier ([#276](https://github.com/AI-SDC/AI-SDC/pull/276))
 * Use default XGBoost params if not defined in structural attacks ([#277](https://github.com/AI-SDC/AI-SDC/pull/277))
 * Clean up documentation ([#282](https://github.com/AI-SDC/AI-SDC/pull/282))
 * Clean up repository and update packaging ([#283](https://github.com/AI-SDC/AI-SDC/pull/283))
+* Format docstrings ([#286](https://github.com/AI-SDC/AI-SDC/pull/286))
+* Refactor ([#284](https://github.com/AI-SDC/AI-SDC/pull/284), [#285](https://github.com/AI-SDC/AI-SDC/pull/285), [#287](https://github.com/AI-SDC/AI-SDC/pull/287))
+* Add CLI and tools for generating configs; significant refactor ([#291](https://github.com/AI-SDC/AI-SDC/pull/291))
 
 ## Version 1.1.3 (Apr 26, 2024)
 
```
13 changes: 6 additions & 7 deletions README.md
````diff
@@ -12,8 +12,6 @@ The `aisdc` package provides:
 * A variety of privacy attacks for assessing machine learning models.
 * The safemodel package: a suite of open source wrappers for common machine learning frameworks, including [scikit-learn](https://scikit-learn.org) and [Keras](https://keras.io). It is designed for use by researchers in Trusted Research Environments (TREs) where disclosure control methods must be implemented. Safemodel aims to give researchers greater confidence that their models are more compliant with disclosure control.
 
-A collection of user guides can be found in the [`user_stories`](user_stories) folder of this repository. These guides include configurable examples from the perspective of both a researcher and a TRE, with separate scripts for each. Instructions on how to use each of these scripts and which scripts to use are included in the README located in the folder.
-
 ## Installation
 
 [![PyPI package](https://img.shields.io/pypi/v/aisdc.svg)](https://pypi.org/project/aisdc)
````

````diff
@@ -32,14 +30,15 @@ To additionally install the safemodel package:
 $ pip install aisdc[safemodel]
 ```
 
-## Running
-
-To run an example, simply execute the desired script. For example, to run LiRA:
-
+Note: macOS users may need to install libomp due to a dependency on XGBoost:
 ```
-$ python -m lira_attack_example
+$ brew install libomp
 ```
 
+## Running
+
+See the [`examples`](examples/).
+
 ## Acknowledgement
 
 This work was funded by UK Research and Innovation under Grant Numbers MC_PC_21033 and MC_PC_23006 as part of Phase 1 of the [DARE UK](https://dareuk.org.uk) (Data and Analytics Research Environments UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). The specific projects were Semi-Automatic checking of Research Outputs (SACRO; MC_PC_23006) and Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMATTER; MC_PC_21033).­This project has also been supported by MRC and EPSRC [grant number MR/S010351/1]: PICTURES.
````
89 changes: 77 additions & 12 deletions aisdc/attacks/attack.py
```diff
@@ -1,32 +1,90 @@
 """Base class for an attack object."""
 
 from __future__ import annotations
 
+import importlib
 import inspect
 import json
+import logging
+import os
+import uuid
+from datetime import datetime
+
+from fpdf import FPDF
+
+from aisdc.attacks import report
 from aisdc.attacks.target import Target
 
+logger = logging.getLogger(__name__)
+
 
 class Attack:
-    """Base (abstract) class to represent an attack."""
+    """Base class to represent an attack."""
 
-    def __init__(self) -> None:
-        self.attack_config_json_file_name = None
+    def __init__(self, output_dir: str = "outputs", write_report: bool = True) -> None:
+        """Instantiate an attack.
+
+        Parameters
+        ----------
+        output_dir : str
+            name of the directory where outputs are stored
+        write_report : bool
+            Whether to generate a JSON and PDF report.
+        """
+        self.output_dir: str = output_dir
+        self.write_report: bool = write_report
+        self.attack_metrics: dict | list = {}
+        self.metadata: dict = {}
+        if not os.path.exists(self.output_dir):
+            os.makedirs(self.output_dir)
 
-    def attack(self, target: Target) -> None:
+    def attack(self, target: Target) -> dict:
         """Run an attack."""
         raise NotImplementedError
 
+    def _construct_metadata(self) -> None:
+        """Generate attack metadata."""
+        self.metadata = {
+            "attack_name": str(self),
+            "attack_params": self.get_params(),
+            "global_metrics": {},
+        }
+
+    def _get_attack_metrics_instances(self) -> dict:
+        """Get metrics for each individual repetition of an attack."""
+        raise NotImplementedError  # pragma: no cover
+
+    def _make_pdf(self, output: dict) -> FPDF | None:
+        """Create PDF report."""
+        raise NotImplementedError  # pragma: no cover
+
+    def _make_report(self, target: Target) -> dict:
+        """Create attack report."""
+        logger.info("Generating report")
+        self._construct_metadata()
+        self.metadata["target_model"] = target.model_name
+        self.metadata["target_model_params"] = target.model_params
+        output: dict = {
+            "log_id": str(uuid.uuid4()),
+            "log_time": datetime.now().strftime("%d/%m/%Y %H:%M:%S"),
+            "metadata": self.metadata,
+            "attack_experiment_logger": self._get_attack_metrics_instances(),
+        }
+        return output
+
+    def _write_report(self, output: dict) -> None:
+        """Write report as JSON and PDF."""
+        dest: str = os.path.join(self.output_dir, "report")
+        if self.write_report:
+            logger.info("Writing report: %s.json %s.pdf", dest, dest)
+            report.write_json(output, dest)
+            pdf_report = self._make_pdf(output)
+            if pdf_report is not None:
+                report.write_pdf(dest, pdf_report)
+
     def __str__(self) -> str:
         """Return the string representation of an attack."""
         raise NotImplementedError
 
-    def _update_params_from_config_file(self) -> None:
-        """Read a configuration file and load it into a dictionary object."""
-        with open(self.attack_config_json_file_name, encoding="utf-8") as f:
-            config = json.loads(f.read())
-        for key, value in config.items():
-            setattr(self, key, value)
-
     @classmethod
     def _get_param_names(cls) -> list[str]:
         """Get parameter names."""
@@ -49,3 +107,10 @@ def get_params(self) -> dict:
         for key in self._get_param_names():
             out[key] = getattr(self, key)
         return out
+
+
+def get_class_by_name(class_path: str):
+    """Return a class given its name."""
+    module_path, class_name = class_path.rsplit(".", 1)
+    module = importlib.import_module(module_path)
+    return getattr(module, class_name)
```
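To illustrate the refactored base class, here is a minimal sketch of a subclass. `DummyAttack` and its metric values are hypothetical; the hook names (`attack`, `_get_attack_metrics_instances`, `_make_pdf`, `__str__`) and the report helpers come from the diff above:

```python
from __future__ import annotations

from fpdf import FPDF

from aisdc.attacks.attack import Attack
from aisdc.attacks.target import Target


class DummyAttack(Attack):
    """Hypothetical attack illustrating the hooks the base class expects."""

    def __str__(self) -> str:
        return "Dummy attack"

    def attack(self, target: Target) -> dict:
        # Compute per-repetition metrics, then build and optionally write the report.
        self.attack_metrics = [{"AUC": 0.5}]
        output = self._make_report(target)  # adds metadata, log_id and log_time
        self._write_report(output)  # writes report.json; PDF only if _make_pdf returns one
        return output

    def _get_attack_metrics_instances(self) -> dict:
        # One entry per repetition of the attack.
        return {f"instance_{i}": m for i, m in enumerate(self.attack_metrics)}

    def _make_pdf(self, output: dict) -> FPDF | None:
        return None  # no PDF for this toy attack; the JSON report is still written
```

The new module-level `get_class_by_name` pairs with this design: `get_class_by_name("aisdc.attacks.attack.Attack")` imports `aisdc.attacks.attack` and returns the `Attack` class, presumably so the new CLI can map configuration entries to attack classes.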
43 changes: 22 additions & 21 deletions aisdc/attacks/attack_report_formatter.py
```diff
@@ -11,6 +11,7 @@
 
 import matplotlib.pyplot as plt
 import numpy as np
+import yaml
 
 
 def cleanup_files_for_release(
```
```diff
@@ -73,7 +74,7 @@ def add_attack_output(self, incoming_json: dict, class_name: str) -> None:
             class_name = class_name + "_" + str(incoming_json["log_id"])
 
         file_data[class_name] = incoming_json
-        json.dump(file_data, f)
+        json.dump(file_data, f, indent=4)
 
     def get_output_filename(self) -> str:
         """Return the filename of the JSON file which has been created."""
```
```diff
@@ -447,16 +448,16 @@ class GenerateTextReport:
 
     def __init__(self) -> None:
         self.text_out = []
-        self.target_json_filename = None
+        self.target_yaml_filename = None
         self.attack_json_filename = None
         self.model_name_from_target = None
 
         self.immediate_rejection = []
         self.support_rejection = []
         self.support_release = []
 
-    def _process_target_json(self) -> None:
-        """Create a summary of a target model JSON file."""
+    def _process_target_yaml(self) -> None:
+        """Create a summary of a target model YAML file."""
         model_params_of_interest = [
             "C",
             "kernel",
```
```diff
@@ -470,33 +471,33 @@ def _process_target_json(self) -> None:
             "learning_rate",
         ]
 
-        with open(self.target_json_filename, encoding="utf-8") as f:
-            json_report = json.loads(f.read())
+        with open(self.target_yaml_filename, encoding="utf-8") as f:
+            yaml_report = yaml.safe_load(f)
 
         output_string = "TARGET MODEL SUMMARY\n"
 
-        if "model_name" in json_report:
+        if "model_name" in yaml_report:
             output_string = (
-                output_string + "model_name: " + json_report["model_name"] + "\n"
+                output_string + "model_name: " + yaml_report["model_name"] + "\n"
             )
 
-        if "n_samples" in json_report:
+        if "n_samples" in yaml_report:
             output_string = output_string + "number of samples used to train: "
-            output_string = output_string + str(json_report["n_samples"]) + "\n"
+            output_string = output_string + str(yaml_report["n_samples"]) + "\n"
 
-        if "model_params" in json_report:
+        if "model_params" in yaml_report:
             for param in model_params_of_interest:
-                if param in json_report["model_params"]:
+                if param in yaml_report["model_params"]:
                     output_string = output_string + param + ": "
                     output_string = output_string + str(
-                        json_report["model_params"][param]
+                        yaml_report["model_params"][param]
                     )
                     output_string = output_string + "\n"
 
-        if "model_path" in json_report:
-            filepath = os.path.split(os.path.abspath(self.target_json_filename))[0]
+        if "model_path" in yaml_report:
+            filepath = os.path.split(os.path.abspath(self.target_yaml_filename))[0]
             self.model_name_from_target = os.path.join(
-                filepath, json_report["model_path"]
+                filepath, yaml_report["model_path"]
             )
 
         self.text_out.append(output_string)
```
```diff
@@ -519,10 +520,10 @@ def process_attack_target_json(
             json_report = json.loads(f.read())
 
         if target_filename is not None:
-            self.target_json_filename = target_filename
+            self.target_yaml_filename = target_filename
 
             with open(target_filename, encoding="utf-8") as f:
-                target_file = json.loads(f.read())
+                target_file = yaml.safe_load(f)
             json_report = {**json_report, **target_file}
 
         modules = [
```
```diff
@@ -576,8 +577,8 @@ def export_to_file(  # pylint: disable=too-many-arguments
         copy_of_text_out = self.text_out
         self.text_out = []
 
-        if self.target_json_filename is not None:
-            self._process_target_json()
+        if self.target_yaml_filename is not None:
+            self._process_target_yaml()
 
         self.text_out += copy_of_text_out
 
```
```diff
@@ -594,7 +595,7 @@ def export_to_file(  # pylint: disable=too-many-arguments
         copy_into_release = [
             output_filename,
             self.attack_json_filename,
-            self.target_json_filename,
+            self.target_yaml_filename,
         ]
 
         if model_filename is None:
```
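`_process_target_yaml` only reads a handful of keys. A hypothetical `target.yaml` covering them shows what the summary consumes; the exact schema written by the target-saving code is an assumption here, but the key names and the `min_samples_leaf` parameter come from the code above:

```python
import yaml

# Keys mirror those read by _process_target_yaml:
# model_name, n_samples, model_params, model_path.
example = """
model_name: RandomForestClassifier
n_samples: 1000
model_params:
  min_samples_leaf: 5
model_path: model.pkl
"""
report = yaml.safe_load(example)
print(report["model_name"], report["model_params"]["min_samples_leaf"])
```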