Commit 7735770

add CLI and tools for generating configs; significant refactor (#291)
* add main entry point; add config generation prompt

* clean up

* add prompt for feature descriptions to gen-target

* remove failfast

* ignore main.py in code coverage

* clean up report

* update aia test

* update target obj test

* clean up

* add factory test

* update tests

* update tests

* install libomp on macOS CI runners for xgboost

* install libomp on macOS CI runners

* update README

* clean up

* clean up worst case

* refactor attribute attack

* refactor worst case attack

* refactor structural attack

* format JSON report output

* refactor attack report creation

Move common report code to attack base class.

* refactor report generation

* update safemodel logger

* update docs

* clean up

* set target logger name to full path

* update CHANGELOG

* move user stories to examples dir

Signed-off-by: Richard Preen <[email protected]>

* update safemodel example

* add note on versions to user stories readme

* update README

* update max features to prompt

* update max feature prompt msg

* clean up

---------

Signed-off-by: Richard Preen <[email protected]>
rpreen authored Jul 4, 2024
1 parent 97cc214 commit 7735770
Showing 83 changed files with 3,914 additions and 5,956 deletions.
10 changes: 5 additions & 5 deletions .codecov.yml
```diff
@@ -1,9 +1,9 @@
 ---
 # configuration for https://codecov.io
 ignore:
-- "setup.py"
-- "aisdc/safemodel/classifiers/new_model_template.py"
+- "aisdc/config"
+- "aisdc/main.py"
 - "aisdc/preprocessing"
-- "user_stories"
+- "aisdc/safemodel/classifiers/new_model_template.py"
+- "examples"
 ...
+- "setup.py"
+- "user_stories"
```
5 changes: 5 additions & 0 deletions .github/workflows/tests.yml
```diff
@@ -22,6 +22,11 @@ jobs:
         with:
           python-version: ${{ matrix.python-version }}
 
+      # xgboost requires libomp on macOS
+      - name: Install dependencies on macOS
+        if: runner.os == 'macOS'
+        run: brew install libomp
+
       - name: Install
         run: pip install .[test]
```
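Without libomp, xgboost's native library cannot load its OpenMP dependency on macOS and the package fails at import time. A minimal check of the kind this CI step guards against, assuming xgboost is installed:

```python
# On macOS without libomp this import raises an XGBoostError because the
# bundled libxgboost dylib cannot load libomp.dylib.
import platform

import xgboost

print(f"xgboost {xgboost.__version__} imported OK on {platform.system()}")
```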
3 changes: 3 additions & 0 deletions CHANGELOG.md
```diff
@@ -1,12 +1,15 @@
 # Changelog
 
 ## Version 1.2.0 (under development)
 
 Changes:
 * Add support for scikit-learn MLPClassifier ([#276](https://github.com/AI-SDC/AI-SDC/pull/276))
 * Use default XGBoost params if not defined in structural attacks ([#277](https://github.com/AI-SDC/AI-SDC/pull/277))
 * Clean up documentation ([#282](https://github.com/AI-SDC/AI-SDC/pull/282))
 * Clean up repository and update packaging ([#283](https://github.com/AI-SDC/AI-SDC/pull/283))
+* Format docstrings ([#286](https://github.com/AI-SDC/AI-SDC/pull/286))
+* Refactor ([#284](https://github.com/AI-SDC/AI-SDC/pull/284), [#285](https://github.com/AI-SDC/AI-SDC/pull/285), [#287](https://github.com/AI-SDC/AI-SDC/pull/287))
+* Add CLI and tools for generating configs; significant refactor ([#291](https://github.com/AI-SDC/AI-SDC/pull/291))
 
 ## Version 1.1.3 (Apr 26, 2024)
 
```
13 changes: 6 additions & 7 deletions README.md
````diff
@@ -12,8 +12,6 @@ The `aisdc` package provides:
 * A variety of privacy attacks for assessing machine learning models.
 * The safemodel package: a suite of open source wrappers for common machine learning frameworks, including [scikit-learn](https://scikit-learn.org) and [Keras](https://keras.io). It is designed for use by researchers in Trusted Research Environments (TREs) where disclosure control methods must be implemented. Safemodel aims to give researchers greater confidence that their models are more compliant with disclosure control.
 
-A collection of user guides can be found in the [`user_stories`](user_stories) folder of this repository. These guides include configurable examples from the perspective of both a researcher and a TRE, with separate scripts for each. Instructions on how to use each of these scripts and which scripts to use are included in the README located in the folder.
-
 ## Installation
 
 [![PyPI package](https://img.shields.io/pypi/v/aisdc.svg)](https://pypi.org/project/aisdc)
````

````diff
@@ -32,14 +30,15 @@ To additionally install the safemodel package:
 $ pip install aisdc[safemodel]
 ```
 
-## Running
-
-To run an example, simply execute the desired script. For example, to run LiRA:
-
+Note: macOS users may need to install libomp due to a dependency on XGBoost:
 ```
-$ python -m lira_attack_example
+$ brew install libomp
 ```
 
+## Running
+
+See the [`examples`](examples/).
+
 ## Acknowledgement
 
 This work was funded by UK Research and Innovation under Grant Numbers MC_PC_21033 and MC_PC_23006 as part of Phase 1 of the [DARE UK](https://dareuk.org.uk) (Data and Analytics Research Environments UK) programme, delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK). The specific projects were Semi-Automatic checking of Research Outputs (SACRO; MC_PC_23006) and Guidelines and Resources for AI Model Access from TrusTEd Research environments (GRAIMATTER; MC_PC_21033).­This project has also been supported by MRC and EPSRC [grant number MR/S010351/1]: PICTURES.
````
89 changes: 77 additions & 12 deletions aisdc/attacks/attack.py
```diff
@@ -1,32 +1,90 @@
 """Base class for an attack object."""
 
 from __future__ import annotations
 
+import importlib
 import inspect
 import json
+import logging
+import os
+import uuid
+from datetime import datetime
+
+from fpdf import FPDF
+
+from aisdc.attacks import report
 from aisdc.attacks.target import Target
 
+logger = logging.getLogger(__name__)
+
 
 class Attack:
-    """Base (abstract) class to represent an attack."""
+    """Base class to represent an attack."""
 
-    def __init__(self) -> None:
-        self.attack_config_json_file_name = None
+    def __init__(self, output_dir: str = "outputs", write_report: bool = True) -> None:
+        """Instantiate an attack.
+
+        Parameters
+        ----------
+        output_dir : str
+            name of the directory where outputs are stored
+        write_report : bool
+            Whether to generate a JSON and PDF report.
+        """
+        self.output_dir: str = output_dir
+        self.write_report: bool = write_report
+        self.attack_metrics: dict | list = {}
+        self.metadata: dict = {}
+        if not os.path.exists(self.output_dir):
+            os.makedirs(self.output_dir)
 
-    def attack(self, target: Target) -> None:
+    def attack(self, target: Target) -> dict:
         """Run an attack."""
         raise NotImplementedError
 
+    def _construct_metadata(self) -> None:
+        """Generate attack metadata."""
+        self.metadata = {
+            "attack_name": str(self),
+            "attack_params": self.get_params(),
+            "global_metrics": {},
+        }
+
+    def _get_attack_metrics_instances(self) -> dict:
+        """Get metrics for each individual repetition of an attack."""
+        raise NotImplementedError  # pragma: no cover
+
+    def _make_pdf(self, output: dict) -> FPDF | None:
+        """Create PDF report."""
+        raise NotImplementedError  # pragma: no cover
+
+    def _make_report(self, target: Target) -> dict:
+        """Create attack report."""
+        logger.info("Generating report")
+        self._construct_metadata()
+        self.metadata["target_model"] = target.model_name
+        self.metadata["target_model_params"] = target.model_params
+        output: dict = {
+            "log_id": str(uuid.uuid4()),
+            "log_time": datetime.now().strftime("%d/%m/%Y %H:%M:%S"),
+            "metadata": self.metadata,
+            "attack_experiment_logger": self._get_attack_metrics_instances(),
+        }
+        return output
+
+    def _write_report(self, output: dict) -> None:
+        """Write report as JSON and PDF."""
+        dest: str = os.path.join(self.output_dir, "report")
+        if self.write_report:
+            logger.info("Writing report: %s.json %s.pdf", dest, dest)
+            report.write_json(output, dest)
+            pdf_report = self._make_pdf(output)
+            if pdf_report is not None:
+                report.write_pdf(dest, pdf_report)
+
     def __str__(self) -> str:
         """Return the string representation of an attack."""
         raise NotImplementedError
 
-    def _update_params_from_config_file(self) -> None:
-        """Read a configuration file and load it into a dictionary object."""
-        with open(self.attack_config_json_file_name, encoding="utf-8") as f:
-            config = json.loads(f.read())
-        for key, value in config.items():
-            setattr(self, key, value)
-
     @classmethod
     def _get_param_names(cls) -> list[str]:
         """Get parameter names."""
@@ -49,3 +107,10 @@ def get_params(self) -> dict:
         for key in self._get_param_names():
             out[key] = getattr(self, key)
         return out
+
+
+def get_class_by_name(class_path: str):
+    """Return a class given its name."""
+    module_path, class_name = class_path.rsplit(".", 1)
+    module = importlib.import_module(module_path)
+    return getattr(module, class_name)
```
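To illustrate the refactored base class, here is a minimal sketch of a subclass. `DummyAttack` and its metric values are hypothetical; the hook names (`attack`, `_get_attack_metrics_instances`, `_make_pdf`, `__str__`) and the report helpers come from the diff above:

```python
from __future__ import annotations

from fpdf import FPDF

from aisdc.attacks.attack import Attack
from aisdc.attacks.target import Target


class DummyAttack(Attack):
    """Hypothetical attack illustrating the hooks the base class expects."""

    def __str__(self) -> str:
        return "Dummy attack"

    def attack(self, target: Target) -> dict:
        # Compute per-repetition metrics, then build and optionally write the report.
        self.attack_metrics = [{"AUC": 0.5}]
        output = self._make_report(target)  # adds metadata, log_id and log_time
        self._write_report(output)  # writes report.json; PDF only if _make_pdf returns one
        return output

    def _get_attack_metrics_instances(self) -> dict:
        # One entry per repetition of the attack.
        return {f"instance_{i}": m for i, m in enumerate(self.attack_metrics)}

    def _make_pdf(self, output: dict) -> FPDF | None:
        return None  # no PDF for this toy attack; the JSON report is still written
```

The new module-level `get_class_by_name` pairs with this design: `get_class_by_name("aisdc.attacks.attack.Attack")` imports `aisdc.attacks.attack` and returns the `Attack` class, presumably so the new CLI can map configuration entries to attack classes.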
43 changes: 22 additions & 21 deletions aisdc/attacks/attack_report_formatter.py
```diff
@@ -11,6 +11,7 @@
 
 import matplotlib.pyplot as plt
 import numpy as np
+import yaml
 
 
 def cleanup_files_for_release(
```
```diff
@@ -73,7 +74,7 @@ def add_attack_output(self, incoming_json: dict, class_name: str) -> None:
             class_name = class_name + "_" + str(incoming_json["log_id"])
 
         file_data[class_name] = incoming_json
-        json.dump(file_data, f)
+        json.dump(file_data, f, indent=4)
 
     def get_output_filename(self) -> str:
         """Return the filename of the JSON file which has been created."""
```
```diff
@@ -447,16 +448,16 @@ class GenerateTextReport:
 
     def __init__(self) -> None:
         self.text_out = []
-        self.target_json_filename = None
+        self.target_yaml_filename = None
         self.attack_json_filename = None
         self.model_name_from_target = None
 
         self.immediate_rejection = []
         self.support_rejection = []
         self.support_release = []
 
-    def _process_target_json(self) -> None:
-        """Create a summary of a target model JSON file."""
+    def _process_target_yaml(self) -> None:
+        """Create a summary of a target model YAML file."""
         model_params_of_interest = [
             "C",
             "kernel",
```
```diff
@@ -470,33 +471,33 @@ def _process_target_json(self) -> None:
             "learning_rate",
         ]
 
-        with open(self.target_json_filename, encoding="utf-8") as f:
-            json_report = json.loads(f.read())
+        with open(self.target_yaml_filename, encoding="utf-8") as f:
+            yaml_report = yaml.safe_load(f)
 
         output_string = "TARGET MODEL SUMMARY\n"
 
-        if "model_name" in json_report:
+        if "model_name" in yaml_report:
             output_string = (
-                output_string + "model_name: " + json_report["model_name"] + "\n"
+                output_string + "model_name: " + yaml_report["model_name"] + "\n"
             )
 
-        if "n_samples" in json_report:
+        if "n_samples" in yaml_report:
             output_string = output_string + "number of samples used to train: "
-            output_string = output_string + str(json_report["n_samples"]) + "\n"
+            output_string = output_string + str(yaml_report["n_samples"]) + "\n"
 
-        if "model_params" in json_report:
+        if "model_params" in yaml_report:
             for param in model_params_of_interest:
-                if param in json_report["model_params"]:
+                if param in yaml_report["model_params"]:
                     output_string = output_string + param + ": "
                     output_string = output_string + str(
-                        json_report["model_params"][param]
+                        yaml_report["model_params"][param]
                     )
                     output_string = output_string + "\n"
 
-        if "model_path" in json_report:
-            filepath = os.path.split(os.path.abspath(self.target_json_filename))[0]
+        if "model_path" in yaml_report:
+            filepath = os.path.split(os.path.abspath(self.target_yaml_filename))[0]
             self.model_name_from_target = os.path.join(
-                filepath, json_report["model_path"]
+                filepath, yaml_report["model_path"]
             )
 
         self.text_out.append(output_string)
```
```diff
@@ -519,10 +520,10 @@ def process_attack_target_json(
             json_report = json.loads(f.read())
 
         if target_filename is not None:
-            self.target_json_filename = target_filename
+            self.target_yaml_filename = target_filename
 
             with open(target_filename, encoding="utf-8") as f:
-                target_file = json.loads(f.read())
+                target_file = yaml.safe_load(f)
             json_report = {**json_report, **target_file}
 
         modules = [
```
```diff
@@ -576,8 +577,8 @@ def export_to_file(  # pylint: disable=too-many-arguments
         copy_of_text_out = self.text_out
         self.text_out = []
 
-        if self.target_json_filename is not None:
-            self._process_target_json()
+        if self.target_yaml_filename is not None:
+            self._process_target_yaml()
 
         self.text_out += copy_of_text_out
 
```
```diff
@@ -594,7 +595,7 @@ def export_to_file(  # pylint: disable=too-many-arguments
         copy_into_release = [
             output_filename,
             self.attack_json_filename,
-            self.target_json_filename,
+            self.target_yaml_filename,
         ]
 
         if model_filename is None:
```
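`_process_target_yaml` only reads a handful of keys. A hypothetical `target.yaml` covering them shows what the summary consumes; the exact schema written by the target-saving code is an assumption here, but the key names and the `min_samples_leaf` parameter come from the code above:

```python
import yaml

# Keys mirror those read by _process_target_yaml:
# model_name, n_samples, model_params, model_path.
example = """
model_name: RandomForestClassifier
n_samples: 1000
model_params:
  min_samples_leaf: 5
model_path: model.pkl
"""
report = yaml.safe_load(example)
print(report["model_name"], report["model_params"]["min_samples_leaf"])
```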