Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SVM Outlier detector #814

Merged
merged 34 commits into from
Jul 7, 2023
Merged

SVM Outlier detector #814

merged 34 commits into from
Jul 7, 2023

Conversation

mauicv
Copy link
Collaborator

@mauicv mauicv commented Jun 12, 2023

What is this?

This PR adds the SVM outlier detector. Example notebook here.

Runtime comparisons:

Note that because I've allowed the user to set the optimization by the kwarg rather than using their choice of device this means the user can run the sdg method on the gpu. In reality, this just means that the Nystrome approximation will be performed on the GPU and but the SVM will be fit on the CPU.

see this notebook. x_ref here is a dataset sampled from a normal distribution in 512 dimensions. dsize is the size of the dataset sampled. There are four combinations for each of device = 'cpu'/'gpu' or optimization equal to gd, sdg. We want to see gpu-gd as the lowest curve.

device-optimization-analyisis

TODO:

  • Remove sklearn backend
  • Implement SGD and GD variants
  • Add error checks for device-optimization combinations
  • Test device implementation in notebook
  • Verify correct tensors in backend implementation
  • Fix tests
  • Rewrite docstrings
  • Write tests for device/optimizer kwarg checks
  • Requested PR changes
  • Run GPU and CPU notebook and double check results
  • FInal check

Notes:

  1. SVM is a kernel detector by default and uses GaussianRBF which isn't torch-scriptable yet. Hence this PR will not implement torch scriptable functionality for this detector.
  2. The GaussianRBF is incompatible with the SKlearn backend which isn't really an issue because SKlearn users can specify the kernel as a string and this behaviour is exposed through the wrapper API. Will probably require a note in the docs though.
  3. Originally this was implemented with the option of two separate backends. The PyTorch backend performs well for GPUs and the Sklearn backend performs well on CPUs. The benefit to this approach was that the Sklearn backend is not dependent on the PyTorch dependency however this came at the cost of the user having to use Sklearn kernels which have to be configured in a different way to our Kernels. This means that the initialization of each detector is quite different. Having a different API for each backend is likely to confuse users and will require lots of documentation to help them navigate. The value add for these detectors is primarily that they have GPU support. If users still want to use the CPU then the torch dependency shouldn’t be a massive blocker. Instead, this was changed to have GdSVMDetector and SdgSVMDetector both of which will be torch backends but not selectable by the user through the backend kwarg. Instead, the user specifies an optimization kwarg. This is similar to the PCA linear and kernel backends (here and here) so it's not a massive break from the approach taken elsewhere.

@mauicv mauicv added the Type: New method New method proposals label Jun 12, 2023
@mauicv mauicv self-assigned this Jun 12, 2023
@mauicv mauicv changed the title Add inital implemnetation of svm SVM Outlier detector Jun 12, 2023
@codecov
Copy link

codecov bot commented Jun 12, 2023

Codecov Report

Merging #814 (7a64454) into master (5e69f4b) will increase coverage by 0.20%.
The diff coverage is 98.78%.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #814      +/-   ##
==========================================
+ Coverage   81.63%   81.84%   +0.20%     
==========================================
  Files         156      159       +3     
  Lines       10152    10317     +165     
==========================================
+ Hits         8288     8444     +156     
- Misses       1864     1873       +9     
Impacted Files Coverage Δ
alibi_detect/od/_svm.py 97.50% <97.50%> (ø)
alibi_detect/od/pytorch/svm.py 99.16% <99.16%> (ø)
alibi_detect/od/pytorch/__init__.py 100.00% <100.00%> (ø)
alibi_detect/utils/pytorch/losses.py 100.00% <100.00%> (ø)

... and 4 files with indirect coverage changes

@mauicv mauicv requested review from jklaise and ojcobb June 15, 2023 13:51
Copy link
Contributor

@jklaise jklaise left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice one! A few comments, mostly to do with tidiness.



def hinge_loss(preds: torch.Tensor) -> torch.Tensor:
"L(pred) = max(0, 1-pred) summed over multiple preds"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: "summed" -> "averaged" ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻 (checking Janis' requested changes whilst he is away)

Comment on lines 28 to 32
backend: Literal['pytorch', 'sklearn'] = 'sklearn',
device: Optional[Union[Literal['cuda', 'gpu', 'cpu'], 'torch.device']] = None,
kernel: Union['torch.nn.Module', Literal['linear', 'poly', 'rbf', 'sigmoid']] = 'rbf',
sigma: Optional[float] = None,
kernel_params: Optional[Dict] = None,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument order is different from other detectors that take a kernel (e.g. PCA), please double check.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be one of the tasks before the public release - consistency of public APIs wrt args (order, naming, types etc.)

Copy link
Collaborator Author

@mauicv mauicv Jun 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added an issue to here, I'll leave it for now as it makes sense to just do this all at once at the end.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

directly through its primal formulation. The Nystroem approximation can optionally be used to speed up
training and inference by approximating the kernel's RKHS.

We provide two backends for the one class svm, one based on PyTorch and one based on scikit-learn. The
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one class -> one-class

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻


Parameters
----------
kernel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Order of args different again and some missing, see also above note.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Comment on lines 74 to 75
Additional parameters (keyword arguments) for kernel function passed as a dictionary. Only used for the
'``sklearn``' backend and the kernel is a custom function.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Confusing grammar and wording - not clear what's possible. (and ->or if?)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: No longer relevant due to other changes...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

Comment on lines 118 to 121
nu=fit_kwargs.get('nu', 0.5),
tol=fit_kwargs.get('tol', 1e-3),
max_iter=fit_kwargs.get('max_iter', 1000),
verbose=fit_kwargs.get('verbose', 0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slightly worried that defaults are defined in two places - here and in fit. Is there a better way of doing this?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, I've made an issue for this as I use the same pattern in the gmm detector.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

"""
self.check_fitted()
x = self.nystroem.transform(x)
return - self.gmm.score_samples(x)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: remove space after the minus sign (assuming it's not a bug to have a minus sign!)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

kernel_params=self.kernel_params,
)
x_ref = self.nystroem.fit(x_ref).transform(x_ref)
self.gmm = SGDOneClassSVM(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the name gmm misleading?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤦 Good catch, removed!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

if isinstance(self.kernel, str):
if self.kernel not in ['rbf']:
raise ValueError(
f'Currently only the rbf Kernel is supported for the SVM torch backend, got {self.kernel}.'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit misleading as the user can use a custom kernel.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻

self.n_components
)

X_nys = self.nystroem.fit(x_ref).transform(x_ref)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noting that in sklearn backend the transformed data was called x_ref, although I think X_nys is clearer. Would suggest making the change in sklearn backend for consistency (and suggest sticking with lower-case convention, so x_nys).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍🏻


The Support vector machine outlier detector fits a one-class SVM to the reference data.

Rather than the typical approach of optimizing the exact kernel OCSVM objective through a dual formulation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this might be a bit too much detail for the highest level docstring. However I do intend to iterate slightly on these docstrings when doing the documentation so happy to leave until then.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I'll leave it for now.


Rather than the typical approach of optimizing the exact kernel OCSVM objective through a dual formulation,
here we instead map the data into the kernel's RKHS and then solve the linear optimization problem
directly through its primal formulation. The Nystroem approximation can optionally be used to speed up
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Nystroem approximation isn't optional

Parameters
----------
kernel
Used to define similarity between data points. If using the pytorch backend, this can be either a string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second sentence here doesn't make sense

n_components: Optional[int] = None,
backend: Literal['pytorch', 'sklearn'] = 'sklearn',
device: Optional[Union[Literal['cuda', 'gpu', 'cpu'], 'torch.device']] = None,
kernel: Union['torch.nn.Module', Literal['linear', 'poly', 'rbf', 'sigmoid']] = 'rbf',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The kernel definition here has become a little unwieldy and inconsistent with how we do things elsewhere. Also, although it is technically possible, users should not be specifying linear or polynomial kernels for this detector as the method relies on the kernel inducing an infinite dimensional RKHS.

I wonder if we instead encourage users to pass kernels in exactly the same way as they do for other detectors (so here we'd just have the kernel kwarg and not the sigma and kernel_params kwarg). We then always do our own nystrtoem approximation to produce kernalised features and only rely on sklearn for the OCSVM part?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think consistency across detector definitions is particularly important given we envisage the primary use-case of the detector is as a component within a large ensemble.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See PR notes for update on solution taken

if self.kernel == 'rbf':
if self.sigma is not None:
sigma = torch.tensor(self.sigma, device=self.device)
self.kernel = GaussianRBF(sigma=sigma)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here if self.sigma is None then sigma is not defined and sigma=sigma causes an error

@mauicv mauicv requested review from ojcobb and ascillitoe June 29, 2023 11:19
backends = {
'pytorch': {
'sgd': SdgSVMTorch,
'gd': DgSVMTorch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the class names have the d and g the wrong way round. Also could we perhaps use "batch gradient descent" (so BgdSVMTorch) for the Pytorch variant? I think it makes the distinction clearer as just gradient descent is general enough that it could refer to both variants.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with BGD.

def __init__(
self,
nu: float,
kernel: 'torch.nn.Module',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we could pass the gaussian RBF as a default here?

Parameters
----------
nu
The proportion of the training data that should be considered outliers. Note that this does not necessarily
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets make clear this should be thought of as a regularisation parameter that affects how smooth the decsion boundary will be.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I've basically just added your comment on at the end:

The proportion of the training data that should be considered outliers. Note that this does not necessarily correspond to the false positive rate on test data, which is still defined when calling the infer_threshold method. nu should be thought of as a regularization parameter that affects how smooth the svm decision boundary is.

self.check_fitted()
x_nys = self.nystroem.transform(x)
x_nys = x_nys.cpu().numpy()
return self._to_tensor(-self.svm.score_samples(x_nys))
Copy link
Contributor

@ojcobb ojcobb Jun 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently, due to differences between how we and sklearn set up our optimisations, scores returned via sgd vs gd variants differ by a linear transformation. Lets fix the intercept at 1 and scale coefficients accordingly such that we have consistency in this regard. (Note that out intercept vairable is equal to sklearn's offset variable as intercept = 1-offset.)

}


class SVM(BaseDetector, ThresholdMixin, FitMixin):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pass None as the default for n_components?

"""
if (isinstance(device, str) and device in ('gpu', 'cuda')) or \
(isinstance(device, torch.device) and device.type == 'cuda'):
warnings.warn(('The `sgd` optimization option is best suited for CPU. If '
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we modify to convey that sgd will run on cpu regardless and that in this case only the nystroem approximation will be done on gpu.

@ojcobb
Copy link
Contributor

ojcobb commented Jun 30, 2023

Much cleaner detector definition and abstractions now - definitely a good shout to have user specify optimisation method themselves. Nice one!

kernel
Kernel function to use for outlier detection. Should be an instance of a subclass of `torch.nn.Module`.
n_components
Number of components in the Nystroem approximation By default uses all of them.
Copy link
Contributor

@ascillitoe ascillitoe Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

, needed after approximation.

from alibi_detect.base import (BaseDetector, FitMixin, ThresholdMixin,
outlier_prediction_dict)
from alibi_detect.exceptions import _catch_error as catch_error
from alibi_detect.od.pytorch import DgSVMTorch, SdgSVMTorch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: a little atypical to have the abbreviation SVM in all caps, but SGD as Sgd. However, on the other hand, SGDSVM is a little hard to read... Maybe add a _, or just leave...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, for some reason SGD_SVMTorch and SgdSvmTorch seems worse than SgdSVMTorch to me. I'm ambivalent though, @ojcobb, any preference?

The number of iterations over which the loss must decrease by `tol` in order for optimization to continue.
This is only used for the ``'gd'`` optimization..
verbose
Verbosity level during training. ``0`` is silent, ``1`` a progress bar or sklearn training output for the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

verbose description is referring to sklearn implementation only. Unclear if verbose is still relevant when the torch backend is used. Would be good to make the description more generic, or be more descriptive as to when the verbose kwarg is relevant.

as a tuple of the form `(min_eta, max_eta)` and only used for the ``'gd'`` optimization.
n_step_sizes
The number of step sizes in the defined range to be tested for loss reduction. This many points are spaced
equidistantly along the range in log space. This is only used for the ``'gd'`` optimization.
Copy link
Contributor

@ascillitoe ascillitoe Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe spaced equidistantly -> uniformly distributed?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to spaced evenly!

max_iter: int = 1000,
verbose: int = 0,
) -> Dict:
"""Fit the SVM detector.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are being more descriptive about what is being fit in SgdSVMTorch.fit method, is it worth being slightly more descriptive here? i.e. Fit the pytorch SVM detector or similar?


Returns
-------
Dictionary with fit results. The dictionary contains the following keys
Copy link
Contributor

@ascillitoe ascillitoe Jul 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keys -> keys: ?

Copy link
Contributor

@ascillitoe ascillitoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've attempted to review the changes requested by @jklaise, and it looks like they've all been addressed. Also agree the new hidden sklearn backend approach is far cleaner.

Except for a few nitpicks (and @ojcobb's comments), everything else looks good! Although only moderate confidence...

x_ref: np.ndarray,
tol: float = 1e-6,
max_iter: int = 1000,
step_size_range: Tuple[float, float] = (1e-6, 1.0),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change default step_size_range to (1e-8, 1.0)?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, missed this! Now fixed!

Copy link
Contributor

@ojcobb ojcobb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mauicv mauicv merged commit e3771c3 into SeldonIO:master Jul 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: New method New method proposals
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants