SVM Outlier detector #814
Conversation
Codecov Report
Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master     #814      +/-   ##
==========================================
+ Coverage   81.63%   81.84%   +0.20%
==========================================
  Files         156      159       +3
  Lines       10152    10317     +165
==========================================
+ Hits         8288     8444     +156
- Misses       1864     1873       +9
```
Nice one! A few comments, mostly to do with tidiness.
alibi_detect/utils/pytorch/losses.py
Outdated
```python
def hinge_loss(preds: torch.Tensor) -> torch.Tensor:
    "L(pred) = max(0, 1-pred) summed over multiple preds"
```
nit: "summed" -> "averaged" ?
👍🏻 (checking Janis' requested changes whilst he is away)
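For reference, a minimal sketch of the averaged variant (assuming `preds` is a tensor of raw decision-function values; not necessarily the exact implementation merged in the PR):

```python
import torch

def hinge_loss(preds: torch.Tensor) -> torch.Tensor:
    """L(pred) = max(0, 1 - pred), averaged over the predictions."""
    # relu(1 - preds) computes max(0, 1 - pred) elementwise.
    return torch.nn.functional.relu(1 - preds).mean()
```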
alibi_detect/od/_svm.py
Outdated
```python
backend: Literal['pytorch', 'sklearn'] = 'sklearn',
device: Optional[Union[Literal['cuda', 'gpu', 'cpu'], 'torch.device']] = None,
kernel: Union['torch.nn.Module', Literal['linear', 'poly', 'rbf', 'sigmoid']] = 'rbf',
sigma: Optional[float] = None,
kernel_params: Optional[Dict] = None,
```
Argument order is different from other detectors that take a kernel (e.g. `PCA`), please double check.
This should be one of the tasks before the public release - consistency of public APIs wrt args (order, naming, types etc.)
I've added an issue here; I'll leave it for now as it makes sense to just do this all at once at the end.
👍🏻
alibi_detect/od/_svm.py
Outdated
```python
    directly through its primal formulation. The Nystroem approximation can optionally be used to speed up
    training and inference by approximating the kernel's RKHS.

    We provide two backends for the one class svm, one based on PyTorch and one based on scikit-learn. The
```
one class -> one-class
👍🏻
alibi_detect/od/_svm.py
Outdated
```python
    Parameters
    ----------
    kernel
```
Order of args different again and some missing, see also above note.
👍🏻
alibi_detect/od/_svm.py
Outdated
```python
    Additional parameters (keyword arguments) for kernel function passed as a dictionary. Only used for the
    '``sklearn``' backend and the kernel is a custom function.
```
Confusing grammar and wording - not clear what's possible. ("and" -> "or if"?)
Note: No longer relevant due to other changes...
👍🏻
alibi_detect/od/sklearn/svm.py
Outdated
```python
    nu=fit_kwargs.get('nu', 0.5),
    tol=fit_kwargs.get('tol', 1e-3),
    max_iter=fit_kwargs.get('max_iter', 1000),
    verbose=fit_kwargs.get('verbose', 0),
```
Slightly worried that defaults are defined in two places - here and in `fit`. Is there a better way of doing this?
Good point, I've made an issue for this as I use the same pattern in the `gmm` detector.
👍🏻
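One hypothetical way to keep the defaults in a single place (the names `FIT_DEFAULTS` and `make_svm` are illustrative, not from the PR):

```python
from sklearn.linear_model import SGDOneClassSVM

# Single source of truth for the fit defaults; both the `fit` signature and
# any kwarg-forwarding call site can read from this dict.
FIT_DEFAULTS = {'nu': 0.5, 'tol': 1e-3, 'max_iter': 1000, 'verbose': 0}

def make_svm(**fit_kwargs) -> SGDOneClassSVM:
    """Build the SVM, letting caller-supplied kwargs override the defaults."""
    params = {**FIT_DEFAULTS, **fit_kwargs}
    return SGDOneClassSVM(**params)
```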
alibi_detect/od/sklearn/svm.py
Outdated
""" | ||
self.check_fitted() | ||
x = self.nystroem.transform(x) | ||
return - self.gmm.score_samples(x) |
nit: remove space after the minus sign (assuming it's not a bug to have a minus sign!)
👍🏻
alibi_detect/od/sklearn/svm.py
Outdated
```python
    kernel_params=self.kernel_params,
)
x_ref = self.nystroem.fit(x_ref).transform(x_ref)
self.gmm = SGDOneClassSVM(
```
Is the name `gmm` misleading?
🤦 Good catch, removed!
👍🏻
alibi_detect/od/pytorch/svm.py
Outdated
```python
if isinstance(self.kernel, str):
    if self.kernel not in ['rbf']:
        raise ValueError(
            f'Currently only the rbf Kernel is supported for the SVM torch backend, got {self.kernel}.'
```
This is a bit misleading as the user can use a custom kernel.
👍🏻
alibi_detect/od/pytorch/svm.py
Outdated
```python
    self.n_components
)

X_nys = self.nystroem.fit(x_ref).transform(x_ref)
```
Noting that in the `sklearn` backend the transformed data was called `x_ref`, although I think `X_nys` is clearer. Would suggest making the change in the `sklearn` backend for consistency (and suggest sticking with the lower-case convention, so `x_nys`).
👍🏻
```python
    The Support vector machine outlier detector fits a one-class SVM to the reference data.

    Rather than the typical approach of optimizing the exact kernel OCSVM objective through a dual formulation,
```
I wonder if this might be a bit too much detail for the highest level docstring. However I do intend to iterate slightly on these docstrings when doing the documentation so happy to leave until then.
Ok, I'll leave it for now.
alibi_detect/od/_svm.py
Outdated
```python
    Rather than the typical approach of optimizing the exact kernel OCSVM objective through a dual formulation,
    here we instead map the data into the kernel's RKHS and then solve the linear optimization problem
    directly through its primal formulation. The Nystroem approximation can optionally be used to speed up
```
The Nystroem approximation isn't optional
alibi_detect/od/_svm.py
Outdated
```python
    Parameters
    ----------
    kernel
        Used to define similarity between data points. If using the pytorch backend, this can be either a string
```
The second sentence here doesn't make sense
alibi_detect/od/_svm.py
Outdated
```python
n_components: Optional[int] = None,
backend: Literal['pytorch', 'sklearn'] = 'sklearn',
device: Optional[Union[Literal['cuda', 'gpu', 'cpu'], 'torch.device']] = None,
kernel: Union['torch.nn.Module', Literal['linear', 'poly', 'rbf', 'sigmoid']] = 'rbf',
```
The kernel definition here has become a little unwieldy and inconsistent with how we do things elsewhere. Also, although it is technically possible, users should not be specifying linear or polynomial kernels for this detector as the method relies on the kernel inducing an infinite-dimensional RKHS.
I wonder if we instead encourage users to pass kernels in exactly the same way as they do for other detectors (so here we'd just have the `kernel` kwarg and not the `sigma` and `kernel_params` kwargs). We then always do our own Nystroem approximation to produce kernelised features and only rely on sklearn for the OCSVM part?
I think consistency across detector definitions is particularly important given we envisage the primary use-case of the detector is as a component within a large ensemble.
See PR notes for update on solution taken
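To illustrate the suggestion (kernelised features computed first, sklearn used only for the linear OCSVM part), a rough sketch using sklearn's own Nystroem transformer; the hyperparameters are arbitrary:

```python
import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import SGDOneClassSVM

rng = np.random.default_rng(0)
x_ref = rng.standard_normal((1000, 16))

# Map the data into approximate kernel (RKHS) features...
nys = Nystroem(kernel='rbf', n_components=100, random_state=0)
x_feats = nys.fit_transform(x_ref)

# ...then solve the OCSVM in its primal (linear) formulation on the features.
svm = SGDOneClassSVM(nu=0.1)
svm.fit(x_feats)
scores = -svm.score_samples(x_feats)  # higher score = more outlying
```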
alibi_detect/od/pytorch/svm.py
Outdated
```python
if self.kernel == 'rbf':
    if self.sigma is not None:
        sigma = torch.tensor(self.sigma, device=self.device)
    self.kernel = GaussianRBF(sigma=sigma)
```
Here if `self.sigma` is `None` then `sigma` is not defined and `sigma=sigma` causes an error.
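A sketch of one possible fix, assuming `GaussianRBF` infers its own bandwidth when `sigma` is `None` (the helper name is illustrative):

```python
import torch
from alibi_detect.utils.pytorch.kernels import GaussianRBF

def resolve_rbf_kernel(sigma=None, device=None) -> GaussianRBF:
    # Only wrap sigma in a tensor when it is actually set; otherwise pass
    # None through so the kernel falls back to its own bandwidth inference.
    sigma_t = None if sigma is None else torch.tensor(sigma, device=device)
    return GaussianRBF(sigma=sigma_t)
```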
alibi_detect/od/_svm.py
Outdated
```python
backends = {
    'pytorch': {
        'sgd': SdgSVMTorch,
        'gd': DgSVMTorch
```
Here the class names have the `d` and `g` the wrong way round. Also, could we perhaps use "batch gradient descent" (so `BgdSVMTorch`) for the PyTorch variant? I think it makes the distinction clearer, as just "gradient descent" is general enough that it could refer to both variants.
Agree with `BGD`.
alibi_detect/od/_svm.py
Outdated
```python
def __init__(
    self,
    nu: float,
    kernel: 'torch.nn.Module',
```
Maybe we could pass the Gaussian RBF as a default here?
```python
Parameters
----------
nu
    The proportion of the training data that should be considered outliers. Note that this does not necessarily
```
Let's make clear this should be thought of as a regularisation parameter that affects how smooth the decision boundary will be.
Ok, I've basically just added your comment at the end:
> The proportion of the training data that should be considered outliers. Note that this does not necessarily correspond to the false positive rate on test data, which is still defined when calling the `infer_threshold` method. `nu` should be thought of as a regularization parameter that affects how smooth the SVM decision boundary is.
alibi_detect/od/pytorch/svm.py
Outdated
```python
self.check_fitted()
x_nys = self.nystroem.transform(x)
x_nys = x_nys.cpu().numpy()
return self._to_tensor(-self.svm.score_samples(x_nys))
```
Currently, due to differences between how we and `sklearn` set up our optimisations, scores returned via the `sgd` vs `gd` variants differ by a linear transformation. Let's fix the intercept at 1 and scale the coefficients accordingly such that we have consistency in this regard. (Note that our `intercept` variable corresponds to `sklearn`'s `offset` variable via `intercept = 1 - offset`.)
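To sketch the rescaling idea (a hypothetical helper, not the PR's code; it assumes a positive intercept, under which dividing through leaves the zero level set of the decision function unchanged):

```python
import numpy as np

def fix_intercept_at_one(coef: np.ndarray, intercept: float):
    """Rescale the primal solution so the intercept is exactly 1; the
    decision boundary {x : coef @ x - intercept = 0} is unchanged and
    score ordering is preserved under positive rescaling."""
    assert intercept > 0, 'positive intercept required to preserve score order'
    return coef / intercept, 1.0
```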
```python
}


class SVM(BaseDetector, ThresholdMixin, FitMixin):
```
Can we pass `None` as the default for `n_components`?
alibi_detect/od/pytorch/svm.py
Outdated
""" | ||
if (isinstance(device, str) and device in ('gpu', 'cuda')) or \ | ||
(isinstance(device, torch.device) and device.type == 'cuda'): | ||
warnings.warn(('The `sgd` optimization option is best suited for CPU. If ' |
Can we modify to convey that `sgd` will run on CPU regardless, and that in this case only the Nystroem approximation will be done on GPU?
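A possible rewording of the warning (the exact message text is an assumption):

```python
import warnings

warnings.warn(
    'The `sgd` optimization runs on CPU regardless of the device requested; '
    'only the Nystroem approximation will be computed on the GPU.'
)
```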
Much cleaner detector definition and abstractions now - definitely a good shout to have the user specify the optimisation method themselves. Nice one!
alibi_detect/od/_svm.py
Outdated
```python
kernel
    Kernel function to use for outlier detection. Should be an instance of a subclass of `torch.nn.Module`.
n_components
    Number of components in the Nystroem approximation By default uses all of them.
```
`,` needed after "approximation".
alibi_detect/od/_svm.py
Outdated
```python
from alibi_detect.base import (BaseDetector, FitMixin, ThresholdMixin,
                               outlier_prediction_dict)
from alibi_detect.exceptions import _catch_error as catch_error
from alibi_detect.od.pytorch import DgSVMTorch, SdgSVMTorch
```
Nitpick: a little atypical to have the abbreviation `SVM` in all caps, but SGD as `Sgd`. However, on the other hand, `SGDSVM` is a little hard to read... Maybe add a `_`, or just leave...
Yeah, for some reason `SGD_SVMTorch` and `SgdSvmTorch` seem worse than `SgdSVMTorch` to me. I'm ambivalent though, @ojcobb, any preference?
alibi_detect/od/_svm.py
Outdated
```python
    The number of iterations over which the loss must decrease by `tol` in order for optimization to continue.
    This is only used for the ``'gd'`` optimization..
verbose
    Verbosity level during training. ``0`` is silent, ``1`` a progress bar or sklearn training output for the
```
The `verbose` description is referring to the `sklearn` implementation only. Unclear if `verbose` is still relevant when the `torch` backend is used. Would be good to make the description more generic, or be more descriptive as to when the `verbose` kwarg is relevant.
alibi_detect/od/_svm.py
Outdated
```python
    as a tuple of the form `(min_eta, max_eta)` and only used for the ``'gd'`` optimization.
n_step_sizes
    The number of step sizes in the defined range to be tested for loss reduction. This many points are spaced
    equidistantly along the range in log space. This is only used for the ``'gd'`` optimization.
```
maybe `spaced equidistantly` -> `uniformly distributed`?
I've changed it to `spaced evenly`!
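For concreteness, candidate step sizes spaced evenly in log space across `step_size_range` could be generated like this (variable names are illustrative):

```python
import numpy as np

min_eta, max_eta = 1e-8, 1.0   # step_size_range
n_step_sizes = 16
# n_step_sizes points spaced evenly in log space between min_eta and max_eta.
etas = np.logspace(np.log10(min_eta), np.log10(max_eta), n_step_sizes)
```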
alibi_detect/od/pytorch/svm.py
Outdated
```python
    max_iter: int = 1000,
    verbose: int = 0,
) -> Dict:
    """Fit the SVM detector.
```
If we are being more descriptive about what is being fit in the `SgdSVMTorch.fit` method, is it worth being slightly more descriptive here? i.e. `Fit the pytorch SVM detector` or similar?
alibi_detect/od/pytorch/svm.py
Outdated
```python
Returns
-------
Dictionary with fit results. The dictionary contains the following keys
```
`keys` -> `keys:`?
alibi_detect/od/_svm.py
Outdated
```python
x_ref: np.ndarray,
tol: float = 1e-6,
max_iter: int = 1000,
step_size_range: Tuple[float, float] = (1e-6, 1.0),
```
Can we change the default `step_size_range` to `(1e-8, 1.0)`?
Sorry, missed this! Now fixed!
LGTM!
What is this?
This PR adds the SVM outlier detector. Example notebook here.
Runtime comparisons:
Note that because I've allowed the user to set the optimization by the `kwarg` rather than using their choice of device, the user can run the `sgd` method on the `gpu`. In reality, this just means that the Nystroem approximation will be performed on the GPU but the SVM will be fit on the CPU. See this notebook.
`x_ref` here is a dataset sampled from a normal distribution in 512 dimensions. `dsize` is the size of the dataset sampled. There are four combinations of `device = 'cpu'/'gpu'` and optimization equal to `gd` or `sgd`. We want to see `gpu-gd` as the lowest curve.
TODO:
Notes:
- The detector uses `GaussianRBF`, which isn't torch-scriptable yet. Hence this PR will not implement torch-scriptable functionality for this detector.
- The `GaussianRBF` is incompatible with the sklearn backend, which isn't really an issue because sklearn users can specify the kernel as a string and this behaviour is exposed through the wrapper API. Will probably require a note in the docs though.
- The optimization method is not chosen based on the device `kwarg`. Instead, the user specifies an `optimization` `kwarg`. This is similar to the PCA linear and kernel backends (here and here) so it's not a massive break from the approach taken elsewhere.
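A rough usage sketch based on the API described in this PR (the import path, kwargs, and return format are assumptions pending the final public docs):

```python
import numpy as np
from alibi_detect.od import SVM  # assumed export location

rng = np.random.default_rng(0)
x_ref = rng.standard_normal((1000, 512)).astype(np.float32)

# nu acts as a regularisation parameter; optimization selects sgd vs gd.
od = SVM(nu=0.1, backend='sklearn', optimization='sgd', n_components=100)
od.fit(x_ref)
od.infer_threshold(x_ref, fpr=0.05)

x_test = rng.standard_normal((10, 512)).astype(np.float32)
preds = od.predict(x_test)  # dict with instance scores and outlier flags
```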