Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The case g_w ~= 0 is ill-specified. #8

Open
PierreQuinton opened this issue Jan 24, 2024 · 3 comments
Open

The case g_w ~= 0 is ill-specified. #8

PierreQuinton opened this issue Jan 24, 2024 · 3 comments

Comments

@PierreQuinton
Copy link

PierreQuinton commented Jan 24, 2024

In case the solution gives $|g_w|=0$, then we get division by $0$. In this case, we get that the lagrangian $\lambda$ is $0$, therefore the solution is the same as the unconstrained one. Going back to the original problem we are therefore solving $\max_{d\in\mathbb R^n} \min_{i\in [n]}\langle g_i, d\rangle$. Now it is not too hard to show that whenever there is $0< w\in \mathbb{R}^m$ such that $A^T w=0$ ( $A$ is the jacobian matrix), then for any $d\neq 0$, then there is $i$ such that $\langle g_i, d\rangle < 0$. This means that the $\min$ is $-\infty$ unless $d=0$, therefore the outer $\max$ is reached for $d=0$.

@PierreQuinton PierreQuinton changed the title The case g_w == 0 is ill-specified. The case g_w ~= 0 is ill-specified. Jan 24, 2024
@Cranial-XIX
Copy link
Owner

Sorry that I do not see the problem here. Note that when $g_w = 0$ it has to be that $g_0 = 0$, as $g_w^\top g_0 \geq (1 - c) ||g_0||^2 \geq 0$. So it just means that CAGrad finds a stationary point of $L_0$, which is also a Pareto optimal point.

@PierreQuinton
Copy link
Author

This holds only when $c\leq 1$, I guess you don't consider so much the case $c>1$, could you explain a bit why ?

If $c>1$, it is possible that $g_w=0$ and $g_0\neq 0$ in that case when dividing by $| g_w|$ the computation of CAGrad in your code will blow up, This is because $\lambda=0$ and setting $d=g_0+g_w/\lambda$ is a problem. Actually in the original paper, I think we are supposed to split the cases when this happens, luckily if we know that $\lambda=0$, then the problem has the same solution as the unconstrained one, therefore from what I said above, you can just pick $d=0$.

@Cranial-XIX
Copy link
Owner

Thanks for asking, I now understand what you mean. Yes, $\lambda = 0$ should be a separate case as the optimal $d$ form is different. Thanks for catching it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants