For layer sparse regularization, why not use L1 loss? #10
Comments
These weights are computed via an alpha blending activation, which means all of them are non-negative and sum to 1 (a convex combination). Therefore the L1 norm of the weights is constant at 1 and cannot be optimized.
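For illustration, here is a minimal numerical sketch of that point (not the paper's implementation; the softmax parameterization and the exponential soft-count surrogate below are assumptions made only for this example): with weights forming a convex combination, the L1 norm is identically 1, while a smooth L0-style surrogate still reacts to how concentrated the weights are.

```python
# Minimal sketch (illustrative only, not the paper's code):
# blending weights produced by a softmax are non-negative and sum to 1,
# so their L1 norm is constant and gives no gradient signal for sparsity.
import torch

logits = torch.randn(8, requires_grad=True)   # hypothetical per-layer logits
w = torch.softmax(logits, dim=0)              # convex combination: w_k >= 0, sum_k w_k = 1

l1 = w.abs().sum()
print(l1.item())                              # always 1.0 (up to floating-point error)

# One possible smooth surrogate for counting near-zero entries
# (an assumed form, not the paper's "soft counting norm"):
tau = 0.01
soft_count = (1.0 - torch.exp(-w / tau)).sum()
print(soft_count.item())                      # close to 8 when the weights are spread out,
                                              # drops toward 1 as mass concentrates on one layer
```

Penalizing the L1 norm here would only add a constant to the loss, whereas a counting-style surrogate actually pushes the weight mass onto fewer layers.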
Thank you for your guidance. May I ask where you plan to publish this article?
This paper has been accepted by ACM MM 2023.
I wanted to cite your paper, but I only found the arXiv preprint.
I think it's okay to cite the arXiv preprint, since the official version has not been published yet.
You said in your paper:
“we aim to penalize the L0 norm of these components, i.e., $\|[W_1, \ldots, W_K]\|_0$. Since the L0 norm is not differentiable, we design a specialized soft counting norm”.
But the L1 norm is the tightest convex approximation of the L0 norm, and it is easier to optimize than the L0 norm, so why don't you use the L1 norm for this regularization?
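As a worked one-liner of the point made in the first reply above (assuming only the stated constraint that the alpha-blended weights satisfy $W_k \ge 0$ and $\sum_{k=1}^{K} W_k = 1$), the L1 penalty reduces to a constant:

$$\|[W_1, \ldots, W_K]\|_1 = \sum_{k=1}^{K} |W_k| = \sum_{k=1}^{K} W_k = 1,$$

so it contributes no gradient toward sparsity, whereas a differentiable surrogate of the L0 norm still does.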