
For layer sparse regularization, why not use L1 loss? #10

Open
BCL123456-BAL opened this issue Sep 23, 2023 · 5 comments

Comments

@BCL123456-BAL

You said in your paper:
“we aim to penalize the L0 norm of these components, i.e., ‖[W_1, ..., W_K]‖_0. Since the L0 norm is not differentiable, we design a specialized soft counting norm”.
But the L1 norm is the tightest convex relaxation of the L0 norm, and it is easier to optimize than the L0 norm, so why don't you use the L1 norm for regularization?

@yuehaowang
Owner

These weights are computed via an alpha blending activation, which means they are all non-negative and sum to 1 (a convex combination). Therefore, the L1 norm of the weights is constantly 1 and cannot be optimized.
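
For illustration, here is a minimal PyTorch sketch of this point. The softmax-style activation, the tensor shapes, and the `w / (w + eps)` soft-count surrogate below are assumptions for the example, not necessarily the paper's exact formulation:

```python
import torch

# Minimal sketch: K per-layer weights produced by a softmax-style
# alpha-blending activation, so they are non-negative and sum to 1
# along the layer dimension. Shapes and epsilon are illustrative.
logits = torch.randn(4, 8, requires_grad=True)   # (batch, K layers)
weights = torch.softmax(logits, dim=-1)

# The L1 norm of a convex combination is constant: non-negative weights
# that already sum to 1 always have L1 norm 1, so an L1 penalty gives
# no gradient signal to the logits.
l1 = weights.abs().sum(dim=-1)
print(l1)  # ~1.0 for every sample

# A smooth L0 surrogate ("soft counting") does vary: it roughly counts
# how many weights are non-negligible. This w / (w + eps) form is only
# an illustrative surrogate.
eps = 1e-2
soft_l0 = (weights / (weights + eps)).sum(dim=-1)
loss = soft_l0.mean()
loss.backward()   # non-trivial gradients flow back to the logits
```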

@BCL123456-BAL
Author

Thank you for your guidance. May I ask where you plan to publish this article?

@yuehaowang
Owner

This paper has been accepted by ACM MM 2023.

@BCL123456-BAL
Author

I wanted to cite your paper, but I could only find the arXiv preprint.

@yuehaowang
Owner

I think it's okay to cite the arXiv preprint since the official version has not been published yet.
