For layer sparse regularization, why not use L1 loss? #10
Comments
These weights are computed via an alpha blending activation, which means all of them are non-negative and sum to 1 (a convex combination). Therefore the L1 norm of the weights is constant at 1 and cannot be optimized.
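For illustration, here is a minimal numerical sketch of that point (not the paper's implementation; the softmax parameterization and the exponential soft-count surrogate below are assumptions made only for this example): with weights forming a convex combination, the L1 norm is identically 1, while a smooth L0-style surrogate still reacts to how concentrated the weights are.

```python
# Minimal sketch (illustrative only, not the paper's code):
# blending weights produced by a softmax are non-negative and sum to 1,
# so their L1 norm is constant and gives no gradient signal for sparsity.
import torch

logits = torch.randn(8, requires_grad=True)   # hypothetical per-layer logits
w = torch.softmax(logits, dim=0)              # convex combination: w_k >= 0, sum_k w_k = 1

l1 = w.abs().sum()
print(l1.item())                              # always 1.0 (up to floating-point error)

# One possible smooth surrogate for counting near-zero entries
# (an assumed form, not the paper's "soft counting norm"):
tau = 0.01
soft_count = (1.0 - torch.exp(-w / tau)).sum()
print(soft_count.item())                      # close to 8 when the weights are spread out,
                                              # drops toward 1 as mass concentrates on one layer
```

Penalizing the L1 norm here would only add a constant to the loss, whereas a counting-style surrogate actually pushes the weight mass onto fewer layers.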
Thank you for your guidance. May I ask where you plan to publish this article?
This paper has been accepted by ACM MM 2023.
I wanted to cite your paper, but I only found the arXiv preprint.
I think it's okay to cite the arXiv preprint, since the official version has not been published yet.
You said in your paper:
“we aim to penalize the L0 norm of these components, i.e., $\|[W_1, \ldots, W_K]\|_0$. Since the L0 norm is not differentiable, we design a specialized soft counting norm”.
But the L1 norm is the tightest convex approximation of the L0 norm, and it is easier to optimize than the L0 norm, so why don't you use the L1 norm for this regularization?
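As a worked one-liner of the point made in the first reply above (assuming only the stated constraint that the alpha-blended weights satisfy $W_k \ge 0$ and $\sum_{k=1}^{K} W_k = 1$), the L1 penalty reduces to a constant:

$$\|[W_1, \ldots, W_K]\|_1 = \sum_{k=1}^{K} |W_k| = \sum_{k=1}^{K} W_k = 1,$$

so it contributes no gradient toward sparsity, whereas a differentiable surrogate of the L0 norm still does.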