I'm posting this question here because I don't know where else to ask it, but I can take it down if this isn't the proper place for it.
Looking at the paper, it seemed to me that the relatively large variation in score across different initialisations could mean that averaging the scores from n different initialisations would give a much more consistent scoring.
However, after implementing a quick version of that idea, the results showed no significant improvement.
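For reference, here is a minimal sketch of the averaging idea described above. `score_arch` is a hypothetical stand-in for a training-free score evaluated at one random initialisation (not the actual scoring code from this repo); it just simulates a "true" architecture quality plus init-dependent noise:

```python
import random
import statistics

def score_arch(arch_id, seed):
    """Hypothetical training-free score of an untrained architecture
    at one random initialisation; simulated as true quality + noise."""
    rng = random.Random(arch_id * 10_000 + seed)
    true_quality = 10.0 * arch_id
    return true_quality + rng.gauss(0.0, 1.0)

def averaged_score(arch_id, n_inits=10):
    """Average the score over n different initialisations, which
    shrinks the init-induced standard deviation by roughly sqrt(n)."""
    return statistics.mean(score_arch(arch_id, s) for s in range(n_inits))
```

Ranking architectures by `averaged_score` instead of a single-seed score was the intent; per the answer below, the variance reduction mostly matters for low-scoring architectures, so it barely changes which architecture is selected at the top.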
Can someone explain to me why this was wrong?
Thanks in advance.
This is probably because the selected architecture will be the one with the highest score, and the highest scores do not show high variance anyway. There should be a big difference when you select among the worse architectures, though. You can check it if you're interested. :)
An important takeaway from this plot, however, is that relatively good architectures show lower variance. This idea is actually implemented here, with modest but consistent results.