<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
<meta name="description" content="Deep Learning Tutorial using Keras">
<meta name="author" content="Lindsey M Kitchell">
<title>Intro to Deep Learning</title>
<!-- Bootstrap core CSS -->
<link href="vendor/bootstrap/css/bootstrap.min.css" rel="stylesheet">
<!-- Custom styles for this template -->
<link href="css/simple-sidebar.css" rel="stylesheet">
<!-- fonts -->
<link href="https://fonts.googleapis.com/css?family=Source+Sans+Pro:300,400,600,700&display=swap" rel="stylesheet">
</head>
<body>
<div class="d-flex" id="wrapper">
<!-- Sidebar -->
<div class="bg-light border-right" id="sidebar-wrapper">
<div class="sidebar-heading">Deep Learning With Keras</div>
<div class="list-group list-group-flush">
<a href="1introtodeeplearning.html" class="list-group-item list-group-item-action bg-light">1. Intro to Deep Learning</a>
<a href="2introtokeras.html" class="list-group-item list-group-item-action bg-light">2. Intro to Keras</a>
<a href="3mlpsinkeras.html" class="list-group-item list-group-item-action bg-light">3. MLPs in Keras</a>
<a href="4cnnsinkeras.html" class="list-group-item list-group-item-action bg-light">4. CNNs in Keras</a>
<a href="5activationfunctions.html" class="list-group-item list-group-item-action bg-light">5. Activation Functions</a>
<a href="6otherkerasfunctions.html" class="list-group-item list-group-item-action bg-light">6. Other Useful Keras Functions</a>
<a href="7lossfunctionsoptimizers.html" class="list-group-item list-group-item-action bg-light">7. Loss Functions and Optimizers</a>
<a href="8evaluatingnns.html" class="list-group-item list-group-item-action bg-light">8. Evaluating Neural Networks</a>
<a href="9datapreprocessing.html" class="list-group-item list-group-item-action bg-light">9. Data Preprocessing</a>
<a href="10regularization.html" class="list-group-item list-group-item-action bg-light">10. Regularization</a>
<a href="11hyperparametertuning.html" class="list-group-item list-group-item-action bg-light">11. Hyperparameter Tuning</a>
</div>
</div>
<!-- /#sidebar-wrapper -->
<!-- Page Content -->
<div id="page-content-wrapper">
<nav class="navbar navbar-expand-lg navbar-light bg-light border-bottom">
<button class="btn btn-primary" id="menu-toggle">Toggle Menu</button>
<button class="navbar-toggler" type="button" data-toggle="collapse" data-target="#navbarSupportedContent" aria-controls="navbarSupportedContent" aria-expanded="false" aria-label="Toggle navigation">
<span class="navbar-toggler-icon"></span>
</button>
<div class="collapse navbar-collapse" id="navbarSupportedContent">
<ul class="navbar-nav ml-auto mt-2 mt-lg-0">
<li class="nav-item active">
<a class="nav-link" href="index.html">Home <span class="sr-only">(current)</span></a>
</li>
<li class="nav-item">
<a class="nav-link" target="_blank" href="https://lindseykitchell.weebly.com/">About the Author</a>
</li>
</ul>
</div>
</nav>
<div class="container-fluid">
<h1 id="loss-functions-and-optimizers">Loss Functions and Optimizers</h1>
<hr>
<p>When compiling your model, you need to choose a loss function and an optimizer. The loss function is the quantity that will be minimized during training. The optimizer determines how the network's weights are updated based on the loss function. </p>
<p>Example compile step:</p>
<pre><code class="lang-python"><span class="hljs-keyword">model</span>.compile(optimizer=<span class="hljs-string">'rmsprop'</span>, loss=<span class="hljs-string">'binary_crossentropy'</span>, metrics=[<span class="hljs-string">'accuracy'</span>])
</code></pre>
<h2 id="loss-functions">Loss Functions</h2>
<p>There are some simple guidelines for choosing the correct loss function:</p>
<p><strong>binary crossentropy</strong> (<code>binary_crossentropy</code>) is used when you have a two-class, or binary, classification problem. </p>
<p><strong>categorical crossentropy</strong> (<code>categorical_crossentropy</code>) is used for a multi-class classification problem. </p>
<p><strong>mean squared error</strong> (<code>mean_squared_error</code>) is used for a regression problem. </p>
<p>In general, crossentropy loss functions are the best choice when your model outputs probabilities. </p>
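<p>To make these guidelines concrete, here is a minimal sketch of the matching compile calls. It assumes an existing Keras <code>model</code>; the <code>rmsprop</code> optimizer is just a placeholder choice, and the output activations noted in the comments are the usual pairings.</p>
<pre><code class="lang-python"># two-class classification (sigmoid output, 0/1 labels)
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])

# multi-class classification (softmax output, one-hot labels)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

# regression (linear output)
model.compile(optimizer='rmsprop', loss='mean_squared_error', metrics=['mae'])
</code></pre>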
<p>Here is the <a href="https://keras.io/losses/">Keras documentation for loss functions</a>.</p>
<h2 id="optimizers">Optimizers</h2>
<p>There are many optimizers to choose from, and many are variants of stochastic gradient descent. For all of them you can tune the <strong>learning rate</strong> parameter, which tells the optimizer how far to move the weights of a layer in the direction opposite the gradient. This parameter is very important: if it is too high, training may never converge; if it is too low, training is more reliable but very slow. It is best to try out several different learning rates to find which one works best; a simple sweep is sketched below the figure. </p>
<p><img class="center img-fluid" src="https://cdn-images-1.medium.com/max/800/1*EP8stDFdu_OxZFGimCZRtQ.jpeg" alt=""></p>
<p><a href="https://towardsdatascience.com/estimating-optimal-learning-rate-for-a-deep-neural-network-ce32f2556ce0">image source is this useful resource on learning rates</a></p>
<p>Here is the <a href="https://keras.io/optimizers/">Keras documentation on optimizers</a>.</p>
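<p>A simple way to compare learning rates is a small sweep. The sketch below assumes a hypothetical <code>build_model()</code> helper that returns a fresh, uncompiled model on each call, plus existing <code>X_train</code> and <code>y_train</code> arrays.</p>
<pre><code class="lang-python">from keras import optimizers

for lr in [0.1, 0.01, 0.001, 0.0001]:
    model = build_model()   # hypothetical helper: fresh, uncompiled model
    model.compile(optimizer=optimizers.SGD(lr=lr),
                  loss='binary_crossentropy', metrics=['accuracy'])
    history = model.fit(X_train, y_train, validation_split=0.2,
                        epochs=10, batch_size=32, verbose=0)
    # note: the key is 'val_accuracy' in newer versions of Keras
    print(lr, max(history.history['val_acc']))
</code></pre>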
<h3 id="stochastic-gradient-descent">Stochastic Gradient Descent</h3>
<pre><code class="lang-python">keras.optimizers.SGD(<span class="hljs-attr">lr=0.01,</span> <span class="hljs-attr">momentum=0.0,</span> <span class="hljs-attr">decay=0.0,</span> <span class="hljs-attr">nesterov=False)</span>
</code></pre>
<p>This is the 'basic' optimizer that many others build on. It can be tuned through the learning rate, momentum, and decay parameters.</p>
<ul>
<li>learning rate (lr) </li>
<li>momentum - accelerates SGD in the relevant direction and dampens oscillations. In practice it helps SGD push past local optima, giving faster convergence and less oscillation. A typical choice of momentum is between 0.5 and 0.9 (see the example after this list).</li>
<li>decay - you can set a decay function for the learning rate. This will adjust the learning rate as training progresses.</li>
<li>nesterov - <a href="https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1">Nesterov momentum is a different version of the momentum method which has stronger theoretical convergence guarantees for convex functions. In practice, it works slightly better than standard momentum</a></li>
</ul>
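<p>For example, a common configuration combining these parameters (the values here are illustrative, drawn from the typical ranges above):</p>
<pre><code class="lang-python">from keras import optimizers

sgd = optimizers.SGD(lr=0.01, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss='binary_crossentropy', metrics=['accuracy'])
</code></pre>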
<p><strong>Decay Functions</strong></p>
<ul>
<li>Time based decay</li>
</ul>
<p>Keras applies the built-in <code>decay</code> at every batch update, shrinking the learning rate roughly as <code>lr * 1 / (1 + decay * iterations)</code>. A common heuristic is to set the decay to the initial learning rate divided by the number of epochs: </p>
<pre><code class="lang-python">learning_rate = 0.1          # initial learning rate
epochs = 50                  # total training epochs (example value)
decay_rate = learning_rate / epochs
sgd = keras.optimizers.SGD(lr=learning_rate, decay=decay_rate)
</code></pre>
<ul>
<li>Step decay </li>
</ul>
<p>Step decay can be done with the <a href="https://keras.io/callbacks/#learningratescheduler">learning rate scheduler</a> callback, which recomputes the learning rate at the start of each epoch. In the example below the rate is halved every 10 epochs. </p>
<pre><code class="lang-python">import math
from keras.callbacks import LearningRateScheduler

def step_decay(epoch):
    initial_lrate = 0.1
    drop = 0.5
    epochs_drop = 10.0
    # halve the rate every epochs_drop epochs
    lrate = initial_lrate * math.pow(drop, math.floor((1 + epoch) / epochs_drop))
    return lrate

lrate = LearningRateScheduler(step_decay)

# include the callback in the fit function
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=epochs, batch_size=batch_size, callbacks=[lrate],
          verbose=2)
</code></pre>
<ul>
<li>Exponential decay</li>
</ul>
<p>Exponential decay scales the initial learning rate by <code>exp(-k * epoch)</code>, giving a smooth, continuous drop instead of discrete steps. </p>
<pre><code class="lang-python">import math
from keras.callbacks import LearningRateScheduler

def exp_decay(epoch):
    initial_lrate = 0.1
    k = 0.1
    lrate = initial_lrate * math.exp(-k * epoch)
    return lrate

lrate = LearningRateScheduler(exp_decay)

# include the callback in the fit function
model.fit(X_train, y_train, validation_data=(X_test, y_test),
          epochs=epochs, batch_size=batch_size, callbacks=[lrate],
          verbose=2)
</code></pre>
<h2 id="adaptive-learning-rate-optimizers">Adaptive learning rate optimizers</h2>
<p>The following optimizers adapt their learning rates automatically during training, using heuristics built into the update rule. The descriptions are mostly from the Keras documentation. </p>
<h3 id="adagrad">Adagrad</h3>
<pre><code class="lang-python">keras.optimizers.Adagrad(<span class="hljs-attr">lr=0.01,</span> <span class="hljs-attr">epsilon=None,</span> <span class="hljs-attr">decay=0.0)</span>
</code></pre>
<p>Adagrad is an optimizer with parameter-specific learning rates, which are adapted relative to how frequently a parameter gets updated during training. The more updates a parameter receives, the smaller the updates.</p>
<p>Keras recommends that you use the default parameters. </p>
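<p>The core idea, in a minimal NumPy sketch (a simplification, not the exact Keras implementation): each weight accumulates its own sum of squared gradients, and that running sum scales down its effective step size.</p>
<pre><code class="lang-python">import numpy as np

def adagrad_update(w, grad, cache, lr=0.01, eps=1e-7):
    """One Adagrad-style step for weights w with gradient grad."""
    cache += grad ** 2                        # per-parameter sum of squared gradients
    w -= lr * grad / (np.sqrt(cache) + eps)   # frequently-updated weights take smaller steps
    return w, cache
</code></pre>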
<h3 id="adadelta">Adadelta</h3>
<pre><code class="lang-python">keras.optimizers.Adadelta(<span class="hljs-attr">lr=1.0,</span> <span class="hljs-attr">rho=0.95,</span> <span class="hljs-attr">epsilon=None,</span> <span class="hljs-attr">decay=0.0)</span>
</code></pre>
<p>Adadelta is a more robust extension of Adagrad that adapts learning rates based on a moving window of gradient updates, instead of accumulating all past gradients. This way, Adadelta continues learning even when many updates have been done. Compared to Adagrad, in the original version of Adadelta you don't have to set an initial learning rate. In this version, initial learning rate and decay factor can be set, as in most other Keras optimizers.</p>
<p>Keras recommends that you use the default parameters. </p>
<h3 id="rmsprop">RMSprop</h3>
<pre><code class="lang-python">keras.optimizers.RMSprop(<span class="hljs-attr">lr=0.001,</span> <span class="hljs-attr">rho=0.9,</span> <span class="hljs-attr">epsilon=None,</span> <span class="hljs-attr">decay=0.0)</span>
</code></pre>
<p>RMSprop is similar to Adadelta and adjusts the Adagrad method in a very simple way in an attempt to reduce its aggressive, monotonically decreasing learning rate.</p>
<p>Keras recommends that you only adjust the learning rate of this optimizer. </p>
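<p>Sketched the same way, the difference from Adagrad is that RMSprop replaces the ever-growing sum with an exponential moving average (controlled by <code>rho</code>), so the effective learning rate can recover rather than only shrink. Again a simplification, not the exact Keras code.</p>
<pre><code class="lang-python">import numpy as np

def rmsprop_update(w, grad, cache, lr=0.001, rho=0.9, eps=1e-7):
    """One RMSprop-style step: moving average of squared gradients."""
    cache = rho * cache + (1 - rho) * grad ** 2
    w -= lr * grad / (np.sqrt(cache) + eps)
    return w, cache
</code></pre>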
<h3 id="adam">Adam</h3>
<pre><code class="lang-python">keras.optimizers.Adam(<span class="hljs-attr">lr=0.001,</span> <span class="hljs-attr">beta_1=0.9,</span> <span class="hljs-attr">beta_2=0.999,</span> <span class="hljs-attr">epsilon=None,</span> <span class="hljs-attr">decay=0.0,</span> <span class="hljs-attr">amsgrad=False)</span>
</code></pre>
<p>Adam is an update to the RMSProp optimizer. It is basically RMSprop with momentum.</p>
<p>Keras recommends that you use the default parameters. </p>
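<p>Continuing the same simplified sketch: Adam keeps a moving average of the gradients themselves (the momentum part) alongside an RMSprop-style average of the squared gradients, with a bias correction for the first few steps.</p>
<pre><code class="lang-python">import numpy as np

def adam_update(w, grad, m, v, t, lr=0.001, beta_1=0.9, beta_2=0.999, eps=1e-7):
    """One Adam-style step; t is the 1-based step count."""
    m = beta_1 * m + (1 - beta_1) * grad        # momentum: moving average of gradients
    v = beta_2 * v + (1 - beta_2) * grad ** 2   # RMSprop-style squared-gradient average
    m_hat = m / (1 - beta_1 ** t)               # bias correction for early steps
    v_hat = v / (1 - beta_2 ** t)
    w -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
</code></pre>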
<p>The loss functions, metrics, and optimizers can be customized and configured like so:</p>
<pre><code class="lang-python">from keras import optimizers
from keras import losses
from keras import metrics

model.compile(optimizer=optimizers.RMSprop(lr=0.001),
              loss=losses.binary_crossentropy,
              metrics=[metrics.binary_accuracy])

# OR
loss = losses.binary_crossentropy
rmsprop = optimizers.RMSprop(lr=0.001)
model.compile(optimizer=rmsprop, loss=loss, metrics=[metrics.binary_accuracy])
</code></pre>
<h3 id="useful-resources-">Useful Resources:</h3>
<ul>
<li><a href="https://medium.com/octavian-ai/which-optimizer-and-learning-rate-should-i-use-for-deep-learning-5acb418f9b2">https://medium.com/octavian-ai/which-optimizer-and-learning-rate-should-i-use-for-deep-learning-5acb418f9b2</a></li>
<li><a href="https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1">https://towardsdatascience.com/learning-rate-schedules-and-adaptive-learning-rate-methods-for-deep-learning-2c8f433990d1</a></li>
<li><a href="http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentoptimizationalgorithms">http://ruder.io/optimizing-gradient-descent/index.html#gradientdescentoptimizationalgorithms</a></li>
</ul>
<p><strong>Please move on to <a href="8evaluatingnns.html">Evaluating the Neural Networks (cross validation)</a></strong></p>
</div>
</div>
<!-- /#page-content-wrapper -->
</div>
<!-- /#wrapper -->
<!-- Bootstrap core JavaScript -->
<script src="vendor/jquery/jquery.min.js"></script>
<script src="vendor/bootstrap/js/bootstrap.bundle.min.js"></script>
<!-- Menu Toggle Script -->
<script>
$("#menu-toggle").click(function(e) {
e.preventDefault();
$("#wrapper").toggleClass("toggled");
});
</script>
</body>
</html>