Increasingly negative loss in denoising autoencoder #49

Open
pumakim opened this issue Dec 3, 2016 · 1 comment
pumakim commented Dec 3, 2016

Hi,
I am training a 3-layer stacked denoising autoencoder with a slightly modified loss function.

I want each autoencoder to reconstruct the 'global' input, i.e. the original input that was fed to the first layer, rather than the previous layer's output, while still taking the previous layer's output as its normal input.

I just replaced the reconstruction target 'self.x' with 'self.x_global' in 'get_cost_updates' in layers/da.py.
Here self.x_global is the original input that was fed to the first layer (self.x in models/sda.py).

And the training output looked like this:

[2016-12-04 01:36:48.845309] > ... training the model
[2016-12-04 01:37:01.856494] > layer 0, epoch 0, reconstruction cost 405.427734
[2016-12-04 01:37:15.579682] > layer 0, epoch 1, reconstruction cost 381.404175
[2016-12-04 01:37:29.242537] > layer 0, epoch 2, reconstruction cost 377.724701
[2016-12-04 01:37:43.045209] > layer 0, epoch 3, reconstruction cost 375.875977
[2016-12-04 01:37:56.615403] > layer 0, epoch 4, reconstruction cost 374.741211
[2016-12-04 01:38:11.105572] > layer 1, epoch 0, reconstruction cost -108174.476562
[2016-12-04 01:38:24.891239] > layer 1, epoch 1, reconstruction cost -334065.656250
[2016-12-04 01:38:38.807076] > layer 1, epoch 2, reconstruction cost -561826.187500
[2016-12-04 01:38:52.979225] > layer 1, epoch 3, reconstruction cost -790545.687500
[2016-12-04 01:39:07.143726] > layer 1, epoch 4, reconstruction cost -1019794.250000
[2016-12-04 01:39:21.975468] > layer 2, epoch 0, reconstruction cost -152930.156250
[2016-12-04 01:39:36.551489] > layer 2, epoch 1, reconstruction cost -460353.750000
[2016-12-04 01:39:51.328428] > layer 2, epoch 2, reconstruction cost -767839.625000
[2016-12-04 01:40:05.910295] > layer 2, epoch 3, reconstruction cost -1075358.750000
[2016-12-04 01:40:20.484577] > layer 2, epoch 4, reconstruction cost -1382889.500000

The reconstruction cost becomes increasingly negative.
Is this normal? What does it mean?

Here is my edited code (just self.x changed to self.x_global relative to the original code).

self.x_global is self.x in models/sda.py (original input)
###################################################
def get_last_cost_updates(self, corruption_level, learning_rate, momentum):
    """ This function computes the cost and the updates for one training step of the dA """

    tilde_x = self.get_corrupted_input(self.x, corruption_level)
    y = self.get_hidden_values(tilde_x)
    z = self.get_reconstructed_input(y)
    # cross-entropy reconstruction cost against the global (first-layer) input
    L = - T.sum(self.x_global * T.log(z) + (1 - self.x_global) * T.log(1 - z), axis=1)

    if self.reconstruct_activation is T.tanh:
        # squared-error reconstruction cost against the global input
        L = T.sqr(self.x_global - z).sum(axis=1)

    if self.sparsity_weight is not None:
        sparsity_level = T.extra_ops.repeat(self.sparsity, self.n_hidden)
        avg_act = y.mean(axis=0)

        kl_div = self.kl_divergence(sparsity_level, avg_act)

        cost = T.mean(L) + self.sparsity_weight * kl_div.sum()
    else:
        cost = T.mean(L)

    # compute the gradients of the cost of the `dA` with respect
    # to its parameters (derivative of cost with respect to params)
    gparams = T.grad(cost, self.params)
    # generate the list of updates
    updates = collections.OrderedDict()
    for dparam, gparam in zip(self.delta_params, gparams):
        updates[dparam] = momentum * dparam - gparam * learning_rate
    for dparam, param in zip(self.delta_params, self.params):
        updates[param] = param + updates[dparam]

    return (cost, updates)

###################################################

@MaigoAkisame
Collaborator

I'm not sure if this is caused by overflow.

I see two possible paths for calculating L and cost. Do you know which path is actually executed in your experiment?
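
In case it helps narrow that down, here is a minimal sketch of a check you could run before training. It assumes your SdA object exposes its dA layers as a list (the `da_layers` argument name below is hypothetical; use whatever models/sda.py actually builds) and only reports which branch of get_last_cost_updates each layer would take:

    import theano.tensor as T

    def report_cost_branch(da_layers):
        # For each dA layer, print which branch of get_last_cost_updates would run.
        # `reconstruct_activation` and `sparsity_weight` are the attributes used in
        # the code above; `da_layers` is whatever list of dA objects your SdA holds
        # (hypothetical name).
        for i, layer in enumerate(da_layers):
            branch = 'squared error' if layer.reconstruct_activation is T.tanh else 'cross-entropy'
            print('layer %d: %s branch, sparsity_weight = %s'
                  % (i, branch, layer.sparsity_weight))

If the cross-entropy branch is the one running, note that -(x*log(z) + (1-x)*log(1-z)) is only guaranteed to be non-negative when the target x lies in [0, 1]; a target outside that range can drive the cost negative without any overflow.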
