Sunday 30 December 2018

[Research] A Neural Network Optimization Method -- the Moving Average Model

1. Introduction

The moving average model is a method used in optimizing neural networks that makes the trained model more robust. To better understand this method, in this post I will introduce the principle behind it.

2. Background

2.1 Low-Pass Filter (LPF)

The first-order filter is also called a first-order inertial filter, or a first-order low-pass filter. Its equation has the form:
Y(n) = α * X(n) + (1 - α) * Y(n-1)
where α is the filter coefficient, X(n) is the current sample, Y(n-1) is the previous filtered output, and Y(n) is the current filtered output. We can see that the first-order low-pass filter weights the current sample against the previous output and takes their sum as the current output.

The features of this filter:
  • The smaller α is, the more stable the output, but the lower the sensitivity;
  • The bigger α is, the higher the sensitivity, but the less stable the output.
To balance sensitivity and stability, we need to set the coefficient reasonably.
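This trade-off can be seen in a small numerical sketch (plain Python; the step signal and the function name are illustrative, not from any library):

```python
def low_pass_filter(samples, alpha):
    """First-order low-pass filter: y[n] = alpha * x[n] + (1 - alpha) * y[n-1]."""
    y = 0.0
    out = []
    for x in samples:
        y = alpha * x + (1 - alpha) * y
        out.append(y)
    return out

# A step input: the signal jumps from 0 to 1 at n = 5.
signal = [0.0] * 5 + [1.0] * 15
fast = low_pass_filter(signal, alpha=0.9)  # sensitive: tracks the step quickly
slow = low_pass_filter(signal, alpha=0.1)  # stable: responds slowly
```

With α = 0.9 the output reaches 0.9 on the very first sample after the step, while with α = 0.1 it climbs slowly and smoothly toward 1.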

3. Moving Average Model

As said before, we want to make the model more robust, which means more stable. According to the features of the LPF, the coefficient α should be small enough; that is to say, 1 - α should be big enough.
Then we have the definition of the moving average model:

shadow_variable(n) = decay * shadow_variable(n-1) + (1 - decay) * variable(n)

The shadow_variable(n) is the filtered output at step n; comparing with the LPF equation, decay plays the role of 1 - α.
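A single update of this model is one line of arithmetic; a minimal helper (the name ema_update is mine, not from any library) makes the correspondence with the LPF explicit:

```python
def ema_update(shadow, variable, decay):
    """One moving-average step: shadow' = decay * shadow + (1 - decay) * variable.

    This is the LPF equation with decay in place of 1 - alpha: a decay close
    to 1 keeps the shadow variable stable, reacting only slowly to variable.
    """
    return decay * shadow + (1 - decay) * variable
```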
  
In practice, decay is set close to 1, e.g. 0.999 or 0.9999. To let the shadow variables update faster early in training, the corresponding function in TensorFlow provides a parameter that dynamically sets the value of decay:

decay=min{decay, (1+epoch_updates)/(10+epoch_updates)}

where epoch_updates is the number of completed epochs (one epoch means the whole training set has been trained once).
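The schedule itself is easy to sketch as a small function (the name dynamic_decay is illustrative):

```python
def dynamic_decay(decay, epoch_updates):
    """Use a small decay early in training so the shadow variable catches up
    quickly, then approach the fixed decay as epoch_updates grows."""
    return min(decay, (1 + epoch_updates) / (10 + epoch_updates))
```

Early on, (1 + epoch_updates)/(10 + epoch_updates) is small and wins the min; once it exceeds the configured decay, the fixed value takes over.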

Take an example:
At first, set the variable to 0 and the shadow_variable to 0, with decay = 0.99.
Then, with epoch_updates = 0, set the variable to 5:
decay = min{0.99, (1+0)/(10+0)} = 0.1; shadow_variable = 0.1*0 + 0.9*5 = 4.5

Then, with epoch_updates = 10000, set the variable to 10:
decay = min{0.99, (1+10000)/(10+10000)} = 0.99; shadow_variable = 0.99*4.5 + 0.01*10 = 4.555

Then, with epoch_updates = 10000 again and the variable still 10:
decay = min{0.99, (1+10000)/(10+10000)} = 0.99; shadow_variable = 0.99*4.555 + 0.01*10 = 4.60945
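The three steps above can be reproduced in a few lines of Python:

```python
decay_max = 0.99
shadow = 0.0

# Step 1: epoch_updates = 0, variable = 5
d = min(decay_max, (1 + 0) / (10 + 0))          # d = 0.1
shadow = d * shadow + (1 - d) * 5               # shadow = 4.5

# Step 2: epoch_updates = 10000, variable = 10
d = min(decay_max, (1 + 10000) / (10 + 10000))  # d = 0.99
shadow = d * shadow + (1 - d) * 10              # shadow = 4.555

# Step 3: same epoch_updates and variable again
shadow = d * shadow + (1 - d) * 10              # shadow = 4.60945
```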

3.1 Objective

The model parameters include the weights and the biases, so the moving average model maintains a filtered (shadow) copy of both the weights and the biases.
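In TensorFlow 1.x this is what tf.train.ExponentialMovingAverage does through its apply() and average() methods; the pure-Python sketch below only mirrors that behavior (class and method names chosen for illustration, not TensorFlow's real implementation):

```python
class MovingAverage:
    """Maintains one shadow copy per registered parameter (weights and biases)."""

    def __init__(self, decay):
        self.decay = decay
        self.shadow = {}

    def apply(self, params):
        # Initialize a shadow variable for each parameter's current value.
        for name, value in params.items():
            self.shadow[name] = float(value)

    def update(self, params, epoch_updates):
        # Dynamic decay as in Section 3: small early on, approaching self.decay.
        d = min(self.decay, (1 + epoch_updates) / (10 + epoch_updates))
        for name, value in params.items():
            self.shadow[name] = d * self.shadow[name] + (1 - d) * value

    def average(self, name):
        # The smoothed value to use at evaluation time.
        return self.shadow[name]

ema = MovingAverage(0.99)
ema.apply({"w": 0.0, "b": 0.0})
ema.update({"w": 5.0, "b": 2.0}, epoch_updates=0)
```

Here apply() registers the parameters, update() refreshes every shadow copy after a training step, and average() returns the smoothed parameter used for evaluation.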
