Wednesday 13 March 2019

[Research] Particle Filter

1. Introduction

A stochastic model is a tool for estimating probability distributions of potential outcomes by allowing for random variation in one or more inputs over time. Stochastic models include the Gauss distribution based stochastic model and non-Gauss distribution based stochastic model. According to the stochastic model, Filters can be divided into two classes: 1) Bayes filter, particle filter with non-Gauss distribution based stochastic model; 2) Kalman filter, extended Kalman filter and Unscented Kalman filter with Gauss distribution based stochastic model.
Figure 1: classes of filters
Actually, when dealing with uncertain situations, we usually set the Gauss model as its stochastic model. However, that is not suitable for any situations. Thus, Bayes filter and particle filter are designed for cases using non-Gauss model. In the last post, I introduced Bayes filter, in this post, I will focus on the particle filter. In the beginning, I will give you an overview of particle filters.  And then details will be given.
Figure 2. Bayes filter framework
Figure 3. Overview of the particle filter

2. Inference Model

Particle filter and Bayes filter share the same inference model. Because in the Bayes filter, the posterior probability is hard to compute. We usually use Monte Carlo sampling method to approximate this posterior probability. That is the main idea of the particle filter. Instead of directly computing the posterior probability, the particle filter is to generate many random variables (called particles) to approaching this probability to conduct filtering.

From figure 2, the framework of Bayes filter, we know in a dynamic system (u means the activation to keep the system dynamic), x means the state variables and z the measurement variables. At each time epoch, the last state can be transited into the next state using Bayes Theorem based on existing information, which is the so-called prediction. When there is a measurement at current epoch, it will be fused to compute the posterior, which is the so-called update.

However, there are some conditions for Bayes inference. 1) the system will be dynamic which is already met. 2) The transition of the system states should be followed first-order Markov chain. It means the current state is only determined by the last state. For example, p(xt|x1,x2,...,xt-1) = p(xt|xt-1). 3) variables of measurements are independent to each other. p(zt|z1,z2,...,zt-1) = p(zt).

Bayes filter uses a recursive form to approach the optimal solution of the posterior probability density function (filtering). The optimal estimation of the system state is implemented by using the posterior probability density function, generally like, Maximum a posterior estimate (MAP) and minimum mean square error (MMSE). 
Figure 4. Estimation models
However, as for nonlinear and non-Guass system, it is hard to get the complete solution of the posterior probability. Thus, we usually use approximation methods to approach integration. the particle filter is a method to use the Monte Carlo sampling method to approximate this posterior probability.  

3. Importance Sampling (IS)

In section 2, we talked about the principle of the particle filter is to use the Monte Carlo method to generate particles to approximate the posterior probability. The process is: assume we can extract a few samples meeting p(xt|yt) and they are independent and identically distributed, x_t^(i), i = 1,...,N, then the expectation estimation of an arbitrary function f(xt) can be approached by using the form of sum.
Figure 5. sum to approaching integration
Then we can sum up three steps of Monte Carlo sampling:
  1. Build a probability model
  2. Generate Sampling from the appointed probability model
  3. Compute some estimators of these samples 


Then there is a question, generally, we don't know the information about this posterior probability, how can we generate particles to approximate it? That is what we call importance sampling.

The process of importance sampling is to arbitrarily import a known and easy-to-sample importance probability density function q(xt|z(1:t)). Then we generate N samples from this probability and use their weighted sum to approximate the posterior probability. Then we have
Figure 6. the Importance Sampling
That is the main idea of the Importance Sampling. However, here are two problems. 
  • How to distribute the weights?
  • At each epoch, we need to use all measurements to generate new samples and weights. It is very low-efficiency.    
Then a sequential importance sampling method is proposed to solve these problems.

4. Sequential Importance Sampling (SIS)

Sequential important sampling is the basis of the particle filter, which is based on the sequential analysis in statistics. It is applied to Monte Carlo to estimate the posterior probability recursively. The process is:
Assume the importance probability density function is q(x0:t|z1:t), then we have:
Figure 7. the sequential Importance Sampling probability density function
According to first-order Markov chain, current state is only determined by last one, and each measurement is independent from each other.
Figure 8. Applied Markov feature 
So then we need to get the sequential posterior probability density function. 
Figure 9. the sequential posterior probability density function.
And then we get the sequential weights:
Figure 10. the sequential weights.
From figure 7, 9, 10, we get the sequential model of importance sampling function, posterior model function and weights. Among them, the importance sampling function is user-defined, the posterior model can be computed by known probability, and so does the weights. With all variables known, we just need to intialize the weights and importance function. Particle filter will be implemented recursively. 

Based on aforesaid content, i give the process of the particle filter
Figure 11. particle filter process.

5. Weight Degeneracy

To get the correct state estimation, we hope the variance of weights is close to zero. However, sequential Monte Carlo faces the problem of weight degeneracy, which makes the variance of weights greater. And the meaning of weight degeneracy is after a few iterations, only a small number of them have greater weights and the others' weights are approximately zero. It leads to a decrease in efficient particles. The performance of particle filter degrades.

Generally, we use the number of efficient particles, Neff,  to evaluate the degeneracy degree. The smaller of Neff, the severer of weight degeneracy.  
Figure 12. the evaluation of weight degeneracy.
We set a threshold Nthresh, if the Neff is greater than Nthresh, there means weight degeneracy. We need to launch resampling to mitigate it.

Facing this weight degeneracy, there are two methods to mitigate it. 1) resampling; 2) chose more suitable importance density function. 

6. Solution 1: Resampling 

To solve the weight degeneracy problem, many kinds of resampling methods are used for it. 1) systematic resampling; 2) residual resampling; 3) stratified resampling; 4) multinomial resampling. There is a paper making a comparison of those resampling schemes and displaying the details of this method. Here I explain the process of them use code. The main rule of resampling is to generate the sub indexes and use it to chose particles. 

6.1 systematic resampling

def systematic_resample(weights):
    """ Performs the systemic resampling algorithm used by particle filters.
    This algorithm separates the sample space into N divisions. A single random
    offset is used to to choose where to sample from for all divisions. This
    guarantees that every sample is exactly 1/N apart.
    Parameters
    ----------
    weights : list-like of float
        list of weights as floats
    Returns
    -------
    indexes : ndarray of ints
        array of indexes into the weights defining the resample. i.e. the
        index of the zeroth resample is indexes[0], etc.
    """
    N = len(weights)
    # make N subdivisions, and choose positions with a consistent random offset
    positions = (random() + np.arange(N)) / N
    indexes = np.zeros(N, 'i')
    cumulative_sum = np.cumsum(weights)
    i, j = 0, 0
    while i < N:
        if positions[i] < cumulative_sum[j]:
            indexes[i] = j
            i += 1
        else:
            j += 1
    return indexes


6.2 residual resampling

def residual_resample(weights):

    """ Performs the residual resampling algorithm used by particle filters.
    Based on observation that we don't need to use random numbers to select
    most of the weights. Take int(N*w^i) samples of each particle i, and then
    resample any remaining using a standard resampling algorithm [1]
    Parameters
    ----------
    weights : list-like of float
        list of weights as floats
    Returns
    -------

    indexes : ndarray of ints
        array of indexes into the weights defining the resample. i.e. the
        index of the zeroth resample is indexes[0], etc.
    References
    ----------
    .. [1] J. S. Liu and R. Chen. Sequential Monte Carlo methods for dynamic
       systems. Journal of the American Statistical Association,
       93(443):1032–1044, 1998.
    """
    N = len(weights)
    indexes = np.zeros(N, 'i')

    # take int(N*w) copies of each weight, which ensures particles with the
    # same weight are drawn uniformly
    num_copies = (np.floor(N*np.asarray(weights))).astype(int)
    k = 0
    for i in range(N):
        for _ in range(num_copies[i]): # make n copies
            indexes[k] = i
            k += 1
    # use multinormal resample on the residual to fill up the rest. This
    # maximizes the variance of the samples
    residual = weights - num_copies     # get fractional part
    residual /= sum(residual)           # normalize
    cumulative_sum = np.cumsum(residual)
    cumulative_sum[-1] = 1. # avoid round-off errors: ensures sum is exactly one
    indexes[k:N] = np.searchsorted(cumulative_sum, random(N-k))
    return indexesdef residual_resample(weights):
    N = len(weights)
    indexes = np.zeros(N, 'i')
    num_copies = (np.floor(N*np.asarray(weights))).astype(int)
    k = 0
    for i in range(N):
        for _ in range(num_copies[i]): # make n copies
            indexes[k] = i
            k += 1
    # use multinormal resample on the residual to fill up the rest. This
    # maximizes the variance of the samples
    residual = weights - num_copies     # get fractional part
    residual /= sum(residual)           # normalize
    cumulative_sum = np.cumsum(residual)
    cumulative_sum[-1] = 1. # avoid round-off errors: ensures sum is exactly one
    indexes[k:N] = np.searchsorted(cumulative_sum, random(N-k))
    return indexes

6.3 stratified resampling

def stratified_resample(weights):
    """ Performs the stratified resampling algorithm used by particle filters.
    This algorithms aims to make selections relatively uniformly across the
    particles. It divides the cumulative sum of the weights into N equal
    divisions, and then selects one particle randomly from each division. This
    guarantees that each sample is between 0 and 2/N apart.
    Parameters
    ----------
    weights : list-like of float
        list of weights as floats
    Returns
    -------
    indexes : ndarray of ints
        array of indexes into the weights defining the resample. i.e. the
        index of the zeroth resample is indexes[0], etc.
    """
    N = len(weights)
    # make N subdivisions, and chose a random position within each one
    positions = (random(N) + range(N)) / N
    indexes = np.zeros(N, 'i')
    cumulative_sum = np.cumsum(weights)
    i, j = 0, 0
    while i < N:
        if positions[i] < cumulative_sum[j]:
            indexes[i] = j
            i += 1
        else:
            j += 1
    return indexes

6.4 multinomial resampling

def multinomial_resample(weights):
    """ This is the naive form of roulette sampling where we compute the
    cumulative sum of the weights and then use binary search to select the
    resampled point based on a uniformly distributed random number. Run time
    is O(n log n). You do not want to use this algorithm in practice; for some
    reason it is popular in blogs and online courses so I included it for
    reference.
    
   Parameters
   ----------
    weights : list-like of float
        list of weights as floats
    Returns
    -------
    indexes : ndarray of ints
        array of indexes into the weights defining the resample. i.e. the
        index of the zeroth resample is indexes[0], etc.
    """
    cumulative_sum = np.cumsum(weights)
    cumulative_sum[-1] = 1.  # avoid round-off errors: ensures sum is exactly one
    return np.searchsorted(cumulative_sum, random(len(weights)))


6.5 comparison of these resampling methods

To test the performance of these methods, we set the weight list is a, and draw the figures of resampling using different methods. 
a = [.1, .2, .3, .4, .2, .3, .1]
np.random.seed(4)
plot_multinomial_resample(a)
plot_residual_resample(a)
plot_systematic_resample(a)
plot_stratified_resample(a)
Then I give the sampling performance of them:
Figure 13. comparison of four resampling methods
The performance of the multinomial resampling is quite bad. There is a very large weight that was not sampled at all. The largest weight only got one resample, yet the smallest weight was sample was sampled twice. Most tutorials on the net that I have read use multinomial resampling, and I am not sure why. Multinomial resampling is rarely used in the literature or for real problems. I recommend not using it unless you have a very good reason to do so.

The residual resampling algorithm does excellently at what it tries to do: ensure all the largest weights are resampled multiple times. It doesn't evenly distribute the samples across the particles - many reasonably large weights are not resampled at all.

Both systematic and stratified perform very well. Systematic sampling does an excellent job of ensuring we sample from all parts of the particle space while ensuring larger weights are proportionality resampled more often. Stratified resampling is not quite as uniform as systematic resampling, but it is a bit better at ensuring the higher weights get resampled more. In practice, both the stratified and systematic algorithms perform well and similarly across a variety of problems. 

7. Solution 2: choosing the importance density function

The choice of the importance probability density function is very important to the performance of the particle filter. The standard to choose a importance sampling function is to make the variance of particles minimum.

In applications, we usually choose the transition probability function of the state as the proposal (importance) probability, p(xt|xt-1). The weight is in the form:
Figure 14. take state transition probability as the proposal probability
The state transition probability has a simple form and is easy to implement. In situations with not high-accuracy measurement, it can achieve a good filtering performance. 

However, the state transition probability doesn't consider the latest measurement leading to a gap between samples and real posterior distribution. When the measurement model has high accuracy, this difference will be notable.

8. Practice

To let you be familiar with the particle filter. I found an example from filterpy.
filterpy is a open package which includes many kinds of filters, e.g. Bayes filter, particle filter, Kalman filter, extended Kalman filter and UKF.
from filterpy.monte_carlo import systematic_resample
from numpy.linalg import norm
from numpy.random import randn
import scipy.stats
import matplotlib.pyplot as plt
from numpy.random import uniform

def create_uniform_particles(x_range, y_range, hdg_range, N):
    particles = np.empty((N, 3))
    particles[:, 0] = uniform(x_range[0], x_range[1], size=N)
    particles[:, 1] = uniform(y_range[0], y_range[1], size=N)
    particles[:, 2] = uniform(hdg_range[0], hdg_range[1], size=N)
    particles[:, 2] %= 2 * np.pi
    return particles

def create_gaussian_particles(mean, std, N):
    particles = np.empty((N, 3))
    particles[:, 0] = mean[0] + (randn(N) * std[0])
    particles[:, 1] = mean[1] + (randn(N) * std[1])
    particles[:, 2] = mean[2] + (randn(N) * std[2])
    particles[:, 2] %= 2 * np.pi
    return particles

def predict(particles, u, std, dt=1.):
    """ move according to control input u (heading change, velocity)
    with noise Q (std heading change, std velocity)`"""

    N = len(particles)
    # update heading
    particles[:, 2] += u[0] + (randn(N) * std[0])
    particles[:, 2] %= 2 * np.pi

    # move in the (noisy) commanded direction
    dist = (u[1] * dt) + (randn(N) * std[1])
    particles[:, 0] += np.cos(particles[:, 2]) * dist
    particles[:, 1] += np.sin(particles[:, 2]) * dist

def estimate(particles, weights):
    """returns mean and variance of the weighted particles"""

    pos = particles[:, 0:2]
    mean = np.average(pos, weights=weights, axis=0)
    var  = np.average((pos - mean)**2, weights=weights, axis=0)
    return mean, var

def update(particles, weights, z, R, landmarks):
    for i, landmark in enumerate(landmarks):
        distance = np.linalg.norm(particles[:, 0:2] - landmark, axis=1)
        weights *= scipy.stats.norm(distance, R).pdf(z[i])

    weights += 1.e-300      # avoid round-off to zero
    weights /= sum(weights) # normalize

def neff(weights):
    return 1. / np.sum(np.square(weights))

def resample_from_index(particles, weights, indexes):
    particles[:] = particles[indexes]
    weights[:] = weights[indexes]
    weights.fill(1.0 / len(weights))

    
def run_pf1(N, iters=18, sensor_std_err=.1, 
            do_plot=True, plot_particles=False,
            xlim=(0, 20), ylim=(0, 20),
            initial_x=None):
    landmarks = np.array([[-1, 2], [5, 10], [12,14], [18,21]])
    NL = len(landmarks)
    
    plt.figure()
   
    # create particles and weights
    if initial_x is not None:
        particles = create_gaussian_particles(
            mean=initial_x, std=(5, 5, np.pi/4), N=N)
    else:
        particles = create_uniform_particles((0,20), (0,20), (0, 6.28), N)
    weights = np.ones(N) / N

    if plot_particles:
        alpha = .20
        if N > 5000:
            alpha *= np.sqrt(5000)/np.sqrt(N)           
        plt.scatter(particles[:, 0], particles[:, 1], 
                    alpha=alpha, color='g')
    
    xs = []
    robot_pos = np.array([0., 0.])
    for x in range(iters):
        robot_pos += (1, 1)

        # distance from robot to each landmark
        zs = (norm(landmarks - robot_pos, axis=1) + 
              (randn(NL) * sensor_std_err))

        # move diagonally forward to (x+1, x+1)
        predict(particles, u=(0.00, 1.414), std=(.2, .05))
        
        # incorporate measurements
        update(particles, weights, z=zs, R=sensor_std_err, 
               landmarks=landmarks)
        
        # resample if too few effective particles
        if neff(weights) < N/2:
            indexes = systematic_resample(weights)
            resample_from_index(particles, weights, indexes)
            assert np.allclose(weights, 1/N)
        mu, var = estimate(particles, weights)
        xs.append(mu)

        if plot_particles:
            plt.scatter(particles[:, 0], particles[:, 1], 
                        color='k', marker=',', s=1)
        p1 = plt.scatter(robot_pos[0], robot_pos[1], marker='+',
                         color='k', s=180, lw=3)
        p2 = plt.scatter(mu[0], mu[1], marker='s', color='r')
    
    xs = np.array(xs)
    #plt.plot(xs[:, 0], xs[:, 1])
    plt.legend([p1, p2], ['Actual', 'PF'], loc=4, numpoints=1)
    plt.xlim(*xlim)
    plt.ylim(*ylim)
    print('final position error, variance:\n\t', mu - np.array([iters, iters]), var)
    plt.show()

from numpy.random import seed
seed(2) 
run_pf1(N=5000, plot_particles=False)

The result is:
Figure 15. the result after particle filter 

No comments:

Post a Comment

[Research] Recurrent Neural Network (RNN)

1. Introduction As we all know, forward neural networks (FNN) have no connection between neurons in the same layer or in cross layers. Th...