You Raise Me Up: [Research] Particle Filter

1. Introduction

A stochastic model is a tool for estimating probability distributions of potential outcomes by allowing for random variation in one or more inputs over time. Stochastic models include the Gauss distribution based stochastic model and non-Gauss distribution based stochastic model. According to the stochastic model, Filters can be divided into two classes: 1) Bayes filter, particle filter with non-Gauss distribution based stochastic model; 2) Kalman filter, extended Kalman filter and Unscented Kalman filter with Gauss distribution based stochastic model.

Figure 1: classes of filters

Actually, when dealing with uncertain situations, we usually set the Gauss model as its stochastic model. However, that is not suitable for any situations. Thus, Bayes filter and particle filter are designed for cases using non-Gauss model. In the last post, I introduced Bayes filter, in this post, I will focus on the particle filter. In the beginning, I will give you an overview of particle filters. And then details will be given.

Figure 2. Bayes filter framework

Figure 3. Overview of the particle filter

2. Inference Model

Particle filter and Bayes filter share the same inference model. Because in the Bayes filter, the posterior probability is hard to compute. We usually use Monte Carlo sampling method to approximate this posterior probability. That is the main idea of the particle filter. Instead of directly computing the posterior probability, the particle filter is to generate many random variables (called particles) to approaching this probability to conduct filtering.

From figure 2, the framework of Bayes filter, we know in a dynamic system (u means the activation to keep the system dynamic), x means the state variables and z the measurement variables. At each time epoch, the last state can be transited into the next state using Bayes Theorem based on existing information, which is the so-called prediction. When there is a measurement at current epoch, it will be fused to compute the posterior, which is the so-called update.

However, there are some conditions for Bayes inference. 1) the system will be dynamic which is already met. 2) The transition of the system states should be followed first-order Markov chain. It means the current state is only determined by the last state. For example, p(xt|x1,x2,...,xt-1) = p(xt|xt-1). 3) variables of measurements are independent to each other. p(zt|z1,z2,...,zt-1) = p(zt).

Bayes filter uses a recursive form to approach the optimal solution of the posterior probability density function (filtering). The optimal estimation of the system state is implemented by using the posterior probability density function, generally like, Maximum a posterior estimate (MAP) and minimum mean square error (MMSE).

Figure 4. Estimation models

However, as for nonlinear and non-Guass system, it is hard to get the complete solution of the posterior probability. Thus, we usually use approximation methods to approach integration. the particle filter is a method to use the Monte Carlo sampling method to approximate this posterior probability.

3. Importance Sampling (IS)

In section 2, we talked about the principle of the particle filter is to use the Monte Carlo method to generate particles to approximate the posterior probability. The process is: assume we can extract a few samples meeting p(xt|yt) and they are independent and identically distributed, x_t^(i), i = 1,...,N, then the expectation estimation of an arbitrary function f(xt) can be approached by using the form of sum.

Figure 5. sum to approaching integration

Then we can sum up three steps of Monte Carlo sampling:

Build a probability model
Generate Sampling from the appointed probability model
Compute some estimators of these samples

Then there is a question, generally, we don't know the information about this posterior probability, how can we generate particles to approximate it? That is what we call importance sampling.

The process of importance sampling is to arbitrarily import a known and easy-to-sample importance probability density function q(xt|z(1:t)). Then we generate N samples from this probability and use their weighted sum to approximate the posterior probability. Then we have

Figure 6. the Importance Sampling

That is the main idea of the Importance Sampling. However, here are two problems.

How to distribute the weights?
At each epoch, we need to use all measurements to generate new samples and weights. It is very low-efficiency.

Then a sequential importance sampling method is proposed to solve these problems.

4. Sequential Importance Sampling (SIS)

Sequential important sampling is the basis of the particle filter, which is based on the sequential analysis in statistics. It is applied to Monte Carlo to estimate the posterior probability recursively. The process is:

Assume the importance probability density function is q(x0:t|z1:t), then we have:

Figure 7. the sequential Importance Sampling probability density function

According to first-order Markov chain, current state is only determined by last one, and each measurement is independent from each other.

Figure 8. Applied Markov feature

So then we need to get the sequential posterior probability density function.

Figure 9. the sequential posterior probability density function.

And then we get the sequential weights:

Figure 10. the sequential weights.

From figure 7, 9, 10, we get the sequential model of importance sampling function, posterior model function and weights. Among them, the importance sampling function is user-defined, the posterior model can be computed by known probability, and so does the weights. With all variables known, we just need to intialize the weights and importance function. Particle filter will be implemented recursively.

Based on aforesaid content, i give the process of the particle filter

Figure 11. particle filter process.

5. Weight Degeneracy

To get the correct state estimation, we hope the variance of weights is close to zero. However, sequential Monte Carlo faces the problem of weight degeneracy, which makes the variance of weights greater. And the meaning of weight degeneracy is after a few iterations, only a small number of them have greater weights and the others' weights are approximately zero. It leads to a decrease in efficient particles. The performance of particle filter degrades.

Generally, we use the number of efficient particles, Neff, to evaluate the degeneracy degree. The smaller of Neff, the severer of weight degeneracy.

Figure 12. the evaluation of weight degeneracy.

We set a threshold Nthresh, if the Neff is greater than Nthresh, there means weight degeneracy. We need to launch resampling to mitigate it.

Facing this weight degeneracy, there are two methods to mitigate it. 1) resampling; 2) chose more suitable importance density function.

6. Solution 1: Resampling

To solve the weight degeneracy problem, many kinds of resampling methods are used for it. 1) systematic resampling; 2) residual resampling; 3) stratified resampling; 4) multinomial resampling. There is a paper making a comparison of those resampling schemes and displaying the details of this method. Here I explain the process of them use code. The main rule of resampling is to generate the sub indexes and use it to chose particles.

6.1 systematic resampling

def systematic_resample(weights):
""" Performs the systemic resampling algorithm used by particle filters.
This algorithm separates the sample space into N divisions. A single random
offset is used to to choose where to sample from for all divisions. This
guarantees that every sample is exactly 1/N apart.
Parameters
----------
weights : list-like of float
list of weights as floats
Returns
-------
indexes : ndarray of ints
array of indexes into the weights defining the resample. i.e. the
index of the zeroth resample is indexes[0], etc.
"""
N = len(weights)
# make N subdivisions, and choose positions with a consistent random offset
positions = (random() + np.arange(N)) / N
indexes = np.zeros(N, 'i')
cumulative_sum = np.cumsum(weights)
i, j = 0, 0
while i < N:
if positions[i] < cumulative_sum[j]:
indexes[i] = j
i += 1
else:
j += 1
return indexes

6.2 residual resampling

def residual_resample(weights):

""" Performs the residual resampling algorithm used by particle filters.

Based on observation that we don't need to use random numbers to select

most of the weights. Take int(N*w^i) samples of each particle i, and then

resample any remaining using a standard resampling algorithm [1]

Parameters

----------

weights : list-like of float

list of weights as floats

Returns

-------

indexes : ndarray of ints

array of indexes into the weights defining the resample. i.e. the

index of the zeroth resample is indexes[0], etc.

References

----------

.. [1] J. S. Liu and R. Chen. Sequential Monte Carlo methods for dynamic

systems. Journal of the American Statistical Association,

93(443):1032–1044, 1998.

"""

N = len(weights)

indexes = np.zeros(N, 'i')

# take int(N*w) copies of each weight, which ensures particles with the

# same weight are drawn uniformly

num_copies = (np.floor(N*np.asarray(weights))).astype(int)

k = 0

for i in range(N):

for _ in range(num_copies[i]): # make n copies

indexes[k] = i

k += 1

# use multinormal resample on the residual to fill up the rest. This

# maximizes the variance of the samples

residual = weights - num_copies # get fractional part

residual /= sum(residual) # normalize

cumulative_sum = np.cumsum(residual)

cumulative_sum[-1] = 1. # avoid round-off errors: ensures sum is exactly one

indexes[k:N] = np.searchsorted(cumulative_sum, random(N-k))

return indexesdef residual_resample(weights):

N = len(weights)

indexes = np.zeros(N, 'i')

num_copies = (np.floor(N*np.asarray(weights))).astype(int)

k = 0

for i in range(N):

for _ in range(num_copies[i]): # make n copies

indexes[k] = i

k += 1

# use multinormal resample on the residual to fill up the rest. This

# maximizes the variance of the samples

residual = weights - num_copies # get fractional part

residual /= sum(residual) # normalize

cumulative_sum = np.cumsum(residual)

cumulative_sum[-1] = 1. # avoid round-off errors: ensures sum is exactly one

indexes[k:N] = np.searchsorted(cumulative_sum, random(N-k))

return indexes

6.3 stratified resampling

def stratified_resample(weights):
""" Performs the stratified resampling algorithm used by particle filters.
This algorithms aims to make selections relatively uniformly across the
particles. It divides the cumulative sum of the weights into N equal
divisions, and then selects one particle randomly from each division. This
guarantees that each sample is between 0 and 2/N apart.
Parameters
----------
weights : list-like of float
list of weights as floats
Returns
-------
indexes : ndarray of ints
array of indexes into the weights defining the resample. i.e. the
index of the zeroth resample is indexes[0], etc.
"""
N = len(weights)
# make N subdivisions, and chose a random position within each one
positions = (random(N) + range(N)) / N
indexes = np.zeros(N, 'i')
cumulative_sum = np.cumsum(weights)
i, j = 0, 0
while i < N:
if positions[i] < cumulative_sum[j]:
indexes[i] = j
i += 1
else:
j += 1
return indexes

6.4 multinomial resampling

def multinomial_resample(weights):
""" This is the naive form of roulette sampling where we compute the
cumulative sum of the weights and then use binary search to select the
resampled point based on a uniformly distributed random number. Run time
is O(n log n). You do not want to use this algorithm in practice; for some
reason it is popular in blogs and online courses so I included it for
reference.

Parameters
----------
weights : list-like of float
list of weights as floats
Returns
-------
indexes : ndarray of ints
array of indexes into the weights defining the resample. i.e. the
index of the zeroth resample is indexes[0], etc.
"""
cumulative_sum = np.cumsum(weights)
cumulative_sum[-1] = 1. # avoid round-off errors: ensures sum is exactly one
return np.searchsorted(cumulative_sum, random(len(weights)))

6.5 comparison of these resampling methods

To test the performance of these methods, we set the weight list is a, and draw the figures of resampling using different methods.

a = [.1, .2, .3, .4, .2, .3, .1]

np.random.seed(4)

plot_multinomial_resample(a)

plot_residual_resample(a)

plot_systematic_resample(a)

plot_stratified_resample(a)

Then I give the sampling performance of them:

Figure 13. comparison of four resampling methods

The performance of the multinomial resampling is quite bad. There is a very large weight that was not sampled at all. The largest weight only got one resample, yet the smallest weight was sample was sampled twice. Most tutorials on the net that I have read use multinomial resampling, and I am not sure why. Multinomial resampling is rarely used in the literature or for real problems. I recommend not using it unless you have a very good reason to do so.

The residual resampling algorithm does excellently at what it tries to do: ensure all the largest weights are resampled multiple times. It doesn't evenly distribute the samples across the particles - many reasonably large weights are not resampled at all.

Both systematic and stratified perform very well. Systematic sampling does an excellent job of ensuring we sample from all parts of the particle space while ensuring larger weights are proportionality resampled more often. Stratified resampling is not quite as uniform as systematic resampling, but it is a bit better at ensuring the higher weights get resampled more. In practice, both the stratified and systematic algorithms perform well and similarly across a variety of problems.

7. Solution 2: choosing the importance density function

The choice of the importance probability density function is very important to the performance of the particle filter. The standard to choose a importance sampling function is to make the variance of particles minimum.

In applications, we usually choose the transition probability function of the state as the proposal (importance) probability, p(xt|xt-1). The weight is in the form:

Figure 14. take state transition probability as the proposal probability

The state transition probability has a simple form and is easy to implement. In situations with not high-accuracy measurement, it can achieve a good filtering performance.

However, the state transition probability doesn't consider the latest measurement leading to a gap between samples and real posterior distribution. When the measurement model has high accuracy, this difference will be notable.

8. Practice

To let you be familiar with the particle filter. I found an example from filterpy.
filterpy is a open package which includes many kinds of filters, e.g. Bayes filter, particle filter, Kalman filter, extended Kalman filter and UKF.
from filterpy.monte_carlo import systematic_resample
from numpy.linalg import norm
from numpy.random import randn
import scipy.stats
import matplotlib.pyplot as plt
from numpy.random import uniform

def create_uniform_particles(x_range, y_range, hdg_range, N):
particles = np.empty((N, 3))
particles[:, 0] = uniform(x_range[0], x_range[1], size=N)
particles[:, 1] = uniform(y_range[0], y_range[1], size=N)
particles[:, 2] = uniform(hdg_range[0], hdg_range[1], size=N)
particles[:, 2] %= 2 * np.pi
return particles

def create_gaussian_particles(mean, std, N):
particles = np.empty((N, 3))
particles[:, 0] = mean[0] + (randn(N) * std[0])
particles[:, 1] = mean[1] + (randn(N) * std[1])
particles[:, 2] = mean[2] + (randn(N) * std[2])
particles[:, 2] %= 2 * np.pi
return particles

def predict(particles, u, std, dt=1.):
""" move according to control input u (heading change, velocity)
with noise Q (std heading change, std velocity)`"""

N = len(particles)
# update heading
particles[:, 2] += u[0] + (randn(N) * std[0])
particles[:, 2] %= 2 * np.pi

# move in the (noisy) commanded direction
dist = (u[1] * dt) + (randn(N) * std[1])
particles[:, 0] += np.cos(particles[:, 2]) * dist
particles[:, 1] += np.sin(particles[:, 2]) * dist

def estimate(particles, weights):
"""returns mean and variance of the weighted particles"""

pos = particles[:, 0:2]
mean = np.average(pos, weights=weights, axis=0)
var = np.average((pos - mean)**2, weights=weights, axis=0)
return mean, var

def update(particles, weights, z, R, landmarks):
for i, landmark in enumerate(landmarks):
distance = np.linalg.norm(particles[:, 0:2] - landmark, axis=1)
weights *= scipy.stats.norm(distance, R).pdf(z[i])

weights += 1.e-300 # avoid round-off to zero
weights /= sum(weights) # normalize

def neff(weights):
return 1. / np.sum(np.square(weights))

def resample_from_index(particles, weights, indexes):
particles[:] = particles[indexes]
weights[:] = weights[indexes]
weights.fill(1.0 / len(weights))


def run_pf1(N, iters=18, sensor_std_err=.1,
do_plot=True, plot_particles=False,
xlim=(0, 20), ylim=(0, 20),
initial_x=None):
landmarks = np.array([[-1, 2], [5, 10], [12,14], [18,21]])
NL = len(landmarks)

plt.figure()

# create particles and weights
if initial_x is not None:
particles = create_gaussian_particles(
mean=initial_x, std=(5, 5, np.pi/4), N=N)
else:
particles = create_uniform_particles((0,20), (0,20), (0, 6.28), N)
weights = np.ones(N) / N

if plot_particles:
alpha = .20
if N > 5000:
alpha *= np.sqrt(5000)/np.sqrt(N)
plt.scatter(particles[:, 0], particles[:, 1],
alpha=alpha, color='g')

xs = []
robot_pos = np.array([0., 0.])
for x in range(iters):
robot_pos += (1, 1)

# distance from robot to each landmark
zs = (norm(landmarks - robot_pos, axis=1) +
(randn(NL) * sensor_std_err))

# move diagonally forward to (x+1, x+1)
predict(particles, u=(0.00, 1.414), std=(.2, .05))

# incorporate measurements
update(particles, weights, z=zs, R=sensor_std_err,
landmarks=landmarks)

# resample if too few effective particles
if neff(weights) < N/2:
indexes = systematic_resample(weights)
resample_from_index(particles, weights, indexes)
assert np.allclose(weights, 1/N)
mu, var = estimate(particles, weights)
xs.append(mu)

if plot_particles:
plt.scatter(particles[:, 0], particles[:, 1],
color='k', marker=',', s=1)
p1 = plt.scatter(robot_pos[0], robot_pos[1], marker='+',
color='k', s=180, lw=3)
p2 = plt.scatter(mu[0], mu[1], marker='s', color='r')

xs = np.array(xs)
#plt.plot(xs[:, 0], xs[:, 1])
plt.legend([p1, p2], ['Actual', 'PF'], loc=4, numpoints=1)
plt.xlim(*xlim)
plt.ylim(*ylim)
print('final position error, variance:\n\t', mu - np.array([iters, iters]), var)
plt.show()

from numpy.random import seed
seed(2)
run_pf1(N=5000, plot_particles=False)

The result is:

Figure 15. the result after particle filter

Reference

https://nbviewer.jupyter.org/github/rlabbe/Kalman-and-Bayesian-Filters-in-Python/blob/master/12-Particle-Filters.ipynb
https://wenku.baidu.com/view/88896d2b453610661ed9f4b4.html
https://blog.csdn.net/heyijia0327/article/details/40899819
https://www.bilibili.com/video/av32636259/?p=2
https://wenku.baidu.com/view/150db14ac850ad02de8041a9.html

You Raise Me Up

Wednesday, 13 March 2019

[Research] Particle Filter