The 1st assumption of Lucas Kanade is the brightness assumption, which assumes that the displaced pixel remains at the same brightness level. With u and v are the displacements of the pixel at $(x,y)$, the 1st assumption gives rise to
$$0 = I(x+u,y+v,t+1) - I(x,y,t) (1)$$To estimate the amount of displacement, we can analyze the behavior of I(t) and I(t+1) at the vicinity of (x,y). The simplest way to do this is to exhautively search for the values of u and v that satisfy equation (1). This way is computationally expensive. A better method is to use the linear Taylor approximation of I(t+1) at (x,y). The 2nd assumption of Lucas Kanade is the linearity of I(t+1) at (x,y), which gives rise to
$$0 = I(x,y,t+1) + I_x(x,y,t+1) u + I_y(x,y,t+1) v - I(x,y,t)$$or
$$0 = I_t(t) + I_x(t+1)u + I_y(t+1)v$$Now at every pixel, we have to solve for both u and v with only one equation. This situation only happens in the 2D case and is usually referred to as the Aperture Problem. To address this problem, we can make the 3rd assumption that all pixels in the confine of a window are to translate by the same amount of u an v. The consequence of this assumption is that we will have the same number of equations as number of pixels in that window, which is more than enough to solve for u and v. The 3rd assumption gives rise to
$$\bigg[ \begin{matrix} \sum I_x^2 & \sum I_x I_y\\ \sum I_x I_y & \sum I_y^2 \end{matrix} \Bigg] \bigg[ \begin{matrix} u \\ v \end{matrix} \bigg] = -\bigg[ \begin{matrix} \sum I_x I_t \\ \sum I_y I_t \end{matrix} \bigg]$$An implementation of Lucas Kanade¶
import numpy as np
from matplotlib import pyplot as plt
import cv2
def reduce(image, level = 1):
result = np.copy(image)
for _ in range(level-1):
result = cv2.pyrDown(result)
return result
def expand(image, level = 1):
return cv2.pyrUp(np.copy(image))
def compute_flow_map(u, v, gran = 8):
flow_map = np.zeros(u.shape)
for y in range(flow_map.shape[0]):
for x in range(flow_map.shape[1]):
if y%gran == 0 and x%gran == 0:
dx = 10*int(u[y,x])
dy = 10*int(v[y,x])
if dx > 0 or dy > 0:
cv2.arrowedLine(flow_map, (x,y), (x+dx,y+dy), 255, 1)
return flow_map
def lucas_kanade_np(im1, im2, win=7):
I_x = np.zeros(im1.shape)
I_y = np.zeros(im1.shape)
I_t = np.zeros(im1.shape)
I_x[1:-1, 1:-1] = (im1[1:-1, 2:] - im1[1:-1, :-2])/2
I_y[1:-1, 1:-1] = (im1[2:, 1:-1] - im1[:-2, 1:-1])/2
I_t[1:-1, 1:-1] = im1[1:-1, 1:-1] - im2[1:-1, 1:-1]
params = np.zeros(im1.shape + (5,))
params[..., 0] = cv2.GaussianBlur(I_x * I_x, (5,5), 3)
params[..., 1] = cv2.GaussianBlur(I_y * I_y, (5,5), 3)
params[..., 2] = cv2.GaussianBlur(I_x * I_y, (5,5), 3)
params[..., 3] = cv2.GaussianBlur(I_x * I_t, (5,5), 3)
params[..., 4] = cv2.GaussianBlur(I_y * I_t, (5,5), 3)
cum_params = np.cumsum(np.cumsum(params, axis=0), axis=1)
win_params = (cum_params[2 * win + 1:, 2 * win + 1:] -
cum_params[2 * win + 1:, :-1 - 2 * win] -
cum_params[:-1 - 2 * win, 2 * win + 1:] +
cum_params[:-1 - 2 * win, :-1 - 2 * win])
u = np.zeros(im1.shape)
v = np.zeros(im1.shape)
I_xx = win_params[..., 0]
I_yy = win_params[..., 1]
I_xy = win_params[..., 2]
I_xt = -win_params[..., 3]
I_yt = -win_params[..., 4]
M_det = I_xx*I_yy - I_xy**2
temp_u = I_yy* (-I_xt) + (-I_xy)*(-I_yt)
temp_v = (-I_xy)*(-I_xt) + I_xx*(-I_yt)
op_flow_x = np.where(M_det != 0, temp_u/M_det, 0)
op_flow_y = np.where(M_det != 0, temp_v/M_det, 0)
u[win + 1: -1 - win, win + 1: -1 - win] = op_flow_x[:-1, :-1]
v[win + 1: -1 - win, win + 1: -1 - win] = op_flow_y[:-1, :-1]
return u, v
plt.figure(figsize = (15, 8))
image_l = cv2.imread('/home/andy/Desktop/computer_vision/ps5/input/TestSeq/Shift0.png')
image_r = cv2.imread('/home/andy/Desktop/computer_vision/ps5/input/TestSeq/ShiftR2.png')
image_l = cv2.cvtColor(image_l, cv2.COLOR_RGB2GRAY)
image_r = cv2.cvtColor(image_r, cv2.COLOR_RGB2GRAY)
u, v = lucas_kanade_np(image_l, image_r)
flow_map = compute_flow_map(u,v,8)
plt.subplot(131), plt.imshow(image_l, cmap='gray')
plt.subplot(132), plt.imshow(flow_map, cmap='gray')
plt.subplot(133), plt.imshow(image_r, cmap='gray')
plt.show()
Resolution Pyramid and Iterative Estimation¶
When the displacement is large, the 2nd assumption becomes not quite true. The result is that the solved u and v are not a good estimate of the actual shift. The trick to address this issue is to iterate the process of solving for u and v through alternating steps of solving and warping. This process stops when we can obtain a good enough estimate of the shift which can almost warp to construct the second image from the first image using the estimated shift.
When the displacement is too large, the local analysis is not even true. We need to use another trick to fix the Lucas Kanade algorithm. We notice that when the original images are downsampled, a large movement in a high resolution image may correspond to the displacement of only a few pixels in the lower resolution images. So the trick to fix this issue of too large displacement is to run the plain Lucas Kanade on a pyramid of images of different scales. The basic principle here is that it is easier to precisely construct the optical flows with lower resolution images.
def warp(image, u, v):
map_y, map_x = np.mgrid[0:image.shape[0], 0:image.shape[1]]
map_x = map_x.astype('float32')
map_y = map_y.astype('float32')
map_x -= u
map_y -= v
return cv2.remap(image, map_x, map_y, cv2.INTER_NEAREST)
plt.figure(figsize = (20, 30))
image_l = cv2.imread('/home/andy/Desktop/computer_vision/ps5/input/TestSeq/Shift0.png')
image_r = cv2.imread('/home/andy/Desktop/computer_vision/ps5/input/TestSeq/ShiftR10.png')
image_l = cv2.cvtColor(image_l, cv2.COLOR_RGB2GRAY)
image_r = cv2.cvtColor(image_r, cv2.COLOR_RGB2GRAY)
max_lev = 4
cur_lel = max_lev
while cur_lel > 0:
l_k = reduce(image_l, cur_lel)
r_k = reduce(image_r, cur_lel)
if cur_lel == max_lev:
u = np.zeros(l_k.shape)
v = np.zeros(l_k.shape)
else:
u = 2*expand(u)
v = 2*expand(v)
w_k = warp(l_k, u, v)
dx, dy = lucas_kanade_np(l_k, r_k)
u = u + np.uint8(dx)
v = v + np.uint8(dy)
flow_map = compute_flow_map(u,v,8)
plt.subplot(max_lev+1,3,3*(max_lev-cur_lel)+1), plt.imshow(l_k, cmap='gray')
plt.subplot(max_lev+1,3,3*(max_lev-cur_lel)+2), plt.imshow(flow_map, cmap='gray')
plt.subplot(max_lev+1,3,3*(max_lev-cur_lel)+3), plt.imshow(r_k, cmap='gray')
cur_lel -= 1
plt.show()