Automatic Tracking of Player Locations from Video Image of Football Game

where ${I}_{ijk}$ $\left(i={c}_x-{\delta}_x,\cdots, {c}_x+{\delta}_x,\ j={c}_y-{\delta}_y,\cdots, {c}_y+{\delta}_y\right)$ is the RGB value $\left(k=1,2,3\right)$ of pixel $(i, j)$ of the image, $(c_x, c_y)$ and $(\delta_x, \delta_y)$ are the center coordinates of the rectangular tracking area and its half-sizes, and $B_{ij}$ is the pixel value of the binary image. $T_{sk}$ and $T_{tk}$ are the RGB values of the shirt and trunks of the uniform, and $\Delta$ is the threshold value used in the image binarization. $h$ is the distance in pixels from the player's shirt to the trunks. The RGB values of the uniform are specified manually for the players of each team at the beginning of the tracking process. This image binarization is an extension of the conventional binarization of gray-scale images. Figure 28.1 shows examples of the original color image and the binary image.

Fig. 28.1

Example of image binarization procedure. Left is the original color image and right is the binary image
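The binarization described above can be sketched in Python with NumPy. This is a minimal illustration, not the authors' implementation: the function name `binarize_patch` and its argument names are hypothetical, and the decision rule is an assumption read off the symbol definitions (a pixel is white when it is within $\Delta$ of the shirt color in every channel and the pixel $h$ rows below it is within $\Delta$ of the trunks color).

```python
import numpy as np

def binarize_patch(image, cx, cy, dx, dy, shirt_rgb, trunks_rgb, delta, h):
    """Binary image B of the tracking area centred at (cx, cy).

    image is an (H, W, 3) RGB array indexed as image[row j, column i].
    ASSUMED rule: B = 1 where the pixel matches the shirt colour within
    `delta` in all channels AND the pixel h rows below matches the trunks.
    """
    height, width, _ = image.shape
    # Clip the tracking area to the image, leaving room for the h-row offset.
    x0, x1 = max(cx - dx, 0), min(cx + dx + 1, width)
    y0, y1 = max(cy - dy, 0), min(cy + dy + 1, height - h)
    # Cast to int so that differences of uint8 values do not wrap around.
    shirt = image[y0:y1, x0:x1].astype(int)
    trunks = image[y0 + h:y1 + h, x0:x1].astype(int)
    shirt_ok = np.all(np.abs(shirt - np.asarray(shirt_rgb)) < delta, axis=-1)
    trunks_ok = np.all(np.abs(trunks - np.asarray(trunks_rgb)) < delta, axis=-1)
    return (shirt_ok & trunks_ok).astype(np.uint8)
```

Requiring both the shirt and the trunks colors to match at a fixed vertical offset is what distinguishes this from thresholding a single color, and it is one plausible reading of why the offset $h$ appears in the definitions.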

The video coordinates of the player location are determined by calculating the centroid of the binary image as

$\begin{array}{l}{x}_g={\displaystyle \sum_{i,j}i\,{B}_{ij}}/{\displaystyle \sum_{i,j}{B}_{ij}}\\ {}{y}_g={\displaystyle \sum_{i,j}j\,{B}_{ij}}/{\displaystyle \sum_{i,j}{B}_{ij}}\end{array}$

Figure 28.2 shows an example of the detected player location. Note that the detected position lies around the breast of the player, not at the foot position. The conversion from the breast position to the foot position is described in Sect. 28.2.3.

Fig. 28.2

Player location obtained by centroid computation of the binary image
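As a minimal sketch of the centroid computation, assuming the binary image is stored as a NumPy array `binary[j, i]` (row $j$, column $i$), the two sums can be evaluated directly. The function name `centroid` and the offset parameters `x0`, `y0` (video coordinates of the patch's top-left pixel) are hypothetical, as is returning `None` for an empty patch.

```python
import numpy as np

def centroid(binary, x0=0, y0=0):
    """Centroid (x_g, y_g) of the white pixels of a binary patch.

    binary[j, i] holds B_ij; (x0, y0) shifts patch indices back to
    video coordinates. Returns None if the patch has no white pixel,
    because the centroid is undefined when the sum of B_ij is zero.
    """
    total = binary.sum()
    if total == 0:
        return None
    j, i = np.indices(binary.shape)      # row and column index grids
    x_g = (i * binary).sum() / total + x0
    y_g = (j * binary).sum() / total + y0
    return float(x_g), float(y_g)
```

The empty-patch case matters in practice: it is the situation in which the player has left the tracking area, so a caller has to handle it separately rather than divide by zero.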

The location of the tracking area for each player is specified manually at the beginning of the tracking and is then updated according to the detected player location as

$\begin{array}{l}{c}_x(t)={c}_x\left(t-1\right)+k\left({x}_g(t)-{c}_x\left(t-1\right)\right)\\ {}{c}_y(t)={c}_y\left(t-1\right)+k\left({y}_g(t)-{c}_y\left(t-1\right)\right)\end{array}$

where $k$ is an update coefficient, set to 0.8 experimentally. Figure 28.3 shows the sequence of tracking areas in the tracking process.

Fig. 28.3

Adaptation of the location of the tracking area
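The update rule moves the center of the tracking area a fraction $k$ of the way toward the detected centroid at each frame. A minimal sketch, with the hypothetical function name `update_center`:

```python
def update_center(center, detected, k=0.8):
    """Move the tracking-area centre a fraction k toward the centroid.

    center   -- (c_x(t-1), c_y(t-1)), the previous centre
    detected -- (x_g(t), y_g(t)), the centroid found in the current frame
    k        -- update coefficient (0.8 in the text)
    """
    cx, cy = center
    xg, yg = detected
    return cx + k * (xg - cx), cy + k * (yg - cy)
```

With $k=0.8$, if the centroid stays fixed, the remaining offset between the center and the centroid shrinks by a factor of $1-k=0.2$ per frame, so the area re-centers on the player within a few frames while still damping single-frame jitter in the centroid.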

28.2.2 Adaptation of Parameters

The size of the tracking area has an important influence on the tracking performance. If the area is large enough, the player stays inside it for most of the tracking process. However, when multiple players are inside the area, the chance of tracking errors increases, namely that the tracked player is swapped with another player. Conversely, if the area is too small, a player moving at high speed is more likely to leave it, which leads to follow-up errors in the tracking process. To reduce these errors, the size of the tracking area must be changed during the tracking process.

The size is changed according to the number of white pixels in the binary image. If the number is small, the player is possibly outside the tracking area. Conversely, if the number is sufficiently large, the tracking area should be reduced to keep other players out of it. The tracking area sizes $\delta_x$ and $\delta_y$ are initially set to their minimum values $\delta_{x/y\,\min}$. If the number of white pixels of the binary image at a video frame is less than the threshold value (typically 1), the size used for the next frame is expanded by multiplying it by a constant larger than 1, without exceeding the maximum value $\delta_{x/y\,\max}$. If the number exceeds the threshold value at a video frame, the size at the next frame is reduced by multiplying it by a constant less than 1, without falling below the minimum value $\delta_{x/y\,\min}$. The adaptation procedure of the tracking area size is shown in Fig. 28.4.

Fig. 28.4

Adaptation of the tracking area size
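The size adaptation can be sketched as a clamped multiplicative update. The text gives only the threshold (typically 1); the growth and shrink factors and the size limits below are placeholder values chosen for illustration, and leaving the size unchanged when the count equals the threshold is an assumption, since the text does not cover that case.

```python
def adapt_size(size, n_white, threshold=1, grow=1.2, shrink=0.9,
               size_min=8.0, size_max=40.0):
    """Tracking-area size (delta_x or delta_y) to use for the next frame.

    n_white is the number of white pixels in the current binary image.
    grow, shrink, size_min and size_max are illustrative values only.
    """
    if n_white < threshold:
        # Player probably lost: enlarge the area, capped at the maximum.
        return min(size * grow, size_max)
    if n_white > threshold:
        # Player found: shrink the area, but not below the minimum.
        return max(size * shrink, size_min)
    return size  # count equals the threshold: unspecified, kept as is
```

Applying the same function to $\delta_x$ and $\delta_y$ every frame keeps the area near its minimum while the player is detected and lets it grow step by step when the player is lost.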

The tracking procedure is performed for every player. A tracking error often occurs when two players of the same team are close to each other and both enter the tracking area. In this error, the location of the target player is swapped with that of the other player. To reduce such swapping errors, when the tracking area for a player overlaps the areas of other players for which tracking has already been completed in the current frame, the overlapped region is eliminated from the tracking area of the current player, as shown in Fig. 28.5.

Fig. 28.5

Elimination of the overlapped tracking area
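One way to realize the elimination of the overlapped region is to zero the corresponding pixels of the current player's binary image before the centroid is computed. The sketch below assumes axis-aligned rectangles given as `(x0, y0, x1, y1)` in video coordinates with exclusive upper bounds; the function name `mask_overlap` and this representation are hypothetical.

```python
import numpy as np

def mask_overlap(binary, area, done_areas):
    """Clear pixels of `binary` that fall inside already-tracked areas.

    binary     -- binary image covering `area`, indexed [row, column]
    area       -- (x0, y0, x1, y1) of the current player's tracking area
    done_areas -- rectangles of players already tracked in this frame
    """
    x0, y0, x1, y1 = area
    out = binary.copy()
    for ox0, oy0, ox1, oy1 in done_areas:
        # Intersection of the two rectangles in video coordinates.
        ix0, iy0 = max(x0, ox0), max(y0, oy0)
        ix1, iy1 = min(x1, ox1), min(y1, oy1)
        if ix0 < ix1 and iy0 < iy1:
            # Shift to patch indices and remove the overlapped region.
            out[iy0 - y0:iy1 - y0, ix0 - x0:ix1 - x0] = 0
    return out
```

Because the result depends on which players have already been processed, the order in which players are tracked within a frame affects which player keeps the shared region.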

The RGB values of the uniform color depend on the player's location on the pitch because the lighting conditions differ slightly from place to place. Figure 28.6 shows the variation of the RGB values of the uniform color during the tracking process. Therefore, the color threshold $\Delta$ used in the image binarization is adapted within the range $\Delta_{\min}$ to $\Delta_{\max}$ according to the number of white pixels in the binary image, in a similar way to the adaptation procedure for the tracking area size.

Fig. 28.6

Temporal changes of the RGB values of the uniform color

28.2.3 Transformation from Video Location to Field Location

The field coordinates of the player location are obtained from the video coordinates by using the two-dimensional DLT (direct linear transformation) method. The detected video coordinates lie around the breast of the player, whereas the DLT method requires the video coordinates of the foot position to yield the correct field coordinates. Thus, the foot position is estimated from the breast position by modifying the y-coordinate in the video image as ${y}_g-{d}_{bf}\left({x}_g,{y}_g\right)$. Here, $d_{bf}(x_g, y_g)$ is the distance from the breast position to the foot position in video coordinates. This distance is obtained by converting the average breast-to-foot distance in field coordinates into video coordinates with an inverse DLT mapping, so that it is expressed as a function of the video coordinates $(x_g, y_g)$.

The field coordinates are then obtained from the video coordinates of the player's foot location by using the 2-D DLT method (Abdel-Aziz and Karara 1971). The relationship between the video coordinates $(x_g, y_g)$ and the field coordinates $(x_f, y_f)$ is given by

$\begin{array}{l}{x}_g=\frac{{L}_1{x}_f+{L}_2{y}_f+{L}_3}{{L}_7{x}_f+{L}_8{y}_f+1},\\ {}{y}_g=\frac{{L}_4{x}_f+{L}_5{y}_f+{L}_6}{{L}_7{x}_f+{L}_8{y}_f+1},\end{array}$

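where ${L}_i\ \left(i=1,2,\cdots, 8\right)$ are the DLT parameters. Multiplying both equations by the common denominator ${L}_7{x}_f+{L}_8{y}_f+1$ and collecting the terms in $x_f$ and $y_f$ gives two linear equations, from which the field coordinates are obtained: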

$\begin{array}{l}\left({L}_1-{x}_g{L}_7\right){x}_f+\left({L}_2-{x}_g{L}_8\right){y}_f={x}_g-{L}_3\\ {}\left({L}_4-{y}_g{L}_7\right){x}_f+\left({L}_5-{y}_g{L}_8\right){y}_f={y}_g-{L}_6.\end{array}$
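The two linear equations form a 2×2 system in $(x_f, y_f)$ that can be solved directly. The sketch below uses NumPy's `numpy.linalg.solve`; the function names `video_to_field` and `field_to_video` are hypothetical, and the parameter values in the usage example are made up for illustration rather than taken from a calibrated camera.

```python
import numpy as np

def field_to_video(xf, yf, L):
    """Forward 2-D DLT: field coordinates -> video coordinates."""
    L1, L2, L3, L4, L5, L6, L7, L8 = L
    w = L7 * xf + L8 * yf + 1.0          # common denominator
    return (L1 * xf + L2 * yf + L3) / w, (L4 * xf + L5 * yf + L6) / w

def video_to_field(xg, yg, L):
    """Solve the two linear equations for the field coordinates."""
    L1, L2, L3, L4, L5, L6, L7, L8 = L
    A = np.array([[L1 - xg * L7, L2 - xg * L8],
                  [L4 - yg * L7, L5 - yg * L8]])
    b = np.array([xg - L3, yg - L6])
    xf, yf = np.linalg.solve(A, b)       # raises LinAlgError if A is singular
    return float(xf), float(yf)

# Usage with made-up DLT parameters: a round trip should recover the input.
L_demo = (8.0, 1.5, 320.0, 0.5, -6.0, 400.0, 0.001, -0.004)
xg, yg = field_to_video(10.0, 20.0, L_demo)
print(video_to_field(xg, yg, L_demo))    # close to (10.0, 20.0)
```

The video coordinates passed to `video_to_field` should be those of the foot position, that is, after the breast-to-foot correction described above has been applied.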