Real-time perspective correction in a video stream

The paper describes an algorithm for software perspective correction. The algorithm uses the camera's orientation angles to transform the coordinates of pixels on a source image into coordinates on a virtual image from a camera whose focal plane is perpendicular to the gravity vector. This algorithm can be used as a low-cost replacement for a gyrostabilizer in specific applications that rule out movable parts or heavy and expensive equipment.


Introduction
Various computer vision tasks may require video camera images to satisfy a number of conditions. One such condition is a constantly horizontal orientation of the camera, which is especially relevant for video-based measurements. This means the camera should always point straight down even if the object that holds it is tilted. This requirement can be satisfied with a gyrostabilizer; however, that solution implies mounting the camera on a movable platform, which greatly increases the system's complexity and reduces its reliability.
Solving this problem in software seems to be a good option for wide-angle cameras. The general idea is to acquire the camera's orientation relative to the earth and warp the perspective accordingly.

Description
Consider an object that holds a video camera. Its orientation is defined by three angles: roll (γ), pitch (ϑ), and yaw (ψ), as shown in figure 1.
In the simplest case, the γ and ϑ angles can be determined using a micromechanical accelerometer, although accelerometer-only measurements are subject to significant errors caused by linear acceleration. A complete attitude determination system is a better option.
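As a rough illustration of the accelerometer-only case, the gravity direction reported by Android's accelerometer can be converted into roll and pitch. The listener below is a hypothetical sketch: the axis conventions and signs are assumptions, and no filtering of linear acceleration is applied.

```kotlin
import android.hardware.Sensor
import android.hardware.SensorEvent
import android.hardware.SensorEventListener
import kotlin.math.atan2
import kotlin.math.sqrt

// Hypothetical accelerometer-only tilt estimator. At rest, event.values
// holds the reaction to gravity, from which roll and pitch follow.
class TiltListener : SensorEventListener {
    @Volatile var rollRad = 0.0   // γ, rotation about the longitudinal axis
    @Volatile var pitchRad = 0.0  // ϑ, rotation about the lateral axis

    override fun onSensorChanged(event: SensorEvent) {
        if (event.sensor.type != Sensor.TYPE_ACCELEROMETER) return
        val (ax, ay, az) = event.values
        // Sign and axis choices are assumptions; adjust to the device frame.
        rollRad = atan2(ax.toDouble(), az.toDouble())
        pitchRad = atan2(ay.toDouble(), sqrt((ax * ax + az * az).toDouble()))
    }

    override fun onAccuracyChanged(sensor: Sensor?, accuracy: Int) = Unit
}
```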
The γ and ϑ Euler angles determine the orientation of the object relative to the surface, which allows for a transformation that compensates the perspective warp.
The required transformation is performed with the matrix operation (1), which transforms the raw pixel coordinates (x0, y0) into the (x, y) coordinates and compensates the perspective warp. Here zc depends on the camera's viewing angle and should be treated as the response of the perspective warp to the angle values, and f is a scaling coefficient that normally equals the lens focal length but may be changed for scaling purposes.
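The body of equation (1) is not reproduced in the text above. A plausible reconstruction, assuming the source-vector ordering (y0, zc, x0)ᵀ quoted later in this section and a standard rotate-then-project form (the axis assignment and rotation order are assumptions, not the paper's statement), is:

```latex
\begin{pmatrix} y' \\ z' \\ x' \end{pmatrix}
= R_\gamma \, R_\vartheta
\begin{pmatrix} y_0 \\ z_c \\ x_0 \end{pmatrix},
\qquad
x = f \, \frac{x'}{z'}, \quad
y = f \, \frac{y'}{z'},
\quad \text{where}
\quad
R_\vartheta =
\begin{pmatrix}
1 & 0 & 0 \\
0 & \cos\vartheta & -\sin\vartheta \\
0 & \sin\vartheta & \cos\vartheta
\end{pmatrix},
\quad
R_\gamma =
\begin{pmatrix}
\cos\gamma & -\sin\gamma & 0 \\
\sin\gamma & \cos\gamma & 0 \\
0 & 0 & 1
\end{pmatrix}.
```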
The formula assumes that the origin of coordinates is located at the optical center (cx, cy) and that the image is not distorted. These requirements can be met through camera calibration.
The exact values of the zc and f parameters differ for each camera and can be determined experimentally, because the analytic expression includes the camera's viewing angle ωcx, which is also an experimental value (in the general case).
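A minimal Kotlin sketch of the per-pixel mapping, following the reconstruction of (1) given above (correctPoint is an illustrative name; the rotation order and signs are assumptions, and zc and f must be calibrated per camera):

```kotlin
import kotlin.math.cos
import kotlin.math.sin

// Maps a pixel (x0, y0), given relative to the optical center, to its
// perspective-corrected position (x, y) for the current roll and pitch.
fun correctPoint(
    x0: Double, y0: Double,
    rollRad: Double, pitchRad: Double,
    zc: Double, f: Double
): Pair<Double, Double> {
    // Pitch: rotate the (zc, x0) components of the vector (y0, zc, x0).
    val z1 = zc * cos(pitchRad) - x0 * sin(pitchRad)
    val x1 = zc * sin(pitchRad) + x0 * cos(pitchRad)
    // Roll: rotate the (y0, z1) components.
    val y2 = y0 * cos(rollRad) - z1 * sin(rollRad)
    val z2 = y0 * sin(rollRad) + z1 * cos(rollRad)
    // Perspective division and scaling by f.
    return Pair(f * x1 / z2, f * y2 / z2)
}
```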
The rotation for the yaw angle is omitted because yaw stabilization is not a common task. If you need to stabilize the ψ angle, multiply the source vector (y0, zc, x0)ᵀ by the matrix (2) before applying formula (1).
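The body of matrix (2) is likewise missing from the extracted text. Under the same assumed axis ordering, a yaw rotation would leave the zc component untouched and rotate the (y0, x0) pair, for example:

```latex
R_\psi =
\begin{pmatrix}
\cos\psi & 0 & \sin\psi \\
0 & 1 & 0 \\
-\sin\psi & 0 & \cos\psi
\end{pmatrix}.
```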
The described method makes it possible either to recalculate the coordinates of points of interest or to change the image by warping the perspective based on the coordinates of its four corner points.
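For a full-frame warp, it is enough to map the image's four corners with the transform above and let the graphics pipeline interpolate the interior. A hedged sketch using Android's android.graphics.Matrix (warpFrame, cx, cy, and the reuse of correctPoint are illustrative, not taken from the paper):

```kotlin
import android.graphics.Bitmap
import android.graphics.Canvas
import android.graphics.Matrix
import android.graphics.Paint

// Warps a frame by transforming only its four corners and fitting a
// perspective matrix through them; (cx, cy) is the calibrated optical center.
fun warpFrame(
    src: Bitmap, cx: Float, cy: Float,
    rollRad: Double, pitchRad: Double, zc: Double, f: Double
): Bitmap {
    val w = src.width.toFloat()
    val h = src.height.toFloat()
    val corners = floatArrayOf(0f, 0f, w, 0f, w, h, 0f, h)
    val warped = FloatArray(8)
    for (i in 0 until 4) {
        val (x, y) = correctPoint(
            (corners[2 * i] - cx).toDouble(),
            (corners[2 * i + 1] - cy).toDouble(),
            rollRad, pitchRad, zc, f
        )
        warped[2 * i] = (x + cx).toFloat()
        warped[2 * i + 1] = (y + cy).toFloat()
    }
    val matrix = Matrix().apply { setPolyToPoly(corners, 0, warped, 0, 4) }
    val out = Bitmap.createBitmap(src.width, src.height, Bitmap.Config.ARGB_8888)
    Canvas(out).drawBitmap(src, matrix, Paint(Paint.FILTER_BITMAP_FLAG))
    return out
}
```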

Implementation
In order to test the described method, a smartphone application was developed. Any modern smartphone has an accelerometer and a camera, which makes it a great platform for testing such methods. The application was created for an Android smartphone using the Kotlin language.
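Wiring the hypothetical TiltListener from the sketch above into an Activity might look like this (a sketch only; SENSOR_DELAY_GAME is an arbitrary choice for per-frame update rates):

```kotlin
import android.app.Activity
import android.content.Context
import android.hardware.Sensor
import android.hardware.SensorManager
import android.os.Bundle

// Hypothetical Activity that keeps rollRad/pitchRad fresh for each frame.
class PreviewActivity : Activity() {
    private val tiltListener = TiltListener()

    override fun onCreate(savedInstanceState: Bundle?) {
        super.onCreate(savedInstanceState)
        val sensorManager = getSystemService(Context.SENSOR_SERVICE) as SensorManager
        sensorManager.registerListener(
            tiltListener,
            sensorManager.getDefaultSensor(Sensor.TYPE_ACCELEROMETER),
            SensorManager.SENSOR_DELAY_GAME
        )
    }
}
```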

Testing
The presented code was tested on a Sony XPERIA mini ST15i smartphone. Figure 2 shows the screenshots taken during the testing at different angles. To make the stabilization clearer, three images from figure 2 were overlapped; the result is shown in figure 3. The sample shots were taken by hand, so there were linear movements of the device between shots. These movements were compensated with translation, scaling, and rotation operations. Objects remain approximately in the same place on the image regardless of the smartphone's tilt. The accuracy of the tilt compensation depends on the camera calibration quality, the precision of the orientation angles, and the precision of the zc parameter.

Conclusion
The described method is also applicable when there is no need for the "untilted" image itself. For instance, some applications [4] require only the coordinates of points of interest rather than the full image. In this case, the corrected coordinates can be calculated only for the points of interest, avoiding the resource-demanding image transformations.
The same applies to the undistortion procedure: it is possible to calculate the undistorted coordinates only for the points of interest without processing the entire image [5].

Fig. 3. The combination of three images, obtained by multiplying the images at 80% opacity.