An Algorithm of Extracting I-Frame in Compressed Video



INTRODUCTION
With the rapid development of electronics and information technology, the volume of digital video data grows sharply every day: electronic video libraries, TV programs, video on demand, video surveillance footage and so on. Multimedia information in the form of image, video and audio has gradually become the main carrier of information in the field of information processing. According to statistics, text may account for less than 20 percent of the information acquired by human beings, while multimedia may account for more than 80 percent of the total [1]. Thus, research on browsing and searching video databases is a pressing task. Shot segmentation and key frame extraction are the foundation of content-based video analysis and video retrieval. A key frame is the representative image describing a shot, and it is composed of one or more I-frames. The extraction of I-frames is therefore the most critical and most basic step in video processing.

MPEG STANDARD
MPEG is the abbreviation of the Moving Picture Experts Group, established by the International Organization for Standardization in 1988 and committed to drawing up international standards for moving picture compression coding. The MPEG first announced the MPEG-1 standard, and then introduced MPEG-2, MPEG-4, MPEG-7 and other standards [1,2]. This paper presents a method, called the I-frame Extraction Algorithm, for directly extracting I-frames from the elementary stream defined by MPEG-2.

MPEG-2 standard
MPEG-2, introduced after the MPEG-1 standard, is the standard for the generic coding of moving pictures and associated audio information. The MPEG-2 standard includes four parts: the system, the video, the audio, and the conformance testing of the video, audio and system streams.
The MPEG-2 stream is organized in three layers: the elementary stream, the packetized elementary stream, and the multiplexed transport stream or program stream.

VES structure
As shown in figure 1, the VES structure includes six layers: the picture sequence (PS), the group of pictures (GOP), the picture, the slice, the macroblock (MB), and the block [2].

MPEG-2 frame
The MPEG-2 standard defines three types of frames: the I-frame, the P-frame and the B-frame, which use different compression coding.
The I-frame is intra-coded: it is encoded with the DCT, exploiting only the spatial correlation within the image itself, and it needs no other frame as a reference [4,5]. Temporal correlation is therefore not considered in intra-frame coding. The I-frame records the main information, is the basic frame type in the MPEG international standard, and can be referenced by the other types of picture.

Yaling Zhu, Xiaobin Li & Na Chen
College of Software Engineering, Lanzhou Institute of Technology, Lanzhou, Gansu, China

ABSTRACT: MPEG video data includes three types of frames: the I-frame, the P-frame and the B-frame. The I-frame records the main information of the video data, while the P-frame and the B-frame are only motion-compensated predictions from the I-frame. This paper presents an approach that analyzes the MPEG video stream in the compressed domain and finds the key frames of the stream by extracting the I-frames. Experiments indicate that the method runs automatically on compressed MPEG video and lays a foundation for further video processing.
The P-frame and the B-frame are predicted images used for motion compensation. Both use inter-frame coding, but a P-frame is encoded with reference to the previous I-frame or P-frame, while a B-frame needs I-frames or P-frames from two directions, before and after it, as references [2,3].

Idea of algorithm
The main idea of the algorithm is to read the sequence header, the group-of-pictures header and the picture header in turn directly from the MPEG-2 binary stream, without decompression. If the picture header indicates an I-frame, the algorithm further parses its slices and macroblocks [6]. Because of the different chrominance formats of the color image, the number of blocks making up a macroblock also differs. Finally, the approach parses the data of each block; each block holds the DCT coefficients of the I-frame, which are laid out as a matrix and then stored in the database. The flow chart of the algorithm is shown in figure 2.

Realization of algorithm
From the main idea above, the extraction of I-frames from a compressed MPEG stream consists of the following steps. Input: an MPEG video stream. Output: the I-frames and the number of I-frames.
Step 1: To obtain the data of the picture sequence, the algorithm searches the video binary stream for the value 0x000001E0, the start code of the video data, and returns the binary data stream of the picture sequence.
Step 2: In the binary data stream obtained in the previous step, it looks for the value 0x000001B3, which marks the start of a picture sequence. The value 0x000001B8 indicates the beginning of a group of pictures, and the value 0x00000100 marks the start of a picture.
Step 3: Having found the start code of a picture, it analyzes the data downward layer by layer. The picture_coding_type field in the binary data of the picture header determines whether the image is an I-frame. If it is, the picture is parsed further; otherwise, the algorithm looks for the next 0x000001 prefix, which marks the next picture.
Step 4: If it finds a value between 0x00000101 and 0x000001AF, which marks the beginning of a slice, it analyzes the slice, which consists of many macroblocks, obtains each macroblock, and further resolves the macroblock data [6].
Step 5: The chrominance format of the color image determines the number of blocks included in each macroblock. The algorithm therefore reads out the data of every block of every macroblock of the I-frame, stores the data in a SQL Server database, and then analyzes them.
Step 6: If it meets the value 0x000001B7, which marks the end of the sequence, the algorithm terminates. Otherwise, it returns to Step 2 to read the next group of pictures.
The pseudo code of the algorithm is shown as follows:

EXPERIMENTAL RESULTS
The algorithm was implemented as a console application with VC6.0 and SQL Server 2000 and tested in many trials [7]. To evaluate the algorithm proposed in this paper, we built a 3 GB video database containing five different types of video: animation, sports, stories, news and views [8,9]. Some of the experimental results are shown in Table 1.