
Commit 4b44302 (1 parent 1781b30)

Commit message: updated readme

File tree: 1 file changed (+60 −19 lines)


README.md

@@ -1,41 +1,82 @@
-# H.264 Motion Vector Capture
+# Motion Vector Extractor
 
-This class is a replacement for OpenCV's [VideoCapture](https://docs.opencv.org/4.1.0/d8/dfe/classcv_1_1VideoCapture.html) and can be used to read and decode video frames from a video stream. Special about it is, that it returns additional values for each frame:
-- H.264 motion vectors
+This tool extracts frames, motion vectors, frame types and timestamps from H.264 and MPEG-4 Part 2 encoded videos.
+
+This class is a replacement for OpenCV's [VideoCapture](https://docs.opencv.org/4.1.0/d8/dfe/classcv_1_1VideoCapture.html) and can be used to read and decode video frames from an H.264 or MPEG-4 Part 2 encoded video stream or file. It returns the following values for each frame:
+- decoded frame as a BGR image
+- motion vectors
 - Frame type (keyframe, P- or B-frame)
 - (for RTSP streams): UTC wall time of the moment the sender sent out a frame (as opposed to an easily retrievable timestamp for the frame reception)
 
 These additional features enable further projects, such as fast visual object tracking or synchronization of multiple RTSP streams. Both a C++ and a Python API are provided. Under the hood, [FFMPEG](https://github.com/FFmpeg/FFmpeg) is used.
 
-The image below shows a snapshot of a video frame with extracted motion vectors overlaid,
+The image below shows a video frame with extracted motion vectors overlaid.
 
 ![motion_vector_demo_image](mvs.png)
 
-A usage example can be found in `video_cap_test.py`.
+A usage example can be found in `test.py`.
 
 
-## Requirements
+## Installation
 
-- Linux (tested with Ubuntu 18.04)
-- Python 3.6 or 3.7
-- Numpy
-- OpenCV
+#### Using Docker
 
-## Installation
+Install [Docker](https://docs.docker.com/).
+Clone the repo to your machine
+```
+git clone https://github.com/LukasBommes/mv-extractor.git
+```
+Open a terminal inside the repo and build the docker container with the following command (note: this can take more than one hour)
+```
+sudo docker build . --tag=mv_extractor
+```
+Now, run the docker container with
+```
+sudo docker run -it --ipc=host --env="DISPLAY" -v $(pwd):/home/video_cap -v /tmp/.X11-unix:/tmp/.X11-unix:rw mv_extractor /bin/bash
+```
+Test if everything is successfully installed by running the demo script
+```
+python3 test.py
+```
+
+#### Alternative: Using Docker Compose
 
+Install [Docker](https://docs.docker.com/) and [Docker Compose](https://pypi.org/project/docker-compose/).
+Clone the repo to your machine
 ```
-pip3 install video_cap
+git clone https://github.com/LukasBommes/mv-extractor.git
 ```
-If this does not work, try `pip` instead of `pip3`. You can also install the package into a [virtual environment](https://virtualenv.pypa.io/en/latest/userguide/). This also installs Numpy 1.17.0 and OpenCV 4.1.0.25.
+Open a terminal inside the repo and start the container with
+```
+sudo docker-compose up -d
+```
+Note that this command builds the container on the first run, which can take more than one hour; afterwards, it simply starts the container without rebuilding.
+Once the container is running, enter an interactive shell inside it via
+```
+sudo docker exec -it video_cap_dev bash
+```
+Test if everything is successfully installed by running the demo script
+```
+python3 test.py
+```
+
+#### Remove Intermediate Build Images to Free Disk Space (Optional)
+
+Building the image leaves some intermediate images behind which can be deleted via
+```
+sudo docker rmi -f $(sudo docker images -f "dangling=true" -q)
+```
+
 
-## Usage
+### Usage
 
-Download `video_cap_test.py` and `vid.mp4` and place them into any directory of your machine. Then, open a terminal in this directory and run the example with
+If you want to use the motion vector extractor in your own Python script, import it via
 ```
-python3 video_cap_test.py
+from mv_extractor import VideoCap
 ```
+You can then use it according to the example in `test.py`.
 
-A H.264 encoded video file is opened by `VideoCap.open()` and frames, motion vectors, frame types and timestamps are read by calling `VideoCap.read()` repeatedly. Extracted motion vectors are drawn onto the video frame (see image above). Before exiting the program, the video file is closed by `VideoCap.release()`.
+Generally, a video file is opened by `VideoCap.open()` and frames, motion vectors, frame types and timestamps are read by calling `VideoCap.read()` repeatedly. Before exiting the program, the video file has to be closed by `VideoCap.release()`. For a more detailed explanation, see the API documentation below.
 
 
 ## Python API
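
The Usage section added in this commit describes an open/read/release loop. The sketch below shows that flow, assuming only the interface documented in this README; the `read_all()` helper is illustrative and not part of the package:

```python
# Sketch of the open/read/release loop described in the Usage section.
# Assumes the documented VideoCap interface: open(path) -> bool,
# read() -> (success, frame, motion_vectors, frame_type, timestamp),
# and release(). read_all() is a hypothetical helper, not part of the API.

def read_all(cap, path):
    """Read frames until the end of the stream and return one record
    (frame_type, frame_shape, mv_shape, timestamp) per decoded frame."""
    if not cap.open(path):
        raise RuntimeError("could not open " + path)
    records = []
    while True:
        success, frame, motion_vectors, frame_type, timestamp = cap.read()
        if not success:
            break  # end of stream or decoding error
        records.append((frame_type, frame.shape, motion_vectors.shape, timestamp))
    cap.release()
    return records


if __name__ == "__main__":
    # Requires the package to be installed and a video file, e.g. the
    # vid.mp4 used by test.py.
    from mv_extractor import VideoCap
    for frame_type, frame_shape, mv_shape, timestamp in read_all(VideoCap(), "vid.mp4"):
        print(frame_type, frame_shape, mv_shape, timestamp)
```

Because the helper only relies on the documented call sequence, it works equally for video files and RTSP stream URLs passed to `open()`.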
@@ -88,8 +129,8 @@ Takes no input arguments and returns a tuple with the elements described in the
 | Index | Name | Type | Description |
 | --- | --- | --- | --- |
 | 0 | success | bool | True if the frame and motion vectors could be retrieved successfully, false otherwise or when the end of the stream is reached. When false, the other tuple elements are set to empty numpy arrays or 0. |
-| 1 | frame | numpy array | Array of dtype uint8 shape (h, w, 3) containing the decoded video frame. w and h are the width and height of this frame in pixels. If no frame could be decoded an empty numpy ndarray of shape (0, 0, 3) and dtype uint8 is returned. |
-| 2 | motion vectors | numpy array | Array of dtype int64 and shape (N, 10) containing the N motion vectors of the frame. Each row of the array corresponds to one motion vector. The columns of each vector have the following meaning (also refer to [AVMotionVector](https://ffmpeg.org/doxygen/4.1/structAVMotionVector.html) in FFMPEG documentation): <br>- 0: source: Where the current macroblock comes from. Negative value when it comes from the past, positive value when it comes from the future.<br>- 1: w: Width and height of the vector's macroblock.<br>- 2: h: Height of the vector's macroblock.<br>- 3: src_x: x-location of the vector's origin in source frame (in pixels).<br>- 4: src_y: y-location of the vector's origin in source frame (in pixels).<br>- 5: dst_x: x-location of the vector's destination in the current frame (in pixels).<br>- 6: dst_y: y-location of the vector's destination in the current frame (in pixels).<br>- 7: motion_x: src_x = dst_x + motion_x / motion_scale<br>- 8: motion_y: src_y = dst_y + motion_y / motion_scale<br>- 9: motion_scale: see definiton of columns 7 and 8<br>Note: If no motion vectors are present in a frame, e.g. if the frame is an `I` frame an empty numpy array of shape (0, 10) and dtype int64 is returned. |
+| 1 | frame | numpy array | Array of dtype uint8 and shape (h, w, 3) containing the decoded video frame. w and h are the width and height of the frame in pixels. Channels are in BGR order. If no frame could be decoded, an empty numpy ndarray of shape (0, 0, 3) and dtype uint8 is returned. |
+| 2 | motion vectors | numpy array | Array of dtype int64 and shape (N, 10) containing the N motion vectors of the frame. Each row of the array corresponds to one motion vector. If no motion vectors are present in a frame, e.g. if the frame is an `I` frame, an empty numpy array of shape (0, 10) and dtype int64 is returned. The columns of each vector have the following meaning (also refer to [AVMotionVector](https://ffmpeg.org/doxygen/4.1/structAVMotionVector.html) in the FFMPEG documentation): <br>- 0: source: Offset of the reference frame from the current frame. The reference frame is the frame the motion vector points to and where the corresponding macroblock comes from. If source < 0, the reference frame is in the past; if source > 0, it is in the future (in display order).<br>- 1: w: Width of the vector's macroblock (in pixels).<br>- 2: h: Height of the vector's macroblock (in pixels).<br>- 3: src_x: x-location (in pixels) where the motion vector points to in the reference frame.<br>- 4: src_y: y-location (in pixels) where the motion vector points to in the reference frame.<br>- 5: dst_x: x-location of the vector's origin in the current frame (in pixels). Corresponds to the x-center coordinate of the corresponding macroblock.<br>- 6: dst_y: y-location of the vector's origin in the current frame (in pixels). Corresponds to the y-center coordinate of the corresponding macroblock.<br>- 7: motion_x = motion_scale * (src_x - dst_x)<br>- 8: motion_y = motion_scale * (src_y - dst_y)<br>- 9: motion_scale: see the definition of columns 7 and 8. Scales the motion components up to integer values; e.g. with motion_scale = 4 the integer motion components encode motion with quarter-pixel precision. |
 | 3 | frame_type | string | Unicode string representing the type of frame. Can be `"I"` for a keyframe, `"P"` for a frame with references to only past frames and `"B"` for a frame with references to both past and future frames. A `"?"` string indicates an unknown frame type. |
 | 4 | timestamp | double | UTC wall time of each frame in the format of a UNIX timestamp. If the input is a video file, the timestamp is derived from the system time. If the input is an RTSP stream, the timestamp marks the time the frame was sent out by the sender (e.g. an IP camera). Thus, the timestamp represents the wall time at which the frame was taken rather than the time at which it was received. This allows, for example, accurate synchronization of multiple RTSP streams. For this to work, the RTSP sender needs to generate RTCP sender reports which contain a mapping from wall time to stream time. Not all RTSP senders send these reports, as they are not part of the standard. IP cameras which implement the ONVIF standard always send sender reports, so timestamps can always be computed. |
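
As a worked example of the motion-vector columns documented above, the snippet below builds a hand-made (N, 10) array (the numbers are illustrative, not taken from a real video) and recovers the sub-pixel source position and block displacement from `motion_x`, `motion_y` and `motion_scale`:

```python
# Worked example of the documented motion-vector columns, using a
# hand-made (N, 10) array; the numbers are illustrative only.
# Column order: source, w, h, src_x, src_y, dst_x, dst_y,
# motion_x, motion_y, motion_scale.
import numpy as np

mvs = np.array([
    # One 16x16 macroblock centered at (24, 40), predicted from a past
    # reference frame (source = -1) with quarter-pixel motion_scale = 4.
    # The integer src_x column (21) is the rounded sub-pixel value 21.5.
    [-1, 16, 16, 21, 41, 24, 40, -10, 4, 4],
], dtype=np.int64)

scale = mvs[:, 9].astype(np.float64)
# Sub-pixel source position: src = dst + motion / motion_scale
# (equivalent to the documented motion_x = motion_scale * (src_x - dst_x))
src_x = mvs[:, 5] + mvs[:, 7] / scale   # 24 + (-10) / 4 = 21.5
src_y = mvs[:, 6] + mvs[:, 8] / scale   # 40 + 4 / 4 = 41.0
# Displacement of the block from reference frame to current frame
disp_x = mvs[:, 5] - src_x              # 2.5 px to the right
disp_y = mvs[:, 6] - src_y              # -1.0 px, i.e. upwards in image coords
print(src_x, src_y, disp_x, disp_y)
```

This is the same arithmetic one would use to draw the overlay arrows shown in `mvs.png`, with each arrow running from (src_x, src_y) to (dst_x, dst_y).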
