Please use this identifier to cite or link to this item: http://10.1.7.192:80/jspui/handle/123456789/12061
Title: Intelligent Video Analytics Based Framework For Multiview Video Summarization
Authors: Parikh, Vishal U.
Keywords: Theses
CE Theses
Theses CE
Theses IT
Dr. Priyanka Sharma
15EXTPHDE139
ITFCE021
TT000131
Issue Date: Dec-2022
Publisher: Institute of Technology
Series/Report no.: 15EXTPHDE139;TT000131
Abstract: Video surveillance systems are used to monitor, observe, and intercept changes in the activities, features, and behavior of objects, people, or places. Technological advancement has improved people's quality of life, but it has also led to the production of large amounts of data in the form of text, images, and videos. A multi-view surveillance system incorporates a network of video cameras that capture the required features for pattern recognition, object identification, traffic management, object tracking, and so on. Hence, significant effort must be devoted to methodologies for analyzing and summarizing such videos in order to manage space constraints. Video summaries can be generated either from keyframes or from skims/shots. The proposal is to develop an efficient camera placement algorithm for deciding the placement of multiple video cameras at junctions and intersections in a multi-view surveillance system. The algorithm will be capable of providing maximum coverage of the area under surveillance, which will lead to the complete elimination or reduction of blind zones, maximize the view of subjects, and minimize occlusions in closed-room scenarios. Keyframe extraction is performed using deep-learning-based object detection techniques. Various object detection algorithms have been reviewed for generating and selecting the best possible frames as keyframes. A set of frames is extracted from the original video sequence and, depending on the technique used, one or more frames of the set are chosen as keyframes, which then become part of the summarized video. The thesis discusses the selection of various keyframe extraction techniques in detail. Furthermore, the proposal is to develop a video summarization algorithm that can be used to create summaries of the videos captured in a multi-view surveillance system.
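The detection-driven keyframe extraction described above can be sketched in a few lines. This is an illustrative sketch only, not the thesis's implementation: the real detector (e.g. YOLO) is replaced by precomputed per-frame scores so the selection logic stays self-contained, and `select_keyframes` and its `threshold` parameter are hypothetical names.

```python
# Illustrative sketch: keyframe selection driven by per-frame object-detection
# scores. A real pipeline would obtain each score from a detector such as YOLO
# (e.g. the number of detected objects per frame); here the scores are given.

def select_keyframes(frame_scores, threshold):
    """Return indices of frames whose detection score meets the threshold."""
    return [i for i, score in enumerate(frame_scores) if score >= threshold]

# Hypothetical scores, e.g. people detected per frame of an office video.
scores = [0, 0, 2, 3, 1, 0, 4, 0]
print(select_keyframes(scores, threshold=2))  # → [2, 3, 6]
```

Only frames containing enough detected objects survive into the summary; the threshold trades summary length against coverage.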
Such a video summarization algorithm can be further used for object detection, motion tracking, traffic segmentation, etc., in a multi-view surveillance system. A multi-view surveillance system captures scenic details from different perspectives, defined by the camera placements. The recorded data is used for feature extraction, which can be further utilized for various pattern-based analytic processes such as object detection, event identification, and object tracking. The thesis focuses on summary generation for office surveillance videos, based primarily on various keyframe extraction techniques. For this purpose, models such as MobileNet, SSD, and YOLO are used; a comparative analysis of their efficiency showed that YOLO performs better than the other models. Keyframe selection techniques such as sufficient content change, maximum frame coverage, minimum correlation, curve simplification, and clustering based on human presence in the frame have been implemented. Variable- and fixed-length video summaries were generated and analyzed for each keyframe selection technique on office surveillance videos. The analysis shows that the output video obtained using the clustering and curve simplification approaches is compressed to half the length of the actual video and requires considerably less storage space. The technique that selects keyframes based on the change in content between consecutive frames produces the best output for office room scenarios. This thesis presents a method for creating a network of an optimal number of video cameras to cover the maximum overlapping area under surveillance. The proposed work focuses on developing algorithms for efficient placement of multiple cameras at various junctions and intersections and on generating a video summary based on the multiple views.
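The sufficient-content-change technique singled out above can be illustrated with a minimal sketch. This is an assumption about the general idea, not the thesis's code: frames are modeled as flat pixel lists, the change measure is mean absolute pixel difference against the last selected keyframe, and the function and threshold names are hypothetical.

```python
def sufficient_change_keyframes(frames, threshold):
    """Select a frame as a keyframe when its mean absolute pixel difference
    from the most recently selected keyframe exceeds the threshold.
    Frames are modeled as flat lists of pixel intensities."""
    if not frames:
        return []
    keyframes = [0]  # the first frame always starts the summary
    for i in range(1, len(frames)):
        ref = frames[keyframes[-1]]
        diff = sum(abs(a - b) for a, b in zip(frames[i], ref)) / len(ref)
        if diff > threshold:
            keyframes.append(i)
    return keyframes

# Four tiny 4-pixel "frames": frames 1 and 3 barely differ from their
# predecessors, so only frames 0 and 2 are kept.
frames = [[10, 10, 10, 10],
          [12, 11, 10, 10],
          [80, 80, 80, 80],
          [82, 81, 80, 80]]
print(sufficient_change_keyframes(frames, threshold=20))  # → [0, 2]
```

Comparing against the last keyframe rather than the immediately preceding frame prevents slow drift from accumulating unnoticed.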
Deep learning models such as YOLO have been used for object detection, based on the generation of a large number of bounding boxes and an associated search technique for ranking the views of the multiple cameras. Based on view quality, the dominant views are located. Keyframes are then selected from these views based on maximum frame coverage, and a video summary is generated from these keyframes. Thus, the video summary is produced by solving a multi-objective optimization problem in which keyframe importance is evaluated using maximum frame coverage.
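One common way to operationalize maximum frame coverage is a greedy coverage heuristic: repeatedly pick the frame that adds the most not-yet-covered detected objects. The sketch below is an assumption about how such a criterion might work, not the thesis's algorithm; the function name, the `k` budget, and the object-ID sets are all hypothetical.

```python
def greedy_max_coverage(frame_objects, k):
    """Greedily choose up to k frames that maximize the number of distinct
    object IDs covered (a standard greedy set-cover-style heuristic).
    frame_objects: per-frame sets of detected object IDs."""
    covered, chosen = set(), []
    remaining = dict(enumerate(frame_objects))
    for _ in range(k):
        # Pick the frame contributing the most objects not yet covered.
        best = max(remaining,
                   key=lambda i: len(set(remaining[i]) - covered),
                   default=None)
        if best is None or not (set(remaining[best]) - covered):
            break  # nothing left to gain
        chosen.append(best)
        covered |= set(remaining[best])
        del remaining[best]
    return chosen, covered

# Hypothetical per-frame object IDs aggregated across camera views.
frames = [{"car", "person"}, {"person"}, {"bus", "bike"}, {"car", "bus"}]
chosen, covered = greedy_max_coverage(frames, k=2)
print(sorted(covered))  # → ['bike', 'bus', 'car', 'person']
```

The greedy heuristic is a natural fit here because exact maximum coverage is NP-hard, while the greedy choice carries a well-known (1 - 1/e) approximation guarantee.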
URI: http://10.1.7.192:80/jspui/handle/123456789/12061
Appears in Collections:Ph.D. Research Reports

Files in This Item:
File: 15EXTPHDE139.pdf
Description: 15EXTPHDE139
Size: 3.81 MB
Format: Adobe PDF


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.