Convolutional Networks for Semantic Heads Segmentation using Top-View Depth Data in Crowded Environment

Title	Convolutional Networks for Semantic Heads Segmentation using Top-View Depth Data in Crowded Environment
Publication Type	Conference Paper
Year of Publication	2018
Authors	Liciotti D, Paolanti M, Pietrini R, Frontoni E, Zingaretti P
Conference Name	2018 24th International Conference on Pattern Recognition (ICPR)
Date Published	Aug
Keywords	Cameras, Computer architecture, Fractals, Head, Image segmentation, Semantics, Training
Abstract	Detecting and tracking people is a challenging task in persistently crowded environments (e.g. retail, airports, stations) for human behaviour analysis and security purposes. This paper introduces an approach to detect and track people under heavy occlusion, based on CNNs for semantic segmentation of top-view depth data. We design a novel U-Net architecture, U-Net3, which differs from previous variants at the end of each layer. In particular, batch normalization is added after the first ReLU activation function and after each max-pooling and up-sampling function. The approach was applied and tested on a new, publicly available dataset, the TVHeads Dataset, consisting of depth images of people recorded by an RGB-D camera installed in a top-view configuration. Our variant outperforms baseline architectures while remaining computationally efficient at inference time. Results show high accuracy, demonstrating the effectiveness and suitability of our approach.
DOI	10.1109/ICPR.2018.8545397
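
The layer ordering described in the abstract can be illustrated with a minimal NumPy sketch of one encoder stage and one decoder stage. This is not the authors' implementation: the two-convolution stage layout is assumed from the standard U-Net, and the function names, filter counts, input size, and the unlearned batch normalization (no scale or shift parameters) are all hypothetical choices made for illustration.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def batch_norm(x, eps=1e-5):
    # Normalize each channel over the batch and spatial axes.
    # Assumption: no learned scale/shift, batch statistics only.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def conv3x3(x, w):
    # x: (N, H, W, C_in), w: (3, 3, C_in, C_out), zero "same" padding.
    n, h, wd, _ = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1), (0, 0)))
    out = np.zeros((n, h, wd, w.shape[3]))
    for i in range(3):
        for j in range(3):
            out += xp[:, i:i + h, j:j + wd, :] @ w[i, j]
    return out


def max_pool2x2(x):
    # Non-overlapping 2x2 max-pooling; H and W must be even.
    n, h, w, c = x.shape
    return x.reshape(n, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))


def upsample2x(x):
    # Nearest-neighbour up-sampling by a factor of 2.
    return x.repeat(2, axis=1).repeat(2, axis=2)


def encoder_stage(x, w1, w2):
    x = batch_norm(relu(conv3x3(x, w1)))   # BN after the first ReLU
    skip = relu(conv3x3(x, w2))            # second conv + ReLU (assumed layout)
    down = batch_norm(max_pool2x2(skip))   # BN after max-pooling
    return skip, down


def decoder_stage(x, skip, w1, w2):
    x = batch_norm(upsample2x(x))          # BN after up-sampling
    x = np.concatenate([x, skip], axis=-1) # U-Net skip connection
    x = batch_norm(relu(conv3x3(x, w1)))   # BN after the first ReLU
    return relu(conv3x3(x, w2))


# Hypothetical sizes: a batch of two 8x8 single-channel depth images.
rng = np.random.default_rng(0)
depth = rng.random((2, 8, 8, 1))
skip, down = encoder_stage(depth,
                           rng.normal(size=(3, 3, 1, 4)),
                           rng.normal(size=(3, 3, 4, 4)))
up = decoder_stage(down, skip,
                   rng.normal(size=(3, 3, 8, 4)),
                   rng.normal(size=(3, 3, 4, 4)))
print(skip.shape, down.shape, up.shape)
```

In a full network these stages would be stacked several times and followed by a per-pixel classifier that labels head versus background; the sketch only shows where the extra normalization steps sit relative to the activation, pooling, and up-sampling operations.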