We have developed a novel form of deep neural networks that combines the advantages of compositional hierarchies and deep convolutional networks (ConvNets or CNNs). Both approaches have their benefits and drawbacks. For instance, compositional hierarchies have an explicit structure that directly explains what constitutes a feature or a category. This makes the approach partially generative, as it can generate new instances of specific parts. However, learning a compositional hierarchy has always presented its own challenges: finding an optimization function that learns compositions suited to discriminative tasks, such as image classification, has always been a difficult proposition. Deep convolutional networks, on the other hand, have always had a well-defined cost function and a well-defined learning scheme through back-propagation. However, the main drawback of deep learning has always been understanding what is happening inside. The network structure is quite opaque, and complex visualization techniques are needed to gain any understanding of the features being learned.
We have now introduced a novel network architecture that combines the advantages of both approaches and can act as a bridge between deep networks and compositional hierarchies. The proposed network combines the explicit structure of compositions with the powerful discriminative training of deep networks. We term this network the Deep Compositional Network. This architecture introduces several intriguing properties into deep networks:
- fully adjustable receptive fields through spatially-adjustable filter units
- new visualization capabilities by explicitly following the compositions
- reduced parameters for spatial coverage
- efficient inference
We achieved this by replacing 3x3 filter units with a novel compositional unit implemented as a Gaussian distribution. We term this unit the Displaced Aggregation Unit, or DAU for short. Each DAU has three parameters, i.e., an importance weight, an offset (mean) and a spatial aggregation perimeter (variance), all of which are learned through back-propagation in a deep learning framework. The ability to learn the importance weights and offsets allows us to achieve fully adjustable receptive fields that can replace existing 3x3 convolutions in deep networks.
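Conceptually, a DAU can be pictured as a small Gaussian blob placed at a learned offset within a filter kernel. Below is a minimal NumPy sketch of that view; the function and parameter names are ours for illustration only, and the actual implementation aggregates inputs directly rather than materialising dense kernels:

```python
import numpy as np

def dau_kernel(weight, mu, sigma, size=9):
    """Render one Displaced Aggregation Unit as a dense 2D kernel (for illustration)."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1]
    g = np.exp(-((xs - mu[0]) ** 2 + (ys - mu[1]) ** 2) / (2.0 * sigma ** 2))
    g /= g.sum()                      # normalise the Gaussian blob
    return weight * g                 # scale by the unit's importance weight

# A DAU-based filter is the sum of a few such units; here two units stand in
# for a classical 3x3 kernel, but their offsets (mu) may reach well beyond it:
kernel = dau_kernel(0.7, mu=(-2.0, 1.0), sigma=0.8) + \
         dau_kernel(-0.3, mu=(3.0, -1.5), sigma=0.8)
```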
Direct replacement for Conv2D
The proposed network with DAUs is fully compatible with other deep learning models and can be used as a direct replacement for convolutional layers in ConvNets. DAU layers can be arbitrarily combined with standard convolutional layers to form any kind of network. DAUs can be implemented for any kind of deep architecture, such as AlexNet, VGG16 or ResNet. In fact, we provide a pre-trained AlexNet variant of the deep compositional network for download.
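To illustrate the drop-in nature of the layer, here is a hedged sketch of what replacing a standard convolution might look like in TensorFlow. The `dau_conv2d` call and its arguments are assumptions made for illustration; consult the DAU-ConvNet TensorFlow wrapper for the actual interface:

```python
import tensorflow as tf

inputs = tf.random.normal([1, 64, 64, 96])        # example feature map (NHWC)

# Standard convolutional layer in a ConvNet:
conv_out = tf.keras.layers.Conv2D(128, kernel_size=3, padding='same')(inputs)

# Hypothetical drop-in DAU replacement; module and argument names below are
# assumptions, not the confirmed DAU-ConvNet API:
# from dau_conv import dau_conv2d
# dau_out = dau_conv2d(inputs, filters=128, dau_units=(2, 2), max_kernel_size=9)
```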
Adjustable receptive fields
The ability of DAUs to learn arbitrary offsets allows the deep compositional network to adjust its receptive fields to the problem at hand. The network is flexible enough to learn large receptive fields for certain features while keeping the receptive fields of other features small. This can be used as a replacement for dilated convolution, which is particularly useful for semantic segmentation where context information is important.
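As a rough illustration of why learned offsets translate into adjustable receptive fields, consider how far the units of a single filter can reach once their offsets become free parameters. The numbers below are made up purely for illustration:

```python
import numpy as np

# Learned DAU offsets (mu) of the units in one filter; with a plain 3x3
# convolution these positions would be fixed to the grid {-1, 0, 1} x {-1, 0, 1}.
mus = np.array([[-4.2, 1.3], [0.5, -0.8], [6.1, 5.7]])
sigma = 0.8

# The spatial extent covered by a single layer grows with the learned offsets:
# roughly the farthest unit plus a few sigmas of its aggregation perimeter.
extent = 2 * (np.abs(mus).max() + 3 * sigma)
print(f"effective kernel extent: ~{extent:.1f} px (vs. 3 px for a 3x3 conv)")
```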
Adjustable receptive fields are reflected in significantly larger receptive field sizes when applied to the AlexNet architecture trained for semantic segmentation:
Reduced number of parameters
Our analysis revealed that the 9 units/parameters of a 3x3 filter are more than are needed for spatial coverage. With DAUs it is possible to significantly reduce the number of units and parameters to only a few units per filter kernel.
Compared to classic ConvNets our deep compositional network with DAUs can achieve:
- 70% fewer parameters
- 90% fewer units
- Only 0.5% performance difference
Replacement for dilated Conv2D (ASPP) with a unified model for classification and segmentation
Since DAUs inherently provide adjustable receptive field sizes, they are a natural fit for a popular semantic segmentation model, DeepLab (Chen et al., 2017), where large receptive fields are achieved with hand-tuned dilation.
Having fully adjustable units removes the need to fine-tune dilation factors for different problems (segmentation or classification). A single DAU layer can replace the multiple dilated convolutions of Atrous Spatial Pyramid Pooling (ASPP from DeepLab), since the units will converge further out during learning if required. This results in a simplified deep pathway, a reduced number of learnable parameters and fewer hyper-parameters.
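The simplification can be pictured as collapsing the parallel dilated branches of ASPP into one adaptive layer. The sketch below uses standard Keras layers for the ASPP side and a hypothetical DAU call for the replacement; the dilation rates and the DAU arguments are illustrative assumptions, not a confirmed configuration:

```python
import tensorflow as tf

x = tf.random.normal([1, 64, 64, 256])            # backbone feature map

# Atrous Spatial Pyramid Pooling (DeepLab-style): parallel dilated
# convolutions with hand-tuned dilation rates.
aspp = [tf.keras.layers.Conv2D(256, 3, padding='same', dilation_rate=r)(x)
        for r in (6, 12, 18)]
aspp_out = tf.keras.layers.Concatenate()(aspp)

# With DAUs a single layer can take over the role of the whole pyramid,
# because the unit offsets move further out during learning whenever a larger
# context is needed.  Hypothetical sketch, not the exact DAU-ConvNet API:
# from dau_conv import dau_conv2d
# dau_out = dau_conv2d(x, filters=256, dau_units=(3, 2), max_kernel_size=17)
```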
Experiments showed that the same DAU network designed for ImageNet classification (ResNet-101) significantly outperforms the standard ResNet-101 on segmentation, even without any segmentation-specific adjustments (such as ASPP). DAU-ResNet-101 outperformed ResNet-101 by 4% mIoU on the Cityscapes dataset. Adding an additional DAU layer with 6 units (DAU-6U) to replace the ASPP layer (with no need for hyper-parameter tuning) further improves the results and slightly outperforms the comparable DeepLab-v3+ architecture.
Replacing large kernel convolutions in blind image de-blurring models
Displaced Aggregation Units also improve models for image de-blurring, where convolutions with large kernels are often used. Applying DAUs to a state-of-the-art model (SRN-DeblurNet, Tao et al., 2018), a 43-layer U-Net architecture used in a scale-recurrent approach, significantly reduces the model footprint and the number of parameters.
We experimented with DAU-SRN-DeblurNet, where 5x5 convolutions are replaced with two displaced aggregation units per convolution filter. The replacements are made in all but four layers: we retain the two de-convolution layers and the first and last layers as classical convolutions. This results in a much more efficient network with 4x fewer parameters than SRN-DeblurNet, and in per-filter adapted receptive field sizes:
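A back-of-the-envelope count makes the roughly 4x reduction plausible. The assumption that the aggregation variance is shared/fixed rather than learned per unit is ours, made only for this illustration:

```python
# Rough per-kernel parameter count (illustrative; exact counts depend on
# whether the aggregation variance is learned, which we assume it is not).
params_5x5 = 5 * 5            # 25 weights in a classical 5x5 kernel
params_dau = 2 * (1 + 2)      # 2 units x (weight + x/y offset) = 6 parameters

print(params_5x5 / params_dau)   # ~4.2x fewer parameters per replaced kernel
```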
Code
Code for all our models is publicly available on our GitHub repositories.
- DAU-ConvNet: Self-contained DAU layer implementation (C++ and CUDA). Use this library to implement DAU layers in any deep learning framework.
- DAU-ConvNet TensorFlow: DAU-ConvNet also contains a TensorFlow wrapper (build with BUILD_TENSORFLOW_PLUGIN=on).
- DAU-ConvNet-caffe: Caffe implementation of DAU-ConvNet using the library above. See the DAUConvolution layer for details on how to integrate the DAU-ConvNet library.
- caffe: Older ICPR 2016 version of the Deep Compositional Network (GaussianConvLayers) without constraints on unit variance, but with a significantly slower implementation.
Please feel free to use our code in your research projects or to implement DAU layers in any deep learning framework using the DAU-ConvNet library. Please cite our IJCV 2020 paper when using our code.
Pre-trained models
We provide ImageNet pre-trained models for the Caffe framework. Models are based on the AlexNet architecture, where conv3, conv4 and conv5 are implemented with DAU convolutions. Models compatible with our DAU-ConvNet-caffe implementation are available for download at:
- AlexNet-DAU-ConvNet (default) (56.9% top-1 accuracy, 0.7 million DAU units)
- AlexNet-DAU-ConvNet-small (56.4% top-1 accuracy, 0.3 million DAU units)
- AlexNet-DAU-ConvNet-large (57.3% top-1 accuracy, 1.5 million DAU units)