Features play a crucial role in computer vision. Initially designed to detect salient elements by means of handcrafted algorithms, features now are often learned using different layers in convo-lutional neural networks (CNNs). This paper develops a generic computer vision system based on features extracted from trained CNNs. Multiple learned features are combined into a single structure to work on different image classification tasks. The proposed system was derived by testing several approaches for extracting features from the inner layers of CNNs and using them as inputs to support vector machines that are then combined by sum rule. Several dimensionality reduction techniques were tested for reducing the high dimensionality of the inner layers so that they can work with SVMs. The empirically derived generic vision system based on applying a discrete cosine transform (DCT) separately to each channel is shown to significantly boost the performance of standard CNNs across a large and diverse collection of image data sets. In addition, an ensemble of different topologies taking the same DCT approach and combined with global mean thresholding pooling obtained state-of-the-art results on a benchmark image virus data set.


Information Technology and Cybersecurity

Document Type




Rights Information

© 2021 The authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


Deep learning, Ensemble of descriptors, Global mean thresholding pooling, Support vector machines, Transfer learning

Publication Date


Journal Title

Journal of Imaging