A Study of The Convolutional Neural Networks Applications
Deep learning is now widely used across a broad range of fields. The convolutional neural network (CNN) has become one of its most prominent models, as it delivers some of the most accurate results on real-world problems. This work briefly describes the applications of CNNs in two areas: first, computer vision, covering scene labeling, face recognition, action recognition, and image classification; second, natural language processing, covering speech recognition and text classification.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.