  
Reducing computational costs in deep learning on almost linearly separable training data

I.M. Kulikovskikh 1,2,3
1 Samara National Research University, 443086, Russia, Samara, Moskovskoe Shosse 34
2 Faculty of Electrical Engineering and Computing, University of Zagreb, 10000, Croatia, Zagreb, Unska 3
3 Rudjer Boskovic Institute, 10000, Croatia, Zagreb, Bijenicka cesta 54
   
 
PDF, 1294 kB
DOI: 10.18287/2412-6179-CO-645
Pages: 282-289.
Full text of the article: in Russian.
 
Abstract:
Previous research in deep learning indicates that iterations of gradient descent over separable data converge toward the L2 maximum-margin solution. Even in the absence of explicit regularization, the decision boundary continues to change after the classification error on the training set has reached zero. This feature of so-called “implicit regularization” allows gradient methods to use more aggressive learning rates, which results in substantial computational savings. However, even though gradient descent generalizes well as it moves toward the optimal solution, the rate of convergence to this solution is much slower than the convergence rate of the loss function itself for a fixed step size. The present study puts forward a generalized logistic loss function with optimized hyperparameters, which yields a faster convergence rate while keeping the same regret bound as gradient descent. The results of computational experiments on the MNIST and Fashion-MNIST image classification benchmarks confirm the viability of the proposed approach to reducing computational costs and outline directions for future research.
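The implicit-regularization effect summarized above can be illustrated with a small numerical sketch. The snippet below is not the paper's generalized logistic loss: it runs plain gradient descent with the standard logistic loss on a hypothetical separable two-dimensional toy dataset (the data parameters, step size, and iteration counts are illustrative assumptions) and shows that the training loss vanishes long before the direction of the weight vector, which keeps drifting toward the L2 maximum-margin separator, stabilizes.

```python
# Minimal sketch (not the paper's method): gradient descent with the standard
# logistic loss on a linearly separable 2-D toy problem. It illustrates the
# "implicit regularization" effect from the abstract: the training loss becomes
# tiny early on, yet ||w|| keeps growing and w/||w|| keeps drifting slowly
# toward the L2 maximum-margin direction. All constants here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

# Two well-separated Gaussian blobs with labels y in {-1, +1}.
X = np.vstack([rng.normal([+2.0, +2.0], 0.3, size=(50, 2)),
               rng.normal([-2.0, -2.0], 0.3, size=(50, 2))])
y = np.concatenate([np.ones(50), -np.ones(50)])

def logistic_loss_and_grad(w, X, y):
    """Mean logistic loss log(1 + exp(-y * <x, w>)) and its gradient in w."""
    margins = y * (X @ w)
    loss = np.mean(np.logaddexp(0.0, -margins))               # numerically stable
    sig = 1.0 / (1.0 + np.exp(np.clip(margins, -500, 500)))   # sigmoid(-margins)
    grad = -(X * (y * sig)[:, None]).mean(axis=0)
    return loss, grad

w = np.zeros(2)
lr = 0.5  # fixed, fairly aggressive step size
for t in range(1, 20001):
    loss, grad = logistic_loss_and_grad(w, X, y)
    w -= lr * grad
    if t in (100, 1000, 20000):
        print(f"iter {t:6d}  loss {loss:.2e}  ||w|| {np.linalg.norm(w):7.2f}  "
              f"direction {w / np.linalg.norm(w)}")
# Typical output pattern: the loss is already negligible at the first printout,
# while the direction still drifts between the later ones -- the slow
# (logarithmic-rate) directional convergence that the paper's generalized
# logistic loss aims to speed up.
```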
Keywords:
implicit regularization, gradient method, convergence rate, linear separability, image classification.
Citation:
  Kulikovskikh IM. Reducing computational costs in deep learning on almost linearly separable training data. Computer Optics 2020; 44(2): 282-289. DOI: 10.18287/2412-6179-CO-645.
Acknowledgements:
This work was supported by the Russian Federation President's grant (Project No. MK-6218.2018.9), the Ministry of Education and Science of the Russian Federation (Project No. 074-U01), RFBR (Project No. 18-37-00219), and the Centre of Excellence project “DATACROSS”, co-financed by the Croatian Government and the European Union through the European Regional Development Fund - the Competitiveness and Cohesion Operational Programme (KK.01.1.1.01.0009).
References:
- LeCun Y, Bengio Y, Hinton G. Deep learning. Nature 2015; 521(7553): 436-444. DOI: 10.1038/nature14539.
- Goodfellow I, Bengio Y, Courville A. Deep learning. Cambridge, London: The MIT Press; 2016. ISBN: 978-0-262-03561-3.
- Neyshabur B, Tomioka R, Srebro N. In search of the real inductive bias: On the role of implicit regularization in deep learning. Source: <https://arxiv.org/abs/1412.6614>.
- Soudry D, Hoffer E, Nacson MS, Gunasekar S, Srebro N. The implicit bias of gradient descent on separable data. J Mach Learn Res 2018; 19: 1-57.
- Zhang C, Bengio S, Hardt M, Recht B, Vinyals O. Understanding deep learning requires rethinking generalization. Source: <https://arxiv.org/abs/1611.03530>.
- Hoffer E, Hubara I, Soudry D. Train longer, generalize better: Closing the generalization gap in large batch training of neural networks. Source: <https://arxiv.org/abs/1705.08741>.
- Nacson MS, Lee J, Gunasekar S, Srebro N, Soudry D. Convergence of gradient descent on separable data. 22nd International Conference on Artificial Intelligence and Statistics (AISTATS) 2019; PMLR 89: 3420-3428.
- Gunasekar S, Lee J, Soudry D, Srebro N. Characterizing implicit bias in terms of optimization geometry. 35th International Conference on Machine Learning (ICML) 2018; PMLR 80: 1832-1841.
- Ma C, Wang K, Chi Y, Chen Y. Implicit regularization in nonconvex statistical estimation: Gradient descent converges linearly for phase retrieval and matrix completion. 35th International Conference on Machine Learning (ICML) 2018; PMLR 80: 3345-3354.
- Kingma DP, Ba JL. Adam: A method for stochastic optimization. Source: <https://arxiv.org/abs/1412.6980>.
- Duchi J, Hazan E, Singer Y. Adaptive subgradient methods for online learning and stochastic optimization. J Mach Learn Res 2011; 12: 2121-2159.
- Zeiler MD. ADADELTA: An adaptive learning rate method. Source: <https://arxiv.org/abs/1212.5701>.
- Kim HS, Kang JH, Park WM, Ko SH, Cho YH, Yu DS, Song YS, Choi JW. Convergence analysis of optimization algorithms. Source: <https://arxiv.org/abs/1707.01647>.
- Ruder S. An overview of gradient descent optimization algorithms. Source: <https://arxiv.org/abs/1609.04747>.
- Wilson AC, Roelofs R, Stern M, Srebro N, Recht B. The marginal value of adaptive gradient methods in machine learning. 31st Conference on Neural Information Processing Systems (NIPS) 2017: 1-11.
- Vorontsov KV. Mathematical methods for supervised learning (machine learning theory) [In Russian]. Source: <http://www.machinelearning.ru/wiki/images/6/6d/Voron-ML-1.pdf>.
- Castaneda ARS, Torres ER, Goris NAV, González MM, Reyes JB, González VGS, et al. New formulation of the Gompertz equation to describe the kinetics of untreated tumors. PLoS ONE 2019; 14(11): e0224978.
- Kulikovskikh I, Prokhorov S, Lipić T, Legović T, Šmuc T. BioGD: Bio-inspired robust gradient descent. PLoS ONE 2019; 14(7): e0219004.
- Kulikovskikh I, Prokhorov S, Legović T, Šmuc T. An SGD-based meta-learner with “growing” descent. J Phys: Conf Ser 2019; 1368: 052008.
- Savchenko AV. Maximum-likelihood dissimilarities in image recognition with deep neural networks. Computer Optics 2017; 41(3): 422-430. DOI: 10.18287/2412-6179-2017-41-3-422-430.
- An S, Boussaid F, Bennamoun M. How can deep rectifier networks achieve linear separability and preserve distances? 32nd International Conference on Machine Learning (ICML) 2015; PMLR 37: 514-523.
- Bergstra J, Bengio Y. Random search for hyperparameter optimization. J Mach Learn Res 2012; 13: 281-305.
  