CROSS-DISCIPLINARY PHYSICS AND RELATED AREAS OF SCIENCE AND TECHNOLOGY

A Linear Frequency Principle Model to Understand the Absence of Overfitting in Neural Networks
Yaoyu Zhang (1,2), Tao Luo (1), Zheng Ma (1), and Zhi-Qin John Xu (1)*
(1) School of Mathematical Sciences, Institute of Natural Sciences, MOE-LSC, and Qing Yuan Research Institute, Shanghai Jiao Tong University, Shanghai 200240, China
(2) Shanghai Center for Brain Science and Brain-Inspired Technology, Shanghai 200031, China

Cite this article:
Yaoyu Zhang, Tao Luo, Zheng Ma et al. 2021 Chin. Phys. Lett. 38 038701
|
|
Abstract: Why heavily parameterized neural networks (NNs) do not overfit the data is an important, long-standing open question. We propose a phenomenological model of NN training to explain this non-overfitting puzzle. Our linear frequency principle (LFP) model accounts for a key dynamical feature of NNs: they learn low frequencies first, irrespective of microscopic details. Theory based on our LFP model shows that low-frequency dominance of target functions is the key condition for NNs not to overfit, and this condition is verified by experiments. Furthermore, through an ideal two-layer NN, we unravel how the detailed microscopic training dynamics of an NN statistically gives rise to an LFP model with quantitative predictive power.
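The abstract's central dynamical claim, that training removes the low-frequency part of the error first, can be illustrated with a toy linear model in which each Fourier coefficient of the fit relaxes toward the target at its own frequency-dependent rate. The Python sketch below is a hypothetical discretization under stated assumptions, not the authors' LFP model or code: it assumes a periodic domain [0, 1), a truncated Fourier series with modes up to K, n uniformly spaced training points, and a made-up decreasing rate gamma_sq(k) = 1/(1 + k**2) in place of the kernel the paper derives for two-layer networks. The names gamma_sq, basis and train are illustrative only.

```python
import math

# Toy, hypothetical discretisation of LFP-style dynamics on the periodic domain [0, 1).
# ASSUMPTION: gamma_sq(k) = 1 / (1 + k**2) is an illustrative decreasing rate,
# NOT the frequency kernel derived in the paper for two-layer networks.


def gamma_sq(k):
    """Hypothetical squared rate gamma(k)**2: larger for lower frequencies."""
    return 1.0 / (1.0 + k * k)


def basis(k, kind, x):
    """Real Fourier basis function: constant, cos(2*pi*k*x) or sin(2*pi*k*x)."""
    if kind == "const":
        return 1.0
    if kind == "cos":
        return math.cos(2.0 * math.pi * k * x)
    return math.sin(2.0 * math.pi * k * x)


def train(target, K=8, n=64, dt=0.1, steps=200):
    """Forward-Euler gradient flow on (1/2n) * sum_j (h(x_j) - y_j)**2.

    h is a Fourier series with modes up to K, the gradient of each coefficient
    is scaled by gamma_sq(k), and all coefficients start at zero. The n
    training points form a uniform grid with n > 2K, so the modes are
    orthogonal on the grid. Returns a dict {(k, kind): coefficient}.
    """
    xs = [j / n for j in range(n)]
    ys = [target(x) for x in xs]
    modes = [(0, "const")] + [(k, kind) for k in range(1, K + 1)
                              for kind in ("cos", "sin")]
    coef = {m: 0.0 for m in modes}
    for _ in range(steps):
        # Residual h(x_j) - y_j of the current model on the training grid.
        res = [sum(c * basis(k, kind, x) for (k, kind), c in coef.items()) - y
               for x, y in zip(xs, ys)]
        # Simultaneous update: every gradient uses the same residual.
        for k, kind in modes:
            grad = sum(r * basis(k, kind, x) for r, x in zip(res, xs)) / n
            coef[(k, kind)] -= dt * gamma_sq(k) * grad
    return coef


if __name__ == "__main__":
    def two_tone(x):
        # Target with one low (k = 1) and one higher (k = 5) frequency.
        return math.sin(2.0 * math.pi * x) + math.sin(2.0 * math.pi * 5 * x)

    c = train(two_tone)
    for k in (1, 5):
        print(f"k = {k}: remaining error {abs(1.0 - c[(k, 'sin')]):.4f}")
```

Because the uniform grid makes the modes orthogonal, each coefficient error in this toy shrinks by the factor 1 - dt*gamma_sq(k)/2 per step, so after 200 steps the k = 1 error is about 0.006 while the k = 5 error is still about 0.68. Any decreasing rate would show the same ordering; the specific numbers depend entirely on the assumed gamma_sq.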
|
|
Received: 27 September 2020
Published: 02 March 2021
|
|
PACS:
07.05.Mh (Neural networks, fuzzy logic, artificial intelligence)
87.85.dq (Neural networks)
02.30.Nw (Fourier analysis)
02.70.Rr (General statistical methods)
|
|
|
Fund: Supported by the National Key R&D Program of China (Grant No. 2019YFA0709503), the Shanghai Sailing Program, the Natural Science Foundation of Shanghai (Grant No. 20ZR1429000), the National Natural Science Foundation of China (Grant No. 62002221), the Shanghai Municipal Science and Technology Project (Grant No. 20JC1419500), and the HPC of the School of Mathematical Sciences at Shanghai Jiao Tong University.
|
|
|
|