Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks

doi:10.1088/0256-307X/40/8/080202


Abstract: Stochastic gradient descent (SGD), a widely used algorithm in deep-learning neural networks, has attracted continuing research interest in the theoretical principles behind its success. A recent work reported an anomalous (inverse) relation between the variance of neural weights and the landscape flatness of the loss function under SGD [Feng Y and Tu Y, Proc. Natl. Acad. Sci. USA 118, e2015617118 (2021)]. To investigate this apparent violation of a statistical-physics principle, the properties of SGD near fixed points are analyzed with a dynamic decomposition method. Our approach recovers the true "energy" function under which the universal Boltzmann distribution holds. This function differs from the cost function in general, which resolves the paradox raised by the anomaly. The study bridges the gap between classical statistical mechanics and the emerging discipline of artificial intelligence, with the potential to yield better algorithms for the latter.
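
As a rough illustration of why an inverse variance-flatness relation looks paradoxical, the sketch below uses a textbook one-dimensional linear stochastic model, not the dynamic decomposition method of the paper. Near a fixed point, each weight direction is assumed to follow dw = -curvature * w dt + sqrt(2 * noise_strength) dW, whose stationary variance is noise_strength / curvature. The function name, the curvature values, and the curvature-squared noise scaling are all hypothetical choices made for this example.

```python
import numpy as np

def stationary_variance(curvature, noise_strength):
    """Stationary variance of dw = -curvature*w dt + sqrt(2*noise_strength) dW.

    This is the standard Ornstein-Uhlenbeck result; it is used here only as
    an illustrative stand-in for SGD dynamics near a fixed point.
    """
    return np.asarray(noise_strength) / np.asarray(curvature)

# Small curvature means a flat direction of the loss landscape.
curvatures = np.array([0.1, 1.0, 10.0])

# Case 1: curvature-independent noise, as in thermal equilibrium.
# The variance is larger along flatter directions (the usual expectation).
var_isotropic = stationary_variance(curvatures, np.full(3, 0.01))

# Case 2: ASSUMED noise strength growing as curvature**2 (hypothetical
# scaling chosen for illustration). The variance is now larger along
# sharper directions, i.e. an inverse variance-flatness relation.
var_anisotropic = stationary_variance(curvatures, 0.01 * curvatures**2)

print("isotropic noise:  ", var_isotropic)
print("anisotropic noise:", var_anisotropic)
```

In both cases the stationary density is Gaussian, so it can still be written in Boltzmann form with an effective quadratic potential. In the second case that effective potential is not the loss itself, which is the kind of distinction between "energy" function and cost function that the abstract refers to.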

Received: 2023-05-19 Editors' Suggestion Published: 2023-08-04

PACS:	02.50.-r	(Probability theory, stochastic processes, and statistics)
	02.50.Ey	(Stochastic processes)
	05.10.-a	(Computational methods in statistical physics and nonlinear dynamics)
	07.05.Mh	(Neural networks, fuzzy logic, artificial intelligence)

Cite this article:

Xia Xiong, Yong-Cong Chen, Chunxiao Shi, and Ping Ao. Stochastic Gradient Descent and Anomaly of Variance-Flatness Relation in Artificial Neural Networks. Chin. Phys. Lett., 2023, 40(8): 80202-.

Link to this article:

https://cpl.iphy.ac.cn/CN/10.1088/0256-307X/40/8/080202 or https://cpl.iphy.ac.cn/CN/Y2023/V40/I8/80202