Chinese Physics Letters, 2020, Vol. 37, No. 8, Article code 080501
Machine Learning for Many-Body Localization Transition
Wen-Jia Rao (饶文嘉)*
Affiliations: School of Science, Hangzhou Dianzi University, Hangzhou 310027, China
Received 10 May 2020; accepted 28 May 2020; published online 28 July 2020
Supported by the National Natural Science Foundation of China (Grant Nos. 11904069 and 11847005).
*Corresponding author. Email: wjrao@hdu.edu.cn
Citation Text: Rao W J 2020 Chin. Phys. Lett. 37 080501
Abstract: We employ the methods of machine learning to study the many-body localization (MBL) transition in a 1D random spin system. By using the raw energy spectrum without pre-processing as training data, it is shown that the MBL transition point is correctly predicted by the machine. The structure of the neural network reveals the nature of this dynamical phase transition that involves all energy levels, while the bandwidth of the spectrum and the nearest level spacing are the two dominant patterns, and the latter stands out to classify phases. We further use a comparative unsupervised learning method, i.e., principal component analysis, to confirm these results.
DOI: 10.1088/0256-307X/37/8/080501    PACS: 05.30.Rt, 75.10.Pq, 89.20.Ff, 05.70.Jk    © 2020 Chinese Physics Society
Article Text
The quantum phases in isolated many-body systems have attracted a great deal of attention in the past decade, and it is now widely accepted that two generic phases exist in such systems: a thermal phase and a many-body localized (MBL) phase.[1,2] Physically, the thermal phase is ergodic and can act as a heat bath for itself. The eigenstate wavefunctions in the thermal phase are extended, and the matrix elements of the Hamiltonian between different eigenstates are non-zero, resulting in a correlated spectrum with level repulsion. On the other hand, thermalization fails in the MBL phase as localization persists in the presence of interactions. The nature of the thermal/MBL phase is often revealed by numerical studies of spectral statistics, and especially from the viewpoint of quantum entanglement. In the thermal phase, the reduced density operator obtained from a bipartition of the system plays the role of a thermal density operator for the subsystem; hence the entanglement entropy is extensive and follows a volume law. In the MBL phase, however, the absence of thermalization leads to small entanglement that satisfies an area law. The scaling of quantum entanglement and its evolution after a local quench are widely used in the study of MBL systems.[3–13] More recently, there has been growing attention to the area of machine learning (ML),[14] an efficient class of algorithms for extracting hidden features in data and making predictions about new data. Conceptually, ML is divided into unsupervised learning and supervised learning: the former is a collection of exploratory methods that extract hidden patterns in the input data without prior knowledge, while in supervised learning the input data are accompanied by correct labels. The spirit of ML is very similar to the study of phase transitions, where we use order parameters to distinguish different phases. Therefore it is unsurprising that ML has found various applications in condensed matter physics.[15–33] As to the study of MBL systems, there are also existing works that utilize machine-learning algorithms.[34–37] In most of those works, the eigenvalue spectrum of the reduced density matrix of a subsystem, i.e., the entanglement spectrum (ES),[38] is used as the training data. Practically, the ES is obtained by bipartitioning the ground state wave function and tracing out a subsystem, which can be thought of as an efficient dimension reduction to extract physical information in quantum many-body systems. The great success of machine learning in various problems, from image recognition to language translation, using raw data hints at the use of lower-level training data with little or no pre-processing.
Given the energy spectrum of an isolated quantum many-body system, a trained physicist would examine its spectral statistics, especially the distribution of nearest level spacings (gaps between neighboring energy levels). In the thermal phase, due to level repulsion, the level spacing follows a Wigner–Dyson (WD) distribution,[39,40] while levels in the MBL phase are independent and the level spacing follows a Poisson distribution. The difference in the level spacing distribution is widely used to study the MBL transition.[41–44] However, this approach also requires careful pre-processing of the energy spectrum. The machine learning methodology, on the other hand, enables us to learn physics directly from the raw energy spectrum without any pre-processing. It has been shown in an earlier work of the author that even the simplest neural network can correctly capture the thermal-MBL transition with the energy spectrum as training data,[45] but the underlying mechanism of how the machine works was not explored, and this is the main purpose of the present study. In this work, we employ machine learning to study the thermal-MBL transition in a 1D random spin chain, with the raw energy spectrum as training data. Besides demonstrating the efficiency of our method, we focus on digging out “what” the machine learns to distinguish the phases. We begin by introducing the spin-1/2 Heisenberg model with a random external field, and present simulations of the level spacing distribution. Next, we utilize a combined supervised and unsupervised learning algorithm to study the MBL transition, and by analyzing the resulting neural network we demonstrate the underlying mechanism of how the machine learns. We then employ a comparative unsupervised learning algorithm, i.e., principal component analysis (PCA),[20] to support our findings. The conclusion and discussions come last.
Model and method. We consider the spin-$1/2$ Heisenberg model with random external fields, which is the canonical model for studying the MBL transition, $$ H=\sum_{i=1}^{L}{\boldsymbol S}_{i}\cdot {\boldsymbol S}_{i+1}+h\sum_{i=1}^{L}\sum_{\alpha =x,z}\varepsilon _{i}^{\alpha }S_{i}^{\alpha },~~ \tag {1} $$ where the coupling strength is set to $1$ and a periodic boundary condition is assumed in the Heisenberg term, the $\varepsilon _{i}^{\alpha}$'s are random numbers within the range $\left[ -1,1\right] $, and $h$ is referred to as the randomness strength. We choose the case where randomness appears in both the $x$ and $z$ directions, hence the model is described by a real and symmetric Hamiltonian matrix and belongs to the Gaussian orthogonal ensemble (GOE). Note that although the external field in Eq. (1) breaks time-reversal symmetry, an anti-unitary symmetry comprising time reversal and a rotation of all spins by $\pi$ about the $z$ axis remains, which leaves the Hamiltonian unchanged. The phase diagram is controlled by the randomness strength $h$: the system is in the thermal phase when $h$ is small and in the MBL phase when $h$ is large, with the transition point $h_{\rm c}\simeq 3$ according to previous studies.[5,34,44] The universal behavior of such a random Hamiltonian is revealed by random matrix theory (RMT),[46] which predicts several statistical features that are determined only by the symmetry and are independent of microscopic details.
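To make the data-generation step concrete, the following Python/NumPy snippet is a minimal exact-diagonalization sketch of Eq. (1); the helper names site_op and spectrum are our own, introduced for illustration. It assembles the Hamiltonian with periodic boundary conditions and random fields in the x and z directions, and returns the full sorted spectrum of one disorder realization.

```python
import numpy as np

# Spin-1/2 operators in the S^z basis
sx = np.array([[0.0, 0.5], [0.5, 0.0]])
sy = np.array([[0.0, -0.5j], [0.5j, 0.0]])
sz = np.array([[0.5, 0.0], [0.0, -0.5]])
id2 = np.eye(2)

def site_op(op, i, L):
    """Embed a single-site operator `op` acting on site i into the 2^L-dimensional space."""
    mats = [id2] * L
    mats[i] = op
    full = mats[0]
    for m in mats[1:]:
        full = np.kron(full, m)
    return full

def spectrum(L, h, rng):
    """Full sorted energy spectrum of Eq. (1) for one disorder realization."""
    dim = 2 ** L
    H = np.zeros((dim, dim), dtype=complex)
    # Heisenberg term S_i . S_{i+1} with periodic boundary conditions
    for i in range(L):
        j = (i + 1) % L
        for op in (sx, sy, sz):
            H += site_op(op, i, L) @ site_op(op, j, L)
    # Random fields in the x and z directions, eps_i^alpha drawn uniformly from [-1, 1]
    for i in range(L):
        H += h * rng.uniform(-1.0, 1.0) * site_op(sx, i, L)
        H += h * rng.uniform(-1.0, 1.0) * site_op(sz, i, L)
    return np.linalg.eigvalsh(H)  # 2^L real eigenvalues in ascending order

rng = np.random.default_rng(0)
E = spectrum(L=10, h=1.0, rng=rng)  # one sample: 1024 raw energy levels
```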
The most widely used quantity is the distribution of nearest level spacings $\{ s_{i}=E_{i+1}-E_{i}\} $, i.e., the gaps between neighboring energy levels: in the thermal phase, $\{s_{i}\}$ follows a Gaussian orthogonal ensemble (GOE) distribution $P(s) =\frac{\pi s}{2}\exp (-\pi s^{2}/4) $, which reflects the linear level repulsion between energy levels. On the other hand, the levels become independent in the MBL phase, resulting in the Poisson distribution $P(s) =\exp (-s) $. We choose an $L=10$ system for a numerical demonstration, and prepare $4000$ samples at $h=1$ and $h=5$. In Fig. 1(a) we plot the density of states (DOS) for the $h=1$ case. We can see that the DOS is much more uniform in the middle part of the spectrum, which is also the case for $h=5$. Therefore we choose the middle half of the energy levels to perform the spacing counting, and the results are shown in Fig. 1(b). We observe a clear GOE distribution for $h=1$ and a Poisson distribution for $h=5$, as expected. Note that the fit to the Poisson distribution shows minor deviations around $s\sim 0$; this is a finite-size effect, since in a finite system there always remains a small but non-zero correlation between levels. Two other points are worth noting here. First, the GOE formula above is strictly correct only for $2\times 2$ matrices and approximately correct for larger systems, although the difference is small (see Sec. 4.12 in Ref. [46]). Second, in counting the level spacings, another strategy called the unfolding procedure is also widely used,[47] where the whole spectrum is unfolded to make the DOS almost uniform over all energy ranges. This procedure suffers from subtle ambiguities arising from the concrete unfolding strategy.[48] Nevertheless, we have also tested the unfolding procedure using cubic spline interpolation for this model, and the fitting results are almost the same.[45]
cpl-37-8-080501-fig1.png
Fig. 1. (a) The density of states (DOS) of the random field Heisenberg model, $\rho(E)$, in Eq. (1) at $L=10$ and $h=1$. The DOS is more uniform in the middle part; we therefore choose the middle half of the levels to perform level statistics. (b) Distribution of nearest level spacings $P(E_{i+1}-E_i)$. We can see a GOE distribution in the thermal phase ($h = 1$) and a Poisson distribution in the MBL phase ($h = 5$).
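For illustration, the spacing analysis of Fig. 1(b) can be sketched as follows, assuming the spectrum helper from the previous snippet; far fewer disorder samples than the 4000 quoted in the text are used here, just to keep the example light.

```python
import numpy as np

def middle_half_spacings(E):
    """Nearest level spacings from the middle half of a sorted spectrum,
    normalized by their mean so that <s> = 1."""
    n = len(E)
    mid = np.sort(E)[n // 4 : 3 * n // 4]
    s = np.diff(mid)
    return s / s.mean()

# Collect spacings over disorder realizations (spectrum() is from the sketch above)
rng = np.random.default_rng(1)
s_all = np.concatenate([middle_half_spacings(spectrum(10, 1.0, rng)) for _ in range(100)])
hist, edges = np.histogram(s_all, bins=50, range=(0.0, 4.0), density=True)

# Reference curves: GOE (Wigner surmise) and Poisson distributions
s = 0.5 * (edges[:-1] + edges[1:])
p_goe = np.pi * s / 2 * np.exp(-np.pi * s ** 2 / 4)
p_poisson = np.exp(-s)
```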
As can be seen, the level spacing distributions faithfully reflect the difference between the thermal and MBL phases, and are commonly used to analyze the MBL transition.[5,41,43,49] However, this methodology requires a few steps of pre-processing and careful treatment of normalization, while machine learning provides a more direct way to study the thermal-MBL transition in such a system, that is, through the raw energy spectrum without pre-processing. The efficiency of such a method has been reported in an earlier work of the author,[45] but the underlying mechanism was not explored, which is the main purpose of this work. Before that, we would like to give a self-contained description of our method. In this work, we employ the simplest neural network (NN) architecture, namely a three-layer feed-forward NN with fully connected layers, whose structure is shown in Fig. 2(a). The input layer is, in our case, comprised of a raw energy spectrum $\left\{ E_{i}\right\} $ without pre-processing. A middle layer of sigmoid neurons $\left\{ h_{j}\right\} $ is introduced to perform a highly nonlinear transformation of the input data, which can be viewed as the pattern extractor. Finally, an output layer of two neurons $\left\{ y_{k}\right\} $ gives the prediction for the labels of the input data. Mathematically, this NN reads $$\begin{align} &h_{j}=\sigma \Big(\sum_{i}E_{i}W_{ij}^{[1]}+B_{j}^{[1]}\Big),\\ &y_{k}=\sigma \Big(\sum_{j}h_{j}W_{jk}^{[2]}+B_{k}^{[2]}\Big),~~ \tag {2} \end{align} $$ where the sigmoid function is $\sigma \left(x\right) =1/\left(1+e^{-x}\right) $, $W^{[1,2]}$ are the coupling coefficients between neurons in neighboring layers, and $B^{[1,2]}$ are the external biases. The outputs $y_{0},y_{1}$ are the probabilities that the NN assigns to the thermal and MBL phases, respectively, and the machine predicts the label with the largest $y_{k}$. Given a set of training data with labels, the NN is trained to fit the labels by updating the coefficients in a standard supervised learning manner.[14] For our training, we assume that there is no prior knowledge about the precise location of the phase transition, and we use the so-called confusion scheme[27,34] to detect the critical point. The confusion scheme is a combined version of supervised and unsupervised learning techniques, whose idea is as follows.[34] Suppose that the training data (raw energy spectra) with randomness $h$ are prepared over the range $h\in [h_{1},h_{2}]$, and a critical point exists at an unknown value $h_{\rm c}$. We manually guess the critical point to be $h_{\rm c}'$ and label all the data with randomness strength smaller than $h_{\rm c}'$ as phase $0$ (thermal) and the rest as phase $1$ (MBL), then we use the labeled data set to perform supervised learning. The performance (testing accuracy) of the NN, $T(h_{\rm c}')$, will depend on the choice of $h_{\rm c}'$: (1) when $h_{\rm c}'=h_{1}$ or $h_{2}$, the performance will be close to $1$ since all the data are assigned to the same class; (2) when $h_{\rm c}'=h_{\rm c}$, i.e., the guessed point is the correct one, the performance will also be close to $1$ provided that the NN fully captures the features within the data; (3) when $h_{\rm c}'$ takes other values, the performance will be significantly smaller since there will always be data that share the same features while being labeled differently, that is, the machine is confused.
Therefore, the performance of the NN, $T\left(h_{\rm c}'\right) $, will form a characteristic W-shape curve with the middle peak located at the correct critical point.
Learning results. We now use the confusion scheme to study the MBL transition in the model of Eq. (1) with system size $L=10$. The training data set is prepared along $h\in \lbrack 1,5]$ with interval $0.1$, where $4000$ samples are collected at each point. To prepare each sample, we use the standard exact diagonalization algorithm and keep all the $2^{10}=1024$ energy levels. We then guess the critical point $h_{\rm c}'$ along the same range with interval $0.2$, and take the following training parameters: the number of hidden neurons is $M=100$, the learning rate is $\eta =0.0005$ with auto-decay coefficient $0.9999$, an L2 regularization term with weight $0.1$ is introduced to prevent over-fitting, and the batch size is kept at $100$ samples. Figure 2(b) shows the resulting evolution of the NN performance $T(h_{\rm c}')$; we observe a clear W-shape curve with the critical point identified as $h_{\rm c}\simeq 3.2\pm 0.2$, in good agreement with previous studies. It is worth noting that the correct critical point is roughly at the center of the parameter range (which is also the case in Ref. [34]), which is when the training result is most stable.
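For concreteness, the forward pass of Eq. (2) can be written in a few lines of NumPy. The class below is a minimal sketch only (its name and weight initialization are our own choices); the training loop with cross-entropy loss, the learning-rate schedule, L2 regularization, and batching described above is omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class ThreeLayerNN:
    """Minimal forward pass of the network in Eq. (2): the input is one raw
    spectrum {E_i}, followed by M sigmoid hidden units and two sigmoid outputs."""

    def __init__(self, n_in, n_hidden=100, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.01, (n_in, n_hidden))  # W^{[1]}
        self.B1 = np.zeros(n_hidden)                        # B^{[1]}
        self.W2 = rng.normal(0.0, 0.01, (n_hidden, 2))      # W^{[2]}
        self.B2 = np.zeros(2)                                # B^{[2]}

    def forward(self, E):
        h = sigmoid(E @ self.W1 + self.B1)  # hidden activations h_j
        y = sigmoid(h @ self.W2 + self.B2)  # outputs y_0 (thermal), y_1 (MBL)
        return h, y

net = ThreeLayerNN(n_in=1024)
_, y = net.forward(E)                # E: one raw spectrum of 2^10 = 1024 levels
predicted_phase = int(np.argmax(y))  # 0 = thermal, 1 = MBL
```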
cpl-37-8-080501-fig2.png
Fig. 2. (a) The three-layer feed-forward NN used to perform machine learning, where the training data in the input layer are the raw energy spectrum $\{E_i\}$. (b) Evolution of the NN's performance $T(h_{\rm c}')$ as a function of the guessed critical point $h_{\rm c}'$; the middle peak of the W-shape curve indicates the critical point $h_{\rm c}\simeq 3.2\pm0.2$.
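The confusion scheme itself reduces to a simple loop over guessed critical points. In the sketch below, train_and_test is a hypothetical helper standing for any supervised training routine (for instance one built on the network above) that returns the test accuracy; spectra and hs hold the raw spectra and the randomness strength of each sample.

```python
import numpy as np

def confusion_scan(spectra, hs, guesses, train_and_test, rng):
    """Confusion scheme: for each guessed critical point h_c', label the samples
    with h < h_c' as phase 0 (thermal) and the rest as phase 1 (MBL), train a
    network on part of the data, and record the test accuracy T(h_c')."""
    accuracies = []
    for hc in guesses:
        labels = (hs >= hc).astype(int)
        # simple random train/test split
        idx = rng.permutation(len(hs))
        split = int(0.8 * len(hs))
        train, test = idx[:split], idx[split:]
        acc = train_and_test(spectra[train], labels[train],
                             spectra[test], labels[test])
        accuracies.append(acc)
    return np.array(accuracies)  # W-shaped curve; its middle peak marks h_c
```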
To uncover the underlying mechanism of how the machine learns, we look into the structure of the coupling matrix $W^{[1]}$ between the input layer and the hidden layer, which plays the role of the feature extractor. Concretely, we select four representative points $h_{\rm c}'=1,2.2,2.8,3.2$ on the W-shape curve and draw the corresponding heatmaps of $W^{[1]}$; the results are presented in Fig. 3, where we also choose a typical node $j$ and draw the corresponding distribution $w^{j}(i) \equiv W_{ij}^{[1]}$, denoted by the black dashed line in the figure. As can be seen from Fig. 3, when the guessed critical point is $h_{\rm c}'=1$, $W^{[1]}$ is random with no patterns. The reason is that, when $h_{\rm c}'$ is close to one end of the randomness range, the number of data with one label overwhelms the other, so the machine tends to assign all the data to the same phase, which already yields high accuracy; hence the training becomes trivial. When $h_{\rm c}'=2.2$, the numbers of data in the two classes become comparable, and the machine begins to learn non-trivial features. From the $w^{j}(i) $ distribution we observe peaks at the ends of the spectrum and a quasi-linear behavior in the middle; the latter gives a signal reading $\sum_{i}w^{j}(i) E_{i}\sim \sum_{i}E_{i}^{2}\sim \Delta (\{ E_{i}\}) $, where $\Delta (\{ E_{i}\}) $ is the variance of the spectrum, corresponding to the bandwidth of the energy spectrum. This means that the bandwidth is the first signal to be captured. However, as can be seen from the Hamiltonian in Eq. (1), the mean bandwidth grows monotonically with the randomness strength $h$; hence the bandwidth can distinguish a low-disorder spectrum from a high-disorder one, but it cannot determine the transition point without prior knowledge. When $h_{\rm c}'$ moves closer to the middle, more features are recognized, and consequently $W_{ij}^{[1]}$ becomes noisier. Finally, when $h_{\rm c}'=3.2$, $W^{[1]}$ becomes seemingly random again, which reflects that the thermal-MBL transition is a dynamical transition that involves all energy levels in a complicated way.
cpl-37-8-080501-fig3.png
Fig. 3. Heatmaps of the neural network's coupling matrix $W^{[1]}$ at typical guessed critical points $h_{\rm c}'=1,2.2,2.8,3.2$, denoted by large red dots on the W-shape curve and guided by a black arrow. The black dashed line in each heatmap marks one typical hidden node $j$, and the corresponding distribution of coupling coefficients $w^{j}(i)=W_{ij}^{[1]}$ is also plotted, guided by a red arrow.
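Inspecting the learned couplings as in Fig. 3 only requires reading off the first weight matrix of the trained network. A minimal matplotlib sketch, assuming the ThreeLayerNN instance net from the earlier snippet has already been trained:

```python
import matplotlib.pyplot as plt
import numpy as np

# Heatmap of the trained input-hidden couplings W^{[1]} and the profile of one node
W1 = net.W1                                   # shape (number of levels, M)
fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(8, 3))

ax0.imshow(W1.T, aspect="auto", cmap="RdBu")  # heatmap as in Fig. 3
ax0.set_xlabel("energy level index $i$")
ax0.set_ylabel("hidden node $j$")

j = 0                                          # one typical hidden node
ax1.plot(W1[:, j])                             # profile w^j(i) = W^{[1]}_{ij}
ax1.set_xlabel("energy level index $i$")
ax1.set_ylabel(r"$w^{j}(i)$")

plt.tight_layout()
plt.show()
```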
To further explore the patterns hidden in the NN, we perform a Fourier transformation of $w^{j}(i) $, that is, $$ F_{W}^{j}(k) =\Big\vert \frac{1}{N}\sum_{m=1}^{N}e^{imk}w^{j}(m) \Big\vert .~~ \tag {3} $$ Different modes $k$ read different physical signals. For example, the mode $F_{W}^{j}(k=0) $ gives the mean value of $w^{j}(m) $; when acting on the energy spectrum, it reads the mean energy of the spectrum, which is obviously a trivial signal. The smallest non-zero momentum mode $k=\frac{2\pi }{2^{10}}$, on the other hand, corresponds to a quasi-linear profile of $w^{j}(m) $, which reads the bandwidth as described in the above paragraph. The mode $k=\pi $ gives a signal of $\sum_{m=1}^{N}(-1) ^{m}w^{j}(m) $, i.e., an alternating sign between neighboring levels, which reads nothing but the nearest level spacings, i.e., the “correct” signal to differentiate the phases. We plot the distribution of $F_{W}^{j}(k) $ for a typical node $j$ in the cases of $h_{\rm c}'=2.2$ and $3.2$; the results are displayed in Figs. 4(a) and 4(b), respectively. As can be seen, $F_{W}^{j}(k) $ at $h_{\rm c}'=2.2$ has a dominant mode at the smallest non-zero momentum $k=\frac{2\pi }{2^{10}}$, while all other modes are significantly smaller (actually, most of them are close to zero), which means that it reads the bandwidth, consistent with the above discussion. Meanwhile, $F_{W}^{j}(k) $ at $h_{\rm c}'=3.2$ is seemingly random again with no apparent dominant modes. This is a restatement that the thermal-MBL transition is a dynamical one that involves all energy levels. Notably, the mode $F_{W}^{j}(k=\pi) $ is large, meaning that the nearest level spacing is recognized. To proceed, we examine the evolution of these two modes during the training process. For this purpose, we calculate the average value of the Fourier modes $\overline{F_{W}(k)}$ with $k=\frac{2\pi }{2^{10}}$ and $k=\pi $ over all hidden nodes, that is, $$ \overline{F_{W}(k)} =\frac{1}{M}\sum_{j=1}^{M}F_{W}^{j}( k) ,~~ \tag {4} $$ where $M=100$ is the number of hidden nodes. We evaluate the evolution of $\overline{F_{W}(k)}$ with $h_{\rm c}'$; the results are displayed in Figs. 4(c) and 4(d) for $k=\frac{2\pi }{2^{10}}$ and $k=\pi $, respectively, where we restrict to the range $h_{\rm c}'\in [ 2.2,4] $ since only in this range does the machine learn non-trivial patterns. As can be seen, the signal of the $k=\frac{2\pi}{2^{10}}$ mode drops at $h_{\rm c}'=3.2$, i.e., the identified transition point, while the $k=\pi$ mode reaches its peak at the same point. This result indicates that the machine utilizes the signal of the nearest level spacing to defeat the trivial signal of the bandwidth in identifying the phase transition, which is physically correct.
cpl-37-8-080501-fig4.png
Fig. 4. The Fourier mode distribution $F_{W}^{j}(k) $ at (a) $h_{\rm c}'=2.2$ and (b) $h_{\rm c}'=3.2$. In the former case, the lowest non-zero mode is dominant; in the latter case $F_{W}^{j}(k) $ shows a noisy distribution with no apparent dominant modes, indicating that all levels contribute to the MBL transition. The mean value of the Fourier mode with (c) $k=\frac{2\pi }{2^{10}}$ and (d) $k=\pi$, which read the bandwidth and the nearest level spacing, respectively. At the correct point $h_{\rm c}'=3.2$, the former is suppressed and the latter is amplified.
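Equations (3) and (4) amount to a discrete Fourier transform of each weight column followed by an average over hidden nodes. A minimal sketch, again assuming the trained net from the earlier snippets with $N=2^{10}$ input levels:

```python
import numpy as np

def fourier_modes(w):
    """Eq. (3): |F_W^j(k)| at k = 2*pi*n/N for one weight profile w^j(m)."""
    N = len(w)
    return np.abs(np.fft.fft(w)) / N  # entry n corresponds to k = 2*pi*n/N

def mean_mode(W1, n):
    """Eq. (4): the Fourier mode with index n, averaged over all hidden nodes."""
    return np.mean([fourier_modes(W1[:, j])[n] for j in range(W1.shape[1])])

F_j = fourier_modes(net.W1[:, 0])          # mode distribution of one node, cf. Figs. 4(a)-(b)
bandwidth_mode = mean_mode(net.W1, 1)      # n = 1   -> k = 2*pi/2^10, reads the bandwidth
spacing_mode = mean_mode(net.W1, 2 ** 9)   # n = 512 -> k = pi, reads nearest level spacings
```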
At this stage, we have confirmed that the machine manages to capture the nearest level spacings to distinguish the thermal phase from the MBL phase. However, from the results in Figs. 4(c) and 4(d) we can see that the magnitude of the $k=\frac{2\pi}{2^{10}}$ mode is still larger than that of $k=\pi $ even at $h_{\rm c}'=3.2$, which means the signal of the bandwidth is always stronger than that of the level spacing. This is due to the training data we used in this work, i.e., the whole energy spectrum: the universal behavior of the level spacing is restricted to the middle part of the spectrum where the DOS is almost uniform, while the bandwidth comes from the whole spectrum. In addition, compared to the efficiency of machine learning, the analysis we used is relatively complicated; this is because the fully connected NN we used contains a large number of parameters, and hence extracting a simple and clear signal is not easy. Such a paradox reveals the black-box nature of machine learning, that is, ML is designed to detect features in data, not to understand their meaning. To overcome this obstacle, an updated NN that contains a built-in feature extractor is desired, for example, the convolutional neural network (CNN).[50] In the above analysis, we mainly focused on the modes $k=\frac{2\pi}{2^{10}}$ and $k=\pi$, corresponding to the signals of the bandwidth and the nearest level spacing, respectively. This is not only because they reflect the nature of the MBL transition in our model, but also because they are the two dominant patterns in the energy spectrum, as we will demonstrate in the following.
PCA Analysis. The analysis of the NN confirms that the machine manages to capture the nearest level spacing to distinguish the thermal phase from the MBL phase, although a stronger “trivial” signal corresponding to the bandwidth always exists. Our analysis concentrates on these two signals because they are the two most important patterns in the energy spectrum. To show this we employ another machine learning algorithm, i.e., PCA, to study the energy spectrum. PCA is a classical unsupervised learning algorithm whose aim is to extract the important patterns (principal components) from high-dimensional data, hence achieving the purpose of dimension reduction. PCA is most widely used in the area of image compression, and has been used in studying the phase transitions of 2D Ising models.[20] To perform PCA, we collect samples of the energy spectrum in the randomness range $h\in [ 1,5] $ with interval $0.2$, and select $2000$ samples for each point. In each spectrum, we pick the middle half of the levels, that is, $512$ energy levels; this is because the DOS is more uniform in this range while the bandwidth information is still retained. The whole data set is then arranged into a large data matrix $$ X=\left( \begin{array}{cccc} E_{1}^{\left(1\right) } & E_{2}^{\left(1\right) } & \cdots & E_{512}^{\left(1\right) } \\ E_{1}^{\left(2\right) } & E_{2}^{\left(2\right) } & \cdots & E_{512}^{\left(2\right) } \\ \vdots & \vdots & & \vdots \\ E_{1}^{\left(N_{\rm s}\right) } & E_{2}^{\left(N_{\rm s}\right) } & \cdots & E_{512}^{\left(N_{\rm s}\right) }\end{array} \right),~~ \tag {5} $$ where the superscript is the sample index, with total number $N_{\rm s}=26000 $, and the subscript refers to the energy level index.
We then calculate the covariance matrix $C=X^{T}X$ and perform the diagonalization $$ CW_{i}=\lambda _{i}W_{i},~~ \tag {6} $$ where the (unnormalized) eigenvalue $\lambda _{i}$ reflects the weight of the corresponding component (eigenvector) $W_{i}$. The six largest components and their weights are displayed in Fig. 5. The weight of the first component, $\lambda _{1}$, is dominant, and the corresponding $W_{1}$ is linear in the energy level index. As discussed above, this component corresponds to the bandwidth of the spectrum. The weight of the second component, $\lambda _{2}$, is much smaller than $\lambda _{1}$, while still being dominant compared to the remaining components. The behavior of $W_{2}$ resembles half a period of a trigonometric function, which corresponds to the $\overline{F_{W}} (k=\pi) $ mode discussed above and reads the nearest level spacing. The weights of the other components are negligible compared to those of $W_{1}$ and $W_{2}$, and their structure is unstable, that is, their behavior changes when the shape of the data matrix $X$ changes (which can be achieved by, for instance, changing the number of chosen levels or the number of samples). On the other hand, the behaviors of the first two components are quite stable, confirming that the bandwidth and the nearest level spacings are indeed the most important patterns in the energy spectrum.
cpl-37-8-080501-fig5.png
Fig. 5. The six largest components of the sample matrix $X$ in Eq. (5), where each subfigure's title gives the corresponding weight. The dominant component reads the bandwidth and the second component reads the nearest-neighbor level spacing, while the other components are significantly less important. The first two components are stable when the shape of the input data changes.
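The PCA step of Eqs. (5) and (6) can likewise be reproduced in a few lines of NumPy. The sketch below follows the text and diagonalizes $C=X^{T}X$ directly (mean-centering the columns of $X$ is a common PCA convention but is not applied here, to match Eq. (6)):

```python
import numpy as np

def pca_components(X, n_keep=6):
    """PCA of the data matrix X of Eq. (5): each row holds the middle 512 levels
    of one raw spectrum. Diagonalize C = X^T X as in Eq. (6) and return the
    n_keep largest eigenvalues (weights) and eigenvectors (components)."""
    C = X.T @ X
    vals, vecs = np.linalg.eigh(C)            # ascending eigenvalues
    order = np.argsort(vals)[::-1][:n_keep]   # largest weights first
    return vals[order], vecs[:, order]

# X has shape (N_s, 512), with rows collected over the range h in [1, 5]:
#   weights, components = pca_components(X)
#   components[:, 0]: quasi-linear in the level index -> reads the bandwidth
#   components[:, 1]: half-period oscillation -> reads the nearest level spacing
```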
Conclusion and discussion. We have used a three-layer feed-forward neural network (NN) to study the thermal-MBL transition in random spin systems, with the raw energy spectrum as the training data. The structure of the trained NN reveals that the MBL transition is a dynamical transition that involves all energy levels. By evaluating the Fourier modes of the coupling coefficients $W^{[1]}$, it is confirmed that the machine manages to capture the nearest level spacing to distinguish the phases, which is consistent with conventional analyses based on level statistics. Our analysis mainly concerns two signals, i.e., the bandwidth and the level spacing, which are the two most important patterns in the energy spectrum, as confirmed by the PCA analysis. Compared to existing works utilizing the ES to detect the MBL transition by machine learning, the training data we used, i.e., the raw energy spectrum, is of a much “lower level” and requires no pre-processing. On the other hand, due to the nature of a dynamical phase transition that involves all energy levels, an analytical description of the MBL transition based on the energy spectrum is complicated and, to the best of our knowledge, has not appeared yet, while the NN manages to learn it in a direct and efficient way. Actually, when fed the energy spectrum, the NN essentially describes the interaction between energy levels. This is a manifestation of the level dynamics description of random matrices,[51,52] which explains the efficiency of our method in a qualitative way. Meanwhile, every coin has two sides: the power of machine learning is rooted in the complex structure of the NN, which contains a large number of parameters, and this prevents us from extracting a simple physical signal that the machine learns. This is in fact unsurprising since modern ML architectures are designed to learn features rather than to understand the meaning of such features. The analysis we performed in this work is then an attempt to break into the black box of machine learning. The ML architecture we used in this work is the simplest fully connected feed-forward NN; the learned features are then stored in the whole NN, which makes them relatively difficult to extract. We can update the method by employing an NN that has a built-in feature extractor; one example is the convolutional neural network (CNN), where the filter channel (also called a kernel in machine learning terminology) that performs convolution on the energy spectrum is a direct feature extractor.[50] In general, our approach works for any dynamical transition that has a characteristic energy spectrum, for example, MBL models with more than one critical point,[4,37,53] or quantum many-body systems with periodic driving,[54–57] i.e., Floquet systems, where the effective energy spectrum takes the place of the conventional energy spectrum. Also, it is possible to employ machine learning to study MBL from an entanglement point of view. To date, it has been shown that the ground state wavefunctions of various quantum many-body systems can be written in an NN structure,[58–61] and their entanglement properties have also been studied.[62,63] These are all promising directions for future study. W.-J. Rao acknowledges Xin Wan and Rubah Kausar for stimulating discussions.
References
[1] Gornyi I V, Mirlin A D and Polyakov D G 2005 Phys. Rev. Lett. 95 206603
Gornyi I V, Mirlin A D and Polyakov D G 2005 Phys. Rev. Lett. 95 046404
[2] Basko D M, Aleiner I L and Altshuler B L 2006 Ann. Phys. 321 1126
[3] Bardarson J H, Pollmann F and Moore J E 2012 Phys. Rev. Lett. 109 017202
[4] Kjall J A, Bardarson J H and Pollmann F 2014 Phys. Rev. Lett. 113 107204
[5] Geraedts S D, Nandkishore R and Regnault N 2016 Phys. Rev. B 93 174202
[6] Geraedts S D, Regnault N and Nandkishore R M 2017 New J. Phys. 19 113021
[7] Yu X, Luitz D J and Clark B K 2016 Phys. Rev. B 94 184202
[8] Yang Z C, Chamon C, Hamma A and Mucciolo E R 2015 Phys. Rev. Lett. 115 267206
[9] Serbyn M, Michailidis A A, Abanin M A and Papic Z 2016 Phys. Rev. Lett. 117 160601
[10] Gray J, Bose S and Bayat A 2018 Phys. Rev. B 97 201105
[11] Li X, Ganeshan S, Pixley J H and Das Sarma S 2015 Phys. Rev. Lett. 115 186601
[12] Serbyn M, Papic Z and Abanin D A 2015 Phys. Rev. X 5 041047
[13] Doggen E V H, Schindler F, Tikhonov K S, Mirlin A D, Neupert T, Polyakov D G and Gornyi I V 2018 Phys. Rev. B 98 174202
[14] Goodfellow I, Bengio Y and Courville A 2016 Deep Learning (Cambridge: MIT Press)
[15] Tanaka A and Tomiya A 2017 J. Phys. Soc. Jpn. 86 063001
[16] Carrasquilla J and Melko R G 2017 Nat. Phys. 13 431
[17] Torlai G and Melko R G 2016 Phys. Rev. B 94 165134
[18] Zhang Y and Kim E A 2017 Phys. Rev. Lett. 118 216401
[19] Zhang Y, Melko R G and Kim E A 2017 Phys. Rev. B 96 245119
[20] Wang L 2016 Phys. Rev. B 94 195105
[21] Ch'ng K, Carrasquilla J, Melko R G and Khatami E 2017 Phys. Rev. X 7 031038
[22] Morningstar A and Melko R G 2018 J. Mach. Learn. Res. 18 5975
[23] Ponte P and Melko R G 2017 Phys. Rev. B 96 205146
[24] Li C D, Tan D R and Jiang F J 2018 Ann. Phys. 391 312
[25] Ohtsuki T and Ohtsuki T 2016 J. Phys. Soc. Jpn. 85 123706
Ohtsuki T and Ohtsuki T 2017 J. Phys. Soc. Jpn. 86 044708
[26] Hu W, Singh R R P and Scalettar R T 2017 Phys. Rev. E 95 062122
[27] Liu Y H and van Nieuwenburg E P L 2018 Phys. Rev. Lett. 120 176401
[28] Rao W J, Li Z, Zhu Q, Luo M and Wan X 2018 Phys. Rev. B 97 094207
[29] Li Z, Luo M and Wan X 2019 Phys. Rev. B 99 075418
[30] Khatami E, Guardado-Sanchez E, Spar B M, Carrasquilla J F, Bakr W S and Scalettar R T 2020 arXiv:2002.12310 [cond-mat.str-el]
[31] Liu J, Qi Y, Meng Z Y and Fu L 2017 Phys. Rev. B 95 041101
[32] Liu J, Shen H, Qi Y, Meng Z Y and Fu L 2017 Phys. Rev. B 95 241104
[33] Xu X Y, Qi Y, Liu J, Fu L and Meng Z Y 2017 Phys. Rev. B 96 041119(R)
[34] van Nieuwenburg E P L, Liu Y H and Huber S D 2017 Nat. Phys. 13 435
[35] van Nieuwenburg E, Bairey E and Refael G 2018 Phys. Rev. B 98 060301
[36] Schindler F, Regnault N and Neupert T 2017 Phys. Rev. B 95 245134
[37] Venderley J, Khemani V and Kim E A 2018 Phys. Rev. Lett. 120 257204
[38] Li H and Haldane F D M 2008 Phys. Rev. Lett. 101 010504
[39] Wigner E P 1951 Ann. Math. 53 36
Wigner E P 1955 Ann. Math. 62 548
Wigner E P 1958 Ann. Math. 67 325
[40] Dyson F J 1962 J. Math. Phys. 3 140
[41] Pal A and Huse D A 2010 Phys. Rev. B 82 174411
[42] Johri S, Nandkishore R and Bhatt R N 2015 Phys. Rev. Lett. 114 117401
[43] Luitz D J, Laflorencie N and Alet F 2015 Phys. Rev. B 91 081103(R)
[44] Regnault N and Nandkishore R 2016 Phys. Rev. B 93 104203
[45] Rao W J 2018 J. Phys.: Condens. Matter 30 395902
[46] Haake F 2001 Quantum Signatures of Chaos (Berlin: Springer)
[47] Kudo K and Deguchi T 2004 Phys. Rev. B 69 132404
[48] Gomez J M G, Molina R A, Relano A and Retamosa J 2002 Phys. Rev. E 66 036209
[49] Avishai Y, Richert J and Berkovits R 2002 Phys. Rev. B 66 052416
[50] Kausar R, Rao W J and Wan X 2020 arXiv:2005.00721 [cond-mat.dis-nn]
[51] Dyson F J 1962 J. Math. Phys. 3 1191
[52] Serbyn M and Moore J E 2016 Phys. Rev. B 93 041424(R)
[53] Huse D A, Nandkishore R, Oganesyan V, Pal A and Sondhi S L 2013 Phys. Rev. B 88 014206
[54] Lazarides A, Das A and Moessner R 2015 Phys. Rev. Lett. 115 030402
[55] Khemani V, Lazarides A, Moessner R and Sondhi S L 2016 Phys. Rev. Lett. 116 250401
[56] Yao N Y, Potter A C, Potirniche I D and Vishwanath A 2017 Phys. Rev. Lett. 118 030401
[57] Su W, Chen M N, Shao L B, Sheng L and Xing D Y 2016 Phys. Rev. B 94 075145
[58] Carleo G and Troyer M 2017 Science 355 602
[59] Deng D L, Li X and Das Sarma S 2017 Phys. Rev. B 96 195145
[60] Huang Y and Moore J E 2017 arXiv:1701.06246 [cond-mat.dis-nn]
[61] Chen J, Cheng S, Xie H, Wang L and Xiang T 2018 Phys. Rev. B 97 085104
[62] Deng D L, Li X and Das Sarma S 2017 Phys. Rev. X 7 021021
[63] Levine Y, Yakira D, Cohen N and Shashua A 2017 arXiv:1704.01552 [cs.LG]