Chinese Physics Letters, 2022, Vol. 39, No. 5, Article code 050303

Quantum Continual Learning Overcoming Catastrophic Forgetting

Wenjie Jiang (蒋文杰)1, Zhide Lu (鲁智徳)1, and Dong-Ling Deng (邓东灵)1,2*

Affiliations:
1 Center for Quantum Information, IIIS, Tsinghua University, Beijing 100084, China
2 Shanghai Qi Zhi Institute, Shanghai 200232, China

Received 17 March 2022; accepted 13 April 2022; published online 29 April 2022
*Corresponding author. Email: dldeng@tsinghua.edu.cn
Citation Text: Jiang W J, Lu Z D, and Deng D L 2022 Chin. Phys. Lett. 39 050303

Abstract: Catastrophic forgetting describes the fact that machine learning models tend to forget the knowledge of previously learned tasks after training on a new one. It is a vital problem in the continual learning scenario and has recently attracted tremendous attention across different communities. We explore the catastrophic forgetting phenomenon in the context of quantum machine learning. We find that, similar to classical learning models based on neural networks, quantum learning systems likewise suffer from this forgetting problem in classification tasks emerging from various application scenarios. We show that, based on the local geometrical information in the loss landscape of the trained model, a uniform strategy can be adopted to overcome the forgetting problem in the incremental learning setting. Our results uncover the catastrophic forgetting phenomenon in quantum machine learning and offer a practical method to overcome it, which opens a new avenue for exploring potential quantum advantages towards continual learning.
DOI: 10.1088/0256-307X/39/5/050303 © 2022 Chinese Physics Society

Humans are able to incrementally acquire knowledge and skills through lifelong interactions with the real world, a capability supported by a rich set of neurophysiological processes and biological mechanisms.[1,2] Likewise, artificially constructed computational systems may be exposed to continuous streams of data and interactions, and are expected to learn from new experiences while preserving previously learned information.[3–5] The ability to sequentially accumulate knowledge over time is referred to as continual learning or lifelong learning.[6] Machine learning algorithms have achieved enormous success on a number of difficult problems,[7–12] reaching or even surpassing human-level performance on specific tasks such as playing Atari games[7,8] and Go.[9,10] Nevertheless, most machine learning algorithms are designed to solve a single predefined problem and can hardly be reused when data from multiple tasks arrive sequentially. The main obstacle preventing these models from learning continually is catastrophic forgetting:[13–15] performance on earlier tasks drops abruptly after new tasks are learned, because the information gathered from previous data is overwritten.[13] Catastrophic forgetting is widely believed to be a crucial obstacle to achieving artificial general intelligence with neural networks.[16,17] In recent years, a number of variational quantum machine learning algorithms have been proposed to solve real-world problems,[18–21] and some have been demonstrated in proof-of-principle experiments showcasing their potential for practical applications.[22–25] These quantum learning algorithms exploit intrinsic properties of quantum computing systems, such as superposition and entanglement, and promise exponential advantages over their classical counterparts.[26–32] Despite these potential advantages and a growing body of exciting results, many aspects of quantum machine learning algorithms remain unexplored, calling for sustained research and development.[33–35] In particular, similar to their classical counterparts, most quantum machine learning algorithms are designed to accomplish a single predefined task and can hardly be generalized to learn multiple tasks sequentially. This deprives quantum learning agents of the ability to learn continually and remains a vital barrier to quantum artificial general intelligence. To address this issue, it is crucial to investigate catastrophic forgetting in quantum machine learning models and to endow learning agents based on quantum computing systems with the capacity for continual learning. In this Letter, we investigate the forgetting phenomenon with a focus on a specific kind of learning model called quantum classifiers (see Fig. 1 for a pictorial illustration). We show that, similar to traditional classifiers based on classical neural networks, quantum classifiers based on variational quantum circuits likewise forget previously learned information as their parameters are updated for a new task.
To determine whether our quantum classifiers are affected by the similarities among tasks, we examine two different types of relations between tasks: one in which the tasks are functionally identical but with permuted formats of the input, and the other in which the tasks are dissimilar in essential ways. Numerical experiments show that, when the model is trained on tasks one by one, both types of relations lead to an obvious decrease of its performance on previously trained tasks. Furthermore, we interpret this problem more precisely and mitigate its influence by taking advantage of the local geometrical information in the loss landscape of the trained model, a topic that is also being intensively explored in this community.[36–38] Based on our numerical results, we find that in certain situations catastrophic forgetting in quantum machine learning can be overcome through a correct description of this local information, and that quantum continual learning is possible.
[Figure 1: cpl-39-5-050303-fig1.png]
Fig. 1. Illustration of quantum continual learning and the elastic weight consolidation (EWC) strategy adopted to overcome catastrophic forgetting. (a) Illustration of quantum continual learning. Different tasks are sequentially learned by the quantum classifier, and the measurement result of a prefixed qubit denotes the output of the classifier. (b) Geometric picture of the EWC method. The purple surface represents the loss landscape of task $1$ and the green surface that of task $2$. After finding solution point A for task $1$, retraining the quantum classifier for task $2$ with the original training strategy leads to a significant increase in the loss for the previous task, shown as the forgetting (blue) path. In contrast, with the EWC method, retraining results in only a mild increase in the loss for task $1$ while achieving a low loss for the current task $2$, as depicted by the EWC (red) path.
Catastrophic Forgetting in Quantum Learning. Quantum classifiers are a family of variational quantum circuits designed to accomplish classification tasks[39] and have been explored from many perspectives, such as their vulnerability and their relation to feature spaces.[40–46] However, many aspects of quantum classifiers remain unclear, of which catastrophic forgetting is a crucial one. The main purpose of a classification task is to sort data into several categories by recognizing and extracting the meaningful fundamental features underlying them. Many practical problems can be abstracted as classification tasks, such as face recognition and object detection.[47] To illustrate the catastrophic forgetting phenomenon in quantum machine learning, we use a pre-fixed variational quantum circuit to learn two classification tasks sequentially, and observe the performance of the quantum classifier on the first task before and after the training process of the second.
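To make the setting concrete, the following is a minimal NumPy sketch of a variational quantum classifier of the kind considered here: the input is amplitude-encoded, layers of parameterized rotations and CNOTs are applied, and the measurement of a prefixed qubit yields the predicted label probability. The specific ansatz (RY layers with a chain of CNOTs) and the four-qubit size are illustrative assumptions, not the exact circuit used in this work.

```python
# Minimal sketch of a variational quantum classifier, simulated with NumPy.
import numpy as np

N_QUBITS = 4
DIM = 2 ** N_QUBITS

def ry(theta):
    """Single-qubit rotation about the y axis."""
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

def apply_single(state, gate, qubit):
    """Apply a 2x2 gate to one qubit of the state vector (qubit 0 = MSB)."""
    ops = [np.eye(2)] * N_QUBITS
    ops[qubit] = gate
    full = ops[0]
    for op in ops[1:]:
        full = np.kron(full, op)
    return full @ state

def apply_cnot(state, control, target):
    """Apply a CNOT by swapping basis amplitudes where the control bit is 1."""
    flip = 1 << (N_QUBITS - 1 - target)
    new = state.copy()
    for i in range(DIM):
        if (i >> (N_QUBITS - 1 - control)) & 1:
            new[i] = state[i ^ flip]
    return new

def classify(x, params):
    """Amplitude-encode x, apply rotation/entangling layers, read out qubit 0.
    Returns the probability assigned to class 1."""
    state = x / np.linalg.norm(x)            # amplitude encoding
    for layer in params:                     # params: (n_layers, N_QUBITS)
        for q in range(N_QUBITS):
            state = apply_single(state, ry(layer[q]), q)
        for q in range(N_QUBITS - 1):
            state = apply_cnot(state, q, q + 1)
    # <Z> on the prefixed readout qubit 0 (the MSB in this convention)
    probs = np.abs(state) ** 2
    z0 = sum(p if ((i >> (N_QUBITS - 1)) & 1) == 0 else -p
             for i, p in enumerate(probs))
    return (1 - z0) / 2                      # map <Z> in [-1, 1] to [0, 1]

rng = np.random.default_rng(0)
params = rng.uniform(0, 2 * np.pi, size=(3, N_QUBITS))
x = rng.normal(size=DIM)
print(classify(x, params))
```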
[Figure 2: cpl-39-5-050303-fig2.png]
Fig. 2. Catastrophic forgetting phenomena for quantum classifiers. (a) Learning curve for the original MNIST images. The blue line shows the training-set accuracy over the whole training process and the orange line the testing-set accuracy. (b) Forgetting curve for two tasks with large similarity. Here, $\gamma_{\rm old}$ ($\gamma_{\rm new}$) represents the accuracy of the quantum classifier on the old (new) task. (c) Learning curve of time-of-flight (TOF) images for the task of classifying topological phases. (d) Forgetting curve of two dissimilar tasks: the classification of topological phases and of hand-written digits.
To qualitatively understand this kind of forgetting, we directly construct a simple pair of tasks with similar underlying distributions and similar difficulties. One task is to classify hand-written digit images (0 and 9) randomly sampled from the MNIST dataset,[48] and the other is to classify the same images under a pre-fixed pixel permutation (a minimal construction of this task pair is sketched at the end of this paragraph). Because the variational quantum circuit adopted here is generic and has no architecture exploiting the spatial structure of the images, it is reasonable to assume that a pre-fixed permutation of the pixels does not change the original classification task significantly. We carry out extensive numerical simulations on a classical computer; the classification performances are plotted in Figs. 2(a) and 2(b). We first train our variational quantum classifier on the original MNIST images, and after 40 epochs of parameter updating the accuracy on the target task is high (larger than 95%), which indicates that the circuit has actually learned how to classify hand-written digit images. We then use the trained classifier to learn to distinguish the permuted images. After several rounds of parameter updating, the accuracy on the new task also exceeds 95%, even though the pixels are no longer aligned in the normal order. Unfortunately, as the accuracy of our quantum classifier on this ongoing new task grows, the accuracy on the previous task deteriorates steadily, indicating that the pre-learned information is being forgotten while the classifier learns new information. After training on the permuted images is finished, the information previously learned by the quantum classifier has evidently been almost completely overwritten, with little knowledge of the previous task remaining. This implies that catastrophic forgetting extends to quantum classifiers, even for learning tasks that share similar underlying structures. To further illustrate the forgetting problem for quantum classifiers, we also consider the scenario of learning intrinsically dissimilar tasks. We choose two classification tasks from disjoint areas to ensure they are not significantly related to each other: one is to classify topological phases of matter, and the other remains the classification of MNIST hand-written digit images. Classifying different phases of matter is one of the central problems in modern condensed matter physics, and many machine learning algorithms have been proposed to deal with this task,[29,31,39] giving rise to an interdisciplinary research frontier connecting machine learning and condensed matter physics. Here, we focus on classifying topological phases, which are widely considered to be more challenging than conventional symmetry-broken phases.
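As promised above, here is a minimal sketch of how the first task pair can be constructed before turning to the topological tasks. The loading route (tensorflow.keras), the digit selection, and the random seed are illustrative assumptions, not necessarily the pipeline used in this work.

```python
# Minimal sketch of the first task pair: MNIST digits 0 and 9, and the same
# images under one pre-fixed pixel permutation. Loading via tensorflow.keras
# is an assumed convenience route.
import numpy as np
from tensorflow.keras.datasets import mnist

(x_train, y_train), _ = mnist.load_data()
mask = (y_train == 0) | (y_train == 9)
x = x_train[mask].reshape(mask.sum(), -1).astype(np.float64)  # flatten 28x28
y = (y_train[mask] == 9).astype(int)                          # binary labels

rng = np.random.default_rng(42)                # seed chosen for illustration
perm = rng.permutation(x.shape[1])             # one fixed permutation, reused
x_task_a, x_task_b = x, x[:, perm]             # task A: original; B: permuted
# Each image would then be normalized and amplitude-encoded into the circuit.
```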
We consider a two-dimensional (2D) square-lattice model for the quantum anomalous Hall effect with the following Hamiltonian:[49] $$\begin{align} H_{\mathrm{QAH}} ={}& J_x \sum_{\boldsymbol{r}}(c_{\boldsymbol{r} \uparrow}^{\dagger} c_{\boldsymbol{r}+\hat{x} \downarrow}-c_{\boldsymbol{r} \uparrow}^{\dagger} c_{\boldsymbol{r}-\hat{x} \downarrow})+\mathrm{h.c.} \\ &+ i J_y \sum_{\boldsymbol{r}}(c_{\boldsymbol{r} \uparrow}^{\dagger} c_{\boldsymbol{r}+\hat{y} \downarrow}-c_{\boldsymbol{r} \uparrow}^{\dagger} c_{\boldsymbol{r}-\hat{y} \downarrow})+\mathrm{h.c.} \\ &- t \sum_{\langle\boldsymbol{r}, \boldsymbol{s}\rangle}(c_{\boldsymbol{r} \uparrow}^{\dagger} c_{\boldsymbol{s} \uparrow}-c_{\boldsymbol{r} \downarrow}^{\dagger} c_{\boldsymbol{s} \downarrow})\\ &+\mu \sum_{\boldsymbol{r}}(c_{\boldsymbol{r} \uparrow}^{\dagger} c_{\boldsymbol{r} \uparrow}-c_{\boldsymbol{r} \downarrow}^{\dagger} c_{\boldsymbol{r} \downarrow}),~~ \tag {1} \end{align} $$ where $c_{\boldsymbol{r} \sigma}^{\dagger}\,(c_{\boldsymbol{r} \sigma})$ is the fermionic creation (annihilation) operator with pseudo-spin $\sigma \in \{\uparrow, \downarrow\}$ at site $\boldsymbol{r}$, and $\hat{x}, \hat{y}$ are unit lattice vectors along the $x$ and $y$ directions. $J_x$ and $J_y$ characterize the spin-orbit coupling strengths along the $x$ and $y$ directions. The third and fourth terms describe the spin-conserving nearest-neighbor hopping (with strength $t$) and the on-site Zeeman interaction (with strength $\mu$), respectively. In momentum space, this Hamiltonian has two Bloch bands, whose topological properties can be diagnosed by the first Chern number. In our numerical simulations, we first train a binary quantum classifier to distinguish the time-of-flight images obtained from the two distinct topological quantum states, and then use the trained classifier to learn to identify hand-written digits. The results are plotted in Figs. 2(c) and 2(d), from which the catastrophic forgetting problem is manifest. Although the learning curve for the time-of-flight images is not very smooth, the accuracy reaches a reasonably high level (larger than 95%). However, during the learning process of classifying MNIST images, the accuracy on the previous task drops abruptly as the accuracy on the new task grows. At the end, the performance of the quantum classifier on classifying topological phases becomes poor.

Strategy Overcoming Catastrophic Forgetting. Artificial general intelligence requires not only powerful representation architectures capable of producing complicated probability distributions, but also the preservation of previously encountered experiences, in order to imitate natural creatures. Quantum machine learning promises a potentially exponentially enlarged representation space in which to embed real-life distributions.[18,27,39] Nevertheless, the above results reveal the undesirable fact that catastrophic forgetting occurs commonly in variational quantum classifiers, and thus continual learning in the quantum machine learning domain cannot be gained for free. Overcoming catastrophic forgetting is an inevitable step on the way towards quantum artificial general intelligence. Recent research suggests that the avoidance of forgetting when animals learn new tasks is related to the protection of specific excitatory synapses strengthened by past experiences.[50] Adopting a similar philosophy, we may presume that some parameters in a variational quantum circuit are more important than others and should be carefully protected in subsequent learning processes.
This inspires a successful method for overcoming forgetting in classical neural networks, the elastic weight consolidation (EWC) method.[51] We inherit the spirit of this method and generalize it to overcome catastrophic forgetting in quantum classifiers as well; notably, the method is experimentally friendly for near-term quantum devices. From a high-level perspective, learning to assign labels to data is a search in parameter space: the parameters of the variational quantum circuit are updated so as to optimize performance on the training data. Usually, a machine learning algorithm predefines a loss function assessing how well the current parameters perform, and minimizes this loss via carefully designed optimizers such as stochastic gradient descent[52] or adaptive moment estimation.[53] This optimization takes place on a high-dimensional manifold and is thus highly nontrivial.[54] Despite the difficulty of reaching the global minimum, a satisfactory local minimum of the loss landscape can usually be found in practice. Furthermore, for neural-network-based classifiers there typically exist multiple local minima connected by simple curves,[55] which form a substantially rich set of possible solutions for the target task. The aim of training is therefore to reach a satisfactory local minimum of the predefined loss landscape. Let us use the two-task scenario (tasks A and B) as an example to discuss the EWC method adapted here to overcome the catastrophic forgetting problem in quantum machine learning. Following maximum likelihood estimation in statistical learning,[56] we should maximize the posterior probability $p(\boldsymbol{\theta}|\varSigma)$ of a model characterized by parameters $\boldsymbol{\theta}$ conditioned on the joint dataset $\varSigma=\varSigma_{\rm A}+\varSigma_{\rm B}$, where $\varSigma_{\rm A}$ and $\varSigma_{\rm B}$ are the datasets for tasks A and B, respectively.
This posterior probability can be computed from the probabilities of the given datasets by using Bayes' rule, under the assumption that tasks A and B are independent of each other (this assumption may be slightly violated for the original and permuted MNIST images, as discussed in the Supplemental Material): $$ \log p(\boldsymbol{\theta}|\varSigma)=\log p(\varSigma_{\rm B}|\boldsymbol{\theta})+\log p(\boldsymbol{\theta}|\varSigma_{\rm A})-\log p(\varSigma_{\rm B}).~~ \tag {2} $$ Expanding the second term $\log p(\boldsymbol{\theta}|\varSigma_{\rm A})$ around the local minimum $\boldsymbol{\theta}^*_{\rm A}$ for task A and neglecting higher-order terms, we obtain the following expression in terms of the Hessian matrix ${H}_{\boldsymbol{\theta}^*_{\rm A}}$: $$ \log p(\boldsymbol{\theta}|\varSigma_{\rm A})=\log p(\boldsymbol{\theta}^*_{\rm A}|\varSigma_{\rm A})+\frac{1}{2}(\boldsymbol{\theta}-\boldsymbol{\theta}^*_{\rm A})^T{H}_{\boldsymbol{\theta}^*_{\rm A}}(\boldsymbol{\theta}-\boldsymbol{\theta}^*_{\rm A}).~~ \tag {3} $$ Under specific regularity conditions, the Hessian matrix ${H}_{\boldsymbol{\theta}^*_{\rm A}}$ equals minus the Fisher information matrix $F$,[57] an important concept in statistical learning[57–59] that has recently been introduced to the quantum domain.[60,61] The Fisher information can be evaluated from the gradients of the loss function over the training samples at the optimal point for the previous task, and these gradients can be obtained directly by measuring observables in experiment[62] (see the Supplemental Material for more details). We thus approximate the posterior as a Gaussian distribution with mean $\boldsymbol{\theta}^*_{\rm A}$ and a diagonal precision matrix given by the diagonal elements of $F$, and rewrite the loss function of task B as $$ \mathcal{L}(\boldsymbol{\theta}) = \mathcal{L}_{\rm B}(\boldsymbol{\theta}) + \lambda\sum_i F_{i}(\theta_i-\theta_{{\rm A},i}^{*})^2.~~ \tag {4} $$ Here, $\mathcal{L}_{\rm B}(\boldsymbol{\theta})$ is the original loss function for the second task B, $F_i$ is the $i$th diagonal element of the Fisher information matrix at the optimal point $\boldsymbol{\theta}^*_{\rm A}$ of the previous task A, and $\lambda$ is a hyper-parameter controlling the strength of the EWC restriction. We refer to the Supplemental Material for more details. This method can be interpreted more intuitively from a geometrical perspective, as illustrated in Fig. 1(b). The target of continual learning is to achieve adequate performance on the new task B without a significant decrease of performance on the previous task A. Based on this consideration, when training on task B we add a regularization term to the original loss function that penalizes deviations from the obtained optimal solution of task A according to the importance of each parameter. To quantitatively evaluate the importance of the different parameters in the quantum classifier, we compute the Fisher information matrix of the trained classifier and use its diagonal elements as the weights of the penalties on changes of the corresponding parameters. Under some mild regularity conditions, the Fisher information matrix characterizes the corresponding Hessian matrix, which describes the local curvature of the loss landscape.
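As a concrete illustration, the following minimal NumPy sketch (our own, not verbatim from this work) estimates the diagonal of the empirical Fisher information at the task-A optimum and assembles the regularized loss of Eq. (4); the name grad_log_likelihood is a hypothetical placeholder for a routine returning per-sample log-likelihood gradients, obtainable via the parameter-shift rule on hardware or autodiff in simulation. A multi-task variant, anticipating the extension discussed below, is included as well.

```python
# Hedged sketch of the EWC-regularized loss of Eq. (4).
import numpy as np

def fisher_diagonal(grad_log_likelihood, theta_star, samples):
    """Diagonal of the empirical Fisher information at the task-A optimum:
    F_i = mean over samples of (d log p / d theta_i)^2."""
    grads = np.array([grad_log_likelihood(theta_star, s) for s in samples])
    return np.mean(grads ** 2, axis=0)

def ewc_loss(theta, loss_b, theta_star_a, fisher_a, lam):
    """Eq. (4): task-B loss plus a Fisher-weighted quadratic penalty that
    anchors each parameter to the task-A optimum theta_star_a."""
    return loss_b(theta) + lam * np.sum(fisher_a * (theta - theta_star_a) ** 2)

def continual_loss(theta, loss_current, anchors, lam):
    """Multi-task extension: one penalty per finished task, with
    anchors = [(theta_star_1, fisher_1), (theta_star_2, fisher_2), ...]."""
    total = loss_current(theta)
    for theta_star, fisher in anchors:
        total += lam * np.sum(fisher * (theta - theta_star) ** 2)
    return total
```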
Informally speaking, the gradient with respect to each parameter nearly vanishes at a local minimum of the loss landscape of the quantum classifier, and the diagonal elements of the corresponding Hessian matrix indicate the local curvatures along different directions at this minimum. These curvatures explicitly quantify the significance of the different parameters: a large curvature means that the loss increases significantly even for a small shift of the corresponding parameter, whereas a small curvature means that the loss changes only mildly if the parameter is shifted a little. Thus, the regularization term forces the parameters of the quantum classifier to stay near the optimal solution of the previous task, penalizing alterations according to the local curvatures. We remark that this interpretation allows the method to be extended to more general scenarios in a straightforward manner. To continually learn more than two tasks, we simply compute the diagonal elements of the Fisher information matrix at the solution point of each task after the corresponding training process finishes, and add a new regularization term based on these values to protect the quantum classifier's performance on that task while learning subsequent ones (as in the multi-task sketch above). As a result, the total loss becomes very large if the current parameters stray far from the optimal values obtained for previous tasks. Consequently, minimizing the regularized loss function can not only achieve a decent accuracy on the new task, but also maintain a favorable performance on the previous ones. Numerical Experiments. To benchmark the EWC method for quantum classifiers, we use the incremental learning setting described above and adopt similar learning settings on two pairs of classification tasks with different relations. The first pair is to classify MNIST digit images and their randomly pixel-permuted counterparts. We plot the continual learning results in Fig. 3(a) and compare them with the forgetting results discussed above. The upper panel of Fig. 3(a) shows the full learning process of classifying the original MNIST images with and without the EWC method, respectively. During the second training phase, which targets the classification of pixel-permuted MNIST images, the EWC method preserves the high accuracy of our quantum classifier on the previous task, avoiding an evident performance reduction. Meanwhile, the accuracy on the current task grows to a level similar to that of the quantum classifier trained without the regularization, as shown in the lower panel of Fig. 3(a). The second pair of tasks involves classifying time-of-flight images and hand-written digit images, as discussed above. For this pair of dissimilar classification tasks, our numerical results are plotted in Fig. 3(b), where an analogous performance-preserving behavior is clearly observed. We further test the EWC method on three dissimilar classification tasks emerging from different fields and observe their learning curves in the different training phases. In addition to classifying time-of-flight images and MNIST hand-written digits, we now add a third task, i.e., classifying symmetry protected topological phases.
We consider the following Hamiltonian:[63] $$ H(h)=-\sum_i\sigma^x_{i}\sigma^z_{i+1}\sigma^x_{i+2} +h\sum_i\sigma^y_{i}\sigma^y_{i+1},~~ \tag {5} $$ where $\sigma^{x,y,z}_i$ are the Pauli matrices acting on the $i$th spin and $h$ is a parameter describing the strength of the nearest-neighbor interaction. This Hamiltonian is exactly solvable and hosts two well-studied quantum phases: a $\mathbb{Z}_2\times \mathbb{Z}_2$ symmetry protected phase characterized by a nonzero string order for $h < 1$, and an antiferromagnetic phase with long-range order for $h>1$; a quantum phase transition between the two occurs at $h=1$ (a minimal construction of this model is sketched after this paragraph). Our results for learning the three tasks sequentially with and without the EWC strategy are plotted in Fig. 3(c), from which the effectiveness of the EWC method is clearly manifest. Without the EWC strategy, the performance of the quantum classifier on classifying time-of-flight images and hand-written digits decreases notably as we train the classifier on the third task. In contrast, with the EWC method the quantum classifier maintains a reasonably good performance even at the end of the training process. We stress that these three tasks come from three distinct research areas and thus should share no significant underlying structure. Even in this situation, the proposed EWC strategy can still overcome catastrophic forgetting, which provides a possible route towards quantum continual learning in the future.
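As a hedged illustration (our own construction, assuming periodic boundary conditions and a chain small enough for exact diagonalization), the following sketch builds $H(h)$ of Eq. (5) and extracts its ground state, which could serve as quantum input data for the SPT classification task.

```python
# Minimal sketch of the cluster-Ising Hamiltonian of Eq. (5) on a small chain.
import numpy as np

I2 = np.eye(2, dtype=complex)
SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
SZ = np.array([[1, 0], [0, -1]], dtype=complex)

def kron_chain(ops):
    out = ops[0]
    for op in ops[1:]:
        out = np.kron(out, op)
    return out

def pauli_at(n, pairs):
    """Tensor product with the given Paulis at given sites, identity elsewhere
    (sites wrap around, i.e., periodic boundary conditions)."""
    ops = [I2] * n
    for site, p in pairs:
        ops[site % n] = p
    return kron_chain(ops)

def cluster_ising(n, h):
    """H(h) = -sum_i X_i Z_{i+1} X_{i+2} + h sum_i Y_i Y_{i+1}."""
    H = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for i in range(n):
        H -= pauli_at(n, [(i, SX), (i + 1, SZ), (i + 2, SX)])
        H += h * pauli_at(n, [(i, SY), (i + 1, SY)])
    return H

vals, vecs = np.linalg.eigh(cluster_ising(n=8, h=0.5))  # SPT regime: h < 1
ground_state = vecs[:, 0]   # quantum input for the SPT phase classifier
```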
[Figure 3: cpl-39-5-050303-fig3.png]
Fig. 3. Performance benchmarking for the EWC strategy. (a) Learning curves of two similar tasks: classifying the original and pixel-permuted MNIST images. Blue lines plot the accuracies for the two tasks when trained without the EWC strategy, whereas orange lines show the corresponding results with the EWC strategy. (b) Learning curves of two dissimilar tasks: classifying time-of-flight images from different topological phases and classifying the original MNIST hand-written images. (c) Learning curves of three dissimilar tasks: classifying time-of-flight images, the original MNIST images, and the quantum states from different symmetry protected topological (SPT) phases.
Conclusion and Outlook. We have investigated the catastrophic forgetting phenomenon in the emergent interdisciplinary field of quantum machine learning. In particular, we have shown that the catastrophic forgetting problem commonly shows up in quantum learning as well, and that it can be overcome through the EWC method originating from the classical machine learning community. We remark that a prerequisite of the EWC method is that the chosen learning model can perform well on the different tasks simultaneously; the target of the method is then to find such a shared solution. For concreteness, we have carried out extensive numerical simulations involving a diverse spectrum of learning tasks, including identifying real-life hand-written digit images, classifying time-of-flight images routinely obtained in cold-atom experiments, and classifying quantum data from different symmetry protected topological phases. Our results not only reveal the notable catastrophic forgetting problem for quantum learning systems, but also apply an intriguing method based on Fisher information, inherited from the classical machine learning community, to overcome it. This work represents only a preliminary step towards quantum continual learning, and many important questions remain unexplored and deserve further investigation. First, in this work we have only considered the case of supervised learning; how to extend our results to unsupervised and reinforcement learning remains unclear. We remark that such an extension may be highly nontrivial given the fact that, in those scenarios, little or no prior knowledge is available, and sometimes even the task boundaries are poorly defined.[64] Second, quantum machine learning holds the intriguing potential of exhibiting exponential advantages.[26–32,65–67] For instance, variational quantum circuits are believed to offer potential advantages in representing complex distributions. From a high-level perspective, sequentially learning several tasks requires the learning model to represent a complex joint distribution, so potential advantages in the representation power of variational quantum circuits may translate into potential advantages in the continual learning scenario. In addition, it has been proved that the learnability of unitary processes can be improved by entangled datasets, which implies that quantum entanglement is valuable in quantum machine learning.[68] Such advantages may also be exploited when variational quantum circuits are applied to quantum datasets, and it is of great significance to understand the potential role of quantum entanglement in the quantum continual learning scenario. Yet, these advantages have so far only been explored in the context of a predefined learning task; in the future, it would be interesting and important to investigate how to unambiguously demonstrate quantum advantages in the continual learning scenario. Moreover, quantum learning systems have recently been shown to be notably vulnerable to carefully crafted adversarial examples and perturbations.[40–42] Along this line, it would be interesting and important to explore how quantum continual learning behaves under different adversarial settings. In particular, it would be of both theoretical and practical importance to study whether there exist universal perturbations that could deceive a quantum continual learning system across all the sequential tasks.
Finally, an experimental demonstration of quantum continual learning, especially with quantum advantages, should be a crucial step toward the long-term holy grail of achieving quantum artificial general intelligence. Acknowledgments. We thank Peixin Shen and Weikang Li for helpful discussions. This work was supported by the Start-Up Fund from Tsinghua University (Grant No. 53330300320), the National Natural Science Foundation of China (Grant No. 12075128), and the Shanghai Qi Zhi Institute.
References
[1] Murray M M, Lewkowicz D J, Amedi A, and Wallace M T 2016 Trends Neurosci. 39 567
[2] Zenke F, Gerstner W, and Ganguli S 2017 Curr. Opin. Neurobiol. 43 166
[3] Legg S and Hutter M 2007 Minds & Machines 17 391
[4] Legg S and Veness J 2011 arXiv:1109.5951 [cs.AI]
[5] Hernández-Orallo J and Dowe D L 2010 Artif. Intell. 174 1508
[6] Parisi G I, Kemker R, Part J L, Kanan C, and Wermter S 2019 Neural Networks 113 54
[7] Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, and Riedmiller M 2013 arXiv:1312.5602 [cs.LG]
[8] Mnih V, Kavukcuoglu K, Silver D et al. 2015 Nature 518 529
[9] Silver D, Huang A, Maddison C J et al. 2016 Nature 529 484
[10] Silver D, Schrittwieser J, Simonyan K et al. 2017 Nature 550 354
[11] Krizhevsky A, Sutskever I, and Hinton G E 2017 Commun. ACM 60 84
[12] Silver D, Hubert T, Schrittwieser J et al. 2018 Science 362 1140
[13] McCloskey M and Cohen N J 1989 Psychology of Learning and Motivation (San Diego: Academic Press) vol 24 p 109
[14] Robins A 1995 Connect. Sci. 7 123
[15] French R M 1999 Trends Cognit. Sci. 3 128
[16] Goodfellow I J, Mirza M, Xiao D, Courville A, and Bengio Y 2013 arXiv:1312.6211 [stat.ML]
[17] Kemker R, McClure M, Abitino A, Hayes T, and Kanan C 2017 arXiv:1708.02072 [cs.AI]
[18] Lloyd S, Mohseni M, and Rebentrost P 2013 arXiv:1307.0411 [quant-ph]
[19] Lloyd S and Weedbrook C 2018 Phys. Rev. Lett. 121 040502
[20] Amin M H, Andriyash E, Rolfe J, Kulchytskyy B, and Melko R 2018 Phys. Rev. X 8 021050
[21] Cong I, Choi S, and Lukin M D 2019 Nat. Phys. 15 1273
[22] Lamata L 2017 Sci. Rep. 7 1609
[23] Du Y, Hsieh M H, Liu T, and Tao D 2018 arXiv:1809.06056 [quant-ph]
[24] Hu L, Wu S H, Cai W et al. 2019 Sci. Adv. 5 eaav2761
[25] Saggio V, Asenbeck B E, Hamann A et al. 2021 Nature 591 229
[26] Cong I and Duan L 2016 New J. Phys. 18 073011
[27] Biamonte J, Wittek P, Pancotti N, Rebentrost P, Wiebe N, and Lloyd S 2017 Nature 549 195
[28] Gao X, Zhang Z Y, and Duan L M 2018 Sci. Adv. 4 eaat9004
[29] Sarma S D, Deng D L, and Duan L M 2019 Phys. Today 72 48
[30] Aaronson S 2015 Nat. Phys. 11 291
[31] Carleo G, Cirac I, Cranmer K, Daudet L, Schuld M, Tishby N, Vogt-Maranto L, and Zdeborová L 2019 Rev. Mod. Phys. 91 045002
[32] Liu Y, Arunachalam S, and Temme K 2021 Nat. Phys. 17 1013
[33] Alexeev Y, Bacon D, Brown K R et al. 2021 PRX Quantum 2 017001
[34] Awschalom D, Berggren K K, Bernien H et al. 2021 PRX Quantum 2 017002
[35] Altman E, Brown K R, Carleo G et al. 2021 PRX Quantum 2 017003
[36] Du Y, Hsieh M H, Liu T, You S, and Tao D 2021 PRX Quantum 2 040337
[37] Sweke R, Wilde F, Meyer J, Schuld M, Faehrmann P K, Meynard-Piganeau B, and Eisert J 2020 Quantum 4 314
[38] You X and Wu X 2021 Proceedings of the 38th International Conference on Machine Learning (PMLR) vol 139 pp 12144–12155
[39] Dunjko V and Briegel H J 2018 Rep. Prog. Phys. 81 074001
[40] Lu S, Duan L M, and Deng D L 2020 Phys. Rev. Res. 2 033212
[41] Liu N and Wittek P 2020 Phys. Rev. A 101 062331
[42] Gong W and Deng D L 2021 arXiv:2102.07788 [quant-ph]
[43] Schuld M and Killoran N 2019 Phys. Rev. Lett. 122 040504
[44] Grant E, Benedetti M, Cao S, Hallam A, Lockhart J, Stojevic V, Green A G, and Severini S 2018 npj Quantum Inf. 4 65
[45] Blank C, Park D K, Rhee J K K, and Petruccione F 2020 npj Quantum Inf. 6 41
[46] Du Y X, Hsieh M H, Liu T L, Tao D C, and Liu N N 2021 Phys. Rev. Res. 3 023153
[47] Russell S and Norvig P 2020 Artificial Intelligence: A Modern Approach (Pearson)
[48] LeCun Y, Cortes C, and Burges C 1998 MNIST Handwritten Digit Database
[49] Chang C Z, Zhang J, Feng X et al. 2013 Science 340 167
[50] Yang G, Pan F, and Gan W B 2009 Nature 462 920
[51] Kirkpatrick J, Pascanu R, Rabinowitz N et al. 2016 arXiv:1612.00796 [cs.LG]
[52] Bottou L 2004 Advanced Lectures on Machine Learning: ML Summer Schools, Canberra, Australia, 2–14 February 2003, Revised Lectures, Tübingen, Germany, 4–16 August 2003 (Berlin: Springer) p 146
[53] Kingma D P and Ba J 2014 arXiv:1412.6980 [cs.LG]
[54] Goodfellow I, Bengio Y, and Courville A 2016 Deep Learning (Cambridge: MIT Press)
[55] Garipov T, Izmailov P, Podoprikhin D, Vetrov D P, and Wilson A G 2018 Advances in Neural Information Processing Systems (Curran Associates, Inc.)
[56] Scott W A 2002 J. Stat. Comput. Simul. 72 599
[57] Ly A, Marsman M, Verhagen J, Grasman R, and Wagenmakers E J 2017 arXiv:1705.01064 [math.ST]
[58] Kunstner F, Balles L, and Hennig P 2019 arXiv:1905.12558 [cs.LG]
[59] Frieden B R 1998 Physics from Fisher Information: A Unification (Cambridge: Cambridge University Press)
[60] Petz D and Ghinea C 2011 Quantum Probab. Relat. Top. 27 261
[61] Liu J, Yuan H, Lu X M, and Wang X 2019 J. Phys. A 53 023001
[62] Huang K, Wang Z A, Song C et al. 2021 npj Quantum Inf. 7 165
[63] Smacchia P, Amico L, Facchi P, Fazio R, Florio G, Pascazio S, and Vedral V 2011 Phys. Rev. A 84 022304
[64] Rao D, Visin F, Rusu A A, Teh Y W, Pascanu R, and Hadsell R 2019 arXiv:1910.14481 [cs.LG]
[65] Du Y, Hsieh M H, Liu T, and Tao D 2020 Phys. Rev. Res. 2 033125
[66] Huang H Y, Kueng R, and Preskill J 2021 Phys. Rev. Lett. 126 190505
[67] Huang H Y, Kueng R, Torlai G, Albert V V, and Preskill J 2021 arXiv:2106.12627 [quant-ph]
[68] Sharma K, Cerezo M, Holmes Z, Cincio L, Sornborger A, and Coles P J 2022 Phys. Rev. Lett. 128 070501