Chinese Physics Letters, 2020, Vol. 37, No. 6, Article code 068701
Characterization of the Topological Features of Catalytic Sites in Protein Coevolution Networks*
Xiu-Lian Xu (徐秀莲)**, Jin-Xuan Shi (史瑾璇)
School of Physics Science and Technology, Yangzhou University, Yangzhou 225002, China
Received 6 February 2020, online 26 May 2020
*Supported by the National Natural Science Foundation of China (Grant No. 11305139).
**Corresponding author. Email: xuxl@yzu.edu.cn
Citation Text: Xu X L and Shi J X 2020 Chin. Phys. Lett. 37 068701

Abstract
Knowledge of the sequence and structural properties of residues in the catalytic sites of enzymes is important for understanding the physicochemical basis of enzymatic catalysis. We reveal new features of the catalytic sites by analyzing the coevolutionary behavior of amino acid sequences. By performing direct coupling analysis of the sequences of homologous proteins, we construct coevolution networks at the residue level. Based on an analysis of the topological features of the coevolution networks for a dataset of 20 enzymes, we show that there is a significant correlation between the catalytic sites and the topological features of protein coevolution networks. Residues at the catalytic center often correspond to nodes with high centrality values in the networks, as characterized by the degree, betweenness, closeness, and Laplacian centrality. The results of this work provide a possible way to extract key coevolutionary information from the sequences of enzymes, which is useful for predicting their catalytic sites.

DOI: 10.1088/0256-307X/37/6/068701
PACS: 87.15.-v, 87.14.E-, 89.75.Fb, 89.75.-k
© 2020 Chinese Physics Society

Revealing the sequence-structure-function relationship of proteins is one of the central topics in the field of molecular biophysics.[1] Enzymatic catalysis is one of the most typical biological functions of proteins.[2] With the help of enzymes, biochemical reactions in cells that would otherwise be very slow can occur within biologically relevant timescales.
Therefore, enzymes are not only important for regulating biological processes in cells, but can also be used as effective tools to control and modify biological systems.[3] In addition, enzymes are often the targets of drug design in treating certain diseases.[4] Consequently, understanding the physicochemical basis of enzymatic catalysis has attracted much attention in physics, chemistry, and biology.[5–12]

In previous structural and biochemical studies, the three-dimensional structures of a number of enzymes have been resolved and their catalytic sites identified.[13] Meanwhile, the catalytic kinetics under various conditions have been well characterized for many enzymes.[2] In recent years, single-molecule techniques have been widely used to directly measure the conformational dynamics of enzyme molecules during catalysis.[14–16] These studies, together with various computational works, not only provided the structural and physicochemical basis of enzymatic catalysis, but also revealed the key roles of enzyme conformational dynamics in the catalytic cycle.[17–21] On the other hand, many works have been devoted to analyzing the features of the residues directly involved in catalysis and to developing methods for predicting the catalytic sites of enzymes of unknown function based on sequence information,[22–26] which is important because the available sequence data of proteins are much more abundant than the structural data. To meet the functional requirements, the catalytic residues need to satisfy certain mechanical and physicochemical restraints. Therefore, the locations of the catalytic sites are largely encoded in the structural and sequence information.
Extracting such information has been the effort of many groups.[22–26] For example, in an early study, Thornton and coworkers developed an algorithm to predict the catalytic sites of enzymes using a neural network that combines structure and sequence information.[24] In Ref. [25], Sankararaman and coworkers showed that considering evolutionary information can significantly improve the prediction of catalytic residues. In Refs. [22,26], using the Gaussian elastic network model (GNM), Yang et al. showed that catalytic sites are often located at hinge centers with low translational mobility, revealing the coupling between catalytic sites and protein collective dynamics. All these works provided useful ways to extract key information from the structures and sequences of enzymes in order to identify the residues of the catalytic sites and establish the physicochemical basis of enzyme catalysis.

Recently, coevolution information of protein families has been widely used in protein structure prediction, protein design, and the study of protein-protein interactions.[27–32] Coevolution analysis can provide information beyond that of traditional multiple sequence alignments because it identifies correlated variations of pairs of residues in a protein family. Therefore, it is interesting to investigate whether information on the catalytic sites of enzymes can be extracted from the coevolution analysis of protein sequences.[25]

In this work, we characterize the coevolution properties of the residues in the catalytic center of enzymes using complex network analysis, which has been widely used to describe the statistical and dynamic behaviors of many social and biological systems.[33–39] For this purpose, we construct coevolution networks at the residue level by performing coevolution analysis on the sequences of the homologous proteins of the given enzymes.
Statistical analysis of the topological features of the catalytic sites shows that there are significant correlations between the catalytic sites and the topological features of protein coevolution networks. Residues at the catalytic center often correspond to nodes with high centrality values, as demonstrated by the degree, betweenness, closeness, and Laplacian centralities of the networks.[40] The results of this work can be useful for extracting coevolutionary information from protein sequences to predict the catalytic sites of enzymes and for understanding the sequence-structure-function relationship of enzymes.

To extract the coevolutionary information, we first performed multiple sequence alignment (MSA) for the proteins in the family of the given enzyme.[41] With the alignment data, the pairwise coupling between residues can be characterized by the mutual information (MI), which is given by ${\rm MI}_{ij}=\sum_{A,B} f_{ij}(A,B)\ln\left[f_{ij}(A,B)/(f_{i}(A) f_{j}(B))\right]$.[42] Here $f_{i}(A)$ is the relative frequency of finding amino acid $A$ at position $i$, and $f_{ij}(A,B)$ is the relative frequency of finding amino acids $A$ and $B$ at positions $i$ and $j$, respectively (Fig. 1(a)). However, the correlation between positions $i$ and $j$ described by the mutual information may include contributions from both direct and indirect coupling. If residues $i$ and $j$ are directly coupled, the ${\rm MI}_{ij}$ value will be large. On the other hand, if residues $i$ and $j$ are not directly coupled, but both are directly coupled with a third residue $k$, ${\rm MI}_{ij}$ will also be large because of the indirect coupling (Fig. 1(b)).[31,42]

To disentangle the indirect coupling effect, we performed the direct coupling analysis (DCA) developed in previous work.[42] DCA infers a statistical model of the sequence probability distribution $P(a_{1},a_{2},\cdots,a_{N})$, with $a_{i}$ being the amino acid identity at position $i$ of a sequence of length $N$. Since the model probability satisfying the observed $f_{i}(A)$ and $f_{ij}(A,B)$ is not unique, the least-constrained model with the maximal Shannon entropy was used,[42] which has the form $P(a_{1},a_{2},\cdots,a_{N})=\frac{1}{Z}\exp\left(\sum_{i<j} e_{ij}(a_{i},a_{j})+\sum_{i} h_{i}(a_{i})\right)$. Here the parameter $e_{ij}(A,B)$ is the pairwise coupling and $h_{i}(A)$ is the local bias; both can be inferred by fitting the model to the observed $f_{i}(A)$ and $f_{ij}(A,B)$, and $Z$ is the normalization factor. To reduce the computational cost, the mean-field approximation (in which the term $\sum_{i<j} e_{ij}(a_{i},a_{j})$ is expanded to linear order) was introduced in calculating $e_{ij}$. Based on the DCA, we can calculate the direct information (DI), which measures the mutual information due to the direct coupling. According to previous work,[42] the DI is given by ${\rm DI}_{ij}=\sum_{A,B} P_{ij}^{\rm dir}(A,B)\ln\left[P_{ij}^{\rm dir}(A,B)/(f_{i}(A) f_{j}(B))\right]$, where $P_{ij}^{\rm dir}(A,B)$ is the probability of the isolated two-site model, given by $P_{ij}^{\rm dir}(A,B)=\frac{1}{Z_{ij}}\exp \{ e_{ij}(A,B)+h_{i}(A)+h_{j}(B) \}$. Details of the calculation of the DI can be found in Refs. [31,42].

In this work, we used the HMMER web server[41] to obtain the sequences of the Pfam family[43] for a given enzyme and to perform the MSA. The input sequence was taken from the corresponding PDB file of the studied enzyme, and the default significance $E$-values were used.
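As an illustration of the mutual information defined above, the following minimal Python sketch computes ${\rm MI}_{ij}$ from raw column frequencies of an alignment. It is not the code used in this work: the function name `mutual_information` and the toy alignment are hypothetical, gaps would simply be treated as one more symbol, and the sequence reweighting and pseudocounts commonly applied in DCA pipelines are omitted.

```python
import math
from collections import Counter

def mutual_information(msa, i, j):
    """MI between alignment columns i and j (natural logarithm)."""
    n = len(msa)
    fi = Counter(seq[i] for seq in msa)             # single-site counts at i
    fj = Counter(seq[j] for seq in msa)             # single-site counts at j
    fij = Counter((seq[i], seq[j]) for seq in msa)  # pair counts at (i, j)
    mi = 0.0
    for (a, b), count in fij.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((fi[a] / n) * (fj[b] / n)))
    return mi

# Hypothetical 4-sequence alignment, for illustration only.
toy_msa = ["AKL", "AKV", "GEL", "GEV"]
mi_01 = mutual_information(toy_msa, 0, 1)  # columns 0 and 1 covary perfectly
mi_02 = mutual_information(toy_msa, 0, 2)  # columns 0 and 2 vary independently
print(mi_01, mi_02)
```

In this toy alignment the perfectly covarying pair gives ${\rm MI}=\ln 2$, while the independent pair gives zero. Unobserved amino acid pairs contribute nothing to the sum, which is why the loop runs only over observed pairs.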
The DI values were calculated by using the direct coupling analysis code provided in Ref. [42]. The coevolution residue networks were constructed based on the DI values (Fig. 1(c)). In the coevolution residue networks, each residue is represented as one node. If the DI value between the residues $i$ and $j$ is larger than a given threshold, the two nodes are connected by an edge. The threshold was chosen such that all the residues are connected into the network. The corresponding threshold values are listed in Table 1. For each node of the network, we calculated the following topological features, i.e., degree $K$, betweenness $B$, closeness $C$, and the Laplacian centrality $L$. All these well-defined topological features describe the centralities of the given nodes in the network and were calculated by using the software Pajek.[40,44]
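The network construction and the four centralities can be sketched in plain Python as follows. This is only an illustrative sketch, not the Pajek workflow used in this work, and it rests on several assumptions: "all the residues are connected" is read as the network forming a single connected component, the threshold is searched among the observed DI values, closeness is normalized as $(n-1)$ divided by the summed shortest-path lengths, and the Laplacian centrality is taken as the drop in Laplacian energy when a node is removed from an unweighted graph. All function names and the toy DI matrix are hypothetical.

```python
from collections import deque

def build_network(di, threshold):
    """Adjacency sets; residues i and j are linked if DI_ij > threshold."""
    n = len(di)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if di[i][j] > threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def bfs_distances(adj, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def connecting_threshold(di):
    """Largest observed DI value that, used as a strict cutoff, keeps one component."""
    n = len(di)
    candidates = sorted({di[i][j] for i in range(n) for j in range(i + 1, n)},
                        reverse=True)
    for t in candidates:
        if len(bfs_distances(build_network(di, t), 0)) == n:
            return t
    return None  # no observed value works as a strict cutoff

def degree(adj):
    return [len(neighbors) for neighbors in adj]

def closeness(adj):
    """(n - 1) / summed shortest-path lengths; assumes a connected graph."""
    n = len(adj)
    return [(n - 1) / sum(bfs_distances(adj, i).values()) for i in range(n)]

def laplacian_centrality(adj):
    """Drop in Laplacian energy when a node is removed (unweighted graph)."""
    k = degree(adj)
    return [k[v] ** 2 + k[v] + 2 * sum(k[u] for u in adj[v])
            for v in range(len(adj))]

def betweenness(adj):
    """Unnormalized shortest-path betweenness (Brandes' algorithm)."""
    n = len(adj)
    bc = [0.0] * n
    for s in range(n):
        order, preds = [], [[] for _ in range(n)]
        sigma, dist = [0] * n, [-1] * n
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = [0.0] * n
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return [b / 2 for b in bc]  # each unordered pair was counted twice

# Hypothetical symmetric DI matrix for a 4-residue toy protein.
di = [[0.00, 0.50, 0.10, 0.02],
      [0.50, 0.00, 0.40, 0.05],
      [0.10, 0.40, 0.00, 0.30],
      [0.02, 0.05, 0.30, 0.00]]
t = connecting_threshold(di)
adj = build_network(di, t)
print(t, degree(adj), betweenness(adj), closeness(adj), laplacian_centrality(adj))
```

For this toy matrix the search stops at a threshold of 0.1, which leaves the chain 0-1-2-3. The two inner residues then have the higher degree, betweenness, closeness, and Laplacian centrality. The linear scan over candidate thresholds is chosen for readability; for real alignments a bisection over the sorted DI values would be faster.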
Fig. 1. (a) Schematic diagram showing the statistical information of the single-site and pairwise occurring probabilities of amino acids used in the coevolution analysis based on the multiple sequence alignment of homologous proteins. (b) Schematic diagram showing the contributions of direct and indirect coupling to the correlation between the residues $i$ and $j$. (c) Coevolution residue network constructed based on the direct information of coevolutionary analysis for the human complement factor D (PDB code: 1bio).[45]
Figures 2(a) and 3(a) show the topological features of the nodes in the coevolution networks for the enzymes "human complement factor D" (HCFD, a serine protease)[45] and "high MW acid phosphatase" (HMAP),[46] respectively. All residue identities quoted in this work correspond to the residue indices in the PDB structures. For HCFD, there are three residues in the catalytic center: H57, D102, and S195.[22] One can see that all the catalytic residues of this enzyme show distinguished topological features (Fig. 2(a)). For example, H57 and S195 are the two residues with the highest degree $K$. In addition, the betweenness of D102 ranks fifth highest. For HMAP, there are six residues in the catalytic site (R11, H12, R15, R79, H257, and D258). We found that residues R79 and H257 have the highest betweenness $B$ and closeness $C$. The degree $K$ and Laplacian centrality $L$ of R11 rank third and fourth highest, respectively. For a more quantitative discussion, we consider a residue to be distinguished if at least one of its topological features ranks in the top 10. We can see that most of the residues in the catalytic centers of the two enzymes have distinguished topological features. Such results suggest that the catalytic residues are correlated with the topological features of the coevolutionary networks.
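The top-10 criterion for distinguished residues can be written down in a few lines. The sketch below is illustrative only: the function names are hypothetical, ties are broken by the lower residue index (the text does not specify a tie rule), and the toy example uses a cutoff of 2 instead of 10 to stay small.

```python
def top_ranked(values, top=10):
    """Indices of the `top` largest entries (ties broken by lower index)."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    return set(order[:top])

def distinguished_residues(features, top=10):
    """Residues ranked in the top `top` for at least one centrality measure."""
    result = set()
    for values in features.values():
        result |= top_ranked(values, top)
    return result

# Hypothetical centralities for a 6-residue toy network.
features = {
    "K": [1, 5, 2, 2, 4, 1],
    "B": [0.0, 0.3, 0.0, 0.6, 0.1, 0.0],
}
print(sorted(distinguished_residues(features, top=2)))  # [1, 3, 4]
```

Taking the union over the measures means a residue needs to stand out in only one of $K$, $B$, $C$, or $L$, so the number of distinguished residues per enzyme can exceed 10.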
Fig. 2. (a) Key topological features of the coevolution network as a function of residue index, including degree $K$, betweenness $B$, closeness $C$, and Laplacian centrality $L$ of the HCFD (PDB code: 1bio).[45] Arrows indicate residues in the catalytic center. (b) Mapping of the directly coupled residues onto the three-dimensional structure of the HCFD. The $C_{\alpha}$ atoms of the catalytic residues and their coupled residues are shown in red (H57), blue (D102), and gray (S195). The ten coupled residues with the largest DI values are shown for each catalytic residue.
There are several possible reasons why the catalytic sites show distinguished topological features in the coevolution networks. First, an efficient enzymatic reaction often requires a well-organized active center.[47] Therefore, the catalytic residues need to satisfy certain mechanical restraints that pre-organize them, which can be realized by forming local contacts with the surrounding residues. Since the direct coupling strengths from coevolution analysis are often correlated with physical contacts,[42] the formation of local contacts can lead to high values of the centrality measures of the networks. Second, the functional sites of proteins can be regulated by distantly positioned residues through allosteric communication.[17,19,48,49] Catalytic residues with high global centralities are well placed to sense the allosteric coupling signal, which is propagated through the residue contact network. In addition, the catalytic center corresponds to the most conserved positions of the enzymes. As discussed in previous work, conserved residues tend to show stronger coupling in the coevolution analysis,[28] which may also lead to distinguished values of the centrality measures of the catalytic residues.

It is worth noting that the same catalytic residue may show distinguished values for some topological features but not for others. This is reasonable because the four topological features measure different aspects of centrality. According to the definitions of the centralities in complex networks,[40] the degree $K$ measures the local centrality and denseness of a given node, whereas the betweenness $B$ and closeness $C$ mainly measure the global centrality of a node. In comparison, the Laplacian centrality $L$ is an intermediate measure between the global and local centralities.[40] The observed differences in the topological features of different catalytic residues suggest that the restraints on different catalytic residues may have different sources.
Fig. 3. (a) Key topological features of the coevolution network as a function of residue index for the HMAP (PDB code: 1rpt).[46] (b) Mapping of the directly coupled residues onto the three-dimensional structure of the HMAP. For clarity, the neighboring residues at the catalytic center (R11 & H12 and H257 & D258) are indicated by the same color because they share most of the coupling residues.
In Figs. 2(b) and 3(b), we mapped the residues that are strongly coupled (high DI values) with the catalytic residues onto the three-dimensional structures of the enzymes HCFD[45] and HMAP.[46] The $C_{\alpha}$ atoms of different catalytic residues (large spheres) are indicated by different colors. The $C_{\alpha}$ atoms of the directly coupled residues (small spheres) are indicated by the same color as the corresponding catalytic residue. One can see that the residues strongly coupled to a catalytic residue often have direct physical contacts with it, although some coupled residues can be distant. This suggests that direct coupling in the coevolution of the residues around the catalytic site mainly reflects complementary mutations of directly interacting residues. In addition, the directly coupled residues of a given catalytic residue tend to co-localize spatially and form cluster-like structures. The crosslinks between the clusters corresponding to different catalytic residues are relatively rare. Such a result is interesting because it may indicate that the motions of the residues at the catalytic site can be regulated by different pathways.

The above direct coupling analysis requires that the amino acids at each position have sufficient variation among the sequences of a protein family, whereas catalytically essential residues often have a higher degree of sequence conservation. For the enzymes studied in this work, the probability of observing a variation of the amino acid identity of a given catalytic residue is small but significant (typically $\sim$10%), which enables a reasonable estimate of the direct coupling values given that the number of sequences in a protein family is large (Table 1).

To have a more robust test of the topological features of the catalytic residues, we performed the same analysis for all 20 enzymes in the dataset given in Ref. [26].
The enzymes in this dataset cover a wide range of functions and structural subclasses, and the catalytic sites have been well annotated using the following criteria:[13,26] (i) residues directly involved in catalysis; (ii) residues affecting the residues/cofactors directly involved in catalysis; (iii) residues stabilizing the transition state or intermediate of the chemical step; or (iv) residues involved in the activation of the substrate in some way. The results are listed in Table 1. One can see that for 8 of the enzymes in the dataset, all the residues in the catalytic center show distinguished topological features. In another 9 enzymes, 40–85% of the catalytic residues show distinguished topological features. In the remaining 3 enzymes, none of the catalytic residues show distinguished topological features. For a more quantitative discussion, we introduced the odds ratio $R=p/p_{0}$,[22] where $p$ is the probability that a catalytic residue is identified as a distinguished residue and $p_{0}$ is the probability that an arbitrary residue is identified as a distinguished residue. One can see that for most of the enzymes discussed in this work, the odds ratios are larger than 1.0, with an average of 8.6. We also calculated the odds ratios for each of the four topological features separately (Table 1); the averages are 8.6, 6.4, 11.2, and 8.3 for the centralities $K$, $B$, $C$, and $L$, respectively. These high odds ratios indicate that the correlation between the catalytic sites and the topological features of the coevolution networks is significant.
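The odds ratio $R=p/p_{0}$ can be illustrated with a short Python sketch. Here $p$ and $p_{0}$ are estimated as simple fractions, which is an assumption about how the probabilities were evaluated; the function name `odds_ratio` and the numbers in the example are hypothetical and are not taken from Table 1.

```python
def odds_ratio(catalytic, distinguished, n_residues):
    """R = p / p0 with both probabilities estimated as fractions."""
    catalytic, distinguished = set(catalytic), set(distinguished)
    # p: fraction of catalytic residues that are distinguished
    p = len(catalytic & distinguished) / len(catalytic)
    # p0: fraction of all residues that are distinguished
    p0 = len(distinguished) / n_residues
    return p / p0

# Hypothetical enzyme: 200 residues, 10 distinguished residues,
# 4 catalytic residues of which 2 are distinguished.
distinguished = set(range(10))
catalytic = {0, 1, 50, 120}
r = odds_ratio(catalytic, distinguished, 200)
print(round(r, 2))
```

In this example $p=0.5$ and $p_{0}=0.05$, so $R=10$: a catalytic residue is ten times more likely to be distinguished than an arbitrary residue. A value of $R=0$ means that none of the catalytic residues is distinguished, and $R=1$ corresponds to no enrichment.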
Table 1. Correlation between catalytic sites and residues with distinguished topological features (with underlines). The number of sequences $N_{\rm s}$, threshold of DI values (DI$_{\rm T}$), odds ratios for degree ($R_{\rm K}$), betweenness ($R_{\rm B}$), closeness ($R_{\rm C}$), Laplacian centrality ($R_{\rm L}$), and combination of them ($R$) are also given.
| PDB ID | Catalytic residues | $N_{\rm s}$ | DI$_{\rm T}$ | $R_{\rm K}$ | $R_{\rm B}$ | $R_{\rm C}$ | $R_{\rm L}$ | $R$ |
|---|---|---|---|---|---|---|---|---|
| 1BK9 | H48, Y52, D99 | 2593 | 0.094 | 0 | 0 | 7.4 | 0 | 1.9 |
| 1BWP | S47, G74, N104, D192, H195 | 6770 | 0.022 | 3.56 | 0 | 0 | 7.12 | 2.7 |
| 1CHD | S64, T165, H190, M283, D286 | 1634 | 0.031 | 8.5 | 0 | 4.25 | 8.5 | 5.3 |
| 1RPT | R11, H12, R15, R79, H257, D258 | 3462 | 0.034 | 6.38 | 12.76 | 19.15 | 12.76 | 11.2 |
| 1YTW | E290, D356, H402, C403, R409, T410 | 7816 | 0.040 | 3.92 | 7.83 | 11.75 | 3.92 | 6.9 |
| 1DNK | E78, H134, D212, H252 | 14480 | 0.019 | 18.2 | 12.15 | 18.2 | 18.2 | 16.7 |
| 1BOL | H46, E105, H109 | 6307 | 0.056 | 12.5 | 0 | 12.5 | 6.23 | 7.8 |
| 1BVV | Y69, E78, E172 | 615 | 0.074 | 0 | 0 | 0 | 0 | 0 |
| 1UOK | D199, E255, D329 | 11570 | 0.029 | 0 | 22.5 | 22.5 | 0 | 11.2 |
| 1EUG | D64, H187 | 4283 | 0.030 | 7.75 | 7.75 | 15.5 | 7.75 | 9.7 |
| 1BR6 | Y80, V81, G121, Y123, E177, R180 | 1613 | 0.057 | 0 | 0 | 0 | 0 | 0 |
| 1A16 | D260, D271, H354, H361, E383, E406 | 64075 | 0.031 | 13.93 | 3.48 | 13.93 | 13.93 | 11.3 |
| 1B6A | H231 | 10314 | 0.025 | 20.9 | 0 | 20.9 | 20.9 | 15.7 |
| 1BIO | H57, D102, S195 | 17288 | 0.045 | 14.67 | 14.67 | 14.67 | 14.67 | 14.7 |
| 9PAP | Q19, C25, H159, N175 | 20698 | 0.051 | 16.43 | 10.95 | 21.9 | 10.95 | 15.1 |
| 1BXO | D33, D213 | 5418 | 0.035 | 15.75 | 15.75 | 15.75 | 15.75 | 15.8 |
| 8TLN* | E143, H231 | | | | | | | |
| 1LBA | Y46, K128 | 2359 | 0.044 | 9.9 | 4.95 | 4.95 | 9.9 | 7.4 |
| 1BTL | S70, K73, S130, E166 | 12404 | 0.036 | 10.1 | 5.05 | 10.1 | 5.05 | 7.6 |
| 1CTT | E104 | 6709 | 0.048 | 10.1 | 10.1 | 10.1 | 10.1 | 10.1 |
| Average | | | | 8.6 | 6.4 | 11.2 | 8.3 | 8.6 |
$^{\ast}$The catalytic sites cannot be covered by the multiple sequence alignment.
In summary, structural and functional requirements of homologous proteins often impose strong constraints on their sequence variability, which leads to correlations among the amino acid compositions at different sequence positions. Therefore, the coevolutionary information in the sequences of a protein family can provide useful knowledge on protein structure and function. In this work, we have characterized the relationship between the locations of the catalytic sites of enzymes and the topological features of the coevolution networks. For the enzymes we analyzed, the nodes corresponding to the catalytic sites mostly show distinguished topological features, including the degree, betweenness, closeness, and Laplacian centrality. These results provide a new strategy for extracting coevolutionary information to identify the functionally important sites of proteins without knowing their three-dimensional structures.

The authors thank Wei Wang, Jun Wang, Wenfei Li, and Jianrong Li at Nanjing University for helpful discussions.
References
[1] Finkelstein M B and Ptitsyn O 2002 Protein Physics: A Course of Lectures (London: Academic Press)
[2] Segel I H 1993 Enzyme Kinetics (New York: Wiley-Interscience)
[3] Ran F A, Hsu P D, Wright J et al 2013 Nat. Protoc. 8 2281
[4] Robertson J G 2005 Biochemistry 44 5561
[5] Benkovic S J and Hammes-Schiffer S 2003 Science 301 1196
[6] Cui Q and Karplus M 2003 Adv. Protein Chem. 66 315
[7] Olsson W H M, Parson W W and Warshel A 2006 Chem. Rev. 106 1737
[8] Warshel A 2003 Ann. Rev. Biophys. Biomol. Struct. 32 425
[9] Li W, Wang J, Zhang J et al 2019 Phys. Rev. Lett. 122 238102
[10] Tong X, Hu R, Li X et al 2018 Chin. Phys. B 27 118705
[11] Yang Z, Hao D, Che Y et al 2018 Chin. Phys. B 27 018704
[12] Kong J Y, Li J C, Lu J J et al 2019 Phys. Rev. E 100 052409
[13] Porter C T, Bartlett G J and Thornton J M 2004 Nucl. Acids Res. 32 D129
[14] Lu H P 2014 Chem. Soc. Rev. 43 1118
[15] Pelz B, Žoldák G, Zeller F et al 2016 Nat. Commun. 7 10848
[16] Hanson J A, Duderstadt K, Watkins L P et al 2007 Proc. Natl. Acad. Sci. USA 104 18055
[17] Guo J and Zhou H X 2016 Chem. Rev. 116 6503
[18] Hammes-Schiffer S and Benkovic S J 2006 Annu. Rev. Biochem. 75 519
[19] Ma B and Nussinov R 2010 Curr. Opin. Chem. Biol. 14 652
[20] Henzler-Wildman K A, Lei M, Thai V et al 2007 Nature 450 913
[21] Li W, Wolynes P G and Takada S 2011 Proc. Natl. Acad. Sci. USA 108 3504
[22] Yang L W and Bahar I 2005 Structure 13 893
[23] Fajardo J E and Fiser A 2013 BMC Bioinform. 14 63
[24] Bartlett G J, Porter C T, Borkakoti N et al 2002 J. Mol. Biol. 324 105
[25] Sankararaman S, Sha F, Kirsch J F et al 2010 Bioinformatics 26 617
[26] Yang L W, Liu X, Jursa C J et al 2005 Bioinformatics 21 2978
[27] Lockless S W and Ranganathan R 1999 Science 286 295
[28] Süel G M, Lockless S W, Wall M A et al 2003 Nat. Struct. Biol. 10 59
[29] Marks D S, Hopf T A and Sander C 2012 Nat. Biotechnol. 30 1072
[30] Hopf T A, Morinaga S, Ihara S et al 2015 Nat. Commun. 6 6077
[31] Weigt M, White R A, Szurmant H et al 2009 Proc. Natl. Acad. Sci. USA 106 67
[32] Tian P, Louis J M, Baber J L et al 2018 Angew. Chem. Int. Ed. 57 5674
[33] Collins J J and Carson C C 1998 Nature 393 409
[34] Holliday M J, Camilloni C, Armstrong G S et al 2017 Structure 25 276
[35] Lee Y, Mick J, Furdui C et al 2012 PLOS ONE 7 e38114
[36] Zhu X Y, Liu Z H and Tang M 2007 Chin. Phys. Lett. 24 1118
[37] Zhu X Y and Liu Z H 2007 Chin. Phys. Lett. 24 2142
[38] Zhao Y, Jian Y, Liu Z et al 2017 Sci. Rep. 7 2876
[39] Xu X L, Liu C P and He D R 2016 Chin. Phys. Lett. 33 048901
[40] Newman M E J 2010 Networks: An Introduction (New York: Oxford University Press)
[41] Finn R D, Clements J and Eddy S R 2011 Nucl. Acids Res. 39 W29
[42] Morcos F, Pagnani A, Lunt B et al 2011 Proc. Natl. Acad. Sci. USA 108 E1293
[43] Sara E G, Mistry J, Bateman A et al 2019 Nucl. Acids Res. 47 D427
[44] Batagelj V and Mrvar A 2003 Graph Drawing Software ed Junger M and Mutzel P (Berlin: Springer) p 77
[45] Jing H, Babu Y S, Moore D et al 1998 J. Mol. Biol. 282 1061
[46] Lindqvist Y, Schneider G and Vihko P 1994 Eur. J. Biochem. 221 139
[47] Pisliakov A V, Cao J, Kamerlin S C et al 2009 Proc. Natl. Acad. Sci. USA 106 17359
[48] Xie J and Lai L 2020 Curr. Opin. Struct. Biol. 62 158
[49] Li W F, Wang W and Takada S 2014 Proc. Natl. Acad. Sci. USA 111 10550