A NEW TOOL FOR THE INTERROGATION OF MACROMOLECULAR STRUCTURE

Our program BABELPDB allows browsing and interrogating the native and derived structural features of biological macromolecules using data obtained from the Protein Data Bank (PDB). The major features of BABELPDB are: (1) conversion from PDB to other formats, (2) addition or removal of H atoms, (3) stripping of the crystallization water molecules and (4) separation of the α-carbons (Cα). The coordinates obtained with BABELPDB permit characterizing the presence of hydrogen bonds (H-bonds). The algorithm for detecting H-bonds is implemented in our program TOPO for the theoretical simulation of the molecular shape. An example is given to illustrate the capabilities of the software: the calculation of the fractal dimension of the lysozyme molecule with (1.908) and without (1.920) H atoms. The numbers compare well with reference calculations performed with our version of the GEPOL program and with the results from Pfeifer et al. For proteins, the Cα skeleton extracted with BABELPDB allows drawing the ribbon image, which reveals the secondary structure of proteins.


INTRODUCTION
The three-dimensional (3D) structure of a protein is critical to its function in biological systems. The availability of an increasing number of protein structures has greatly facilitated the teaching of protein chemistry. All biochemistry textbooks display selected 3D illustrations of protein structures.
The structural data of proteins and other biomacromolecules are maintained by the Protein Data Bank (PDB), which can be accessed from http://www.rcsb.org/pdb or other mirror sites such as Entrez, http://www.ncbi.nlm.nih.gov/Entrez [1,2]. Tsai [3] described classroom applications of a freeware program, WPDB, which compresses the structure files of the PDB into a set of indexed files that can be retrieved, manipulated and analyzed locally [4,5]. The 3D structures can be displayed within the program or by invoking the freeware program RasMol [6].
Structure data on biological macromolecules as maintained by PDB are growing at a near exponential rate. The PDB contains ca. 70 000 crystalline structures of proteins, nucleic acids and viruses and complexes of these with small molecules. While trends in the price vs. performance of computer hardware make handling of such large amounts of data manageable, at least for the next few years, software strategies for the efficient storage and retrieval of these data are necessary. A number of such strategies have been employed for the maintenance and querying of macromolecular structure data; they fall into three broad categories according to the storage method used: indexed files as in WPDB, relational databases [7,8] and object-oriented databases [9,10]. Associated with each storage method are one or more query methods, e.g., SQL [11] and MMQL [12]. It is beyond the scope of the present paper to describe the advantages and disadvantages of each approach in detail; for further details see References 13 and 14.
Our program BABELPDB includes subprograms that allow the following options for examining a particular PDB structure: (1) convert from PDB to other formats; (2) add or remove H atoms; (3) strip the water molecules of crystallization and (4) separate Cα atoms. BABELPDB seems particularly suited to educational purposes, and an example of how it might be used is given.
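Options (2) to (4) amount to filtering the ATOM and HETATM records of a PDB file. The following Python fragment is a minimal sketch of such filtering, assuming the standard fixed-column PDB layout (atom name in columns 13-16, residue name in columns 18-20, element symbol in columns 77-78). It is not the BABELPDB code: the function names are hypothetical, the atom-name fallback for files without an element column is only a heuristic, and atom serial numbers and CONECT records are left untouched.

```python
# Illustrative filters on PDB ATOM/HETATM records (hypothetical helper names).
# Assumes the standard fixed-column PDB layout; this is not the BABELPDB source.

WATER_NAMES = {"HOH"}          # residue name used for water in PDB entries
HYDROGEN_SYMBOLS = {"H", "D"}  # hydrogen and deuterium


def is_atom_record(line):
    """True for coordinate records (ATOM or HETATM)."""
    return line[:6] in ("ATOM  ", "HETATM")


def element_of(line):
    """Element symbol from columns 77-78, else a guess from the atom name."""
    elem = line[76:78].strip().upper()
    if elem:
        return elem
    # Heuristic fallback (assumption): first letter of the atom name after
    # any leading digits, e.g. '1HB ' -> 'H'. Can misclassify metals.
    name = line[12:16].strip().lstrip("0123456789")
    return name[:1].upper()


def is_alpha_carbon(line):
    """True for the C-alpha atom of a residue (atom name ' CA ' on an ATOM record)."""
    return line[:6] == "ATOM  " and line[12:16] == " CA "


def filter_pdb(lines, strip_water=False, strip_hydrogens=False, ca_only=False):
    """Return the records that survive the requested filters.

    Non-coordinate records (HEADER, END, ...) are always kept.
    """
    kept = []
    for line in lines:
        if is_atom_record(line):
            if strip_water and line[17:20] in WATER_NAMES:
                continue
            if strip_hydrogens and element_of(line) in HYDROGEN_SYMBOLS:
                continue
            if ca_only and not is_alpha_carbon(line):
                continue
        kept.append(line)
    return kept
```

Comparing the atom name with the four-character string ' CA ' rather than with 'CA' keeps the α-carbon distinct from a calcium ion, whose name is left-justified in the same field.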

CHEMICAL DATABANKS
The databanks most used in chemistry are the Brookhaven PDB and the Cambridge Structural Data Bank (CSD).

COMPUTATIONAL METHOD
BABEL implements a general framework for converting between file formats used for molecular modelling [17]. BABEL will read the file types given in Table 3.

babel -m
Extensive online help is available for the command line input. The command line input has the following format:

babel [-v] -i<itype> <infile> [keywords] -o<out type> <outfile> [keywords2]
All arguments surrounded by [] are optional. The -v flag is optional and is used to produce verbose output. The -i flag is used to set the input type. The input type codes that are currently supported are collected in Table 5.
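The command format above can be assembled programmatically. The following Python fragment is a minimal sketch that only builds the argument list in the order shown; it does not run BABEL. The function names and parameters are hypothetical, and the type codes are whatever the user supplies from Table 5 and Table 6.

```python
import shlex


def build_babel_command(itype, infile, otype, outfile,
                        verbose=False, in_keywords=None, out_keywords=None):
    """Assemble: babel [-v] -i<itype> <infile> [keywords] -o<otype> <outfile> [keywords2]."""
    argv = ["babel"]
    if verbose:
        argv.append("-v")               # optional verbose flag
    argv += ["-i" + itype, infile]      # input type code is glued to -i
    if in_keywords:
        argv.append(in_keywords)        # optional keywords, kept as ONE argument
    argv += ["-o" + otype, outfile]     # output type code is glued to -o
    if out_keywords:
        argv.append(out_keywords)       # optional keywords2, kept as ONE argument
    return argv


def format_command(argv):
    """Render the argument list as a shell-safe string (quotes added where needed)."""
    return shlex.join(argv)
```

Keeping each keyword string as a single list element plays the same role as the double quotes on the command line: a keyword string containing spaces reaches the program as one argument.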

CALCULATION RESULTS AND DISCUSSION
With program BABELPDB, the PDB coordinates of several proteins have been converted to Cartesian coordinates and H atoms have been added. 9][20][21] The geometric analysis of H-bonds (X-H…Y), observed in crystal structure data retrieved from PDB, reveals lone-pair directionality as well as the H-acceptor separation, the angle subtended at the H atom (H), the angle at the acceptor atom (Y) and the displacement of the H atom from a defined plane containing the lone-pair orbitals of the acceptor atom.
For the experimentally well-studied enzyme lysozyme, the fractal dimension D has been calculated with and without H atoms [26,27]. The calculation has been performed using X-ray atomic coordinates of lysozyme (2LYM), extracted with BABELPDB. 283][34] Notice that Pfeifer et al. based their calculation on X-ray data for the positions of the atoms. Therefore, the surface under consideration involved the lysozyme conformation in the crystalline state. However, this is presumably no restriction because, for lysozyme, the crystal structure analysis is known to provide an accurate picture of enzymatic action under native conditions. With these data presented in the form of molecular plots, Pfeifer et al. calculated for lysozyme a fractal surface dimension D = 2.17, using the silhouette and section variations of the box method [35]. From the preceding discussion it is clear that our results for the lysozyme molecule compare well with the results of Pfeifer et al., which are undisputed.
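To make the idea behind the box method concrete, the following Python fragment is a minimal sketch of generic box counting on a set of 3D points. The dimension is estimated as the least-squares slope of log N(ε) against log(1/ε), where N(ε) is the number of cubic boxes of edge ε that contain at least one point. This is not the silhouette or section variation used by Pfeifer et al., nor the procedure implemented in TOPO or GEPOL; the function names are hypothetical.

```python
import math


def box_count(points, eps):
    """Number of distinct cubic boxes of edge eps occupied by at least one point."""
    occupied = {tuple(math.floor(c / eps) for c in p) for p in points}
    return len(occupied)


def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(eps) versus log(1/eps) over the given box sizes."""
    xs = [math.log(1.0 / eps) for eps in sizes]
    ys = [math.log(box_count(points, eps)) for eps in sizes]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

As a check, a flat 64 × 64 grid of points in the unit square, counted with box edges 1/2, 1/4, 1/8 and 1/16, gives a dimension of 2, and a row of points on a straight segment gives 1. For atomic coordinates the estimate depends on the range of box sizes chosen, which has to be stated with the result.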
The solvent-accessible surface of lysozyme can be compared with a self-avoiding random surface. The resulting fractal dimension is 1.908, on average, for the short range of distances (1.25-3.5 Å). This value can be compared with the fractal dimension of a 3D self-avoiding random surface. This surface consists of identical rectilinear elements, placed one after the other and randomly oriented, without attractions or repulsions among its elements; its fractal dimension is 7/3. This surface was proposed as a model of the protein molecular surface [36]. Notice that, at these short distances, the fractal dimension for lysozyme is lower than that for the random surface (1.883 < 2.333). The corresponding interpretation is that, in the short range of distances, the molecule is more extended than a random surface, owing to steric repulsion between nearest atoms. Notice also how the repulsive interaction at short distances translates into a difference in fractal dimension relative to the case without interactions.
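Two of the H-bond descriptors listed at the start of this section, the H-acceptor separation and the angle at the H atom, can be evaluated directly from the Cartesian coordinates once H atoms have been added. The following Python fragment is a minimal sketch of such a geometric test for an X-H…Y contact. The cutoff values (2.5 Å and 120°) are illustrative assumptions, not the criteria implemented in TOPO, and the function names are hypothetical.

```python
import math


def distance(a, b):
    """Euclidean distance between two points given as (x, y, z)."""
    return math.sqrt(sum((a[i] - b[i]) ** 2 for i in range(3)))


def angle_deg(a, b, c):
    """Angle in degrees at point b formed by the segments b-a and b-c."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    if n1 == 0.0 or n2 == 0.0:
        raise ValueError("coincident points: angle undefined")
    cos_angle = sum(x * y for x, y in zip(v1, v2)) / (n1 * n2)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding error
    return math.degrees(math.acos(cos_angle))


def is_hbond(donor, hydrogen, acceptor, max_hy=2.5, min_angle=120.0):
    """Geometric X-H...Y test with ASSUMED cutoffs (distance in Å, angle in degrees)."""
    return (distance(hydrogen, acceptor) <= max_hy
            and angle_deg(donor, hydrogen, acceptor) >= min_angle)
```

The angle at the acceptor and the displacement from the lone-pair plane require the atoms bonded to Y as well, and are left out of this sketch.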
The Cα skeleton extracted from the lysozyme molecule with BABELPDB is shown in Figure 3. This skeleton allows drawing the ribbon image of lysozyme (Figure 4), where the ribbon links the Cα atoms. The ribbon image reveals the elements of the secondary structure (α-helix, β-sheet, β-turn, etc.) [37]. The regions of helix and sheet are summarized in Table 7. Four helical regions can be distinguished (three in Figure 4, bottom and one in the middle). Three of them are distorted α-helices and the other is a 3.0₁₀-helix. Lysozyme contains one antiparallel β-sheet (Figure 4, right).
babel -imacmod benzene.dat -d -ox benzene.new
1. The program BABELPDB has been written for computer-based search, retrieval, analysis and display of information from the PDB database. Several options are allowed: (1) convert from PDB to other formats; (2) add or remove H atoms; (3) strip the water molecules of crystallization and (4) keep only Cα atoms.
BABELPDB is available from the author (Francisco.Torrens@uv.es).

Figure 1. Lysozyme molecule after H atoms have been added with program BABELPDB. Note the water molecules around the enzyme.

Figure 2. Lysozyme molecule after the water molecules have been stripped.

2. The coordinates obtained with BABELPDB have allowed characterizing the presence of H-bonds. The algorithm for detecting H-bonds has been implemented in our program TOPO for the theoretical simulation of molecular shape.
3. The fractal dimension of lysozyme has been calculated with and without H atoms. The figures compare well with reference calculations carried out with our version of program GEPOL and with results from Pfeifer et al.
4. For proteins, the Cα skeleton extracted with BABELPDB allows drawing the ribbon image, which reveals the secondary structure of proteins.
[15,16] The PDB is a computer-based archival file for macromolecular structures. It stores in a uniform format atomic coordinates and partial bond connectivities, as derived from crystallographic studies. Text included in each data entry gives pertinent information for the structure at hand (e.g., species from which the molecule has been obtained, resolution of diffraction data, literature citations and specifications of secondary structure). In addition to atomic coordinates and connectivities, PDB stores structure factors and phases, although these latter data are not placed in any uniform format. Input of data to PDB and general maintenance functions are carried out at Brookhaven National Laboratory. All data stored in PDB are available on magnetic tape and by ftp for public distribution, from Brookhaven, Tokyo and Cambridge. A master file is maintained at Brookhaven and duplicate copies are stored in Cambridge and Tokyo. The PDB can be accessed from http://www.rcsb.org/pdb or other mirror sites such as Entrez, http://www.ncbi.nlm.nih.gov/Entrez.

Table 3. Types of files read by BABEL.

Table 4. Types of files written by BABEL.

Table 5. Input type codes currently supported by BABEL.

Table 6. Output type codes currently supported by BABEL.
To delete H atoms from a Macromodel file named benzene.dat and output the file as an XYZ file named benzene.new the user would type:

babel -imacmod benzene.dat -d -ox benzene.new
T=30000 in the file mopac.dat the user would enter:

babel -imm2out mm2.grf -oai mopac.dat "PM3 GEO-OK T=30000"
Notice the use of the double quotes around the keywords.

Table 7. The parameters of secondary structure regions in lysozyme.