The macromolecular structures of mucins can be visualised at several levels.
Subsequent articles in this issue will attest to the variety of information that can be obtained from
physical studies of the overall size and shape of glycoprotein complexes, immunohistochemistry of the
distribution of antigenic and functional epitopes, the structural analysis of individual oligosaccharides
and the molecular biology of the protein sequence.
From the last two, conformational information at the glycopeptide level can be extrapolated, and we are
beginning to visualise it by molecular modeling using computer graphics, incorporating data from NMR.
GLYCOPEPTIDE STRUCTURE
The protein backbone of mucins has regions of high disulphide bonding, N-linked glycopeptides and tandem O-glycosylated repeats. The latter are multiple domains of homologous amino acid sequences having an N-acetylgalactosamine (GalNAc) monosaccharide added to near-neighbour serine (Ser) or threonine (Thr) amino acids. This monosaccharide, defined as the Tn antigen, can be further substituted with other monosaccharides, the most common being galactose (Gal) at C-3 to give the Thomsen-Friedenreich core (T antigen), and sialic acid (N-acetylneuraminic acid in humans) at C-3 of the Gal and/or C-6 of the GalNAc.
Several other core sequences involving Gal, GalNAc and N-acetylglucosamine (GlcNAc) are possible and these
can be lengthened by addition of repeating (Galβ1-4GlcNAcβ1-3/6)n or
(Galβ1-4GlcNAcβ1-3)n sequences before the oligosaccharide chain is terminated, in some
cases by addition of sialic acid or of the blood group H, A and B substitutions. Internal chain
fucosylation (Fuc) can occur at C-3 or C-4 of GlcNAc, giving the Leˣ and
Leᵃ structures, respectively. The conformations of these oligosaccharide motifs have
been characterised extensively by NMR and molecular modeling, and NMR studies on glycopeptides
are gradually emerging, as described next.
METHODS OF ANALYSIS
NMR is an ideal method for studying molecules up to 15 kDa, which includes small globular proteins, polypeptide domains of larger proteins and oligosaccharides. Studies of glycoproteins are beginning to be reported, helped most importantly by enrichment for NMR-active nuclei. In biological molecules NMR can be used, for example, to determine the chemical environment of hydrogen (1H, or proton), 13C, 15N and 31P, i.e. nuclei which have magnetic spin and will therefore align in the magnetic field applied by the instrument and, on relaxation, emit detectable energy. 13C and 15N make up only a very small proportion of naturally occurring carbon and nitrogen, which consist mainly of 12C and 14N, and are therefore introduced by isotopic enrichment so that 1H, 13C and 15N can be studied at high sensitivity in 2-, 3- and 4-dimensional experiments where the nuclei are assigned by correlations between them.
At the end of the day (or normally a man-year or so!) the spatial distribution of near-neighbour atoms can be calculated both through bond and through space. This is a dynamic measurement within the time frame of the magnetic relaxation of the nuclei. It differs from X-ray crystallography in being applicable to flexible molecules and heterogeneous mixtures of molecules which are not fixed in space in a crystal lattice.
NMR still has the disadvantage that it only allows visualisation of an averaged overall "global" solution conformation. This is where molecular modeling takes over: by inputting data from the NMR experiments one can explore feasible solution conformations away from the global ensemble average. Ideally, therefore, only videos of computer graphics images should be shown, as the molecules explore the space allowed them by the force fields that the modeling software uses to define the atomic interactions.
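The ensemble averaging described above can be made concrete with a small numerical sketch. It assumes, as is standard for through-space measurements based on the nuclear Overhauser effect (a technique the text does not name explicitly), that the signal between two protons scales as r^-6, so that a flexible molecule reports an effective distance of <r^-6>^(-1/6) rather than any distance it actually adopts. The two-state populations and distances below are hypothetical values chosen purely for illustration.

```python
# Illustrative sketch only: r^-6 averaging of one proton-proton distance
# over a set of interconverting conformers. All numbers are hypothetical.

def noe_effective_distance(distances, populations):
    """Return the <r^-6>^(-1/6) effective distance (same units as the input)."""
    if len(distances) != len(populations):
        raise ValueError("distances and populations must have the same length")
    total = sum(populations)
    mean_inv_r6 = sum(p * r ** -6 for r, p in zip(distances, populations)) / total
    return mean_inv_r6 ** (-1.0 / 6.0)


def linear_mean_distance(distances, populations):
    """Return the population-weighted arithmetic mean distance, for comparison."""
    total = sum(populations)
    return sum(p * r for r, p in zip(distances, populations)) / total


# Assumed two-state model: the proton pair is 2.5 angstrom apart in 20% of
# the molecules and 5.0 angstrom apart in the remaining 80%.
distances = [2.5, 5.0]
populations = [0.2, 0.8]

r_noe = noe_effective_distance(distances, populations)
r_mean = linear_mean_distance(distances, populations)

print(f"effective (r^-6 averaged) distance: {r_noe:.2f} angstrom")
print(f"arithmetic mean distance:           {r_mean:.2f} angstrom")
```

Under these assumptions the effective distance is shorter than the arithmetic mean and matches neither conformer, which is one way of seeing why the measured "global" conformation needs modeling to be resolved into feasible individual structures.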
By their nature these studies are computationally expensive and therefore require machines
that can handle files of megabyte to gigabyte size. At another level molecular models are very good for education and
thinking about small molecule chemistry. In the glycoprotein field they have been highly instructive in
illustrating the relative molecular masses of oligosaccharide and protein, startling the classical
biochemist who has not been introduced to the world of post-translational modification of proteins.
In mucinology the importance of the oligosaccharide moieties has always been appreciated.
A "snapshot" of one possible solution conformation of a tandem repeat glycopeptide (n=2) having two
times eight amino acids (black) with two times three disaccharide substitutions of Galβ1-3GalNAcα1
on three adjacent Thr residues in each peptide octamer. These are conformationally quite rigid due to
steric overcrowding, in keeping with one of the presumed roles of mucin oligosaccharides, that of forming extended protein motifs.
Other oligosaccharide additions such as sialic acid or backbone sequences are more flexible.
This may be relevant to a second proposed role, as an interface between solid and solution phases of
biological macromolecules.
Further, the antigenic or functional oligosaccharide motifs expressed distally can be accommodated more
easily into a carbohydrate binding domain and be readily presented in multi-valent form for high affinity
binding.
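The relative masses of oligosaccharide and protein in the tandem repeat glycopeptide described above can be estimated with a back-of-envelope calculation. The sketch below takes the composition from the text (two octapeptides and six Galβ1-3GalNAc disaccharides) and uses average residue masses for a hexose and an N-acetylhexosamine. The mean amino acid residue mass of 110 Da is a rule-of-thumb assumption, not a value derived from the actual sequence, so the result is only indicative.

```python
# Rough estimate of the carbohydrate mass fraction of a glycopeptide.
# Average residue masses in Da (monosaccharide minus one water).
HEX_RESIDUE = 162.14       # hexose, e.g. Gal
HEXNAC_RESIDUE = 203.19    # N-acetylhexosamine, e.g. GalNAc
WATER = 18.02
MEAN_AA_RESIDUE = 110.0    # assumption: generic average amino acid residue


def glycopeptide_masses(n_amino_acids, n_glycans, glycan_residue_masses):
    """Return (peptide_mass, glycan_mass) in Da for identical O-linked glycans."""
    # A free peptide is the sum of its residues plus one water for the termini.
    peptide_mass = n_amino_acids * MEAN_AA_RESIDUE + WATER
    # Each sugar residue mass already accounts for the water lost on linkage.
    glycan_mass = n_glycans * sum(glycan_residue_masses)
    return peptide_mass, glycan_mass


# Composition taken from the text: 2 x 8 amino acids, 2 x 3 disaccharides.
peptide_mass, glycan_mass = glycopeptide_masses(
    n_amino_acids=16,
    n_glycans=6,
    glycan_residue_masses=[HEX_RESIDUE, HEXNAC_RESIDUE],
)
carbohydrate_fraction = glycan_mass / (peptide_mass + glycan_mass)

print(f"peptide: {peptide_mass:.0f} Da, carbohydrate: {glycan_mass:.0f} Da")
print(f"carbohydrate fraction by mass: {carbohydrate_fraction:.0%}")
```

Even with only the disaccharide core and no sialic acid or backbone extensions, the carbohydrate in this estimate outweighs the peptide, and longer chains would increase its share further.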
ROLES AS ANTIGENS AND IN BACTERIAL ADHERENCE
The proteins that bind mucin oligosaccharides are numerous. Typical of the gastrointestinal tract are food proteins, antibodies and infectious agents. It has been shown (2) that the peanut agglutinin (PNA), which binds the T antigen sequence, can cause proliferation of gastrointestinal cells in vitro. In vivo this antigen is tumour-associated in the colon. Sialylation, sulphation or other core region sequences are probable reasons for non-expression. There is also much early literature on the normal and tumour-associated distribution of backbone and blood group antigens in stomach, small and large intestine. There is recent evidence that the gastric pathogen Helicobacter pylori binds to the sequence Fucα1-2Galβ1-3[Fucα1-4]GlcNAc (reviewed in 1), just one of the many examples of adhesion of micro-organisms mediated by specific oligosaccharides.
All such results need to be redefined at the conformational level using the techniques described in this article, as we now have anecdotal evidence that lectin, antibody and adhesin binding is highly dependent on the faces of the oligosaccharides they recognise and has different effects relating to clustering and accessibility at the glycopeptide level. For example, as reviewed in the mushroom lectin, which also binds the T antigen sequence but tolerates sialylation, has the opposite effect on cell proliferation to PNA in vitro; antibodies characterised as recognising the T antigen can distinguish different clustered forms; and there is often considerable variation in the results of experiments that try to define the specificity of micro-organism adhesion.