Genomics and Proteomics: Towards New Targets in Schistosome Research
23 Jan 2008. Source: WHO/TDR
R Alan Wilson
Department of Biology, University of York, PO Box 373, York YO10 5YW, UK
Working paper for the Scientific Working Group meeting on Schistosomiasis Research, convened by the Special Programme for Research and Training in Tropical Diseases, Geneva, 14–16 November 2005.
Full text source: Scientific Working Group, Report on Schistosomiasis, 14–16 November 2005, Geneva, Switzerland. Copyright © World Health Organization on behalf of the Special Programme for Research and Training in Tropical Diseases, 2006. http://www.who.int/tdr/publications/publications/swg_schisto.htm
The first years of the 21st century have seen quite remarkable advances in our knowledge of the schistosome genome and in the application of new technologies, such as microarrays and proteomics, to exploit the accumulating information. In this working paper I shall review the many possibilities now open to researchers. I also consider potential pitfalls and limitations, together with the technical obstacles that must be overcome before schistosomiasis control can reap the maximum benefit. The bulk of the material covered relates to Schistosoma mansoni, for two reasons. First, it is the most experimentally tractable species, so research on it generally sets the pace. Second, it is the species on which I have undertaken most of my investigations over the last 35 years. The narrow focus is not intended to belittle the efforts made with S. japonicum, where the difficulty of maintaining the snail intermediate host and the small amount of cercarial material obtainable create special problems. The same is true of the experimentally even more difficult S. haematobium, on which effort needs to be concentrated because it infects the largest number of individuals in sub-Saharan Africa and causes the greatest morbidity and annual mortality of the three principal human schistosomes [1].
The schistosome genome
Shotgun sequencing of the S. mansoni genome was undertaken in 2002–2004 as a joint effort between The Institute for Genomic Research (TIGR), Rockville, USA, and the Wellcome Trust Sanger Institute (WTSI), Cambridge, UK. It generated more than three million reads, providing approximately 8x coverage of the 280 Mb genome. A major intrinsic obstacle to genome assembly has been, and remains, the large amount of repeat sequence (up to 40% of the genome, mostly in a number of retrotransposon families; see below). A genome database, SchistoDB.org, was created, and version 1 of the draft assembly, with automatic annotation of the genes, was released to the research community in February 2005. Another limitation has proved to be the inefficiency of the gene-finding programs (a problem not unique to schistosomes). Three programs (Phat, SNAP and GlimmerHMM) were used to pick out genes from the DNA sequence, and they rarely agree precisely on the extent of the predicted coding region. Nevertheless, careful interrogation can reveal many novel features of schistosome gene structure. Following release of version 1 of the draft genome, work has continued at WTSI, under the supervision of Dr Matt Berriman, to improve the assembly by generating and sequencing a fosmid library constructed from randomly sheared DNA. As a result, the genome has now been assembled into 13 000 supercontigs with an N50 of 824 kb, i.e. 50% of the nucleotides lie in scaffolds of 824 kb or longer. This is about the size of the average protozoan chromosome; by contrast, individual schistosome chromosomes are around 30–40 Mb each, which gives some idea of the magnitude of the assembly task. Work has also continued at TIGR, under the supervision of Dr Najib El-Sayed, to retrain the gene-finding programs with a larger number of full-length coding sequences (CDS).
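The two assembly statistics quoted above can be checked with a little arithmetic. The sketch below assumes the standard definitions: coverage C = N × L / G for N reads of mean length L over a genome of size G, and N50 as the length of the scaffold at which the running total, summed from the longest scaffold downward, first reaches half of all assembled bases. It back-calculates the mean read length implied by roughly 8x coverage of 280 Mb from three million reads (an upper bound, since more than three million reads were generated) and computes N50 for a small set of scaffold lengths that are invented purely for illustration and are not S. mansoni data.

```python
from typing import Sequence


def implied_read_length(coverage: float, genome_bp: float, n_reads: int) -> float:
    """Mean read length L implied by C = N * L / G, rearranged to L = C * G / N."""
    return coverage * genome_bp / n_reads


def n50(lengths: Sequence[int]) -> int:
    """Length of the scaffold at which the running total (longest first)
    first reaches half of all assembled bases."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down, accumulating bases until half is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50() needs at least one scaffold length")


if __name__ == "__main__":
    # Figures from the text: ~8x coverage of a 280 Mb genome from >3 million reads.
    print(round(implied_read_length(8, 280e6, 3_000_000)))  # 747 (bp, upper bound)
    # Hypothetical scaffold lengths in kb, chosen only to illustrate the calculation.
    toy_scaffolds_kb = [900, 824, 500, 300, 200, 150, 100, 26]
    print(n50(toy_scaffolds_kb))  # 824
```

In the toy set the scaffolds total 3000 kb; the two longest (900 + 824 kb) already exceed half of that, so the N50 is the length of the second, 824 kb.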
Version 3 of the assembly is currently being screened at TIGR with the improved gene finders to obtain a new set of predicted genes, which will form the substrate for both automatic and manual gene annotation (two annotators are currently employed full time at WTSI). Version 3 is expected to be released to the community shortly, with publication of the genome paper scheduled for mid-2006. In the short term it is unlikely that the genome will be assembled into eight chromosome-sized chunks with genes assigned to each. One obstacle is the paucity of gene mapping to individual chromosomes, an area in which the pioneering work has been undertaken as a collaboration between Drs LoVerde and Hirai [2,3]. A sequencing project for S. japonicum is also under way at the Chinese National Human Genome Centre in Shanghai. A draft assembly of the Chinese reads has been attempted at WTSI, but the current status of the project is unclear (Berriman, personal communication).
Prospects
Helminth genomes are an integral part of the next five-year plan at WTSI, commencing mid-2006. Efforts to improve the quality of the S. mansoni genome sequence will continue. S. haematobium is also listed, and the priority it receives will depend on the enthusiasm and lobbying of the schistosome research community. Possession of three schistosome genome sequences will confer significant benefits, especially for gene finding, because comparing related genomes helps to distinguish conserved coding regions from spurious predictions (cf. the sequencing of five species of yeast [4]).
The transcriptome
The characterization of expressed schistosome genes really got under way in 1995 with the publication of the first expressed sequence tag (EST) project [5], which added more than 400 new sequences to the database. In the next eight years, about 15 000 EST sequences were put into the public domain. The situation then changed dramatically with the generation, analysis and simultaneous publication of two large EST datasets for S. mansoni [6] and S. japonicum [7]. The former study, by a Brazilian consortium, used the ORESTES technique [8] to sample mRNA from six life cycle stages (egg, miracidium, sporocyst germ ball, cercaria, lung schistosomulum and adult). Analysis of the 120 000 ESTs revealed a genome containing at least 14 000 genes, with an estimated 7000 expressed in each life cycle stage, around 1000 of which were thought likely to be stage-specific. The transcriptome dataset was estimated to sample 92% of S. mansoni expressed genes [6]. The latter study, by a Chinese group, sampled mRNA from adult worms and eggs [7]. A total of >43 000 ESTs were assembled into >13 000 clusters, which the authors believed comprised most of the protein-coding genes (∼15 000) in the parasite (a conclusion that leaves little scope for stage-specific expression). These datasets have proved invaluable for gene finding, but they have also revealed a large number of transcribed retrotransposons [9], none of which has yet been shown to be actively moving around the genome. The data have also opened up the possibility of in silico analyses of biochemical and cellular mechanisms, for example in development, cell adhesion and signalling [6,7]. The research community is now well placed to characterize pathways about which we have some knowledge, and to begin the more difficult task of investigating systems unique to schistosomes. At least 65% of schistosome EST clusters have no homology with the genes of other organisms, and many of these will surely encode proteins that contribute to the unique features of the genus.
Possession of a well-characterized transcript database will permit the construction of a genome-wide microarray with which to investigate patterns of gene expression throughout the life cycle and within individual tissues. A number of studies have already been published using arrays that inevitably achieve only partial coverage. (To the author's knowledge, at least five separate S. mansoni and two S. japonicum arrays have been constructed; cf. the filarial research community, which has settled on one Brugia malayi array with access for all.) Both cDNAs [10,11] and oligonucleotides [12] have been printed onto glass slides and used to investigate a number of biological questions. To date, these have centred on gender-associated and stage-specific expression, but the possibilities are endless. The first small-scale S. mansoni cDNA array [10] was used to identify 12 female- and 4 male-associated gene transcripts, and these observations were subsequently expanded using a much larger oligonucleotide array that revealed 197 transcripts with a gender-biased pattern of expression in the adult schistosome [12]. A small S. japonicum cDNA microarray also highlighted around 20 female- and 8 male-associated transcripts in two different isolates of the parasite. Finally, an array based exclusively on mRNA from the lung schistosomulum, containing >6000 features representing >3000 genes, was used to screen mRNA from six other life cycle stages to pinpoint genes highly expressed in, or specific to, the lung stage [11]. A total of 563 genes proved to be differentially regulated across the life cycle stages examined, around 50 of which were highly expressed at the lung stage.
A limitation in current microarray construction
The current microarrays have been largely constructed using cDNAs either taken from the public databases or generated for the purpose. Until we have a definitive list of CDS from the genome sequencing project, a serious obstacle lies in the way of extending the arrays to provide greater genome coverage. Because the Brazilian S. mansoni transcriptome project generated ESTs using the ORESTES technique, which relies on random priming, their strandedness is unknown. They can be compiled into clusters, but the strandedness of a cluster will only be known if it contains one or more ESTs from conventionally generated cDNAs. We estimate that, of the >30 000 clusters and singlets in the S. mansoni database, >20 000 are composed exclusively of ORESTES sequences. The problem for array construction could be solved by printing both 'sense' and 'antisense' copies of each cluster of unknown strandedness on the slide, but because each of those ∼20 000 clusters would need a second feature, this would increase the array's size by around 70% (from 30 000 to 50 000 features) and its cost proportionally.
The triploblastic acoelomate trap
Investigations that ask questions simply about gender- or stage-specific expression do not encounter the triploblastic acoelomate problem. Schistosomes have a solid body plan with differentiated cells and tissues representing the major organ systems of higher animals, such as nerve, muscle, gut, nephridia and gonads. (One need only look at electron micrographs of miracidia to see how many cell types can be packed into a small space.) Furthermore, the cells adhere firmly to each other, and no methods exist for separating them. This becomes a problem when the investigator wants to determine where in the parasite genes of interest are being expressed. For example, if surface-expressed genes that might serve as vaccine candidates are sought, it is important to establish that they are expressed in tegument cell bodies or gut epithelia and not in some cell type buried deep within the organism. Equally, when attempting to designate the components of a signalling pathway, the investigator needs to know that all candidates are expressed in the same cell type.
Prospects
All the above demand high priority for research efforts in the immediate future if the full benefits of microarray analysis are to accrue.
Proteomics
The proteome can be defined as the total protein complement of an organism, tissue, cell or organelle. The suite of techniques that has been developed provides a way to link any protein to its encoding DNA (provided the sequence is in the database). The first step is to separate a complex mixture into its constituent proteins. For soluble proteins this is readily achieved by 2D electrophoresis (2DE), using immobilized pH gradients for the first dimension [20]. Although 2DE has its detractors, it has one singular advantage over other separation techniques: the relative amounts of constituents within a given preparation can be quantified using image analysis and densitometry software. A second approach involves trypsinisation of the protein mixture followed by separation of the peptides by liquid chromatography (LC). Alternatively, proteins may be separated by 1D electrophoresis, trypsinised, and the peptides separated by LC. LC approaches are ideal for insoluble proteins such as membrane constituents, but there is presently no easy way to obtain information on the relative amounts of different proteins in such a mixture. The final step in proteomics is to subject the sample to mass spectrometry (MS). MALDI-ToF MS will produce a peptide mass fingerprint from the tryptic digest of a single gel spot. Selected peptides can then be fragmented by collision with a gas (tandem MS) to yield fragmentation spectra. Database searching (e.g. with Mascot software) against theoretical digests of S. mansoni cDNA sequences or of CDS predicted from the genome assembly, translated in all six reading frames, enables the parent protein to be linked to its CDS. However, a putative function can only be assigned if an annotation is available (remember that 65% of S. mansoni EST clusters have no homology to other organisms).
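The database-search step can be illustrated in miniature. The sketch below translates a DNA sequence in all six reading frames, digests each frame in silico with the usual trypsin rule (cleave after K or R unless the next residue is P), and reports theoretical peptides whose singly protonated monoisotopic mass [M+H]+ falls within a ppm tolerance of an observed peak. It is a toy under stated assumptions: the DNA is a hypothetical sequence, not a schistosome gene; missed cleavages and modifications (e.g. cysteine alkylation) are ignored; the 50 ppm tolerance and 6-residue minimum are arbitrary choices; and the "observed" peak is simulated from a known peptide. Mascot's own probability-based scoring is considerably more sophisticated than the simple mass matching shown here.

```python
from itertools import product

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Monoisotopic residue masses (Da) for unmodified residues.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]


def translate(dna: str, frame: int) -> str:
    """Translate from offset `frame`; codons containing N or other symbols become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(frame, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> list:
    """Three forward frames followed by three reverse-complement frames."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    return [translate(dna, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


def tryptic_peptides(protein: str, min_len: int = 6) -> list:
    """Cleave after K/R unless followed by P; stop codons ('*') also end a peptide."""
    peptides = []
    for fragment in protein.split("*"):
        start = 0
        for i, aa in enumerate(fragment):
            if aa in "KR" and (i + 1 == len(fragment) or fragment[i + 1] != "P"):
                peptides.append(fragment[start:i + 1])
                start = i + 1
        peptides.append(fragment[start:])  # C-terminal remainder (may be empty)
    return [p for p in peptides if len(p) >= min_len and "X" not in p]


def mh_plus(peptide: str) -> float:
    """Singly protonated monoisotopic mass [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_masses(observed, candidates, tol_ppm: float = 50.0) -> list:
    """Pair each observed m/z with every candidate peptide within tol_ppm."""
    hits = []
    for mz in observed:
        for pep in candidates:
            theo = mh_plus(pep)
            if abs(mz - theo) / theo * 1e6 <= tol_ppm:
                hits.append((mz, pep))
    return hits


# Hypothetical coding sequence (not from S. mansoni), built codon by codon.
DNA = "".join([
    "ATG", "TCT", "GCT", "GGT", "CTG", "GAA", "GAT", "ATT", "TGG", "AAT", "GCT", "AAA",
    "TTT", "GGT", "CAT", "ACT", "GAA", "GTT", "CCT", "CTG", "TCT", "GAT", "CGT",
    "CAG", "TGG", "TAT", "ACT", "CTG", "GCT", "GGT", "AAT", "GAA", "ATG", "AAA", "TAA",
])

if __name__ == "__main__":
    candidates = [p for frame in six_frame_translations(DNA) for p in tryptic_peptides(frame)]
    peaks = [mh_plus("FGHTEVPLSDR")]  # simulated peak list from a known peptide
    for mz, pep in match_masses(peaks, candidates):
        print(f"{mz:.4f} -> {pep}")
```

In a real search the candidate list would be built from the full set of cDNAs or predicted CDS, and fragmentation spectra would then be used to confirm individual peptide assignments.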
Proteomics has some advantages over microarray analysis, particularly because it is the expressed proteins that endow a cell or parasite stage with its specific functions, whereas detection of mRNA does not always equate to the presence of protein. In addition, the acoelomate body plan is not an insuperable obstacle to analysing protein expression at the cell or tissue level, since a life cycle stage can be fractionated using conventional cell biological techniques such as gradient centrifugation. On the other hand, sensitivity is an issue: unless the desired protein subsets can be enriched, scarce constituents will not be identified. A number of schistosome proteomic studies have already been published, and the influence of the approach is set to grow, since it can answer questions of composition that in some cases were posed decades ago. The first report compared the soluble proteome of S. mansoni across four life cycle stages [21], revealing that abundant cytosolic components such as 14-3-3, actin, enolase and aldolase were common to all four stages. Furthermore, the list included several of the first-generation vaccine candidates, such as triose phosphate isomerase, glyceraldehyde-3-phosphate dehydrogenase, glutathione S-transferase and fatty acid binding protein. Subsequent studies have focused on the composition of the adult tegument, separated from worm bodies by freeze-thaw treatment [22] and further enriched by density gradient centrifugation and differential extraction [23]. These investigations have enabled the protein constituents of the tegument and its surface membrane complex to be explored. In addition, the proteins most exposed on the external surface, of both parasite and host origin, have been investigated by surface biotinylation [24]. Lastly, the cercarial secretions used in host skin penetration have been characterized [25], and several proteases and putative immunomodulators identified [26].
Prospects
Proteomics has developed to the point where there are two very obvious applications that can rapidly generate new knowledge:
The schistosome glycome
Mass spectrometric techniques similar to those used in proteomics can also be applied to the glycan chains attached to schistosome proteins and lipids. The parasite fractions of interest are first treated enzymatically or chemically to release the glycans. A glycan mass fingerprint can then be generated by MS and, because relatively few distinct monosaccharide building blocks are involved, empirical compositions can be obtained [27]. Linkage analysis can then be performed to identify the precise monosaccharides involved and their glycosidic linkages, allowing the complete glycan structures to be deduced. To the author's knowledge, at least two groups, at the University of Leiden, The Netherlands (Dr C Hokke), and Imperial College London (Prof. A Dell), are actively analysing schistosome glycan structures, but no papers have been published to date.
Prospects
As with proteomics, glycobiology is undergoing a rapid expansion in techniques and possibilities. The current limitations in synthesizing the deduced structures for experimental purposes are being overcome, and the first glycan arrays have been constructed [28]. It should therefore soon be possible, in principle, to 'print' the full range of schistosome glycan structures and use them, for example, to probe the acquired immune responses of human and laboratory hosts to schistosome infection or vaccination. It should be noted in passing that glycan epitopes are often the predominant stimulators of antibody production [29]. Another application of such arrays would be in identifying interacting partners within the parasite or among host macromolecules.
The Immunome
In theory, the combination of proteomics and serology should provide a powerful tool to identify antibody targets among schistosome proteins and glycans.
At a superficial level this is the case: 2DE separation of soluble proteins, followed by blotting onto membranes and probing with antibody, permits the most reactive proteins to be identified by MS. However, in the author's investigations, the reactive proteins turn out to be the same abundant cytosolic proteins that were cloned from expression library screens using antisera a decade or more ago. In the one published study using the blotting approach [30], the same glycolytic enzymes, chaperones and muscle proteins were again the dominant antigens. Conversely, antibody detection of proteins on Western blots appears to be one or two orders of magnitude more sensitive than gel stains such as SYPRO Ruby, so strong reactions are often present that have no visible counterpart on the gel suitable for MS analysis.
Prospects
Technical issues need to be addressed before characterization of the immunome becomes a useful tool for, e.g., pinpointing vaccine candidates. One possibility for increasing the sensitivity of proteomics in general is to pre-deplete protein fractions with a cocktail of antibodies against the abundant cytosolic proteins, so that larger quantities of scarce proteins can be loaded onto gels for 2D separation. In this context, obtaining sufficient parasite material from stages other than adult worms or cercariae will be a challenge. Finally, the problem of membrane proteins needs to be addressed. These could well represent important vaccine targets, but they are not amenable to 2DE and blotting, and they are also much scarcer than cytosolic components. A combination of detergent extraction of membranes and immunoprecipitation might be adequate for the purpose, provided that conformational epitopes were not destroyed.
Conclusions
Collectively, the several avenues outlined above provide the possibility of unprecedented gains in our knowledge of schistosomes and their interactions with the mammalian host.
The future is bright, but both ingenuity and resources are needed if rapid progress is to be made.