Development of a pooled probe method for locating small gene families in a physical map of soybean using stress related paralogues and a BAC minimum tile path

Plant Methods. 2006 Dec 8:2:20. doi: 10.1186/1746-4811-2-20.

Abstract

Background: Genome analysis of soybean (Glycine max L.) has been complicated by its paleo-autopolyploid nature and conserved homeologous regions. Landmarks of expressed sequence tags (ESTs) located within a minimum tile path (MTP) of contiguous (contig) bacterial artificial chromosome (BAC) clones or radiation hybrid set can identify stress and defense related gene rich regions in the genome. A physical map of about 2,800 contigs and MTPs of 8,064 BAC clones encompass the soybean genome. That genome is being sequenced by whole genome shotgun methods so that reliable estimates of gene family size and gene locations will provide a useful tool for finishing. The aims here were to develop methods to anchor plant defense- and stress-related gene paralogues on the MTP derived from the soybean physical map, to identify gene rich regions and to correlate those with QTL for disease resistance.

Results: The probes included 143 ESTs from a root library selected by subtractive hybridization from a multiply disease resistant soybean cultivar 'Forrest' 14 days after inoculation with Fusarium solani f. sp. glycines (F. virguliforme). Another 166 probes were chosen from a root EST library (Gm-r1021) prepared from a non-inoculated soybean cultivar 'Williams 82' based on their homology to the known defense and stress related genes. Twelve and thirteen pooled EST probes were hybridized to high-density colony arrays of MTP BAC clones from the cv. 'Forrest' genome. The EST pools located 613 paralogues for 201 of the 309 probes used (range 1-13 per functional probe). One hundred BAC clones contained more than one kind of paralogue. Many more BACs (246) contained a single paralogue of one of the 201 probes detectable gene families. ESTs were anchored on soybean linkage groups A1, B1, C2, E, D1a+Q, G, I, M, H, and O.

Conclusion: Estimates of gene family sizes were more similar to those made by Southern hybridization than by bioinformatics inferences from EST collections. When compared to Arabidopsis thaliana there were more 2 and 4 member paralogue families reflecting the diploidized-tetraploid nature of the soybean genome. However there were fewer families with 5 or more genes and the same number of single genes. Therefore the method can identify evolutionary patterns such as massively extensive selective gene loss or rapid divergence to regenerate the unique genes in some families.