The University of Arizona
MSLL
Analysis


Sequences

  • HMPR and MSLL
  • TIGR AZM Release 4.0 of the assembled high-cot, methyl-filtered, and unfiltered sequences.
  • BES Maize BAC-end sequences from University of Arizona and Rutgers.
  • RescueMU sequences, which were generated as part of the Maize Gene Discovery Project.
  • TIGR maize repeat database Version 3.0 of the repeat database.
  • Maize EST assemblies assembled at AGCoL using PAVE.
  • 437 Maize sequenced BACs, with ordered sequence contigs.
  • 151 maize genes, curated and annotated by Brad Barbazuk.

Coverage fractions

To estimate the fraction of possible MSLL/HMPR sequences obtained, the sets (with organellar reads removed) were aligned to themselves using BLAT, requiring 98% identity over at least 95% of the shorter of the two sequences. By this technique, 28% of MSLL reads and 26% of HMPR reads were found to duplicate another read, leading to the estimate that the sequence sets are approximately 28% and 26% complete, respectively.
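
The duplicate criterion above can be sketched as follows. This is a minimal illustration, not the script used for the analysis: the function names and the input layout (read lengths plus a list of pairwise alignments) are assumptions, and it assumes that both reads of a qualifying pair count as duplicated.

```python
def is_duplicate_pair(identity, aligned_len, len_a, len_b,
                      min_identity=0.98, min_coverage=0.95):
    """Return True if an alignment meets the criterion from the text:
    at least 98% identity over at least 95% of the shorter sequence."""
    shorter = min(len_a, len_b)
    return identity >= min_identity and aligned_len >= min_coverage * shorter


def duplicate_fraction(read_lengths, alignments):
    """Fraction of reads that duplicate at least one other read.

    read_lengths: dict mapping read id -> length in bp (hypothetical layout)
    alignments:   iterable of (read_a, read_b, identity, aligned_len)
    """
    duplicated = set()
    for a, b, identity, aligned_len in alignments:
        if a == b:
            continue  # a read always matches itself; ignore self hits
        if is_duplicate_pair(identity, aligned_len,
                             read_lengths[a], read_lengths[b]):
            # Assumption: both members of the pair are counted as duplicated.
            duplicated.update((a, b))
    return len(duplicated) / len(read_lengths)


# Toy data: only the r1/r2 alignment passes both thresholds.
lengths = {"r1": 800, "r2": 700, "r3": 900, "r4": 750}
hits = [
    ("r1", "r2", 0.99, 690),   # passes: 690 bp >= 95% of 700 bp
    ("r3", "r4", 0.97, 750),   # fails the identity threshold
    ("r3", "r3", 1.00, 900),   # self hit, skipped
]
fraction = duplicate_fraction(lengths, hits)
```

With the toy data, two of the four reads are flagged, so `fraction` is 0.5.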

Repeat and EST Analysis

All survey sequences (MSLL, HMPR, MF, HC, UF, BES) were masked using RepeatMasker and the TIGR repeat database, and evaluated for EST content using BLAT. EST alignments were required to have 98% identity over 100 bp. Results are given in the following table.

Category                MSLL    HMPR    MF       HC       RMU      UF      BES
Total contigs/singlets  80,732  20,384  133,806  172,600  191,715  49,364  474,014
Retrotransposons        20%     8.6%    26%      12%      4%       64%     62%
Transposons             0.8%    0.7%    0.5%     0.7%     0.3%     0.6%    0.7%
MITES                   1.1%    1.1%    0.4%     0.7%     0.6%     0.2%    0.2%
Centromere              0.6%    0.1%    0.4%     0.09%    0.02%    0.8%    0.8%
Telomere                0.02%   0.1%    0.02%    0.02%    0.02%    0.01%   0.04%
Ribosomal genes         1.1%    1.7%    0.2%     0.1%     0.1%     2.1%    0.9%
Unknown                 17%     14%     9%       14%      7%       7%      13%
Total repeats           40%     25%     39%      27%      12%      83%     77%
EST contig hits         13%     22%     18%      14%      12%      3%      1%

Linking Analysis

A key tenet for this project is that the HMPR should cover gaps in genic regions left open by the AZM, and the MSLL should span repetitive regions, linking genic regions together. We identified linked pairs of AZM using the following steps.

MSLL and HMPR sequences were soft-masked and then aligned to the AZM contigs that contain both MF and HC reads (Chen et al. 2005), requiring either a 95% match of the entire query or target sequence, or an overlap alignment of at least 100 bp at 95% identity (a 15 bp gap is allowed at the end for possible trimming error). Reads hitting multiple AZM were ignored. Results are shown in the table below. The last row (Same AZM) shows the number of linked pairs for which the two AZM were actually the same, i.e., both HMPR or MSLL ends were contained in the same AZM. As expected, this is far greater for the short HMPR.
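
The linking step can be illustrated with a short sketch. It assumes that alignments have already been filtered by the criteria above and are supplied as (read, AZM) pairs, and that the two end reads of each clone are known. The function names and input layout are hypothetical, and the sketch counts same-AZM pairs separately from links, which may differ from how the table was tallied.

```python
from collections import defaultdict


def unique_hits(hits):
    """Map each read to its AZM, keeping only reads that hit exactly one AZM.

    hits: iterable of (read_id, azm_id) pairs that passed the alignment filter.
    """
    by_read = defaultdict(set)
    for read_id, azm_id in hits:
        by_read[read_id].add(azm_id)
    # Reads hitting multiple AZM are ignored, as described in the text.
    return {r: next(iter(a)) for r, a in by_read.items() if len(a) == 1}


def link_pairs(hits, mates):
    """Return (links, same_azm_count) for clones whose two ends are placed.

    mates: iterable of (forward_read, reverse_read), one tuple per clone.
    """
    placed = unique_hits(hits)
    links, same = [], 0
    for fwd, rev in mates:
        if fwd in placed and rev in placed:
            a, b = placed[fwd], placed[rev]
            if a == b:
                same += 1            # both ends fall in the same AZM
            else:
                links.append(tuple(sorted((a, b))))
    return links, same


# Toy data: clone c3 is dropped because its forward read hits two AZM.
toy_hits = [("c1.f", "A1"), ("c1.r", "A2"),
            ("c2.f", "A3"), ("c2.r", "A3"),
            ("c3.f", "A1"), ("c3.f", "A4"), ("c3.r", "A2")]
toy_mates = [("c1.f", "c1.r"), ("c2.f", "c2.r"), ("c3.f", "c3.r")]
toy_links, toy_same = link_pairs(toy_hits, toy_mates)
```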

Combining the joined pairs into chains gives the following numbers:
3-chains: 525
4-chains: 37
5-chains: 1
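
One way to tally chains of this kind is to treat each link as an edge between two AZM and count the connected groups by size. The sketch below does this with a small union-find. It assumes that an n-chain means n AZM joined by links, and it does not check that each group is strictly linear; the function name and input layout are hypothetical.

```python
from collections import Counter


def chain_sizes(links):
    """Count connected groups of AZM by size.

    links: iterable of (azm_a, azm_b) pairs.
    Returns a Counter mapping group size -> number of groups.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b   # merge the two groups

    members_per_root = Counter(find(x) for x in list(parent))
    return Counter(members_per_root.values())


# Toy data: one group of 3, one pair, and one group of 4.
toy_chain_links = [("A1", "A2"), ("A2", "A3"),
                   ("A4", "A5"),
                   ("A6", "A7"), ("A7", "A8"), ("A8", "A9")]
toy_chain_counts = chain_sizes(toy_chain_links)
```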


Statistic          HMPR          MSLL
AZM hits           17,785 (87%)  59,248 (73%)
Multiple AZM hits  960           2,468
Links              2,926         7,367
Confirmed          102           130
Expected           79            199
Same AZM           31            2

Alignment to the BACs

MSLL and HMPR were aligned to 3607 sequenced maize BACs, using soft-masking and requiring a 98% match over 95% of the query. Paired hits were required to be entirely within one sequence contig (unless the contigs were ordered), and to have opposite orientation. Note that the requirement to be within a sequence contig biases the MSLL pair detection towards shorter MSLL. The table below shows for each library:

1) Number of paired hits
2) Avg. span between the end hits
3) Avg. percentage of masked sequence in the spanned region.
4) Avg. number of ESTs in the spanned region.
5) Avg. number of retrotransposons in the spanned region.

ESTs and retrotransposons were counted as being spanned if their alignment region (as seen on the minbac display) overlapped that of the paired MSLL/HMPR.
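
The pairing and overlap rules can be sketched as below. This is an illustration only: the tuple layout for hits and features is an assumption, the exception for ordered contigs is not modeled, and any overlap, including a shared endpoint, is counted as spanned.

```python
def paired_span(hit_a, hit_b):
    """Return the (start, end) region spanned by two end hits, or None.

    Each hit is (contig_id, start, end, strand) with strand '+' or '-'.
    The pair is accepted only if both hits lie on the same sequence contig
    and have opposite orientation (ordered contigs are not handled here).
    """
    contig_a, start_a, end_a, strand_a = hit_a
    contig_b, start_b, end_b, strand_b = hit_b
    if contig_a != contig_b or strand_a == strand_b:
        return None
    return min(start_a, start_b), max(end_a, end_b)


def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def count_spanned(span, features):
    """Count features (start, end) whose alignment overlaps the span."""
    span_start, span_end = span
    return sum(1 for f_start, f_end in features
               if overlaps(span_start, span_end, f_start, f_end))


# Toy data: a 3.1 kb span with two of three features overlapping it.
toy_span = paired_span(("ctg1", 1000, 1700, "+"), ("ctg1", 3600, 4100, "-"))
toy_features = [(1500, 2000), (4050, 5000), (5000, 6000)]
toy_spanned = count_spanned(toy_span, toy_features)
```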

Lib   Paired hits  Avg. span (kb)  Avg. masking  Avg. ESTs  Avg. retros
Ha    7            3.1             9%            1          0
Hb    9            3.0             4%            0          0
Hc    39           3.3             13%           1          0
Hd    90           3.5             15%           1          0
He    297          3.2             15%           1          0
Hf    649          3.0             15%           1          0
Hg    84           3.2             16%           1          0
Hh    6            2.7             10%           1          0
Hi    10           2.4             10%           1          0
Hj    10           2.6             10%           1          0
Hk    11           0.96            7%            1          0
La    29           46              69%           3          20
Lb    44           39              77%           0          19
Lc    150          26              74%           0          12
Ld    195          23              51%           4          7
Le    133          14              56%           0          5
Lf    9            61              58%           3          22
Lh    5            98              76%           0          49
Li    21           59              74%           1          34
Lj    8            65              60%           1          26
Lz    190          9.5             36%           1          2
hmpr  1243         3.1             14%           0          0
msll  800          23              55%           0          9

Alignment to curated genes

MSLL/HMPR sequences, along with other gene-rich contigs, were repeat-masked, had organellar sequences removed, and were aligned to 151 maize genes curated by Brad Barbazuk. Matches were required to cover 98% of the MSLL or HMPR, or, for sequences partially overlapping the gene, to extend to within 2% of the end.
Shown below are the percentages of hits obtained to different portions of the gene region.
The most noticeable feature of the distribution is that HMPR are quite suppressed within initial exons (as well as single exons) and relatively abundant in internal exons, introns, or the 3' UTR.

Placement on FPC map

Six MSLL plates (La0009, Lb00011, La00012, Lf0001, Lh0001) were fingerprinted using the HICF methodology of Nelson et al. (2005) and placed onto the HICF maize FPC map. This is significant because it demonstrates that MSLL-sized clones may be used to anchor gene-rich contigs onto a physical map, which would not have been possible using agarose fingerprinting due to the shorter size of most of the MSLL clones.

The fingerprints were screened to require at least half of the minimum expected band count for the size range of the clone library (using a conversion factor of 1 band = 1.2 kb). Clones from the a and b libraries were required to have 14 bands; the f library required 25; and the h library required 41.
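
The screen can be expressed as a simple lookup, and the stated thresholds can be turned back into the approximate minimum clone sizes they imply. The library size ranges are not given here, so the sizes computed below are back-calculated from the thresholds and the 1.2 kb per band factor, ignoring any rounding applied when the thresholds were set. The names are hypothetical.

```python
BAND_KB = 1.2  # conversion factor from the text: 1 band = 1.2 kb

# Minimum band counts stated in the text, keyed by library letter.
MIN_BANDS = {"a": 14, "b": 14, "f": 25, "h": 41}


def passes_screen(library, band_count):
    """True if a fingerprint has at least the required number of bands."""
    return band_count >= MIN_BANDS[library]


def implied_min_size_kb(library):
    """Approximate minimum clone size implied by the threshold.

    The threshold is half the minimum expected band count, so the
    minimum expected count is twice the threshold.
    """
    return round(2 * MIN_BANDS[library] * BAND_KB, 1)


implied_sizes = {lib: implied_min_size_kb(lib) for lib in MIN_BANDS}
```

Under these assumptions, the thresholds correspond to minimum clone sizes of roughly 34 kb for the a and b libraries, 60 kb for the f library, and 98 kb for the h library.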

A total of 1,022 successful fingerprints were obtained, of which 923 were placed onto the FPC map using FPC's "Keyset-->FPC" function. Placements were checked for accuracy using the sequenced BAC alignments found above. 42 placements could be checked this way, of which 41 agreed with the FPC placement.


Email comments to will@agcol.arizona.edu

Last Modified Wednesday July 02, 2008 10:27 AM and 21 seconds