EST extensions
The MSLL, HMPR and AZM sequences have been aligned to the ends of the
maize EST contigs to provide extensions. In total, 16027 contigs have extensions. HMPR extends 1618 contigs, MSLL extends 2662, and AZM5 extends 11747 contigs.
Query EST Contig Extensions
On the Query page, select Submit without choosing any libraries to see all
contigs with extensions. You may view subsets, e.g. only those with MSLL
extensions, by selecting the appropriate library. The prefixes for the ESTs in the contigs are as follows:
- ZMMBL - MSLL
- ZMMBH - HMPR
- AZM5 - AZMs
- ZM_BFx (x = a,b,c) - sequenced for FL-cDNA project
- all others are from GenBank.
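The prefix list above can be applied programmatically when working with exported contig membership. The following is a minimal sketch: only the prefixes come from the list, while the function name `est_source`, the output labels, and the shape of the example identifiers are assumptions for illustration.

```python
# Prefix-to-source mapping taken from the list above. The layout of the
# rest of an EST name is an assumption; only the prefixes come from the page.
PREFIX_TO_SOURCE = [
    ("ZMMBL", "MSLL"),
    ("ZMMBH", "HMPR"),
    ("AZM5", "AZM"),
    ("ZM_BFa", "FL-cDNA"),
    ("ZM_BFb", "FL-cDNA"),
    ("ZM_BFc", "FL-cDNA"),
]


def est_source(name: str) -> str:
    """Return the source label for an EST name, defaulting to GenBank."""
    for prefix, source in PREFIX_TO_SOURCE:
        if name.startswith(prefix):
            return source
    return "GenBank"


# Hypothetical identifiers, used only to exercise the lookup.
for example in ("ZMMBL0001", "ZM_BFb0042", "AW123456"):
    print(example, est_source(example))
```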
Sequences used
- HMPR and MSLL
- TIGR AZM: Release 4.0 of the assembled high-Cot, methyl-filtered, and unfiltered sequences.
- BES: Maize BAC-end sequences from University of Arizona and Rutgers.
- RescueMU: sequences generated as part of the Maize Gene Discovery Project.
- TIGR maize repeat database: Version 3.0 of the repeat database.
- Maize EST assemblies assembled at AGCoL using PAVE.
- 437 Maize sequenced BACs, with ordered sequence contigs.
- 151 maize genes, curated and annotated by Brad Barbazuk.
Coverage fractions
To estimate the fraction of possible MSLL/HMPR sequences obtained, the
sets (with organellar reads removed) were aligned to
themselves using BLAT, requiring 98% identity over at
least 95% of the shorter of the two sequences. By this technique, 28% of MSLL
and 26% of HMPR reads were found to duplicate another read, leading to the estimate
that the sequence sets are approximately 28% and 26% complete, respectively.
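The duplication count described above can be sketched as follows. This is an illustration rather than the pipeline actually used: it assumes the self-alignments have already been reduced to `(read_a, read_b, aligned_bp, identity)` tuples, and it assumes both reads of a qualifying pair count as duplicated. The function and parameter names are hypothetical.

```python
def duplicated_fraction(read_lengths, alignments,
                        min_identity=0.98, min_coverage=0.95):
    """Fraction of reads with a qualifying alignment to a different read.

    read_lengths: dict mapping read id -> length in bp
    alignments:   iterable of (read_a, read_b, aligned_bp, identity) tuples
    """
    duplicated = set()
    for read_a, read_b, aligned_bp, identity in alignments:
        if read_a == read_b:
            continue  # skip the trivial self-hit
        shorter = min(read_lengths[read_a], read_lengths[read_b])
        # Thresholds from the text: 98% identity over at least 95% of
        # the shorter of the two sequences.
        if identity >= min_identity and aligned_bp >= min_coverage * shorter:
            duplicated.update((read_a, read_b))
    return len(duplicated) / len(read_lengths)


# Toy data: r1 and r2 duplicate each other; the r3/r4 overlap is too short.
reads = {"r1": 500, "r2": 480, "r3": 600, "r4": 550}
hits = [
    ("r1", "r1", 500, 1.00),
    ("r1", "r2", 470, 0.99),
    ("r3", "r4", 300, 0.99),
]
print(duplicated_fraction(reads, hits))
```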
Repeat and EST Analysis
All survey sequences (MSLL, HMPR, MF, HC, UF, BES) were masked using RepeatMasker and the TIGR repeat database, and
evaluated for EST content using BLAT. EST alignments were required to
have 98% identity over 100 bp. Results are given in the following table.
| | MSLL | HMPR | MF | HC | RMU | UF | BES |
|---|---|---|---|---|---|---|---|
| Total contigs/singlets | 80,732 | 20,384 | 133,806 | 172,600 | 191,715 | 49,364 | 474,014 |
| Retrotransposons | 20% | 8.6% | 26% | 12% | 4% | 64% | 62% |
| Transposons | 0.8% | 0.7% | 0.5% | 0.7% | 0.3% | 0.6% | 0.7% |
| MITEs | 1.1% | 1.1% | 0.4% | 0.7% | 0.6% | 0.2% | 0.2% |
| Centromere | 0.6% | 0.1% | 0.4% | 0.09% | 0.02% | 0.8% | 0.8% |
| Telomere | 0.02% | 0.1% | 0.02% | 0.02% | 0.02% | 0.01% | 0.04% |
| Ribosomal genes | 1.1% | 1.7% | 0.2% | 0.1% | 0.1% | 2.1% | 0.9% |
| Unknown | 17% | 14% | 9% | 14% | 7% | 7% | 13% |
| Total repeats | 40% | 25% | 39% | 27% | 12% | 83% | 77% |
| EST contig hits | 13% | 22% | 18% | 14% | 12% | 3% | 1% |
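The EST-hit criterion (98% identity over 100 bp) can be applied to BLAT output with a small filter. The sketch below reads tab-separated PSL lines and assumes the survey sequence is the query. The identity formula, (matches + repMatches) divided by all aligned bases, is one common approximation and not necessarily the one used for the table; the function name is hypothetical.

```python
def est_hit_queries(psl_lines, min_identity=0.98, min_aligned_bp=100):
    """Return the set of query names with at least one qualifying hit."""
    hits = set()
    for line in psl_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 21 or not fields[0].isdigit():
            continue  # skip PSL header lines and malformed rows
        # PSL columns 1-3: matches, misMatches, repMatches.
        matches, mismatches, rep_matches = (int(x) for x in fields[:3])
        aligned = matches + mismatches + rep_matches
        if aligned < min_aligned_bp:
            continue
        identity = (matches + rep_matches) / aligned
        if identity >= min_identity:
            hits.add(fields[9])  # column 10 is the query name
    return hits
```

Dividing the size of the returned set by the number of survey sequences in a library gives a percentage comparable in form to the "EST contig hits" row.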
Linking Analysis
A key tenet for this project is that the HMPR should
cover gaps in genic regions left open by the AZM, and the
MSLL should span repetitive regions, linking genic regions
together. We identified linked pairs of AZM using the following
steps.
MSLL and HMPR sequences were soft-masked and then aligned
to the AZM contigs that contain both MF and HC reads (Chen et al. 2005),
requiring either a 95% match over the entire query or target sequence,
or an overlapping alignment of at least 100 bp at 95% identity
(a 15 bp gap is allowed at the end for possible trimming error). Reads
hitting multiple AZM were ignored. Results are shown in the
table below.
The "Same AZM" row shows the number of
linked pairs for which the two AZM were actually the same, i.e.,
both HMPR or MSLL ends were contained in the same AZM.
As expected, this is far greater for the short HMPR.
Combining the joined pairs into chains gives the following numbers:
3-chains: 525
4-chains: 37
5-chains: 1
| | HMPR | MSLL |
|---|---|---|
| AZM hits | 17,785 (87%) | 59,248 (73%) |
| Multiple AZM hits | 960 | 2,468 |
| Links | 2,926 | 7,367 |
| Confirmed | 102 | 130 |
| Expected | 79 | 199 |
| Same AZM | 31 | 2 |
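The step of combining joined pairs into chains can be sketched as a grouping of linked AZMs. The code below treats each link as an edge between two AZM identifiers and counts connected groups by size, so a group of three AZMs counts as one 3-chain. Treating a chain as a connected group is an assumption (the original analysis may have required strictly linear chains), and the function name and identifiers are hypothetical.

```python
from collections import Counter, defaultdict


def chain_size_counts(links):
    """Count connected groups of AZMs by size, given (azm_a, azm_b) links."""
    neighbours = defaultdict(set)
    for a, b in links:
        if a == b:
            continue  # both ends fall in the same AZM, so nothing is joined
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen = set()
    sizes = Counter()
    for start in neighbours:
        if start in seen:
            continue
        # Collect every AZM reachable from this one through links.
        stack, members = [start], set()
        while stack:
            node = stack.pop()
            if node in members:
                continue
            members.add(node)
            stack.extend(neighbours[node] - members)
        seen |= members
        sizes[len(members)] += 1
    return sizes


# Toy links: A-B-C forms one 3-chain, D-E one pair, F-F is a same-AZM hit.
print(chain_size_counts([("A", "B"), ("B", "C"), ("D", "E"), ("F", "F")]))
```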
Alignment to the BACs
MSLL and HMPR were aligned to 3607 sequenced maize BACs, using
soft-masking and requiring a 98% match over 95% of the query.
Paired hits were required to be entirely within one
sequence contig (unless the contigs were ordered), and to have opposite orientation.
Note that the requirement to be within a sequence contig biases the
MSLL pair detection towards shorter MSLL.
The table below shows, for each library:
1) Number of paired hits
2) Avg. span between the end hits
3) Avg. percentage of masked sequence in the spanned region
4) Avg. number of ESTs in the spanned region
5) Avg. number of retrotransposons in the spanned region
ESTs and retrotransposons were counted as spanned if their alignment region
(as seen on the minbac display) overlapped that of the paired MSLL/HMPR.
| Lib | Paired hits | Avg. span (kb) | Avg. masking | Avg. ESTs | Avg. retros |
|---|---|---|---|---|---|
| Ha | 7 | 3.1 | 9% | 1 | 0 |
| Hb | 9 | 3.0 | 4% | 0 | 0 |
| Hc | 39 | 3.3 | 13% | 1 | 0 |
| Hd | 90 | 3.5 | 15% | 1 | 0 |
| He | 297 | 3.2 | 15% | 1 | 0 |
| Hf | 649 | 3.0 | 15% | 1 | 0 |
| Hg | 84 | 3.2 | 16% | 1 | 0 |
| Hh | 6 | 2.7 | 10% | 1 | 0 |
| Hi | 10 | 2.4 | 10% | 1 | 0 |
| Hj | 10 | 2.6 | 10% | 1 | 0 |
| Hk | 11 | 0.96 | 7% | 1 | 0 |
| La | 29 | 46 | 69% | 3 | 20 |
| Lb | 44 | 39 | 77% | 0 | 19 |
| Lc | 150 | 26 | 74% | 0 | 12 |
| Ld | 195 | 23 | 51% | 4 | 7 |
| Le | 133 | 14 | 56% | 0 | 5 |
| Lf | 9 | 61 | 58% | 3 | 22 |
| Lh | 5 | 98 | 76% | 0 | 49 |
| Li | 21 | 59 | 74% | 1 | 34 |
| Lj | 8 | 65 | 60% | 1 | 26 |
| Lz | 190 | 9.5 | 36% | 1 | 2 |
| hmpr | 1243 | 3.1 | 14% | 0 | 0 |
| msll | 800 | 23 | 55% | 0 | 9 |
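The paired-hit rules described in this section (both ends within one sequence contig unless the contigs are ordered, and opposite orientation) can be sketched as a small check that also returns the span. The `EndHit` record, the coordinate convention, and measuring the span from the outermost aligned bases are assumptions for illustration, not the project's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class EndHit:
    contig: str  # sequence contig within the BAC
    start: int   # 0-based start on the BAC sequence
    end: int     # end coordinate (exclusive)
    strand: str  # "+" or "-"


def paired_span(hit_a, hit_b, contigs_ordered=False):
    """Return the span in bp for an accepted pair, or None if rejected."""
    if hit_a.strand == hit_b.strand:
        return None  # the two ends must align in opposite orientation
    if hit_a.contig != hit_b.contig and not contigs_ordered:
        return None  # hits in different contigs need ordered contigs
    # Span measured between the outermost aligned bases (an assumption).
    return max(hit_a.end, hit_b.end) - min(hit_a.start, hit_b.start)


left = EndHit("contig1", 1000, 1500, "+")
right = EndHit("contig1", 3600, 4100, "-")
print(paired_span(left, right))
```

Averaging the returned spans per library, in kb, would give values of the same kind as the "Avg. span (kb)" column.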
Alignment to curated genes
MSLL/HMPR sequences, along with other gene-rich contigs, were
repeat-masked, had organellar sequences removed, and were aligned to
151 maize genes curated by Brad Barbazuk. Matches were required
to cover 98% of the MSLL or HMPR or, for partially overlapping
sequences, to extend to within 2% of the end.
Shown below are the percentages of hits obtained to different portions
of the gene region.
The most noticeable feature of the distribution is that HMPR are
quite suppressed within initial exons (as well as single exons)
and relatively abundant in internal exons, introns, or the 3' UTR.
Placement on FPC map
Six MSLL plates (La0009, Lb00011, La00012, Lf0001, Lh0001)
were fingerprinted using the HICF methodology of Nelson et al. (2005)
and placed onto the HICF maize FPC map. This is significant because
it demonstrates that MSLL-sized clones may be used to anchor gene-rich
contigs onto a physical map, which would not have been possible using
agarose fingerprinting due to the shorter size of most of the MSLL clones.
The fingerprints were screened to require at least half of the
minimum expected band count for the size range of the clone library
(using a conversion factor of 1 band = 1.2 kb). Clones from the a and b
libraries were required to have at least 14 bands, those from the f library
25, and those from the h library 41.
A total of 1022 successful fingerprints were obtained, of which 923 were placed onto the
FPC map using FPC's "Keyset-->FPC" function.
Placements were checked for accuracy using the sequenced BAC alignments
found above. 42 placements could be checked this way, of which
41 agreed with the FPC placement.
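The band-count screen can be expressed as simple arithmetic. The sketch below applies the stated rule (half the expected band count for the smallest clone in the library, at 1 band = 1.2 kb) and also inverts it to show the minimum clone size each threshold would imply. The implied sizes are derived here for illustration and are not values reported by the project; rounding to the nearest whole band is an assumption.

```python
KB_PER_BAND = 1.2  # conversion factor quoted in the text


def required_bands(min_clone_kb):
    """Half of the band count expected for the smallest clone in a library."""
    return round(min_clone_kb / KB_PER_BAND / 2)


def implied_min_clone_kb(bands_required):
    """Invert the rule: smallest clone size consistent with a threshold."""
    return 2 * bands_required * KB_PER_BAND


# Thresholds from the text, keyed by library suffix.
for library, bands in {"a,b": 14, "f": 25, "h": 41}.items():
    print(library, bands, round(implied_min_clone_kb(bands), 1), "kb")
```

Under these assumptions the thresholds correspond to minimum clone sizes of roughly 34 kb, 60 kb, and 98 kb for the a/b, f, and h libraries.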