We generated simulation results for automatically selecting a minimal tiling
path (MTP) based on different criteria. This document presents the results so
that users will have an idea of what to expect from their MTPs.
We created simulated datasets from the rice chromosome 3
pseudomolecule for a total of 24 datasets:
For each dataset, the clones were assembled into FPC contigs. The tolerance for agarose was 7 and for HICF was 4. For each dataset, the simulation software used a binary search to find the highest possible cutoff before getting the first false positive overlap. The same set of datasets was used for all the following tables.
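The cutoff search described above can be sketched as follows. This is only an illustration, not the simulation software itself: the function name and the `has_false_positive` callback are hypothetical, and the sketch assumes that once a cutoff produces a false positive overlap, every less stringent cutoff does too.

```python
def highest_clean_cutoff(candidates, has_false_positive):
    """Return the least stringent candidate that yields no false positive overlap.

    candidates: cutoffs ordered from most stringent to least stringent.
    has_false_positive: hypothetical callback that assembles the clones at a
        cutoff and reports whether any false positive overlap appears.
        Assumed monotone: False for a prefix of candidates, True afterwards.
    Returns None if even the most stringent candidate gives a false positive.
    """
    lo, hi = 0, len(candidates) - 1
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if has_false_positive(candidates[mid]):
            hi = mid - 1          # too permissive: look at stricter cutoffs
        else:
            best = candidates[mid]
            lo = mid + 1          # clean: try a less stringent cutoff
    return best


# Toy usage: integers stand in for cutoffs; anything above 12 is "bad".
print(highest_clean_cutoff(list(range(1, 21)), lambda c: c > 12))
```

Each probe corresponds to one assembly run, so the search needs only about log2(n) assemblies for n candidate cutoffs.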
The quality of minimal tiling paths was evaluated by
reference to the known positions of the clones in the
pseudomolecule.
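As a rough illustration of this kind of evaluation, the sketch below scores an MTP against known clone coordinates. The data layout (a dictionary of `(start, end)` positions and an ordered list of clone names) and the function names are assumptions made for the example, not the format used by the simulation software.

```python
def true_overlap(a, b):
    """Signed overlap in bp between two (start, end) intervals.

    A positive value is a real overlap; zero or negative means the
    clones do not overlap on the pseudomolecule.
    """
    return min(a[1], b[1]) - max(a[0], b[0])


def evaluate_mtp(path, positions):
    """Summarize an MTP given the known position of every clone.

    path: clone names in tiling order.
    positions: dict mapping clone name -> (start, end) on the pseudomolecule.
    """
    overlaps = [true_overlap(positions[x], positions[y])
                for x, y in zip(path, path[1:])]
    real = [o for o in overlaps if o > 0]
    return {
        "pairs": len(overlaps),
        "false_overlaps": len(overlaps) - len(real),
        "avg_overlap": sum(real) / len(real) if real else 0.0,
    }


positions = {"a": (0, 100), "b": (80, 200), "c": (250, 350)}
print(evaluate_mtp(["a", "b", "c"], positions))
```

Here the pair a/b truly overlaps by 20 bp, while b/c is separated by a gap and is counted as a false overlap.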
In all the following tables, the default MTP parameters were used unless stated otherwise.
Weight parameter: The user may influence the tradeoff
between minimizing overlaps and the risk of false positive
overlaps in the MTP through the variation of the "weight"
parameter. An indication of the effect of this parameter is
given in the following two graphs, which show average
overlap and percentage of false overlaps in the MTP as a
function of the weight parameter for several data sets.
These graphs may be used as guidelines for the effect of
changing the weight parameter value, but should not be
interpreted as a prediction of the results in any single
case. The performance of MTP and the effect of the weight
parameter will vary depending on the quality of the data
set, the digestion method, the FPC assembly, and the values
of other MTP parameters.
Agarose method:
BSS method using BES-draft
Draft sequence and BESs may also be used to find overlapping
clones as a basis for MTP. We created sequence assemblies from
simulated fragments at 1x, 2x, and 3x coverages of the rice
chromosome 3 pseudomolecule. Using the 10x with 6% error datasets,
BESs were created from the ends of clones.
No error was introduced into the sequence. Additionally, the SeqCtgs are all correct;
that is, we simulated extracting sequences of 800 bp, then computed the SeqCtg based on the known coordinates.
The BSS (Blast Some Sequence) function in FPC was used to align the BESs to the
draft sequence, and then the MTP was used to find overlapping clone pairs
based on the BSS results. The MTP software allows positive only overlaps, or negative
overlaps. The idea behind negative overlaps is that the draft sequence bridges the two
clones, and hence, the full sequence is known. We have the following types of results:
- Negative allowed:
  - Clones overlap and the BESs match a SeqCtg at the same location.
  - Clones overlap and the BESs match a SeqCtg from a different location; this is not a false overlap, though
    it is a false SeqCtg.
  - Clones do not overlap but are bridged by a SeqCtg at the same location.
  - Clones do not overlap but are bridged by a SeqCtg from a different location. In this case, using the
    SeqCtg to bridge the two clones would be incorrect.
- Positive only: the same four situations as above, except that we must count false overlaps as such,
  even if there is a correct bridging SeqCtg, since the clones do not overlap.
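The cases listed above can be expressed as a small classification routine. This is a simplified sketch under stated assumptions: clones and SeqCtgs are given as `(start, end)` intervals on the pseudomolecule, and a SeqCtg is treated as being "at the same location" when its true interval intersects both clones. The function and key names are hypothetical.

```python
def _intersects(x, y):
    """True if two (start, end) intervals share at least one base."""
    return min(x[1], y[1]) - max(x[0], y[0]) > 0


def classify_pair(clone_a, clone_b, seqctg, allow_negative):
    """Classify a BSS-based clone pair against known coordinates.

    clone_a, clone_b: true (start, end) of the two clones.
    seqctg: true (start, end) of the SeqCtg that the BESs matched.
    allow_negative: True for the "negative allowed" mode,
        False for the "positive only" mode.
    """
    clones_overlap = _intersects(clone_a, clone_b)
    # Assumption: a SeqCtg from the right location touches both clones.
    seqctg_ok = _intersects(seqctg, clone_a) and _intersects(seqctg, clone_b)
    return {
        "clones_overlap": clones_overlap,
        "false_seqctg": not seqctg_ok,
        # A gap between clones is acceptable only when negatives are allowed.
        "negative_overlap": allow_negative and not clones_overlap,
        "false_overlap": (not allow_negative) and not clones_overlap,
    }


# Two clones separated by a 20 bp gap, bridged by a correctly placed SeqCtg.
print(classify_pair((0, 100), (120, 220), (90, 130), allow_negative=True))
print(classify_pair((0, 100), (120, 220), (90, 130), allow_negative=False))
```

The same pair is reported as a negative overlap in the first mode and as a false overlap in the second, which mirrors the distinction between the two modes.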
We provide two tables, one with positive only overlaps allowed and one allowing negative overlaps.
For the first case, there is a column for F+ overlaps and in the second case there is a
column for Negative overlaps. In both cases, there is
a column for F+ SeqCtg, indicating the wrong SeqCtg bridges a negative overlap,
as these are the true false positives for BES->Draft.
When only using draft sequence to determine overlaps, there are generally not enough defined overlaps to
determine an MTP across entire contigs; hence, we have the concept of expressways and junctions. An
expressway is an MTP that does not go across the contig, i.e. there may be multiple expressways
in a contig. A junction is where two end clones of expressways overlap, but they were not defined
to have an acceptable overlap (e.g. there was no draft sequence covering the respective BESs). These
can cause large overlaps, so we suggest that the BSS method never be used by itself, but always in conjunction
with fingerprint overlaps. When using both, the MTP algorithm will always pick the BES-draft when possible, but otherwise
use fingerprints. The two following tables, which use only BSS overlaps, are intended only to illustrate the small overlaps
that can be detected. The total overlap is the sum of the MTP and junction overlaps.
The following two tables use the 10x with 6% error datasets.
Using the same data sets, we also ran MTP using both
fingerprint-based pairs and BSS-based pairs with positive overlap
only option.
If band sizes are available in the Sizes directory of an FPC project,
you may choose to use these for calculation of clone overlaps instead of the
number of bands times the average restriction fragment size. The following
table illustrates the differences in the MTP results as a consequence
of using the accumulated sizes versus an approximation based on the number of bands.
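To make the difference between the two estimates concrete, here is a minimal sketch. It assumes bands are given directly as fragment sizes in bp and that two bands are shared when their sizes differ by no more than a tolerance in the same units. FPC's own band matching may differ, so the function names and matching rule are illustrative only.

```python
def shared_bands(a, b, tol):
    """Greedily pair up bands from two clones whose sizes agree within tol.

    Returns the mean size of each matched pair.
    """
    a, b = sorted(a), sorted(b)
    i = j = 0
    matched = []
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matched.append((a[i] + b[j]) / 2)
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return matched


def overlap_estimates(a, b, tol, avg_fragment_size):
    """Return (estimate from band count, estimate from accumulated sizes)."""
    matched = shared_bands(a, b, tol)
    return len(matched) * avg_fragment_size, sum(matched)


by_count, by_size = overlap_estimates(
    [1000, 2500, 4000, 9000], [1005, 4000, 6000], tol=10, avg_fragment_size=4096
)
print(by_count, by_size)
```

In this toy case the two shared bands happen to be smaller than the assumed average fragment, so the count-based estimate (8192 bp) is larger than the accumulated-size estimate (5002.5 bp).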
The table is based on the average of three
simulated data sets of a 15x library of rice chromosome 3
pseudomolecule with agarose digestion at a 0% error rate.
Another way in which size information can be used by MTP is to modify
the shortest path algorithm so that longer clones are used
preferentially. An example of the effect of selecting the "Give
preference to large clones" option is given in the following table,
which shows an average over the same data sets as used in the previous
table.
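One way to picture a shortest path search that prefers longer clones is sketched below. This is not FPC's actual cost function, which is not described here: the graph layout, the per-clone penalty, and all names are assumptions chosen purely to illustrate the idea that a larger clone can be made cheaper to add to the path.

```python
import heapq


def tiling_path(edges, sizes, start, goal, prefer_large=False):
    """Dijkstra search over a graph of acceptable clone overlaps.

    edges: dict mapping a clone to the clones it acceptably overlaps.
    sizes: dict mapping a clone to its size.
    Each clone added to the path costs 1; with prefer_large, smaller
    clones incur an extra penalty between 0 and 1 (hypothetical scheme).
    Returns the list of clones from start to goal, or None if unreachable.
    """
    max_size = max(sizes.values())

    def cost(clone):
        penalty = (max_size - sizes[clone]) / max_size if prefer_large else 0.0
        return 1.0 + penalty

    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in edges.get(u, []):
            nd = d + cost(v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]


edges = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
sizes = {"a": 100, "b": 80, "c": 150, "d": 100}
print(tiling_path(edges, sizes, "a", "d", prefer_large=True))
```

Both routes from a to d use three clones, so without the option they tie; with the option enabled the route through the larger clone c is chosen.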
From our discussions with biologists who have selected MTPs by hand,
the automatic scripts do as well as a person can do by hand. As these
results show, using fingerprints is not the optimal way to select an
MTP. Using BESs and draft is obviously optimal. Another approach is to
sequence 'seed' clones and then find the best neighbor to sequence
based on finding the clone BES nearest the end. For this approach, we
are currently working on developing an algorithm to select the next
clone for sequencing based on a draft sequenced clone and BESs; note,
this will also provide some ordering of the sequenced contigs.