FPC V8: A tutorial

 

F. Engler and C. Soderlund

Arizona Genomics Computational Laboratory

BIO5 Institute, University of Arizona, Tucson AZ 85721

Corresponding author:  cari@agcol.arizona.edu

April 2006

 

This manuscript is modified from "FPC: A software package for physical maps", in Ian Dunham (ed.), Genomic Mapping and Sequencing, Horizon Press, Genome Technology series. The introduction has been removed, and the manuscript has been edited by William Nelson to update it to FPC V8.0. This work was funded by USDA/IFAFS grant #11180.

 

 

We have written this tutorial to cover the salient features of FPC. It is supplemented by the FPC help, which can be accessed from most FPC windows and is also available as an HTML file at http://www.agcol.arizona.edu/software/fpc/FPChelpdoc.htm. This tutorial covers the features we used to build the maize physical map: incrementally assembling the map, ordering contigs based on framework markers, adding markers and remarks, and searching. Apart from merging and adding remarks, we do not cover the editing functions, as these are nearly obsolete; if needed, they are described in the User's Manual.[1] We briefly describe comparing multiple gel images using the Gel Image window. This feature is used in selecting a minimal tiling path (MTP), which is covered by Humphrey and Mungall (2002). (Note that FPC version 7 and later contains an automated MTP selection function.)

        


Table of Contents

 

Analysis

Tolerance and cutoff

CB maps and CB units

Q clones

Getting started

Some Unix basics

Installing FPC

Downloading the demo files

Building a physical map with FPC

Creating a new project

The DQer

Incremental Builds

Adding remarks and markers

Manually adding remarks

Adding remarks and markers from a file

Searching

Finishing a project

Merging Contigs

Adding singletons

Verify overlap

BIBLIOGRAPHY


Analysis

 

We start by describing the major aspects of the analysis. You may want to skip this part and return to the individual sections when they are referenced during the tutorial. We assume that the reader is familiar with restriction-digest fingerprinting (Marra et al. 1997).

 

Tolerance and cutoff

The bands of two clones are compared to determine the probability that their shared bands match by chance. The FPC assembly algorithm uses two user-defined variables for measuring clone overlap: tolerance and cutoff. The tolerance determines how closely two bands must match to be considered the same band. If you are using migration rates, a fixed tolerance is used; that is, the same tolerance applies regardless of the band value. If you are using sizes, a variable tolerance is generally used; see Soderlund et al. (1997a). The probability that the matching bands are just a coincidence is computed, and the cutoff is a threshold on this probability score. If the score is below the cutoff, the two clones are said to overlap, i.e. the matching bands are unlikely to be a coincidence. The cutoff is expressed in scientific notation: 1e-03 is the same as 0.001, and 1e-05 is 0.00001. A larger negative exponent (e.g. 1e-12 versus 1e-05) is a lower score, and a lower cutoff means a higher stringency. We will usually refer to high or low stringency when discussing the cutoff value. The equation used for comparing two clones is as follows:

$$\mathrm{Prob}(M) = \sum_{j=M}^{n_L} \binom{n_L}{j} (1-p)^{j}\, p^{\,n_L - j}$$

where $p = (1-b)^{n_H}$, $b = 2t/\mathit{gellen}$, $t$ is the tolerance, $\mathit{gellen}$ is the number of possible values for bands, $n_L$ and $n_H$ are the minimum and maximum numbers of bands for the two clones ($n_L \le n_H$), and $M$ is the number of shared bands. Because the tolerance is used in the equation, it is best set at the beginning of your analysis and never changed; a change requires reassembly of the entire database. Because the number of bands is also used in the equation, two clone pairs with the same number of matching bands may have very different probabilities of coincidence (see Table I).
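To make tolerance and cutoff concrete, the following Python sketch counts matching bands with a fixed tolerance and evaluates the probability of coincidence as the binomial sum over j = M..n_L of C(n_L, j)(1-p)^j p^(n_L-j), using the variables defined above. It is an illustration under stated assumptions, not FPC's code: the two-pointer band pairing is a simplification of however FPC pairs bands, the function names are hypothetical, and the gellen value of 3300 in the example is an assumed illustrative value, not one taken from this tutorial.

```python
import math

# Hypothetical helpers illustrating the tolerance/cutoff model; not FPC's API.


def count_matching_bands(bands1, bands2, tol):
    """Count band pairs (one band from each clone) that differ by at most tol.

    Simplified two-pointer pairing over sorted band values with a fixed
    tolerance (as used for migration rates); each band is used at most once.
    """
    a, b = sorted(bands1), sorted(bands2)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:   # within tolerance: treat as the same band
            matches += 1
            i += 1
            j += 1
        elif a[i] < b[j]:             # otherwise advance the smaller band
            i += 1
        else:
            j += 1
    return matches


def coincidence_probability(n1, n2, m, tol, gellen):
    """Probability that m or more shared bands arise by coincidence."""
    n_low, n_high = min(n1, n2), max(n1, n2)
    b = 2 * tol / gellen          # chance a random band lands within tol of a given band
    p = (1 - b) ** n_high         # chance a band matches none of the n_high bands
    return sum(
        math.comb(n_low, j) * (1 - p) ** j * p ** (n_low - j)
        for j in range(m, n_low + 1)
    )


def clones_overlap(score, cutoff=1e-12):
    """Two clones are called overlapping when the score is below the cutoff."""
    return score < cutoff


if __name__ == "__main__":
    # Illustrative only: tol=7 as in the demo build; gellen=3300 is an assumption.
    for n2 in (38, 16):
        score = coincidence_probability(52, n2, 12, tol=7, gellen=3300)
        print(n2, f"{score:.1e}", clones_overlap(score))
```

With the same 12 shared bands, the pair with fewer bands in the smaller clone gets the lower (more significant) score, which is the effect Table I illustrates; the exact numbers depend on the tolerance and gellen actually used.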

 

Bands (Clone 1)    Bands (Clone 2)    Matching Bands    Prob. of Coincidence
52                 38                 12                3e-02
52                 16                 12                3e-06

Table I.  Number of Matching Bands Versus the Probability of Coincidence.  Even though the two clone pairs have the same number of matching bands, they have different probabilities of coincidence.

 

CB maps and CB units

FPC orders overlapping clones and puts them into contigs based on the probability of coincidence scores. As it orders the clones, it also tries to order their bands, which gives a more precise definition of the clone endpoints. As shown in Soderlund et al. (2000), better data yields more precise endpoints. Even with the high-quality data being produced today, much ambiguity remains: (1) two bands may have the same length but be different, (2) two bands may differ by more than the tolerance but be the same, (3) bands may be missing, and (4) there may be extra bands; for example, many digests result in end bands. Therefore, the endpoints can slip; but unless the data is of especially low quality or contains Q clones (described in the next section), clones that should overlap according to the cutoff do overlap. Also, the algorithm is greedy: to save time, it does not try all possible combinations and therefore cannot guarantee the best solution. Instead, it tries a number of different solutions, each starting with a different clone, and keeps the best one (the number of tries is adjustable by the user and defaults to 10).
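The greedy, multi-start idea can be illustrated with a toy example. The Python sketch below is not FPC's CB-map algorithm: it orders clones by repeatedly appending the unplaced clone that shares the most bands with the last placed clone, scores each ordering by the bands shared between neighbours, and keeps the best of up to n_starts runs, each from a different start clone (n_starts=10 mirrors the default described above). The shared-band table and all function names are hypothetical.

```python
import random

# Toy illustration of greedy ordering with multiple starts; not FPC's algorithm.


def greedy_order(start, shared):
    """Extend an ordering from `start`, always appending the unplaced clone
    that shares the most bands with the last placed clone."""
    order = [start]
    remaining = set(shared) - {start}
    while remaining:
        last = order[-1]
        # Ties are broken by clone name so the result is deterministic.
        nxt = max(remaining, key=lambda c: (shared[last].get(c, 0), c))
        order.append(nxt)
        remaining.remove(nxt)
    return order


def order_score(order, shared):
    """Sum of shared bands between neighbouring clones (higher is better)."""
    return sum(shared[a].get(b, 0) for a, b in zip(order, order[1:]))


def best_of_n(shared, n_starts=10, seed=0):
    """Run the greedy ordering from up to n_starts distinct start clones
    and keep the highest-scoring ordering."""
    clones = sorted(shared)
    rng = random.Random(seed)
    starts = rng.sample(clones, min(n_starts, len(clones)))
    return max((greedy_order(s, shared) for s in starts),
               key=lambda o: order_score(o, shared))
```

Starting from a poor clone (for example a middle clone) yields a worse ordering, which is why trying several start clones and keeping the best helps even though no single greedy run is guaranteed to be optimal.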

 

Figure 1. A CB map displayed in FPC. The consensus bands are shown along the left. The tick marks represent partially ordered groups. The {+,x,o} character columns represent the clones. A '+' indicates a match with the band to the left within the tolerance, an 'x' indicates a match within twice the tolerance, and an 'o' indicates no match. The number of extra bands for each clone is listed under the clone name.

 

 

 

The ordering of clones and their fragments is called a Consensus Bands (CB) map, an example of which is shown in Figure 1.  The coordinate system used in the contig display is in CB units: each distinct band is one unit of measurement. The length of each clone is equal to the number of bands in the clone. Endpoint coordinates are assigned as follows: N is equal to the number of bands in the clone divided by 2, M is the midpoint of the location of the clone in the CB map, M-N is the left coordinate of the clone and M+N is the right coordinate of the clone.  The left endpoint of the contig is set to zero, but can go negative. Note that the coordinates do not have any meaning relative to the chromosome until they are mapped by a framework marker.
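The endpoint rule can be written as a small function. In this sketch (the function name is hypothetical), M is the clone's midpoint in the CB map, not the shared-band count M of the coincidence equation, and fractional coordinates are kept as-is; whether FPC rounds them is not stated here and is left unmodeled.

```python
def cb_endpoints(n_bands, midpoint):
    """Hypothetical helper: CB-unit endpoints of a clone.

    N = n_bands / 2 and M = midpoint, giving (M - N, M + N), so the clone
    spans n_bands CB units. Any rounding FPC may apply is not modeled.
    """
    half = n_bands / 2
    return midpoint - half, midpoint + half


left, right = cb_endpoints(30, 100)
print(left, right, right - left)  # 85.0 115.0 30.0
```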

 

Q clones

           

A large number of Q clones generally results from one or more false positive overlaps. Say clone x from contig A falsely overlaps clone y from contig B. As clones from contig A are being added to the CB map, adding clone x brings in clone y, which in turn brings in all of contig B. Since there is no way to order two contigs linearly in the same space, the clones of the second contig end up in a stack (see Figure 2). The CB map software recognizes that it cannot order the bands of these clones and consequently marks them as Q clones. In assembly, a low-stringency cutoff results in contigs with many Q clones (i.e. many false positives), while a high-stringency cutoff results in too many contigs (i.e. many false negatives). Empirical evidence shows that for BAC clones with an average of 28-35 bands, a 1e-12 cutoff works well to minimize both the number of contigs and the number of contigs with many Q clones. Note: it is not unusual to have a few Q clones in a contig due to poor fingerprints or to the greedy nature of the assembly algorithm.

 

Figure 2: Contig with Q clones. The stack of clones in the center indicates a false positive (F+) overlap.

 

 
 

 

 

 


Getting started

 

Some Unix basics

 

FPC runs under Unix and Linux. An extensive knowledge of Unix is not necessary to use FPC effectively. Users must know how to log on to a Unix terminal and should have a basic knowledge of the directory structure. In this tutorial, any necessary commands are given as they are used. Two basic commands to know are cd, which changes the current directory, and ls, which lists the files in the current directory.

 

Installing FPC

 

If FPC is not installed on your system, ask your system administrator to download an FPC executable from http://www.agcol.arizona.edu/software/fpc and place it in a shared area for all users to access.  Currently, executables for Solaris, Linux, and Mac are provided.  If none of these match your machine type, you will need to have your system administrator download the source code and compile an executable in order to run FPC.

 

Downloading the demo files

 

Download  demo.tar from http://www.agcol.arizona.edu/software/fpc.  When this file has finished downloading, type tar xvf demo.tar on the command line and press return.  This action creates a directory called demo in your current directory.  Type cd demo to move into that directory.  Then type ls.  The following files should be listed:     

copyNew.pl    files/    Image/       Sizes/
cleanup.pl    Gel/      Newbands/

 

Image, Sizes, and Gel are directories, and the files in them are generated by the Image program (see www.sanger.ac.uk/Software/Image). NOTE: if at any time during the tutorial you wish to return the demo to its initial condition, type ./cleanup.pl on the command line while in the demo directory. This restores all files and directories to their original condition so that you can restart the demo from the beginning.

 

Building a physical map with FPC

 

Creating a new project

 

The commands covered in this section:

 

From the demo directory, start FPC by typing fpc on the command line. The Main Menu window appears (see Figure 3). Right-click on the button labeled File... and a menu appears. Select Create new project from the menu, and the window shown in Figure 4 appears. Choose a name for your project and type it in the File: text entry. For this demo, type the name "demo". Click OK; a demo.fpc file is created, and the following is written to the terminal window:

 

Serial implementation

Adding Bands Directory

Configuration file demo.fpp not found. Therefore will use defaults.

New project is initalized.

 

Figure 3: The Main Window. This is the first window you will see when you start FPC.

 

 

 

Figure 4: From the File... menu, choose Create new project, and this window appears. Enter a name.

 

 

 

Now click on the Update .cor button on the Main Menu window. This function moves all migration rate files from the Image directory to a newly created Bands directory.[2] It also creates the file demo.cor, which FPC uses to read the migration rates of clones. When this function has completed, the last few lines written to your terminal window will be as follows (with your own path in place of /u/efriedr):

 

Read 311 files. Add 345 gel entries and 9456 bands.

Cor file has 9456 bands.

Saving File /u/efriedr/demo/demo.fpc ......Done

 

Click on the Main Analysis button on the Main Menu window and the Main Analysis window opens (see Figure 5). 

 

Figure 5: Main Analysis

 

 

Change the cutoff to 1e-12 in the Cutoff text box. Leave all other values unchanged.  Next, click on the Build Contigs (Kill/Calc/OkAll) button.  This starts the map-building process.  This process may take several hours for large clone libraries, but for the demo, it should only take a few seconds.  When this process completes, the last few lines on your terminal window will be:

 

Complete Build: Tol 7 Cut 1e-12 Bury~ 0.10 Best 10

Singles 5 AvgOverlap 3.8  AvgScore 0.871 Qs 30(1) (<=5Qs 0  >5Qs 1)

Create 4 contigs (1:4): Max 121, 3 (>50), 1 (50:26), 0 (25:4), 0 (3:2)

NxN Pairs: Real time 0.140s   User time 0.150s   Sys time 0.000s

Layout:    Real time 0.540s   User time 0.510s   Sys time 0.000s

 

Note that the number of Qs may be 30 or 31, and the times will vary. The Project window pops up, as shown in Figure 6a. The assembly resulted in four contigs. Double-click on the row of contig 2 to display it, as shown in Figure 7.

 

Figure 6.  The Project window.  (a) Shows the window after an initial build, (b) after the DQer was run, and (c) after new clones were incorporated and the IBC was run.  Note the change in Q clones from (a) to (b), and the merging of contigs from (b) to (c).

 

Not all clones are shown, because redundant clones are buried: if a clone has a set of bands similar to that of another clone, it can be buried in the second clone. Click the Yes button underneath the Show buried clones label, and all the clones are shown (see Figure 7). Click on a clone that has an "*" at the end of its name; the "*" means that it has buried clones, and the buried clones are highlighted. A clone name ending with "=" has all the same bands as its parent clone. A clone name ending with "~" has approximately the same set of bands as its parent clone. Click on the No button to switch back to the buried state. You can zoom by holding the mouse button down on the slider within the ruler under the Zoom label and moving it one way or the other; alternatively, you can click in the grey area of the ruler. To scroll the contig display, move the mouse pointer towards the right of the map and click the middle mouse button (not available with a two-button mouse). The map scrolls to the left. To move back, position the pointer towards the left of the map and click. You can also use the ruler at the bottom of the display.
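The "=" and "~" suffixes can be illustrated with a simple classifier. This Python sketch is an assumption-laden illustration, not FPC's burying code: it pairs bands within a fixed tolerance, labels a clone "=" when every one of its bands matches a parent band, and "~" when the fraction of unmatched bands is at most approx_fraction. Reading the Bury~ 0.10 value in the build output as this fraction is our assumption, and the function names are hypothetical.

```python
# Hypothetical sketch of exact ("=") vs approximate ("~") burying; not FPC's code.


def _matched(child, parent, tol):
    """Count child bands paired with a distinct parent band within tol."""
    a, b = sorted(child), sorted(parent)
    i = j = m = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            m += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return m


def bury_symbol(child, parent, tol, approx_fraction=0.10):
    """Return '=', '~', or None for a candidate buried clone.

    approx_fraction is an ASSUMED reading of the 'Bury~' build parameter.
    """
    unmatched = len(child) - _matched(child, parent, tol)
    if unmatched == 0:
        return "="                       # every band found in the parent
    if unmatched <= approx_fraction * len(child):
        return "~"                       # nearly all bands found in the parent
    return None                          # too different to bury
```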

 

 

Figure 7. The Contig display. Yes was selected under Show buried clones, so all clones are shown. Selecting a clone highlights its buried clones in green.

 

To close all windows at once except the Main Menu window, select Clean Up on the Main Menu.

 

The DQer

 

Commands covered in this section:

 

Open the Project window by double-clicking on the bold-faced project name (demo) at the top of the Main Menu. Look at the column with the heading 'Qs' on the Project window. Notice that three contigs have a 0 in this column, while contig 3 has 30. (Contig 0 contains all clones that could not be placed in the map; hence, Q clones do not apply.) After an initial build with a moderate cutoff, the contigs with many Qs need to be re-run at a lower (more stringent) cutoff. The DQer performs this function automatically. Click the Main Analysis button on the Main Menu. Towards the bottom you will see a button labeled DQer with two text entries to the right of it. The first text box (if >= 5 Qs) determines how many Qs a contig must have in order to be re-evaluated. Empirical evidence has shown that a value around 5 yields good results. The second text box (Step 1) is relevant for HICF (see the HICF tutorial at www.agcol.arizona.edu) and will be ignored for this demo. Click on the DQer button, and the reanalysis starts. Contigs with at least the specified number of Qs are reassembled up to three times, with cutoffs of 1e-13, 1e-14, and 1e-15. The software tries to merge the resulting CB maps by comparing the end clones at a lower stringency. If the CB maps cannot be merged, one or more new contigs are created. When the DQer is done, the Project window pops to the front, as shown in Figure 6b. A contig with many Qs may not change if lowering the cutoff three times makes no difference, that is, if all clones remain in the same contig and the number of Qs remains high. This indicates contamination or a very repetitive fingerprint.
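The DQer procedure described above can be sketched as a loop. In this Python sketch, assemble is a hypothetical callback standing in for FPC's contig assembly; it takes a list of clones and a cutoff and returns (clones, number_of_Qs) pairs. Stopping as soon as every resulting piece falls below the Q threshold is our assumption, and the step that merges CB maps by comparing end clones at a lower stringency is not modeled.

```python
# Hypothetical sketch of DQer-style re-assembly for one contig; not FPC's code.


def dq_contig(clones, n_qs, assemble, q_threshold=5,
              cutoffs=(1e-13, 1e-14, 1e-15)):
    """Re-assemble a contig at successively more stringent cutoffs.

    Returns (pieces, cutoff_used), where pieces is a list of (clones, n_qs)
    pairs; cutoff_used is None if the contig was not re-evaluated.
    """
    if n_qs < q_threshold:
        return [(clones, n_qs)], None        # too few Qs: leave the contig alone
    pieces = [(clones, n_qs)]
    for cutoff in cutoffs:                   # up to three re-assemblies
        pieces = assemble(clones, cutoff)
        if all(q < q_threshold for _, q in pieces):
            return pieces, cutoff            # assumption: stop once every piece is clean
    # Still many Qs after the last cutoff: possibly contamination or repeats.
    return pieces, cutoffs[-1]
```

If assemble keeps returning a single piece with many Qs, the contig comes back unchanged after the last cutoff, matching the case described above.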

Save the current contigs by clicking on the Save .fpc button on the Main Menu window. The ONLY time the FPC project is saved automatically is after an Update .cor. Therefore, whenever you have made changes that you want to keep, save immediately. You can save any number of times during an FPC session. The benefit of saving often is that if you make a mistake (e.g. merge two contigs) and then decide you did not want to do that, you can quit and restart FPC from your last save. Select the Quit button on the Main Menu to exit FPC.

Type ls on the command line to see the new files created by FPC.  The following should now be listed:

Bands/            demo.cor.backup         files/            Sizes/

cleanup.pl       demo.fpc                Gel/       

copyNew.pl        demo.fpc.backup         Image/     

demo.cor          demo.fpp                Newbands/

 

Incremental Builds

           

Commands covered in this section:

 

We will now add some additional clones to our FPC project. Generally, as new gels are band-called, Image places the files in the Image, Sizes, and Gel directories. For this demo, a new set of files was temporarily put into the Newbands directory, and the files can be moved to the correct locations using the copyNew.pl Perl script. From the demo directory, type ./copyNew.pl on the command line to copy the files from the Newbands directory into their respective Image, Gel, and Sizes directories. Then launch FPC with the previously created project (we called it "demo") by typing fpc demo on the command line. When the Main Menu window appears, click on the Update .cor button. This copies the files from the Image directory to the Bands directory and updates the demo.cor file with the new migration rates. We are now ready to add the new clones to our map. Open the Main Analysis window. The cutoff should still be set at 1e-12. Click on the Incremental Build Contigs button. This adds the new clones to our map and merges contigs if the new information allows it. When the build is done, the Project window pops to the front (see Figure 6c). Contigs 1 and 4 have been merged, indicating that one or more of the new clones overlap both. Save the new map by clicking on the Save .fpc button on the Main Menu.
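The contig merging performed by an incremental build can be illustrated with a union-find sketch. It is a deliberate simplification, not FPC's Incremental Build Contigs: each new clone is given the set of existing clones it overlaps at the current cutoff, it joins their contig, and any contigs it bridges are merged under the lowest contig number (a choice made for this illustration). New clones with no hits are returned as unplaced, overlaps among new clones are ignored, and CB maps are not rebuilt. All names are hypothetical.

```python
# Hypothetical union-find sketch of merging contigs bridged by new clones.


def incremental_merge(contigs, new_hits):
    """contigs: {contig_id: set(clone names)}.
    new_hits: {new_clone: set(existing clones it overlaps)}.
    Returns (merged_contigs, unplaced_new_clones)."""
    parent = {cid: cid for cid in contigs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]    # path halving
            x = parent[x]
        return x

    clone_to_contig = {c: cid for cid, cs in contigs.items() for c in cs}
    placements, unplaced = {}, []
    for new, hits in new_hits.items():
        roots = sorted({find(clone_to_contig[h]) for h in hits
                        if h in clone_to_contig})
        if not roots:
            unplaced.append(new)             # no overlaps: stays a singleton
            continue
        for other in roots[1:]:
            parent[other] = roots[0]         # bridged contigs join the lowest id
        placements[new] = roots[0]

    merged = {}
    for cid, cs in contigs.items():
        merged.setdefault(find(cid), set()).update(cs)
    for new, cid in placements.items():
        merged[find(cid)].add(new)
    return merged, unplaced
```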

 

Adding remarks and markers

 

Manually adding remarks

           

Commands covered in this section:

 

Open the Contig display for contig 2 (via the Project window). At the top, in the text entry labeled Search, type b0297K22 and hit Return. The clone is found in the contig, so it is highlighted as shown in Figure 8a. Click on the highlighted clone and the Clone window opens (see Figure 8b). Click on the Edit button in the top left corner and the Edit Clone window opens (see Figure 8c). Here we can change the attributes of the clone, including attaching remarks. In the text entry titled Remarks, type test_remark to add that remark to the clone. Click on Accept Edit. The Edit Clone window closes, and the newly added remark appears in the Clone text window. It is also shown in the Contig display; clicking on the clone highlights the remark and vice versa. Select Clean Up on the Main Menu before going on to the next section.

Figure 8.  (a) Click the highlighted clone to bring up the Clone text window. (b) The Clone text window. (c) The Edit Clone window.

 

 

 

Adding remarks and markers from a file

 

Commands covered in this section:

 

Remarks can often be generated from an external source, in which case it is faster to add them all at once automatically. Hence, FPC provides features for adding a list of remarks from an external file. The text file is a list with entries such as the following:

 

BAC : "b1046D08"

Remark  "new_add"