Assembly Archive User Guide

Table of Contents

  1. Available Data
  2. Finding the data
  3. Operating System and Browser Compatibility
  4. Viewing the results

Available Data

What types of data

The assembly archive was designed to accept two types of data:

  • Assembly instructions describing the location and sequence alignment of individual traces within an assembly
  • Assembly alignments describing the alignment of a set of traces to a given assembly

There is a widely held assumption that assembly instructions can be recapitulated by aligning the traces back to the assembly. However, this exercise will not always give the expected result. During the course of an assembly, traces are generally constrained to a single location, and heuristics and manual curation are often involved in producing the final consensus sequence. By storing the data used to construct the assembly, the archive captures this valuable information.

Source data

Assemblies held in the Assembly Archive are publicly available. The consensus sequence must be submitted to a sequence repository (GenBank/EMBL/DDBJ) and the traces must be submitted to the Trace Archive.

Available Organisms

Brucella suis, chromosome I and chromosome II

Bacillus anthracis, Ames ancestor strain

Finding the data

Currently, the available assemblies can be reached by clicking the "Browse" menu item. Search capabilities will be added in the future.

Operating System and Browser Compatibility

Supported browsers (by operating system):


Windows 2000/XP
Mac OS X
Unix/Linux

Viewing the results

A three-level view is available and displayed in three separate frames in the web browser.

The top frame

This frame contains a representation of the entire assembled molecule. A line with tick marks represents the assembly. Below this are graphs of clone coverage. The clone coverage is calculated on a per library basis and separate graphs are displayed. If there is any annotation associated with the assembly, this is displayed below the clone coverage graphs.
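The guide does not say how the per-library clone coverage is computed. As a rough illustration only, the sketch below assumes each clone is known by its library name and the span it covers on the assembly (0-based, end-exclusive), and counts how many clones from each library cover every position. The function name `clone_coverage` and the input layout are hypothetical, not part of the Assembly Archive.

```python
from collections import defaultdict


def clone_coverage(clones, length):
    """Return {library: [clone depth at each position]}.

    clones: iterable of (library, start, end) tuples, where start/end are
            0-based, end-exclusive coordinates of the clone on the assembly.
            This input layout is an assumption made for illustration.
    length: length of the assembled molecule in base pairs.
    """
    # One difference array per library: +1 where a clone starts,
    # -1 just past where it ends.
    diffs = defaultdict(lambda: [0] * (length + 1))
    for library, start, end in clones:
        start = max(0, start)
        end = min(length, end)
        if start >= end:
            continue  # clone lies outside the molecule
        diffs[library][start] += 1
        diffs[library][end] -= 1

    # A running sum over each difference array gives the depth per position.
    coverage = {}
    for library, diff in diffs.items():
        depth, running = [], 0
        for delta in diff[:length]:
            running += delta
            depth.append(running)
        coverage[library] = depth
    return coverage
```

Each list in the returned dictionary could then be drawn as one of the separate per-library graphs described above.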

The red box in the top frame controls the display and navigation of the middle frame. The coordinates listed in the boxes flanking the red box give the exact base pair positions that bound the region of interest.

The middle frame

This frame shows a more zoomed-in view of the region defined by the red box in the top frame. In this view, any annotation associated with the assembly is also shown. Below this are the locations of individual traces. The red box in this frame controls the display and navigation of the bottom frame.

The individual traces are colored according to the relationship they have with their mate pair.

  • Green traces: have the expected orientation and distance relationship with their mate
  • Red traces: have either unexpected orientation or distance relationship with their mate
  • Blue traces: have no mate in the assembly
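The color rules above can be sketched as a small classifier. This is only an illustration under stated assumptions: the "expected orientation" is taken to be the two reads pointing toward each other, and the "expected distance" is taken to be a per-library insert size range (`min_insert`, `max_insert`). Neither criterion is spelled out in this guide, and the names `Placement` and `mate_color` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Placement:
    start: int     # leftmost consensus coordinate of the trace
    end: int       # rightmost consensus coordinate (exclusive)
    forward: bool  # True if the trace aligns in the forward orientation


def mate_color(trace: Placement, mate: Optional[Placement],
               min_insert: int, max_insert: int) -> str:
    """Return 'green', 'red' or 'blue' for a trace, following the guide's legend."""
    if mate is None:
        return "blue"  # no mate in the assembly

    # Order the pair by position on the consensus.
    left, right = (trace, mate) if trace.start <= mate.start else (mate, trace)

    # Assumption: the expected orientation is inward-facing reads.
    inward = left.forward and not right.forward

    # Assumption: distance is the full span covered by the pair.
    insert = max(left.end, right.end) - left.start

    if inward and min_insert <= insert <= max_insert:
        return "green"
    return "red"  # wrong orientation, wrong distance, or both
```

For example, with an assumed insert range of 2000 to 4000 base pairs, a forward read at 0-500 whose reverse mate sits at 2500-3000 would be green, while the same pair with the mate at 9500-10000 would be red.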

Trace and mate information can be obtained by placing the mouse over a trace of interest. A 'tool tip' will appear, giving you information regarding the mate pair. Red traces will also give you the particular mate pair violation that occurred. The arrows in the boxes indicate the orientation of the trace with respect to the consensus sequence.

Placing the mouse over a particular annotated object will also produce a 'tool tip' pertaining to that object.

The bottom frame

The region shown is defined by the red box in the middle frame. This frame contains the multiple sequence alignment, with the consensus sequence at the top and the traces below it, ordered by their position in the genome. If the region shown is large, each sequence is represented as a grey line, with red lines marking regions of mismatch. As the degree of zoom increases, the grey lines are replaced by the actual bases. The chromatograms for the traces can also be seen in this view (see Navigation).
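To illustrate how mismatch regions of this kind could be derived, the sketch below compares one aligned trace row against the consensus row and returns the runs of columns where they differ. It assumes both rows are already padded to the same length with a gap character, and it treats a gap opposite a base as a difference. The function name `mismatch_runs` is hypothetical and this is not the viewer's actual code.

```python
def mismatch_runs(consensus, trace):
    """Return [(start, end), ...] column ranges (0-based, end-exclusive)
    where an aligned trace row differs from the consensus row."""
    if len(consensus) != len(trace):
        raise ValueError("aligned rows must have the same length")

    runs = []
    run_start = None
    for i, (c, t) in enumerate(zip(consensus.upper(), trace.upper())):
        if c != t:
            if run_start is None:
                run_start = i  # a new mismatch run begins here
        elif run_start is not None:
            runs.append((run_start, i))  # the run ended at the previous column
            run_start = None
    if run_start is not None:
        runs.append((run_start, len(consensus)))  # run reaches the end of the row
    return runs
```

Each returned range would correspond to one red segment on the otherwise grey line for that trace.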

Labels and an indication of the orientation of the traces relative to the consensus are shown in the left of this frame (Fig. 1).

Fig. 1. Identifiers for the assembly (gnl|TRACE_ASSM|ID) and for the individual traces (gnl|ti|id) are shown to the left of the frame. The green arrowheads indicate the orientation of the traces relative to the consensus sequence (which is forward by default). The boxes with the (+) sign control whether the chromatogram is shown or hidden (it is hidden by default). The region shown is surrounded by a light green box. The blue numbers indicate the region shown (in base pairs). The grey lines indicate that the sequence is in agreement with the consensus, while red lines in the grey bars indicate an alignment difference. The box made by the dashed red lines is a navigation device (described below).
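For readers who copy these labels out of the viewer, the two identifier forms named in Fig. 1 can be told apart with a few lines of code. The sketch below relies only on the gnl|TRACE_ASSM|ID and gnl|ti|id patterns given in the caption. The function name `parse_label` is hypothetical, and the ID values used in any call are made up.

```python
def parse_label(label):
    """Split a viewer label into (kind, identifier).

    kind is 'assembly' for gnl|TRACE_ASSM|ID and 'trace' for gnl|ti|id.
    """
    parts = label.strip().split("|")
    if len(parts) != 3 or parts[0] != "gnl":
        raise ValueError(f"not a gnl|database|id label: {label!r}")

    _, database, identifier = parts
    if database == "TRACE_ASSM":
        kind = "assembly"
    elif database == "ti":
        kind = "trace"
    else:
        raise ValueError(f"unrecognized database {database!r} in {label!r}")
    return kind, identifier
```

A call such as `parse_label("gnl|ti|12345")` separates the trace identifier from its prefix, which may be convenient when looking the trace up elsewhere.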

Navigation

Viewing chromatograms:

Fig. 2. Selected traces are outlined in red, and a reload button appears in the upper left corner.

By default, the chromatograms are hidden in the bottom frame. To view one or more chromatograms, click on the boxes containing the (+) sign to the right of the green arrowheads. Selected traces will now have a box outlined in red, and a reload button will appear in the upper left hand corner of the frame (Fig. 2).

Clicking the reload button will refresh the frame, and the chromatograms for the selected traces will be visible. Note that while the assembly itself is selectable, no chromatogram is available for this entry, because the assembly represents the consensus of the traces in the multiple alignment.

If the region being viewed is too large to show actual base pairs, the chromatogram cannot reasonably be drawn, and a graph of the quality scores is displayed instead. As the region shown becomes smaller, the view switches from the graph of quality scores to a rendering of the chromatogram.

Controlling the region being viewed: The red box in the top frame controls the region shown in the middle frame. The red box in the middle frame controls the region shown in the bottom frame. The bottom frame has its own red box, which also controls the region shown in the bottom frame. All of the red boxes function in a similar way and can be used either to change the level of zoom or to pan to a different area.

Panning: To pan to a different region, place the mouse over the ruler (found near the top of the frame) within the red box. The mouse pointer should change to a cross. Hold down the left mouse button and drag the box to the position of interest. When the box is in the correct area, double-click the ruler inside the red box and the view will be adjusted.

Zooming: Zooming in and out can be accomplished in two ways. The first is to drag an edge of the red box. Place the mouse over the left or right solid edge of the red box. When the mouse pointer changes to a horizontal line with double arrowheads, hold down the left mouse button and drag the edge to the point of interest. When the edge is placed appropriately, double-click the ruler inside the red box and the view will be adjusted.

Alternatively, the exact coordinates of interest can be entered. Within each red box there is a smaller red box containing the coordinates of the region being shown. Double-click in this box to replace those coordinates with the coordinates of interest. Once the correct coordinates have been entered, double-click the ruler inside the red box to adjust the view.
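The panning and zooming behavior described above can be modeled as operations on a region that must stay inside its parent (the whole molecule for the top frame's box, the middle frame's region for the box below it). The sketch below is a simplified model under assumptions, not the viewer's implementation: coordinates are taken to be 1-based and inclusive, a pan that would leave the parent is shifted back inside, and entered coordinates are clipped to the parent. The names `Region`, `pan` and `zoom` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start: int  # first base pair shown (1-based, inclusive)
    end: int    # last base pair shown (inclusive)

    @property
    def width(self):
        return self.end - self.start + 1


def pan(box, offset, parent):
    """Slide the box by offset base pairs, keeping it inside parent."""
    width = min(box.width, parent.width)
    # Highest start that still lets the whole box fit in the parent.
    last_start = parent.end - width + 1
    start = max(parent.start, min(box.start + offset, last_start))
    return Region(start, start + width - 1)


def zoom(new_start, new_end, parent):
    """Set the box to entered coordinates, clipped to parent."""
    start = max(new_start, parent.start)
    end = min(new_end, parent.end)
    if start > end:
        raise ValueError("requested region lies outside the parent region")
    return Region(start, end)
```

In this model, dragging an edge of the red box and typing coordinates both reduce to a call to `zoom`, while dragging the box by its ruler reduces to `pan`; the double-click that applies the change is not modeled.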

Viewing specific traces: By default, all traces for a particular region are displayed. However, it is possible to view only a subset of traces. Select one or more traces in the middle frame. Selecting any trace causes the reload button to appear in the upper left corner of that frame. After all traces of interest have been selected, click the reload button and only the selected traces will appear in the bottom frame.