


Advanced Consensus Explanation

Consensus Maker takes an input file of aligned sequences in most standard formats and calculates a consensus sequence for those sequences. The program can return the consensus alone, or prepend it to the original alignment. A copy of the output file may be downloaded. If the input alignment comprises blocks of sequences (e.g., HIV sequences grouped by subtype), the program can calculate a consensus for each sequence block and a consensus of the consensuses. The program recognizes sequence blocks by how the component sequences are named.

A good way to understand the options available in this program is to click the blue Sample Input button at the top of the submission page. This causes a simple, hypothetical alignment (in table format) to be loaded into the form.

A.seq1	A-CGTATTAG
A.seq2	A-CG-AT
A.seq3	A-CT-CT
A.seq4	A-TT-CX
B.seq1	A-CG-AT
B.seq2	A-CG-CT
B.seq3	A-CG-TT
You can then calculate the consensus of this alignment under varying input options to see the results of those options. Each column of the Sample Input has been chosen to illustrate the workings of the various options. The output looks like:
CON_OF_CONS  ACG-?TTAG
CON_A        Acg-?TTAG
A.seq1       ACGTATTAG
A.seq2       ACG-AT   
A.seq3       ACT-CT   
A.seq4       ATT-CX   
CON_B        ACG-?T???
B.seq1       ACG-AT   
B.seq2       ACG-CT   
B.seq3       ACG-TT   
Column numbers below refer to the input alignment (column 2 is removed from the output). Col. 1: unanimity; Col. 2: all gaps, column squeezed; Col. 3: majority; Col. 4: no majority letter but resolvable by common character; Col. 5: gaps; Col. 6: irresolvable tie in consensus; Col. 7: undefined character; Cols. 8-10: missing information (trailing blanks).


Input file options

Note: if your alignment contains sequences of varying length, Consensus Maker will equalize the lengths by adding spaces to the ends of short sequences. Those spaces will not be considered in calculating the consensus sequence unless the space character is added to the set of "characters to consider."
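The effect of padding can be seen by tallying a single column with and without the space character. This is a minimal sketch under stated assumptions: the helper names equalize and column_counts and the consider_space flag are hypothetical stand-ins for the "characters to consider" setting, not part of the program.

```python
from collections import Counter

def equalize(seqs):
    """Right-pad every sequence with spaces to the length of the longest one."""
    width = max(len(s) for s in seqs)
    return [s.ljust(width) for s in seqs]

def column_counts(seqs, col, consider_space=False):
    """Tally one column (0-based index) of the padded alignment."""
    column = [s[col] for s in equalize(seqs)]
    if not consider_space:
        # Assumed default: padding blanks are left out of the tally.
        column = [ch for ch in column if ch != " "]
    return Counter(column)

block_a = ["A-CGTATTAG", "A-CG-AT", "A-CT-CT", "A-TT-CX"]
print(column_counts(block_a, 7))                       # input column 8
print(column_counts(block_a, 7, consider_space=True))
```

With spaces excluded, only the single T from A.seq1 is counted in column 8; with spaces included, the three padding blanks outnumber it.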

Consensus output options

Consensus calculation options


Examples

Example of using names to identify alignment blocks:

In the table-formatted file below there are two blocks, an "A1" block and a "B" block, recognizable by the "A1." and "B." prefixes (note the dot) with which the names begin. Two consensuses will be calculated for this alignment if "Do consensus for each block" is true and "Min. no. seqs. for consensus" is 3.

A1.FR.83.IIIB_A04321 aaactatcgtagctagctagctgatcgatgctagctgatcg.... etc
A1.FR.83.IIIC_A04322 aaactatcgtagctagctag------gatgctagctgatcg.... etc
A1.DE.96.POIURR_A04322 aaactatcgtagctagctag------gatgctagctgatcg.... etc
B.FR.82.LAI_K03455 aaactatcgtagctagctttctgatcgatgctagctgatcg.... etc
B._._.N833_AF76511 acactatcgtagctagctagctgatcgatgctagctgatcg.... etc
B.US.99.JK77_AF76511 acactatcgtagctagctagctgatcgatgctagctgatcg.... etc
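Grouping by name can be sketched as follows. Two assumptions are made here that the text does not confirm: the block label is taken to be everything before the first dot in the name, and "Min. no. seqs. for consensus" is read as "at least this many sequences." The function name blocks_by_prefix is hypothetical.

```python
def blocks_by_prefix(names, min_seqs=3):
    """Group sequence names by the text before the first dot (assumed rule)."""
    blocks = {}
    for name in names:
        prefix = name.split(".", 1)[0]
        blocks.setdefault(prefix, []).append(name)
    # Keep only blocks large enough to get their own consensus.
    return {p: members for p, members in blocks.items() if len(members) >= min_seqs}

names = [
    "A1.FR.83.IIIB_A04321",
    "A1.FR.83.IIIC_A04322",
    "A1.DE.96.POIURR_A04322",
    "B.FR.82.LAI_K03455",
    "B._._.N833_AF76511",
    "B.US.99.JK77_AF76511",
]
for prefix, members in blocks_by_prefix(names).items():
    print(prefix, len(members))
```

Under these assumptions both blocks qualify with a minimum of 3; raising the minimum to 4 would leave no block eligible for its own consensus.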

Example of "pretty print" output:

CON                     gccagccccc tgaTGGGGGC GACaCTCCAC CATGAATCAC tCCCCTGTGA 
1a.-.COLONEL_AF290978   ---------- --TTGGGGGC GACACTCCAC CATGAATCAC CCCCCTGTGA 
1a.-.H77_AF009606       GCCAGCCCCC TGATGGGGGC GACACTCCAC CATGAATCAC TCCCCTGTGA 
1a.-.HEC278830_AJ278830 GCCAGCCCCC TGATGGGGGC GACGCTCCAC CATGAATCAC TCCCCTGTGA 

CON                     GGAACTACTG TCTTCACGCA GAAAGCGTCT AGCCaTGGCG TTAGTATGAG 
1a.-.COLONEL_AF290978   GGAACTACTG TCTTCACGCA GAAAGCGTCT AGCCATGGCG TTAGTATGAG 
1a.-.H77_AF009606       GGAACTACTG TCTTCACGCA GAAAGCGTCT AGCCATGGCG TTAGTATGAG 
1a.-.HEC278830_AJ278830 GGAACTACTG TCTTCACGCA GAAAGCGTCT AGCCGTGGCG TTAGTATGAG 

CON                     TGTCGTGCAG CCTcCAGGAC CCCCCCTCCC GGGAGAGCCA TAGTGGTCTG 
1a.-.COLONEL_AF290978   TGTCGTGCAG CCTCCAGGAC CCCCCCTCCC GGGAGAGCCA TAGTGGTCTG 
1a.-.H77_AF009606       TGTCGTGCAG CCTTCAGGAC CCCCCCTCCC GGGAGAGCCA TAGTGGTCTG 
1a.-.HEC278830_AJ278830 TGTCGTGCAG CCTCCAGGAC CCCCCCTCCC GGGAGAGCCA TAGTGGTCTG 
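The layout above can be reproduced with a few lines of string slicing. The rules used here are read off the example (names padded to the longest name, groups of 10 columns, 50 columns per row) and are assumptions about the format rather than a documented specification; pretty_print is a hypothetical name.

```python
def pretty_print(records, group=10, per_line=50):
    """Interleave (name, sequence) records in rows of grouped columns."""
    width = max(len(name) for name, _ in records)
    length = max(len(seq) for _, seq in records)
    blocks = []
    for start in range(0, length, per_line):
        lines = []
        for name, seq in records:
            chunk = seq[start:start + per_line]
            groups = " ".join(chunk[i:i + group] for i in range(0, len(chunk), group))
            lines.append(f"{name.ljust(width)} {groups}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)  # blank line between successive rows of columns

records = [
    ("CON", "gccagccccctgaTGGGGGCGACaCTCCACCATGAATCACtCCCCTGTGA"),
    ("1a.-.H77_AF009606", "GCCAGCCCCCTGATGGGGGCGACACTCCACCATGAATCACTCCCCTGTGA"),
]
print(pretty_print(records))
```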

Example of "output aligned" output:

CON                     gccagccccc tgaTGGGGGC GACaCTCCAC CATGAATCAC tCCCCTGTGA 
1a.-.COLONEL_AF290978   .......... ..T------- ---------- ---------- C--------- 
1a.-.H77_AF009606       ---------- ---------- ---------- ---------- ---------- 
1a.-.HEC278830_AJ278830 ---------- ---------- ---G------ ---------- ---------- 

CON                     GGAACTACTG TCTTCACGCA GAAAGCGTCT AGCCaTGGCG TTAGTATGAG 
1a.-.COLONEL_AF290978   ---------- ---------- ---------- ---------- ---------- 
1a.-.H77_AF009606       ---------- ---------- ---------- ---------- ---------- 
1a.-.HEC278830_AJ278830 ---------- ---------- ---------- ----G----- ---------- 

CON                     TGTCGTGCAG CCTcCAGGAC CCCCCCTCCC GGGAGAGCCA TAGTGGTCTG 
1a.-.COLONEL_AF290978   ---------- ---------- ---------- ---------- ---------- 
1a.-.H77_AF009606       ---------- ---T------ ---------- ---------- ---------- 
1a.-.HEC278830_AJ278830 ---------- ---------- ---------- ---------- ---------- 
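Comparing this output with the "pretty print" example suggests the following masking rule: a position that matches the consensus (ignoring case) is shown as a dash, a gap in the sequence is shown as a dot, and any other character is shown as is. The sketch below implements that inferred rule; it is an interpretation of the example, and the function name mask_against_consensus is hypothetical.

```python
def mask_against_consensus(consensus, seq, match="-", gap_out=".", gap_in="-"):
    """Replace matches with dashes and gaps with dots; keep differences."""
    out = []
    for con_ch, ch in zip(consensus, seq):
        if ch == gap_in:
            out.append(gap_out)                  # gap in the sequence
        elif ch.upper() == con_ch.upper():
            out.append(match)                    # same as consensus, case ignored
        else:
            out.append(ch)                       # difference is shown as is
    return "".join(out)

con = "gccagccccctgaTGGGGGCGACaCTCCACCATGAATCACtCCCCTGTGA"
colonel = "------------TTGGGGGCGACACTCCACCATGAATCACCCCCCTGTGA"
print(mask_against_consensus(con, colonel))
```

For the first 50 columns of the COLONEL sequence this gives twelve dots, a T at position 13, and a C at position 41, matching the first row of the example once the spaces between groups are removed.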

Example of formatted output (nexus):

#NEXUS

begin taxa;
dimensions ntax=4;
taxlabels
CON
1a._.COLONEL_AF290978
1a._.H77_AF009606
1a._.HEC278830_AJ278830
;
end;

begin characters;
dimensions nchar=150;
format interleave datatype=dna;
matrix
CON                     gccagccccctgaTGGGGGCGACaCTCCACCATGAATCACtCCCCTGTGA
1a._.COLONEL_AF290978   ------------TTGGGGGCGACACTCCACCATGAATCACCCCCCTGTGA
1a._.H77_AF009606       GCCAGCCCCCTGATGGGGGCGACACTCCACCATGAATCACTCCCCTGTGA
1a._.HEC278830_AJ278830 GCCAGCCCCCTGATGGGGGCGACGCTCCACCATGAATCACTCCCCTGTGA

CON                     GGAACTACTGTCTTCACGCAGAAAGCGTCTAGCCaTGGCGTTAGTATGAG
1a._.COLONEL_AF290978   GGAACTACTGTCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAG
1a._.H77_AF009606       GGAACTACTGTCTTCACGCAGAAAGCGTCTAGCCATGGCGTTAGTATGAG
1a._.HEC278830_AJ278830 GGAACTACTGTCTTCACGCAGAAAGCGTCTAGCCGTGGCGTTAGTATGAG

CON                     TGTCGTGCAGCCTcCAGGACCCCCCCTCCCGGGAGAGCCATAGTGGTCTG
1a._.COLONEL_AF290978   TGTCGTGCAGCCTCCAGGACCCCCCCTCCCGGGAGAGCCATAGTGGTCTG
1a._.H77_AF009606       TGTCGTGCAGCCTTCAGGACCCCCCCTCCCGGGAGAGCCATAGTGGTCTG
1a._.HEC278830_AJ278830 TGTCGTGCAGCCTCCAGGACCCCCCCTCCCGGGAGAGCCATAGTGGTCTG

;
end;

last modified: Thu Jul 19 10:59 2007


Questions or comments? Contact us at seq-info@lanl.gov.