We have a new tool called
Pixel that visualizes large alignments
as an image, using 1 pixel per residue. It can be very helpful when the alignments
get large enough that you lose the overview, and quickly shows problems and
misalignments. It works for both nucleotide and amino acid alignments.
25 October 2008
The
Codon alignment tool
can restore codons in your alignment so that the resulting alignment can be
immediately translated. Currently this tool requires an input reading frame
(or does its work for all 3 frames). In the future we will connect it
to the reference sequence and the 'locate' tool so that it can find
the correct reading frame automatically.
21 June 2008
There are new ready-made alignments for all categories (all, consensus, and genotype
reference). They are called '2008 alignments' and are now the default for the
alignment interface.
26 May 2008
We added a beta version of the new tool
HCValign, which uses our
HCV HMM alignment model to align user sequences. It can also codon-align
the sequences, and separate individual HCV genes, and thus is
a near-successor to Gene cutter. Please note
that the tool has not yet been extensively tested. We would appreciate bug
reports and other feedback.
07 May 2008
Try out
Phyloplace!
This tool was designed to help users decide whether their
sequence fits inside a currently known genotype and/or subtype, or would be
better classified as a new one. It can use either an intuitive distance-based
method or the phylogenetic tree-based Branching index, and produces user-friendly
graphical output for both. The tool also shows some promise for easily finding
potential recombinants.
14 January 2008
The
Treemaker tool now
lets you download your input sequences sorted in the order of the tree, which
makes it much easier
to select sequences in the alignment based on their phylogenetic behavior.
06 November 2007
We have added a new tool called
ElimDupes, for Eliminate Duplicates.
It will take your alignment and remove duplicate sequences. Several options can be set,
and the sequences can be automatically divided into groups (e.g. if there are data
from several patients with multiple sequences each). In the future we will add an
option to also remove sequences that are more than x% similar where x<=100.
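The idea of removing duplicates within groups can be illustrated with a minimal Python sketch. It is not the ElimDupes code: the function name, the case-insensitive exact-match comparison, and the convention that the patient identifier is the part of the sequence name before the first underscore are all assumptions made for this example.

```python
def eliminate_duplicates(records, group_of=None):
    """Keep the first occurrence of each distinct sequence.

    records  -- list of (name, sequence) tuples from one alignment
    group_of -- optional function mapping a name to a group label
                (hypothetical convention, e.g. a patient identifier);
                duplicates are then removed only within each group
    """
    seen = set()
    kept = []
    for name, seq in records:
        group = group_of(name) if group_of else None
        key = (group, seq.upper())  # assumption: comparison ignores case
        if key not in seen:
            seen.add(key)
            kept.append((name, seq))
    return kept


records = [
    ("P1_a", "ACGT"),
    ("P1_b", "ACGT"),  # duplicate of P1_a within patient P1
    ("P2_a", "ACGT"),  # same sequence, but from another patient
    ("P2_b", "AC-T"),
]

# Without grouping, only one copy of each sequence survives.
ungrouped = eliminate_duplicates(records)

# With grouping, each patient keeps its own copy.
grouped = eliminate_duplicates(records, group_of=lambda n: n.split("_")[0])
```

Grouping by patient keeps P2_a even though it is identical to P1_a, which is usually what you want when comparing sequence sets between patients.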
31 October 2007
We will keep a list of tools we are working on, and
tools for which the problems we are aware of have been fixed.
The list will be updated frequently.
You can find it
here.
31 October 2007
We will be making some infrastructural changes to the HCV database website in
the coming week. This may break some links and bookmarks. We hope to minimize
the impact, but please bear with us while we are at work. Please let us know
of any problems at
hcv-info@lanl.gov.
22 October 2007
Regretfully, NIH has decided to stop funding this project.
The funding has been moved to the Viral Bioinformatics
Resource Center, who will be maintaining a different HCV site at
http://www.hcvdb.org/.
Because of this, we can no longer maintain the work-intensive
HCV immunology database.
The HCV immunology database will remain accessible
for the foreseeable future, but due to lack of resources, no new
information will be added. We will display this information
at the top of our immunology web pages. For new epitope information,
users of this database can try the Immune Epitope Database
(http://www.immuneepitope.org).
The HCV sequence database is still being maintained, although not
quite as diligently as before. The website will be tied even
more closely to the HIV site, so that all new HIV tools will also be
available for HCV. When we feel (or you let us know) that the annotation
quality begins to significantly deteriorate, this site will be closed.
The VBRC people are working hard to provide a worthy successor to this
database. Please take a look at
what they are doing, and give them a chance to come up to speed.
If you still think the disappearance of our database and website will
seriously affect your work, you may be able to help in two ways.
First, if you can contribute financially, this will help us to free up
resources to keep the database and website alive longer. Second, you can
send feedback to Dr Caroline Heilman, the director of DMID, the
NIH program office that distributes the HCV funds, and/or to
Dr Valentina DiFrancisco, the new HCV database program officer
(their
contact information is here).
And please CC us at hcv-info@lanl.gov, so we can keep a record.
21 October 2007
In the coming weeks we will be rolling out our new site design. We think it is
a vast improvement over the old one. We will try hard to minimize the
inconvenience, but while we are updating it you may notice a few glitches. If
you find a problem, please send an email to
hcv-info@lanl.gov. This will put it at the top of our "to be fixed" pile,
so it will be solved sooner.
12 October 2006
The Principal Coordinate Analysis tool
PCOORD
has been improved by the addition of options to strip gaps
from the alignment and to calculate distances using either ID or Smith amino acid matrix scoring.
The program works with both amino acid and nucleotide alignments;
the tool identifies them automatically.
11 October 2006
The
new version of Gene cutter can align your nucleotide sequences,
codon-align the coding regions, clip pre-defined regions from a sequence
or alignment, and provide alternate translations for codons with ambiguity codes. It no
longer requires the reference sequence to be in the alignment.
30 August 2006
Finally we have added a simple tool to
translate nucleotide to amino acid sequences.
20 July 2006
We have added a
site map with a
concise overview of (and links to) all the available tools.
19 July 2006
The
curated alignments have been updated, and separate alignments are now also available for
the putative ARF-P (alternate reading frame) protein and for the Okamoto region.
18 July 2006
We have created a page that lists several
primer sets
that people have used successfully to amplify different HCV genotypes. On this page you can also
sign up for a mailing list that can be used to ask (and answer!) HCV primer-related questions.
21 February 2006
The HCV search interface will now automatically pad your downloaded alignments with gaps,
so all sequences will have the same length. Superfluous columns containing only gaps will
be removed ("squeezed") by default, but this option can be switched off (for example if
it is needed to maintain the reading frame).
09 December 2005
A new tool,
Branchlength, is available, again based on a Perl program
initially created by Bette Korber. You provide a Newick treefile; the tool draws a
'clickable' tree and shows cumulative branch lengths from the selected node.
08 December 2005
A
web interface is now available for Bette Korber's
program VESPA (Viral Epidemiology Signature Pattern Analysis). This program compares two alignments
and identifies the most consistent differences ("signature patterns") between them. It can be used
to find amino acid or nucleotide positions that best distinguish two groups of sequences.
23 November 2005
A 'Last modification date' field has been added to the
search interface. You can find it in the 'Other fields'
pulldown menu, and it takes the same >MM-DD-YYYY
(don't forget the dashes) format
as the Download date field. The field is updated any time any
field in the database that is associated with that sequence is
changed.
19 October 2005
The
search interface
can now take aligned user sequences as input. The search will be automatically limited to database
sequences that span the same region. Trees can be built
via the search interface that include the user provided alignment, search results, and genotype
reference sequences.
11 October 2005
Since the publication of the
new HCV nomenclature proposal, the HCV database has switched to using the H77 sequence
as a reference sequence, instead of HCV-H. The main advantage of H77 is that its 3' UTR
is much longer. All genes and proteins in the rest of the genome are equally long in both
strains, so the coordinates do not change except in the 3' UTR. The changes affect the
search interface and the Sequence Locator, Primalign and Epilign tools.
11 October 2005
Another consequence of the
new HCV nomenclature proposal is the division of genotype assignments into "provisional"
and "confirmed". By default, all genotypes are included in search results, but the search
interface offers the possibility to limit the search to only confirmed genotypes. More
information is
here.
11 October 2005
You can now include the genotype reference sequence alignment when you download sequences
from the database. The reference alignment for the correct region will be selected automatically;
if your search criteria did not include a genomic region, the complete genome reference alignment
will be used.
16 August 2005
The
search interface now includes a Download date field, which can be used to search only
sequences that were downloaded from Genbank before or after that date. The field can be
found in the Other fields menu; help is available by clicking any of the search interface
field names.
15 August 2005
NIH has agreed to fund the sequencing of a number of complete genomes of
unusual HCV variants. We are looking for samples for a large number of
genotypes that have been found in at least three unrelated patients, and
for which fewer than three complete genomes have been sequenced; they are
listed
here.
If you would like to collaborate on this project by donating samples,
please contact us at
hcv-info@t10.lanl.gov
25 June 2005
Try out the
new search interface! It lets you build phylogenetic trees,
directly from database sequences that you have retrieved, including reference sequences. You
can also
- download the sequences in many different formats
- create sequence names so they contain information from many fields in the database, and determine the separators and missing value characters
- download background information
- easily select groups of sequences
- sort the sequences on start and end coordinates
- and more...
24 June 2005
The
antibody section of the HCV immunology database is now available. It
contains multi-part entries for HCV-specific antibodies with references and
notes, an antibody epitope summary table, antibody epitope maps, an antibody
index by name, and an antibody index by binding type.
26 May 2005
You can now automatically retrieve the "Okamoto region", a frequently sequenced region
of ~300nt in NS5B, using the
search interface.
Select the "Okamoto region" from the "genome region" menu. Also read the
background information about retrieving
this region.
15 April 2005
We have added protein F (ARFP, the alternate reading frame protein) to the
search interface; you can now search for this protein, and
(perhaps more importantly) download it aligned. Also see the
help
text about this feature.
14 April 2005
NIH has agreed to fund the sequencing of a number of complete genomes of
unusual HCV variants. We are looking for samples for a large number of
genotypes that have been found in at least three unrelated patients, and
for which fewer than three complete genomes have been sequenced; they are
listed
here.
If you would like to collaborate on this project by donating samples,
please contact us at
hcv-info@t10.lanl.gov
08 April 2005
An updated HLA anchor residue motif scanner
Motif Scan is now available to scan for
possible epitopes in any protein sequence or alignment, or in predefined HCV
consensus sequences. A new feature also lets you scan simultaneously for
all possible motifs and potential epitopes in a protein sequence.
30 March 2005
The
Epitope
Location
Finder (
ELF)
can quickly find probable epitopes within a protein sequence, based on HLA anchor motifs or on previously described
epitopes stored in the immunology database. It can also help identify potentially missed CTL reactivities due to variations
in the sequence strain selected as a basis for the peptides used to test the response. The tool is meant to be a
workbench for experimentalists who want a fast summary of their peptide CTL reactivity results.
29 March 2005
The
search interface has two new 'exclude' checkboxes, to
automatically exclude synthetic sequences (from the Genbank SYN database, sequences that result from laboratory
manipulation) and "bad" sequences which either contain more than 10% N's or IUPAC codes,
or which the HCV database staff judged to be 'suspicious'. See the
search interface
help page for more information.
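The 10% criterion for "bad" sequences can be sketched in Python as follows. This is only an illustration of the rule as described above, not the database's own code: the function names are invented for the example, and the choice to ignore gap characters when computing the fraction is an assumption.

```python
UNAMBIGUOUS = set("ACGTU")
GAP_CHARS = set("-.")


def ambiguity_fraction(seq):
    """Fraction of non-gap characters that are N or another IUPAC ambiguity code."""
    residues = [c for c in seq.upper() if c not in GAP_CHARS]
    if not residues:
        return 0.0
    ambiguous = sum(1 for c in residues if c not in UNAMBIGUOUS)
    return ambiguous / len(residues)


def is_bad(seq, threshold=0.10):
    """True if more than `threshold` of the residues are ambiguous."""
    return ambiguity_fraction(seq) > threshold


borderline = is_bad("ACGTACGTAN")  # exactly 10% ambiguous, so not "more than 10%"
rejected = is_bad("ACGTNNACGT")    # 20% ambiguous
```

Sequences that the database staff flagged as 'suspicious' by hand cannot be reproduced by a rule like this one.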
22 March 2005
A 'gapsqueeze' option has been added to our
Gapstrip tool.
'Squeezing' gaps means removing all columns that contain only gaps. The tool now also automatically reads
any valid file format, and returns an output file in the same format.
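As an illustration of what 'squeezing' does, here is a minimal Python sketch that removes gap-only columns from an alignment held as a list of equal-length strings. The function name and the use of '-' as the only gap character are assumptions for the example; the Gapstrip tool itself may handle more cases.

```python
def squeeze_gaps(sequences, gap_chars="-"):
    """Remove every alignment column that consists only of gap characters."""
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences in an alignment must have the same length")
    # Keep a column if at least one sequence has a non-gap character there.
    keep = [i for i in range(length)
            if any(s[i] not in gap_chars for s in sequences)]
    return ["".join(s[i] for i in keep) for s in sequences]


aligned = ["AC--GT",
           "A---GT",
           "AC-TGT"]
squeezed = squeeze_gaps(aligned)
# The third column holds only gaps and is dropped; the fourth is kept
# because one sequence has a T there.
```

Note that squeezing can remove one or two columns from a gap that is three columns wide in the full alignment, which is why switching it off can be necessary to preserve the reading frame.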
17 March 2005
Try out
Synchaligns,
a new tool that can synchronize (or "merge") two alignments:
Alignment 1:
seq1 JKLMNYOPQR-ST
seq2 JKLMNYOPQRYST

Alignment 2:
seq3 HIJK-LMN-P
seq4 --JK-LMNOP
seq5 HIJKXLMNOP

Result:
seq1 --JK-LMNYOPQR-ST
seq2 --JK-LMNYOPQRYST
seq3 HIJK-LMN--P-----
seq4 --JK-LMN-OP-----
seq5 HIJKXLMN-OP-----
16 March 2005
Our
BLAST interface now
lets you Blast your sequences against a subset of all genotyped sequences; you can select
each genotype or any combination of genotypes.
31 January 2005
The HCV sequence database will collect data on, and provide access to
external sequence sets
that have not been deposited in Genbank. These sets are often
generated for purposes of genotyping or other clinical use and are often unannotated,
but they can still be useful for some types of analysis. Contact information is
available on request. We invite people who have such sets and are willing to provide them
to researchers upon request to contact us so that these sets can be added to the list.
01 December 2004
The 3' UTR alignments have been added to the
ready-made alignments page.
30 November 2004
The
Conferences overview page now also includes abstract deadlines.
18 November 2004
Bill Bruno provided the code for
FindModel,
which is similar to Posada and Crandall's
Modeltest script, but uses Ziheng Yang's
PAML
as a back end. FindModel analyzes your sequence alignment
to determine which evolutionary model fits it best; you can then use this model to build a better tree.
03 November 2004
The
ENTROPY program calculates and plots the Shannon entropy
(a measure of variability) for each position in an alignment. It can also
compare the entropy of all positions in two alignments, and perform a permutation-based statistical significance
test to find positions with different variability.
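For readers unfamiliar with the measure, the per-position Shannon entropy can be sketched in Python as below. This is not the ENTROPY program: the function name is invented, the logarithm base (2, giving bits) is an assumption, and gaps are simply counted as one more character, which may differ from how the tool treats them.

```python
import math
from collections import Counter


def column_entropies(sequences):
    """Shannon entropy (in bits) of each column of an alignment.

    sequences -- list of equal-length aligned strings
    """
    if not sequences:
        return []
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("all sequences in an alignment must have the same length")
    entropies = []
    for i in range(length):
        counts = Counter(s[i] for s in sequences)
        total = sum(counts.values())
        # H = sum over characters of p * log2(1/p)
        h = sum((n / total) * math.log2(total / n) for n in counts.values())
        entropies.append(h)
    return entropies


alignment = ["AC", "AG", "AT", "AA"]
entropy_per_column = column_entropies(alignment)
# First column is fully conserved; second column has four equally
# frequent characters.
```

A fully conserved column scores 0, and the score rises as more variants appear at more even frequencies.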
17 September 2004
The CTL and Helper sections of the
HCV immunology database are available!
04 September 2004
The data that were used for the
HCV variability graphs can now be viewed or downloaded.
03 September 2004
Two new annotated fields have been added to the database: HIV coinfection (confirmed/excluded/unknown) and
HBV coinfection (same values). They can be searched using the "Other fields" pulldown menu in the
search interface.
01 September 2004
It is now possible to download the listed search results as a tab-delimited file, with or
without the actual sequence. Scroll down to the bottom of the search results page.
23 August 2004
A new
MotifScan is available that is adapted for
immunological motifs, and can be used to locate HLA anchor motifs in a protein sequence or alignment, or in predefined HCV-H
and HCV consensus sequences.
12 July 2004
The updated
HCV curated alignments are available.
Pre-made consensus sequences have been added to the collection.
08 July 2004
The alignments you retrieve using the
search interface will now also be codon-aligned,
and in general should translate to amino acids without moving any gaps. This may mean
that the alignments are not exactly the same as alignments retrieved previously; please
let us know if this creates a problem. We have also added the option to include HCV-H in any
alignment you retrieve.
23 June 2004
The
search interface now lets you select multiple (adjacent) genes. If you select two non-adjacent genes, for example Core and
NS5B, the genes in between will be selected automatically.
22 June 2004
Epilign now includes a SUMMARIZE function
that shows the frequency of each variant of your epitope.
04 June 2004
The
sequence locator tool now also does reverse lookup:
you can give it coordinates in HCV-H and it will find the corresponding amino acid sequence. If you want
to do the same for nucleotides, please use the search interface.
05 May 2004
We have a new and versatile
consensus tool. Among
other things, you can set different thresholds for unanimity and majority,
divide your alignment into blocks and automatically calculate a consensus
for each block as well as a consensus-of-consensuses, and flexibly deal with
gaps and non-standard characters.
03 May 2004
We have created a series of graphs showing the genotype and subtype
variability of all
HCV proteins.
Three variability measures are provided: a histogram of the frequency of
pairwise distances within genotypes and subtypes; a sliding window graph of
the entropy of each protein, and a
position-by-position plot of the estimated ds (synonymous changes)
and dn (non-synonymous changes) of each protein.
29 April 2004
We have a new version of
BLAST
that also accepts protein sequences. It
automatically recognizes those, and when a protein sequence is submitted,
TBLASTN is used to search the BLAST database. This version also offers the option
of excluding sequences that do not have a genotype, which can be convenient
if your query sequence has many matches that are ungenotyped and you want to check its genotype.
28 April 2004
We have added a page of downloadable database software. You can use this
software if the datafile you want to analyze is very large, or if you want
to run batch analyses. The page can be found
here, and we have added a link to it on the navigation bar.
06 April 2004
We provide an
overview of assigned genotype/subtype designations. If you think you have found a new genotype or subtype, please consult this table to avoid conflicts with existing designations. If you have already assigned geno/subtypes that are not in this list, please contact us and we will add them immediately.
18 March 2004
We have an updated version of
Sequence Locator. This tool finds the coordinates of your input
sequence(s) relative to the start of the HCV-H genome, the CDS of
the polyprotein, and each individual protein it overlaps.
It also produces a map showing the location of your sequence, and
if you submit a
protein sequence, it lists the corresponding
nucleotide sequence in HCV-H.
26 February 2004
It is now possible to retrieve "clean" sequence sets that include only one
sequence per patient (or cluster of epidemiologically related patients).
When you use the search interface to retrieve sequences
and include
the genomic region in your search, a button will show up on the results
page that says "Exclude related". When you click this button, a clean
sequence set will be returned.
This function only works when the genomic region field is included in the
search, as it makes no sense to delete a sequence from one region
because another region from the same patient is already in the set.
In conjunction with this, clusters of epidemiologically related sequences
have been defined. You can search for a cluster, or get an overview
of existing clusters by going to the search interface, selecting 'Cluster
name' from the 'other fields' menu, and typing a '_' in that field; this
will list all sequences with a defined cluster. More information is
here.
20 February 2004
The HCV sequence website is now searchable; click on the "Enter the HCV sequence database" link above and use the search box in the navigation frame.
18 November 2003
We have created an alignment of
flavivirus
complete genomes. The alignments are provided on an "as is" and "let the user beware"
basis. We are still working on improving them; significant improvements
will be announced here.
12 November 2003
Please try the new automatic format converter,
OmniRead. The program is based on the Readseq and Fmtseq programs, and attempts to capture possible conversion errors. It does NOT do a perfect job of recognizing formats, but can be used for many of the most common formats, including several that
SeqConvert does not read, such as phylip. It also produces some output formats that SeqConvert does not offer, for example PAUP/Nexus.
30 October 2003
Several new fields have been added to the
search interface.
It is now possible to search on ALT level and whether or not the person was drug naive at the time of sampling (in the 'Other fields' pulldown menu), as well as on the therapy response at the end of the study. Also, sequences obtained from known non-human hosts and sequences used for patent applications (which are usually badly annotated and often duplicates of other sequences in the database) can be automatically excluded from the retrieval.
22 October 2003
Treemaker now allows different distance models to create a tree.
21 October 2003
Epilign, the amino acid/epitope equivalent of
Primalign is now available for HCV.
This program takes a user input sequence (amino acids in the case of Epilign, nucleotides in the
case of Primalign) and aligns it to the corresponding region in the complete genome alignment. The
program is designed to produce a quick overview of how conserved your epitope is.
02 October 2003
The
HCV BLAST interface now accepts accession numbers as well as sequences as input.
23 September 2003
A new (and improved)
Primalign program is now available.
This program takes a user input sequence and aligns it to the corresponding region in the complete genome
alignment, so it gives a quick overview of how conserved the region is that your primer covers. It also
takes reverse complements.
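Taking the reverse complement of a primer is a simple operation; a minimal Python sketch is shown here for reference. It is not Primalign's code. The function name is invented, and the mapping assumes the standard IUPAC nucleotide codes, with U treated like T and the output always written as DNA in upper case.

```python
# Standard IUPAC complements: A<->T, C<->G, R<->Y, K<->M, B<->V, D<->H;
# S, W and N are their own complements. U is complemented to A.
_COMPLEMENT = str.maketrans("ACGTURYSWKMBVDHN",
                            "TGCAAYRSWMKVBHDN")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence (upper case)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


primer = "ATGCRN"
primer_rc = reverse_complement(primer)
```

Characters outside the table, such as gaps, pass through unchanged and are only reversed in position.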
27 August 2003
You can
submit comments and suggestions to the HCV database.
18 August 2003
The
sequence locator tool now also searches for reverse complements.
13 August 2003
Annotation of the HCV
genomic map has been expanded.
18 July 2003
A bug in Motifscan has been fixed, for now by deleting the epitope alignments; these will be
reinstated when the HCV Immunology Database is up, and epitope alignments are available.
16 July 2003
The ready-made alignments of (near-)complete HCV genes and genomes are now available.
We provide both alignments of all available genes, and 'genotype reference' sets which
contain a few representatives of all genotypes in the database. For most of these sets,
not all genotypes are well-represented in the database. The alignments can be downloaded
here.
15 July 2003