A Tutorial on Delila Instructions

with examples

by Tom Schneider

Introduction to Delila Instructions

Terminology used is described in a glossary.

The concept of the Delila system is to extract fragments of sequence from a library (database) of sequences before beginning any analysis of the sequences. This has a number of advantages: it automates the analysis process, it avoids editing sequences by hand (which will lead to mistakes!), it permanently records the sequences used in a compact form (instructions), and it therefore makes it possible to repeat an analysis. The extraction is done by a librarian program named Delila. One gives Delila instructions for what fragments to obtain and how to mutate them. The result returned by the librarian is -- of course! -- a book.

An important feature of Delila is that the coordinate system of each sequence in the book corresponds to that in the parent library. This way you won't go crazy trying to figure out the locations of bases - all output has the same coordinate system. (The exception is if you make mutations, in which case coordinates get renumbered on the 'downstream' side.)


Title

Since Delila produces a book, it is natural that the first instruction in a set of Delila instructions is the title to be given to the book:

title "An example book";

Note that Delila will accept both single (') and double (") quotes.

You can have any title you like. I would, however, recommend this format:

title 'Fis sites version = 1.81 of fis.inst 2002 Apr 24';
This includes four important components:
  1. Fis sites: the name of the sites
  2. 1.81: the version number, which can be used by the ver program. All Delila programs pass the title to the next program (though it may be in comments). So by changing the version every time you change anything in the file, you will always know exactly which version of the instructions produced a given result.
  3. fis.inst: the file name (note that the type is 'inst'),
  4. 2002 Apr 24: the date.
If you use this format, then you can save backup copies in the form fis.inst.1.81 by using the save script.
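
To make the four components concrete, here is a minimal Python sketch that splits a title in the recommended layout into its parts and builds the backup file name. The pattern and the function names parse_title and backup_name are my own illustration; they are not part of Delila, the ver program or the save script.

```python
import re

# Hypothetical pattern for the recommended title layout:
#   <name> version = <number> of <file> <year> <month> <day>
TITLE_PATTERN = re.compile(
    r"^(?P<name>.+?)\s+version\s*=\s*(?P<version>\d+\.\d+)\s+of\s+"
    r"(?P<file>\S+)\s+(?P<date>\d{4}\s+[A-Z][a-z]{2}\s+\d{1,2})$"
)


def parse_title(title):
    """Split a title in the recommended layout into its four components."""
    match = TITLE_PATTERN.match(title.strip())
    if match is None:
        raise ValueError("title does not follow the recommended layout")
    return match.groupdict()


def backup_name(title):
    """Build a backup file name of the form <file>.<version>."""
    parts = parse_title(title)
    return parts["file"] + "." + parts["version"]


title = "Fis sites version = 1.81 of fis.inst 2002 Apr 24"
print(parse_title(title)["name"])  # Fis sites
print(backup_name(title))          # fis.inst.1.81
```

A title that does not follow the layout raises ValueError, which is a cheap way to notice that the version or date was forgotten.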


Specification

Next, the desired source sequence must be specified. Delila was built before GenBank existed and it assumes that the database is organized by organism and chromosome (as opposed to the current mess of entries). So one defines these:

organism H.sapiens;
chromosome H.sapiens;

Next one needs to choose the particular sequence of DNA, called a piece:

piece LINEAR;
where LINEAR would usually be the GenBank ACCESSION number.


Requests

Having specified the sequence we want, we now can make a series of requests to get particular parts of the sequence. Suppose that the wild-type sequence named LINEAR begins with the EcoRI site 5' gaattc 3', with bases numbered 1 to 180. Then to obtain the entire sequence we can say:

get all piece;

To get the first 6 bases (containing just the EcoRI site) we say:

get from 1 to 6;
The lister program puts an asterisk ('*') every 5th base, and numbers every 10th base. (This way you won't go crazy counting bases - you never need to count more than 3 positions to identify a base.)

To get the second to sixth bases one can say:

get from 2 to 6;
which gives 5' aattc 3'.

One can also get the complement:

get from 6 to 2 direction -;
which gives 5' gaatt 3'. Note that the asterisk in the figure is still over base 5. Delila retains the original coordinate system, which means that you can compare output from different extractions and the coordinates of the bases remain the same.

Here's a puzzler:

get from 2 to 5 direction +;
get from 5 to 2 direction -;
Why are these the same?

An example longer sequence is:

get from 20 to 1 direction -;
giving 5' aaagtcaactaactgaattc 3', which shows how the coordinate system decreases. (Note the EcoRI site at the 3' end.)
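
The requests above can be imitated in a few lines of Python. This is only a sketch of the coordinate and complement arithmetic, not how Delila itself is written. The 20-base string PIECE_START is inferred from the extractions quoted above, and the function get, including its rule for guessing the direction when none is given, is an assumption made for illustration.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

# First 20 bases of the example piece, inferred from the results quoted above.
PIECE_START = "gaattcagttagttgacttt"


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def get(piece, start, end, direction=None, first_coordinate=1):
    """Extract coordinates start..end (inclusive) from piece.

    Assumption: if no direction is given, a 'from' coordinate larger than
    the 'to' coordinate means the complementary strand is wanted.
    """
    if direction is None:
        direction = "-" if start > end else "+"
    low, high = sorted((start, end))
    fragment = piece[low - first_coordinate : high - first_coordinate + 1]
    return reverse_complement(fragment) if direction == "-" else fragment


print(get(PIECE_START, 1, 6))        # gaattc
print(get(PIECE_START, 2, 6))        # aattc
print(get(PIECE_START, 6, 2, "-"))   # gaatt
print(get(PIECE_START, 2, 5, "+"))   # aatt
print(get(PIECE_START, 5, 2, "-"))   # aatt
print(get(PIECE_START, 20, 1, "-"))  # aaagtcaactaactgaattc
```

The fourth and fifth calls are the puzzler: bases 2 to 5 read aatt, which is its own reverse complement, so both strands give the same letters.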

Having obtained the sequence(s) we want, Delila's job is over. Other programs are used to display and analyze the sequence. For these examples I used the Lister program for the figures. Lister gives the sequence, carefully labeled with 5' and 3' on the ends. Every 5th base is marked by an asterisk, and every 10th base is numbered. This way you will never need to count more than 3 bases to determine the coordinate of any base.


Relative Coordinate Requests

A powerful way to get sequences is relative to a particular point:

get from 3 -2 to 3 +2;
which gets 2 bases before coordinate 3 to 2 bases after coordinate 3, that is from base 1 to base 5: 5' gaatt 3'. Generally one does not want to repeat the second coordinate, so one can use the command:
get from 3 -2 to same +2;
where 'same' refers to the coordinate given after the word 'from'. This is the most convenient form for specifying binding site locations. For more examples, see: Making Delila Instructions for Symmetric Sites.
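
The arithmetic behind a relative request is simple enough to sketch in Python. The function resolve_range is a hypothetical helper, and it assumes a linear piece where offsets are added to coordinates as plain integers.

```python
def resolve_range(from_base, from_offset, to_base, to_offset):
    """Turn 'from B1 O1 to B2 O2' into absolute (start, end) coordinates.

    to_base may be the string "same", meaning: reuse the 'from' coordinate.
    Assumes a linear piece and plain integer arithmetic.
    """
    if to_base == "same":
        to_base = from_base
    return from_base + from_offset, to_base + to_offset


print(resolve_range(3, -2, 3, +2))       # (1, 5)
print(resolve_range(3, -2, "same", +2))  # (1, 5)
```

Both forms resolve to the same absolute range, so 'same' only saves retyping the coordinate (and avoids the mistake of changing one copy and not the other).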


Making Mutations

There are three ways to make changes.

1. A CHANGE requires the previous base, the coordinate to change and then the new base:

get from 1 to 6 with g1t;

gives taattc. The base that changes from a G at 1 to a T is marked by the tail and head of an arrow. The figure is produced by first running Delila to extract the sequence(s) and to produce the marking information. This information is then used by Lister to create the PostScript.

How do I write my instructions if I want the complementary sequence?
Glad you asked. Coordinates of changes are always given on the original wild-type coordinate system. The rule is:

The coordinates given in the mutation and the sequences given refer always to the sequence written 5' to 3' in the *positive* coordinate direction.
The reason for doing things this way is that you would go absolutely crazy if you had to change the definition of the mutation merely because you wanted the complementary sequence!

For example, starting again from 5' gaattc 3':

get from 6 to 1 with g1t;
Delila makes the mutation and then complements the sequence to give 5' gaatta 3'. Note that the first sequence in the illustration is already complemented. You can see this because the asterisk ('*') marks the 5th base.


2. An INSERTION uses two coordinates and a sequence. The sequence BETWEEN the coordinates is removed and the given sequence is inserted.

get from 1 to 6 with i2,3cc;
gives gaccattc.


Changing that to:

get from 1 to 6 with i1,4cc;
does a replacement to give gccttc.


Finally,

get from 1 to 6 with i1,4;
deletes to give gttc.


Note that any change can be made with this definition; the other methods are available for convenience.

3. A DELETION takes two coordinates. The sequence INCLUDING the coordinates is removed.

get from 1 to 6 with d2,5;
gives gc. Coordinates outside the end of the piece are allowed.


Combined changes are possible. Separate the changes with periods:

get from 1 to 6 with g1t.i1,4cc;
gives tccttc.
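
The three kinds of change can be checked against the examples above with a small Python model. This is a sketch under my own assumptions, not Delila's implementation: the function mutate and its parsing patterns are hypothetical, it handles only the forms shown in this section, and it does not model the renumbering of downstream coordinates. Each wild-type coordinate keeps its own cell, so every change is interpreted on the original coordinate system.

```python
import re

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(sequence):
    """Return the complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


def mutate(piece, mutations, first_coordinate=1):
    """Apply period-separated changes, all in wild-type coordinates."""
    cells = list(piece)  # one cell per wild-type coordinate

    def index(coordinate):
        return coordinate - first_coordinate

    for mutation in mutations.split("."):
        change = re.fullmatch(r"([acgt])(\d+)([acgt])", mutation)
        insertion = re.fullmatch(r"i(\d+),(\d+)([acgt]*)", mutation)
        deletion = re.fullmatch(r"d(\d+),(\d+)", mutation)
        if change:
            old, new = change.group(1), change.group(3)
            i = index(int(change.group(2)))
            if not cells[i].startswith(old):
                raise ValueError("wild-type base mismatch in " + mutation)
            cells[i] = new + cells[i][1:]
        elif insertion:
            left, right = int(insertion.group(1)), int(insertion.group(2))
            # Remove what lies BETWEEN the coordinates, then insert after left.
            for i in range(index(left) + 1, index(right)):
                cells[i] = ""
            cells[index(left)] += insertion.group(3)
        elif deletion:
            left, right = int(deletion.group(1)), int(deletion.group(2))
            # Remove the range INCLUDING both coordinates, clipped to the piece.
            for i in range(max(index(left), 0), min(index(right) + 1, len(cells))):
                cells[i] = ""
        else:
            raise ValueError("unrecognized mutation: " + mutation)
    return "".join(cells)


print(mutate("gaattc", "g1t"))                      # taattc
print(reverse_complement(mutate("gaattc", "g1t")))  # gaatta
print(mutate("gaattc", "i2,3cc"))                   # gaccattc
print(mutate("gaattc", "i1,4cc"))                   # gccttc
print(mutate("gaattc", "i1,4"))                     # gttc
print(mutate("gaattc", "d2,5"))                     # gc
print(mutate("gaattc", "g1t.i1,4cc"))               # tccttc
```

The second line is the complement case: the mutation is made on the positive strand first and the result is complemented afterwards, which is why the definition g1t does not have to change.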


Mutation Analysis: example

title "ABCR mutation";
organism H.sapiens;
chromosome H.sapiens;
set doubling on;
piece Y15651;
name "mutation at exon 17 acceptor";
get from 63 -25 to same +7 with g64a;

Two new commands are introduced here:

set doubling on;
which tells Delila to give both the original sequence and the sequence with the mutation and
name "mutation at exon 17 acceptor";
which tells Delila to name the new sequence. The result, when displayed by the lister program, is:

Note how the mutation affects both walkers simultaneously. (See ABCR Mutation G863A for more information about this curious mutation.)


Comments

The Delila language provides two ways to create comments in the instruction files. Both are 'Pascal-like' since the same form is used in the computer language Pascal:

(* Two character comments *)
and
{ One character comments }
Material inside comments is ignored by Delila. Comments of one type can be nested inside the other type. I commonly make my comments using (* and *) and then use the braces { and } to block off instructions I don't want temporarily.
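
Here is a minimal Python sketch of how the two comment styles can nest. It assumes that a comment ends at the first closing delimiter of its own type, so the other type is plain text while inside it. The function strip_comments is hypothetical, and it does not treat quoted strings specially, so a brace inside a title would confuse it.

```python
def strip_comments(text):
    """Remove (* ... *) and { ... } comments from instruction text.

    Assumption: a comment ends at the first closer of its own type, which
    is what lets one type be nested inside the other.
    """
    kept = []
    closer = None  # closing delimiter of the comment we are inside, if any
    i = 0
    while i < len(text):
        if closer is None:
            if text.startswith("(*", i):
                closer = "*)"
                i += 2
            elif text[i] == "{":
                closer = "}"
                i += 1
            else:
                kept.append(text[i])
                i += 1
        elif text.startswith(closer, i):
            i += len(closer)
            closer = None
        else:
            i += 1  # skip material inside the comment
    if closer is not None:
        raise ValueError("comment is never closed")
    return "".join(kept)


print(strip_comments("get from 1 to 6; (* EcoRI site *)"))
print(strip_comments("{ get from 2 to 6; (* blocked off *) }get all piece;"))
```

The second call shows the habit described above: the braces block off an instruction together with its (* *) comment, and only 'get all piece;' survives.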

I strongly recommend putting in the date and the file name in the title, and at least a short description of what the instruction set is about in a comment. It is also useful to add citations for evidence that the sequence is a binding site, and to mention the kind of data that supports this (e.g. footprinting, gel shift assay, mutations).


Making Delila Instructions for Symmetric Sites

Binding sites can have three kinds of symmetry, as discussed in the glossary entry on binding site symmetry. The corresponding Delila instructions are of increasing difficulty:

Note: the ranges given above are only examples. We generally take a very large range such as -200 to +200 for our initial analysis to get a feeling for the background noise of the information curve.


Setting Parameters

Delila has a number of parameters that have preset values which you can change. You can use the word 'default' or 'set' to change them.


Full Definition of Delila

If you would like to know more about the Delila language, then you can look at the LIBrary DEFinition, LIBDEF.


Automatic Generation of Delila Instructions

The Delila system has a number of ways to automatically generate Delila instructions:



Schneider Lab

origin: 1999 May 2
updated: version = 2.05 of delilainstructions.html 2009 Jan 27