Working with Non-Public Data
Step 1: Introduction
This tutorial demonstrates how to manage private data in Genome Workbench in two common situations:
- You've created your own sequence and want to work with it in Genome Workbench
- You want to view your own data/annotation on a publicly available sequence
We'll demonstrate how to use several Genome Workbench tools on data not found in the NCBI databases.
It is recommended that you complete Tutorial 1: Basic Operation first.
Here's a link to the sample data you'll need to complete this tutorial - BX530088_BX572102.
Step 2: Getting Started
For the first exercise, we're going to do the following:
- Load a user-generated AGP file (download sample)
- SPLIGN some mRNAs on that AGP sequence
- Create a FASTA file from the AGP
- BLAST that FASTA sequence to see what's related to it
- WindowMask that FASTA sequence (or part of it) to look for repetitive regions
Genome Workbench starts up and displays the main screen. From here, choose File->Open from the main menu to load your data file. Genome Workbench (Gbench) understands many different file formats; for this step, choose BX530088_BX572102.comp.agp from the downloaded data files. Click Next, then Next again to accept the defaults. Then click Finish to add the data file to a new project.
Now that your data is loaded, you can view it by selecting the data in the project tree, right-clicking, and choosing Open View. Then choose Graphical View. The initial view isn't very interesting, but you can zoom in to see the sequence.
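An AGP file does not contain sequence itself. It describes how an object (here, a sequence assembled from BX530088 and BX572102) is built from component sequences and gaps, one tab-delimited row per piece. The sketch below is a minimal reader written under the assumption of the standard AGP column layout (object, object_beg, object_end, part_number, component_type, then either component_id/component_beg/component_end/orientation or gap_length/gap_type/linkage). The class and function names are hypothetical and are not part of Genome Workbench.

```python
from dataclasses import dataclass
from typing import List, Optional

# Component types N and U describe gaps; all other types (e.g. W) are sequence pieces.
GAP_TYPES = {"N", "U"}


@dataclass
class AgpRecord:
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    # Set only for sequence components:
    component_id: Optional[str] = None
    component_beg: Optional[int] = None
    component_end: Optional[int] = None
    orientation: Optional[str] = None
    # Set only for gap rows:
    gap_length: Optional[int] = None
    gap_type: Optional[str] = None


def parse_agp(text: str) -> List[AgpRecord]:
    """Parse AGP text into records, skipping blank lines and '#' comments."""
    records: List[AgpRecord] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 8:
            raise ValueError(f"too few AGP columns: {line!r}")
        obj, beg, end, part, ctype = cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4]
        if ctype in GAP_TYPES:
            rec = AgpRecord(obj, beg, end, part, ctype,
                            gap_length=int(cols[5]), gap_type=cols[6])
        else:
            rec = AgpRecord(obj, beg, end, part, ctype,
                            component_id=cols[5],
                            component_beg=int(cols[6]),
                            component_end=int(cols[7]),
                            orientation=cols[8] if len(cols) > 8 else "+")
            # The span on the object must be as long as the span on the component.
            if end - beg != rec.component_end - rec.component_beg:
                raise ValueError(f"object/component length mismatch: {line!r}")
        records.append(rec)
    return records
```

For example, `parse_agp("chr1\t1\t10\t1\tW\tctgA\t1\t10\t+\n")` returns a single record whose `component_id` is `"ctgA"`. The identifiers in this example are made up and do not come from the tutorial's sample file.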
Step 3: Apply a tool to private data
Now let's align an mRNA to our sequence using the SPLIGN tool. SPLIGN (SPLiced Aligner) is a global alignment tool used in NCBI's annotation pipeline. Search the NCBI Public Databases for NM_020137.3 and add it to the project. Then, in the data folder, select both entries. With both selected, choose Tools->Run Tool to open the Tools dialog, select SPLIGN, and click Next.
Select BX530088... for the genomic sequence and NM_020137.3 for the Transcript Sequences and click Next.
Add the results to the existing project and click Finish.
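A spliced alignment places the mRNA on the genomic sequence as a series of exons separated by introns. Most introns begin with GT and end with AG. The toy sketch below does not attempt alignment itself. It assumes you already have exon coordinates (1-based, inclusive, plus strand, in a hypothetical input format), derives the introns between them, and checks each intron for the canonical GT...AG boundaries. It is not how SPLIGN works internally.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 1-based, inclusive (start, end)


def introns_from_exons(exons: List[Interval]) -> List[Interval]:
    """Given exons sorted by position on the plus strand, return the gaps between them."""
    return [(a_end + 1, b_start - 1)
            for (_, a_end), (b_start, _) in zip(exons, exons[1:])]


def splice_report(genome: str, exons: List[Interval]) -> List[Tuple[int, int, str, str, bool]]:
    """For each intron, report (start, end, donor, acceptor, is_canonical_GT_AG)."""
    genome = genome.upper()
    report = []
    for start, end in introns_from_exons(exons):
        donor = genome[start - 1:start + 1]   # first two intron bases
        acceptor = genome[end - 2:end]        # last two intron bases
        report.append((start, end, donor, acceptor, donor == "GT" and acceptor == "AG"))
    return report
```

Usage: `splice_report("AAAAAGTCCCCAGTTTTT", [(1, 5), (14, 18)])` reports one intron spanning positions 6 to 13 with canonical GT/AG ends.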
Step 4: Export a FASTA file
In the Project Tree View, select the data file we loaded previously. Right-click (Control-click on Mac OS) the selected data and choose Export. Select FASTA as the format, choose a location, and enter a file name. Click Finish.
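Exporting the AGP object as FASTA means writing out the sequence the AGP describes. Conceptually, that involves concatenating each component's sub-range, reverse-complementing components in "-" orientation, and filling gaps with N. The sketch below illustrates that idea. It assumes the component sequences are already available in a dictionary (a hypothetical input; Genome Workbench fetches them for you), that rows appear in part-number order, and that FASTA lines are wrapped at 60 characters (a common choice, not necessarily what Gbench writes).

```python
from typing import Dict, List

# Complement table; unknown bases (N/n) map to themselves.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")
GAP_TYPES = {"N", "U"}


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def assemble_from_agp(agp_text: str, components: Dict[str, str]) -> Dict[str, str]:
    """Build each AGP object's sequence from its component and gap rows."""
    pieces: Dict[str, List[str]] = {}
    for line in agp_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        obj, ctype = cols[0], cols[4]
        if ctype in GAP_TYPES:
            piece = "N" * int(cols[5])
        else:
            comp_id, cbeg, cend = cols[5], int(cols[6]), int(cols[7])
            orientation = cols[8] if len(cols) > 8 else "+"
            piece = components[comp_id][cbeg - 1:cend]  # AGP coordinates are 1-based, inclusive
            if orientation == "-":
                piece = reverse_complement(piece)
        pieces.setdefault(obj, []).append(piece)
    return {obj: "".join(parts) for obj, parts in pieces.items()}


def to_fasta(name: str, seq: str, width: int = 60) -> str:
    """Format one sequence as a FASTA record with fixed-width lines."""
    lines = [f">{name}"] + [seq[i:i + width] for i in range(0, len(seq), width)]
    return "\n".join(lines) + "\n"
```

With made-up components `{"ctgA": "ACGTACGTAA", "ctgB": "GGGCCC"}`, a three-row AGP (ctgA 1-4 "+", a 3 bp gap, ctgB 1-3 "-") assembles to `ACGTNNNCCC`.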
Now, open the FASTA file you just created. Choose File->Open. Select the file and click Next. Accept the default settings and click Next again. Choose to create a new project and click Finish.
Select the FASTA data in the Project Tree View and double-click it. From the Open View menu, choose Graphical View.
Step 5: Alignment
From the Graphical View of the FASTA sequence, use region selection to select the entire sequence. Click and drag in the number line at the top of the view to begin the selection. Once you have a region selected, click on the edges and stretch it to the boundaries of the view.
With the entire region selected, choose Run Tool (Tools->Run Tool from the main menu, or right-click in the view; Control-click on Mac OS). From the Run Tool dialog, choose BLAST Search and click Next.
In the BLAST Search dialog, select the Nucleotides option, BLASTn from the Program menu, and BLAST Human Sequences/genome (all assemblies) from the Database menu, then click Next.
In the next dialog, accept the general parameters, check Filter low complexity regions, and select Human from the Species specific repeats for: menu. Then click Next. Choose to add the results to the existing project and click Finish.
It can take about 30 seconds for the analysis to complete and display the results.
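"Low complexity" refers to stretches such as poly-A runs or short tandem repeats that produce many spurious BLAST hits. BLAST's own nucleotide filter uses a different algorithm (DUST). The sketch below swaps in a simpler measure, Shannon entropy over a sliding window, purely to illustrate what the filter is looking for. The window size and entropy cutoff are arbitrary, hypothetical choices.

```python
import math
from collections import Counter
from typing import List, Tuple


def shannon_entropy(window: str) -> float:
    """Entropy in bits per base; 0 for a single repeated base, 2 for equal A/C/G/T."""
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in Counter(window).values())


def low_complexity_intervals(seq: str, window: int = 12,
                             min_entropy: float = 1.0) -> List[Tuple[int, int]]:
    """Return merged 0-based, half-open intervals whose windows fall below min_entropy."""
    seq = seq.upper()
    intervals: List[Tuple[int, int]] = []
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < min_entropy:
            end = start + window
            if intervals and start <= intervals[-1][1]:
                # Overlaps the previous low-complexity interval: extend it.
                intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
            else:
                intervals.append((start, end))
    return intervals
```

A sequence like `"ACGT" * 10` has maximal entropy in every window and yields no intervals. A run of 20 A's embedded in mixed sequence is reported as one interval covering the run.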
Step 6: WindowMasker
In this step, we'll use WindowMasker on the FASTA sequence to look for repetitive regions. The FASTA file should still be available in the Project Tree View. Select it, double-click it, and open a Graphical View. Select a region by clicking in the number line and dragging.
Choose Tools->Run Tool from the main menu. Then select Search/Find Repetitive Sequences with WindowMasker and click Next. Ensure that our sequence is selected (BX530088...), select 9606 Homo sapiens from the Mask using parameters for menu. Then click Next. Choose a project to add the results to and click Finish. It can take 60-90 seconds for the job to complete.
The result is a histogram showing regions of repeats. You can scroll and zoom just like you would any other view.
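WindowMasker identifies repeats from how often short words (n-mers) occur across a genome. The toy sketch below illustrates only that general idea on a single sequence, and does not reproduce WindowMasker's algorithm or its parameters. It counts k-mers (treating a k-mer and its reverse complement as the same word), marks positions whose k-mer occurs at least `threshold` times, and tallies those positions per bin to produce a histogram loosely comparable to the one Genome Workbench displays. `k`, `threshold`, and `bin_size` are hypothetical values.

```python
from collections import Counter
from typing import List

_RC = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_RC)[::-1])


def kmer_counts(seq: str, k: int) -> Counter:
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES:  # skip k-mers containing N or other ambiguity codes
            counts[canonical(kmer)] += 1
    return counts


def repeat_histogram(seq: str, k: int = 8, threshold: int = 3,
                     bin_size: int = 50) -> List[int]:
    """Per-bin count of positions starting a k-mer seen at least `threshold` times."""
    seq = seq.upper()
    counts = kmer_counts(seq, k)
    hist = [0] * ((len(seq) + bin_size - 1) // bin_size)
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES and counts[canonical(kmer)] >= threshold:
            hist[i // bin_size] += 1
    return hist
```

For example, `repeat_histogram("A" * 100)` gives `[50, 43]`. Every 8-mer in a poly-A run is frequent, and the last 7 positions cannot start a full 8-mer.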
If the histogram doesn't appear automatically, select the content menu at the bottom of the graphical view and choose Repeat Region (see figure).
Step 7: Conclusion
There are many ways to use Genome Workbench, and this tutorial shows only a few simple examples. It gives you enough background to start exploring your data in new ways, keeping your own data private while still giving you access to the public data you need.
Download
Current Version is 2.6.0 (released August 31, 2012)
- Release Notes
- Windows
- Mac OS X
- Linux (Ubuntu 10.04 LTS (Lucid Lynx))
- Source
- Older Versions