Software and Services: Blast

BIMCORE has assembled a modest compute cluster to offer Blast services locally. The goal is to provide standard Blast services on a local scale, so that subscribers have an alternative when wait times at NCBI are excessive or external network connectivity is slow or nonexistent. You can also see your five most recent searches using a history link associated with your account. We also provide command line support for Vector Screening, Repeat masking, and MEGABLAST.

Below is a list of questions you might have about this service. Please browse the list to get an idea of how the service might be of use to you as a BimCore subscriber.



Can anyone use this service?

This is a web-based service available to current BimCore subscribers only. The service is password protected and limited to access from machines with a current Lasergene installation.


How do I access this server?

All primary investigators will be given a userid and password pair. Load http://saf.bimcore.emory.edu in a web browser and log in. From there you should recognize the typical Blast form. Select a program, paste in a sequence, and click "submit". You can wait for your result or return to the main Blast form to submit other jobs. Finished jobs are marked as "completed", after which you can examine the results. Your five most recent searches are cached for you by the server. To see more, simply click the "history" link.


How do I get my userid?

Userids and passwords will be distributed to primary investigators via email. We can accommodate id requests from affiliated lab staff, though the initial round of ids will be issued per investigator. After you have logged in, you can change your password using the password change form in the "tools" menu.


What Blast-related programs are available?

We offer BLASTN, BLASTP, BLASTX, TBLASTN, and TBLASTX via the Web interface.

We have command line access to MEGABLAST, WU-BLAST (Washington University Blast), and RepeatMasker.


What Blast databases are available?

You can see the available databases by examining the drop-down "database" item on any of the Blast forms. Here is a list of what we currently have:

  • drosophila
  • ecoli
  • est human
  • microbial genomes (click here to see the list)
  • Non redundant
  • Patent
  • PDB
  • Vector
  • Yeast
  • Swissprot


How often will the databases be updated?

Once a month is the target. We use NCBI as the primary data source, though at times downloads can be slow or error prone due to network dropouts. We are currently automating this process to make it as smooth as possible.


Why is there no PSI-BLAST or PHI-BLAST?

Both of these programs are specialized implementations of other Blast programs. Furthermore, they require the databases to be available in a way that we cannot reproduce locally without purchasing additional disk resources. Note that a single iteration of PSI-BLAST is equivalent to BLASTP, so unless you are running PSI-BLAST with an explicit number of iterations greater than 1, you should really be using BLASTP anyway.


How do I access MEGABLAST and other command line tools?

Access to command line utilities requires an additional account to be set up. The environment is UNIX, so you should have some facility with that environment. Command line access is available for MEGABLAST, RepeatMasker, Vector Screening, and HMMSRCH. There is also a complete development environment available, including access to Perl, BioPerl, C, C++, and Java.


What about the Human and Mouse Genomes?

All BimCore subscribers are eligible to use the Celera service to search the human and mouse genomes. It is typically fast and there is a genome viewer available. If you wish to examine the public version of the human genome, please use the BLAT service at UC Santa Cruz. It is very fast since it is optimized for human genome investigation.


What about C. elegans?

C. elegans is served primarily through WormBase and its associated mirrors. Because WormBase is heavily integrated, we don't attempt to offer it through our Blast server. However, if you want a local instance of WormBase and have a Linux machine with plenty of RAM, we can install it for you, as long as you don't mind making it available to local Emory users for searching.


What are the limitations of the server?

Our Blast cluster is made up of 6 computers, each with 2 processors and 2 gigabytes of memory. Compare this to the 300+ computers at NCBI, as well as numerous supporting database machines, and you can see that we have a fraction of their overall capability. By limiting access to local users, however, we hope to provide a reasonably fast Blast server. That aside, there are only two significant limitations.

The first is that you can only Blast one sequence at a time (though it can be of almost any length). That is, pasting in multiple sequences will not work. This is intentional, to minimize load on the server. If you need to do high throughput Blasting we can accommodate that, though not through the Web interface. We can arrange for command line access so you can accomplish this.

The second limitation is that the output format is currently limited to the "standard" Blast "pairwise" alignment report, which is the default at NCBI. We are working on enabling the other formats and will let you know when they are available.


Can I add individual accounts for my lab staff?

By default the ids are distributed to primary investigators, though upon request we can set up additional accounts for your lab staff.


Can you add my favorite database?

Probably, though we will need a reliable download source for it. In general the cluster is optimized for use with databases smaller than 10 gigabytes. Fortunately most databases are that size or less, but depending upon the nature of the database there may be reasons why we cannot Blast against it. Let us know if you want to add something.


Why can't I paste in more than one sequence at a time to be Blasted?

This limitation is intentional, to control load on the cluster. If you need to arrange for high throughput Blasting, let us know and we can accommodate that via command line access.
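
As an illustration of the one-sequence rule, here is a minimal Python sketch of how pasted input could be checked before submission. It is not the server's actual code; the function names are hypothetical, and it assumes input is either a bare sequence or FASTA text in which each record starts with a ">" header line.

```python
def count_fasta_records(text: str) -> int:
    """Count the sequences in pasted input.

    Assumption: each FASTA record begins with a '>' header line, and a
    bare sequence with no header counts as a single record.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return 0
    headers = sum(1 for ln in lines if ln.startswith(">"))
    # Residues that appear before the first header form one extra,
    # unnamed record.
    if not lines[0].startswith(">"):
        return headers + 1
    return headers


def is_single_sequence(text: str) -> bool:
    """Return True only when the input holds exactly one sequence."""
    return count_fasta_records(text) == 1
```

Running a check like this on your own input before pasting it into the form tells you whether it would need to go through the command line route instead.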


I have thousands of trace files I would like to Blast. Can I use your server?

Most likely, but it would have to be by arrangement via the command line, since the primary purpose of the cluster is web-based use. We have experimented extensively with high throughput Blasting of the Human Genome, which involved Vector Screening, Repeat masking, and Blasting. We have found that the larger problem is what to do with the massive amounts of information generated by the Blast and how to parse it intelligently. Thankfully there are programming tools to accomplish this, though they require some customization and programming. We have a development environment installed which provides access to Perl, BioPerl, C, C++, and Java, so we can accommodate researchers who wish to analyze their data using programs.
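
To give a sense of what parsing Blast output involves, here is a minimal Python sketch that keeps the best-scoring hit for each query. It assumes the searches were run so as to produce NCBI's 12-column tab-delimited report, which is an assumption about your command line setup (the Web interface currently returns only the pairwise report). The function name is hypothetical.

```python
import csv
import io

# Column order of the standard 12-column tab-delimited Blast report.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return {query id: best hit}, ranking hits by bit score."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and truncated rows.
        if len(row) < len(FIELDS) or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

Any open text file can be passed as the handle, for example `best_hits(open("results.tsv"))`, where `results.tsv` is a placeholder file name.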


How long will my results be stored in the "results history"?

A job runs every day to purge Blast results older than four weeks.
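
A daily purge of this kind can be sketched in a few lines of Python. This is only an illustration, not the job we actually run. It assumes each result is stored as a file in a single directory and that file modification time marks the age of a result. The function name and directory layout are hypothetical.

```python
import time
from pathlib import Path

MAX_AGE_SECONDS = 28 * 24 * 60 * 60  # four weeks


def purge_old_results(results_dir, now=None, max_age=MAX_AGE_SECONDS):
    """Delete files in results_dir older than max_age seconds.

    Returns the sorted names of the files that were removed.
    """
    now = time.time() if now is None else now
    removed = []
    for path in Path(results_dir).iterdir():
        # Age is judged by last modification time (an assumption).
        if path.is_file() and now - path.stat().st_mtime > max_age:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```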


How does the cluster work?

The cluster consists of 6 Linux "blades", each with dual Athlon 1600+ processors, 2 gigabytes of RAM, and 80 gigabytes of hard drive space. The "head" node is connected to the five "worker" nodes via a 10/100 network switch. We have licensed Platform LSF software to manage the distribution of jobs to the worker nodes. Blast is not a truly parallel application, so we split the target database into six partitions and Blast a given query against each one, for six searches in total. We then merge the individual results and post them to the Web. The partitioned searches are much faster than Blasting against the single larger database.
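
The partition-and-merge idea can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not our production code: the `Hit` type and both function names are hypothetical, records are dealt out round-robin, and the actual searches and the LSF job submission are left out entirely. Hits are merged by bit score because bit scores do not depend on database size, whereas E-values computed against a one-sixth partition would differ from those for the full database unless the search is told the full database size.

```python
from itertools import chain
from typing import NamedTuple


class Hit(NamedTuple):
    subject: str
    evalue: float
    bitscore: float


def partition(records, n_parts=6):
    """Deal database records round-robin into n_parts partitions."""
    parts = [[] for _ in range(n_parts)]
    for i, record in enumerate(records):
        parts[i % n_parts].append(record)
    return parts


def merge_hits(per_partition_hits, limit=None):
    """Combine the hit lists from every partition into one ranked list.

    Hits are ordered by descending bit score; limit, if given, keeps
    only the top entries.
    """
    merged = sorted(chain.from_iterable(per_partition_hits),
                    key=lambda hit: hit.bitscore, reverse=True)
    return merged if limit is None else merged[:limit]
```

Each worker would search one partition and return its own hit list; `merge_hits` then produces the single report that is posted to the Web.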