General Information
To make TAMAL run quickly, a considerable amount of pre-processing is done when the TAMAL database is built. The update process is computationally intensive, takes about a week, and is run quarterly.
First, curated reference sequence (RefSeq) gene data are compiled. For genes with multiple isoforms, the maximal 5’ to 3’ extent across all isoforms is computed. The result is the TAMAL gene table.
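The isoform-collapsing step can be sketched as follows. This is a minimal illustration, not TAMAL's actual build code. It assumes each isoform is supplied as a hypothetical `Isoform` record with chromosome, start, and end coordinates, and it takes the span from the smallest start to the largest end, which covers the 5’ and 3’ ends of every isoform regardless of strand. Genes whose isoforms fall on different chromosomes are simply rejected here, an assumption made to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Isoform:
    """Hypothetical record for one RefSeq transcript of a gene."""
    gene: str   # approved HUGO gene symbol
    chrom: str  # chromosome name, e.g. "chr11"
    start: int  # lowest genomic coordinate of the transcript
    end: int    # highest genomic coordinate of the transcript


def gene_extents(isoforms):
    """Collapse isoforms into one (chrom, start, end) span per gene.

    The span runs from the smallest start to the largest end over all
    isoforms, so it covers every isoform on either strand.
    """
    table = {}
    for iso in isoforms:
        if iso.gene not in table:
            table[iso.gene] = (iso.chrom, iso.start, iso.end)
            continue
        chrom, start, end = table[iso.gene]
        # Assumption for this sketch: all isoforms of a gene share a chromosome.
        if chrom != iso.chrom:
            raise ValueError(f"{iso.gene}: isoforms on different chromosomes")
        table[iso.gene] = (chrom, min(start, iso.start), max(end, iso.end))
    return table
```

For example, a hypothetical gene GENEA with isoforms spanning chr1:1000-5000 and chr1:800-4500 collapses to the single span chr1:800-5000.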
Important! TAMAL is based on standard gene names per the approved HUGO nomenclature. Some genes may be known by other names in other areas of science. For example, the gene “neural cell adhesion molecule 1” on chromosome 11q23-q24 is listed as NCAM1, not NCAM or CD56.
Second, SNPs and their minor allele frequencies (MAFs) in samples of African, Asian, and European ancestry are collected or updated from dbSNP, HapMap, Perlegen, and Affymetrix. SNPs with no evidence of variation in any population are removed, as are SNPs that map to multiple locations or to random/unknown chromosomes, and multiple SNPs at the same genomic location.
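The filtering rules above can be expressed as a short sketch. It assumes a hypothetical `Snp` record holding every reported (chromosome, position) mapping and a dictionary of per-population MAFs. Several readings here are assumptions rather than TAMAL's documented rules: "no evidence of variation" is taken as a MAF of zero (or no MAF at all) in every population, "random/unknown chromosomes" as anything outside chr1-chr22, chrX, and chrY, and SNPs that share a location are all dropped rather than one being kept.

```python
from collections import Counter
from dataclasses import dataclass

# Assumed whitelist of assembled chromosomes; names such as "chr1_random"
# or "chrUn_..." fall outside it and are treated as random/unknown.
PRIMARY_CHROMS = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}


@dataclass
class Snp:
    """Hypothetical SNP record used only for this sketch."""
    rsid: str
    locations: list  # every (chrom, pos) mapping reported for this SNP
    mafs: dict       # population label -> minor allele frequency


def filter_snps(snps):
    # 1. Drop SNPs with no observed variation (no MAF above zero anywhere).
    polymorphic = [s for s in snps if any(m > 0 for m in s.mafs.values())]
    # 2. Keep SNPs with exactly one mapping, on an assembled chromosome.
    uniquely_mapped = [
        s for s in polymorphic
        if len(s.locations) == 1 and s.locations[0][0] in PRIMARY_CHROMS
    ]
    # 3. Drop every SNP whose position is shared with another SNP.
    counts = Counter(s.locations[0] for s in uniquely_mapped)
    return [s for s in uniquely_mapped if counts[s.locations[0]] == 1]
```

Applying the location-sharing check last, after multi-mapped and unplaced SNPs are gone, is also a choice made for this sketch.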
Third, TAMAL annotates the SNPs in three main categories:
- Tag SNPs. TAMAL provides tag SNPs (tSNPs) for the three major continental ancestries (African, East Asian, and European) based on the Tagger, Gabriel, and Perlegen methods. The first two were computed by running all HapMap data through Haploview on a 32-node cluster courtesy of the UNC Renaissance Computing Institute. The third was computed by Perlegen. In addition, Mike Weale has written routines to interface TAMAL efficiently with TagIT.
- Coding SNPs. TAMAL notes SNPs that are non-synonymous (change the encoded amino acid) or synonymous (do not), based on dbSNP annotation and augmented with in silico predictions of functionality from LS-SNP, along with SNPs that alter an intronic splice site (dbSNP).
- SNPs in predicted genomic features. TAMAL flags SNPs in predicted promoters (in silico predictions with biological validation), in regions of predicted regulatory potential, in predicted transcription factor binding sites, in highly conserved regions (conservation scores ≥99th percentile genome-wide for a human-chimp-rat-mouse-chicken alignment), and in conserved predicted miRNA target sites in 3’ UTRs.
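To make tag SNP selection concrete, the following sketch implements a generic greedy pairwise r² method, similar in spirit to the pairwise mode of Tagger. It is not the code TAMAL or Haploview runs, and it does not reproduce the Gabriel block-based or Perlegen methods. It assumes precomputed pairwise r² values for a single population (so it would be run once per ancestry) and a threshold of 0.8, a commonly used value chosen here as an assumption. At each step it picks the SNP that captures the most still-untagged SNPs.

```python
def greedy_tag_snps(snps, r2, threshold=0.8):
    """Greedily pick tag SNPs so every SNP has r^2 >= threshold with a tag.

    snps: iterable of SNP ids.
    r2: dict mapping frozenset({a, b}) -> pairwise r^2; missing pairs count as 0.
    """
    snps = sorted(snps)

    def covers(tag):
        # A SNP always tags itself; otherwise it needs r^2 at or above threshold.
        return {
            s for s in snps
            if s == tag or r2.get(frozenset((tag, s)), 0.0) >= threshold
        }

    coverage = {s: covers(s) for s in snps}
    untagged = set(snps)
    tags = []
    while untagged:
        # max() returns the first maximal item, so ties go to the smallest id.
        best = max(snps, key=lambda s: len(coverage[s] & untagged))
        tags.append(best)
        untagged -= coverage[best]
    return tags
```

With four SNPs where rs2 has r² ≥ 0.8 with both rs1 and rs3, and rs4 is only weakly correlated with the others, the sketch selects rs2 and rs4. Each iteration tags at least one SNP (any untagged SNP covers itself), so the loop always terminates.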
Fourth, all these data are merged to create the TAMAL SNP table. The entire database can be downloaded (tamal.zip, 88 MB).
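The merge described above might look like the following minimal sketch. It assumes the filtered SNPs are kept as a dictionary of base records keyed by rsID, and that each annotation category (a tag SNP set, a coding class, a genomic feature) is available as a set of rsIDs; the category and field names used in practice here are hypothetical. Each SNP row gains one boolean flag per category, and annotations for SNPs removed during filtering are ignored.

```python
def build_snp_table(snps, annotations):
    """Merge per-SNP annotation flags into one row per SNP.

    snps: dict rsid -> base record (e.g. chrom, pos, MAFs).
    annotations: dict category name -> set of rsids carrying that annotation.
    """
    table = {}
    for rsid, record in snps.items():
        row = dict(record)  # copy so the input records are left unchanged
        for category, members in annotations.items():
            row[category] = rsid in members
        table[rsid] = row
    return table
```

Storing each category as a flag keeps every SNP on a single row, which makes later filtering a simple per-row test.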
When you enter a gene or list of genes, TAMAL looks each one up in the gene table to get its genomic location. It then queries the TAMAL SNP table, applies the filters you choose, and displays the results.
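The lookup-then-filter flow can be sketched as below. It assumes the gene table maps approved symbols to inclusive (chromosome, start, end) spans and the SNP table maps rsIDs to rows with at least `chrom` and `pos` fields; the filters are hypothetical predicates applied to each row. The linear scan keeps the example short; a real database would use an index on chromosome and position.

```python
def query_genes(genes, gene_table, snp_table, filters=()):
    """Return, per gene, the SNPs inside its span that pass every filter.

    gene_table: dict gene symbol -> (chrom, start, end), inclusive coordinates.
    snp_table: dict rsid -> row dict with at least "chrom" and "pos".
    filters: iterable of predicates row -> bool; all must be true.
    """
    filters = list(filters)
    results = {}
    for gene in genes:
        if gene not in gene_table:
            results[gene] = None  # not an approved symbol in the gene table
            continue
        chrom, start, end = gene_table[gene]
        hits = [
            rsid for rsid, row in snp_table.items()
            if row["chrom"] == chrom
            and start <= row["pos"] <= end
            and all(f(row) for f in filters)
        ]
        results[gene] = sorted(hits, key=lambda r: snp_table[r]["pos"])
    return results
```

Because lookup is by exact symbol, a non-approved name such as NCAM returns nothing here, whereas NCAM1 would be found.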