ERGR Help

1. Data Source

Species	Method	No. of Datasets	No. of Genes	Source
Human (Homo sapiens) (total 3311 genes)	Microarray expression	7	831	Literature
	Genome-wide association	1	71	Literature (PMID: 16894614)
	Linkage	3	918	Literature
	HuGE Navigator	9 phenotypes	203	HuGE Navigator (http://www.hugenavigator.net/)
	Addiction array list	1	130	Literature (PMID: 18477577)
	Literature search	1	1726	Literature
Mouse (Mus musculus) (total 2129 genes)	Microarray expression	11	682	Literature
	QTL	21 QTLs	1568	PARC Alcohol QTLs
Rat (Rattus norvegicus)	Microarray expression	6	679	Literature
Fly (Drosophila melanogaster)	Microarray expression	2	614	Literature (PMID: 17054780, 17973985)
Worm (Caenorhabditis elegans)	Microarray expression	1	228	Literature (PMID: 15028283)

2. Sequence ID conversion

Different publications may list different original sequence IDs (gene symbol, gene name, nucleotide accession number, EST accession number, clone ID, Affymetrix probe ID, UniGene ID, Gene ID, etc.). We used the NCBI Entrez GeneID as a unique cross-reference ID.
The following 3 methods were used to get the NCBI Entrez GeneID.
1) Extract the GeneID from the original IDs by parsing the NCBI gene2accession, gene2unigene and gene_info files (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/).
2) Use the online tool GeneID Converter (http://idconverter.bioinfo.cnio.es/IDconverter.php).
3) Manually query the NCBI nucleotide, EST and Gene databases.
For some gene identifiers (original IDs) obtained from source publications, no corresponding GeneID is available. In these cases, the original IDs were retained in the ERGR datasets. However, these entries have no detailed annotation, because annotation is keyed on the GeneID.
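The file-based lookup in method 1, together with the rule that unmapped original IDs are retained, can be sketched as follows. This is a minimal illustration and not the ERGR pipeline itself: the three-column layout, the accessions and the GeneIDs are made-up placeholders (the real gene2accession file has many more columns), and the helper names `build_lookup` and `convert` are hypothetical.

```python
import csv
import io

# Made-up, simplified stand-in for an accession-to-GeneID file.
# Column names and values are placeholders, not real NCBI records.
SAMPLE = (
    "tax_id\tGeneID\taccession\n"
    "9606\t1001\tAA000001.1\n"
    "9606\t1002\tAA000002.1\n"
)

def build_lookup(handle):
    """Map each accession, with and without its version suffix, to a GeneID."""
    lookup = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        accession = row["accession"]
        lookup[accession] = row["GeneID"]
        # Publications often cite accessions without the ".N" version.
        lookup[accession.split(".")[0]] = row["GeneID"]
    return lookup

def convert(original_ids, lookup):
    """Pair every original ID with its GeneID, or None when unmapped.

    Unmapped IDs stay in the output so they are not lost.
    """
    return [(original_id, lookup.get(original_id)) for original_id in original_ids]

lookup = build_lookup(io.StringIO(SAMPLE))
result = convert(["AA000001", "AA000002.1", "ZZ999999"], lookup)
for original_id, gene_id in result:
    print(original_id, gene_id if gene_id is not None else "(no GeneID, original ID kept)")
```

Keeping the unmapped ID in the result, with no GeneID attached, mirrors the behaviour described above: the entry survives but has nothing to hang annotation on.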
For human linkage datasets, the marker names were obtained from source publications and the corresponding physical marker positions were found using the UCSC Genome Browser. Once the linkage region was physically defined, the genes mapping to the region were obtained from Ensembl.
For the mouse QTL datasets, QTLs with significant status and assured map regions were obtained from the PARC Alcohol QTLs website, a curated set of mouse alcohol QTL results. The genes in the region were derived from the Mouse Genome Informatics (MGI) website.
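Once a linkage or QTL region has physical coordinates, collecting the genes in it amounts to an interval-overlap test. The sketch below shows that step only. It assumes gene coordinates are already available locally; in ERGR they came from Ensembl and MGI. The `Gene` type, the `genes_in_region` helper, the gene names and all coordinates are hypothetical, and whether partially overlapping genes were counted is also an assumption.

```python
from typing import NamedTuple

class Gene(NamedTuple):
    symbol: str
    chrom: str
    start: int  # inclusive start coordinate
    end: int    # inclusive end coordinate

def genes_in_region(genes, chrom, region_start, region_end):
    """Return symbols of genes that overlap the region on the given chromosome.

    A gene counts as inside the region if any part of it overlaps
    (assumption: partial overlaps are included).
    """
    return [
        gene.symbol
        for gene in genes
        if gene.chrom == chrom
        and gene.start <= region_end
        and gene.end >= region_start
    ]

# Made-up genes and coordinates, for illustration only.
GENES = [
    Gene("GENE_A", "4", 100, 200),
    Gene("GENE_B", "4", 450, 600),
    Gene("GENE_C", "4", 900, 1000),
    Gene("GENE_D", "11", 150, 300),
]

region_genes = genes_in_region(GENES, "4", 150, 500)
print(region_genes)
```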

3. Gene Annotation

1) Gene information, including GeneID, symbol, full name, alias, chromosome, genetic location, physical location, gene type, etc.
The gene information was extracted from the gene_info file downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/GENE_INFO/);
2) Gene ontology (GO) annotation
The GO annotations were parsed from the gene2go file, which was downloaded from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/);
3) Ortholog information
Human-to-mouse and human-to-rat ortholog information was downloaded from the MGI download page;
Human-to-fly and human-to-worm ortholog information was taken from the Inparanoid database (Version 6.1, built on Apr. 22, 2008).
4) Gene refseq sequence annotation
We downloaded the gene2refseq file from the NCBI FTP site (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/). Because there are several alternate assemblies for human, mouse and rat, we extracted the RefSeq information from the reference assembly only. Within the reference assembly there are also always several genomic accessions. We extracted only the genomic accessions beginning with "NC", which are the whole-chromosome sequence accessions.
5) Other database cross links
Some database cross links are available for each gene: NCBI Entrez Gene, MGI, RGD, Flybase, Wormbase, Ensembl, dbSNP, AceView, HuGE, OMIM etc.
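The RefSeq filtering described in item 4) above (reference assembly only, and only genomic accessions beginning with "NC") can be sketched as a two-condition row filter. This is an illustration under stated assumptions, not the original script: the three-column layout is a simplified stand-in for gene2refseq, the rows are made-up placeholders, the test for the word "Reference" in the assembly column is an assumption about how the reference assembly is labelled, and `reference_chromosome_rows` is a hypothetical helper name.

```python
import csv
import io

# Made-up, simplified stand-in for gene2refseq (the real file has more columns).
SAMPLE = (
    "GeneID\tgenomic_accession\tassembly\n"
    "1001\tNC_900001.1\tReference assembly\n"
    "1001\tNT_900010.1\tReference assembly\n"
    "1001\tAC_900020.1\tAlternate assembly\n"
    "1002\tNC_900002.1\tReference assembly\n"
    "1002\tNC_900003.1\tAlternate assembly\n"
)

def reference_chromosome_rows(handle):
    """Keep (GeneID, accession) pairs that are on the reference assembly
    and whose genomic accession is a whole-chromosome "NC" accession."""
    kept = []
    for row in csv.DictReader(handle, delimiter="\t"):
        on_reference = row["assembly"].startswith("Reference")  # assumed label
        whole_chromosome = row["genomic_accession"].startswith("NC_")
        if on_reference and whole_chromosome:
            kept.append((row["GeneID"], row["genomic_accession"]))
    return kept

kept = reference_chromosome_rows(io.StringIO(SAMPLE))
for gene_id, accession in kept:
    print(gene_id, accession)
```

Both conditions are needed: the last sample row has an "NC" accession but sits on an alternate assembly, so it is dropped.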

4. Web Interface

The current URL of ERGR is http://bioinfo.vipbg.vcu.edu/ERGR/index.php.
Users can browse or search the data at different levels.
Browse:
1) browse by species;
2) browse by method;
3) browse by chromosome;
4) browse all datasets.

Search:
1) Quick search for a GeneID or gene symbol at the head of each page; it supports wildcard search (e.g. ADH*);
2) The Search page can combine multiple terms, such as GeneID, symbol, name, alias, species, phenotype, GO annotation, and chromosome location.
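To illustrate what a wildcard query such as ADH* selects, the following sketch matches a pattern against a list of gene symbols using shell-style wildcards from the Python standard library. It is not how the ERGR site implements its search. Case-insensitive matching is an assumption, and `quick_search` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

def quick_search(symbols, pattern):
    """Return the symbols matching a shell-style wildcard pattern.

    Matching is case-insensitive here (an assumption): both sides are
    upper-cased before comparison.
    """
    wanted = pattern.upper()
    return [symbol for symbol in symbols if fnmatchcase(symbol.upper(), wanted)]

hits = quick_search(["ADH1B", "ADH1C", "ALDH2", "Adh"], "ADH*")
print(hits)
```

Note that `fnmatchcase` also treats `?` and `[...]` as wildcards, which may be broader than what the site's quick search accepts.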

Data Integration:
1) Dataset union and integration.
2) Candidate gene selection based on evidence in multiple datasets or organisms.
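One simple way to read "evidence in multiple datasets" is to count how many datasets list each gene and keep the genes that reach a threshold. The sketch below shows that counting approach. It is an assumption about the selection rule, not ERGR's documented method. The dataset names and gene IDs are made up, `select_candidates` is a hypothetical helper, and for cross-species use the IDs are assumed to be already mapped to a common identifier (for example through the ortholog tables).

```python
from collections import Counter

def select_candidates(datasets, min_support=2):
    """Return gene IDs found in at least `min_support` datasets, sorted.

    `datasets` maps a dataset name to an iterable of gene IDs.
    """
    support = Counter()
    for gene_ids in datasets.values():
        # set() so a gene listed twice in one dataset counts once.
        support.update(set(gene_ids))
    return sorted(gene for gene, count in support.items() if count >= min_support)

# Made-up datasets and gene IDs, for illustration only.
DATASETS = {
    "human_microarray": ["1001", "1002", "1003", "1003"],
    "mouse_qtl": ["1002", "1003"],
    "human_linkage": ["1003", "1004"],
}

candidates = select_candidates(DATASETS, min_support=2)
print(candidates)
```

Raising `min_support` makes the selection stricter; with the sample data, a threshold of 3 leaves only the gene present in all three datasets.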

The BLAST page runs a BLAST search against the ERGR gene mRNA or protein sequences.
The Help page contains this document.
The Link page lists links to alcohol research groups and useful databases.
The Phenotype page lists all the phenotypes in ERGR, including descriptions and references.