Data Resource for Literature Search

1. Method

    We built the literature dataset from genes that co-occur with schizophrenia-related keywords. Co-occurrence of two terms in a document has often been used to identify a relationship between them (Roberts, 2006). We used the automatic term mapping strategy of NCBI PubMed to examine whether a gene and a schizophrenia-related keyword co-occur in the same document. Such co-occurrence suggests that the gene may have been studied in schizophrenia and, because positive results are more often selected for publication, that the gene is likely associated with schizophrenia. We evaluated different schizophrenia-related terms and selected six of them: "schizophrenia", "schizophrenias", "schizophrenic", "schizophrenics", "schizotypy" and "schizotypal". We downloaded human protein-coding genes from the NCBI FTP site and used the NCBI Entrez Programming Utilities (ESearch) to search NCBI PubMed. If a gene and a keyword co-occur in the same publication, a hit is assigned for that keyword. The number of hits is taken as the score for the gene. With the six keywords that we used, a gene's score therefore ranges from 0 (no hit with any keyword) to 6 (hits with all six keywords).
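    The scoring rule above can be sketched as follows. This is a minimal illustration, not the code used to build the dataset: the function names, the untagged "GENE AND keyword" query form (chosen so that PubMed's automatic term mapping applies), and the use of the JSON count response from ESearch are all assumptions.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public ESearch endpoint of the NCBI Entrez Programming Utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The six schizophrenia-related keywords listed in the Method section.
KEYWORDS = [
    "schizophrenia", "schizophrenias", "schizophrenic",
    "schizophrenics", "schizotypy", "schizotypal",
]


def build_query_url(gene_symbol, keyword):
    """Build an ESearch URL for one gene/keyword pair.

    Assumption: the query is left untagged ("GENE AND keyword") so that
    PubMed's automatic term mapping interprets both terms.
    """
    params = {
        "db": "pubmed",
        "term": f"{gene_symbol} AND {keyword}",
        "rettype": "count",   # only the number of matching records is needed
        "retmode": "json",
    }
    return f"{ESEARCH_URL}?{urlencode(params)}"


def pubmed_count(gene_symbol, keyword):
    """Return the number of PubMed records matching the pair (network call)."""
    with urlopen(build_query_url(gene_symbol, keyword)) as response:
        payload = json.load(response)
    return int(payload["esearchresult"]["count"])


def literature_score(gene_symbol, count_fn=pubmed_count):
    """Score a gene: one hit per keyword with at least one co-occurring record."""
    return sum(1 for keyword in KEYWORDS if count_fn(gene_symbol, keyword) > 0)
```

    Calling literature_score with a gene symbol issues six ESearch requests, one per keyword, and returns an integer from 0 to 6. The count_fn parameter is a hypothetical hook that lets the counting step be replaced, for example by a cached lookup, when scoring many genes.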

2. Dataset Description

    The literature search yielded 1682 genes related to schizophrenia (as of 05/25/2009).

Figure 2. Score distribution of literature dataset
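    A score distribution of this kind can be tallied with a few lines of code. The sketch below assumes the scores are held in a dictionary keyed by gene symbol; the function name and the placeholder gene names in the usage note are hypothetical and are not taken from the dataset.

```python
from collections import Counter


def score_distribution(scores_by_gene, max_score=6):
    """Count how many genes received each score from 0 to max_score."""
    counts = Counter(scores_by_gene.values())
    # Report every possible score, including those no gene received.
    return {score: counts.get(score, 0) for score in range(max_score + 1)}
```

    For example, score_distribution({"GENE_A": 6, "GENE_B": 1, "GENE_C": 1}) maps score 1 to two genes, score 6 to one gene, and every other score to zero.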

    To facilitate visualization of the co-occurrences, we also highlighted the genes and the corresponding schizophrenia-related keywords on the page that shows the detailed literature search results for each gene.
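    One way such highlighting could be done is with a whole-word, case-insensitive regular expression. This is only a sketch under assumptions: the function name and the "[[" and "]]" markers are hypothetical placeholders for whatever markup the result page actually uses.

```python
import re


def highlight_terms(text, terms, open_mark="[[", close_mark="]]"):
    """Wrap each whole-word occurrence of any term in the given markers."""
    if not terms:
        return text
    # Word boundaries keep "schizophrenic" from matching inside "schizophrenics".
    alternatives = "|".join(re.escape(term) for term in terms)
    pattern = re.compile(rf"\b({alternatives})\b", re.IGNORECASE)
    # The matched text is reinserted as found, so its original case is preserved.
    return pattern.sub(lambda m: f"{open_mark}{m.group(0)}{close_mark}", text)
```

    Passing a gene symbol together with the six keywords as terms marks every co-occurring mention in a title or abstract while leaving the rest of the text unchanged.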

  • Roberts, P.M. (2006) Mining literature for systems biology. Brief Bioinform, 7: 399-406.

Copyright © Bioinformatics and Systems Medicine Laboratory All Rights Reserved since 2009.