Pulmonary Arterial Hypertension KnowledgeBase (PAHKB)

Data collection of PAHKB and how to use PAHKB:

1. Data collection of PAHKB database

    Four steps to collect PAH-related genes

    Curation of PAH-related genes from literature

2. Information for PAH-related genes

    General information and literature evidence

    Gene expression profile

    Gene regulation

    Mutation information

    Protein-protein interaction

3. Query and search database

    Text search of PAH-related gene

    Quick access information in database

    Blast all protein and nucleotide sequences

4. Browse database

    By dataset, chromosome, gene type

    By available animal models

    By highlighted pathway maps

5. Data analysis and download

    Gene ranking result

    Functional enrichment analysis

Data collection of PAHKB database

The primary aim of the database is to support pulmonary arterial hypertension (PAH) research by maintaining a high-quality PAH-related gene knowledgebase that is comprehensive, fully classified, and richly and accurately annotated, with extensive cross-references and querying interfaces freely accessible to the scientific community.

An introduction to pulmonary arterial hypertension can be found here.

Four steps to collect PAH-related genes

Construction of this PAH-related gene database for human, mouse, and rat genes involved four main steps: (1) curation of PAH-related genes from the published literature; (2) mapping of the gene descriptions found in the literature to Entrez Gene database IDs; (3) manual curation of the organism information in each publication; and (4) extensive annotation of cellular function, gene expression, mutation, methylation, transcription factor regulation, post-translational modification, and protein-protein interaction.

In detail, curation of a PAH-related gene from the literature comprises five steps before the gene is included in the PAHKB database: (1) exhaustive searching for relevant abstracts in the PubMed and GeneRIF databases using the keywords "PAH-related gene"; (2) extracting the description of the PAH-related gene from the text; (3) grouping the descriptions extracted from PubMed abstracts and GeneRIF records by topic using the Entrez related topic function; (4) extracting gene names from the grouped descriptions; and (5) mapping each gene name to an Entrez Gene ID.

Exhaustive search:
To retrieve abstracts specifically related to PAH, we searched PubMed using the expression: ("pulmonary arterial hypertension"[Title/Abstract] OR "IPAH"[Title/Abstract] OR "HPAH"[Title/Abstract] OR "pulmonary hypertension"[Title/Abstract]) AND (("genome-wide association study"[Title/Abstract] OR "genome wide association study"[Title/Abstract]) OR ("gene"[Title/Abstract] AND ("association"[Title/Abstract] OR "microarray"[Title/Abstract] OR "expression"[Title/Abstract] OR "linkage"[Title/Abstract] OR "proteomics"[Title/Abstract] OR "genetic"[Title/Abstract] OR "metabolomics"[Title/Abstract] OR "copy number variation"[Title/Abstract] OR "idiopathic"[Title/Abstract] OR "hereditable"[Title/Abstract] OR "family"[Title/Abstract] OR "mouse model"[Title/Abstract] OR "animal model"[Title/Abstract] OR "microRNA"[Title/Abstract] OR "mutation"[Title/Abstract] OR "SNP"[Title/Abstract] OR "drug"[Title/Abstract] OR "transporter"[Title/Abstract]))). This search returned 911 PubMed abstracts on 15 April 2013. Next, we extracted 516 sentences from 353 PubMed abstracts in the GeneRIF database on 15 April 2013. Combining the two exhaustive searches, a total of 1161 PubMed abstracts were collected and downloaded in MEDLINE format for parsing.
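The boolean structure of the search expression above can be easier to follow when it is assembled from term lists. The following is a minimal sketch only: the helper names (tag, any_of, build_query) are hypothetical and not part of PAHKB, and the code builds the query string without contacting PubMed.

```python
# Illustrative sketch: rebuild the PubMed expression from its term lists.
# Helper names are hypothetical; no network request is made.
DISEASE_TERMS = [
    "pulmonary arterial hypertension", "IPAH", "HPAH", "pulmonary hypertension",
]
GWAS_TERMS = ["genome-wide association study", "genome wide association study"]
EVIDENCE_TERMS = [
    "association", "microarray", "expression", "linkage", "proteomics",
    "genetic", "metabolomics", "copy number variation", "idiopathic",
    "hereditable", "family", "mouse model", "animal model", "microRNA",
    "mutation", "SNP", "drug", "transporter",
]


def tag(term, field="Title/Abstract"):
    """Quote a term and restrict it to one PubMed search field."""
    return f'"{term}"[{field}]'


def any_of(terms):
    """Join tagged terms with OR inside one pair of parentheses."""
    return "(" + " OR ".join(tag(t) for t in terms) + ")"


def build_query():
    """disease AND (GWAS OR (gene AND evidence))."""
    disease = any_of(DISEASE_TERMS)
    gwas = any_of(GWAS_TERMS)
    gene_evidence = "(" + tag("gene") + " AND " + any_of(EVIDENCE_TERMS) + ")"
    return f"{disease} AND ({gwas} OR {gene_evidence})"


query = build_query()
print(query)
```

Keeping the terms in lists makes it straightforward to add or drop an evidence keyword and regenerate the full expression without hand-editing nested parentheses.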

Extracting description:
To evaluate the information about PAH-related genes, sentences containing the keywords "pulmonary" or "hypertension" were extracted from the 1161 PubMed abstracts.
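As a rough illustration of this extraction step, the sketch below splits an abstract into sentences and keeps those mentioning either keyword. The sentence splitter is a deliberately naive assumption (it breaks after ".", "!" or "?" followed by whitespace); the source does not describe the parser actually used, and the example abstract is invented.

```python
import re

KEYWORDS = ("pulmonary", "hypertension")

# Naive splitter (an assumption): break on whitespace that follows ., ! or ?
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+")


def extract_sentences(abstract, keywords=KEYWORDS):
    """Return the sentences that contain at least one keyword (case-insensitive)."""
    sentences = SENTENCE_BREAK.split(abstract.strip())
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


# Invented example text, not taken from any real abstract.
example = (
    "Gene X is expressed in lung tissue. "
    "Loss of gene X was associated with pulmonary hypertension in mice. "
    "Further work is needed."
)
hits = extract_sentences(example)
print(hits)
```

A splitter this simple will mis-handle abbreviations such as "et al." or "Fig. 2", so a production parser would need a more careful tokenizer.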

Group abstracts:
All downloaded abstracts are grouped by topic according to the related articles provided by Entrez. This allows us to assess, quickly and easily, whether and how closely certain gene names are related to PAH-related genes. It also allows us to assess whether and how a reference relates to other well-confirmed references describing PAH-related genes.
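The source does not say how the related-article links were turned into groups. One possible approach, shown here purely as an assumption, is to treat each "related article" link as an edge and take connected components with a small union-find. The identifiers in the example are placeholders, not real PubMed IDs.

```python
def group_by_related(related):
    """Group article IDs into connected components of the related-article graph.

    `related` maps an article ID to a list of IDs reported as related to it.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for article, neighbours in related.items():
        find(article)  # register articles that have no links
        for other in neighbours:
            union(article, other)

    groups = {}
    for article in list(parent):
        groups.setdefault(find(article), set()).add(article)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Placeholder IDs for illustration only.
related = {"A1": ["A2"], "A2": ["A3"], "B1": []}
groups = group_by_related(related)
print(groups)
```

Connected components give coarse groups; a curator would still read each group, as described in the next paragraph.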

In this step, we read the abstracts, assess the context given, and add relevant comments and features to the entry. Often, reading the abstract shows that the described gene is PAH-related. In these cases, care is taken to check other references about the same gene. The description line for each PAH-related gene is then added to the new entry.

Mapping the gene symbols:
A major step in curating an article is mapping the gene name in the text to an Entrez Gene ID, which serves as the initial information for cross-linking the gene to other public databases. Much care is taken with synonyms of the gene symbol and with Entrez Gene IDs that have been deleted or transferred.
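The mapping logic can be sketched as a lookup that tries the official symbol first, then synonyms, and finally resolves transferred or deleted IDs. Everything in the tables below is a made-up placeholder (the symbols, synonyms, and numeric IDs are not real Entrez records), and the function name map_gene_name is hypothetical.

```python
# All symbols and IDs are invented placeholders, not real Entrez Gene records.
SYMBOL_TO_ID = {"GENEA": 1001, "GENEB": 1002}
SYNONYM_TO_ID = {"GA1": 1001, "OLDB": 9002}
TRANSFERRED = {9002: 1002}  # old ID -> ID that replaced it
DELETED = {9003}            # IDs withdrawn without replacement


def map_gene_name(name):
    """Return the current gene ID for a symbol or synonym, or None if unmapped."""
    key = name.strip().upper()
    gene_id = SYMBOL_TO_ID.get(key)
    if gene_id is None:
        gene_id = SYNONYM_TO_ID.get(key)
    if gene_id is None:
        return None
    # Follow transfer records, guarding against accidental cycles.
    seen = set()
    while gene_id in TRANSFERRED and gene_id not in seen:
        seen.add(gene_id)
        gene_id = TRANSFERRED[gene_id]
    return None if gene_id in DELETED else gene_id


print(map_gene_name("genea"), map_gene_name("OLDB"), map_gene_name("unknown"))
```

Checking official symbols before synonyms matters because one gene's synonym can coincide with another gene's official symbol; ambiguous cases still call for manual review.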

Information for PAH-related genes

Information is presented on six types of pages: the general information view, literature highlight view, gene expression view, gene regulation view, gene mutation view, and gene interaction view.

The general information page appears as follows:

On this page, users can find the data source and our curated descriptions of PAH-related genes from the literature. Users can switch to other annotations by clicking the hyperlinks at the top of the page.

Users can find the details of the literature, with keywords highlighted, on the literature highlight page. The keyword "pulmonary hypertension" is marked in red; keywords such as "cancer" and "pathway" are marked in brown; keywords in the "microRNA" category are highlighted in green; and keywords such as "mutation" and "expression" are marked in black, as shown below.

The gene expression page appears as follows:

On this page, users can find gene expression profiles from human PAH-related samples (GSE22356) and from lung development-related samples (GSE14334). All sample information can be viewed by clicking the hyperlink in the profile images. Some genes have multiple probes; to give users an unbiased view, we present the expression values from all probes without any modification.

Users can obtain all the sample information by clicking on the expression images.

The gene regulation page appears as follows:

Transcription factor regulation and post-translational modification information was integrated from the TRANSFAC and dbPTM databases. In addition, methylation in promoter regions was annotated using data from the DiseaseMeth database.

The gene mutation page appears as follows:

All GWAS results were collected from the paper titled "Genome-wide association analysis identifies a susceptibility locus for pulmonary arterial hypertension." All lung cancer-related mutations were collected from the COSMIC database.

The gene interaction page appears as follows:

All related protein-protein interactions were collected from the Pathway Commons database. We further divided the interactions into three main types: "Physical Interaction," "Metabolic Interaction," and "Signaling Interaction."

Query and sequence search against database

All PAH-related genes and their annotations in our database are searchable. Both a text search (Query) and a sequence-based BLAST search (Blast) are provided.

Text search of various annotations in our database

Users can search PAHKB by typing a gene name, an accession ID, or a gene characteristic, including genomic location, regulation, interaction partner, mutation, biological pathway, and genetic disease. In total, we provide four search forms: "Gene General Information Search", "Literature Search", "Mutation Search", and "Other Annotation Search". These allow users to access general information, literature-based information, mutation information, and other annotation information, respectively.

The search is performed by typing keywords into a single field or into several fields simultaneously in the query forms. In general, a text search in each search form involves three steps. Take the basic information query as an example:

  • Select a specific annotation or field from the dropdown menu in the basic gene information and mutation query forms.

  • Enter your keyword of interest.

  • In addition, the basic gene information and mutation query forms support the logical 'And,' 'Or,' and 'Not' operators to combine multiple keywords.

    The search result shows the list of matched PAH-related genes, each linked to its detailed gene information page, as below.

    Quick search for a list of genes in the database:
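How the query forms evaluate the logical operators on the server is not described here. As a sketch under that caveat, combining keywords with 'And', 'Or', and 'Not' can be modelled as set operations over the genes matching each keyword, reading 'Not' as "matches the first keyword but not the second" (an assumption). The gene names and annotation text are invented.

```python
# Invented toy records: gene name -> annotation text.
RECORDS = {
    "GENEA": "receptor signaling mutation",
    "GENEB": "receptor expression",
    "GENEC": "transporter mutation",
}


def match(keyword, records=RECORDS):
    """Genes whose annotation text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return {gene for gene, text in records.items() if needle in text.lower()}


def combine(left, operator, right):
    """Combine two result sets with a logical operator."""
    op = operator.strip().lower()
    if op == "and":
        return left & right
    if op == "or":
        return left | right
    if op == "not":
        return left - right  # assumption: 'Not' excludes the right-hand matches
    raise ValueError(f"unsupported operator: {operator!r}")


both = combine(match("receptor"), "And", match("mutation"))
either = combine(match("receptor"), "Or", match("mutation"))
only_receptor = combine(match("receptor"), "Not", match("mutation"))
print(sorted(both), sorted(either), sorted(only_receptor))
```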

    To quickly access the information in the database, a quick search form is provided at the top of each page.

    BLAST against all gene sequences in our database

    In the BLAST menu, users can search the PAHKB database with their own input sequences. PAH-related genes with high similarity to the input sequences are listed on the BLAST result page. On the input page, users can set sequence alignment options such as E-value and identity. The matched sequence signatures are visualized on the query sequence as colored bars containing the alignment score.

    To do a sequence-based search against all the PAH-related genes, please access the BLAST page.

    The output of BLAST is as below:

    By clicking on the hyperlinks in the BLAST result page, users can access the PAH-related genes in our database.
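To illustrate what E-value and identity thresholds do, the sketch below filters hits in the standard 12-column tabular format produced by standalone BLAST+ with `-outfmt 6`. Whether the PAHKB BLAST page uses this format internally is not stated, so this is an assumption, and the hit lines and threshold defaults are invented for the example.

```python
# Column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def parse_hits(text):
    """Parse tab-separated BLAST hits into dictionaries with numeric scores."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(FIELDS):
            continue  # skip malformed rows
        row = dict(zip(FIELDS, values))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        hits.append(row)
    return hits


def filter_hits(hits, max_evalue=1e-5, min_identity=90.0):
    """Keep hits passing both thresholds, best bit score first."""
    kept = [h for h in hits
            if h["evalue"] <= max_evalue and h["pident"] >= min_identity]
    return sorted(kept, key=lambda h: h["bitscore"], reverse=True)


# Invented hits for illustration only.
example = "\n".join([
    "query1\tGENEA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t350",
    "query1\tGENEB\t75.0\t180\t45\t2\t5\t184\t10\t189\t1e-20\t150",
    "query1\tGENEC\t95.0\t50\t2\t0\t1\t50\t1\t50\t0.5\t40",
])
best = [h["sseqid"] for h in filter_hits(parse_hits(example))]
print(best)
```

In this example GENEB is dropped for low identity and GENEC for a weak E-value, which mirrors how tightening either option on the input page shortens the result list.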

Browse database

    The PAHKB database supports browsing PAH-related genes by cancer type and by KEGG pathway map. On the cancer type page, users can easily explore the PAH-related genes within specific cancers, organized by the cancer types from NCI. In addition, to give users a bird's-eye view of the biological processes involving PAH-related genes, marked KEGG maps are provided.

    Besides the cancer classification and marked KEGG maps, users can browse the PAH-related genes in PAHKB by their annotated features. PAHKB supports annotation-based browsing by chromosome, gene type, data source, and data quality.

    Using different datasets, chromosomes, and significant biological annotations

    From the Browser page, users can browse the genes in PAHKB by chromosome location. Moreover, users can obtain PAH-related gene lists from different datasets, such as the curated PAH, PH, and HPH gene lists. In addition, users can find all the protein-coding and recently reported non-coding PAH-related genes on the Browser page.

    By available animal model

    From the animal model list, users can easily browse all the cancer types according to the NCI cancer classification system.

    By pathway maps

    From the pathway list page, users can visualize all the KEGG pathways that contain any PAH-related gene recorded in our database.

Data analysis and download

    Users can freely download our gene ranking result for all the genes in PAHKB for academic research, but not for commercial purposes. Please access the Analysis page.

    Functional enrichment analysis result.

    From the Analysis page, users can explore the significantly enriched biological annotations.

    Clicking on "Gene ontology enrichment analysis" leads users to a graphic view of the enriched GO terms for all 341 human PH-related genes.

    Clicking on the terms highlighted in red leads users to a gene list view for the enriched GO term.

    If users have any suggestions for adding new comments to records in PAHKB or for correcting wrong information in PAHKB, please email us directly.
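The statistical test behind the enrichment results is not stated on this page. A common choice for GO term enrichment is the one-sided hypergeometric test, sketched below as an assumption rather than as the PAHKB method. The function name and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value P(X >= hits).

    population: number of background genes
    annotated:  background genes carrying the GO term
    sample:     size of the gene list being tested
    hits:       genes in the list carrying the GO term
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 3 annotated, list of 4 genes, 2 hits.
p = enrichment_p_value(population=10, annotated=3, sample=4, hits=2)
print(p)
```

In practice many GO terms are tested at once, so the raw p-values would normally be adjusted for multiple testing before terms are reported as enriched.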