ExomeWalker

Prioritization of whole-exome data by random-walk analysis of protein-protein interactions

Tutorial

This tutorial will take you through all the steps needed to use ExomeWalker to prioritize whole-exome sequencing data using protein-protein interaction data. See the individual pages in the tutorial menu for more details about individual steps.

User-supplied data

Users of ExomeWalker need to supply three pieces of information:

  1. A VCF file representing the whole-exome sequence of an individual with a Mendelian disease for which the disease gene is being sought
  2. Settings for performing basic filtering of the variants in the VCF file
  3. A disease-gene family: either one of the predefined families from OMIM or a list of Entrez Gene IDs representing the family

The Disease-Gene Family

ExomeWalker searches for a novel disease gene that is located close, in the PPI network, to other genes associated with similar phenotypes. We use the ca. 250 disease-gene families that are defined by OMIM's phenotypic series. Alternatively, users can enter their own disease-gene family as a comma-separated list of Entrez Gene IDs (see link to NCBI Entrez Gene).
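A user-defined family is just a comma-separated string of numeric identifiers, so it can be checked before it is submitted. The following is a minimal sketch of such a check; the function name `parse_entrez_ids` and the example IDs are illustrative assumptions and are not part of ExomeWalker.

```python
def parse_entrez_ids(text):
    """Parse a comma-separated string of Entrez Gene IDs into a list of ints.

    Whitespace around entries is ignored, empty entries are skipped, and
    duplicates are dropped while keeping first-seen order.
    """
    seen = set()
    ids = []
    for token in text.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate stray commas such as "101,,202"
        if not token.isdecimal():
            raise ValueError(f"not a numeric Entrez Gene ID: {token!r}")
        gene_id = int(token)
        if gene_id not in seen:
            seen.add(gene_id)
            ids.append(gene_id)
    return ids


# Placeholder IDs chosen only for illustration, not a real disease-gene family.
family = parse_entrez_ids("101, 202,,101 ,303")
```

Rejecting non-numeric entries early catches the common mistake of pasting gene symbols where Entrez Gene IDs are expected.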

The output file

Once users have entered the information described above and pressed the GO button, ExomeWalker filters the VCF file and then performs random-walk analysis on all remaining genes. The results are presented as a list of genes ranked by their assigned random-walk score. Each entry links to the corresponding record in Entrez Gene and to the UCSC Genome Browser, showing a window of 10 nucleotides on the 5' and 3' sides of the variant, together with a summary of all affected isoforms of the gene.
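To give a feel for how a random-walk score can rank candidate genes by their proximity to a disease-gene family, here is a small sketch of a random walk with restart on an undirected network. It is an illustration under stated assumptions, not ExomeWalker's implementation: the restart probability of 0.7, the function names, and the toy network with its made-up node names are all hypothetical.

```python
def random_walk_with_restart(edges, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Return a steady-state visiting probability for every node.

    edges   : iterable of (node_a, node_b) pairs, treated as undirected
    seeds   : nodes where the walker starts and to which it restarts
    restart : probability of jumping back to a seed at each step (assumed value)
    """
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    nodes = sorted(neighbors)

    seeds = {s for s in seeds if s in neighbors}
    if not seeds:
        raise ValueError("no seed gene is present in the network")

    # Start distribution: probability mass spread evenly over the seeds.
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        for n in nodes:
            # The walker at n moves to each neighbor with equal probability.
            share = (1.0 - restart) * p[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        change = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if change < tol:
            break
    return p


def rank_candidates(scores, candidates):
    """Sort candidate genes by descending score; genes absent from the network score 0."""
    return sorted(((g, scores.get(g, 0.0)) for g in candidates),
                  key=lambda pair: pair[1], reverse=True)


# Toy network with invented node names: NEAR touches both seeds, FAR is three steps away.
toy_edges = [("SEED1", "SEED2"), ("SEED1", "NEAR"), ("SEED2", "NEAR"),
             ("NEAR", "HUB"), ("HUB", "FAR")]
scores = random_walk_with_restart(toy_edges, seeds=["SEED1", "SEED2"])
ranking = rank_candidates(scores, ["FAR", "NEAR"])
```

In this toy network the candidate that interacts directly with both seeds receives more probability mass than the distant one, which mirrors the idea of ranking genes by how close they lie to the disease-gene family.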

A sample VCF file

Users can download a sample VCF file here: exomewalker.vcf. This file is an excerpt of a VCF file derived from the exome of a healthy individual (see Glusman et al. Low budget analysis of Direct-To-Consumer genomic testing familial data. F1000Res. 2012 Jul 16;1:3). The mutation p.Y23C in the DPM2 gene was spiked into the VCF file at its chromosomal coordinates as a homozygous variant. To test ExomeWalker, download this file to your hard disk, upload it from the Do analysis page, set the filter settings to autosomal recessive (leaving all other settings at their defaults), and choose the disease-gene family Congenital disorders of glycosylation, type I. The DPM2 gene should be ranked first. You will see multiple protein-protein interactions with seed genes, including a number of direct interactions with other Congenital disorders of glycosylation, type I genes such as DPM3-DPM2, ALG3-DPM2, DOLK-DPM2, and DPAGT1-DPM2, as well as multiple second-degree interactions such as DPM1-PIGB-DPM2.
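The test above relies on the spiked-in variant being homozygous, which is compatible with an autosomal recessive filter. As a rough illustration of what such a filter might check, the sketch below keeps only records whose genotype (the `GT` field of the VCF sample column) is homozygous for a non-reference allele. This is a simplification and an assumption: the filtering logic ExomeWalker actually applies is not described here, a recessive filter may also accept compound heterozygous variants, and the function names and the two VCF records are invented for the example.

```python
import re


def is_homozygous_alt(vcf_line, sample_index=0):
    """True if the chosen sample carries two copies of the same non-reference allele."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10 + sample_index:
        return False  # no FORMAT/sample columns for this sample
    format_keys = fields[8].split(":")
    if "GT" not in format_keys:
        return False
    sample_values = fields[9 + sample_index].split(":")
    genotype = sample_values[format_keys.index("GT")]
    alleles = re.split(r"[/|]", genotype)  # "/" unphased, "|" phased
    return (len(alleles) == 2
            and alleles[0] == alleles[1]
            and alleles[0] not in ("0", "."))


def homozygous_variants(vcf_text):
    """Return the data lines of a VCF text that pass is_homozygous_alt."""
    return [line for line in vcf_text.splitlines()
            if line and not line.startswith("#") and is_homozygous_alt(line)]


# Two invented records (coordinates and alleles are placeholders, not real data).
example_vcf = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:DP\t1/1:30",
    "chr1\t2000\t.\tC\tT\t50\tPASS\t.\tGT:DP\t0/1:28",
])
kept = homozygous_variants(example_vcf)
```

Only the first invented record is kept, because its genotype is `1/1`; the heterozygous `0/1` record is dropped.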