Summary

EpigenCentral is a web resource for the interactive analysis of epigenomic datasets related to rare diseases. It allows users to classify their DNA methylation (DNAm) samples into known disease subtypes, quantify the pathogenicity of genetic sequence variants, and discover new patterns of differential methylation between sample groups. EpigenCentral has accumulated DNA methylation patterns associated with specific genetic mutations that lead to neurodevelopmental disorders (NDDs). These patterns, listed in the table below, are used in the classification tasks.

Disease                        Mutation             Disease Pattern  Data Source  Study
Autism spectrum disorder       16p11.2 deletion     table_16p        GSE113967    Siu et al., 2017
Autism spectrum disorder       CHD8 gene            table_CHD8       GSE113967    Siu et al., 2017
CHARGE syndrome                CHD7 gene            table_CHD7       GSE97362     Butcher et al., 2017
Down syndrome                  chr21 trisomy        table_Down       GSE52588     Bacalini et al., 2015
Dup7 syndrome                  7q11.23 duplication  table_Dup7       GSE66552     Strong et al., 2015
Kabuki syndrome                KMT2D gene           table_KMT2D      GSE97362     Butcher et al., 2017
Nicolaides-Baraitser syndrome  SMARCA2 gene         table_SMARCA2    GSE125367    Chater-Diehl et al., 2019
Sotos syndrome                 NSD1 gene            table_NSD1       GSE74432     Choufani et al., 2015
Weaver syndrome                EZH2 gene            table_EZH2       GSE74432     Choufani et al., 2020
Williams syndrome              7q11.23 deletion     table_Williams   GSE66552     Strong et al., 2015

Quick Start

A sample dataset named “Kabuki” has been pre-uploaded to EpigenCentral's guest account to help new users explore the portal's analysis features. For a quick trial, please follow these steps:

  1. Log in: use guest as both the username and the password.
  2. Go to the Analyze page. In the Dataset dropdown list at the top of the page, check that “Kabuki” is selected.
  3. In the Disease classification tab, select the checkbox “Kabuki syndrome: KMT2D gene”.
  4. Click the Create Run button. This submits the analysis and automatically takes you to the Results page.
  5. Refresh the Results page occasionally and monitor the progress status until the analysis report becomes available.

Please note that others may also log in as guest and examine or delete the data or analysis results within the guest account. Therefore, we recommend using the guest account only for demonstration purposes with datasets that are not sensitive.

Tutorial

The following is a short tutorial on using EpigenCentral. To explore other features of the portal, please consult the current User Guide.

EpigenCentral supports the analysis of DNA methylation datasets generated on the Illumina HumanMethylation450 and HumanMethylationEPIC platforms. From a user’s perspective, the key task is to prepare the microarray dataset for analysis. The workflow then has three main stages: upload the DNAm dataset and sample sheet, select the analysis tasks and parameters, and review the results. These stages correspond to the three menu items at the top of the EpigenCentral page: Upload, Analyze, Results.

Login

Create an account with your username and password: first click on the Sign In link in the top-right corner of the EpigenCentral web page, then click on the Register now link in the popup window.

Data preparation

This tutorial will use the GEO dataset GSE116300 on Kabuki syndrome generated by Sobreira et al. (PubMed: 29255178). The first task is to download the data, then arrange it by Illumina chip ID and prepare metadata using a short R script.

  1. Install R and RStudio on your computer.
  2. From the GEO dataset website, download the archive file GSE116300_RAW.tar that contains the IDAT data files.
  3. Extract the archive, thereby creating a folder GSE116300_RAW on your computer.
  4. Download and place the R script preprocess.R into the folder GSE116300_RAW.
  5. Open the R script in RStudio and run it. The script should arrange the IDAT files into folders by Illumina chip, remove unused auxiliary GEO files, and create two additional metadata files by retrieving sample information from GEO.
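The R script preprocess.R is not reproduced in this tutorial. For readers who prefer Python, the chip-based arrangement in step 5 can be sketched as below. This is a minimal sketch, not the logic of preprocess.R: it assumes the usual GEO naming of IDAT files, GSM<id>_<chipID>_<R##C##>_<Grn|Red>.idat (optionally gzipped), and the function name arrange_idats_by_chip is hypothetical. It does not download the data or remove auxiliary GEO files.

```python
import re
import shutil
from pathlib import Path
from typing import Dict, List

# Assumed GEO file naming: GSM<id>_<chipID>_<R##C##>_<Grn|Red>.idat, optionally .gz
IDAT_PATTERN = re.compile(
    r"^GSM\d+_(?P<chip>\d+)_R\d{2}C\d{2}_(?:Grn|Red)\.idat(?:\.gz)?$"
)


def arrange_idats_by_chip(folder: Path) -> Dict[str, List[str]]:
    """Move each matching IDAT file into a subfolder named after its chip ID.

    Returns {chip ID: [moved file names]}; non-matching files stay where they are.
    """
    moved: Dict[str, List[str]] = {}
    # Take a sorted snapshot first, so newly created subfolders are not revisited.
    for path in sorted(folder.iterdir()):
        match = IDAT_PATTERN.match(path.name)
        if not path.is_file() or match is None:
            continue
        chip = match.group("chip")
        chip_dir = folder / chip
        chip_dir.mkdir(exist_ok=True)
        shutil.move(str(path), str(chip_dir / path.name))
        moved.setdefault(chip, []).append(path.name)
    return moved
```

For example, arrange_idats_by_chip(Path("GSE116300_RAW")) would move every matching IDAT file into a subfolder named after its chip ID and leave all other files in place.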

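These preparation steps should produce a data collection of IDAT files organized into one folder per Illumina chip, together with the two metadata files (the sample sheet and the design file).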
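Illumina methylation arrays produce two IDAT files per sample, one for the green (Grn) and one for the red (Red) channel, so a sample with only one of them usually points to an incomplete download or extraction. A hypothetical pre-upload check in Python is sketched below; it assumes IDAT names ending in _R##C##_Grn.idat or _R##C##_Red.idat (optionally .gz), and the function name find_unpaired_idats is an illustration, not part of EpigenCentral.

```python
import re
from pathlib import Path
from typing import Dict, List, Set

# Assumed naming: <sample>_R##C##_<Grn|Red>.idat, optionally .gz
CHANNEL_PATTERN = re.compile(
    r"^(?P<sample>.+_R\d{2}C\d{2})_(?P<channel>Grn|Red)\.idat(?:\.gz)?$"
)


def find_unpaired_idats(collection: Path) -> List[str]:
    """Return the samples (as relative paths) that lack either the Grn or the Red IDAT."""
    channels: Dict[str, Set[str]] = {}
    # Search all chip subfolders recursively.
    for path in collection.rglob("*.idat*"):
        match = CHANNEL_PATTERN.match(path.name)
        if not path.is_file() or match is None:
            continue
        key = (path.parent.relative_to(collection) / match.group("sample")).as_posix()
        channels.setdefault(key, set()).add(match.group("channel"))
    return sorted(key for key, found in channels.items() if found != {"Grn", "Red"})
```

An empty list from find_unpaired_idats(Path("GSE116300_RAW")) suggests that every sample has both channel files.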

Upload page

Follow the steps in the video below to upload the dataset to EpigenCentral. There are two main types of upload: Guided and Bulk. We recommend Guided upload for new users.

Analyze page

Follow the steps in the video below to set parameters and submit a new data-analysis run.

Keep the other parameters at their defaults or customize them if desired. For example, try the additional options in the “Disease classification” tab, or switch the “p-value adjustment” method among Bonferroni, FDR and none (i.e., no adjustment for multiple testing). This choice can have a large effect on the size of the returned DNA methylation patterns: Bonferroni is the most stringent of the three and retains the fewest CpG sites, whereas no adjustment retains the most. To learn more about these and other parameter options, please consult the current User Guide.
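To illustrate why the p-value adjustment matters, here is a minimal Python sketch of the three options applied to a toy list of p-values. It assumes that "FDR" refers to the Benjamini-Hochberg procedure, the most common FDR method; EpigenCentral's internal implementation may differ. The function adjust_pvalues and the p-values are hypothetical.

```python
from typing import List


def adjust_pvalues(pvalues: List[float], method: str) -> List[float]:
    """Return adjusted p-values in the original order.

    method: "none", "bonferroni", or "fdr" (Benjamini-Hochberg, assumed here).
    """
    m = len(pvalues)
    if method == "none":
        return list(pvalues)
    if method == "bonferroni":
        # Multiply by the number of tests, capped at 1.
        return [min(1.0, p * m) for p in pvalues]
    if method == "fdr":
        # Benjamini-Hochberg: p * m / rank, made monotone from the largest p downwards.
        order = sorted(range(m), key=lambda i: pvalues[i])
        adjusted = [0.0] * m
        running_min = 1.0
        for rank in range(m, 0, -1):
            i = order[rank - 1]
            running_min = min(running_min, pvalues[i] * m / rank)
            adjusted[i] = running_min
        return adjusted
    raise ValueError(f"unknown method: {method}")


# Toy p-values for six CpG sites (illustrative numbers, not real data).
pvals = [0.30, 0.001, 0.035, 0.008, 0.045, 0.02]
for method in ("none", "fdr", "bonferroni"):
    n_significant = sum(q < 0.05 for q in adjust_pvalues(pvals, method))
    print(method, n_significant)
```

With these toy values the script prints none 5, fdr 3 and bonferroni 2: at the same 0.05 threshold, the stricter the correction, the fewer sites survive. On a real array with hundreds of thousands of probes the gap between the methods is usually far larger.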

Results page

The Results page presents a table of all analysis runs and their current status. A newly submitted run is shown as Pending; once data processing begins, a progress bar appears. Refresh the page occasionally to update the progress bar, as shown in the video below, and monitor the status until the analysis report becomes available.

Analysis report

After the analysis is complete, the page displays a link to the analysis report and/or to any error messages encountered during processing. The automatically generated EpigenCentral analysis report is self-explanatory: it summarizes the analysis steps and links to further files, tables and images. An example of an analysis report is provided here.

Alternative ways to prepare and upload data

To prepare another GEO dataset that provides IDAT files, modify the R script preprocess.R by replacing the GEO series ID GSE116300 with the desired one. Beware that GEO metadata differ substantially across datasets, so adjust the parts of the script that generate the sample sheet and the design file accordingly; this will likely require a basic knowledge of R programming. Alternatively, you can prepare the metadata manually using a text editor.
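If you prepare the sample sheet yourself, a small script can fill in the chip ID and array position from the IDAT file names. The sketch below rests on assumptions throughout: the column names Sample_Name, Sample_Group, Sentrix_ID and Sentrix_Position follow the common Illumina/minfi convention and may not match the columns EpigenCentral expects, the file-name pattern follows the usual GEO naming, and write_sample_sheet is a hypothetical helper. The group labels (for example, patient versus control) must still be taken from the GEO metadata, and the User Guide remains the reference for the exact sample sheet and design file formats.

```python
import csv
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed GEO file naming: GSM<id>_<chipID>_<R##C##>_<Grn|Red>.idat, optionally .gz
NAME_PATTERN = re.compile(r"^(GSM\d+)_(\d+)_(R\d{2}C\d{2})_(?:Grn|Red)\.idat(?:\.gz)?$")

# Column names assumed from the Illumina/minfi convention.
COLUMNS = ["Sample_Name", "Sample_Group", "Sentrix_ID", "Sentrix_Position"]


def write_sample_sheet(idat_names: List[str], groups: Dict[str, str], out_path: Path) -> int:
    """Write one CSV row per sample and return the number of samples written.

    groups maps a GSM ID to its group label; unknown samples get "NA".
    """
    samples: Dict[str, Tuple[str, str]] = {}
    for name in idat_names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue
        gsm, chip, position = match.groups()
        samples[gsm] = (chip, position)  # the Grn and Red files collapse to one sample
    with out_path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(COLUMNS)
        for gsm in sorted(samples):
            chip, position = samples[gsm]
            writer.writerow([gsm, groups.get(gsm, "NA"), chip, position])
    return len(samples)
```

Writing "NA" for samples without a known group makes missing labels easy to spot in a text editor before upload.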

Some GEO datasets do not publish the original IDAT files. In this case, we recommend using their Series Matrix File, which contains a tab-delimited table of DNA methylation values. EpigenCentral supports uploading tab-delimited tables of DNAm values (TSV, tab-separated values), with or without the associated sample sheets. Follow the steps in the video below to upload a TSV file containing a table of DNAm values.
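A GEO Series Matrix File begins with metadata lines prefixed by "!" and holds the data table between the markers !series_matrix_table_begin and !series_matrix_table_end. The minimal Python sketch below extracts only that table into a plain TSV. It assumes that EpigenCentral accepts a table with probe IDs in the first column and one column per sample (please check the User Guide), and the function name series_matrix_to_tsv is hypothetical. The values in a Series Matrix File are whatever the submitters deposited (for example, beta values), so inspect them before uploading.

```python
import gzip
from pathlib import Path


def series_matrix_to_tsv(matrix_path: Path, out_path: Path) -> int:
    """Copy the data table of a GEO Series Matrix File into a plain TSV.

    Metadata lines outside the table markers are skipped and double quotes are
    stripped. Returns the number of data rows written (header excluded).
    """
    opener = gzip.open if matrix_path.suffix == ".gz" else open
    inside_table = False
    header_written = False
    data_rows = 0
    with opener(matrix_path, "rt") as src, out_path.open("w") as dst:
        for raw_line in src:
            line = raw_line.rstrip("\r\n")
            if line.startswith("!series_matrix_table_begin"):
                inside_table = True
                continue
            if line.startswith("!series_matrix_table_end"):
                break
            if not inside_table or not line:
                continue
            fields = [field.strip('"') for field in line.split("\t")]
            dst.write("\t".join(fields) + "\n")
            if header_written:
                data_rows += 1
            else:
                header_written = True  # the first table line is the ID_REF header
    return data_rows
```

For example, series_matrix_to_tsv(Path("my_series_matrix.txt.gz"), Path("dnam_values.tsv")) would write a table whose header row starts with ID_REF followed by the GSM sample IDs.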

In addition to the guided upload, EpigenCentral also supports uploading files in bulk as shown in the video below.

Further help

Please consult the current User Guide for additional features or send your questions or suggestions to turinsky {at} sickkids {dot} ca. We want to hear from you!