Summary
EpigenCentral is a web resource for the interactive analysis of epigenomic datasets related to rare diseases. It allows users to classify their DNA methylation (DNAm) data samples into known disease subtypes, quantify the pathogenicity of genetic sequence variants, and discover new patterns of differential methylation between sample groups. EpigenCentral has accumulated DNA methylation patterns associated with specific genetic mutations that lead to neurodevelopmental disorders (NDDs). These patterns, listed in the table below, are used in the classification tasks.
Disease | Mutation | Disease Pattern | Data Source | Study |
---|---|---|---|---|
Autism spectrum disorder | 16p11.2 deletion | table_16p | GSE113967 | Siu et al., 2017 |
Autism spectrum disorder | CHD8 gene | table_CHD8 | GSE113967 | Siu et al., 2017 |
CHARGE syndrome | CHD7 gene | table_CHD7 | GSE97362 | Butcher et al., 2017 |
Down syndrome | chr21 trisomy | table_Down | GSE52588 | Bacalini et al., 2015 |
Dup7 syndrome | 7q11.23 duplication | table_Dup7 | GSE66552 | Strong et al., 2015 |
Kabuki syndrome | KMT2D gene | table_KMT2D | GSE97362 | Butcher et al., 2017 |
Nicolaides-Baraitser syndrome | SMARCA2 gene | table_SMARCA2 | GSE125367 | Chater-Diehl et al., 2019 |
Sotos syndrome | NSD1 gene | table_NSD1 | GSE74432 | Choufani et al., 2015 |
Weaver syndrome | EZH2 gene | table_EZH2 | GSE74432 | Choufani et al., 2020 |
Williams syndrome | 7q11.23 deletion | table_Williams | GSE66552 | Strong et al., 2015 |
Quick Start
A sample dataset named “Kabuki” has already been uploaded to EpigenCentral's guest account to help new users explore the portal's analysis features. For a quick trial, follow these steps:
- Log in: use guest as both the username and the password.
- Go to the Analyze page. In the Dataset dropdown list at the top of the page, check that the selected dataset is “Kabuki”.
- In the Disease classification tab, select the checkbox “Kabuki syndrome: KMT2D gene”.
- Click the Create Run button. This should submit the analysis and automatically take you to the Results page.
- Refresh the Results page occasionally and monitor the progress status until the analysis report becomes available.
Please note that others may also log in as guest and examine or delete the data or analysis results within the guest account. Therefore, we recommend using the guest account only for demonstration purposes with datasets that are not sensitive.
Tutorial
The following is a short tutorial on using EpigenCentral. To explore other features of the portal, please consult the current User Guide.
EpigenCentral supports the analysis of DNA methylation datasets generated on the Illumina HumanMethylation450 and HumanMethylationEPIC platforms. From the user's perspective, the key task is preparing the microarray dataset for analysis. After that, the workflow has three main stages: uploading the DNAm dataset and sample sheet, selecting analysis tasks and parameters, and reviewing the results. These stages correspond to the three menu items at the top of the EpigenCentral page: Upload, Analyze, Results.
Login
Create an account with your username and password: first click on the Sign In link in the top-right corner of the EpigenCentral web page, then click on the Register now link in the popup window.
Data preparation
This tutorial uses the GEO dataset GSE116300 on Kabuki syndrome generated by Sobreira et al. (PubMed: 29255178). The first task is to download the data, arrange it by Illumina chip ID, and prepare the metadata using a short R script.
- Install R and RStudio on your computer.
- From the GEO dataset website, download the archive file GSE116300_RAW.tar that contains the IDAT data files.
- Extract the archive, thereby creating a folder GSE116300_RAW on your computer.
- Download the R script preprocess.R and place it in the folder GSE116300_RAW.
- Open the R script in RStudio and run it. The script should arrange the IDAT files into folders by Illumina chip, remove unused auxiliary GEO files, and create two additional metadata files from sample information retrieved from GEO.
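The renaming and moving step performed by preprocess.R can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual script: it assumes that GEO raw file names have the form GSM<id>_<chip>_<position>_<channel>.idat.gz (check this for your dataset), and the function names illumina_target and arrange_idats are hypothetical. Unlike preprocess.R, it neither deletes auxiliary GEO files nor builds the metadata files.

```python
import re
import shutil
from pathlib import Path
from typing import Optional, Tuple

# Assumed GEO naming: GSM<digits>_<chip>_<RxxCxx>_<Grn|Red>.idat[.gz]
GEO_IDAT = re.compile(r"^GSM\d+_(\d+)_(R\d{2}C\d{2})_(Grn|Red)\.idat(\.gz)?$")


def illumina_target(filename: str) -> Optional[Tuple[str, str]]:
    """Map a GEO IDAT file name to (chip folder, Illumina-style name), else None."""
    m = GEO_IDAT.match(filename)
    if m is None:
        return None
    chip, position, channel, gz = m.groups()
    return chip, f"{chip}_{position}_{channel}.idat{gz or ''}"


def arrange_idats(folder: Path) -> int:
    """Move matching IDAT files into per-chip subfolders; return the number moved."""
    moved = 0
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        target = illumina_target(path.name)
        if target is None:
            continue  # other files are left in place (not deleted, unlike preprocess.R)
        chip, new_name = target
        (folder / chip).mkdir(exist_ok=True)
        shutil.move(str(path), str(folder / chip / new_name))
        moved += 1
    return moved
```

For example, arrange_idats(Path("GSE116300_RAW")) would be expected to move all 88 IDAT files into four chip folders, provided the GEO names follow the assumed pattern.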
These preparation steps should create the following data collection:
- Four folders named 8655677134, 8655685063, 8655685138 and 8655685165, each containing 11 pairs of IDAT files, for a total of 88 IDAT files. Each file holds either the green or the red intensity channel of one DNAm sample. The file names follow the standard Illumina convention; e.g. 8655677134_R04C02_Grn.idat.gz corresponds to a DNAm sample on chip #8655677134, row 4, column 2, green channel. The individual IDAT files may remain gzipped.
- A sample sheet file kabuki.samples.csv: this comma-separated file describes sample characteristics such as the Illumina chip and position (in columns Sentrix_ID and Sentrix_Position, respectively), as well as the sample group, mutation status and sex. The sample sheet may also include other available confounding factors, such as age, tissue of origin or batch information.
- A design file kabuki.design: this tab-delimited file assigns numeric codes for group comparison: ‘2’ for disease samples, ‘1’ for controls and ‘0’ for samples to ignore. In this example, the design file marks 9 Kabuki syndrome samples with pathogenic variants as cases, 9 samples as controls, and the remaining 26 samples as ignored when deriving differentially methylated patterns. See the details on creating design files at http://www.computationalgenomics.ca/tutorials/.
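Before uploading, it can help to check that the sample sheet and the IDAT folders agree. The Python sketch below is a hypothetical helper (missing_idats is not part of EpigenCentral). It reads the Sentrix_ID and Sentrix_Position columns of the sample sheet and lists every expected green or red IDAT file that is absent, assuming the per-chip folder layout and Illumina file naming described above; both gzipped and uncompressed files are accepted.

```python
import csv
from pathlib import Path
from typing import List


def missing_idats(sample_sheet: Path, data_root: Path) -> List[str]:
    """List expected IDAT files (Illumina naming) that are absent on disk."""
    missing = []
    with sample_sheet.open(newline="") as handle:
        for row in csv.DictReader(handle):
            chip = row["Sentrix_ID"].strip()
            position = row["Sentrix_Position"].strip()
            folder = data_root / chip
            for channel in ("Grn", "Red"):
                stem = f"{chip}_{position}_{channel}.idat"
                # IDAT files may be gzipped or not
                if not any((folder / name).exists() for name in (stem, stem + ".gz")):
                    missing.append(f"{chip}/{stem}")
    return missing
```

Assuming both metadata files were written to GSE116300_RAW, missing_idats(Path("GSE116300_RAW/kabuki.samples.csv"), Path("GSE116300_RAW")) should return an empty list if the preparation succeeded.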
Upload page
Follow the steps in the video below to upload the dataset to EpigenCentral. There are two main types of upload: Guided and Bulk. We recommend Guided upload for new users.
Analyze page
Follow the steps in the video below to set parameters and submit a new data-analysis run.
Keep the other parameters at their defaults or customize them if desired. For example, try the additional options in the “Disease classification” tab, or switch the “p-value adjustment” method between Bonferroni, FDR and none (i.e. no adjustment for multiple testing), which can have a large effect on the size of the returned DNA methylation patterns. To learn more about these and other parameter options, please consult the current User Guide.
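To see why the adjustment method matters, the sketch below applies the Bonferroni correction and the Benjamini-Hochberg procedure (a standard way to control the FDR; we assume, without confirmation, that EpigenCentral's FDR option behaves similarly) to the same hypothetical list of per-CpG p-values. Bonferroni is the most conservative, so the fewest sites pass a 0.05 threshold under it, and the most pass with no adjustment.

```python
from typing import List


def bonferroni(pvalues: List[float]) -> List[float]:
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values (step-up FDR), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def count_significant(values: List[float], alpha: float = 0.05) -> int:
    return sum(v < alpha for v in values)


if __name__ == "__main__":
    # Hypothetical p-values for ten CpG sites
    p = [0.0001, 0.0008, 0.002, 0.004, 0.009, 0.02, 0.03, 0.045, 0.2, 0.6]
    print("none:", count_significant(p))  # none: 8
    print("FDR:", count_significant(benjamini_hochberg(p)))  # FDR: 7
    print("Bonferroni:", count_significant(bonferroni(p)))  # Bonferroni: 4
```

With real arrays measuring hundreds of thousands of CpG sites, the gap between these methods is typically much wider, which is why the returned patterns can change size so much.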
Results page
The Results page presents a table of all analysis runs and their current status. A newly submitted run is shown as Pending. Once data processing begins, a progress bar is shown. Refresh the page occasionally to update the progress bar, as shown in the video below, and monitor it until the analysis report becomes available.
Analysis report
After the analysis is complete, the page displays a link to the analysis report and/or to any error messages encountered during job processing. The automatically generated EpigenCentral analysis report is self-explanatory: it summarizes the various analysis steps and links to further files, tables and images. An example of an analysis report is provided here.
Alternative ways to prepare and upload data
To prepare a different GEO dataset that provides IDAT files, modify the R script preprocess.R by replacing the GEO series ID GSE116300 with the desired one. Note that GEO metadata differ substantially across datasets, so adjust the parts of the script that generate the sample sheet and the design file accordingly. This will likely require basic knowledge of R programming. Alternatively, you can prepare the metadata manually using a text editor.
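If you prepare the metadata manually, one option is to start from a skeleton sample sheet generated from the IDAT file names and then fill in the remaining columns (sample group, mutation status, sex and so on) in a text editor. The Python sketch below is hypothetical. It assumes the per-chip folder layout and Illumina file naming described in Data preparation. Apart from Sentrix_ID and Sentrix_Position, its column names (Sample_Name, Sample_Group, Sex) are placeholders; check the User Guide for the columns EpigenCentral expects.

```python
import csv
import re
from pathlib import Path
from typing import List, Tuple

# Illumina naming: <chip>_<RxxCxx>_Grn.idat[.gz]; one green file per sample
IDAT = re.compile(r"^(\d+)_(R\d{2}C\d{2})_Grn\.idat(\.gz)?$")


def find_samples(data_root: Path) -> List[Tuple[str, str]]:
    """Return sorted (Sentrix_ID, Sentrix_Position) pairs found in chip folders."""
    samples = set()
    for path in data_root.glob("*/*"):
        m = IDAT.match(path.name)
        if m and path.parent.name == m.group(1):
            samples.add((m.group(1), m.group(2)))
    return sorted(samples)


def write_skeleton(data_root: Path, out_csv: Path) -> int:
    """Write one row per sample; placeholder columns are left blank."""
    rows = find_samples(data_root)
    with out_csv.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Sample_Name", "Sentrix_ID", "Sentrix_Position", "Sample_Group", "Sex"])
        for chip, position in rows:
            writer.writerow([f"{chip}_{position}", chip, position, "", ""])
    return len(rows)
```

A design file can then be written by hand from the completed sample sheet, following the tutorials linked in Data preparation.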
Some GEO datasets do not publish the original IDAT files. In this case, we recommend using their Series Matrix File, which contains a tab-delimited table of DNA methylation values. EpigenCentral supports uploading tab-delimited tables of DNAm values (TSV, tab-separated values), with or without the associated sample sheets. Follow the steps in the video below to upload a TSV file containing a DNAm value table.
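A GEO Series Matrix File keeps its value table between the lines !series_matrix_table_begin and !series_matrix_table_end, surrounded by metadata lines that start with "!". The Python sketch below extracts that table into a plain TSV and strips the quotes GEO places around the header and probe IDs. It assumes a gzipped input (GEO distributes these files as ..._series_matrix.txt.gz) and that a layout with probes as rows and samples as columns is acceptable to EpigenCentral; the function name is hypothetical, so confirm the expected layout in the User Guide before uploading.

```python
import gzip
from pathlib import Path


def series_matrix_to_tsv(matrix_gz: Path, out_tsv: Path) -> int:
    """Copy the data table of a GEO Series Matrix file to a TSV; return data rows."""
    inside = False
    n_lines = 0
    with gzip.open(matrix_gz, "rt") as src, out_tsv.open("w") as dst:
        for line in src:
            line = line.rstrip("\r\n")
            if line.startswith("!series_matrix_table_begin"):
                inside = True
                continue
            if line.startswith("!series_matrix_table_end"):
                break
            if inside and line:
                # GEO quotes the header and probe IDs; strip the quotes
                fields = [f.strip('"') for f in line.split("\t")]
                dst.write("\t".join(fields) + "\n")
                n_lines += 1
    return n_lines - 1 if n_lines else 0  # exclude the header row
```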
In addition to the guided upload, EpigenCentral also supports uploading files in bulk as shown in the video below.
Further help
Please consult the current User Guide for additional features or send your questions or suggestions to turinsky {at} sickkids {dot} ca. We want to hear from you!