Supplementary Resources for Reference-Free Deconvolution of complex DNA methylation data

Input data

The table below describes the input data used for the reference-free deconvolution protocol and how it can be obtained.

| Name | URL | Description |
|---|---|---|
| GDC data download tool | https://gdc.cancer.gov/access-data/gdc-data-transfer-tool | On the website, select the version most appropriate for your computational infrastructure and follow the instructions to install the tool. |
| Manifest file for TCGA-LUAD | https://portal.gdc.cancer.gov/legacy-archive/search/f | Select "TCGA" in the category "Cancer program" on the left and set "Disease type" to "lung adenocarcinoma". Then go to "Files" at the top of the left-hand side and select "Methylation array" under "Experimental strategy". Finally, select "Raw intensities" under "Data type" and "Illumina Human Methylation 450" under "Platform", and click on "Download Manifest". |
| Clinical metadata | https://portal.gdc.cancer.gov/projects/TCGA-LUAD | Click on the "Clinical" button and select "TSV" as the download option. Then unpack the downloaded .tar.gz file into a new subdirectory named "annotation" within your working directory. |
| Mutational data | https://www.cbioportal.org/study/clinicalData?id=luad_tcga_pan_can_atlas_2018 | Hover the mouse cursor over the download icon, which then shows "Download clinical data for the selected cases", click on it, and store the resulting file luad_tcga_pan_can_atlas_2018_clinical_data.tsv. The file is required as input for the "Comparison with mutational data" plotting script. |
| Tumor purity scores | https://static-content.springer.com/esm/art%3A10.1038%2Fncomms3612/MediaObjects/41467_2013_BFncomms3612_MOESM489_ESM.xlsx | Download the Excel file and browse to the sheet "RNASeqV2". From this sheet, select the "lung adenocarcinoma" cases in the column "platform" and store the resulting sheet as a tabular file named "LUAD_ESTIMATE_RNAseqV2.tab" in the "annotation" subdirectory. The file is needed for the trait association script. |
| Stemness indices | https://ars.els-cdn.com/content/image/1-s2.0-S0092867418303581-mmc1.xlsx | Download the Excel file and browse to the sheet "StemnessScores_DNAmeth". Store this sheet as a new file "stemness_index.csv" in the directory "annotation". This file is required to correlate LMC proportions with cancer stemness indices using the "Comparison to cancer stemness" script. |
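
Once the GDC data transfer tool is installed and a manifest file is at hand, the raw files can be fetched from the command line. The Python sketch below is one possible way to wrap that step, not part of the protocol itself. It assumes that the tool is available on the PATH as `gdc-client`, that the manifest is a tab-separated file with a header row (GDC manifests usually carry the columns `id`, `filename`, `md5`, `size` and `state`), and that `gdc_manifest.txt` and `idat` are placeholder names chosen here for illustration.

```python
import csv
import shutil
import subprocess
from pathlib import Path


def read_manifest(manifest_path):
    """Return the manifest rows as a list of dicts (assumes a tab-separated file with a header row)."""
    with open(manifest_path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def build_download_command(manifest_path, out_dir, client="gdc-client"):
    """Assemble the client call: -m names the manifest, -d the download directory."""
    return [client, "download", "-m", str(manifest_path), "-d", str(out_dir)]


def download_files(manifest_path, out_dir):
    """Run the download if the client is installed; otherwise only print the command."""
    command = build_download_command(manifest_path, out_dir)
    if shutil.which(command[0]) is None:
        print("gdc-client not found on PATH; the command would be:", " ".join(command))
        return None
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    return subprocess.run(command, check=True)


if __name__ == "__main__":
    manifest = Path("gdc_manifest.txt")  # placeholder name for the downloaded manifest
    if manifest.exists():
        print(len(read_manifest(manifest)), "files listed in the manifest")
        download_files(manifest, "idat")  # "idat" is a placeholder target directory
```

Reading the manifest first gives a quick count of the files to expect, which helps to spot an incomplete filter selection before a long download starts.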

List of Resources and Intermediate Results

The table below describes the supplementary resources that are available for the reference-free deconvolution protocol.

| Name | URL | Description |
|---|---|---|
| Manifest file for TCGA-LUAD | https://github.com/CompEpigen/Decomp_web/blob/master/data/gdc_manifest.2019-01-23.txt | The manifest file for the TCGA-LUAD dataset. It can be used to download the IDAT files and associated metadata from TCGA using the GDC data transfer tool. |
| RnBeads report | http://epigenomics.dkfz.de/downloads/DecompProtocol/RnBeads_Report_TCGA_LUAD/ | The RnBeads report generated from the lung adenocarcinoma dataset from TCGA (TCGA-LUAD). The protocol only requires the Import and Quality Control modules to be executed, but we provide a complete execution of the RnBeads pipeline including exploratory analysis. |
| RnBSet | http://epigenomics.dkfz.de/downloads/DecompProtocol/rnbSet_unnormalized.zip | An RnBSet object comprising sample metadata, DNA methylation data, and CpG annotations. The dataset has not been subjected to preprocessing or normalization. |
| CpGs passing quality filtering | http://epigenomics.dkfz.de/downloads/DecompProtocol/sites_passing_quality_filtering.csv | A list of CpG identifiers from the EPIC array that pass the stringent quality criteria for this particular dataset. The list comes as a comma-separated values (CSV) file. |
| CpGs passing context filtering | http://epigenomics.dkfz.de/downloads/DecompProtocol/sites_passing_context_filtering.csv | A list of CpG identifiers from the EPIC array that pass the context filtering steps. In this step, sites annotated to single nucleotide polymorphisms or to the sex chromosomes are removed. |
| CpGs passing complete filtering | http://epigenomics.dkfz.de/downloads/DecompProtocol/sites_passing_complete_filtering.csv | A list of CpG identifiers from the EPIC array that pass all filtering steps employed in DecompPipeline. This is an extension of the list above, which also removes sites that have been reported to be cross-reactive. |
| FactorViz output | http://epigenomics.dkfz.de/downloads/DecompProtocol/FactorViz_outputs.tar.gz | This archive contains the results of the deconvolution experiment on the TCGA-LUAD dataset, which can be imported directly into FactorViz after extraction. The folder contains the MeDeComSet, the genomic annotation of the CpGs, and the sample metadata. |
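
To see how many sites each filtering step retains, the three CpG lists can be compared once they have been downloaded. The following Python sketch is an illustration under assumptions: the column layout of the CSV files is not described here, so the reader simply collects every cell that looks like a `cg` probe identifier, and the local file names are assumed to match the names in the URLs above.

```python
import csv
import re
from pathlib import Path

# Assumption: sites are given as Illumina "cg" probe identifiers such as cg00000029.
CPG_ID = re.compile(r"^cg\d+$")


def read_cpg_ids(csv_path):
    """Collect every cell that looks like a cg probe ID, whatever the column layout."""
    ids = set()
    with open(csv_path, newline="") as handle:
        for row in csv.reader(handle):
            ids.update(cell.strip() for cell in row if CPG_ID.match(cell.strip()))
    return ids


def summarize_filtering(quality, context, complete):
    """Report list sizes and how many sites of the final list are absent from the earlier lists."""
    return {
        "quality": len(quality),
        "context": len(context),
        "complete": len(complete),
        "complete_not_in_quality": len(complete - quality),
        "complete_not_in_context": len(complete - context),
    }


if __name__ == "__main__":
    names = [
        "sites_passing_quality_filtering.csv",
        "sites_passing_context_filtering.csv",
        "sites_passing_complete_filtering.csv",
    ]
    if all(Path(name).exists() for name in names):
        print(summarize_filtering(*(read_cpg_ids(name) for name in names)))
```

If the lists are cumulative, as their descriptions suggest, both "not_in" counts should be zero; a non-zero value would indicate that the lists were produced independently or that the parsing assumption does not hold.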

Plotting scripts

The table below describes scripts that have been used to generate the plots shown within the protocol.

| Name | URL | Description |
|---|---|---|
| Proportions heatmap | https://github.com/CompEpigen/Decomp_web/blob/master/data/proportion_heatmap.R | An R script to generate the proportions heatmap of the TCGA-LUAD deconvolution experiment with 7 LMCs. |
| Trait association | https://github.com/CompEpigen/Decomp_web/blob/master/data/trait_association.R | An R script to compare both quantitative and qualitative traits with LMC proportions. |
| Comparison with mutational data | https://github.com/CompEpigen/Decomp_web/blob/master/data/compare_with_mutational_data.R | An R script to associate LMC proportions with various kinds of genetic alterations. |
| Comparison to cancer stemness | https://github.com/CompEpigen/Decomp_web/blob/master/data/correlation_with_stemness.R | An R script to associate LMC proportions with cancer stemness indices. |
| Differential LMC analysis | https://github.com/CompEpigen/Decomp_web/blob/master/data/compare_LMCs_scatterplot.R | An R script to compare LMCs to one another, for instance to create scatterplots describing multiple LMCs. |
| Proportions and gene expression | https://github.com/CompEpigen/Decomp_web/blob/master/data/quantify_gene_expression.R | An R script to compare LMC proportions with gene expression values of cell type marker genes in lung tissue. |
| Survival analysis | https://github.com/CompEpigen/Decomp_web/blob/master/data/survival_analyis.R | An R script to perform survival analysis of patients with different LMC proportions. |
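
The core idea behind associating LMC proportions with a quantitative trait such as a stemness index can be sketched in a few lines. The Python example below is not a port of the linked R scripts, and which correlation measure those scripts use is not stated here; Pearson correlation is an assumption made for illustration. The function names, the data layout (dictionaries keyed by sample identifier) and the toy values are likewise hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def correlate_lmcs(proportions, trait):
    """Correlate each LMC with a trait.

    proportions: {lmc_name: {sample_id: proportion}}
    trait: {sample_id: value}
    Only samples present in both inputs are used.
    """
    result = {}
    for lmc, per_sample in proportions.items():
        shared = sorted(set(per_sample) & set(trait))
        result[lmc] = pearson([per_sample[s] for s in shared], [trait[s] for s in shared])
    return result


if __name__ == "__main__":
    # Made-up toy values, purely to show the call; these are not results from the protocol.
    toy_proportions = {
        "LMC1": {"s1": 0.1, "s2": 0.4, "s3": 0.7},
        "LMC2": {"s1": 0.9, "s2": 0.6, "s3": 0.3},
    }
    toy_trait = {"s1": 0.2, "s2": 0.5, "s3": 0.8}
    print(correlate_lmcs(toy_proportions, toy_trait))
```

Restricting the calculation to shared sample identifiers matters in practice, because annotation tables such as the stemness sheet may not cover every sample in the methylation dataset.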