Reference-free deconvolution protocol of Ewing RRBS cohort

Installation and Data Retrieval

The data were generated in a recent study which are also available from GEO, accession number GSE88826.

Data Import

RnBeads also support bisulfite sequencing data, provided that the data has been processed to generate single-CpG methylation calls in BED or similar format. However, since RnBeads creates a large internal data structure, more leightweight package can also be used. The input to the remaining steps should be a DNA methylation data matrix.

suppressPackageStartupMessages(library(RnBeads))
rnb.options(
  assembly = "hg38",
  identifiers.column = "sample_id",
  import = TRUE,
  import.default.data.type = "bed.dir",
  import.bed.style = "EPP",
  import.table.separator = ",",
  import.sex.prediction = TRUE,
  qc = TRUE,
  preprocessing = FALSE,
  exploratory = FALSE,
  inference = FALSE,
  differential = FALSE,
  export.to.bed = FALSE,
  export.to.trackhub = NULL,
  export.to.csv = FALSE
)
sample.anno <- "annotation/sample_annotation.csv"
bed.folder <- "bed/"

dir.report <- paste0("report",Sys.Date(),"/")
temp.dir <- tempdir()
options(fftempdir=temp.dir)
data.source <- c(bed.folder,sample.anno,"filename")
rnb.set <- rnb.run.analysis(dir.reports = dir.report, data.source=data.source)
# only use Ewing tissue samples
rem.samples <- !grepl("EwS_T",samples(rnb.set))
# remove FFPE samples from the analysis
rem.samples <- rem.samples | samples(rnb.set) %in% c("EwS_T122","EwS_T123","EwS_T124",
"EwS_T125","EwS_T126","EwS_T127","EwS_T129","EwS_T130","EwS_T131","EwS_T132","EwS_T133")
rnb.set <- remove.samples(rnb.set,rem.samples)

Preprocessing and Filtering

DecompPipeline also support preprocessing of bisulfite sequencing data. Here, we remove all sites that have a read coverage lower than 5 in any of the samples. Additionally, we remove high and low coverage outliers, sites with missing values, annotated SNPs, and sites on the sex chromosomes.

suppressPackageStartupMessages(library(DecompPipeline))
data.prep <- prepare.data.BS(rnb.set = rnb.set,
                          analysis.name = "Ewing_pipeline",
                          filter.coverage = TRUE,
                          min.coverage = 5,
                          min.covg.quant = 0.001,
                          max.covg.quant = 0.999,
                          filter.na = TRUE,
                          filter.snp = TRUE,
                          filter.sex.chromosomes = TRUE,
                          execute.lump = TRUE)

Feature Selection

We select the 5,000 most variably methylated CpGs across the samples for downstream analysis.

cg_subset <- prepare.CG.subsets(rnb.set=data.prep$rnb.set.filtered,
                                marker.selection = "var",
                                n.markers = 5000)

Deconvolution

Deconvolution is applied analogously to the general protocol. RefFreeCellMix and EDec can be used as alternative deconvolution methods.

md.res <- start.medecom.analysis(
  rnb.set=data.prep$rnb.set.filtered,
  cg.groups = cg_subset,
  Ks=2:15,
  lambda.grid = c(0,10^-(2:5)),
  factorviz.outputs = TRUE,
  analysis.name = "Ewing_pipeline",
  cores = 15
)

Interpretation

The returned MeDeComSet and associated deconvolution results can directly be imported into FactorViz for visualization.

suppressPackageStartupMessages(library(FactorViz))
startFactorViz(file.path(getwd(),"TCGA_Ewing_pipeline","FactorViz_outputs"))