Pleas follow the instructions provided in the protocol using MeDeCom.
Analogously to the general protocol, we use RnBeads for data import.
suppressPackageStartupMessages(library(RnBeads))
rnb.options(
assembly = "hg19",
identifiers.column = "submitter_id",
import = T,
import.default.data.type = "idat.dir",
import.table.separator = "\t",
import.sex.prediction = T,
qc = T,
preprocessing = F,
exploratory = F,
inference = F,
differential = F,
export.to.bed = F,
export.to.trackhub = NULL,
export.to.csv = F
)
sample.anno <- "annotation/sample_annotation.tsv"
idat.folder <- "idat/"
dir.report <- paste0("report",Sys.Date(),"/")
temp.dir <- tempdir()
options(fftempdir=temp.dir)
rnb.set <- rnb.run.analysis(dir.reports = dir.report,
sample.sheet = sample.anno,
data.dir = idat.folder)
The preprocessing and filtering is also performed in line with the general protocol using DecompPipeline.
suppressPackageStartupMessages(library(DecompPipeline))
data.prep <- prepare.data(rnb.set = rnb.set,
analysis.name = "TCGA_RefFree",
normalization = "bmiq",
filter.beads = TRUE,
min.n.beads = 3,
filter.intensity = TRUE,
min.int.quant = 0.001,
max.int.quant = 0.999,
filter.na = TRUE,
filter.context = TRUE,
filter.snp = TRUE,
filter.sex.chromosomes = TRUE,
filter.cross.reactive = TRUE,
execute.lump = TRUE)
We use Independent Component Analysis (ICA) to adjust for the potential confounding factors age, race, gender, and ethnicity.
data.prep <- prepare.data(rnb.set = data.prep$rnb.set.filtered,
analysis.name = "TCGA_RefFree",
normalization = "none",
filter.beads = FALSE,
filter.intensity = F,
filter.na = FALSE,
filter.context = FALSE,
filter.snp = FALSE,
filter.sex.chromosomes = FALSE,
filter.cross.reactive = FALSE,
remove.ICA = TRUE,
conf.fact.ICA = c("age_at_diagnosis","race","gender","ethnicity"),
ica.setting = c("alpha.fact"=1e-5,"save.report"=TRUE,
"ntry"=10,"nmin"=20,"nmax"=50,"ncores"=10))
We select the 5,000 most variable CpGs across the samples for downstream analysis.
cg_subset <- prepare.CG.subsets(rnb.set=data.prep$rnb.set.filtered,
marker.selection = "var",
n.markers = 5000)
We use the RefFreeCellMix function from the RefFreeEWAS package for deconvolution analysis. In contrast to MeDeCom, RefFreeCellMix does not use a regularization for the entries of the matrix to be at the extreme values 0 or 1. Thus, we just specify the number of components to be tested and the preprocessed object, along with the selected features.
md.res <- start.refreeewas.analysis(
rnb.set=data.prep$rnb.set.filtered,
cg.groups = cg_subset,
Ks=2:15,
factorviz.outputs=TRUE,
work.dir = "TCGA_RefFree"
)
The resulting object is an object of type MeDeComSet, since the output from RefFreeCellMix has internally been transformed to the MeDeCom infrastructure. This enables loading the data set directly into the visualization tool FactorViz
suppressPackageStartupMessages(library(FactorViz))
startFactorViz(file.path(getwd(),"TCGA_RefFree","FactorViz_outputs"))