InstaPrism is R package to deconvolute cellular
proportion and gene expression in bulk RNA-Seq data. Based on the same
conceptual framework and corresponding generative mode from BayesPrism,
InstaPrism re-implements BayesPrism in a derandomized
framework by replacing the time-consuming Gibbs sampling steps in
BayesPrism with a fixed-point algorithm, which greatly accelerated the
calculation speed while maintaining highly comparable performance.
This tutorial is associated the latest version of InstaPrism (InstaPrism v0.1.5), which contains the following key modifications:
get_Z_array()
function to retrieve 3-d cell
type specific expression array (sample \(\times\) genes \(\times\) cell.type)get_loglikelihood_trace()
function to
visualize model’s log likelihood over iterationsget_subcluster()
function to identify
subcluster structures within each cell type, can be used to specify
cell-state informationInstaPrism()
function associated with previous tutorial
(InstaPrism v0.1.3) can be accessed using the
InstaPrism_legacy()
functionIn this tutorial, we provide three examples of running InstaPrism and compare the results with BayesPrism.
Load the InstaPrism package.
InstaPrism works as an independent R package and
does not require the users to have BayesPrism installed.
We recommend the readers to read through the tutorial
of BayePrism before running our examples.
extdata
folder associated with this tutorial can be
downloaded from zenodo
repository.
library(InstaPrism)
#> Loading required package: dplyr
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
#> Loading required package: tidyr
#> Loading required package: Matrix
#>
#> Attaching package: 'Matrix'
#> The following objects are masked from 'package:tidyr':
#>
#> expand, pack, unpack
#> Loading required package: caret
#> Loading required package: ggplot2
#> Loading required package: lattice
#> Loading required package: NMF
#> Loading required package: pkgmaker
#> Loading required package: registry
#> Loading required package: rngtools
#> Loading required package: cluster
#> NMF - BioConductor layer [OK] | Shared memory capabilities [NO: bigmemory] | Cores 15/16
#> To enable shared memory capabilities, try: install.extras('
#> NMF
#> ')
#> Loading required package: pheatmap
#> Loading required package: abind
#> Loading required package: pbapply
The following lines simulate bulk and single-cell expression, as well as marker genes and true proportions that can be used as an example of deconvolution using the simulation model the BisqueRNA package.
library(Biobase)
library(BisqueRNA)
cell.types <- c("Neurons", "Astrocytes", "Oligodendrocytes", "Microglia", "Endothelial Cells","others")
avg.props <- c(.5, .2, .15, .07, .03,.05)
sim.data <- SimulateData(n.ind=20, n.genes=1000, n.cells=500, cell.types=cell.types, avg.props=avg.props)
sim.data contains the following simulated objects:
Alternatively, readers can load the example simulated data directly from the InstaPrism package.
data("sim.data")
Both InstaPrism and BayesPrism take single cell profiles as prior information and return deconvolution results for a given bulk expression matrix. To prepare input for InstaPrism and BayesPrism, we need:
In real practice, cell.state.labels usually denote different cell states from a same given cell type. For example, malignant cells can be subclustered by different patients to denote different malignant states. With the sim.data we created above, we will generate some artificial cell state labels for Neurons given individual IDs.
library(Biobase)
sc.eset <- sim.data$sc.eset
bulk.eset <- sim.data$bulk.eset
sc_Expr = exprs(sc.eset)
sc_Expr = apply(sc_Expr,2,function(x)((x/sum(x))*1e+05))
bulk_Expr = exprs(bulk.eset)
bulk_Expr = apply(bulk_Expr,2,function(x)((x/sum(x))*1e+06))
cell_type_labels=sc.eset@phenoData@data[["cellType"]]
# create artifical cell-state labels for Neuron cells: Neuron_A, Neuron_B, Neuron_C
Neurons_states=names(table(sc.eset@phenoData@data$SubjectName))
names(Neurons_states)=c(rep("A",8),rep("B",6),rep("C",6))
cell_state_labels=ifelse(sc.eset@phenoData@data[["cellType"]]=='Neurons',paste0('Neurons_sub',names(Neurons_states)[match(sc.eset@phenoData@data$SubjectName,Neurons_states)]),sc.eset@phenoData@data[["cellType"]])
table(cell_state_labels)
#> cell_state_labels
#> Astrocytes Endothelial Cells Microglia Neurons_subA
#> 1958 306 727 2014
#> Neurons_subB Neurons_subC Oligodendrocytes others
#> 1497 1477 1445 576
 InstaPrism()
is the main deconvolution function.
Depending the format of single cell prior information to use, users need
to specify the input_type
argument passed to the function.
The InstaPrism()
function currently support
four input_types.
Note that in the following examples we will consider a simple implementation where we use prior information from the scRNA Seq directly, without the need to update reference.
1. InstaPrism with
input_type = 'raw'
Under this setting, InstaPrism takes the same input format as BayesPrism:
library(tictoc) # load package to measure execution time
tic()
InstaPrism.res = InstaPrism(input_type = 'raw',sc_Expr = sc_Expr,bulk_Expr = bulk_Expr,
cell.type.labels = cell_type_labels,cell.state.labels = cell_state_labels)
elapsed1 = toc()
#> 0.161 sec elapsed
2. InstaPrism with
input_type = 'prism'
In this mode, users need to first construct a prism object using the
BayesPrism package, and then pass the prismObj
to the
InstaPrism()
function.
library(BayesPrism)
#> Loading required package: snowfall
#> Loading required package: snow
#> Warning: replacing previous import 'gplots::lowess' by 'stats::lowess' when
#> loading 'BayesPrism'
#> Warning: replacing previous import 'BiocParallel::register' by 'NMF::register'
#> when loading 'BayesPrism'
tic()
bp.obj = new.prism(reference = t(sc_Expr),input.type = 'count.matrix',
cell.type.labels = cell_type_labels, cell.state.labels = cell_state_labels,
mixture = t(bulk_Expr),key = NULL)
InstaPrism.res2 = InstaPrism(input_type = 'prism',prismObj = bp.obj)
elapsed2 = toc()
#> 0.72 sec elapsed
3. InstaPrism with input_type = 'refPhi'
(recommended)
Under this setting, users need to pass a pre-defined
refPhi
object containing single cell prior information. To
build a refPhi
object, users can use the
refPrepare()
function with the following arguments:
The resulting refPhi
object contains the following
information:
The advantage of using refPhi
as input is that one
doesn’t need to load the large single cell expression matrix every time,
and the refPhi
object can be easily re-implemented once
built.
refPhi_obj = refPrepare(sc_Expr, cell_type_labels, cell_state_labels)
tic()
InstaPrism.res3 = InstaPrism(input_type = 'refPhi',bulk_Expr = bulk_Expr,refPhi = refPhi_obj)
elapsed3 = toc()
#> 0.049 sec elapsed
4. InstaPrism with input_type = 'refPhi_cs'
(recommended)
Under this setting, users need to specify a pre-defined
refPhi_cs
object which can be built by
refPhi_cs_obj = refPrepare(sc_Expr, cell_type_labels, cell_state_labels,ref_type = 'refPhi_cs')
InstaPrism.res = InstaPrism(input_type = 'refPhi_cs',bulk_Expr = bulk_Expr,refPhi = refPhi_cs_obj)
Alternatively, users can use the built-in reference
provided by InstaPrism,
which are all in refPhi_cs
format.
Using refPhi_cs
provides a more versatile setting where
users can selectively choose which cell-states to include for
deconvolution. For example, given the refPhi
object we
built in the previous section, if we only want to include
Neurons_subA
and Neurons_subB
as cell states
of interest, and remove Neurons_subC
from Neurons, we
can:
refPhi_cs_map = refPhi_obj@map
refPhi_cs_map$Neurons = c('Neurons_subA', 'Neurons_subB' )
to_move_id = which(colnames(refPhi_obj@phi.cs) == 'Neurons_subC')
refPhi_cs_obj = new('refPhi_cs',phi.cs = refPhi_obj@phi.cs[,-to_move_id],map = refPhi_cs_map)
InstaPrism.res4 = InstaPrism(input_type = 'refPhi_cs',bulk_Expr = bulk_Expr,refPhi_cs = refPhi_cs_obj)
InstaPrism()
function returns an S4 object containing
the following information:
Run BayesPrism for comparison
In the following example, we are setting
update.gibbs = F
for consistency with our previous examples
where we only consider single-cell based reference, without updating the
reference. Deconvolution with BayesPrism can be called by a single
function run.prism()
as follows:
start.time = Sys.time()
bp.res = run.prism(bp.obj,update.gibbs = F)
end.time=Sys.time()
bp_running_time = difftime(end.time, start.time,units = 'secs') %>% as.numeric()
save(bp.res,bp_running_time,file='extdata/tutorial_example1/bp.res.initial.RData')
Â
1. deconvolution performance of InstaPrism
We provide a useful function deconv_performance_plot()
that visualize correlations at per-cell.type level. According to the
plot, InstaPrism accurately predicts cellular proportions from
sim.data
deconv_performance_plot(est = InstaPrism.res@Post.ini.ct@theta,true = sim.data$props,title = 'InstaPrism performance on sim.data',nrow=1)
#> Loading required package: ggpmisc
#> Loading required package: ggpp
#>
#> Attaching package: 'ggpp'
#> The following object is masked from 'package:ggplot2':
#>
#> annotate
2. compare cell type fraction estimates from InstaPrism and BayesPrism
We can visualize the correlation between two fraction estimates with a heatmap. As shown by the heatmap, cell.type estimates from both methods are highly correlated.
corr=cor(t(InstaPrism.res@Post.ini.ct@theta),bp.res@posterior.initial.cellType@theta)
diag(corr)
#> Neurons Astrocytes Oligodendrocytes Microglia
#> 0.9999964 0.9999965 0.9999892 0.9999696
#> Endothelial Cells others
#> 0.9999649 0.9999805
ComplexHeatmap::Heatmap(corr,show_row_dend = F,show_column_dend = F,column_title = 'BayesPrism',row_title = 'InstaPrism',name='correlation')
3. running time comparison InstaPrism significantly accelerates the deconvolution speed in either modes.
InstaPrism_mode1_running_time = as.numeric(elapsed1$toc-elapsed1$tic)
InstaPrism_mode2_running_time = as.numeric(elapsed2$toc-elapsed2$tic)
InstaPrism_mode3_running_time = as.numeric(elapsed3$toc-elapsed3$tic)
rt=data.frame(method=c('InstaPrism.raw.mode','InstaPrism.prism.mode','InstaPrism.refPhi.mode','BayesPrism'),
time=c(InstaPrism_mode1_running_time,InstaPrism_mode2_running_time,
InstaPrism_mode3_running_time,bp_running_time))
rt$time=round(rt$time,2)
ggplot(rt,aes(method,time))+
geom_bar(stat="identity",fill='grey',width = 0.5)+
theme_bw()+
xlab('')+
ylab('time (secs)')+
geom_text(aes(label = time), vjust = -0.2)+
ylim(0,max(rt$time)*1.05)+
theme(axis.text.x = element_text(angle = 45, hjust=1))
The 2-dimensional initial.scaler matrix from the InstaPrism output
can be used to reconstruct any cell.states/cell.types specific
expression of interest, which serves as an extra memory-efficient
alternative to the 3-dimensional Z array in the BayesPrism output.
To reconstruct Neurons specific expression from the above example:
Neuron_expr = reconstruct_Z_ct_initial(InstaPrism_obj = InstaPrism.res,
cell.type.of.interest = 'Neurons')
And BayesPrism derived Neurons specific expression can be obtained by:
Neuron_expr_bp = t(bp.res@posterior.initial.cellType@Z[,,'Neurons'])
Two Neurons specific expression matrices are near identical.
corr = cor(Neuron_expr,Neuron_expr_bp)
ComplexHeatmap::Heatmap(corr,show_row_dend = F,show_column_dend = F,column_title = 'BayesPrism',row_title = 'InstaPrism',name='correlation',cluster_rows = F,cluster_columns = F,show_row_names = F,show_column_names = F)
InstaPrism also supports full 3d Z tensor reconstruction (sample \(\times\) genes \(\times\) cell.type)
Z = get_Z_array(InstaPrism.res,resolution = 'ct',n.core = 16)
The dimension of Z is equivalent to BayesPrism output
dim(Z)
#> [1] 20 988 6
dim(bp.res@posterior.initial.cellType@Z)
#> [1] 20 988 6
Two Z tensors are nearly identical
all.equal(Z,bp.res@posterior.initial.cellType@Z)
#> [1] "Mean relative difference: 0.001901318"
We now consider a more advanced deconvolution problem where we want
to leverage from the information shared by bulk samples and update the
reference matrix accordingly.
In InstaPrism, we present a more versatile
InstaPrism_update()
function for a similar purpose, with
the following arguments included:
InstaPrism_update()
returns an S4 object containing the
following objects:
key
argument passed to
InstaPrism_update()
functionkeep.phi
argument passed
to InstaPrism_update()
functionAmong these objects, the ‘theta’ matrix exclusively holds the
deconvolution results of interest, while the remaining objects are only
required for Z reconstruction.
1. Reference update using InstaPrism_update()
To update the reference profile for the above simulated dataset, we
will set key = NA
since no malignant cells are present in
this problem. By setting cell.types.to.update = 'all'
, we
specify that all the cell.type reference will be updated.
tic()
InstaPrism.res.updated = InstaPrism_update(InstaPrism.res,
bulk_Expr,
cell.types.to.update = 'all',
key = NA)
#> the "key" parameter is set to NA, the updated reference will be the same for all the individuals
elp = toc()
#> 0.058 sec elapsed
2. Reference update with BayesPrism
To enable reference update using BayesPrism, one can set
update.gibbs = TRUE
in run.prism()
function
(the default setting), or by calling the update.theta()
function on a BayesPrism object returned from run.prism()
function with update.gibbs = FALSE
.
For a direct running time comparing between reference update steps,
we will first run run.prism()
function with
update.gibbs = FALSE
and then use the
update.theta()
function.
bp.res = run.prism(bp.obj,update.gibbs = F)
# we will only consider running time for reference updates
start.time = Sys.time()
bp.res.updated = update.theta(bp=bp.res)
# the bp.res.updated obtained by executing run.prism() + update.theta() steps mentioned above is equivalent to running the following command in a single line
# bp.res.updated = run.prism(bp.obj,update.gibbs = T)
end.time=Sys.time()
bp_updated_running_time = difftime(end.time, start.time,units = 'secs') %>% as.numeric()
save(bp.res.updated,bp_updated_running_time,file = 'extdata/tutorial_example1/bp.res.updated.RData')
3. deconvolution results comparison
When we set cell.types.to.update
equals to all the
unique cell.types present in the reference profile, two deconvolution is
near identical.
corr=cor(t(InstaPrism.res.updated@theta),bp.res.updated@posterior.theta_f@theta)
diag(corr)
#> Neurons Astrocytes Oligodendrocytes Microglia
#> 0.9999949 0.9999941 0.9999872 0.9999680
#> Endothelial Cells others
#> 0.9999705 0.9999546
4. running time comparison
InstaPrism_update()
demonstrates a substantial
acceleration in reference update speed compared to
BayesPrism::update.theta()
InstaPrism_updated_running_time = as.numeric(elp$toc - elp$tic)
rt=data.frame(method=c('InstaPrism','BayesPrism'),
time=c(InstaPrism_updated_running_time,bp_updated_running_time))
rt$time=round(rt$time,2)
ggplot(rt,aes(method,time))+
geom_bar(stat="identity",fill='grey',width = 0.4)+
theme_bw()+
ylab('time (secs)')+
geom_text(aes(label = time), vjust = -0.2)+
ylim(0,max(rt$time)*1.05)
5. reconstruct cell.type specific expression with the updated reference
We provide a useful function reconstruct_Z_ct_updated()
that enables the output of cell-type-specific expression using the
updated reference.
For example, if we are interested in Neurons specific expression with
the updated reference, we can
Neuron_expr_updated = reconstruct_Z_ct_updated(InstaPrism_updated_obj = InstaPrism.res.updated,
cell.type.of.interest = 'Neurons')
Alternatively, one can use get_Z_array()
function to get
the 3d Z array directly. The dimension of this Z array is sample \(\times\) genes \(\times\) cell.type.
Z = get_Z_array(InstaPrism.res.updated,resolution = 'ct',n.core = 16)
6. additional notes on InstaPrism_update function
In the above examples we only considered a simple scenario where we
update all the cell.types present. In real practice, we may want to
update only specific cell.types, such as those exhibiting high
cross-sample heterogeneity, while retaining the scRNA based reference
for the remaining cell types. In such cases, users need to specify the
keep.phi
argument, indicating whether to use phi.ct or
phi.cs for cell.types that are not updated.
Since either phi.ct
or phi.cs
can be used
for the cell.types that are not being updated, the theta
matrix can be a mixture of abundance estimates for both cell.types and
cell.states. To merge any cell.state information to cell.type levels,
users can utilize the convenient merge_updated_theta()
function for this purpose. Note that is function is only applicable when
keep.phi = 'phi.cs'
updated_ct = merge_updated_theta(InstaPrism.res.updated)
In this example, we will use the tutorial data
provided in BayesPrism and compare the deconvolution results between
InstaPrism and BayesPrism.
Note that it takes more than 6 hours to run the following code from our end (using n.core=16), users can skip the BayesPrism running process by loading our processed results directly.
library(BayesPrism)
load('extdata/tutorial_example2/tutorial.gbm.rdata')
sc.stat <- plot.scRNA.outlier(
input=sc.dat, #make sure the colnames are gene symbol or ENSMEBL ID
cell.type.labels=cell.type.labels,
species="hs", #currently only human(hs) and mouse(mm) annotations are supported
return.raw=TRUE #return the data used for plotting.
#pdf.prefix="gbm.sc.stat" specify pdf.prefix if need to output to pdf
)
bk.stat <- plot.bulk.outlier(
bulk.input=bk.dat,#make sure the colnames are gene symbol or ENSMEBL ID
sc.input=sc.dat, #make sure the colnames are gene symbol or ENSMEBL ID
cell.type.labels=cell.type.labels,
species="hs", #currently only human(hs) and mouse(mm) annotations are supported
return.raw=TRUE
#pdf.prefix="gbm.bk.stat" specify pdf.prefix if need to output to pdf
)
# Filter outlier genes from scRNA-seq data
sc.dat.filtered <- cleanup.genes (input=sc.dat,
input.type="count.matrix",
species="hs",
gene.group=c( "Rb","Mrp","other_Rb","chrM","MALAT1","chrX","chrY") ,
exp.cells=5)
# Subset protein coding genes
sc.dat.filtered.pc <- select.gene.type (sc.dat.filtered,
gene.type = "protein_coding")
# construct a prism object
myPrism <- new.prism(
reference=sc.dat.filtered.pc,
mixture=bk.dat,
input.type="count.matrix",
cell.type.labels = cell.type.labels,
cell.state.labels = cell.state.labels,
key="tumor",
outlier.cut=0.01,
outlier.fraction=0.1,
)
# run BayesPrism
start.time = Sys.time()
bp.res <- run.prism(prism = myPrism, n.cores=16,update.gibbs=T) # set update.gibbs=T for full comparison between two methods
end.time=Sys.time()
bp_running_time = difftime(end.time, start.time,units = 'mins') %>% as.numeric()
save(bp.res,bp_running_time,file = 'extdata/tutorial_example2/bp.res.RData')
Alternatively, users can download the processed results directly from zenodo repository
load('extdata/tutorial_example2/bp.res.RData')
We present a step-by-step InstaPrism deconvolution framework using the following commands:
tic()
# 1. get initial deconvolution results using scRNA based reference
InstaPrism.res.initial = InstaPrism(input_type = 'prism',prismObj = bp.res@prism, n.iter = 100, n.core = 16)
# 2. reconstruct cell.type specific expression with the initial reference (optional)
Z_ct_initial = get_Z_array(InstaPrism.res.initial,resolution = 'ct',n.core = 16)
# 3. update reference & get deconovlution results from the updated reference
updated_obj = InstaPrism_update(InstaPrism.res.initial,
bulk_Expr = t(bp.res@prism@mixture),
n.iter = 100,
n.core = 16,
cell.types.to.update = 'all',
key = 'tumor')
# 4. reconstruct cell.type specific expression with the updated reference (optional)
Z_ct_updated = get_Z_array(updated_obj, resolution = 'ct', n.core = 16)
elp = toc()
#> 76.866 sec elapsed
1. cell.type fraction comparison (with initial reference)
cell.type fraction estimates from two methods are nearly identical
ct.corr=cor(t(InstaPrism.res.initial@Post.ini.ct@theta),bp.res@posterior.initial.cellType@theta) %>% round(1)
ComplexHeatmap::Heatmap(ct.corr,show_row_dend = F,show_column_dend = F,column_title = 'BayesPrism',row_title = 'InstaPrism',name='correlation', cell_fun = function(j, i, x, y, w, h, col) {grid::grid.text(ct.corr[i, j], x, y)})
2. cell type specific expression comparison
Two tumor-specific expression matrix are near identical
# tumor specific expression from BayesPrism
Z_tumor_ct_initial_bp = bp.res@posterior.initial.cellType@Z[,,'tumor']
all.equal(Z_ct_initial[,,'tumor'],Z_tumor_ct_initial_bp)
#> [1] "Mean relative difference: 0.005304617"
3. cell.type fraction comparison (with updated reference)
cell.type fraction estimates from two methods are nearly identical
ct.corr.updated=cor(t(updated_obj@theta),bp.res@posterior.theta_f@theta)
diag(ct.corr.updated)
#> tumor myeloid pericyte endothelial tcell oligo
#> 0.9999576 0.9999839 0.9999233 0.9998641 0.9950594 0.9999056
4. running time comparison InstaPrism significantly reduced the running time from hours to mere minutes on this tutorial data.
InstaPrism_running_time = as.numeric(elp$toc - elp$tic) / 60 # tic and toc are the elapsed process times in seconds, divide it by 60 to convert to minutes
rt=data.frame(method=c('InstaPrism \n (n.core=16)','BayesPrism \n (n.core=16)'),
time=c(InstaPrism_running_time,bp_running_time))
rt$time=round(rt$time,2)
ggplot(rt,aes(method,time))+
geom_bar(stat="identity",fill='grey',width = 0.5)+
theme_bw()+
ylab('time (mins)')+
geom_text(aes(label = time), vjust = -0.2)+
ylim(0,max(rt$time)*1.08)
5. Memory comparison With 2-d compression of the Z matrix, InstaPrism significantly reduce the memory required to store the deconvolution object
InstaPrism_memory = object.size(c(InstaPrism.res.initial,updated_obj))
bp_memory = object.size(bp.res)
mp=data.frame(method=c('InstaPrism','BayesPrism'),
memory=c(object.size(c(InstaPrism.res.initial,updated_obj)),object.size(bp.res)),
m = c('85MB','1.8GB'))
ggplot(mp,aes(method,memory))+
geom_bar(stat="identity",fill='grey',width = 0.5)+
theme_bw()+
ylab('memory (bytes)')+
geom_text(aes(label = m), vjust = -0.4)+
ylim(0,2e+09)
We have recently proposed a heterogeneous bulk simulation pipeline that simulate bulk samples with realistic biological variance (for more details, check here). In the following example, we will run deconvolution on a heterogeneouly simulated bulk dataset of HNSC tumors and evaluate the deconvolution performance by comparing the fraction estimates with real fractions.
For details about how the bulk samples are simulated, check here.
load('extdata/tutorial_example3/example3.RData')
The example3.RData
contains the following objects:
where we assign the malignant cells with different cell.states by
their patient IDs.
InstaPrism.res = InstaPrism(input_type = 'raw',sc_Expr = scExpr_train,bulk_Expr = simulated_bulk,
cell.type.labels = cell.type.labels, cell.state.labels = cell.state.labels,
n.iter = 100,n.core = 16)
deconv_performance_plot(est = InstaPrism.res@Post.ini.ct@theta,true = t(simulated_frac),title = 'InstaPrism performance on heterogenously simulated bulk data',nrow=2)
InstaPrism achieves reasonable cell.type estimates for the simulated
bulk samples.
We present the following framework to run deconvolution with InstaPrism
Depending on input types, initial deconvolution results can be obtained through:
InstaPrism.res.initial = InstaPrism(input_type = 'raw',sc_Expr,bulk_Expr,cell.type.labels,cell.state.labels,n.core = 16)
InstaPrism.res.initial = InstaPrism(input_type = 'prism',prismObj,n.core = 16)
InstaPrism.res.initial = InstaPrism(input_type = 'refPhi',bulk_Expr, refPhi,n.core = 16)
InstaPrism.res.initial = InstaPrism(input_type = 'refPhi_cs',bulk_Expr, refPhi_cs,n.core = 16)
InstaPrism()
function returns an S4 object containing
the following information:
Z_cs_initial = reconstruct_Z_cs_initial(InstaPrism.res.initial,cell.state.of.interest)
Z_ct_initial = reconstruct_Z_ct_initial(InstaPrism.res.initial,cell.type.of.interest)
reconstruct_Z_cs_initial()
and
reconstruct_Z_ct_initial()
returns a gene \(\times\) sample matrix containing
cell-state or cell-type specific expression information.
Alternatively, use get_Z_array()
to get the 3-d
cell.state/cell.type specific expression:
Z_cs_initial = get_Z_array(InstaPrism.res.initial,resolution = 'cs')
Z_ct_initial = get_Z_array(InstaPrism.res.initial,resolution = 'ct')
updated_obj = InstaPrism_update(InstaPrism.res.initial,
bulk_Expr,
cell.types.to.update = 'all',
key = 'tumor',
n.core = 16)
updated_obj@theta
contains cell state/type abundance
estimates, while the remaining objects returned by
InstaPrism_update()
is for Z reconstruction only.
When updated_obj@theta
contains cell.state abundance
estimates, one can use merge_updated_theta()
function to
merge any cell.state information to cell.type levels.
updated_ct = merge_updated_theta(updated_obj)
Z_ct_updated = get_Z_array(updated_obj,n.core = 16)
So far we have demonstrated the primary usage of InstaPrism, mostly with the default parameters. In real practice, some of the default parameters can be tailored to meet the users’ specific requirements.
n.iter: number of iterations in InstaPrism
algorithm.
n.iter
is an important parameter for InstaPrism algorithm.
InstaPrism updates the fraction estimates over iterations and
n.iter
determines how many iterations are performed. Under
default setting, this value is set to max(100, number of cell.states
\(\times\) 2). However, users can
set it to larger values to improve convergence. Specifically, we
included two additional parameters that help to verify the convergence
status: verbose and convergence.plot.
verbose: a logical variable to determine whether
to display convergence status of the model.
By setting verbose = T
, InstaPrsim()
will
print the convergence status summary for both cell.states and
cell.types.
Specifically, the absolute difference in fraction estimates between the
last two iterations is utilized as an indicator of convergence
(abs_diff), with smaller values indicating convergence and higher
confidence of the deconvolution results. Using
example3.RData
as a demonstration, by setting
verbose = T
, we get a summarized table showing the minimum,
median and maximum abs_diff for all cell.states/types across the
samples. Usually we consider abs_diff < 0.01 as convergence.
load('extdata/tutorial_example3/example3.RData')
InstaPrism.res = InstaPrism(input_type = 'raw',sc_Expr = scExpr_train,bulk_Expr = simulated_bulk,
cell.type.labels = cell.type.labels, cell.state.labels = cell.state.labels,
n.iter = 100,n.core = 16,verbose = T)
#> deconvolution with scRNA reference phi
#>
#>
#> ********************** convergence status summary **********************
#> instructions:
#> the absolute difference in fraction estimates between the last two iterations is utilized as an indicator of convergence,
#> with smaller values indicating convergence (usually consider abs_diff < 0.01 as convergence),
#> below is a summarized convergence status for cell.states/cell.types across all the samples
#>
#> *************** convergence status summary for cell states ***************
#> cell.state min median max
#> 1 B cell 6.572045e-14 2.042905e-06 1.335874e-05
#> 2 Dendritic 1.257843e-07 4.954830e-06 2.266095e-05
#> 3 Endothelial 1.554095e-07 8.878092e-06 4.762081e-05
#> 4 Fibroblast 3.301680e-08 1.627585e-05 6.743852e-05
#> 5 HNSCC 1.278888e-07 2.426794e-05 1.275864e-04
#> 6 HNSCC10 1.091520e-08 1.591293e-06 1.144036e-05
#> 7 HNSCC12 2.139345e-09 5.206182e-07 2.872120e-05
#> 8 HNSCC13 5.232302e-07 2.301956e-05 8.867985e-05
#> 9 HNSCC16 9.318927e-08 2.984146e-05 2.215908e-04
#> 10 HNSCC17 1.417992e-07 2.709896e-05 3.314047e-04
#> 11 HNSCC18 5.775770e-10 3.036487e-07 6.254012e-05
#> 12 HNSCC20 5.802662e-07 1.003421e-05 1.443692e-04
#> 13 HNSCC22 1.080305e-08 2.707936e-05 4.096297e-04
#> 14 HNSCC24 3.619047e-09 1.656969e-06 2.719350e-05
#> 15 HNSCC25 1.389515e-08 2.383273e-05 2.361662e-04
#> 16 HNSCC26 1.013015e-07 2.077936e-05 7.958531e-05
#> 17 HNSCC28 2.987275e-08 3.044526e-05 2.738516e-04
#> 18 HNSCC5 2.497902e-08 6.200813e-06 1.489406e-04
#> 19 HNSCC6 2.433803e-07 1.457819e-05 6.872945e-05
#> 20 HNSCC7 1.243468e-09 3.923414e-06 3.197536e-05
#> 21 HNSCC8 2.316761e-13 3.827862e-07 6.602595e-06
#> 22 Macrophage 1.493925e-07 3.978596e-06 1.398273e-05
#> 23 Mast 5.757529e-11 1.759897e-07 2.972196e-06
#> 24 myocyte 9.239945e-10 2.153556e-07 1.784543e-06
#> 25 T cell 9.902686e-08 1.059614e-05 4.179099e-05
#>
#> *************** convergence status summary for cell types ***************
#> cell.type min median max
#> 1 B cell 6.572045e-14 2.042905e-06 1.335874e-05
#> 2 Dendritic 1.257843e-07 4.954830e-06 2.266095e-05
#> 3 Endothelial 1.554095e-07 8.878092e-06 4.762081e-05
#> 4 Fibroblast 3.301680e-08 1.627585e-05 6.743852e-05
#> 5 Macrophage 1.493925e-07 3.978596e-06 1.398273e-05
#> 6 malignant 1.568093e-05 3.891631e-04 9.749845e-04
#> 7 Mast 5.757529e-11 1.759897e-07 2.972196e-06
#> 8 myocyte 9.239945e-10 2.153556e-07 1.784543e-06
#> 9 T cell 9.902686e-08 1.059614e-05 4.179099e-05
#>
#> ******************* end of convergence status summary ********************
#>
#> scaler calculation
example3.RData
as a demonstration, first let’s
consider the scenario whenn.iter
is set too low. According
to the convergence plot, fraction estimates for malignant cells did not
converge. This means that we should set higher value of
n.iter
.InstaPrism.res = InstaPrism(input_type = 'raw',sc_Expr = scExpr_train,bulk_Expr = simulated_bulk,
cell.type.labels = cell.type.labels, cell.state.labels = cell.state.labels,
n.iter = 20,n.core = 16,convergence.plot=T,max_n_per_plot = 100)
#> Warning in InstaPrism(input_type = "raw", sc_Expr = scExpr_train, bulk_Expr =
#> simulated_bulk, : fraction estimation didn't converge for some samples, enable
#> convergence.plot for details and try larger n.iter
Now, let’s try using a higher value of n.iter
. This time
we get fraction estimation for all the cell types converge (with all the
abs_diff less than 0.01).
InstaPrism.res = InstaPrism(input_type = 'raw',sc_Expr = scExpr_train,bulk_Expr = simulated_bulk,
cell.type.labels = cell.type.labels, cell.state.labels = cell.state.labels,
n.iter = 100,n.core = 16,convergence.plot=T,max_n_per_plot = 100)
InstaPrism()
,
InstaPrism_update()
,
reconstruct_Z_ct_initial()
,
reconstruct_Z_ct_updated()
and
InstaPrism_legacy()
.