Question: analysis of multiple batches and multiple treatment conditions in scRNA-seq
11 weeks ago, Bogdan (Palo Alto, CA, USA) wrote:

Dear all, could you please advise us on the following:

Suppose we have 2 batches of scRNA-seq data for 2 conditions:

WT_batch1, WT_batch2, DISEASE_batch1, DISEASE_batch2.

Would the following approach be statistically legitimate for correcting the batch effect?

1 -- use CCA (in Seurat) or mnnCorrect (in scran) to account for the batch effects

2 -- followed by t-SNE and network-based clustering, in order to assign the cells correctly to CLUSTERS

3 -- and perform differential expression (with the Wilcoxon test, limma, edgeR, etc.) between the CLUSTERS

As far as we know, CCA or mnnCorrect only place the cells in more "correct" clusters after batch correction, and do NOT provide batch-corrected expression values.

In this case, considering for instance cluster_0, could we combine:

a -- the matrix of cells with normalized_expression in cluster_0 in WT_batch1,

with the matrix of cells with normalized_expression in cluster_0 in WT_batch2

(let's call this matrix WT_batch1_batch2)

b -- the matrix of cells with normalized_expression in cluster_0 in DISEASE_batch1,

with the matrix of cells with normalized_expression in cluster_0 in DISEASE_batch2

(let's call this matrix DISEASE_batch1_batch2)

c -- and use limma, edgeR or DESeq2 on WT_batch1_batch2 versus DISEASE_batch1_batch2 in order to get the differential expression?

We would prefer to combine the batches into a matrix WT_batch1_batch2 and, respectively, a matrix DISEASE_batch1_batch2, because the number of cells in a cluster may sometimes be small (i.e. fewer than 200 cells).
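
To make the proposed pooling concrete, here is a minimal Python sketch (not Seurat or scran code). The cell records, the identifiers `c1` to `c5`, and the helper `pool_cluster` are all made up for illustration. It selects the cells of one cluster and groups them by condition while keeping each cell's batch label, so that batch is not lost when the two batches are combined.

```python
from collections import defaultdict

# Hypothetical per-cell records: (cell_id, cluster, condition, batch).
cells = [
    ("c1", 0, "WT", 1), ("c2", 0, "WT", 2),
    ("c3", 0, "DISEASE", 1), ("c4", 0, "DISEASE", 2),
    ("c5", 1, "WT", 1),
]


def pool_cluster(cells, cluster):
    """Group the cells of one cluster by condition, keeping each cell's batch label."""
    pooled = defaultdict(list)
    for cell_id, cl, condition, batch in cells:
        if cl == cluster:
            pooled[condition].append((cell_id, batch))
    return dict(pooled)


# Cells of cluster 0, pooled across batches within each condition.
pooled = pool_cluster(cells, cluster=0)
```

Keeping the batch label next to each pooled cell means batch could still be supplied as a covariate to whichever DE method is used afterwards.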

Or is there any other approach that you would recommend?

thank you,

bogdan

Tags: limma, edger, deseq2, combat, scrnaseq
modified 10 weeks ago by Andrew_McDavid • written 11 weeks ago by Bogdan

Is there a question here?

written 10 weeks ago by Aaron Lun

Hi Aaron, great to hear from you. It would be great to have your opinion on the following question, please:

after t-SNE, considering for instance cluster_0, could we combine:

a -- the matrix of cells with normalized_expression in cluster_0 in WT_batch1,

with the matrix of cells with normalized_expression in cluster_0 in WT_batch2

(let's call this matrix WT_batch1_batch2)

b -- the matrix of cells with normalized_expression in cluster_0 in DISEASE_batch1,

with the matrix of cells with normalized_expression in cluster_0 in DISEASE_batch2

(let's call this matrix DISEASE_batch1_batch2)

c -- and use limma or edgeR on WT_batch1_batch2 versus DISEASE_batch1_batch2 in order to get the differential expression?

thank you,

bogdan

written 10 weeks ago by Bogdan

Hi Aaron, I will read your tutorials again:

-- on DE: https://bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/de.html

-- on batch correction: https://bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/batch.html

aiming to put these 2 pieces of R code together for 10x Genomics scRNA-seq. If you have any comments, please let me know. Thanks a lot!

modified 10 weeks ago • written 10 weeks ago by Bogdan
Answer: analysis of multiple batches and multiple treatment conditions in scRNA-seq
10 weeks ago, Andrew_McDavid wrote:

If batch is indeed crossed with condition, as the OP seemed to indicate, this suggests a model of the form

~ cluster/condition + batch:cluster

where cluster is a factor that identifies the cluster assignment, derived however desired. You could fit this model in any number of ways, including with MAST. The main gotcha is that "expression", which implicitly appears on the left-hand side of the regression, was also used in some complicated fashion to define the "cluster" variable on the right-hand side. Because the same data are used twice, you can't trust any p-value. Don't even attempt to report them.
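
To see what this formula asks for: in R formula notation, `cluster/condition` expands to `cluster + cluster:condition`, so the model has a cluster effect, a condition effect within each cluster, and a batch effect within each cluster. The sketch below spells out those columns by hand in plain Python for a toy layout of 2 clusters, 2 conditions and 2 batches. It is only an illustration: the helper `design_matrix`, the column names, and the choice of the first level as reference are assumptions, and the output of R's own `model.matrix` may be named or ordered differently.

```python
# Hand-expanded design columns for: ~ cluster/condition + batch:cluster
# i.e. cluster + cluster:condition + cluster:batch, with treatment coding.
# All labels below are made up for illustration.

def design_matrix(cells, clusters, conditions, batches):
    """Return (column_names, rows) for a list of (cluster, condition, batch) labels.

    The first entry of each level list is treated as the reference level.
    """
    names = ["(Intercept)"]
    names += [f"cluster{c}" for c in clusters[1:]]
    names += [f"cluster{c}:condition{k}" for c in clusters for k in conditions[1:]]
    names += [f"cluster{c}:batch{b}" for c in clusters for b in batches[1:]]

    rows = []
    for cluster, condition, batch in cells:
        row = [1]  # intercept
        row += [int(cluster == c) for c in clusters[1:]]
        row += [int(cluster == c and condition == k)
                for c in clusters for k in conditions[1:]]
        row += [int(cluster == c and batch == b)
                for c in clusters for b in batches[1:]]
        rows.append(row)
    return names, rows


clusters = ["0", "1"]
conditions = ["WT", "DISEASE"]
batches = ["1", "2"]

# One toy cell for every cluster x condition x batch combination.
cells = [(c, k, b) for c in clusters for k in conditions for b in batches]
names, rows = design_matrix(cells, clusters, conditions, batches)

for name in names:
    print(name)
```

Under these assumptions, the column `cluster0:conditionDISEASE` carries the DISEASE versus WT effect within cluster 0, and `cluster0:batch2` absorbs the batch difference within that same cluster.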

written 10 weeks ago by Andrew_McDavid

Hi Andrew,

I have a 10X scRNA-seq dataset with 4 replicates for each of the 2 conditions, but the samples were processed over 3 days: 1 control sample on the first day, 1 control and 2 treated samples on the second day, and 2 control and 2 treated samples on the third day. What is the best way to model this for differential gene expression analysis?

Thanks, Joyce

written 6 weeks ago by ilee80

If you have a new question, please ask a new question.

written 6 weeks ago by Aaron Lun