Dear all, could you please advise us on the following:

Suppose, for example, that we have 2 batches of scRNA-seq data for 2 conditions:

*WT_batch1,
WT_batch2,
DISEASE_batch1,
DISEASE_batch2*

Would the following approach be statistically legitimate as a way to account (correct) for the batch effect:

**1 -- use CCA (in Seurat) or mnnCorrect (in scran) to account for the batch effects**

**2 -- followed by t-SNE and network-based clustering, in order to place the cells correctly into CLUSTERS**

**3 -- and perform differential expression (with the Wilcoxon test, limma, edgeR, etc.) between the CLUSTERS**

Our understanding is that CCA or mnnCorrect only place the cells in more "correct" clusters after batch correction, and do NOT provide batch-corrected expression values.

In this case, considering for instance cluster_0, could we combine:

**a -- the matrix of cells (normalized expression) in cluster_0 in WT_batch1,**

**with the matrix of cells (normalized expression) in cluster_0 in WT_batch2**

**(let's call this matrix WT_batch1_batch2)**

**b -- the matrix of cells (normalized expression) in cluster_0 in DISEASE_batch1,**

**with the matrix of cells (normalized expression) in cluster_0 in DISEASE_batch2**

**(let's call this matrix DISEASE_batch1_batch2)**

c -- and use limma, edgeR or DESeq2 on **WT_batch1_batch2** versus **DISEASE_batch1_batch2** in order to get the differential expression?

We would prefer to combine the batches into a matrix **WT_batch1_batch2** and, respectively, a matrix **DISEASE_batch1_batch2**, because the number of cells in a cluster is sometimes small (i.e. fewer than 200 cells).

Or is there any other approach that you'd recommend?
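To make the combination in a to c concrete, here is a minimal sketch of the pooling step only, written in plain Python on toy data. All cell ids, gene names, expression values and the `combine_batches` helper are hypothetical and exist purely for illustration; this does not show Seurat, scran, limma, edgeR or DESeq2 calls.

```python
# Toy data: every name and value below is hypothetical, for illustration only.
GENES = ["geneA", "geneB"]
CELLS = [
    # (cell id, condition, batch, cluster, normalized expression per gene)
    ("c1", "WT", "batch1", 0, [1.2, 0.0]),
    ("c2", "WT", "batch2", 0, [0.9, 0.3]),
    ("c3", "DISEASE", "batch1", 0, [2.5, 0.1]),
    ("c4", "DISEASE", "batch2", 0, [2.1, 0.0]),
    ("c5", "WT", "batch1", 1, [0.0, 3.0]),  # different cluster, never pooled below
]


def combine_batches(cells, condition, cluster):
    """Pool the cells of one condition and one cluster across batches.

    Returns a genes x cells matrix plus the batch label of every column,
    so the batch of origin is not lost after pooling.
    """
    picked = [c for c in cells if c[1] == condition and c[3] == cluster]
    matrix = [[c[4][g] for c in picked] for g in range(len(GENES))]
    batches = [c[2] for c in picked]
    return matrix, batches


# Steps a and b for cluster_0: one pooled matrix per condition.
WT_batch1_batch2, wt_batches = combine_batches(CELLS, "WT", 0)
DISEASE_batch1_batch2, disease_batches = combine_batches(CELLS, "DISEASE", 0)

print(WT_batch1_batch2, wt_batches)
print(DISEASE_batch1_batch2, disease_batches)
```

The sketch keeps the batch label of each pooled cell next to the matrix, since limma, edgeR and DESeq2 all accept extra covariates in the design; whether batch should go into the design for step c is part of what we are asking.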

thank you,

bogdan

Is there a question here?

Hi Aaron, great to hear from you. It would be awesome to have your opinion on the following question, please:

After t-SNE, considering for instance cluster_0, could we combine:

a -- the matrix of cells w/ normalized expression in cluster_0 in WT_batch1, with the matrix of cells w/ normalized expression in cluster_0 in WT_batch2 (let's call this matrix WT_batch1_batch2)

b -- the matrix of cells w/ normalized expression in cluster_0 in DISEASE_batch1, with the matrix of cells w/ normalized expression in cluster_0 in DISEASE_batch2 (let's call this matrix DISEASE_batch1_batch2)

c -- and use limma or edgeR on WT_batch1_batch2 versus DISEASE_batch1_batch2 in order to get the differential expression?

thank you,

bogdan

Hi Aaron, I will read your tutorials again:

-- on DE: https://bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/de.html

-- on batch correction: https://bioconductor.org/packages/release/workflows/vignettes/simpleSingleCell/inst/doc/batch.html

aiming to put these 2 pieces of R code together for 10x Genomics scRNA-seq. If you have any comments, please let me know. Thank you a lot!