Question: Using the ComBat function of sva to remove a known batch effect and using the result for downstream analysis
11 months ago, littlefishes20 wrote:

Hello Bioconductor support:

I've searched many other posts but could not find a solution to my problem.

Here is an introduction to my data. I have eight phenotype groups that I am interested in, each with 7 or 8 biological replicates, for a total of 55 samples. I have an unnormalized read count matrix and a normalized RPKM matrix.

The key point is that the biological replicates in each group were sequenced on two different platforms. For example, 3 replicates in group1 were sequenced three years ago (called batch1 below), and the other 4 replicates were sequenced this year (called batch2 below); the same holds for group2, group3, etc. As a result, I can see a clear distinction between batch1 and batch2. To remove the batch effect, I've looked into several methods, such as the sva package, the limma package, etc.

Here are my questions:

1. Which is the better input for sva: the unnormalized read count matrix or the normalized RPKM matrix?

2. It seems that I must filter out low-expression genes before using sva. For the read count matrix I retained genes with more than 100 reads in all samples, and for the RPKM matrix I retained genes with RPKM greater than 1 in all samples. Only under this condition can I successfully run the ComBat function of sva; otherwise it just prints the following messages and quits.

"Found 2 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Error in while (change > conv) { : missing value where TRUE/FALSE needed"

However, I think this condition is too strict for my data, since it filters out a great number of genes that have high potential to be DEGs. Please let me know what mistake I made.

3. After removing the batch effect, I get a matrix with decimal and negative values, with the same structure as my input, for both the read count matrix and the RPKM matrix. I wonder how to use these as input to find DEGs with DESeq2 and edgeR.

Those are all my questions. I am really looking forward to an answer, thanks!

11 months ago, Aaron Lun (Cambridge, United Kingdom) wrote:

If you are using edgeR, you should be supplying the original counts and blocking on the batch effect in the design matrix. Otherwise, the uncertainty of estimating the blocking coefficients will not be properly considered by the GLM fit. Blocking should not be a problem if each group contains one or more replicates in each batch.
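As a rough illustration of what "blocking on the batch effect in the design matrix" can look like in edgeR, here is a minimal sketch. The toy data (two groups, two batches, simulated counts) and all object names are assumptions made up for the example, not the poster's data; only the overall pattern of putting `batch` alongside `group` in the design and fitting on the original counts follows the advice above.

```r
library(edgeR)

set.seed(1)
# Hypothetical toy data: 2 groups x 2 batches, 3 replicates per combination.
group <- factor(rep(c("g1", "g2"), each = 6))
batch <- factor(rep(rep(c("b1", "b2"), each = 3), times = 2))
counts <- matrix(rnbinom(1000 * 12, mu = 50, size = 10), ncol = 12,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:12)))

# Original (uncorrected) counts go into the DGEList.
y <- DGEList(counts = counts, group = group)

# Batch enters the design as a blocking factor; the counts are not modified.
design <- model.matrix(~ batch + group)

# Filter low-expression genes, normalize, and fit the GLM.
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test the group effect, adjusted for batch.
res <- glmQLFTest(fit, coef = "groupg2")
topTags(res)
```

With eight groups the design would simply have more group columns, and specific comparisons could be tested through the `coef` or `contrast` arguments. For DESeq2, the analogous approach would be a design formula such as `~ batch + group` on the raw counts.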


Thanks for the quick response! Do you mean I shouldn't remove the batch effect, and should instead just point it out when using edgeR for DE analysis?

written 11 months ago by littlefishes20

If by "point it out", you mean "block on it", then yes.

written 11 months ago by Aaron Lun