Using the ComBat function of sva to remove a known batch effect and using the result for downstream analysis
@littlefishes20-13054
Taiwan

Hello Bioconductor support:

I've searched many other posts but could not figure out my problem.

Here is an introduction to my data. I have eight phenotype groups that I am interested in, each with 7 or 8 biological replicates, and the total number of samples is 55. I have an unnormalized read count matrix and a normalized RPKM matrix.

The key point is that the biological replicates in each group were sequenced on two different platforms. For example, 3 replicates in group1 were sequenced three years ago (called batch1 below), and the other 4 replicates were sequenced this year (called batch2 below), and the same holds for group2, group3, etc. As a result, I can see a clear distinction between batch1 and batch2. To remove the batch effect, I have looked into several methods, such as the sva package, the limma package, etc.

Here are my questions:

1. Which is the better input for sva: the unnormalized read count matrix or the normalized RPKM matrix?

2. It seems that I must filter out low-expression genes before using sva. For the read count matrix I retained genes with more than 100 reads in all samples, and for the RPKM matrix I retained genes with RPKM greater than 1 in all samples. Only under this condition can I successfully run the ComBat function of sva; otherwise it just prints the following messages and quits.

"Found 2 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Error in while (change > conv) { : missing value where TRUE/FALSE needed"

However, I think this condition is too strict for my data, as it filters out a great number of genes that have high potential to be DEGs. Please let me know what mistake I made.

3. After removing the batch effect, I get a matrix with decimal and negative values, with the same structure as my input, for both the read count matrix and the RPKM matrix. I wonder how to use these as input to find DEGs with DESeq2 and edgeR.

Those are all my questions. I am really looking forward to an answer, thanks!

sva batch effect deseq2 edgeR • 1.6k views
Aaron Lun ★ 28k
@alun
The city by the bay

If you are using edgeR, you should be supplying the original counts and blocking on the batch effect in the design matrix. Otherwise, the uncertainty of estimating the blocking coefficients will not be properly considered by the GLM fit. Blocking should not be a problem if each group contains one or more replicates in each batch.
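A minimal sketch of what this could look like in edgeR, assuming a small simulated data set rather than the poster's data. The 4 x 2 x 3 sample layout, the simulated `counts` matrix, and the names `group` and `batch` are all hypothetical; the point is only that the raw counts go in unchanged and batch enters through the design matrix.

```r
library(edgeR)

set.seed(1)

# Hypothetical toy layout (not the poster's data):
# 4 groups x 2 batches x 3 replicates = 24 samples.
samples <- expand.grid(rep = 1:3,
                       batch = c("batch1", "batch2"),
                       group = paste0("group", 1:4))
group <- factor(samples$group)
batch <- factor(samples$batch)

# Simulated raw counts, 1000 genes x 24 samples (stand-in for real data).
counts <- matrix(rnbinom(1000 * nrow(samples), mu = 50, size = 5),
                 nrow = 1000)
rownames(counts) <- paste0("gene", seq_len(nrow(counts)))

y <- DGEList(counts = counts, group = group)

# Batch is a blocking factor in the design; the counts are not adjusted.
design <- model.matrix(~ batch + group)

# Filter low-expression genes relative to the design, then normalize.
keep <- filterByExpr(y, design)
y <- y[keep, , keep.lib.sizes = FALSE]
y <- calcNormFactors(y)

# Estimate dispersions and fit the quasi-likelihood GLM.
y <- estimateDisp(y, design)
fit <- glmQLFit(y, design)

# Test group2 against group1 (the reference level), adjusted for batch.
res <- glmQLFTest(fit, coef = "groupgroup2")
topTags(res)
```

With this setup the batch coefficient is estimated together with the group coefficients, so its uncertainty is carried into the test statistics instead of being ignored, as it would be if batch-corrected values were fed in.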

Thanks for the quick response! Do you mean that I shouldn't remove the batch effect and should just point it out when using edgeR for DE analysis?

If by "point it out", you mean "block on it", then yes.
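To illustrate what "blocking" means in terms of the model formula, here is a small base-R sketch. The sample sheet is hypothetical (4 groups with 3 replicates in each of 2 batches); it checks that every group has replicates in every batch and that the additive design is full rank, which is the situation where blocking is expected to be unproblematic.

```r
# Hypothetical sample sheet: 4 groups, each with 3 replicates in both batches.
group <- factor(rep(paste0("group", 1:4), each = 6))
batch <- factor(rep(rep(c("batch1", "batch2"), each = 3), times = 4))

# Cross-tabulate group against batch; every cell should be non-zero.
layout <- table(group, batch)
print(layout)
all_cells_filled <- all(layout > 0)

# Blocking: batch is an additive term alongside the factor of interest.
design <- model.matrix(~ batch + group)

# Full column rank means batch and group effects can be estimated separately.
full_rank <- qr(design)$rank == ncol(design)
```

If some group appeared in only one batch, the table would show a zero cell for it, and comparisons involving that group would lean on the other groups to estimate the batch effect.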
