Question

Using the ComBat function of sva to remove a known batch effect and using the result for downstream analysis

littlefishes20 ▴ 10

Hello Bioconductor support:

I've searched many other posts but could not solve my problem.

Here is an introduction to my data. I have eight phenotype groups of interest, each with 7 or 8 biological replicates, for a total of 55 samples. I have an unnormalized read count matrix and a normalized RPKM matrix.

The key point is that the biological replicates in each group were sequenced on two different platforms. For example, 3 replicates in group1 were sequenced three years ago (called batch1 below), and the other 4 replicates were sequenced this year (called batch2 below); the same holds for group2, group3, etc. As a result, I can see a clear distinction between batch1 and batch2. To remove the batch effect, I've looked into several methods, such as the sva package, the limma package, etc.

Here are my questions:

1. Which is the better input for sva: the unnormalized read count matrix or the normalized RPKM matrix?

2. It seems that I must filter out low-expression genes before using sva. For the read count matrix I retained genes with more than 100 reads in all samples, and for the RPKM matrix I retained genes with RPKM greater than 1 in all samples. Only under this condition can I successfully run the ComBat function of sva; otherwise it just prints the following messages and quits.

"Found 2 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Error in while (change > conv) { : missing value where TRUE/FALSE needed"

However, I think this condition is too strict for my data, as it filters out a great number of genes that have a high potential to be DEGs. Please let me know what mistakes I made.

3. After removing the batch effect, I get a matrix with decimal and negative values, with the same structure as my input, for both the read count matrix and the RPKM matrix. I wonder how to use them as input to find DEGs with DESeq2 and edgeR.

Those are all my questions. I am really looking forward to an answer, thanks!

sva batch effect deseq2 edgeR • 1.6k views

updated 7.0 years ago by Aaron Lun ★ 28k • written 7.0 years ago by littlefishes20 ▴ 10

score 0 · Answer 1 · 2017-05-16

Aaron Lun ★ 28k

If you are using edgeR, you should be supplying the original counts and blocking on the batch effect in the design matrix. Otherwise, the uncertainty of estimating the blocking coefficients will not be properly considered by the GLM fit. Blocking should not be a problem if each group contains one or more replicates in each batch.
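
A minimal sketch of what blocking on batch could look like, assuming R and an additive `~ batch + group` design. The sample layout (8 groups, each with 3 replicates in batch1 and 4 in batch2), the simulated counts, and all variable names are illustrative assumptions, not the poster's data. The edgeR part only runs if edgeR is installed.

```r
# Hypothetical sample layout: 8 groups, each with 3 replicates in batch1
# and 4 replicates in batch2 (56 samples in total).
group <- factor(rep(paste0("group", 1:8), each = 7))
batch <- factor(rep(c(rep("batch1", 3), rep("batch2", 4)), times = 8))

# Additive design: the batch column absorbs the batch shift, while the
# group columns carry the comparisons of interest.
design <- model.matrix(~ batch + group)

# Blocking requires a full-rank design, which holds here because every
# group has replicates in both batches.
stopifnot(qr(design)$rank == ncol(design))

if (requireNamespace("edgeR", quietly = TRUE)) {
  library(edgeR)
  set.seed(1)
  # Simulated raw counts stand in for the real, unadjusted count matrix.
  counts <- matrix(rnbinom(2000 * length(group), mu = 50, size = 5),
                   nrow = 2000)
  y <- DGEList(counts = counts, group = group)

  # Filter low-expression genes relative to the design rather than with
  # a fixed threshold applied to all samples.
  keep <- filterByExpr(y, design)
  y <- y[keep, , keep.lib.sizes = FALSE]

  y <- calcNormFactors(y)
  y <- estimateDisp(y, design)
  fit <- glmQLFit(y, design)

  # group2 vs group1, adjusted for batch.
  res <- glmQLFTest(fit, coef = "groupgroup2")
  print(topTags(res))
}
```

With a design like this, the raw counts go into edgeR unchanged and the batch coefficient is estimated together with the group coefficients. The same formula could in principle be passed as `design = ~ batch + group` to `DESeqDataSetFromMatrix()` in DESeq2, again with unadjusted counts.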

7.0 years ago Aaron Lun ★ 28k

Thanks for the quick response! Do you mean I shouldn't remove the batch effect, and should instead just specify it when using edgeR for DE analysis?