Hello Bioconductor support:
I've searched many other posts but could not solve my problem.
Here is an introduction to my data. I have eight phenotype groups that I am interested in, each with 7 or 8 biological replicates, and 55 samples in total. I have an unnormalized read count matrix and a normalized RPKM matrix.
The key point is that the biological replicates in each group were sequenced on two different platforms. For example, 3 replicates in group1 were sequenced three years ago (called batch1 below), and the other 4 replicates were sequenced this year (called batch2 below); the same holds for group2, group3, etc. As a result, I can see a clear distinction between batch1 and batch2. To remove the batch effect, I've read about several methods, such as the sva package, the limma package, etc.
Here are my questions:
1. Which is the better input for sva: the unnormalized read count matrix or the normalized RPKM matrix?
2. It seems that I must filter out low-expression genes before using sva. For the read count matrix I retained genes with more than 100 reads in all samples, and for the RPKM matrix I retained genes with RPKM greater than 1 in all samples. Only under these conditions can I run the ComBat function of sva successfully; otherwise it just prints the following messages and quits.
"Found 2 batches
Adjusting for 0 covariate(s) or covariate level(s)
Standardizing Data across genes
Fitting L/S model and finding priors
Finding parametric adjustments
Error in while (change > conv) { : missing value where TRUE/FALSE needed"
However, I think this filter is too strict for my data: it removes a large number of genes that have a high potential to be DEGs. Please let me know what I did wrong.
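One possible cause of the error above is genes whose values are constant (for example, all zero) within one batch, since a zero within-batch variance can turn into missing values during standardization. This is an assumption and is not confirmed anywhere in this thread. If it holds, a much milder filter than "more than 100 reads in all samples" might be enough. The plain-Python sketch below only illustrates that filter on hypothetical toy data; the function name and the gene names are made up for illustration and are not part of sva.

```python
from statistics import pvariance


def keep_variable_genes(counts, batches):
    """Return the genes that have nonzero variance within every batch.

    counts  : dict mapping gene name -> list of values, one per sample
    batches : list of batch labels, one per sample (same order as the values)
    """
    kept = []
    for gene, values in counts.items():
        ok = True
        for b in set(batches):
            # Values of this gene restricted to the samples of batch b
            in_batch = [v for v, label in zip(values, batches) if label == b]
            # A single sample, or identical values, gives no usable variance
            if len(in_batch) < 2 or pvariance(in_batch) == 0:
                ok = False
                break
        if ok:
            kept.append(gene)
    return kept


# Hypothetical toy data: 3 genes, 4 samples split over two batches
batches = ["batch1", "batch1", "batch2", "batch2"]
counts = {
    "geneA": [10, 12, 30, 35],  # varies in both batches
    "geneB": [0, 0, 5, 9],      # constant (all zero) in batch1
    "geneC": [3, 7, 2, 2],      # constant in batch2
}

kept = keep_variable_genes(counts, batches)
print(kept)
```

Only geneA survives here, because the other two genes are constant within one of the batches. Whether such a filter is sufficient for the real data is something to check, not a guarantee.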
3. After removing the batch effect, I get a matrix with decimal and negative values, with the same structure as my input, for both the read count matrix and the RPKM matrix. How can I use these as input to find DEGs with DESeq2 and edgeR?
Those are all my questions. I am really looking forward to an answer, thanks!
Thanks for the quick response! Do you mean I shouldn't remove the batch effect and should just point it out when using edgeR for DE analysis?
If by "point it out", you mean "block on it", then yes.
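To make "block on it" concrete: the batch goes into the model as an additive factor next to the group factor, and the original counts are left untouched. In R this is usually expressed with a design formula such as `~ batch + group`. The plain-Python sketch below is only an illustration of which columns such an additive design contains; the function name, the column labels, and the four sample labels are hypothetical and do not come from edgeR.

```python
def design_matrix(batch, group):
    """Build an additive design: intercept + batch indicators + group indicators.

    The first level of each factor (in sorted order) is the reference level
    and gets no column of its own.
    """
    batch_levels = sorted(set(batch))[1:]
    group_levels = sorted(set(group))[1:]
    columns = (
        ["Intercept"]
        + [f"batch:{b}" for b in batch_levels]
        + [f"group:{g}" for g in group_levels]
    )
    rows = []
    for b, g in zip(batch, group):
        row = [1]                                              # intercept
        row += [1 if b == lvl else 0 for lvl in batch_levels]  # batch term
        row += [1 if g == lvl else 0 for lvl in group_levels]  # group term
        rows.append(row)
    return columns, rows


# Hypothetical layout: every group has samples in both batches
batch = ["batch1", "batch1", "batch2", "batch2"]
group = ["group1", "group2", "group1", "group2"]

columns, rows = design_matrix(batch, group)
print(columns)
for row in rows:
    print(row)
```

Because each group has replicates in both batches, the batch column and the group columns are not confounded, so the group comparison can be tested while the batch term absorbs the platform difference.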