I am analyzing RNAseq gene expression data generated for a clear cell renal carcinoma sample. I want to get up- and down-regulated genes in this sample. As I don't have normal sample, I am using counts data of TCGA samples processed by Rsubread. Now I have counts data of KIRC-Normal from TCGA (KN), KIRC-Tumor from TCGA (KT) and Patient (P) samples. I know that I can't do correction for batch effects, as I have only one sample (P) from our lab.
I have created Deseq object with all the samples and built model with KN, KT and P. But while getting the results, I have used only KN and P samples. Here is the code that I have used-
condition = c(rep("KN",72),rep("KT",524),"P")
metaData = data.frame(condition = condition)
rownames(metaData) = colnames(dt)
dds <- DESeqDataSetFromMatrix(countData = dt, colData = metaData, design = ~ condition)
dds$condition <- relevel( dds$condition, "KN" )
dds <- DESeq(dds)
res <- results(dds, contrast=c("condition", "P", "KN"))
The purpose of using TCGA Tumor samples (KT) is to incorporate Tumor effect while normalizing the data. As my sample is tumor sample, I wanted to incorporate some tumor samples in the normalization. Am I doing it right ? Should I add KT samples or not ? Will it effect the gene filtering ?
I have also analyzed the data using edgeR. Normalized KN, KT and P samples using TMM; filtered genes based on cpm in KN and P (did not use KT samples in gene filtering); built limma model on logcpm of KT, KN and P; and used contrast for KN and P. The results are matching with Deseq2 analysis. But the fold change values of low count genes went up to thousand fold, which is difficult to explain/interpret.
Thanks in advance for all the help.