Question

RNAseq gene expression analysis in Personalized genomics

bioc.vkas • 0

I am analyzing RNA-seq gene expression data generated for a clear cell renal carcinoma sample, and I want to identify the up- and down-regulated genes in this sample. As I don't have a normal sample, I am using count data from TCGA samples processed with Rsubread. I now have count data for KIRC-Normal from TCGA (KN), KIRC-Tumor from TCGA (KT), and the patient sample (P). I know that I can't correct for batch effects, as I have only one sample (P) from our lab.

I created a DESeq2 object with all the samples and built the model with KN, KT and P, but when extracting the results I used only the KN and P samples. Here is the code I used:

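library(DESeq2)
# dt: gene-by-sample count matrix with columns ordered as 72 KN, 524 KT, then P
condition <- c(rep("KN", 72), rep("KT", 524), "P")
metaData <- data.frame(condition = factor(condition))
rownames(metaData) <- colnames(dt)
dds <- DESeqDataSetFromMatrix(countData = dt, colData = metaData, design = ~ condition)
# Use TCGA normal (KN) as the reference level
dds$condition <- relevel(dds$condition, ref = "KN")
dds <- DESeq(dds)
# Compare P against KN only; KT still contributes to the fitted model
res <- results(dds, contrast = c("condition", "P", "KN"))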

The purpose of including the TCGA tumor samples (KT) is to account for tumor effects when normalizing the data: since my sample is a tumor sample, I wanted some tumor samples in the normalization. Am I doing this right? Should I include the KT samples or not? Will it affect the gene filtering?

I have also analyzed the data using edgeR and limma: I normalized the KN, KT and P samples using TMM; filtered genes based on CPM in KN and P (KT samples were not used in gene filtering); fit a limma model on the logCPM of KT, KN and P; and used a contrast between KN and P. The results match the DESeq2 analysis, but the fold changes of low-count genes go up to a thousand-fold, which is difficult to explain or interpret.
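For readers who want to follow the steps just described, here is a minimal sketch of that kind of edgeR/limma workflow. It is not the poster's actual code: the counts are simulated, the group sizes are shrunk, and the CPM cutoff, the minimum-sample rule, the `prior.count` value and the use of limma-trend are all assumptions chosen for illustration.

```r
library(edgeR)  # attaches limma as well

# Simulated stand-in data (assumption): 6 KN, 10 KT and 1 P sample.
set.seed(42)
n_genes <- 500
group <- factor(c(rep("KN", 6), rep("KT", 10), "P"),
                levels = c("KN", "KT", "P"))
counts <- matrix(rnbinom(n_genes * length(group), mu = 50, size = 5),
                 nrow = n_genes,
                 dimnames = list(paste0("gene", seq_len(n_genes)),
                                 paste0(group, "_", seq_along(group))))

# TMM normalization over all samples (KN, KT and P).
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y, method = "TMM")

# Filter on CPM using only the KN and P columns (cutoffs are assumptions).
use <- group %in% c("KN", "P")
keep <- rowSums(cpm(y)[, use] > 1) >= 2
y <- y[keep, ]

# logCPM with a prior count, which damps fold changes of low-count genes.
logcpm <- cpm(y, log = TRUE, prior.count = 2)

# Fit one coefficient per group, then extract the P versus KN contrast.
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)
fit <- lmFit(logcpm, design)
contr <- makeContrasts(PvsKN = P - KN, levels = design)
fit2 <- eBayes(contrasts.fit(fit, contr), trend = TRUE)
tab <- topTable(fit2, coef = "PvsKN", number = Inf)
head(tab)
```

With a single P sample, the residual variance for this contrast comes entirely from the KN and KT groups, so the test cannot account for variability among patient samples from the separate batch.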

Thanks in advance for all the help.

TCGA deseq2 edger limma • 1.3k views

updated 6.4 years ago by Michael Love • written 6.4 years ago by bioc.vkas

score 0 · Answer 1 · 2018-11-05

I'm actually not sure what the best approach is when you have normal, tumor, and then a single tumor sample from a separate batch. The approach above does not control for batch effects, and it sounds like you are aware of this. So you can't say with certainty which of the large effects you see are due to the tumor and which are due to technical artifacts (of which there are many in tumor RNA sequencing).

Two notes:

Depending on what version of DESeq2 you are using, the DESeq2 LFC may be moderated in the output of results() using a shrinkage method. Current DESeq2 versions (4 releases now, since 1.16) moved the moderation to a separate function lfcShrink(), but you didn't state your version number or include sessionInfo().
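As a rough sketch of the point about versions, the snippet below reports the installed DESeq2 version and compares unshrunken and shrunken LFCs. It runs on simulated data from `makeExampleDESeqDataSet()` (groups `A` and `B`), not on the poster's counts, and it assumes a DESeq2 release recent enough for `lfcShrink()` to accept a `type` argument; `type = "normal"` is picked here only because it works with a `contrast`.

```r
library(DESeq2)

# Report the installed version; sessionInfo() gives the full picture.
packageVersion("DESeq2")

# Simulated stand-in data (assumption): 500 genes, 8 samples, groups A and B.
set.seed(1)
dds <- makeExampleDESeqDataSet(n = 500, m = 8)
dds <- DESeq(dds)

# Unshrunken (maximum likelihood) log2 fold changes in current versions.
res_mle <- results(dds, contrast = c("condition", "B", "A"))

# Moderated log2 fold changes from the separate shrinkage step.
res_shrunk <- lfcShrink(dds, contrast = c("condition", "B", "A"),
                        type = "normal")

# Shrinkage pulls noisy, low-count LFCs toward zero.
summary(abs(res_mle$log2FoldChange))
summary(abs(res_shrunk$log2FoldChange))
```

For the design in the question, the analogous call would use `contrast = c("condition", "P", "KN")`.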

Another note is that KT will be used to estimate the size factors and the dispersion, even if it is not included in the comparison of P / KN.
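To illustrate the size factor part of this note, here is a small base R sketch of median-of-ratios size factors. The helper `size_factors()` is a hypothetical re-implementation written for this example, not a DESeq2 function, and the counts are simulated with made-up group sizes. It shows that the size factor assigned to the patient sample depends on which other samples are in the matrix.

```r
# Hypothetical helper: median-of-ratios size factors in base R.
size_factors <- function(counts) {
  # Per-gene log geometric mean; -Inf for genes with any zero count.
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(cnts) {
    keep <- is.finite(log_geo_means) & cnts > 0
    exp(median((log(cnts) - log_geo_means)[keep]))
  })
}

# Simulated counts (assumption): 3 KN, 3 KT and 1 P sample, 200 genes.
set.seed(1)
n_genes <- 200
sim <- function(n_samples, mu, prefix) {
  matrix(rnbinom(n_genes * n_samples, mu = mu, size = 10),
         nrow = n_genes,
         dimnames = list(NULL, paste0(prefix, seq_len(n_samples))))
}
base_mu  <- rgamma(n_genes, shape = 2, scale = 100)
tumor_mu <- base_mu * c(rep(4, 50), rep(1, 150))  # 50 genes up in tumor

counts <- cbind(sim(3, base_mu, "KN"),
                sim(3, tumor_mu, "KT"),
                sim(1, tumor_mu, "P"))

# Size factors with and without the KT samples in the matrix.
sf_all <- size_factors(counts)
sf_sub <- size_factors(counts[, c("KN1", "KN2", "KN3", "P1")])

c(with_KT = sf_all[["P1"]], without_KT = sf_sub[["P1"]])
```

The dispersion estimates are affected in a similar way, since they are fit across all samples in the object, but that is not shown here.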

I can see specialized methods that might help to deal with the fact that you are missing your single normal sample, but I can't think of these off the top of my head.