How to find the differentially expressed genes for paired tumor normal samples without any biological replicates?
1
0
Entering edit mode
seeker • 0
@seeker-7773
Last seen 2.2 years ago
Netherlands

Dear all,

I would like to make an analysis with a multilevel design, with paired samples. Ie

My design looks like 

> colData(dds) = 

  sample condition
  <factor> <factor>
sample_1 sample_1 Control
sample_2 sample_1 Tumor
sample_3 sample_2 Control
sample_4 sample_2 Tumor
sample_5 sample_3 Control
sample_6 sample_3 Tumor
sample_7 sample_4 Control
sample_8 sample_4 Tumor

> dds=DESeqDataSetFromMatrix( countData = nlDe, colData = colData, design = ~ sample+ condition)

> mcols(res, use.name = T)

DataFrame with 6 rows and 2 columns
                    

  type description  
baseMean intermediate mean of normalized counts for all samples
log2FoldChange results log2 fold change (MAP): condition Tumor vs Control
lfcSE results standard error: condition Tumor vs Control
stat results Wald statistic: condition Tumor vs Control
pvalue results Wald test p-value: condition Tumor vs Control
padj results BH adjusted p-values

> resultsNames(dds)
[1] "Intercept"        "sample_1"             "sample_2"             "sample_3"             "sample_4"             "conditionControl"
[7] "conditionTumor"   

 

I was wondering if this is the right way of doing the analysis?

 

rnaseq deseq2 • 2.1k views
ADD COMMENT
0
Entering edit mode

Only because this comes up so often: Of course you do have biological replicates: You have four patients, not just one. This counts as replications. 

ADD REPLY
0
Entering edit mode

but these are samples are from four different patients.

ADD REPLY
0
Entering edit mode

Only because this comes up so often: Of course you do have biological replicates: You have four patients, not just one. This counts as replications. 

ADD REPLY
1
Entering edit mode
@mikelove
Last seen 1 hour ago
United States
Yes, the results table you want is just: results(dds)
ADD COMMENT
0
Entering edit mode

Michael,

But when i tried  mcols(res, use.name = T), im getting 

DataFrame with 6 rows and 2 columns
                       type                                description
                <character>                                <character>
baseMean   intermediate  mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MLE): sample_4.conditionTumor
lfcSE             results         standard error: sample_4.conditionTumor
stat                results         Wald statistic: sample_4.conditionTumor
pvalue           results          Wald test p-value: sample_4.conditionTumor
padj               results                       BH adjusted p-values

ADD REPLY
0
Entering edit mode
You're skipping some lines of code which makes it difficult to help. Please post all your code and sessionInfo()
ADD REPLY
0
Entering edit mode

Please find the code below

sample2 <- read.table("read.count.txt", sep = "/t", stringsAsFactors= F, header=T)

sample2 <- sample2[!duplicated(sample2$Gene),]

comSamples <- sample2[,-1]
rownames(comSamples) <- sample2[,1]


sample_1 <- colnames(comSamples)
my_conditions <-  factor(rep(c("Control","Tumor"),length(sample_1)/2))
dex <-  factor(rep((1:4),each =2))
sample <- paste("sample",dex,sep ="_")

colData <- data.frame(sample = sample,condition=my_conditions,row.names= sample_1)

colData(dds) = 

  sample condition
  <factor> <factor>
sample_1 sample_1 Control
sample_2 sample_1 Tumor
sample_3 sample_2 Control
sample_4 sample_2 Tumor
sample_5 sample_3 Control
sample_6 sample_3 Tumor
sample_7 sample_4 Control
sample_8 sample_4 Tumor

  thres= 8
  nzIndex= as.vector(which(apply(comSamples,1,function(x){sum(x>thres)/length(x)})>=0.5))
  nlDe = comSamples[nzIndex,] 

dds=DESeqDataSetFromMatrix( countData = nlDe, colData = colData, design = ~sample+condition)
dds$condition <- relevel(dds$condition, "Control")
dds <- DESeq(dds)
res  <-  results(dds)

mcols(res, use.names=T) = 
DataFrame with 6 rows and 2 columns
                       type                                        description
                <character>                                        <character>
baseMean       intermediate          mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MAP): condition Tumor vs Control
lfcSE               results         standard error: condition Tumor vs Control
stat                results         Wald statistic: condition Tumor vs Control
pvalue              results      Wald test p-value: condition Tumor vs Control
padj                results                               BH adjusted p-values


  
   write.csv(as.data.frame(mcols(res, use.name = T)),file = "./output/DATE-DESeq2-test-conditions.csv")
  
plotMA(dds, ylim=c(-8,8),main = "RNAseq experiment")

result.table <- as.data.frame(res)
sig_gene <- row.names(result.table)[which(abs(result.table$log2FoldChange) >1)]
sig_gene_table <- result.table[which(abs(result.table$log2FoldChange) >1),]

sig_gene_table <- sig_gene_table[!is.na(sig_gene_table$pvalue),]
sig_gene_table <- sig_gene_table[sig_gene_table$padj < 0.05,]
  
resdata <- merge(sig_gene_table, as.data.frame(counts(dds,normalized=T)), by='row.names',sort=F)

sessionInfo()

R version 3.2.1 (2015-06-18)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.11.1 (unknown)

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DESeq2_1.10.0              RcppArmadillo_0.6.200.2.0  Rcpp_0.12.2                SummarizedExperiment_1.0.1
 [5] Biobase_2.30.0             GenomicRanges_1.22.1       GenomeInfoDb_1.6.1         IRanges_2.4.4             
 [9] S4Vectors_0.8.3            BiocGenerics_0.16.1        biomaRt_2.26.1            

loaded via a namespace (and not attached):
 [1] RColorBrewer_1.1-2   futile.logger_1.4.1  plyr_1.8.3           XVector_0.10.0       bitops_1.0-6         futile.options_1.0.0
 [7] tools_3.2.1          zlibbioc_1.16.0      rpart_4.1-10         digest_0.6.8         annotate_1.48.0      lattice_0.20-33     
[13] RSQLite_1.0.0        gtable_0.1.2         DBI_0.3.1            proto_0.3-10         gridExtra_2.0.0      genefilter_1.52.0   
[19] cluster_2.0.3        stringr_1.0.0        locfit_1.5-9.1       nnet_7.3-11          grid_3.2.1           AnnotationDbi_1.32.0
[25] survival_2.38-3      XML_3.98-1.3         BiocParallel_1.4.0   foreign_0.8-66       latticeExtra_0.6-26  Formula_1.2-1       
[31] geneplotter_1.48.0   ggplot2_1.0.1        reshape2_1.4.1       lambda.r_1.1.7       magrittr_1.5         splines_3.2.1       
[37] scales_0.3.0         Hmisc_3.17-0         MASS_7.3-45          xtable_1.8-0         colorspace_1.2-6     stringi_1.0-1       
[43] acepack_1.3-3.3      RCurl_1.95-4.7       munsell_0.4.2       

ADD REPLY
0
Entering edit mode
Yes, this is giving you the comparison you want. See: "log2 fold change (MAP): condition Tumor vs Control"
ADD REPLY
0
Entering edit mode

The reason why i was worried about my design is because, if i look at the normalized read count  (rld) of tumor/control in Fgfr1 gene, there is average fold change on 1.5.

Sample_1_Cont    Sample_1_Tum    Sample_2_Cont    Sample_2_Tumor    Sample_3_Cont    Sample_3_Tumor    Sample_4_Cont    Sample_4_Tumor
1.61928676       14.44964664        247.8401588        390.4711465        907.34300164    113.9341785        264.5305537        714.9689033

But looking at the result table log2FoldChange is 0.66. 

Gene baseMean log2FoldChange lfcSE stat pvalue padj
Fgfr1 223.1446095 0.668350599 0.173459896 3.853055467 0.000116653 0.006547737

 

ADD REPLY
0
Entering edit mode

You should be averaging in the log2 scale. While the first sample has a large, positive log2 fold change, it is not so large for the other samples, and large in the negative direction for sample 3.

In addition, the log2 fold change provided by DESeq2 is the maximum posterior estimate, so it's not easily calculated from normalized counts. Take a look at the DESeq2 paper: http://www.genomebiology.com/2014/15/12/550 

ADD REPLY
0
Entering edit mode

Michael,

But when i tried  mcols(res, use.name = T), im getting 

DataFrame with 6 rows and 2 columns
                       type                                description
                <character>                                <character>
baseMean   intermediate  mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MLE): sample_4.conditionTumor
lfcSE             results         standard error: sample_4.conditionTumor
stat                results         Wald statistic: sample_4.conditionTumor
pvalue           results          Wald test p-value: sample_4.conditionTumor
padj               results                       BH adjusted p-values

ADD REPLY
0
Entering edit mode

Michael,

But when i tried  mcols(res, use.name = T), im getting 

DataFrame with 6 rows and 2 columns
                       type                                description
                <character>                                <character>
baseMean   intermediate  mean of normalized counts for all samples
log2FoldChange      results log2 fold change (MLE): sample_4.conditionTumor
lfcSE             results         standard error: sample_4.conditionTumor
stat                results         Wald statistic: sample_4.conditionTumor
pvalue           results          Wald test p-value: sample_4.conditionTumor
padj               results                       BH adjusted p-values

ADD REPLY

Login before adding your answer.

Traffic: 947 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6