Question

FW: DESeq analysis

0

Entering edit mode

Fatemehsadat Seyednasrollah ▴ 260

@fatemehsadat-seyednasrollah-5367

Last seen 9.6 years ago

________________________________________ From: Fatemehsadat Seyednasrollah Sent: Friday, June 29, 2012 12:00 PM To: narges [guest] Subject: RE: DESeq analysis Hi, First thanks a lot for your answer. Actually I have used a subset of a public data from Bowtie(the Montgomery) and below are the reduced codes of my work both from edgeR and DEseq. I wanted to know have I done something wrong to obtain very different answers ( 85 from DESeq and 407 from edgeR) or it is natural to have this hude difference and it is related to the algorithms? edgeR: > g1 <- read.delim ("count1.txt", row.names = 1) > head(g1) NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F ENSG00000000003 0 0 0 0 1 0 0 ENSG00000000005 0 0 0 0 0 0 0 ENSG00000000419 10 24 19 20 19 8 14 ENSG00000000457 17 15 13 18 21 18 21 ENSG00000000460 2 3 5 2 4 6 8 ENSG00000000938 20 4 35 16 10 17 19 NA10847F ENSG00000000003 0 ENSG00000000005 0 ENSG00000000419 6 ENSG00000000457 15 ENSG00000000460 2 ENSG00000000938 9 > group <- factor(rep(c("Male", "Female"), each= 4)) > dge <- DGEList(counts = g1 , group = group ) Calculating library sizes from column totals. > dge <- calcNormFactors(dge) > dge <- estimateCommonDisp(dge) > sqrt (dge$common.dispersion) [1] 0.3858996 > test <- exactTest(dge) > head(test$table) logFC logCPM PValue ENSG00000000003 -2.3441897 -3.042057 1.0000000 ENSG00000000005 0.0000000 -Inf 1.0000000 ENSG00000000419 0.5777309 3.850993 0.2791539 ENSG00000000457 -0.3054489 4.080866 0.5592668 ENSG00000000460 -0.7792622 1.966865 0.3274528 ENSG00000000938 0.3909100 3.997866 0.4269672 > sum (test$table$PValue <0.01) [1] 407 DESeq: > g1 <- read.table("count1.txt", header = TRUE, row.names = 1) > conds <- factor(rep(c("Male", "Female"), each= 4)) > dataPack <- data.frame(row.names = colnames(g1), condition =rep( c("Male", "Female"), each= 4)) > dataPack condition NA06994M Male NA07051M Male NA07347M Male NA07357M Male NA07000F Female NA07037F Female NA07346F Female NA10847F Female > cds <- newCountDataSet(g1, conds) > head(cds) CountDataSet (storageMode: environment) assayData: 1 features, 8 samples element names: counts protocolData: none phenoData sampleNames: NA06994M NA07051M ... NA10847F (8 total) varLabels: sizeFactor condition varMetadata: labelDescription featureData: none experimentData: use 'experimentData(object)' Annotation: > head(counts(cds) + ) NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F ENSG00000000003 0 0 0 0 1 0 0 ENSG00000000005 0 0 0 0 0 0 0 ENSG00000000419 10 24 19 20 19 8 14 ENSG00000000457 17 15 13 18 21 18 21 ENSG00000000460 2 3 5 2 4 6 8 ENSG00000000938 20 4 35 16 10 17 19 NA10847F ENSG00000000003 0 ENSG00000000005 0 ENSG00000000419 6 ENSG00000000457 15 ENSG00000000460 2 ENSG00000000938 9 > cds <- estimateSizeFactors(cds) > sizeFactors(cds) NA06994M NA07051M NA07347M NA07357M NA07000F NA07037F NA07346F NA10847F 0.8599841 1.1102643 1.0869086 1.1157556 1.1056726 1.0666049 0.9152017 0.9402086 > head(counts(cds, normalized= TRUE)) > cds <- estimateDispersions(cds) > result <- nbinomTest(cds, "Male", "Female") > nrow(subset(result, result$pval <0.01)) [1] 85 Again thank you so much With Best Regards, Narges________________________________________ From: narges [guest] [guest@bioconductor.org] Sent: Tuesday, June 26, 2012 7:17 PM To: bioconductor at r-project.org; Fatemehsadat Seyednasrollah Subject: DESeq analysis Hi all I am doing some RNA seq analysis with DESeq. I have applied the nbinomTest to my dataset which I know have many differentially expressed genes but the first problem is that the result values for "padj"column is almost NA and sometimes 1. and when I want to have a splice from my fata frame the result is not meaningful for me. -- output of sessionInfo(): res <- nbinomTest(cds, "Male", "Female") > head(res) id baseMean baseMeanA baseMeanB foldChange log2FoldChange 1 ENSG00000000003 0.1130534 0.000000 0.2261067 Inf Inf 2 ENSG00000000005 0.0000000 0.000000 0.0000000 NaN NaN 3 ENSG00000000419 14.3767155 17.162610 11.5908205 0.6753530 -0.5662863 4 ENSG00000000457 17.0174761 15.342800 18.6921526 1.2183013 0.2848710 5 ENSG00000000460 3.9414822 2.855099 5.0278659 1.7610131 0.8164056 6 ENSG00000000938 16.0894945 18.350117 13.8288718 0.7536122 -0.4081058 pval padj 1 0.9959638 1 2 NA NA 3 0.3208560 1 4 0.5942512 1 5 0.4840607 1 6 0.5409953 1 > res1 <- res[res$padj<0.1,] > head(res1) id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj NA <na> NA NA NA NA NA NA NA NA.1 <na> NA NA NA NA NA NA NA NA.2 <na> NA NA NA NA NA NA NA NA.3 <na> NA NA NA NA NA NA NA NA.4 <na> NA NA NA NA NA NA NA NA.5 <na> NA NA NA NA NA NA NA my first question is that why although I know there are some differentially expressed genes in the my data, all the padj values are NA or 1 and the second question is this "NA.1" , "NA.2", ..... which are emerged as the first column of object "res1"instead of name of genes Thank you so much Regards -- Sent via the guest posting facility at bioconductor.org.

edgeR DESeq edgeR DESeq • 946 views

ADD COMMENT • link 11.8 years ago Fatemehsadat Seyednasrollah ▴ 260