DESeq with no replicates
1
0
Entering edit mode
Wu, Xiwei ▴ 350
@wu-xiwei-1102
Last seen 9.6 years ago
Dear list, I have a smallRNA sequencing data from two samples, without replicates. I tried using DESeq to identify differentially expressed gene following the manual, and the code is as below: > library(DESeq) > conds <- c("NSC", "TSC") > head(combined_data) NSC TSC hsa-let-7a 906684 413336 hsa-let-7a* 2 1281 hsa-let-7a-2* 51 132 hsa-let-7b 179751 63079 hsa-let-7b* 46 684 hsa-let-7c 380745 50729 > apply(combined_data, 2, sum) NSC TSC 5018095 8658429 > cds <- newCountDataSet(combined_data, conds) > cds <- estimateSizeFactors( cds ) > sizeFactors(cds) NSC TSC 1 1 > cds <- estimateVarianceFunctions(cds, pool=T) > res <- nbinomTest(cds, "NSC", "TSC") > head(res[order(res$pval),]) id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj resVarA resVarB 397 hsa-miR-299-5p 329.5 329 330 1.0030395 0.004378441 0.002408825 0.9236618 NA NA 131 hsa-miR-1285 407.0 406 408 1.0049261 0.007089425 0.002922980 0.9236618 NA NA 196 hsa-miR-146a 170.5 170 171 1.0058824 0.008461579 0.004641615 0.9696484 NA NA 525 hsa-miR-425* 180.5 178 183 1.0280899 0.039966407 0.008776619 0.9696484 NA NA 826 hsa-miR-675* 714.0 722 706 0.9778393 -0.032330654 0.008892177 0.9696484 NA NA 505 hsa-miR-379* 260.5 256 265 1.0351562 0.049848549 0.012192131 0.9696484 NA NA > sessionInfo() R version 2.11.1 (2010-05-31) i386-pc-mingw32 locale: [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C [5] LC_TIME=English_United States.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] preprocessCore_1.10.0 limma_3.4.5 DESeq_1.0.6 locfit_1.5-6 lattice_0.18-8 [6] akima_0.5-4 Biobase_2.8.0 loaded via a namespace (and not attached): [1] annotate_1.26.1 AnnotationDbi_1.10.2 DBI_0.2-5 genefilter_1.30.0 geneplotter_1.26.0 [6] grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-2 splines_2.11.1 survival_2.35-8 [11] tools_2.11.1 xtable_1.5-6 I have a few questions here: 1. The sizeFactors were estimated as 1, but the actual total number of reads in the two samples differ 60%, could it be correct? 2. It seems that the identified top candidates with the smallest p value are all with very small fold changes. I have seen huge difference between the two samples for some miRNA, but the P value was close to 1; 3. The resVarA and resVarB are NA for all genes, did I do something wrong when calling the estimateVarianceFunctions? 4. For a RNA or small RNA sequencing experiment with only two samples, what is the best way to normalize data? Any help will be greatly appreciated. Xiwei Wu, Ph.D. Bioinformatics Core Assistant Research Professor Department of Molecular Medicine Beckman Research Institute City of Hope x.65071 --------------------------------------------------------------------- SECURITY/CONFIDENTIALITY WARNING: \ This message and ...{{dropped:30}}
Sequencing miRNA SmallRNA DESeq Sequencing miRNA SmallRNA DESeq • 2.0k views
ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 3 months ago
EMBL European Molecular Biology Laborat…
Dear Xiwei can you try with a more recent version of DESeq, e.g. version 1.2.1 from Bioconductor 2.7 ? Just to keep the discussion more relevant to all of us. 1.+4.: The size factors you report do indeed look peculiar. I don't think that this has anything to do with the DESeq version though - one possible explanation could be that combined_data matrix contains lots of 'trivial' rows (e.g. all 0 or all 1). The default size factor estimation is not based on the sums, but on the median of the ratios between the columns. In most cases this makes sense, but you can of course override it and supply your own size factors if you have a good reason. What do you get from plot(jitter(combined_data)) or alpha=0.5; plot(jitter(combined_data^alpha)) ? One thing I would try is to inspect the matrix combined_data more closely, and if there are many trivial rows, then to filter out those rows first. 3. According to the manual page of nbinomTest, "resVarA is the ratio of the row-wise estimate of the base variance of the counts for condition A, divided by ..." Since you have no replicates in condition A (and similarly for B), that number is NA. 2. I don't know what is going on... Note that 'padj' is close to one. Still, in order to move forward, can you please retry with the above suggestions, and if doubts remain, send us the combined_data matrix to inspect the problem more directly? Best wishes Wolfgang Il Dec/16/10 8:55 PM, Wu, Xiwei ha scritto: > Dear list, > > I have a smallRNA sequencing data from two samples, without replicates. > I tried using DESeq to identify differentially expressed gene following > the manual, and the code is as below: > >> library(DESeq) >> conds<- c("NSC", "TSC") >> head(combined_data) > NSC TSC > hsa-let-7a 906684 413336 > hsa-let-7a* 2 1281 > hsa-let-7a-2* 51 132 > hsa-let-7b 179751 63079 > hsa-let-7b* 46 684 > hsa-let-7c 380745 50729 > >> apply(combined_data, 2, sum) > NSC TSC > 5018095 8658429 > >> cds<- newCountDataSet(combined_data, conds) >> cds<- estimateSizeFactors( cds ) >> sizeFactors(cds) > NSC TSC > 1 1 > >> cds<- estimateVarianceFunctions(cds, pool=T) >> res<- nbinomTest(cds, "NSC", "TSC") > >> head(res[order(res$pval),]) > > id baseMean baseMeanA baseMeanB foldChange > log2FoldChange pval padj resVarA resVarB > 397 hsa-miR-299-5p 329.5 329 330 1.0030395 > 0.004378441 0.002408825 0.9236618 NA NA > 131 hsa-miR-1285 407.0 406 408 1.0049261 > 0.007089425 0.002922980 0.9236618 NA NA > 196 hsa-miR-146a 170.5 170 171 1.0058824 > 0.008461579 0.004641615 0.9696484 NA NA > 525 hsa-miR-425* 180.5 178 183 1.0280899 > 0.039966407 0.008776619 0.9696484 NA NA > 826 hsa-miR-675* 714.0 722 706 0.9778393 > -0.032330654 0.008892177 0.9696484 NA NA > 505 hsa-miR-379* 260.5 256 265 1.0351562 > 0.049848549 0.012192131 0.9696484 NA NA > >> sessionInfo() > R version 2.11.1 (2010-05-31) > i386-pc-mingw32 > > locale: > [1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United > States.1252 > [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C > > [5] LC_TIME=English_United States.1252 > > attached base packages: > [1] stats graphics grDevices utils datasets methods base > > > other attached packages: > [1] preprocessCore_1.10.0 limma_3.4.5 DESeq_1.0.6 > locfit_1.5-6 lattice_0.18-8 > [6] akima_0.5-4 Biobase_2.8.0 > > loaded via a namespace (and not attached): > [1] annotate_1.26.1 AnnotationDbi_1.10.2 DBI_0.2-5 > genefilter_1.30.0 geneplotter_1.26.0 > [6] grid_2.11.1 RColorBrewer_1.0-2 RSQLite_0.9-2 > splines_2.11.1 survival_2.35-8 > [11] tools_2.11.1 xtable_1.5-6 > > > > I have a few questions here: > > 1. The sizeFactors were estimated as 1, but the actual total number of > reads in the two samples differ 60%, could it be correct? > > 2. It seems that the identified top candidates with the smallest p value > are all with very small fold changes. I have seen huge difference > between the two samples for some miRNA, but the P value was close to 1; > > 3. The resVarA and resVarB are NA for all genes, did I do something > wrong when calling the estimateVarianceFunctions? > > 4. For a RNA or small RNA sequencing experiment with only two samples, > what is the best way to normalize data? > > Any help will be greatly appreciated. > > Xiwei Wu, Ph.D. > Bioinformatics Core > Assistant Research Professor > Department of Molecular Medicine > Beckman Research Institute > City of Hope > x.65071 > > > --------------------------------------------------------------------- > SECURITY/CONFIDENTIALITY WARNING: \ This message and ...{{dropped:30}} > > _______________________________________________ > Bioconductor mailing list > Bioconductor at r-project.org > https://stat.ethz.ch/mailman/listinfo/bioconductor > Search the archives: http://news.gmane.org/gmane.science.biology.informatics.conductor -- Wolfgang Huber EMBL http://www.embl.de/research/units/genome_biology/huber
ADD COMMENT

Login before adding your answer.

Traffic: 616 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6