Deseq2 number of fragments use for differential expression
2
0
Entering edit mode
aristotele_m ▴ 40
@aristotele_m-6821
Last seen 7.6 years ago
Italy

Dear All!

I have align  a total stranded RNA using STAR and count the gene level using htseq count.

All the sample on RNASeqQC have more then 60ML of total reads and more than 40 ML of uniq reads,

When I import on DESeq2 object and I try to see the number of element I found this:

colSums(assay(rld)

colSums.assay.rld...1e.06
A                     0.114
B                     0.114
C                     0.115
D                     0.114
E                     0.114
F                     0.115
G                     0.114
H                     0.116
I                     0.115

Where I can found all my other reads?

Thanks for any help!

 

 

sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.7          genefilter_1.50.0       ggplot2_1.0.1           limma_3.24.14           RColorBrewer_1.1-2     
 [6] gplots_2.17.0           org.Hs.eg.db_3.1.2      RSQLite_1.0.0           DBI_0.3.1               annotate_1.46.1        
[11] XML_3.98-1.3            AnnotationDbi_1.30.1    Biobase_2.28.0          biomaRt_2.24.0          DESeq2_1.8.1           
[16] RcppArmadillo_0.5.300.4 Rcpp_0.12.0             GenomicRanges_1.20.5    GenomeInfoDb_1.4.1      IRanges_2.2.5          
[21] S4Vectors_0.6.3         BiocGenerics_0.14.0    

loaded via a namespace (and not attached):
 [1] reshape2_1.4.1       cluster_2.0.3        magrittr_1.5         acepack_1.3-3.3      gtable_0.1.2         RCurl_1.95-4.7      
 [7] XVector_0.8.0        stringr_1.0.0        lambda.r_1.1.7       splines_3.2.1        Formula_1.2-1        lattice_0.20-33     
[13] survival_2.38-3      KernSmooth_2.23-15   plyr_1.8.3           locfit_1.5-9.1       futile.options_1.0.0 gridExtra_2.0.0     
[19] digest_0.6.8         colorspace_1.2-6     futile.logger_1.4.1  proto_0.3-10         stringi_0.5-5        geneplotter_1.46.0  
[25] Hmisc_3.16-0         labeling_0.3         gdata_2.17.0         rpart_4.1-10         BiocParallel_1.2.9   xtable_1.7-4        
[31] munsell_0.4.2        MASS_7.3-43          caTools_1.17.1       gtools_3.5.0         latticeExtra_0.6-26  tools_3.2.1         
[37] foreign_0.8-65       bitops_1.0-6         nnet_7.3-10          scales_0.2.5         grid_3.2.1          
>
deseq2 start • 1.8k views
ADD COMMENT
0
Entering edit mode
@steve-lianoglou-2771
Last seen 22 months ago
United States

The 1e.06 bit at the end of the column name that comes from colSums(assay(rld)) makes me suspicious.

Is rld a DESeqDataSet of read counts?

Can you provide the output of assay(rld)[1:10,], please?

 

ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

rld is the name in the vignette for the rlog transformed values. So colSums(assay(rld)) has nothing to do with total read count. 

Try:

colSums(counts(dds))

If this is not what you expected, you should go back to the htseq-count and figure out if everything there is correct. To get useful feedback here, you will need to provide all your htseq code and your R code.

ADD COMMENT
0
Entering edit mode

Thanks so much!!!I use this code:

mkdir -p  Count  && mkdir -p tmp && ~/usr/local/bin/samtools sort -no ALIGN/120/Aligned.sortedByCoord.out.bam tmp/120 |samtools view  - | htseq-count  --mode=intersection-nonempty --stranded=reverse --type=exon --idattr=gene_id - Trasc_GH37_74/Homo_sapiens.GRCh37.74.gtf  > Count/120_counts.txt

ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=SampleTable,directory="Count/", design= ~1) # No design
dds <- estimateSizeFactors(ddHTSeq)
#filtro 10 conti there are less than 3 samples with counts greater than or equal to 10. Given that the inference i
idx <- rowSums( counts(dds) >= 10 ) >= 3

dds <- dds[idx,]
dds <- DESeq(dds)
rld = rlog(dds)

If I use counts() Improve the numbers but not so much:

SampleA
SampleB
sampleC
SampleD
SampleE
SampleF
SampleH
9.531
13.109
14.5
12.98
5.221
0.741
5.8
ADD REPLY
0
Entering edit mode

colSums(counts(dds)) will never give rounded numbers. Is that millions of reads? Can you please post the code you are using to make those numbers?

Also, are you certain that the strandedness is reverse and not unstranded? If you were wrong about that, it would result in half the number of expected reads. The best way to check this in my opinion is by eye with IGV:

https://www.broadinstitute.org/igv/

ADD REPLY
0
Entering edit mode


 

I use this code:

colSums(counts(dds))/1e6

 

I use the Total starnde from illumina. How can verify by eye how appear on IGV? I mean how appear the reverse stranded  library?

Thanks so much!!

 

ADD REPLY
0
Entering edit mode

I'm sorry to say, but it's rather surprising how astonishingly difficult you have made it to answer your question and provide any help.

Take a minute to reflect on the content of the original question you posted. You said that you ran colSums(assay(rld) [note you're missing a closing paren] and claimed that it provided this output:

colSums.assay.rld...1e.06
A                     0.114
B                     0.114
C                     0.115
D                     0.114
E                     0.114
F                     0.115
G                     0.114
H                     0.116
I                     0.115

And what code you actually ran to get that output. In the future, please paste in the exact code you ran, with the exact output that is confusing you, so that we can get to an answer more quickly

Anyhow -- to diagnose the strandedness of your library, you can (minimally) get IGV to color your alignments by the strand, but you can also see the orientation of the alignments by looking at which way the read is "pointing" in the browser.

The Viewing Alignments page from the IGV documentation would be a good start to learn more.

 

ADD REPLY

Login before adding your answer.

Traffic: 452 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6