Question: Deseq2 number of fragments use for differential expression
0
gravatar for aristotele_m
4.2 years ago by
aristotele_m30
Italy
aristotele_m30 wrote:

Dear All!

I have align  a total stranded RNA using STAR and count the gene level using htseq count.

All the sample on RNASeqQC have more then 60ML of total reads and more than 40 ML of uniq reads,

When I import on DESeq2 object and I try to see the number of element I found this:

colSums(assay(rld)

colSums.assay.rld...1e.06
A                     0.114
B                     0.114
C                     0.115
D                     0.114
E                     0.114
F                     0.115
G                     0.114
H                     0.116
I                     0.115

Where I can found all my other reads?

Thanks for any help!

 

 

sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: i686-pc-linux-gnu (32-bit)
Running under: Ubuntu 14.04.2 LTS

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] pheatmap_1.0.7          genefilter_1.50.0       ggplot2_1.0.1           limma_3.24.14           RColorBrewer_1.1-2     
 [6] gplots_2.17.0           org.Hs.eg.db_3.1.2      RSQLite_1.0.0           DBI_0.3.1               annotate_1.46.1        
[11] XML_3.98-1.3            AnnotationDbi_1.30.1    Biobase_2.28.0          biomaRt_2.24.0          DESeq2_1.8.1           
[16] RcppArmadillo_0.5.300.4 Rcpp_0.12.0             GenomicRanges_1.20.5    GenomeInfoDb_1.4.1      IRanges_2.2.5          
[21] S4Vectors_0.6.3         BiocGenerics_0.14.0    

loaded via a namespace (and not attached):
 [1] reshape2_1.4.1       cluster_2.0.3        magrittr_1.5         acepack_1.3-3.3      gtable_0.1.2         RCurl_1.95-4.7      
 [7] XVector_0.8.0        stringr_1.0.0        lambda.r_1.1.7       splines_3.2.1        Formula_1.2-1        lattice_0.20-33     
[13] survival_2.38-3      KernSmooth_2.23-15   plyr_1.8.3           locfit_1.5-9.1       futile.options_1.0.0 gridExtra_2.0.0     
[19] digest_0.6.8         colorspace_1.2-6     futile.logger_1.4.1  proto_0.3-10         stringi_0.5-5        geneplotter_1.46.0  
[25] Hmisc_3.16-0         labeling_0.3         gdata_2.17.0         rpart_4.1-10         BiocParallel_1.2.9   xtable_1.7-4        
[31] munsell_0.4.2        MASS_7.3-43          caTools_1.17.1       gtools_3.5.0         latticeExtra_0.6-26  tools_3.2.1         
[37] foreign_0.8-65       bitops_1.0-6         nnet_7.3-10          scales_0.2.5         grid_3.2.1          
>
deseq2 start • 787 views
ADD COMMENTlink modified 4.2 years ago by Michael Love25k • written 4.2 years ago by aristotele_m30
Answer: Deseq2 number of fragments use for differential expression
0
gravatar for Steve Lianoglou
4.2 years ago by
Denali
Steve Lianoglou12k wrote:

The 1e.06 bit at the end of the column name that comes from colSums(assay(rld)) makes me suspicious.

Is rld a DESeqDataSet of read counts?

Can you provide the output of assay(rld)[1:10,], please?

 

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Steve Lianoglou12k
Answer: Deseq2 number of fragments use for differential expression
0
gravatar for Michael Love
4.2 years ago by
Michael Love25k
United States
Michael Love25k wrote:

rld is the name in the vignette for the rlog transformed values. So colSums(assay(rld)) has nothing to do with total read count. 

Try:

colSums(counts(dds))

If this is not what you expected, you should go back to the htseq-count and figure out if everything there is correct. To get useful feedback here, you will need to provide all your htseq code and your R code.

ADD COMMENTlink modified 4.2 years ago • written 4.2 years ago by Michael Love25k

Thanks so much!!!I use this code:

mkdir -p  Count  && mkdir -p tmp && ~/usr/local/bin/samtools sort -no ALIGN/120/Aligned.sortedByCoord.out.bam tmp/120 |samtools view  - | htseq-count  --mode=intersection-nonempty --stranded=reverse --type=exon --idattr=gene_id - Trasc_GH37_74/Homo_sapiens.GRCh37.74.gtf  > Count/120_counts.txt

ddHTSeq <- DESeqDataSetFromHTSeqCount(sampleTable=SampleTable,directory="Count/", design= ~1) # No design
dds <- estimateSizeFactors(ddHTSeq)
#filtro 10 conti there are less than 3 samples with counts greater than or equal to 10. Given that the inference i
idx <- rowSums( counts(dds) >= 10 ) >= 3

dds <- dds[idx,]
dds <- DESeq(dds)
rld = rlog(dds)

If I use counts() Improve the numbers but not so much:

SampleA
SampleB
sampleC
SampleD
SampleE
SampleF
SampleH
9.531
13.109
14.5
12.98
5.221
0.741
5.8
ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by aristotele_m30

colSums(counts(dds)) will never give rounded numbers. Is that millions of reads? Can you please post the code you are using to make those numbers?

Also, are you certain that the strandedness is reverse and not unstranded? If you were wrong about that, it would result in half the number of expected reads. The best way to check this in my opinion is by eye with IGV:

https://www.broadinstitute.org/igv/

ADD REPLYlink written 4.2 years ago by Michael Love25k


 

I use this code:

colSums(counts(dds))/1e6

 

I use the Total starnde from illumina. How can verify by eye how appear on IGV? I mean how appear the reverse stranded  library?

Thanks so much!!

 

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by aristotele_m30

I'm sorry to say, but it's rather surprising how astonishingly difficult you have made it to answer your question and provide any help.

Take a minute to reflect on the content of the original question you posted. You said that you ran colSums(assay(rld) [note you're missing a closing paren] and claimed that it provided this output:

colSums.assay.rld...1e.06
A                     0.114
B                     0.114
C                     0.115
D                     0.114
E                     0.114
F                     0.115
G                     0.114
H                     0.116
I                     0.115

And what code you actually ran to get that output. In the future, please paste in the exact code you ran, with the exact output that is confusing you, so that we can get to an answer more quickly

Anyhow -- to diagnose the strandedness of your library, you can (minimally) get IGV to color your alignments by the strand, but you can also see the orientation of the alignments by looking at which way the read is "pointing" in the browser.

The Viewing Alignments page from the IGV documentation would be a good start to learn more.

 

ADD REPLYlink written 4.2 years ago by Steve Lianoglou12k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 156 users visited in the last hour