DESeq2 filter out my expression gene
4
0
Entering edit mode
@jarod_v6liberoit-6654
Last seen 2.6 years ago
Italy

Hi!

I have  a set of experiments where I try to detect the differential expression genes between the over-expression of a particular  fusion genes. One  of the gene of the chimera are differential express the other no. This is strange because we have some evidence on the wet lab,

Another think is I found  only few differential expression genes. This is strange.

 

out of 15361 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 21, 0.14%
LFC < 0 (down)   : 3, 0.02%
outliers [1]     : 14, 0.091%
low counts [2]   : 0, 0%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

 

in some comparison I obtain only this:

 

out of 15530 with nonzero total read count
adjusted p-value < 0.1
LFC > 0 (up)     : 2, 0.013%
LFC < 0 (down)   : 0, 0%
outliers [1]     : 100, 0.64%
low counts [2]   : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

 

I know one of the gene of the chimera are probably filter out as outliers for this reason:

 

ENSG ...   1991.4851208834 2.2929902325 0.3341579938 6.8619942512 NA NA

Even if I use  this comand:

 

dds <- DESeq(dds,minReplicatesForReplace=Inf)

I obtain the same problem.

 

What Can I do?

R version 3.2.3 (2015-12-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] edgeR_3.12.0               pheatmap_1.0.8             genefilter_1.52.1          ggplot2_2.0.0             
 [5] limma_3.26.7               RColorBrewer_1.1-2         gplots_2.17.0              org.Hs.eg.db_3.2.3        
 [9] RSQLite_1.0.0              DBI_0.3.1                  annotate_1.48.0            XML_3.98-1.3              
[13] AnnotationDbi_1.32.3       DESeq2_1.10.1              RcppArmadillo_0.6.500.4.0  Rcpp_0.12.3               
[17] SummarizedExperiment_1.0.2 Biobase_2.30.0             GenomicRanges_1.22.4       GenomeInfoDb_1.6.3        
[21] IRanges_2.4.6              S4Vectors_0.8.11           BiocGenerics_0.16.1        biomaRt_2.26.1            

loaded via a namespace (and not attached):
 [1] statmod_1.4.24       gtools_3.5.0         locfit_1.5-9.1       splines_3.2.3        lattice_0.20-33     
 [6] colorspace_1.2-6     yaml_2.1.13          survival_2.38-3      foreign_0.8-66       BiocParallel_1.4.3  
[11] lambda.r_1.1.7       plyr_1.8.3           zlibbioc_1.16.0      munsell_0.4.2        gtable_0.1.2        
[16] futile.logger_1.4.1  caTools_1.17.1       labeling_0.3         latticeExtra_0.6-26  geneplotter_1.48.0  
[21] acepack_1.3-3.3      KernSmooth_2.23-15   xtable_1.8-0         scales_0.3.0         gdata_2.17.0        
[26] Hmisc_3.17-1         XVector_0.10.0       gridExtra_2.0.0      digest_0.6.9         grid_3.2.3          
[31] tools_3.2.3          bitops_1.0-6         RCurl_1.95-4.7       Formula_1.2-1        cluster_2.0.3       
[36] futile.options_1.0.0 rpart_4.1-10         nnet_7.3-12   
deseq2 • 2.0k views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 15 hours ago
United States
Can you show an MA plot as well? I would suggest trying betaPrior=FALSE here as the prior assumption may not hold.
ADD COMMENT
1
Entering edit mode
Everything looks fine in your MA plot. Most genes in your experiment have small fold change. As your sample size increases you could detect more differences as statistically significant, but with 3 vs 6 you only detect the very strongly induced genes, of which there are few.
ADD REPLY
0
Entering edit mode

Here you have   plotMA. (http://imgur.com/a/d2KwN)

I use also collectRNaseq metrics of picards tools. Do not seem to have strange situation.

What do you suggest?

 

 

 

ADD REPLY
1
Entering edit mode
It helps to widen the ylim to see what's going on. You should be fine with betaPrior=FALSE and cooksCutoff=FALSE.
ADD REPLY
0
Entering edit mode

using this  condition I obtain the  genes as differential expression but I have however only 21 up and 3 down.

thanks so much!

 

ADD REPLY
0
Entering edit mode
@steve-lianoglou-2771
Last seen 3 days ago
Denali

First thing is to see if you can convince yourself that the data tells you that your gene of interest is differentially expressed.

If you use the plotCounts function from DESeq2 to plot the expression of your gene of interest, does the data support your intuition?

Have you looked PCA or MDS plots of your data to how strong your treatment effect is? From the results you show, it seems like the effect is rather minimal.

How many replicates do you have? When effect sizes are minimal, higher replication can give you more power to detect differential expression.

You say you have "some evidence in the wet lab". Presumably you've done some qPCR on a panel of genes to look at the effect of over-expression of your fusion gene on a handful of transcripts? Does your RNA-seq show similar changes in expression as the evidence you are referring to across the same genes?

ADD COMMENT
0
Entering edit mode
@wolfgang-huber-3550
Last seen 7 weeks ago
EMBL European Molecular Biology Laborat…

I suggest doing quality assessment / control to see whether all of the libraries are good. Garbage in → garbage out.

The DESeq2 vignette is one place to find suggestions for quality related plots, but also consider statistics such as number of reads, number of uniquely aligned reads, etc.

ADD COMMENT
0
Entering edit mode
@jarod_v6liberoit-6654
Last seen 2.6 years ago
Italy

Thanks so much for the suggestions:

So I have 3 replicated for condition and  6 controls. In the ctrl I have no expression of my fusion gene or over-expression of  GFP. The chimera are composed by both gene. One is differentialy expressed the other are not reported. Programs for the detection of fusion like FusionCatcher   are able to detect the fusion

For the evidence of wet lab I have qpcr and western blot of the expression of this chimera and this genes. Here you have a plot counts show the value of that gene are not considered http://imgur.com/tOYJx9Y :

 

 

Here you can found a pca and show what do you suggest but I have 3 replicates. http://imgur.com/wl9rGVK x

All the samples have more that 27 ML of unique reads aligned. I did one set of experiments using 40ML of reads but no increase power was detected.

Any suggestion?

 

ADD COMMENT
0
Entering edit mode

I'm sorry, but I don't understand: when you say this:

For the evidence of wet lab I have qpcr and western blot of the expression of this chimera and this genes. Here you have a plot counts show the value of that gene are not considered

What do you mean by "the value of that gene are not considered"?

ADD REPLY
0
Entering edit mode

The filtering of p-values occurs in results(), so if you want to turn off filtering here (if you think it is not working on this dataset) you should use:

results(dds, cooksCutoff=FALSE)

 

ADD REPLY
0
Entering edit mode

I have another question because not in all samples work. I mean some samples I have only 2 differential expression gene and are my genes I have use  in my experiments.

In other case this number of replicates works and Now I want to understand when is important to use cooksCutoff=False and  betaPrior=FALSE?

 

If I use only coding sequencing genes  can help to have more differential genes?

 

ADD REPLY
0
Entering edit mode
Detecting differences depends on the size of the difference, the amount of variation with group, and sample size. 3 is basically the minimum recommended sample size for a condition so it's expected that your only detect a few of the largest differences. Other experiments may have bigger differences, less variation, or more samples, and so find more differences. Here I recommended turning off beta prior because you have two very very large fold changes with almost all other genes having small fold changes. The beta prior doesn't work well in this one situation. I recommended turning off filtering based on a Cooks distance cutoff because it seemed to be too aggressive in this case. I can't give a general suggestion when to turn it off though, just to say you should inspect the genes which are filtered.
ADD REPLY
0
Entering edit mode

thanks!!!

And for the use of only coding genes?

 

ADD REPLY
0
Entering edit mode
I don't really have an opinion or experience on using this information.
ADD REPLY
0
Entering edit mode

Sorry ,I mean I have this results on the results file . I don't have any pvalue or padjust

ENSG ...   1991.4851208834 2.2929902325 0.3341579938 6.8619942512 NA NA

 

This I obtained before introduce modification suggested by Michael. Using this Work

ADD REPLY
0
Entering edit mode
Things are getting a bit confused and hard to follow. Can you show me exactly the DESeq() and results() lines that produced an NA pvalue? If you use the arguments i suggested, there should not be filtering.
ADD REPLY
0
Entering edit mode
........................
ADD REPLY
0
Entering edit mode

When I use what you suggest I have the value  and not NA. Sorry for the confusion I refer  on the starting of the discussion  when I use dds<-DESeq(dds) . Now it is ok.

 


 

ADD REPLY

Login before adding your answer.

Traffic: 438 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6