Question: DESeq2 filter out my expression gene
0
3.5 years ago by
Italy
jarod_v6@libero.it40 wrote:

Hi!

I have  a set of experiments where I try to detect the differential expression genes between the over-expression of a particular  fusion genes. One  of the gene of the chimera are differential express the other no. This is strange because we have some evidence on the wet lab,

Another think is I found  only few differential expression genes. This is strange.

out of 15361 with nonzero total read count
LFC > 0 (up)     : 21, 0.14%
LFC < 0 (down)   : 3, 0.02%
outliers [1]     : 14, 0.091%
low counts [2]   : 0, 0%
(mean count < 3)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results

in some comparison I obtain only this:

out of 15530 with nonzero total read count
LFC > 0 (up)     : 2, 0.013%
LFC < 0 (down)   : 0, 0%
outliers [1]     : 100, 0.64%
low counts [2]   : 0, 0%
(mean count < 2)
[1] see 'cooksCutoff' argument of ?results
[2] see 'independentFiltering' argument of ?results



I know one of the gene of the chimera are probably filter out as outliers for this reason:

 ENSG ...   1991.4851208834 2.29299 0.334158 6.86199 NA NA

Even if I use  this comand:

dds <- DESeq(dds,minReplicatesForReplace=Inf)

I obtain the same problem.

What Can I do?

R version 3.2.3 (2015-12-10)
Platform: x86_64-redhat-linux-gnu (64-bit)
Running under: CentOS release 6.7 (Final)

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] edgeR_3.12.0               pheatmap_1.0.8             genefilter_1.52.1          ggplot2_2.0.0
[5] limma_3.26.7               RColorBrewer_1.1-2         gplots_2.17.0              org.Hs.eg.db_3.2.3
[9] RSQLite_1.0.0              DBI_0.3.1                  annotate_1.48.0            XML_3.98-1.3
[17] SummarizedExperiment_1.0.2 Biobase_2.30.0             GenomicRanges_1.22.4       GenomeInfoDb_1.6.3
[21] IRanges_2.4.6              S4Vectors_0.8.11           BiocGenerics_0.16.1        biomaRt_2.26.1

loaded via a namespace (and not attached):
[1] statmod_1.4.24       gtools_3.5.0         locfit_1.5-9.1       splines_3.2.3        lattice_0.20-33
[6] colorspace_1.2-6     yaml_2.1.13          survival_2.38-3      foreign_0.8-66       BiocParallel_1.4.3
[11] lambda.r_1.1.7       plyr_1.8.3           zlibbioc_1.16.0      munsell_0.4.2        gtable_0.1.2
[16] futile.logger_1.4.1  caTools_1.17.1       labeling_0.3         latticeExtra_0.6-26  geneplotter_1.48.0
[21] acepack_1.3-3.3      KernSmooth_2.23-15   xtable_1.8-0         scales_0.3.0         gdata_2.17.0
[26] Hmisc_3.17-1         XVector_0.10.0       gridExtra_2.0.0      digest_0.6.9         grid_3.2.3
[31] tools_3.2.3          bitops_1.0-6         RCurl_1.95-4.7       Formula_1.2-1        cluster_2.0.3
[36] futile.options_1.0.0 rpart_4.1-10         nnet_7.3-12   
deseq2 • 1.6k views
modified 3.5 years ago by Michael Love25k • written 3.5 years ago by jarod_v6@libero.it40
Answer: DESeq2 filter out my expression gene
1
3.5 years ago by
Michael Love25k
United States
Michael Love25k wrote:
Can you show an MA plot as well? I would suggest trying betaPrior=FALSE here as the prior assumption may not hold.
1
Everything looks fine in your MA plot. Most genes in your experiment have small fold change. As your sample size increases you could detect more differences as statistically significant, but with 3 vs 6 you only detect the very strongly induced genes, of which there are few.

Here you have   plotMA. (http://imgur.com/a/d2KwN)

I use also collectRNaseq metrics of picards tools. Do not seem to have strange situation.

What do you suggest?

1
It helps to widen the ylim to see what's going on. You should be fine with betaPrior=FALSE and cooksCutoff=FALSE.

using this  condition I obtain the  genes as differential expression but I have however only 21 up and 3 down.

thanks so much!

Answer: DESeq2 filter out my expression gene
0
3.5 years ago by
Denali
Steve Lianoglou12k wrote:

First thing is to see if you can convince yourself that the data tells you that your gene of interest is differentially expressed.

If you use the plotCounts function from DESeq2 to plot the expression of your gene of interest, does the data support your intuition?

Have you looked PCA or MDS plots of your data to how strong your treatment effect is? From the results you show, it seems like the effect is rather minimal.

How many replicates do you have? When effect sizes are minimal, higher replication can give you more power to detect differential expression.

You say you have "some evidence in the wet lab". Presumably you've done some qPCR on a panel of genes to look at the effect of over-expression of your fusion gene on a handful of transcripts? Does your RNA-seq show similar changes in expression as the evidence you are referring to across the same genes?

Answer: DESeq2 filter out my expression gene
0
3.5 years ago by
EMBL European Molecular Biology Laboratory
Wolfgang Huber13k wrote:

I suggest doing quality assessment / control to see whether all of the libraries are good. Garbage in → garbage out.

The DESeq2 vignette is one place to find suggestions for quality related plots, but also consider statistics such as number of reads, number of uniquely aligned reads, etc.

Answer: DESeq2 filter out my expression gene
0
3.5 years ago by
Italy
jarod_v6@libero.it40 wrote:

Thanks so much for the suggestions:

So I have 3 replicated for condition and  6 controls. In the ctrl I have no expression of my fusion gene or over-expression of  GFP. The chimera are composed by both gene. One is differentialy expressed the other are not reported. Programs for the detection of fusion like FusionCatcher   are able to detect the fusion

For the evidence of wet lab I have qpcr and western blot of the expression of this chimera and this genes. Here you have a plot counts show the value of that gene are not considered http://imgur.com/tOYJx9Y :

Here you can found a pca and show what do you suggest but I have 3 replicates. http://imgur.com/wl9rGVK

All the samples have more that 27 ML of unique reads aligned. I did one set of experiments using 40ML of reads but no increase power was detected.

Any suggestion?

I'm sorry, but I don't understand: when you say this:

For the evidence of wet lab I have qpcr and western blot of the expression of this chimera and this genes. Here you have a plot counts show the value of that gene are not considered

What do you mean by "the value of that gene are not considered"?

The filtering of p-values occurs in results(), so if you want to turn off filtering here (if you think it is not working on this dataset) you should use:

results(dds, cooksCutoff=FALSE)

I have another question because not in all samples work. I mean some samples I have only 2 differential expression gene and are my genes I have use  in my experiments.

In other case this number of replicates works and Now I want to understand when is important to use cooksCutoff=False and  betaPrior=FALSE?

If I use only coding sequencing genes  can help to have more differential genes?

Detecting differences depends on the size of the difference, the amount of variation with group, and sample size. 3 is basically the minimum recommended sample size for a condition so it's expected that your only detect a few of the largest differences. Other experiments may have bigger differences, less variation, or more samples, and so find more differences. Here I recommended turning off beta prior because you have two very very large fold changes with almost all other genes having small fold changes. The beta prior doesn't work well in this one situation. I recommended turning off filtering based on a Cooks distance cutoff because it seemed to be too aggressive in this case. I can't give a general suggestion when to turn it off though, just to say you should inspect the genes which are filtered.

thanks!!!

And for the use of only coding genes?

I don't really have an opinion or experience on using this information.

Sorry ,I mean I have this results on the results file . I don't have any pvalue or padjust

 ENSG ...   1991.4851208834 2.29299 0.334158 6.86199 NA NA

This I obtained before introduce modification suggested by Michael. Using this Work

Things are getting a bit confused and hard to follow. Can you show me exactly the DESeq() and results() lines that produced an NA pvalue? If you use the arguments i suggested, there should not be filtering.
........................

When I use what you suggest I have the value  and not NA. Sorry for the confusion I refer  on the starting of the discussion  when I use dds<-DESeq(dds) . Now it is ok.