DESeq2 / How to visualize differences into control conditions
1
0
Entering edit mode
@vincentpailler-16742
Last seen 6.3 years ago

Hi guys,

I got quant files from Salmon and I'd like to visualize differences between my different conditions

My experimental design looks like this (it is in the same tissu) :     3 conditions x 2 replicates

 

control_1

control_2

heatshock_6h_1

heatshock_6h_2

recovery_48h_1

recovery_48h_2

I'd like to get differences into my differents replicates :

control_1 vs control_2  

heatshock_6h_1 vs heatshock_6h_2

recovery_48h_1 vs recovery_48h_2     

 

and also between my conditions 

control vs heatshock 6h

control vs heatshock 48h

heatshock 6h vs heatshock 48h

 

For the moment, I did :

 

library("GenomicFeatures")

txdb<-txdb<-makeTxDbFromGFF("/data/home/paillerv/prototype/Araport11_GFF3_genes_transposons.201606_filtred.gff",format="gff")

k<-keys(txdb,keytype="TXNAME")

tx2gene<-select(txdb,k,"GENEID","TXNAME")

sample_list<-read.delim("/data/home/paillerv/prototype/samples.txt")

files<-file.path("/data/home/paillerv/prototype/quant_salmon/",sample_list,"quant.sf")

 

At this moment, I get my  quant files from Salmon corresponding to each replicates

 

Then :

samples<-read.table(file.path("/data/home/paillerv/prototype/samples_bis"),header=TRUE)

samples$condition<-factor(rep(c("0h","6h","24h"),each=2))

> samples$condition
[1] 0h  0h  6h  6h  24h 24h
Levels: 0h 24h 6h

ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples,design = ~ condition)

dds<- DESeqDataSetFromTximport(txi, colData = samples,design = ~ condition)

> dds
class: DESeqDataSet
dim: 27655 6
metadata(1): version
assays(2): counts avgTxLength
rownames(27655): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410
rowData names(0):
colnames(6): SRR2033948 SRR2033949 ... SRR2033952 SRR2033953
colData names(2): sampleName condition

res.A <- results(dds, name="condition0h")     I call "A" my first comparison (intra control)

> res.A
log2 fold change (MAP): condition0h
Wald test p-value: condition0h
DataFrame with 21040 rows and 6 columns
            baseMean log2FoldChange      lfcSE       stat       pvalue
           <numeric>      <numeric>  <numeric>  <numeric>    <numeric>
AT1G01010   58.43130     0.03245322 0.22607784  0.1435489  0.885856700
AT1G01020  221.92956     0.02807515 0.14966759  0.1875834  0.851203259
AT1G01030   24.26936     0.68344539 0.31908927  2.1418626  0.032204537
AT1G01040  804.06544    -0.19103578 0.09612531 -1.9873618  0.046882315
AT1G01050 1095.19908    -0.29733219 0.09184558 -3.2373053  0.001206642
...              ...            ...        ...        ...          ...
ATMG01350  11.614623     -1.4765696  0.5731928 -2.5760437 9.993801e-03
ATMG01360 312.329207     -1.7510820  0.2660925 -6.5807266 4.681547e-11
ATMG01370  71.879530     -0.5261983  0.3021740 -1.7413751 8.161784e-02
ATMG01400   2.890855     -0.6842089  0.5989392 -1.1423677 2.533012e-01
ATMG01410   7.160221     -0.5517957  0.5727257 -0.9634555 3.353190e-01
                  padj
             <numeric>
AT1G01010   0.92279159
AT1G01020   0.89826738
AT1G01030   0.06777275
AT1G01040   0.09300258
AT1G01050   0.00382125
...                ...
ATMG01350 2.486983e-02
ATMG01360 5.170845e-10
ATMG01370 1.477010e-01
ATMG01400 3.668733e-01
ATMG01410 4.576201e-01

 

I get this but when I compare with the second comparison

res.B <- results(dds, name="condition6h")

> res.B
log2 fold change (MAP): condition6h
Wald test p-value: condition6h
DataFrame with 21040 rows and 6 columns
            baseMean log2FoldChange      lfcSE      stat       pvalue
           <numeric>      <numeric>  <numeric> <numeric>    <numeric>
AT1G01010   58.43130     -0.3397217 0.23179191 -1.465632 1.427485e-01
AT1G01020  221.92956      0.2406209 0.14972459  1.607090 1.080347e-01
AT1G01030   24.26936     -1.2053118 0.35149000 -3.429150 6.054754e-04
AT1G01040  804.06544      0.3160557 0.09575211  3.300770 9.641995e-04
AT1G01050 1095.19908      0.5863747 0.09105759  6.439603 1.197866e-10
...              ...            ...        ...       ...          ...
ATMG01350  11.614623       3.240285  0.5374001  6.029557 1.644098e-09
ATMG01360 312.329207       4.135173  0.2317245 17.845210 3.149174e-71
ATMG01370  71.879530       2.515165  0.2852168  8.818433 1.160727e-18
ATMG01400   2.890855       1.637104  0.6043669  2.708791 6.752880e-03
ATMG01410   7.160221       2.248356  0.5588092  4.023477 5.734516e-05
                  padj
             <numeric>
AT1G01010 1.882437e-01
AT1G01020 1.465916e-01
AT1G01030 1.198645e-03
AT1G01040 1.859294e-03
AT1G01050 4.515876e-10
...                ...
ATMG01350 5.694126e-09
ATMG01360 1.222484e-69
ATMG01370 7.272690e-18
ATMG01400 1.152410e-02
ATMG01410 1.282464e-04

 

 

I get almost the same results between these 2 comparisons whereas I guess it should be very different (control vs stress)

Does it look good or I did wrong?

 

Best,

Vincent

 

deseq2 • 830 views
ADD COMMENT
0
Entering edit mode
@mikelove
Last seen 1 day ago
United States

What do you get with resultsNames(dds)?

I'm confused because with that design, the resultsNames(dds) should have names like condition_48_vs_0, condition_6_vs_0. Is there some additional code that is missing?

Note that you cannot compare between replicates with DESeq2, you can only make comparisons when there is replication. Comparing control 2 vs control 1 has no replication.

ADD COMMENT
0
Entering edit mode

Thanks Michael for answering me,

 

I get this :

 

> resultsNames(dds)
[1] "Intercept"    "condition0h"  "condition24h" "condition6h"

 

In addition, I can give more details about my analysis

 

> head(txi$counts)
                  SRR2033948 SRR2033949 SRR2033950 SRR2033951 SRR2033952 SRR2033953
AT1G01010   34.00000  119.00000         25    72.0000   66.00000   73.00000
AT1G01020  144.99990  407.99979        103   569.3609  174.00000  179.99994
AT1G01030   19.99999   70.99997          3    17.0000   26.99996   33.00000
AT1G01040  365.00000 1550.00000        441  1925.0000  715.00000  723.00000
AT1G01050  448.00045 1891.99510        748  2918.9996  830.49300  853.00000
AT1G01060   20.00000   98.00000        129   535.0003   49.00000   43.99999

 

> samples
  sampleName condition
1 SRR2033948        0h
2 SRR2033949        0h
3 SRR2033950        6h
4 SRR2033951        6h
5 SRR2033952       24h
6 SRR2033953       24h

 

 

> qs <- apply(counts(dds), 2, quantile, .9)
> sf <- qs / exp(mean(log(qs)))
> sizeFactors(dds) <- sf
> sizeFactors(dds)
SRR2033948 SRR2033949 SRR2033950 SRR2033951 SRR2033952 SRR2033953
 0.4801390  1.8416290  0.6328773  2.7121640  0.7915350  0.8323871

 

(About the sizeFactors, I guess it is normal I get these results because the size of the SRR files are very differents between them)

 

 

ADD REPLY
0
Entering edit mode

It looks like you are using an older version of DESeq2. Is it possible to update to the latest version of R and Bioconductor? We've added a lot of features since then, and the documentation you will find online no longer reflects your version on your machine. The current default value for betaPrior is now FALSE, and LFC shrinkage is performed by a separate function lfcShrink().

Anyway, to compare groups using either version of DESeq2, you would do this:

results(dds, contrast=c("condition","24h","0h"))
ADD REPLY
0
Entering edit mode

Indeed, I was running an older version of DESeq2 . And I've updated my R version (now, 3.5.1)

I've tried to reinstall GenomicFeatures

source("https://bioconductor.org/biocLite.R")
biocLite("GenomicFeatures")

But I got this error message :

The downloaded source packages are in
    ‘/tmp/RtmpkvCDyf/downloaded_packages’
installation path not writeable, unable to update packages: DelayedArray,
  maptools, XML
Warning messages:
1: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘openssl’ had non-zero exit status
2: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘httr’ had non-zero exit status
3: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘biomaRt’ had non-zero exit status
4: In install.packages(pkgs = doing, lib = lib, ...) :
  installation of package ‘GenomicFeatures’ had non-zero exit status

I tried to install each packages "openssl" , "httr" , "biomaRt" separately with two options :

 biocLite(...) and install.packages(...) but nothing happens

 

Do you think there's a problem of dependencies ?

ADD REPLY
0
Entering edit mode

To dive in deeper, it really helps to provide sessionInfo() as these installation problems are typically specific to your setup. In general you should provide this with any support request also.

ADD REPLY
0
Entering edit mode

Yes, I am sorry for that. I am new here

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS: /opt/R/3.5.1/lib/R/lib/libRblas.so
LAPACK: /opt/R/3.5.1/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_GB.UTF-8    
 [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_GB.UTF-8   
 [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_3.5.1       bit_1.1-14           IRanges_2.14.10     
 [4] parallel_3.5.1       DBI_1.0.0            memoise_1.1.0       
 [7] Rcpp_0.12.18         Biobase_2.40.0       bit64_0.9-7         
[10] AnnotationDbi_1.42.1 RSQLite_2.1.1        S4Vectors_0.18.3    
[13] blob_1.1.1           BiocGenerics_0.26.0  digest_0.6.15       
[16] stats4_3.5.1        

 

ADD REPLY

Login before adding your answer.

Traffic: 464 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6