Hi guys,
I got quant files from Salmon and I'd like to visualize differences between my different conditions
My experimental design looks like this (it is in the same tissu) : 3 conditions x 2 replicates
control_1
control_2
heatshock_6h_1
heatshock_6h_2
recovery_48h_1
recovery_48h_2
I'd like to get differences into my differents replicates :
control_1 vs control_2
heatshock_6h_1 vs heatshock_6h_2
recovery_48h_1 vs recovery_48h_2
and also between my conditions
control vs heatshock 6h
control vs heatshock 48h
heatshock 6h vs heatshock 48h
For the moment, I did :
library("GenomicFeatures")
txdb<-txdb<-makeTxDbFromGFF("/data/home/paillerv/prototype/Araport11_GFF3_genes_transposons.201606_filtred.gff",format="gff")
k<-keys(txdb,keytype="TXNAME")
tx2gene<-select(txdb,k,"GENEID","TXNAME")
sample_list<-read.delim("/data/home/paillerv/prototype/samples.txt")
files<-file.path("/data/home/paillerv/prototype/quant_salmon/",sample_list,"quant.sf")
At this moment, I get my quant files from Salmon corresponding to each replicates
Then :
samples<-read.table(file.path("/data/home/paillerv/prototype/samples_bis"),header=TRUE)
samples$condition<-factor(rep(c("0h","6h","24h"),each=2))
> samples$condition
[1] 0h 0h 6h 6h 24h 24h
Levels: 0h 24h 6h
ddsTxi <- DESeqDataSetFromTximport(txi, colData = samples,design = ~ condition)
dds<- DESeqDataSetFromTximport(txi, colData = samples,design = ~ condition)
> dds
class: DESeqDataSet
dim: 27655 6
metadata(1): version
assays(2): counts avgTxLength
rownames(27655): AT1G01010 AT1G01020 ... ATMG01400 ATMG01410
rowData names(0):
colnames(6): SRR2033948 SRR2033949 ... SRR2033952 SRR2033953
colData names(2): sampleName condition
res.A <- results(dds, name="condition0h") I call "A" my first comparison (intra control)
> res.A
log2 fold change (MAP): condition0h
Wald test p-value: condition0h
DataFrame with 21040 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
AT1G01010 58.43130 0.03245322 0.22607784 0.1435489 0.885856700
AT1G01020 221.92956 0.02807515 0.14966759 0.1875834 0.851203259
AT1G01030 24.26936 0.68344539 0.31908927 2.1418626 0.032204537
AT1G01040 804.06544 -0.19103578 0.09612531 -1.9873618 0.046882315
AT1G01050 1095.19908 -0.29733219 0.09184558 -3.2373053 0.001206642
... ... ... ... ... ...
ATMG01350 11.614623 -1.4765696 0.5731928 -2.5760437 9.993801e-03
ATMG01360 312.329207 -1.7510820 0.2660925 -6.5807266 4.681547e-11
ATMG01370 71.879530 -0.5261983 0.3021740 -1.7413751 8.161784e-02
ATMG01400 2.890855 -0.6842089 0.5989392 -1.1423677 2.533012e-01
ATMG01410 7.160221 -0.5517957 0.5727257 -0.9634555 3.353190e-01
padj
<numeric>
AT1G01010 0.92279159
AT1G01020 0.89826738
AT1G01030 0.06777275
AT1G01040 0.09300258
AT1G01050 0.00382125
... ...
ATMG01350 2.486983e-02
ATMG01360 5.170845e-10
ATMG01370 1.477010e-01
ATMG01400 3.668733e-01
ATMG01410 4.576201e-01
I get this but when I compare with the second comparison
res.B <- results(dds, name="condition6h")
> res.B
log2 fold change (MAP): condition6h
Wald test p-value: condition6h
DataFrame with 21040 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
AT1G01010 58.43130 -0.3397217 0.23179191 -1.465632 1.427485e-01
AT1G01020 221.92956 0.2406209 0.14972459 1.607090 1.080347e-01
AT1G01030 24.26936 -1.2053118 0.35149000 -3.429150 6.054754e-04
AT1G01040 804.06544 0.3160557 0.09575211 3.300770 9.641995e-04
AT1G01050 1095.19908 0.5863747 0.09105759 6.439603 1.197866e-10
... ... ... ... ... ...
ATMG01350 11.614623 3.240285 0.5374001 6.029557 1.644098e-09
ATMG01360 312.329207 4.135173 0.2317245 17.845210 3.149174e-71
ATMG01370 71.879530 2.515165 0.2852168 8.818433 1.160727e-18
ATMG01400 2.890855 1.637104 0.6043669 2.708791 6.752880e-03
ATMG01410 7.160221 2.248356 0.5588092 4.023477 5.734516e-05
padj
<numeric>
AT1G01010 1.882437e-01
AT1G01020 1.465916e-01
AT1G01030 1.198645e-03
AT1G01040 1.859294e-03
AT1G01050 4.515876e-10
... ...
ATMG01350 5.694126e-09
ATMG01360 1.222484e-69
ATMG01370 7.272690e-18
ATMG01400 1.152410e-02
ATMG01410 1.282464e-04
I get almost the same results between these 2 comparisons whereas I guess it should be very different (control vs stress)
Does it look good or I did wrong?
Best,
Vincent
Thanks Michael for answering me,
I get this :
> resultsNames(dds)
[1] "Intercept" "condition0h" "condition24h" "condition6h"
In addition, I can give more details about my analysis
> head(txi$counts)
SRR2033948 SRR2033949 SRR2033950 SRR2033951 SRR2033952 SRR2033953
AT1G01010 34.00000 119.00000 25 72.0000 66.00000 73.00000
AT1G01020 144.99990 407.99979 103 569.3609 174.00000 179.99994
AT1G01030 19.99999 70.99997 3 17.0000 26.99996 33.00000
AT1G01040 365.00000 1550.00000 441 1925.0000 715.00000 723.00000
AT1G01050 448.00045 1891.99510 748 2918.9996 830.49300 853.00000
AT1G01060 20.00000 98.00000 129 535.0003 49.00000 43.99999
> samples
sampleName condition
1 SRR2033948 0h
2 SRR2033949 0h
3 SRR2033950 6h
4 SRR2033951 6h
5 SRR2033952 24h
6 SRR2033953 24h
> qs <- apply(counts(dds), 2, quantile, .9)
> sf <- qs / exp(mean(log(qs)))
> sizeFactors(dds) <- sf
> sizeFactors(dds)
SRR2033948 SRR2033949 SRR2033950 SRR2033951 SRR2033952 SRR2033953
0.4801390 1.8416290 0.6328773 2.7121640 0.7915350 0.8323871
(About the sizeFactors, I guess it is normal I get these results because the size of the SRR files are very differents between them)
It looks like you are using an older version of DESeq2. Is it possible to update to the latest version of R and Bioconductor? We've added a lot of features since then, and the documentation you will find online no longer reflects your version on your machine. The current default value for betaPrior is now FALSE, and LFC shrinkage is performed by a separate function lfcShrink().
Anyway, to compare groups using either version of DESeq2, you would do this:
Indeed, I was running an older version of DESeq2 . And I've updated my R version (now, 3.5.1)
I've tried to reinstall GenomicFeatures
But I got this error message :
The downloaded source packages are in
‘/tmp/RtmpkvCDyf/downloaded_packages’
installation path not writeable, unable to update packages: DelayedArray,
maptools, XML
Warning messages:
1: In install.packages(pkgs = doing, lib = lib, ...) :
installation of package ‘openssl’ had non-zero exit status
2: In install.packages(pkgs = doing, lib = lib, ...) :
installation of package ‘httr’ had non-zero exit status
3: In install.packages(pkgs = doing, lib = lib, ...) :
installation of package ‘biomaRt’ had non-zero exit status
4: In install.packages(pkgs = doing, lib = lib, ...) :
installation of package ‘GenomicFeatures’ had non-zero exit status
I tried to install each packages "openssl" , "httr" , "biomaRt" separately with two options :
biocLite(...) and install.packages(...) but nothing happens
Do you think there's a problem of dependencies ?
To dive in deeper, it really helps to provide sessionInfo() as these installation problems are typically specific to your setup. In general you should provide this with any support request also.
Yes, I am sorry for that. I am new here
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)
Matrix products: default
BLAS: /opt/R/3.5.1/lib/R/lib/libRblas.so
LAPACK: /opt/R/3.5.1/lib/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=en_GB.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_GB.UTF-8 LC_COLLATE=en_GB.UTF-8
[5] LC_MONETARY=en_GB.UTF-8 LC_MESSAGES=en_GB.UTF-8
[7] LC_PAPER=en_GB.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.1 bit_1.1-14 IRanges_2.14.10
[4] parallel_3.5.1 DBI_1.0.0 memoise_1.1.0
[7] Rcpp_0.12.18 Biobase_2.40.0 bit64_0.9-7
[10] AnnotationDbi_1.42.1 RSQLite_2.1.1 S4Vectors_0.18.3
[13] blob_1.1.1 BiocGenerics_0.26.0 digest_0.6.15
[16] stats4_3.5.1