Search
Question: DESeq2: Analyzing just two normal+tumour pairs
0
gravatar for mmokrejs
12 days ago by
mmokrejs0
mmokrejs0 wrote:

Hi,

  I have RNAseq from two cancer+normal cases. I used salmon and proceeded with DESeq2. I do get the ./results/ directory out but the PDF figures are broken. The error message for example in boxplot.ENSG00000077454.15.pdf states: "Error using packet 1 arguments must have same length". The X-axis is labeled "factor", Y-axis is labeled "ENSG.... normalized counts".

$ cat samples.txt
pop    center    assay    sample    type    patient    experiment    topic    quant_file
1    lab    KAPA_stranded_RNAseq    PleioAdenVz1    cancer    patient1    PleiomorphicAdenoma1_S1    patient1_cancer    ./PleioAdenVz1/PleiomorphicAdenoma1_S1.gencode_v28.salmon_quant/quant.sf
2    lab    KAPA_stranded_RNAseq    PleioAdenVz2    normal    patient1    PleiomorphicAdenoma2_S2    patient1_normal    ./PleioAdenVz2/PleiomorphicAdenoma2_S2.gencode_v28.salmon_quant/quant.sf
3    lab    KAPA_stranded_RNAseq    PleioAdenVz9    cancer    patient2    PleiomorphicAdenoma9_S3    patient2_cancer    ./PleioAdenVz9/PleiomorphicAdenoma9_S3.gencode_v28.salmon_quant/quant.sf
4    lab    KAPA_stranded_RNAseq    PleioAdenVz10    normal    patient2    PleiomorphicAdenoma10_S4    patient2_normal    ./PleioAdenVz10/PleiomorphicAdenoma10_S4.gencode_v28.salmon_quant/quant.sf

$

 

I did something like the following:

 

$ for d in PleioAdenVz1 PleioAdenVz2 PleioAdenVz9 PleioAdenVz10; do

  pushd $d

   salmon quant --numThreads $threads --validateMappings --writeUnmappedNames -i ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all -l ISF -1 <(gunzip -c *.fwd_reads.gz) -2 <(gunzip -c *.rev_reads.gz) -o ./${d}.salmon_quant

  popd

done

$

I see I should have used --gencode argument because later tximport broke on the extra bars delimiting gene aliases:

 

[ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|, ...]
# if somebody forgot to run salmon with --gencode here is a workaround
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)

 

I think I am still doing the right thing so far.

 

library(tximportData)
library(GenomicFeatures)

txdb <- makeTxDbFromGFF(file="ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz")
txdb
saveDb(x=txdb, file = "gencode.v28.annotation.TxDb")

k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")
head(k)
head(tx2gene)
dim(tx2gene)
length(k)

dir <- '.'
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
samples

files <- file.path(samples$quant_file)
names(files) <- samples$run
all(file.exists(files))
files

library(tximport)
# work around
# [ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|, ...]
# if somebody forgot to run salmon with --gencode
txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE)
names(txi)

library(DESeq2)
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient)

library(RColorBrewer)

dds <- DESeq(dds)
resultsNames(dds)

# now the shaky part

# dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient)
#dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type)
dds$type <- relevel(dds$type, ref="normal")
dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))

library(ReportingTools)
desReport <- HTMLReport(shortName = "RNAseq_analysis_with_DESeq",
                        title = "Results of differential expression analysis using DESeq2 (cancer vs. normal)",
                        reportDirectory = "./reports")
publish(res, desReport, DataSet = dds, n = 50, pvalueCutoff = 0.01, lfc = 1,
        make.plots = TRUE, factor = dds$dex)
finish(desReport)

 

As you can see I tried several DESeqDataSetFromTximport alternatives but I always ended up with no result. Well, the page ./reports/RNAseq_analysis_with_DESeq.html does show p-values but can I trust that if elsewhere is silently broke? I haven't seen an error from DESeq2 in my R session.

 

ENSG00000077454.15 mini.ENSG00000077454.15.png -7.08 1.23e-38 4.09e-35
ENSG00000114098.17 mini.ENSG00000114098.17.png -3.58 8.59e-13 5.72e-10
ENSG00000119321.8 mini.ENSG00000119321.8.png -5.37 1.56e-47

1.30e-43

 

Thank you for your help.

ADD COMMENTlink modified 12 days ago • written 12 days ago by mmokrejs0
> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/gentoo/usr/lib64/librefblas.so.3.7.0
LAPACK: /apps/gentoo/usr/lib64/libreflapack.so.3.7.0

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2          pheatmap_1.0.10            
 [3] ReportingTools_2.22.0       knitr_1.20                 
 [5] DESeq2_1.22.1               SummarizedExperiment_1.12.0
 [7] DelayedArray_0.8.0          BiocParallel_1.16.1        
 [9] matrixStats_0.54.0          tximport_1.10.0            
[11] readr_1.2.1                 GenomicFeatures_1.34.1     
[13] AnnotationDbi_1.44.0        Biobase_2.42.0             
[15] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
[17] IRanges_2.16.0              S4Vectors_0.20.1           
[19] BiocGenerics_0.28.0         tximportData_1.10.0        

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2         hwriter_1.3.2            biovizBase_1.30.0       
  [4] htmlTable_1.12           XVector_0.22.0           base64enc_0.1-3         
  [7] dichromat_2.0-0          rstudioapi_0.8           bit64_0.9-7             
 [10] R.methodsS3_1.7.1        splines_3.5.1            ggbio_1.30.0            
 [13] geneplotter_1.60.0       Formula_1.2-3            Rsamtools_1.34.0        
 [16] annotate_1.60.0          cluster_2.0.7-1          GO.db_3.7.0             
 [19] R.oo_1.22.0              graph_1.60.0             BiocManager_1.30.4      
 [22] compiler_3.5.1           httr_1.3.1               GOstats_2.48.0          
 [25] backports_1.1.2          assertthat_0.2.0         Matrix_1.2-15           
 [28] lazyeval_0.2.1           limma_3.38.2             acepack_1.4.1           
 [31] htmltools_0.3.6          prettyunits_1.0.2        tools_3.5.1             
 [34] bindrcpp_0.2.2           gtable_0.2.0             glue_1.3.0              
 [37] GenomeInfoDbData_1.2.0   Category_2.48.0          reshape2_1.4.3          
 [40] dplyr_0.7.8              Rcpp_1.0.0               Biostrings_2.50.1       
 [43] rtracklayer_1.42.1       stringr_1.3.1            ensembldb_2.6.3         
 [46] XML_3.98-1.16            edgeR_3.24.0             zlibbioc_1.28.0         
 [49] scales_1.0.0             BSgenome_1.50.0          VariantAnnotation_1.28.2
 [52] hms_0.4.2                ProtGenerics_1.14.0      RBGL_1.58.1             
 [55] AnnotationFilter_1.6.0   curl_3.2                 memoise_1.1.0           
 [58] gridExtra_2.3            ggplot2_3.1.0            biomaRt_2.38.0          
 [61] rpart_4.1-13             reshape_0.8.8            latticeExtra_0.6-28     
 [64] stringi_1.2.4            RSQLite_2.1.1            genefilter_1.64.0       
 [67] checkmate_1.8.5          rlang_0.3.0.1            pkgconfig_2.0.2         
 [70] bitops_1.0-6             lattice_0.20-38          purrr_0.2.5             
 [73] bindr_0.1.1              GenomicAlignments_1.18.0 htmlwidgets_1.3         
 [76] bit_1.1-14               tidyselect_0.2.5         GSEABase_1.44.0         
 [79] AnnotationForge_1.24.0   GGally_1.4.0             plyr_1.8.4              
 [82] magrittr_1.5             R6_2.3.0                 Hmisc_4.1-1             
 [85] DBI_1.0.0                pillar_1.3.0             foreign_0.8-71          
 [88] survival_2.43-3          RCurl_1.95-4.11          nnet_7.3-12             
 [91] tibble_1.4.2             crayon_1.3.4             OrganismDbi_1.24.0      
 [94] PFAM.db_3.7.0            progress_1.2.0           locfit_1.5-9.1          
 [97] grid_3.5.1               data.table_1.11.8        blob_1.1.1              
[100] Rgraphviz_2.26.0         digest_0.6.18            xtable_1.8-3            
[103] R.utils_2.7.0            munsell_0.5.0           
> 
ADD REPLYlink written 12 days ago by mmokrejs0
0
gravatar for Michael Love
12 days ago by
Michael Love20k
United States
Michael Love20k wrote:

The error is coming from ReportingTools, not DESeq2, which has a different maintainer. I’d recommend to make a post that includes just the ReportingTools code and the error and session info. You should tag that package instead of DESeq2.

ADD COMMENTlink written 12 days ago by Michael Love20k

Thank you, Michael, I will open a new thread then. I added the ReportingTools tag but probably too late to notify its author.

Would you please comment on these attempts, line by line? I am new to this and I do not really understand the underlying differences, hence the comments on lines commented out. And a third approach would be the defaults.

I added the samples.txt to give you the underlying info I have.

DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient)
# dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient)
#dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type)
dds$type <- relevel(dds$type, ref="normal")
dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))
ADD REPLYlink written 12 days ago by mmokrejs0

You have three commented out lines. The first two are entirely different designs. If you don’t understand the difference between these you should consult a statistician. This forum is for software guidance, when users know what model they want to use. 

The last commented out like won’t work. See the man page for the results function for description of how the contrast argument works.

ADD REPLYlink written 12 days ago by Michael Love20k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 415 users visited in the last hour