Question

DESeq2: Analyzing just two normal+tumour pairs

0

Entering edit mode

mmokrejs • 0

@mmokrejs-16289

Last seen 5.2 years ago

Hi,

I have RNAseq from two cancer+normal cases. I used salmon and proceeded with DESeq2. I do get the ./results/ directory out but the PDF figures are broken. The error message for example in boxplot.ENSG00000077454.15.pdf states: "Error using packet 1 arguments must have same length". The X-axis is labeled "factor", Y-axis is labeled "ENSG.... normalized counts".

$ cat samples.txt
pop   center   assay   sample   type   patient   experiment   topic   quant_file
1    lab    KAPA_stranded_RNAseq   PleioAdenVz1   cancer   patient1   PleiomorphicAdenoma1_S1   patient1_cancer   ./PleioAdenVz1/PleiomorphicAdenoma1_S1.gencode_v28.salmon_quant/quant.sf
2    lab    KAPA_stranded_RNAseq   PleioAdenVz2   normal   patient1   PleiomorphicAdenoma2_S2   patient1_normal   ./PleioAdenVz2/PleiomorphicAdenoma2_S2.gencode_v28.salmon_quant/quant.sf
3    lab    KAPA_stranded_RNAseq   PleioAdenVz9   cancer   patient2   PleiomorphicAdenoma9_S3   patient2_cancer   ./PleioAdenVz9/PleiomorphicAdenoma9_S3.gencode_v28.salmon_quant/quant.sf
4    lab    KAPA_stranded_RNAseq   PleioAdenVz10   normal   patient2   PleiomorphicAdenoma10_S4   patient2_normal   ./PleioAdenVz10/PleiomorphicAdenoma10_S4.gencode_v28.salmon_quant/quant.sf

$

I did something like the following:

$ for d in PleioAdenVz1 PleioAdenVz2 PleioAdenVz9 PleioAdenVz10; do

pushd $d

salmon quant --numThreads $threads --validateMappings --writeUnmappedNames -i ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all -l ISF -1 <(gunzip -c *.fwd_reads.gz) -2 <(gunzip -c *.rev_reads.gz) -o ./${d}.salmon_quant

popd

done

$

I see I should have used --gencode argument because later tximport broke on the extra bars delimiting gene aliases:

I think I am still doing the right thing so far.

library(tximportData)
library(GenomicFeatures)

txdb <- makeTxDbFromGFF(file="ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz")
txdb
saveDb(x=txdb, file = "gencode.v28.annotation.TxDb")

k <- keys(txdb, keytype = "TXNAME")
tx2gene <- select(txdb, k, "GENEID", "TXNAME")
head(k)
head(tx2gene)
dim(tx2gene)
length(k)

dir <- '.'
samples <- read.table(file.path(dir, "samples.txt"), header = TRUE)
samples

files <- file.path(samples$quant_file)
names(files) <- samples$run
all(file.exists(files))
files

library(DESeq2)
dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient)

library(RColorBrewer)

dds <- DESeq(dds)
resultsNames(dds)

# now the shaky part

# dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient)
#dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type)
dds$type <- relevel(dds$type, ref="normal")
dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))

library(ReportingTools)
desReport <- HTMLReport(shortName = "RNAseq_analysis_with_DESeq",
                        title = "Results of differential expression analysis using DESeq2 (cancer vs. normal)",
                        reportDirectory = "./reports")
publish(res, desReport, DataSet = dds, n = 50, pvalueCutoff = 0.01, lfc = 1,
        make.plots = TRUE, factor = dds$dex)
finish(desReport)

As you can see I tried several DESeqDataSetFromTximport alternatives but I always ended up with no result. Well, the page ./reports/RNAseq_analysis_with_DESeq.html does show p-values but can I trust that if elsewhere is silently broke? I haven't seen an error from DESeq2 in my R session.

ENSG00000077454.15

mini.ENSG00000077454.15.png

-7.08

1.23e-38

4.09e-35

ENSG00000114098.17

mini.ENSG00000114098.17.png

-3.58

8.59e-13

5.72e-10

ENSG00000119321.8

mini.ENSG00000119321.8.png

-5.37

1.56e-47

1.30e-43

Thank you for your help.

deseq2 reporting tools • 939 views

ADD COMMENT • link 5.4 years ago mmokrejs • 0

0

Entering edit mode

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /apps/gentoo/usr/lib64/librefblas.so.3.7.0
LAPACK: /apps/gentoo/usr/lib64/libreflapack.so.3.7.0

locale:
[1] C

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] RColorBrewer_1.1-2          pheatmap_1.0.10            
 [3] ReportingTools_2.22.0       knitr_1.20                 
 [5] DESeq2_1.22.1               SummarizedExperiment_1.12.0
 [7] DelayedArray_0.8.0          BiocParallel_1.16.1        
 [9] matrixStats_0.54.0          tximport_1.10.0            
[11] readr_1.2.1                 GenomicFeatures_1.34.1     
[13] AnnotationDbi_1.44.0        Biobase_2.42.0             
[15] GenomicRanges_1.34.0        GenomeInfoDb_1.18.1        
[17] IRanges_2.16.0              S4Vectors_0.20.1           
[19] BiocGenerics_0.28.0         tximportData_1.10.0        

loaded via a namespace (and not attached):
  [1] colorspace_1.3-2         hwriter_1.3.2            biovizBase_1.30.0       
  [4] htmlTable_1.12           XVector_0.22.0           base64enc_0.1-3         
  [7] dichromat_2.0-0          rstudioapi_0.8           bit64_0.9-7             
 [10] R.methodsS3_1.7.1        splines_3.5.1            ggbio_1.30.0            
 [13] geneplotter_1.60.0       Formula_1.2-3            Rsamtools_1.34.0        
 [16] annotate_1.60.0          cluster_2.0.7-1          GO.db_3.7.0             
 [19] R.oo_1.22.0              graph_1.60.0             BiocManager_1.30.4      
 [22] compiler_3.5.1           httr_1.3.1               GOstats_2.48.0          
 [25] backports_1.1.2          assertthat_0.2.0         Matrix_1.2-15           
 [28] lazyeval_0.2.1           limma_3.38.2             acepack_1.4.1           
 [31] htmltools_0.3.6          prettyunits_1.0.2        tools_3.5.1             
 [34] bindrcpp_0.2.2           gtable_0.2.0             glue_1.3.0              
 [37] GenomeInfoDbData_1.2.0   Category_2.48.0          reshape2_1.4.3          
 [40] dplyr_0.7.8              Rcpp_1.0.0               Biostrings_2.50.1       
 [43] rtracklayer_1.42.1       stringr_1.3.1            ensembldb_2.6.3         
 [46] XML_3.98-1.16            edgeR_3.24.0             zlibbioc_1.28.0         
 [49] scales_1.0.0             BSgenome_1.50.0          VariantAnnotation_1.28.2
 [52] hms_0.4.2                ProtGenerics_1.14.0      RBGL_1.58.1             
 [55] AnnotationFilter_1.6.0   curl_3.2                 memoise_1.1.0           
 [58] gridExtra_2.3            ggplot2_3.1.0            biomaRt_2.38.0          
 [61] rpart_4.1-13             reshape_0.8.8            latticeExtra_0.6-28     
 [64] stringi_1.2.4            RSQLite_2.1.1            genefilter_1.64.0       
 [67] checkmate_1.8.5          rlang_0.3.0.1            pkgconfig_2.0.2         
 [70] bitops_1.0-6             lattice_0.20-38          purrr_0.2.5             
 [73] bindr_0.1.1              GenomicAlignments_1.18.0 htmlwidgets_1.3         
 [76] bit_1.1-14               tidyselect_0.2.5         GSEABase_1.44.0         
 [79] AnnotationForge_1.24.0   GGally_1.4.0             plyr_1.8.4              
 [82] magrittr_1.5             R6_2.3.0                 Hmisc_4.1-1             
 [85] DBI_1.0.0                pillar_1.3.0             foreign_0.8-71          
 [88] survival_2.43-3          RCurl_1.95-4.11          nnet_7.3-12             
 [91] tibble_1.4.2             crayon_1.3.4             OrganismDbi_1.24.0      
 [94] PFAM.db_3.7.0            progress_1.2.0           locfit_1.5-9.1          
 [97] grid_3.5.1               data.table_1.11.8        blob_1.1.1              
[100] Rgraphviz_2.26.0         digest_0.6.18            xtable_1.8-3            
[103] R.utils_2.7.0            munsell_0.5.0           
>

ADD REPLY • link 5.4 years ago mmokrejs • 0

score 0 · Answer 1 · 2018-12-05

0

Entering edit mode

Michael Love 41k

@mikelove

Last seen 10 hours ago

United States

The error is coming from ReportingTools, not DESeq2, which has a different maintainer. I’d recommend to make a post that includes just the ReportingTools code and the error and session info. You should tag that package instead of DESeq2.

ADD COMMENT • link 5.4 years ago Michael Love 41k

0

Entering edit mode

Thank you, Michael, I will open a new thread then. I added the ReportingTools tag but probably too late to notify its author.

Would you please comment on these attempts, line by line? I am new to this and I do not really understand the underlying differences, hence the comments on lines commented out. And a third approach would be the defaults.

I added the samples.txt to give you the underlying info I have.

DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient)
# dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient)
#dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type)
dds$type <- relevel(dds$type, ref="normal")
dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))

ADD REPLY • link 5.4 years ago mmokrejs • 0

0

Entering edit mode

You have three commented out lines. The first two are entirely different designs. If you don’t understand the difference between these you should consult a statistician. This forum is for software guidance, when users know what model they want to use.

The last commented out like won’t work. See the man page for the results function for description of how the contrast argument works.

ADD REPLY • link 5.4 years ago Michael Love 41k