DESeq2: Analyzing just two normal+tumour pairs
1
0
Entering edit mode
mmokrejs • 0
@mmokrejs-16289
Last seen 2.8 years ago

Hi,

I have RNAseq from two cancer+normal cases. I used salmon and proceeded with DESeq2. I do get the ./results/ directory out but the PDF figures are broken. The error message for example in boxplot.ENSG00000077454.15.pdf states: "Error using packet 1 arguments must have same length". The X-axis is labeled "factor", Y-axis is labeled "ENSG.... normalized counts".

$cat samples.txt pop center assay sample type patient experiment topic quant_file 1 lab KAPA_stranded_RNAseq PleioAdenVz1 cancer patient1 PleiomorphicAdenoma1_S1 patient1_cancer ./PleioAdenVz1/PleiomorphicAdenoma1_S1.gencode_v28.salmon_quant/quant.sf 2 lab KAPA_stranded_RNAseq PleioAdenVz2 normal patient1 PleiomorphicAdenoma2_S2 patient1_normal ./PleioAdenVz2/PleiomorphicAdenoma2_S2.gencode_v28.salmon_quant/quant.sf 3 lab KAPA_stranded_RNAseq PleioAdenVz9 cancer patient2 PleiomorphicAdenoma9_S3 patient2_cancer ./PleioAdenVz9/PleiomorphicAdenoma9_S3.gencode_v28.salmon_quant/quant.sf 4 lab KAPA_stranded_RNAseq PleioAdenVz10 normal patient2 PleiomorphicAdenoma10_S4 patient2_normal ./PleioAdenVz10/PleiomorphicAdenoma10_S4.gencode_v28.salmon_quant/quant.sf$

I did something like the following:

$for d in PleioAdenVz1 PleioAdenVz2 PleioAdenVz9 PleioAdenVz10; do pushd$d

salmon quant --numThreads $threads --validateMappings --writeUnmappedNames -i ftp.ensembl.org/pub/release-92/fasta/homo_sapiens/cdna/Homo_sapiens.GRCh38.cdna.all -l ISF -1 <(gunzip -c *.fwd_reads.gz) -2 <(gunzip -c *.rev_reads.gz) -o ./${d}.salmon_quant

popd

done

$I see I should have used --gencode argument because later tximport broke on the extra bars delimiting gene aliases: [ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|, ...] # if somebody forgot to run salmon with --gencode here is a workaround txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE) I think I am still doing the right thing so far. library(tximportData) library(GenomicFeatures) txdb <- makeTxDbFromGFF(file="ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_28/gencode.v28.annotation.gff3.gz") txdb saveDb(x=txdb, file = "gencode.v28.annotation.TxDb") k <- keys(txdb, keytype = "TXNAME") tx2gene <- select(txdb, k, "GENEID", "TXNAME") head(k) head(tx2gene) dim(tx2gene) length(k) dir <- '.' samples <- read.table(file.path(dir, "samples.txt"), header = TRUE) samples files <- file.path(samples$quant_file)
names(files) <- samples$run all(file.exists(files)) files library(tximport) # work around # [ENST00000456328.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000362751.1|RP11-34P13.1-002|DDX11L1|1657|processed_transcript|, ENST00000450305.2|ENSG00000223972.5|OTTHUMG00000000961.2|OTTHUMT00000002844.2|RP11-34P13.1-001|DDX11L1|632|transcribed_unprocessed_pseudogene|, ENST00000488147.1|ENSG00000227232.5|OTTHUMG00000000958.1|OTTHUMT00000002839.1|RP11-34P13.2-001|WASH7P|1351|unprocessed_pseudogene|, ...] # if somebody forgot to run salmon with --gencode txi <- tximport(files, type = "salmon", tx2gene = tx2gene, ignoreAfterBar = TRUE) names(txi) library(DESeq2) dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient) library(RColorBrewer) dds <- DESeq(dds) resultsNames(dds) # now the shaky part # dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient) #dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type) dds$type <- relevel(dds$type, ref="normal") dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))

library(ReportingTools)
desReport <- HTMLReport(shortName = "RNAseq_analysis_with_DESeq",
title = "Results of differential expression analysis using DESeq2 (cancer vs. normal)",
reportDirectory = "./reports")
publish(res, desReport, DataSet = dds, n = 50, pvalueCutoff = 0.01, lfc = 1,
make.plots = TRUE, factor = dds$dex) finish(desReport) As you can see I tried several DESeqDataSetFromTximport alternatives but I always ended up with no result. Well, the page ./reports/RNAseq_analysis_with_DESeq.html does show p-values but can I trust that if elsewhere is silently broke? I haven't seen an error from DESeq2 in my R session.  ENSG00000077454.15 mini.ENSG00000077454.15.png -7.08 1.23e-38 4.09e-35 ENSG00000114098.17 mini.ENSG00000114098.17.png -3.58 8.59e-13 5.72e-10 ENSG00000119321.8 mini.ENSG00000119321.8.png -5.37 1.56e-47 1.3e-43 Thank you for your help. deseq2 reporting tools • 389 views ADD COMMENT 0 Entering edit mode > sessionInfo() R version 3.5.1 (2018-07-02) Platform: x86_64-pc-linux-gnu (64-bit) Running under: CentOS Linux 7 (Core) Matrix products: default BLAS: /apps/gentoo/usr/lib64/librefblas.so.3.7.0 LAPACK: /apps/gentoo/usr/lib64/libreflapack.so.3.7.0 locale: [1] C attached base packages: [1] stats4 parallel stats graphics grDevices utils datasets [8] methods base other attached packages: [1] RColorBrewer_1.1-2 pheatmap_1.0.10 [3] ReportingTools_2.22.0 knitr_1.20 [5] DESeq2_1.22.1 SummarizedExperiment_1.12.0 [7] DelayedArray_0.8.0 BiocParallel_1.16.1 [9] matrixStats_0.54.0 tximport_1.10.0 [11] readr_1.2.1 GenomicFeatures_1.34.1 [13] AnnotationDbi_1.44.0 Biobase_2.42.0 [15] GenomicRanges_1.34.0 GenomeInfoDb_1.18.1 [17] IRanges_2.16.0 S4Vectors_0.20.1 [19] BiocGenerics_0.28.0 tximportData_1.10.0 loaded via a namespace (and not attached): [1] colorspace_1.3-2 hwriter_1.3.2 biovizBase_1.30.0 [4] htmlTable_1.12 XVector_0.22.0 base64enc_0.1-3 [7] dichromat_2.0-0 rstudioapi_0.8 bit64_0.9-7 [10] R.methodsS3_1.7.1 splines_3.5.1 ggbio_1.30.0 [13] geneplotter_1.60.0 Formula_1.2-3 Rsamtools_1.34.0 [16] annotate_1.60.0 cluster_2.0.7-1 GO.db_3.7.0 [19] R.oo_1.22.0 graph_1.60.0 BiocManager_1.30.4 [22] compiler_3.5.1 httr_1.3.1 GOstats_2.48.0 [25] backports_1.1.2 assertthat_0.2.0 Matrix_1.2-15 [28] lazyeval_0.2.1 limma_3.38.2 acepack_1.4.1 [31] htmltools_0.3.6 prettyunits_1.0.2 tools_3.5.1 [34] bindrcpp_0.2.2 gtable_0.2.0 glue_1.3.0 [37] GenomeInfoDbData_1.2.0 Category_2.48.0 reshape2_1.4.3 [40] dplyr_0.7.8 Rcpp_1.0.0 Biostrings_2.50.1 [43] rtracklayer_1.42.1 stringr_1.3.1 ensembldb_2.6.3 [46] XML_3.98-1.16 edgeR_3.24.0 zlibbioc_1.28.0 [49] scales_1.0.0 BSgenome_1.50.0 VariantAnnotation_1.28.2 [52] hms_0.4.2 ProtGenerics_1.14.0 RBGL_1.58.1 [55] AnnotationFilter_1.6.0 curl_3.2 memoise_1.1.0 [58] gridExtra_2.3 ggplot2_3.1.0 biomaRt_2.38.0 [61] rpart_4.1-13 reshape_0.8.8 latticeExtra_0.6-28 [64] stringi_1.2.4 RSQLite_2.1.1 genefilter_1.64.0 [67] checkmate_1.8.5 rlang_0.3.0.1 pkgconfig_2.0.2 [70] bitops_1.0-6 lattice_0.20-38 purrr_0.2.5 [73] bindr_0.1.1 GenomicAlignments_1.18.0 htmlwidgets_1.3 [76] bit_1.1-14 tidyselect_0.2.5 GSEABase_1.44.0 [79] AnnotationForge_1.24.0 GGally_1.4.0 plyr_1.8.4 [82] magrittr_1.5 R6_2.3.0 Hmisc_4.1-1 [85] DBI_1.0.0 pillar_1.3.0 foreign_0.8-71 [88] survival_2.43-3 RCurl_1.95-4.11 nnet_7.3-12 [91] tibble_1.4.2 crayon_1.3.4 OrganismDbi_1.24.0 [94] PFAM.db_3.7.0 progress_1.2.0 locfit_1.5-9.1 [97] grid_3.5.1 data.table_1.11.8 blob_1.1.1 [100] Rgraphviz_2.26.0 digest_0.6.18 xtable_1.8-3 [103] R.utils_2.7.0 munsell_0.5.0 >  ADD REPLY 0 Entering edit mode @mikelove Last seen 1 day ago United States The error is coming from ReportingTools, not DESeq2, which has a different maintainer. I’d recommend to make a post that includes just the ReportingTools code and the error and session info. You should tag that package instead of DESeq2. ADD COMMENT 0 Entering edit mode Thank you, Michael, I will open a new thread then. I added the ReportingTools tag but probably too late to notify its author. Would you please comment on these attempts, line by line? I am new to this and I do not really understand the underlying differences, hence the comments on lines commented out. And a third approach would be the defaults. I added the samples.txt to give you the underlying info I have. DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient) # dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type + patient + type:patient) #dds <- DESeqDataSetFromTximport(txi, colData = samples, design = ~ type) dds$type <- relevel(dds$type, ref="normal") dds$IndividualN <- factor(c(1,1,2,2))
dds <- DESeq(dds)
#res <- results(dds,contrast=list("type","cancer","normal"))
res <- results(dds,contrast=c("type","cancer","normal"))

0
Entering edit mode

You have three commented out lines. The first two are entirely different designs. If you don’t understand the difference between these you should consult a statistician. This forum is for software guidance, when users know what model they want to use.

The last commented out like won’t work. See the man page for the results function for description of how the contrast argument works.