limma voom; fitted lines, factors that aren't of interest, and hypothesis testing
1
0
Entering edit mode
James • 0
@James-24772
Last seen 2 days ago
United States

This question is based on hypothesis testing using fitted lines and Factors that are not of interest as described by Law et al 2021, page 21-29.

The questions are, (1) "What genes have a similar slope across all 3 populations in response to dose and not influenced by lane?" (2) "What genes have different slopes across all 3 populations in response to dose and not influenced by lane?"

> library(limma)
> class (Data$samples$Pop)
[1] "factor"
> class (Data$samples$lane)
[1] "factor"
> class (Data$samples$dose)
[1] "numeric"

> Design <- model.matrix(~ 0 + Pop + Pop:dose + lane , data = Data$samples)
> colnames (MHEE.design) 
[1] "Pop1"  "Pop2" "Pop3" "lane2" "lane3" "lane4" "lane5" "Pop1:dose" "Pop2:dose" "Pop3:dose"

> colnames(Design) <- gsub ("Pop1", "Pop1.Intercept", colnames(Design))
> colnames(Design) <- gsub ("Pop2", "Pop2.Intercept", colnames(Design))
> colnames(Design) <- gsub ("Pop3", "Pop3.Intercept", colnames(Design))
> colnames(Design) <- gsub ("Pop1:dose", "Pop1.Slope", colnames(Design))
> colnames(Design) <- gsub ("Pop2:dose", "Pop2.Slope", colnames(Design))
> colnames(Design) <- gsub ("Pop3:dose", "Pop3.Slope", colnames(Design))
> colnames(Design)

[1] "Pop1.Intercept" "Pop2.Intercept" "Pop3.Intercept" "lane2" "lane3" "lane4" "lane5" "Pop1.Slope" "Pop2.Slope" "Pop3.Slope"

> v <- voom(Data, Design)
> vfit <- lmFit(v, Design)
#"vfit <- contrasts.fit(vfit, contrasts=my.contrasts)"  # if needed
> efit <- eBayes(vfit)

DE genes in columns "Pop1.Slope" "Pop2.Slope" "Pop3.Slope" will be only in reference to "lane 1" so the output will need to be evaluated with topTable() or contrasts before it can be interpreted. However, I haven not been able to find examples where 3 factors have been compared (not contrasted) in one limma voom program, especially in fitted line examples with factors that aren't of interest.

To answer question (1), what genes have the same slope across all 3 populations I have tried topTable(). However, these results clearly show us an average slope between the 3 factors. I do not believe that this answer the question.

topTable(efit, coef = c(8:10), number = Inf, adjust.method = "BH")

Another option that used in the limma handbook would be to look at the overlap in the ven diagram and look at DE genes for all 3 populations, however this approach does not block by all lanes and so I don't it can't be used before the lane effects have been accounted for.

vennDiagram(dt[,8:10], circle.col=c("red", "orange", "turquoise"))
de.common <- which(dt[,8]!=0 & dt[,9]!=0 & dt[,10]!=0)

Another approach that is commonly recommended is making contrasts. However contrasts look for differences between 2 factors. I could see how this could be used help answer question (2), genes where populations respond differently by slope, however the model doesn't answer the primary question (1).

And as a side question, if I wished to look for relationships that were logarithmic would this be the correct code for the model.matrix? It looks correct, but I still feel the need to ask.

> Design <- model.matrix(~ 0 + Pop + Pop:log(dose, 2) + lane , data = Data$samples)

I appreciate any help that can be offered.

> sessionInfo( )
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] gplots_3.1.1       ggplot2_3.3.5      RColorBrewer_1.1-2 Glimma_2.2.0       pheatmap_1.0.12    edgeR_3.34.0       limma_3.48.3      

loaded via a namespace (and not attached):
 [1] MatrixGenerics_1.4.2        Biobase_2.52.0              httr_1.4.2                  jsonlite_1.7.2              bit64_4.0.5                
 [6] splines_4.1.0               gtools_3.9.2                assertthat_0.2.1            BiocManager_1.30.16         stats4_4.1.0               
[11] blob_1.2.2                  GenomeInfoDbData_1.2.6      pillar_1.6.2                RSQLite_2.2.8               lattice_0.20-44            
[16] glue_1.4.2                  digest_0.6.27               GenomicRanges_1.44.0        XVector_0.32.0              colorspace_2.0-2           
[21] htmltools_0.5.1.1           Matrix_1.3-3                DESeq2_1.32.0               XML_3.99-0.7                pkgconfig_2.0.3            
[26] genefilter_1.74.0           zlibbioc_1.38.0             purrr_0.3.4                 xtable_1.8-4                scales_1.1.1               
[31] BiocParallel_1.26.2         tibble_3.1.3                annotate_1.70.0             KEGGREST_1.32.0             generics_0.1.0             
[36] IRanges_2.26.0              ellipsis_0.3.2              withr_2.4.2                 cachem_1.0.6                SummarizedExperiment_1.22.0
[41] BiocGenerics_0.38.0         survival_3.2-11             magrittr_2.0.1              crayon_1.4.1                memoise_2.0.0              
[46] fansi_0.5.0                 tools_4.1.0                 lifecycle_1.0.0             matrixStats_0.60.1          S4Vectors_0.30.0           
[51] munsell_0.5.0               locfit_1.5-9.4              DelayedArray_0.18.0         AnnotationDbi_1.54.1        Biostrings_2.60.2          
[56] compiler_4.1.0              GenomeInfoDb_1.28.1         caTools_1.18.2              rlang_0.4.11                grid_4.1.0                 
[61] RCurl_1.98-1.4              rstudioapi_0.13             htmlwidgets_1.5.3           bitops_1.0-7                gtable_0.3.0               
[66] DBI_1.1.1                   R6_2.5.1                    dplyr_1.0.7                 fastmap_1.1.0               bit_4.0.4                  
[71] utf8_1.2.2                  KernSmooth_2.23-20          parallel_4.1.0              Rcpp_1.0.7                  vctrs_0.3.8                
[76] geneplotter_1.70.0          png_0.1-7                   tidyselect_1.1.1
topTable voom contrasts • 193 views
ADD COMMENT
0
Entering edit mode
@gordon-smyth
Last seen 10 minutes ago
WEHI, Melbourne, Australia

The article you are reading is probably this one:

Law CW, Zeglinski K, Dong X, Alhamdoosh M, Smyth GK, Ritchie ME (2020). A guide to creating design matrices for gene expression experiments. F1000Research 9, 1444.

To compare three or more populations, just take contrasts between them -- see for example Section 9.3 of the limma User's Guide. To compare three slopes you need two contrasts. In your case

Design <- model.matrix(~ 0 + Pop + Pop:dose + lane , data = Data$samples)
Pop2SlopevsPop1Slope <- c(0,0,0,0,0,0,0,-1,1,0)
Pop3SlopevsPop1Slope <- c(0,0,0,0,0,0,0,-1,0,1)
ContMatrix <- cbind(Pop2SlopevsPop1Slope, Pop3SlopevsPop1Slope)

When you run topTable, you will get F-tests on 2 df for differences between the slopes. Genes with different slopes will have small p-values. Genes with similar slopes will have large p-values. Any two contrasts between the slopes will give the same F-tests.

Please be careful about renaming the columns of the design matrix. Your gsub code contains errors and will confuse the results.

ADD COMMENT
0
Entering edit mode

Thank you for your help Gordon,

Yes, that is the paper. I have found it to be another great source of information.

The information you just posted certainly helps with the second part of the second question of the study; (2) what genes have different slopes between pops when considering lane effects.

If you are willing, we are still interested in help on how to setup the model for first question (1) what genes have the same slope when lane effects are considered.

If 2 models were used, one model to answer each question, then can the design below be used to answer the first question?

> Design <- model.matrix(~ 0 + log(dose, 2) + Pop + lane , data = Data$samples) 
> colnames(Design)

[1] "dose" "Pop1" "Pop2" "Pop3" "lane2" "lane3" "lane4" "lane5"

> v <- voom(Data, Design)
> vfit <- lmFit(v, Design)
> efit <- eBayes(vfit)
> dt <- decideTests(efit)
> vennDiagram(dt[,1:4], circle.col=c("red", "orange", "turquoise", "blue"))
> de.common <- which(dt[,1]!=0 & dt[,2]!=0 & dt[,3]!=0 & dt[,4]!=0)
> results <- row.names (dt [de.common,])

I'l interpret this as all genes that have a significant slope (not equal to 0) for all 3 populations, but it doesn't let me know if the slopes can be considered similar.

Thanks for the warning about gsub() code. I used gsub() in this example for clarity on interpreting the results. Does your warning mean that I am not naming the columns correctly or were you noting the typo in the example's output (which I just edited/removed).

ADD REPLY

Login before adding your answer.

Traffic: 396 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6