Question: Genes that have NOT changed
0
7 months ago by
evocanres0 wrote:

Hi

I am trying to follow through this post as an exercise to find genes that are not differentially expressed. Few things in this post does not work [anymore]. Can somehow help ?

1

Specifically,

1. DESeqSummarizedExperimentFromMatrix. #does not work
#Instead I use the DESeqDataSetFromMatrix to load the data and I get this error.
#This is the warning I get.

Note: levels of factors in the design contain characters other than
letters, numbers, '_' and '.'. It is recommended (but not required) to use
only letters, numbers, and delimiters '_' or '.', as these are safe characters
for column names in R. [This is a message, not an warning or error]


More Importantly, the following command yeilds null.

betaSE <- mcols(rowData(dse))$SE_conditionuntreated  deseq2 • 198 views ADD COMMENTlink modified 7 months ago • written 7 months ago by evocanres0 Have you copied all code from the old post verbatim (without *any* changes), or have you adapted it to your data? If the latter, please post the *full* code. ADD REPLYlink written 7 months ago by Simon Anders3.5k I tried to run the exact code {r} > # Let's use the example data from the pasilla package > library( DESeq2 ) > library( pasilla ) > data( "pasillaGenes" ) > > # Create a DESeq2 data object from the pasilla data > dse <- DESeqSummarizedExperimentFromMatrix( + counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], + ~ type + condition ) Error in DESeqSummarizedExperimentFromMatrix(counts(pasillaGenes), pData(pasillaGenes)[, : could not find function "DESeqSummarizedExperimentFromMatrix" > > # Perform a standard DESeq2 analysis > dse <- DESeq(dse) Error in is(object, "DESeqDataSet") : object 'dse' not found > > # The log2 fold changes are found here > beta <- results(dse)$log2FoldChange Error in is(object, "DESeqDataSet") : object 'dse' not found > > # Just to make the following clearer, I should point out that the > # "results" accessor is just a short-cut for this access to the rowData: > all( beta == mcols(rowData(dse))$conditionuntreated, na.rm=TRUE ) Error in rowData(dse) : object 'dse' not found > # (returns TRUE) > > # The log fold change estimates all come with standard error information > # which we find in the rowData (maybe we should copy this to the > # 'results', too) > betaSE <- mcols(rowData(dse))$SE_conditionuntreated Error in rowData(dse) : object 'dse' not found > > # Internally, the Wald test is implemented as a simple two-sided > # z test of beta/betaSE. Two demonstrate this, we to the test > # manually and compare > pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) Error in abs(beta) : non-numeric argument to mathematical function > all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) Error: object 'pvalDE' not found > # (returns TRUE) > > # This was the test for DE, of course, i.e., small pvalDE means that > # the gene's expression change (the true value of beta) is not zero > > # What we want is the opposite, namely find gene, for which abs(beta) > # is smaller than some threshold, theta > theta <- .3 > > # So, we do our two one-sided tests. For a one-sided z test, we > # simply use tail probabilities from the normal distribution. > > # First, the test of H0_A: true_beta > thr > pA <- pnorm( beta, thr, betaSE, lower.tail=TRUE ) Error in pnorm(beta, thr, betaSE, lower.tail = TRUE) : object 'thr' not found > > # Next, the test of H0_B: true_beta < -thr > pB <- pnorm( beta, -thr, betaSE, lower.tail=FALSE ) Error in pnorm(beta, -thr, betaSE, lower.tail = FALSE) : object 'thr' not found > > # The overall p value is the maximum, because we want to reject H0_A > # and H0_B simultaneously > pvalTOST <- pmax( pA, pB ) Error in eval(quote(list(...)), env) : object 'pA' not found > > > # Let's adjust our two p values with BH: > sigDE <- p.adjust( pvalDE, "BH" ) < .1 Error in p.adjust(pvalDE, "BH") : object 'pvalDE' not found > sigSmall <- p.adjust( pvalTOST, "BH" ) < .1 Error in p.adjust(pvalTOST, "BH") : object 'pvalTOST' not found > > # And make an MA plot, with sigDE in red and sigSmall in green > plot( + rowMeans( counts(dse,normalized=TRUE) ), beta, + log="x", pch=20, cex=.2, + col = 1 + sigDE + 2*sigSmall ) Error in counts(dse, normalized = TRUE) : object 'dse' not found > # Plot is attached. >  So I changed few things. 1. {r} > dse <- DESeqDataSetFromMatrix( + counts(pasillaGenes), pData(pasillaGenes)[,c("condition","type")], + ~ type + condition ) Note: levels of factors in the design contain characters other than letters, numbers, '_' and '.'. It is recommended (but not required) to use only letters, numbers, and delimiters '_' or '.', as these are safe characters for column names in R. [This is a message, not an warning or error]  This runs okay, till I get to betaSE <- mcols(rowData(dse))$conditionuntreated

2. If I can replace that with

betaSE <- rowData(dse)$SE_condition_untreated_vs_treated because all.equal(mcols(dse), rowData(dse)) gives >TRUE > pvalDE <- 2 * pnorm( abs( beta ), sd = betaSE, lower.tail=FALSE ) > all( abs( pvalDE - results(dse)$pvalue ) < 1e-15, na.rm=TRUE ) [1] TRUE

3. I belive the thr is a typo, so  theta <- .3 # So, we do our two one-sided tests. For a one-sided z test, we # simply use tail probabilities from the normal distribution. # First, the test of H0_A: true_beta > thr pA <- pnorm( beta, theta, betaSE, lower.tail=TRUE ) # Next, the test of H0_B: true_beta < -thr pB <- pnorm( beta, -theta, betaSE, lower.tail=FALSE ) # The overall p value is the maximum, because we want to reject H0_A # and H0_B simultaneously pvalTOST <- pmax( pA, pB ) # Let's adjust our two p values with BH: sigDE <- p.adjust( pvalDE, "BH" ) < .1 sigSmall <- p.adjust( pvalTOST, "BH" ) < .1

4. the number of genes with no difference

{r} > sum(!is.na(sigSmall)) [1] 11836 #However, sigSmallv <- p.adjust( pvalTOST, "BH" ) min(na.omit(sigSmallv) [1] 0.301

If the smallest value is 0.301 then, it would not give 11836 significantly equivalent genes. Am I understanding it wrong ?

I am trying to put the code that was run :) running into formatting issues

Answer: Genes that have NOT changed
0
7 months ago by
Michael Love24k
United States
Michael Love24k wrote:

This code goes back to a post from 6 years ago (2013), and since then we published the DESeq2 paper (2014) with a section about this type of analysis and also have an easy to use function which does this for you. It is highlighted in the vignette, look here:

https://www.bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#tests-of-log2-fold-change-above-or-below-a-threshold

For more details on what is going on, read the DESeq2 paper where it discusses LFC thresholds.

For the record, here is the 2012 post where the code comes from:

A: Testing for no change in RNA-seq data?

Thank you Micheal, I did read it would be incorporated in the package, but didn't read the vignette carefully I guess.

Am I correct in reading that "lessAbs` - |β| < x - p values are the maximum of the upper and lower tests " is the replacement for the that code for TOST (test for equivalency)?

This is answered in the DESeq2 paper, we describe the methods exactly (see Methods).

resLA <- results(dds, lfcThreshold=.5, altHypothesis="lessAbs")  sum(!is.na(resLA\$padj) < 0.1)

can we say, this is the number of genes which are NOT different within logfoldchange of 0.5 ?