Question

How to use 'IHW' package to filter genes?

0

Entering edit mode

tofukaj • 0

@tofukaj-9928

Last seen 3.8 years ago

Dear all kind helpers,

First of all I would like to apologize you all, if I assume anything wrong or too annoying to tolerate here. Since I'm a basic immunologist who start to learn RNAseq, I hope you understand how far I can go with confusion and misunderstanding.

I have read 'IHW' package and try to apply it for uninformative gene filtering. With the same 'airway' data use in the package introduction, I modify the script a bit in order to acquire the best α value that renders the largest number of significant genes at FDR<=0.1. The script is as following.

library("IHW")
library("DESeq2")
library("dplyr")
data("airway", package = "airway")

dds <- DESeqDataSet(se = airway, design = ~ cell + dex) %>% DESeq
deRes <- as.data.frame(results(dds))

sel <- vector(mode='integer')
for(i in seq(from=0, to=0.5, by=0.01)) {
    ihwRes <- ihw(pvalue ~ baseMean, data = deRes, alpha = i)
    temp <- sum(adj_pvalues(ihwRes) <= 0.1, na.rm = TRUE)
    sel <- c(sel, temp)
    }
names(sel) <- seq(from=0, to=0.5, by=0.01)
sel_alpha <- as.numeric(names(which.max(sel))) #At this point I acquire alpha value which gave me maximal gene number with FDR<=0.1
ihwRes <- ihw(pvalue ~ baseMean, data = deRes, alpha = sel_alpha)
adj_pvalues(ihwRes) <= sel_alpha

gene_ind <- which(adj_pvalues(ihwRes) <= sel_alpha) # At this point I acquire filtered gene indices.

filtered_count_table <- counts(dds)[gene_ind,] # Now I acquired filtered count table.

sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS

locale:
[1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
[3] LC_TIME=th_TH.UTF-8        LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=th_TH.UTF-8    LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=th_TH.UTF-8       LC_NAME=C
[9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=th_TH.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] IHW_1.2.0                  dplyr_0.5.0
[3] DESeq2_1.14.1              SummarizedExperiment_1.4.0
[5] Biobase_2.34.0             GenomicRanges_1.26.2
[7] GenomeInfoDb_1.10.2        IRanges_2.8.1
[9] S4Vectors_0.12.1           BiocGenerics_0.20.0

My questions are as following:

According to the above script, would you please clarify me if I acquire my filtered count table correctly or not?
In case that I plan to apply 'IHW' package with 'edgeR', 'limma' and 'EBSeq', what is the suitable covariate for each package and how to acquire them?

With great respects,

Kaj Chokeshaiusaha

ihw deseq2 edger limma voom • 2.0k views

ADD COMMENT • link updated 7.2 years ago by Bernd Klaus ▴ 610 • written 7.2 years ago by tofukaj • 0

score 1 · Answer 1 · 2017-01-31

Hi Kaj,

1.) No, IHW is not meant to be used with varying alpha parameters. This might inflate the FDR control, which is precisely what we want to avoid with IHW, so don't do this ;).

However, your lists are "monotone" in theory, i.e. all genes that are called significantly differentially expressed at an FDR bound (alpha) of 0.05 will also be called differentially expressed at alpha 0.1. However, as IHW learns the weights from the data, this might not always be the case in practice.

So basically, you should set a desired alpha and then run IHW.

2.) Suitable covariates are not dependent on the algorithm used but rather the experimental data at hand. Thus, you can always use the overall mean of the counts across samples for each gene in an RNA-Seq experiment with IHW.

Assuming your counts are in a matrix of the same name this would be an IHW call like:

ihw(pvalues = <pvalues from your method of choice>, covariates = rowMeans(counts), alpha = 0.1)