Dear all kind helpers,
First of all I would like to apologize you all, if I assume anything wrong or too annoying to tolerate here. Since I'm a basic immunologist who start to learn RNAseq, I hope you understand how far I can go with confusion and misunderstanding.
I have read 'IHW' package and try to apply it for uninformative gene filtering. With the same 'airway' data use in the package introduction, I modify the script a bit in order to acquire the best α value that renders the largest number of significant genes at FDR<=0.1. The script is as following.
library("IHW")
library("DESeq2")
library("dplyr")
data("airway", package = "airway")
dds <- DESeqDataSet(se = airway, design = ~ cell + dex) %>% DESeq
deRes <- as.data.frame(results(dds))
sel <- vector(mode='integer')
for(i in seq(from=0, to=0.5, by=0.01)) {
ihwRes <- ihw(pvalue ~ baseMean, data = deRes, alpha = i)
temp <- sum(adj_pvalues(ihwRes) <= 0.1, na.rm = TRUE)
sel <- c(sel, temp)
}
names(sel) <- seq(from=0, to=0.5, by=0.01)
sel_alpha <- as.numeric(names(which.max(sel))) #At this point I acquire alpha value which gave me maximal gene number with FDR<=0.1
ihwRes <- ihw(pvalue ~ baseMean, data = deRes, alpha = sel_alpha)
adj_pvalues(ihwRes) <= sel_alpha
gene_ind <- which(adj_pvalues(ihwRes) <= sel_alpha) # At this point I acquire filtered gene indices.
filtered_count_table <- counts(dds)[gene_ind,] # Now I acquired filtered count table.
sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.1 LTS
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=th_TH.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=th_TH.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=th_TH.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=th_TH.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] parallel stats4 stats graphics grDevices utils datasets
[8] methods base
other attached packages:
[1] IHW_1.2.0 dplyr_0.5.0
[3] DESeq2_1.14.1 SummarizedExperiment_1.4.0
[5] Biobase_2.34.0 GenomicRanges_1.26.2
[7] GenomeInfoDb_1.10.2 IRanges_2.8.1
[9] S4Vectors_0.12.1 BiocGenerics_0.20.0
My questions are as following:
- According to the above script, would you please clarify me if I acquire my filtered count table correctly or not?
- In case that I plan to apply 'IHW' package with 'edgeR', 'limma' and 'EBSeq', what is the suitable covariate for each package and how to acquire them?
With great respects,
Kaj Chokeshaiusaha
Thank you very much for your great and very helpful answer!!