I'd like to remove the genes on the X and Y chromosomes from my human RNA-seq data before doing differential analysis using DESeq2. I've looked through the RNA-seq Workflow and DESeq2 manuals but didn't see this as an option. Any help in performing this step and still using the DESeq2 or RNA-seq Workflow pipeline would be much appreciated. Thanks.
It's perfectly valid to do this, but you might have to do some legwork. How did you make the count matrix? If you used summarizeOverlaps, the chromosomes are right at hand [edit: see Martin's comment for GRangesList]:
seqnames(rowData(dds))
Or, for the most recent release of Bioconductor (3.1):
seqnames(rowRanges(dds))
Then just subset the dds:
dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("chrX","chrY"), ]
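As a minimal sketch of that subsetting step, here is a toy example on a plain RangedSummarizedExperiment, the class that DESeqDataSet extends, so the same row indexing should carry over. The gene names, coordinates, and counts are made up for illustration, and it assumes one GRanges entry per row (not a GRangesList) and a Bioconductor release that provides rowRanges().

```r
library(GenomicRanges)
library(SummarizedExperiment)

# Toy data (hypothetical): four genes, one each on chrX and chrY.
gene_ranges <- GRanges(
  seqnames = c("chr1", "chrX", "chr2", "chrY"),
  ranges   = IRanges(start = c(100, 200, 300, 400), width = 50)
)
names(gene_ranges) <- c("geneA", "geneB", "geneC", "geneD")

counts <- matrix(1:8, nrow = 4,
                 dimnames = list(names(gene_ranges), c("s1", "s2")))

se <- SummarizedExperiment(assays = list(counts = counts),
                           rowRanges = gene_ranges)

# seqnames() gives one chromosome per row here; convert it to a plain
# character vector so %in% returns an ordinary logical vector.
on_sex_chrom <- as.character(seqnames(rowRanges(se))) %in% c("chrX", "chrY")

# Keep only the rows that are not on chrX or chrY.
se_autosomal <- se[!on_sex_chrom, ]
rownames(se_autosomal)
```

Building the logical vector first also makes it easy to check with table(on_sex_chrom) how many genes will be dropped before subsetting.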
Thank you for your response. I followed the instructions and got an error message that I'm not sure how to fix. I used UCSC hg19 to make the count matrix with GenomicAlignments. I have copied my code and the error message below. (By the way, I named my dds "dds1".) Thanks.
library(Rsamtools)
library(GenomicAlignments)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
library(DESeq2)
csvfile1 <- "table1.csv"
(sampleTable1 <- read.csv(csvfile1, row.names=1))
filenames1 <- file.path(paste0(sampleTable1$Run, ".bam"))
bamfiles1 <- BamFileList(filenames1, yieldSize=2000000)
(genes <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))
se1 <- summarizeOverlaps(features=genes, reads=bamfiles1, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)
dds1 <- DESeqDataSet(se1, design = ~ gender + agecat3 + sfrace2 + RNAbatch + Site + PTSD_1mo_k)
dds1$PTSD_1mo_k <- relevel(dds1$PTSD_1mo_k, "control")
## subsetting dds1
dds1.sub <- dds1[ ! seqnames(rowData(dds1)) %in% c("chrX", "chrY"), ]
Error in x[i, , drop = FALSE] : invalid subscript type 'list'
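A likely explanation, in line with the GRangesList remark in the answer: transcriptsBy() returns a GRangesList, so each row holds several transcript ranges and seqnames() yields a list-like object with one entry per gene rather than one chromosome per row. The sketch below shows one way to reduce that to a single logical value per gene. The toy object genes_toy and its gene names are hypothetical stand-ins, and this is only one possible approach, not necessarily the one suggested in the thread.

```r
library(GenomicRanges)

# Toy stand-in (hypothetical) for transcriptsBy() output: one GRanges per gene.
genes_toy <- GRangesList(
  geneA = GRanges("chr1", IRanges(c(100, 300), width = 50)),
  geneB = GRanges("chrX", IRanges(200, width = 50)),
  geneC = GRanges("chrY", IRanges(400, width = 50))
)

# Flatten to one range per transcript and test each range.
flat <- unlist(genes_toy, use.names = FALSE)
range_on_sex <- as.character(seqnames(flat)) %in% c("chrX", "chrY")

# Record which gene (list element) each flattened range came from.
gene_of_range <- rep(seq_along(genes_toy), elementNROWS(genes_toy))

# Flag a gene if any of its ranges lies on chrX or chrY.
drop_gene <- seq_along(genes_toy) %in% gene_of_range[range_on_sex]
names(genes_toy)[!drop_gene]
```

With the real data, genes_toy would be replaced by the row ranges of dds1, and the resulting plain logical vector could then be used as dds1[!drop_gene, ].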