Question: remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline
0
4.2 years ago by
anpham0
anpham0 wrote:

I'd like to remove the genes on the X and Y chromosomes from my human RNA-seq data before doing differential analysis using DESeq2. I've looked through the RNA-seq Workflow and DESeq2 manuals but didn't see this as an option. Any help in performing this step and still using the DESeq2 or RNA-seq Workflow pipeline would be much appreciated. Thanks.

rnaseq deseq2 • 2.1k views
modified 12 months ago by ta_awwad10 • written 4.2 years ago by anpham0

Thank you for your response. I followed the instruction and got an error message and not sure how to fix it. I used the UCSC hg19 to make the count matrix using GenomicAlignments. I copied the codes and the error message here. (By the way, I named my dds "dds1"). Thanks.

csvfile1 <- "table1.csv"

filenames1 <- file.path(paste0(sampleTable1$Run, ".bam")) library("Rsamtools") bamfiles1 <- BamFileList(filenames1, yieldSize=2000000) biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene") source("http://bioconductor.org/biocLite.R") library(TxDb.Hsapiens.UCSC.hg19.knownGene) (genes <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene")) library("GenomicAlignments") se1 <- summarizeOverlaps(features=genes, reads=bamfiles1, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE) library("DESeq2") library("BiocParallel") register(MulticoreParam(4)) dds1 <- DESeqDataSet(se, design = ~ gender + agecat3 + sfrace2 + RNAbatch + Site + PTSD_1mo_k) dds1$PTSD_1mo_k <- relevel(dds1\$PTSD_1mo_k, "control")

## subsetting dds1
seqnames(rowData(dds1))
dds1.sub <- dds1[ ! seqnames(rowData(dds1)) %in% c("chrX", "chrY"), ]

Error in x[i, , drop = FALSE] : invalid subscript type 'list'

I forgot, I was thinking GRanges but you have a GRangesList.

Hi Mike,

I tried to do the same but it gave me this error message:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("X","Y"), ]
Error in (function (classes, fdef, mtable)  :
unable to find an inherited method for function ‘NSBS’ for signature ‘"CompressedRleList"’

1

See Martins post below

thanks much mike, I tried it already but it doesn't seem working as the number of DE genes are still the same as if X and Y genes were not removed! .. I don't know what to do in order to make this correctly.

thanks

1

Are you using all()? I don’t see it in your code. Martin’s code shows how to properly subset if you have a GRangesList.

> seqnames(dds) %in% c("X", "Y")​
RleList of length 52636​

> !seqnames(dds) %in% c("X", "Y")
RleList of length 52636
Answer: remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline
0
4.2 years ago by
Michael Love24k
United States
Michael Love24k wrote:

It's perfectly valid (statistically) to do this, but you might have to do some legwork. How did you make the count matrix? If you used summarizeOverlaps, the chromosomes are right at hand [edit: see Martin's comment for GRangesList]

seqnames(rowData(dds))

Or for the most recent release of Bioconductor (3.1):

seqnames(rowRanges(dds))

Then just subset the dds:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("chrX","chrY"), ]
1

SummarizedExperiment (and hence DESeqDataSet) 'knows' that it has ranges as rows, so seqnames(dds) is enough.  For a GRangesList, I guess the assumption is that that all element ranges have the same chromosome, so dds[ all(!seqnames(dds) %in% c("chrX", "chrY")), ]. Unpacking a bit, seqnames() on a GRangesList returns an RleList of seqnames, one element of the list for each element of the GRangesList. %in% returns an RleList of logical values, again retaining the geometry. And all() is applied to each element of the list, returning a logical vector of the same length as the original GRangesList.