remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline
1
0
Entering edit mode
anpham ▴ 60
@anpham-7402
Last seen 7.2 years ago

I'd like to remove the genes on the X and Y chromosomes from my human RNA-seq data before doing differential analysis using DESeq2. I've looked through the RNA-seq Workflow and DESeq2 manuals but didn't see this as an option. Any help in performing this step and still using the DESeq2 or RNA-seq Workflow pipeline would be much appreciated. Thanks.

deseq2 rnaseq • 5.6k views
ADD COMMENT
0
Entering edit mode

Thank you for your response. I followed the instruction and got an error message and not sure how to fix it. I used the UCSC hg19 to make the count matrix using GenomicAlignments. I copied the codes and the error message here. (By the way, I named my dds "dds1"). Thanks.

 

csvfile1 <- "table1.csv"
(sampleTable1 <- read.csv(csvfile1, row.names=1))

filenames1 <- file.path(paste0(sampleTable1$Run, ".bam"))

library("Rsamtools")
bamfiles1 <- BamFileList(filenames1, yieldSize=2000000)

biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
source("http://bioconductor.org/biocLite.R")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

(genes <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))

library("GenomicAlignments")
se1 <- summarizeOverlaps(features=genes, reads=bamfiles1, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)

 

library("DESeq2")
library("BiocParallel")
register(MulticoreParam(4))
dds1 <- DESeqDataSet(se, design = ~ gender + agecat3 + sfrace2 + RNAbatch + Site + PTSD_1mo_k)
dds1$PTSD_1mo_k <- relevel(dds1$PTSD_1mo_k, "control")

## subsetting dds1
seqnames(rowData(dds1))
dds1.sub <- dds1[ ! seqnames(rowData(dds1)) %in% c("chrX", "chrY"), ]

Error in x[i, , drop = FALSE] : invalid subscript type 'list'

 

ADD REPLY
0
Entering edit mode

I forgot, I was thinking GRanges but you have a GRangesList.

[edit] see Martin's answer above.

ADD REPLY
0
Entering edit mode

Hi Mike,

I tried to do the same but it gave me this error message:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("X","Y"), ]
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘NSBS’ for signature ‘"CompressedRleList"’

ADD REPLY
1
Entering edit mode

See Martins post below

ADD REPLY
0
Entering edit mode

thanks much mike, I tried it already but it doesn't seem working as the number of DE genes are still the same as if X and Y genes were not removed! .. I don't know what to do in order to make this correctly.

thanks

ADD REPLY
1
Entering edit mode

Are you using all()? I don’t see it in your code. Martin’s code shows how to properly subset if you have a GRangesList.

ADD REPLY
0
Entering edit mode
> seqnames(dds) %in% c("X", "Y")​
RleList of length 52636​

> !seqnames(dds) %in% c("X", "Y")
RleList of length 52636
ADD REPLY
0
Entering edit mode
@mikelove
Last seen 3 days ago
United States

It's perfectly valid (statistically) to do this, but you might have to do some legwork. How did you make the count matrix? If you used summarizeOverlaps, the chromosomes are right at hand [edit: see Martin's comment for GRangesList]

seqnames(rowData(dds))

Or for the most recent release of Bioconductor (3.1):

seqnames(rowRanges(dds))

Then just subset the dds:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("chrX","chrY"), ]
ADD COMMENT
2
Entering edit mode

SummarizedExperiment (and hence DESeqDataSet) 'knows' that it has ranges as rows, so seqnames(dds) is enough.  For a GRangesList, I guess the assumption is that that all element ranges have the same chromosome, so dds[ all(!seqnames(dds) %in% c("chrX", "chrY")), ]. Unpacking a bit, seqnames() on a GRangesList returns an RleList of seqnames, one element of the list for each element of the GRangesList. %in% returns an RleList of logical values, again retaining the geometry. And all() is applied to each element of the list, returning a logical vector of the same length as the original GRangesList.

ADD REPLY

Login before adding your answer.

Traffic: 388 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6