Search
Question: remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline
0
gravatar for anpham
3.3 years ago by
anpham0
anpham0 wrote:

I'd like to remove the genes on the X and Y chromosomes from my human RNA-seq data before doing differential analysis using DESeq2. I've looked through the RNA-seq Workflow and DESeq2 manuals but didn't see this as an option. Any help in performing this step and still using the DESeq2 or RNA-seq Workflow pipeline would be much appreciated. Thanks.

ADD COMMENTlink modified 6 weeks ago by ta_awwad10 • written 3.3 years ago by anpham0

Hi Mike,

I tried to do the same but it gave me this error message:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("X","Y"), ]
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘NSBS’ for signature ‘"CompressedRleList"’

ADD REPLYlink written 6 weeks ago by ta_awwad10
1

See Martins post below

ADD REPLYlink written 6 weeks ago by Michael Love19k

thanks much mike, I tried it already but it doesn't seem working as the number of DE genes are still the same as if X and Y genes were not removed! .. I don't know what to do in order to make this correctly.

thanks

ADD REPLYlink written 6 weeks ago by ta_awwad10
1

Are you using all()? I don’t see it in your code. Martin’s code shows how to properly subset if you have a GRangesList.

ADD REPLYlink written 6 weeks ago by Michael Love19k
> seqnames(dds) %in% c("X", "Y")​
RleList of length 52636​

> !seqnames(dds) %in% c("X", "Y")
RleList of length 52636
ADD REPLYlink written 6 weeks ago by ta_awwad10
0
gravatar for Michael Love
3.3 years ago by
Michael Love19k
United States
Michael Love19k wrote:

It's perfectly valid to do this, but you might have to do some legwork. How did you make the count matrix? If you used summarizeOverlaps, the chromosomes are right at hand [edit: see Martin's comment for GRangesList]

seqnames(rowData(dds))

Or for the most recent release of Bioconductor (3.1):

seqnames(rowRanges(dds))

Then just subset the dds:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("chrX","chrY"), ]
ADD COMMENTlink modified 3.3 years ago • written 3.3 years ago by Michael Love19k
1

SummarizedExperiment (and hence DESeqDataSet) 'knows' that it has ranges as rows, so seqnames(dds) is enough.  For a GRangesList, I guess the assumption is that that all element ranges have the same chromosome, so dds[ all(!seqnames(dds) %in% c("chrX", "chrY")), ]. Unpacking a bit, seqnames() on a GRangesList returns an RleList of seqnames, one element of the list for each element of the GRangesList. %in% returns an RleList of logical values, again retaining the geometry. And all() is applied to each element of the list, returning a logical vector of the same length as the original GRangesList.

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by Martin Morgan ♦♦ 22k
0
gravatar for anpham
3.3 years ago by
anpham0
anpham0 wrote:

Thank you for your response. I followed the instruction and got an error message and not sure how to fix it. I used the UCSC hg19 to make the count matrix using GenomicAlignments. I copied the codes and the error message here. (By the way, I named my dds "dds1"). Thanks.

 

csvfile1 <- "table1.csv"
(sampleTable1 <- read.csv(csvfile1, row.names=1))

filenames1 <- file.path(paste0(sampleTable1$Run, ".bam"))

library("Rsamtools")
bamfiles1 <- BamFileList(filenames1, yieldSize=2000000)

biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
source("http://bioconductor.org/biocLite.R")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

(genes <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))

library("GenomicAlignments")
se1 <- summarizeOverlaps(features=genes, reads=bamfiles1, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)

 

library("DESeq2")
library("BiocParallel")
register(MulticoreParam(4))
dds1 <- DESeqDataSet(se, design = ~ gender + agecat3 + sfrace2 + RNAbatch + Site + PTSD_1mo_k)
dds1$PTSD_1mo_k <- relevel(dds1$PTSD_1mo_k, "control")

## subsetting dds1
seqnames(rowData(dds1))
dds1.sub <- dds1[ ! seqnames(rowData(dds1)) %in% c("chrX", "chrY"), ]

Error in x[i, , drop = FALSE] : invalid subscript type 'list'

 

ADD COMMENTlink written 3.3 years ago by anpham0

I forgot, I was thinking GRanges but you have a GRangesList.

[edit] see Martin's answer above.

ADD REPLYlink modified 3.3 years ago • written 3.3 years ago by Michael Love19k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 351 users visited in the last hour