Question: remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline
0
gravatar for anpham
4.2 years ago by
anpham0
anpham0 wrote:

I'd like to remove the genes on the X and Y chromosomes from my human RNA-seq data before doing differential analysis using DESeq2. I've looked through the RNA-seq Workflow and DESeq2 manuals but didn't see this as an option. Any help in performing this step and still using the DESeq2 or RNA-seq Workflow pipeline would be much appreciated. Thanks.

rnaseq deseq2 • 2.1k views
ADD COMMENTlink modified 12 months ago by ta_awwad10 • written 4.2 years ago by anpham0

Thank you for your response. I followed the instruction and got an error message and not sure how to fix it. I used the UCSC hg19 to make the count matrix using GenomicAlignments. I copied the codes and the error message here. (By the way, I named my dds "dds1"). Thanks.

 

csvfile1 <- "table1.csv"
(sampleTable1 <- read.csv(csvfile1, row.names=1))

filenames1 <- file.path(paste0(sampleTable1$Run, ".bam"))

library("Rsamtools")
bamfiles1 <- BamFileList(filenames1, yieldSize=2000000)

biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
source("http://bioconductor.org/biocLite.R")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

(genes <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))

library("GenomicAlignments")
se1 <- summarizeOverlaps(features=genes, reads=bamfiles1, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)

 

library("DESeq2")
library("BiocParallel")
register(MulticoreParam(4))
dds1 <- DESeqDataSet(se, design = ~ gender + agecat3 + sfrace2 + RNAbatch + Site + PTSD_1mo_k)
dds1$PTSD_1mo_k <- relevel(dds1$PTSD_1mo_k, "control")

## subsetting dds1
seqnames(rowData(dds1))
dds1.sub <- dds1[ ! seqnames(rowData(dds1)) %in% c("chrX", "chrY"), ]

Error in x[i, , drop = FALSE] : invalid subscript type 'list'

 

ADD REPLYlink written 4.2 years ago by anpham0

I forgot, I was thinking GRanges but you have a GRangesList.

[edit] see Martin's answer above.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Michael Love24k

Hi Mike,

I tried to do the same but it gave me this error message:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("X","Y"), ]
Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘NSBS’ for signature ‘"CompressedRleList"’

ADD REPLYlink written 12 months ago by ta_awwad10
1

See Martins post below

ADD REPLYlink written 12 months ago by Michael Love24k

thanks much mike, I tried it already but it doesn't seem working as the number of DE genes are still the same as if X and Y genes were not removed! .. I don't know what to do in order to make this correctly.

thanks

ADD REPLYlink written 12 months ago by ta_awwad10
1

Are you using all()? I don’t see it in your code. Martin’s code shows how to properly subset if you have a GRangesList.

ADD REPLYlink written 12 months ago by Michael Love24k
> seqnames(dds) %in% c("X", "Y")​
RleList of length 52636​

> !seqnames(dds) %in% c("X", "Y")
RleList of length 52636
ADD REPLYlink written 12 months ago by ta_awwad10
Answer: remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline
0
gravatar for Michael Love
4.2 years ago by
Michael Love24k
United States
Michael Love24k wrote:

It's perfectly valid (statistically) to do this, but you might have to do some legwork. How did you make the count matrix? If you used summarizeOverlaps, the chromosomes are right at hand [edit: see Martin's comment for GRangesList]

seqnames(rowData(dds))

Or for the most recent release of Bioconductor (3.1):

seqnames(rowRanges(dds))

Then just subset the dds:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("chrX","chrY"), ]
ADD COMMENTlink modified 3 months ago • written 4.2 years ago by Michael Love24k
1

SummarizedExperiment (and hence DESeqDataSet) 'knows' that it has ranges as rows, so seqnames(dds) is enough.  For a GRangesList, I guess the assumption is that that all element ranges have the same chromosome, so dds[ all(!seqnames(dds) %in% c("chrX", "chrY")), ]. Unpacking a bit, seqnames() on a GRangesList returns an RleList of seqnames, one element of the list for each element of the GRangesList. %in% returns an RleList of logical values, again retaining the geometry. And all() is applied to each element of the list, returning a logical vector of the same length as the original GRangesList.

ADD REPLYlink modified 4.2 years ago • written 4.2 years ago by Martin Morgan ♦♦ 23k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 121 users visited in the last hour