Search
Question: remove X and Y chromosome genes in RNA-seq data using DESeq2 pipeline
0
gravatar for anpham
2.6 years ago by
anpham0
anpham0 wrote:

I'd like to remove the genes on the X and Y chromosomes from my human RNA-seq data before doing differential analysis using DESeq2. I've looked through the RNA-seq Workflow and DESeq2 manuals but didn't see this as an option. Any help in performing this step and still using the DESeq2 or RNA-seq Workflow pipeline would be much appreciated. Thanks.

ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by anpham0
0
gravatar for Michael Love
2.6 years ago by
Michael Love15k
United States
Michael Love15k wrote:

It's perfectly valid to do this, but you might have to do some legwork. How did you make the count matrix? If you used summarizeOverlaps, the chromosomes are right at hand [edit: see Martin's comment for GRangesList]

seqnames(rowData(dds))

Or for the most recent release of Bioconductor (3.1):

seqnames(rowRanges(dds))

Then just subset the dds:

dds.sub <- dds[ ! seqnames(rowRanges(dds)) %in% c("chrX","chrY"), ]
ADD COMMENTlink modified 2.6 years ago • written 2.6 years ago by Michael Love15k
1

SummarizedExperiment (and hence DESeqDataSet) 'knows' that it has ranges as rows, so seqnames(dds) is enough.  For a GRangesList, I guess the assumption is that that all element ranges have the same chromosome, so dds[ all(!seqnames(dds) %in% c("chrX", "chrY")), ]. Unpacking a bit, seqnames() on a GRangesList returns an RleList of seqnames, one element of the list for each element of the GRangesList. %in% returns an RleList of logical values, again retaining the geometry. And all() is applied to each element of the list, returning a logical vector of the same length as the original GRangesList.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Martin Morgan ♦♦ 20k
0
gravatar for anpham
2.6 years ago by
anpham0
anpham0 wrote:

Thank you for your response. I followed the instruction and got an error message and not sure how to fix it. I used the UCSC hg19 to make the count matrix using GenomicAlignments. I copied the codes and the error message here. (By the way, I named my dds "dds1"). Thanks.

 

csvfile1 <- "table1.csv"
(sampleTable1 <- read.csv(csvfile1, row.names=1))

filenames1 <- file.path(paste0(sampleTable1$Run, ".bam"))

library("Rsamtools")
bamfiles1 <- BamFileList(filenames1, yieldSize=2000000)

biocLite("TxDb.Hsapiens.UCSC.hg19.knownGene")
source("http://bioconductor.org/biocLite.R")
library(TxDb.Hsapiens.UCSC.hg19.knownGene)

(genes <- transcriptsBy(TxDb.Hsapiens.UCSC.hg19.knownGene, by="gene"))

library("GenomicAlignments")
se1 <- summarizeOverlaps(features=genes, reads=bamfiles1, mode="Union", singleEnd=FALSE, ignore.strand=TRUE, fragments=TRUE)

 

library("DESeq2")
library("BiocParallel")
register(MulticoreParam(4))
dds1 <- DESeqDataSet(se, design = ~ gender + agecat3 + sfrace2 + RNAbatch + Site + PTSD_1mo_k)
dds1$PTSD_1mo_k <- relevel(dds1$PTSD_1mo_k, "control")

## subsetting dds1
seqnames(rowData(dds1))
dds1.sub <- dds1[ ! seqnames(rowData(dds1)) %in% c("chrX", "chrY"), ]

Error in x[i, , drop = FALSE] : invalid subscript type 'list'

 

ADD COMMENTlink written 2.6 years ago by anpham0

I forgot, I was thinking GRanges but you have a GRangesList.

[edit] see Martin's answer above.

ADD REPLYlink modified 2.6 years ago • written 2.6 years ago by Michael Love15k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 2.2.0
Traffic: 212 users visited in the last hour