Extracting results for a single chromosome
1
0
Entering edit mode
Charlotte • 0
@charlotte-23646
Last seen 3.7 years ago

Hello,

I have run DESeq2 for genome wide ATAC-seq data. At the moment genomic locations are reported in the DESeq results table in this style: "Chr112421625" where 1245 and 1625 are the start and end positions along the chromosome. I would like to obtain the peaks for just one chromosome for downstream analysis. Is there any way of subsetting the results of DESeq2 by providing it with a chromosome number?

I am reluctant to subset the dataset and just run the analysis on the single chromosome as I assume that would reduce the power of the analysis,

I hope this makes sense,

Thank you!

deseq2 • 683 views
ADD COMMENT
1
Entering edit mode
@mikelove
Last seen 7 hours ago
United States

This is more about the container object that DESeq2 builds on:

https://bioconductor.org/packages/release/bioc/html/SummarizedExperiment.html

The dds object is a SummarizedExperiment, and you can subset a RangedSummarizedExperiment with square brackets, e.g.:

dds[dds %over% x,]

where x is a GRanges object.

The ranges of the dds are stored as rowRanges(dds). If you use tximeta to build the dds object then ranges are automatically added by the software. Otherwise, you may need to manually add the ranges for the genes in the dds.

ADD COMMENT
0
Entering edit mode

By the way, if you just want to subset the dds by gene name, you can do:

dds[genes,]

Where genes is a character vector of the names of genes on a particular chromosome.

ADD REPLY
0
Entering edit mode

Thank you. I don't think I can use tximeta as I work on a non-model organism. How do you manually add the gene ranges into the dds?

ADD REPLY
0
Entering edit mode

Ok, the part where you annotate the gene ranges is up to you.

You may want to ask a bioinformatics collaborator about how to annotate your data, as this is out of scope for the DESeq2 software. There are various annotation packages on Bioconductor that you can use to get the information you need.

Another option would be to post to Biostars (if you do, link from here to there so others can see your post).

ADD REPLY
0
Entering edit mode

I tried to use dds[dds %over% x,] where x is a vector containing the list of coordinates of the chromosome of interest. However, this returned the error:

Error in (function (classes, fdef, mtable) : unable to find an inherited method for function 'overlapsAny' for signature ''"DESeqDataSet", "character" '

Instead I used this:

ddsObjchr1 <- ddsObj[subrows, ] where sub_rows contains the row names corresponding to Chr1

Is this a valid way? I am not sure as I noticed when I check the table of padj < 0.05 for chr1 for the full dataset I retrieve less hits than when I check the table of padj < 0.05 for chr1 for the subsetted results (ddsObj_chr1).

ADD REPLY
0
Entering edit mode

The %over% trick would only worked if you had a RangedSummarizedExperiment.

The second way is how to go if you don't have ranges, it is perfectly valid.

"when I check the table of padj < 0.05 for chr1 for the full dataset I retrieve less hits than when I check the table of padj < 0.05 for chr1"

You will get different p-values depending on if you give DESeq2 different rows of data. It is a hierarchical Bayesian model, so it depends on information shared across genes. I can't even say if you will get more or less with subsetting, there are many complex interdependencies in the data and the model.

ADD REPLY
0
Entering edit mode

Ok thank you, I thought it would be due to that. I just thought it was worth double checking the validity as the number of hits for the chr rose from 4 to 13 which was surprising - I didn't want to proceed with the method simply because it gave more more hits without it being statistically sound

ADD REPLY

Login before adding your answer.

Traffic: 497 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6