The support.bioconductor.org editor has been updated to markdown! Please see more info at: Tutorial: Updated Support Site Editor

Question: issue retrieving metadata after psetdiff function
0
gravatar for andreag1987
12 months ago by
andreag19870 wrote:

Hello,

I am trying to fetch promoter regions that don't overlap with cds. This is what my code looks like:

phy<-makeTxDbFromGRanges(phy)
GR <- cds(phy,columns=c("GENEID","TXID","CDSNAME","CDSID"))

indexFa(file.choose())
#select the fasta file
fasta<-file.choose()
phy.fasta<-FaFile(fasta, index=sprintf("%s.fai", fasta))

#selecting the promoter regions
p<-promoters(phy,300,0,columns=c("GENEID","TXID","CDSNAME","CDSID"))
names(p)<-paste(p$GENEID)

#retrieving those sequences that don't fall into cds

hits <- findOverlaps(p,GR)
overlapl <- extractList(GR, as(hits, "List"))
promoter_seqs<-psetdiff(p, overlapl)
mcols(promoter_seqs)<-mcols(p)
promoter_seqs<-unlist(promoter_seqs)

#extracting sequences from the fasta file
promoter_seqs_fa <- getSeq(phy.fasta,promoter_seqs)

Initially I used subsetByOverlaps:

promoter_seqs<-subsetByOverlaps(p,GR,minoverlap=284,invert = TRUE)

but than I wanted to keep the sequences with less than 300bp without having overlaps, so I find this way:

hits <- findOverlaps(p,GR)
overlapl <- extractList(GR, as(hits, "List"))
promoter_seqs<-psetdiff(p, overlapl)
mcols(promoter_seqs)<-mcols(p)
promoter_seqs<-unlist(promoter_seqs)

The problem is that after calling psetdiff() and unlist (), I am not able to retrieve any metadata column using mcols. I use the metadata columns to map each fragment with the correspondent gene.

Using:

mcols(promoter_seqs)<-mcols(p)

after unlisting doesn't work because promoter_seqs and p will have different lenghts.

Thanks to everyone who can help me.

 

granges grangeslist • 212 views
ADD COMMENTlink modified 12 months ago • written 12 months ago by andreag19870
Answer: issue retrieving metadata after psetdiff function
0
gravatar for andreag1987
12 months ago by
andreag19870 wrote:

By the way I found a solution, I edited the psetdiff() methods adding the following line:

mcols(ansGRanges)<-rep(mcols(x),elementNROWS(ansRanges))

Here the code with the added line:

setMethod("psetdiff", c("GRanges", "GRangesList"),
          function(x, y, ignore.strand=FALSE)
          {
            ansSeqinfo <- merge(seqinfo(x), seqinfo(y))
            if (length(x) != length(y))
              stop("'x' and 'y' must have the same length")
            ok <- compatibleSeqnames(rep(seqnames(x), elementNROWS(y)),
                                     seqnames(y@unlistData))
            if (!ignore.strand)
              ok <- ok & compatibleStrand(rep(strand(x), elementNROWS(y)),
                                          strand(y@unlistData))
            if (!all(ok)) {
              ok <-
                new2("CompressedLogicalList", unlistData=as.vector(ok),
                     partitioning=y@partitioning)
              y <- y[ok]
            }
            
            ansRanges <- gaps(ranges(y), start=start(x), end=end(x))
            ansSeqnames <- rep(seqnames(x), elementNROWS(ansRanges))
            ansStrand <- rep(strand(x), elementNROWS(ansRanges))
            
            ansGRanges <-
              GRanges(ansSeqnames, unlist(ansRanges, use.names=TRUE), ansStrand)
            seqinfo(ansGRanges) <- ansSeqinfo
            mcols(ansGRanges)<-rep(mcols(x),elementNROWS(ansRanges))
            relist(ansGRanges, PartitioningByEnd(ansRanges))
            
          }
)

Hope it can be helpful

ADD COMMENTlink written 12 months ago by andreag19870

The current behavior is intentional. There is not a 1-1 correspondence between the input and output ranges (which required the use of rep() here). Therefore it does not make sense in general to carry over the metadata.

ADD REPLYlink written 12 months ago by Michael Lawrence10k
Please log in to add an answer.

Help
Access

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.
Powered by Biostar version 16.09
Traffic: 164 users visited in the last hour