Hello,
I am trying to fetch promoter regions that don't overlap with cds. This is what my code looks like:
phy<-makeTxDbFromGRanges(phy)
GR <- cds(phy,columns=c("GENEID","TXID","CDSNAME","CDSID"))
indexFa(file.choose())
#select the fasta file
fasta<-file.choose()
phy.fasta<-FaFile(fasta, index=sprintf("%s.fai", fasta))
#selecting the promoter regions
p<-promoters(phy,300,0,columns=c("GENEID","TXID","CDSNAME","CDSID"))
names(p)<-paste(p$GENEID)
#retrieving those sequences that don't fall into cds
hits <- findOverlaps(p,GR)
overlapl <- extractList(GR, as(hits, "List"))
promoter_seqs<-psetdiff(p, overlapl)
mcols(promoter_seqs)<-mcols(p)
promoter_seqs<-unlist(promoter_seqs)
#extracting sequences from the fasta file
promoter_seqs_fa <- getSeq(phy.fasta,promoter_seqs)
Initially I used subsetByOverlaps:
promoter_seqs<-subsetByOverlaps(p,GR,minoverlap=284,invert = TRUE)
but than I wanted to keep the sequences with less than 300bp without having overlaps, so I find this way:
hits <- findOverlaps(p,GR) overlapl <- extractList(GR, as(hits, "List")) promoter_seqs<-psetdiff(p, overlapl) mcols(promoter_seqs)<-mcols(p) promoter_seqs<-unlist(promoter_seqs)
The problem is that after calling psetdiff() and unlist (), I am not able to retrieve any metadata column using mcols. I use the metadata columns to map each fragment with the correspondent gene.
Using:
mcols(promoter_seqs)<-mcols(p)
after unlisting doesn't work because promoter_seqs and p will have different lenghts.
Thanks to everyone who can help me.

The current behavior is intentional. There is not a 1-1 correspondence between the input and output ranges (which required the use of rep() here). Therefore it does not make sense in general to carry over the metadata.