DESeq2 annotation file
1
0
Entering edit mode
zen • 0
@zen-22107
Last seen 4.2 years ago

Hello,

I have res output using DESeq2 which looks like:

              baseMean log2FoldChange     lfcSE      stat       pvalue      padj
B_08g068390    1266.29961      -3.271902 0.4937087 -6.627191 3.421356e-11   6.201550e-07

I have imported gff file for annotation using:

GFF <- makeTxDbFromGFF(file = "gene_models.gff")

gff has following columns:

##gff-version 3
B3ch01  maker   CDS     218 220 .   -   0   ID=B_06g024400.1.1:cds;Parent=B_06g024400;
B3ch01  maker   exon    218 220 .   -   .   ID=B_06g024400.1.1:exon:2;Parent=B_06g024400;
B3ch01  maker   gene    218 220 .   -   .   ID=B_06g024400.1;Name=B_06g024400;Note=Similar to 24400: LOW QUALITY:50S ribosomal protein;ref_id=24400;
B3ch01  maker   mRNA    218 220 126 -   .   ID=B_06g024400.1.1;Parent=B_06g024400;Name=B_06g024400;Note=Similar to 24400.1.1: LOW QUALITY:50S ribosomal protein;ref_id=24400;

The column names for gff file are:

seqnames      ranges    strand |   source     type     score     phase   ID    Parent    Name  Note  ref_id          Dbxref   Ontology_term

I want to add gene annotations (such as LOW QUALITY:50S ribosomal protein) for differentially expressed genes in res from GFF file. I will appreciate any help. Thank you!

annotation • 2.1k views
ADD COMMENT
0
Entering edit mode
Kevin Blighe ★ 4.0k
@kevin
Last seen 6 weeks ago
Republic of Ireland

Hi, there is a clear example in section 3.3 of the vignette for GenomicFeatures, here: 3.3 Retrieving data using the select method

So, you probably need something like:

select(GFF, keys = rownames(res), columns = 'Note', keytype = 'ID')
ADD COMMENT
0
Entering edit mode

Thank you, I used it and got this error:

Error in UseMethod("select_") : 
  no applicable method for 'select_' applied to an object of class "c('TxDb', 'AnnotationDb', 'envRefClass', '.environment', 'refClass', 'environment', 'refObject', 'AssayData')"

I also tried using:

AnnotationDbi::select(GFF1, keys = rownames(res), columns = 'Note', keytype = 'ID')

But it got error:

Error in testForValidKeytype(x, keytype) : 
  Invalid keytype: ID. Please use the keytypes method to see a listing of valid arguments.
ADD REPLY
0
Entering edit mode

Cool. What is the output of:

columns(GFF)
keytypes(GFF)

?

ADD REPLY
0
Entering edit mode

Thank you, here is the output:

> columns(GFF)
     [1] "CDSCHROM"   "CDSEND"     "CDSID"      "CDSNAME"    "CDSPHASE"   "CDSSTART"   "CDSSTRAND"  "EXONCHROM" 
     [9] "EXONEND"    "EXONID"     "EXONNAME"   "EXONRANK"   "EXONSTART"  "EXONSTRAND" "GENEID"     "TXCHROM"   
    [17] "TXEND"      "TXID"       "TXNAME"     "TXSTART"    "TXSTRAND"   "TXTYPE"    
> keytypes(GFF)
    [1] "CDSID"    "CDSNAME"  "EXONID"   "EXONNAME" "GENEID"   "TXID"     "TXNAME"
ADD REPLY
0
Entering edit mode

Thank you. I am confident that you can resolve the problem from this point. please take time to review the example in the vignette to which I linked in my original answer.

ADD REPLY
0
Entering edit mode

Thank you, I looked at the example you shared. The information from gff file that I want to join to DESeq2 output is under Note. I searched that there is a attribute column in gff file and Note comes in that. But in my gff file with columns(GFF), there is no column named attribute or Note. I tried looking into all the columns from the output and did not get the information I want.

ADD REPLY
0
Entering edit mode

Thank you, I looked at the example you shared. The information from gff file that I want to join to DESeq2 output is under Note. I searched that there is a attribute column in gff file and Note comes in that. But in my gff file with columns(GFF), there is no column named attribute or Note. I tried looking into all the columns from the output and did not get the information I want.

ADD REPLY
0
Entering edit mode

Is it even reading the GFF file correctly? From what I can see, the file should be GTF or GFF3. If you do this, what output appears?

head(select(GFF, columns = columns(GFF), keys = keys(GFF), keytype = c('GENEID')))
ADD REPLY
0
Entering edit mode

The output from above command is:

                 GENEID CDSID                     CDSNAME   CDSCHROM CDSSTRAND CDSSTART   CDSEND EXONID  EXONNAME  EXONCHROM EXONSTRAND EXONSTART  EXONEND TXID EXONRANK CDSPHASE   TXNAME TXTYPE    TXCHROM TXSTRAND  TXSTART    TXEND
BV_01g005000 21450 BV_01g005000.3.1:cds BV1.7ch02         + 10086101 10086415  22323  BV_00g005000.3.1:exon:16909 BV1.7ch02          +  10086101 10086415 4701        1        0  BV_00g005000.3.1   mRNA BV1.7ch02        + 10086101 10087561

Here is an example line from gff file:

BV1.7ch00   maker   gene    21882   22007   .   -   .   ID=BV_06g024400.1;Name=BV_06g024400.1;Note=Similar to c06g024400.1.1: LOW QUALITY:protein;ref_id=c06g024400.1.1;

I am not sure why Note section is not showing up in the head command. Thank you!

ADD REPLY
1
Entering edit mode

Hi, it seems that this may be a limitation of this package. I would recommend instead to read the GFF into a data frame using, for example, ape::read.gff(), and then annotate that way.

ADD REPLY
0
Entering edit mode

Thank you! It does show attributes column from gff now. I tried using same select method to use Note information from attributes column with DESeq2 res output file but it didn't work. Can you suggest if there is any other way to do this.

ADD REPLY
0
Entering edit mode

Thank you! It does show attributes column from gff now. I tried using same select method to use Note information from attributes column with DESeq2 res output file but it didn't work. Can you suggest if there is any other way to do this.

ADD REPLY
0
Entering edit mode

Thank you! It does show attributes column from gff now. I tried using same select method to use Note information from attributes column with DESeq2 res output file but it didn't work. Can you suggest if there is any other way to do this.

ADD REPLY
1
Entering edit mode

If you use ape::read.gff(), then you cannot use select() - these methods are incompatible. By using ape::read.gff(), you can then match the GFF annotation to the genes in your res object via other functions, such as which() and match()

ADD REPLY
0
Entering edit mode

Thank you! Based on the attributes column from gff, how can I mention ID to match and add Note to res dataset. I tried:

p3 <- GFF$attributes[match(res$seqid, GFF$attributes)]

Ideally I want to do,

p3 <- GFF$(Note from attributes)[match(res$seqid, GFF$(ID from attributes))]
ADD REPLY
0
Entering edit mode

Hi, for this secondary issue of matching up the annotation data to the DESeq2 results object, i would suggest that you take some time to go through tutorials about this online. There should be many out there. Thanks!

ADD REPLY

Login before adding your answer.

Traffic: 616 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6