merge error: 'by.x' and 'by.y' specify different numbers of columns
@tran-nhu-quynh-t-6628
Hi Bioconductor group,

I'm working on a ChIP-seq dataset and trying to annotate my peaks. After getting the gene symbols with biomaRt, I want to merge them back into my BED file, but merge() gives me the error below. Any help is appreciated. Thanks.

QT

```r
bed_file_orig <- read.delim(file_name, header=FALSE, skip=1)
bed_file <- bed_file_orig[!(stri_sub(bed_file_orig$V1, 1, 2) %in% c("HG", "MT", "GL")),]
peakList <- BED2RangedData(bed_file)
annotatedPeak = annotatePeakInBatch(peakList, AnnotationData=hs_annotation_tss)

# add gene IDs to the peaks: addGeneIDs() gives an error if the database does not contain the feature, so use biomaRt instead
#annotatedPeak_tss <- addGeneIDs(annotatedPeak_tss,"org.Hs.eg.db",c("symbol", "genename"))
feature_ids <- unique(annotatedPeak$feature)
feature_ids <- feature_ids[!is.na(feature_ids)]
feature_ids <- feature_ids[feature_ids != ""]
IDs2Add <- getBM(attributes=c("ensembl_gene_id","external_gene_id"), filters = "ensembl_gene_id", values = feature_ids, mart=mart)

out_file_name <- paste("../data/processed/",patient,"_",TF,"_",cond_out, ".csv", sep="")
write.csv(annotatedPeak, file=out_file_name)
annotatedPeak <- read.csv(out_file_name)
annotatedPeak_reorder <- annotatedPeak[,c(9, 1:8, 10:15)]
annotatedPeak_tss <- merge(annotatedPeak_reorder, IDs2Add, by.x="feature", b.y="ensembl_gene_id")
```

```
Error in merge.data.frame(annotatedPeak_reorder, IDs2Add, by.x = "feature",  : 
  'by.x' and 'by.y' specify different numbers of columns
```

Nhu Quynh T. Tran, Ph.D.
Assistant Professor of Preventive Medicine
University of Tennessee Health Science Center
66 N. Pauline, Suite 633
Memphis, TN 38105
Phone: 901-448-1361
Fax: 901-448-7041
Email: qtran1@uthsc.edu
Tags: ChIPSeq, biomaRt, chipseq
@james-w-macdonald-5106
Hi Nhu Quynh Tran,

You are having problems with merge(), which is a base R function, so this question is better asked on R-help.

On 6/26/2014 6:35 PM, Tran, Nhu Quynh T wrote:
> [...]
> annotatedPeak_tss <- merge(annotatedPeak_reorder, IDs2Add, by.x="feature", b.y="ensembl_gene_id")
>
> Error in merge.data.frame(annotatedPeak_reorder, IDs2Add, by.x = "feature", :
> 'by.x' and 'by.y' specify different numbers of columns

Is that the actual code you used? If so, note that you used the argument 'b.y', not 'by.y'. That is likely the reason for the error.

Best,

Jim

-- 
James W. MacDonald, M.S.
Biostatistician
University of Washington
Environmental and Occupational Health Sciences
4225 Roosevelt Way NE, # 100
Seattle WA 98105-6099
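The typo can be reproduced with toy data. This is a minimal sketch, assuming made-up data frames `peaks` and `ids2add` (hypothetical names and IDs) standing in for `annotatedPeak_reorder` and the `getBM()` result. Because `b.y` is not an argument of `merge.data.frame()` (nor a prefix of one, so partial matching does not rescue it), it is absorbed by `...`. `by.y` then falls back to `by`, whose default is `intersect(names(x), names(y))`; when the two tables share no column names that is empty, so it cannot line up with the one-column `by.x`.

```r
# Hypothetical stand-ins for the annotated peaks and the getBM() result.
# Column names mirror the post; the IDs and symbols are made up.
peaks <- data.frame(
  feature = c("ENSG_A", "ENSG_B", "ENSG_C"),
  score   = c(10, 20, 30),
  stringsAsFactors = FALSE
)
ids2add <- data.frame(
  ensembl_gene_id  = c("ENSG_A", "ENSG_B"),
  external_gene_id = c("GENE1", "GENE2"),
  stringsAsFactors = FALSE
)

# With the typo, 'b.y' lands in '...', so by.y defaults to the (empty)
# set of shared column names and the length check in merge() fails.
typo_error <- tryCatch(
  merge(peaks, ids2add, by.x = "feature", b.y = "ensembl_gene_id"),
  error = function(e) conditionMessage(e)
)
print(typo_error)

# Spelled correctly, the key columns line up (inner join by default).
merged <- merge(peaks, ids2add, by.x = "feature", by.y = "ensembl_gene_id")
print(merged)

# all.x = TRUE keeps peaks with no biomaRt hit, with NA for the symbol.
merged_all <- merge(peaks, ids2add, by.x = "feature", by.y = "ensembl_gene_id",
                    all.x = TRUE)
print(merged_all)
```

Note that merge() performs an inner join unless told otherwise, so any peak whose `feature` was not returned by getBM() silently disappears from the result; `all.x = TRUE` is one way to keep every peak.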