I have an output file containing upregulated genes from a non-model organism, ordered by adjusted p-value:
geneID
Rp.chr4.1864
Rp.chr1.1957
Rp.chr4.2000
Rp.chrX.1597
Rp.chr4.1782
Rp.chr4.1865
and a second file containing the same gene IDs along with their best hits from different databases:
geneID Nr Nt SwissProt KOG eggNOG Interpro GO KEGG
Rp.chr1.0001 protein BUD31 homolog PREDICTED: Megachile rotundata protein BUD31 homolog (LOC100880403), transcript variant X3, mRNA Protein BUD31 homolog KOG3404: G10 protein/predicted nuclear transcription regulator G10 protein IPR001748: G10 protein; IPR018230: BUD31/G10-related, conserved site GO:0000398: mRNA splicing, via spliceosome; GO:0005634: nucleus; GO:0010467: gene expression K12873: BUD31,G10;bud site selection protein 31
Rp.chr1.0002 putative ATP synthase subunit f, mitochondrial Riptortus pedestris mRNA for conserved hypothetical protein, complete cds, sequence id: Rped-0111 Putative ATP synthase subunit f, mitochondrial KOG4092: Mitochondrial F1F0-ATP synthase, subunit f Mitochondrial F1F0-ATP synthase, subunit f IPR019344: Mitochondrial F1-F0 ATP synthase subunit F, predicted GO:0000276: mitochondrial proton-transporting ATP synthase complex, coupling factor F(o); GO:0005622: intracellular; GO:0005623: cell; GO:0005737: cytoplasm; GO:0005739: mitochondrion; GO:0005740: mitochondrial envelope; GO:0005743: mitochondrial inner membrane; GO:0005753: mitochondrial proton-transporting ATP synthase complex; GO:1902600: proton transmembrane transport K02130: ATPeF0F,ATP5J2;F-type H+-transporting ATPase subunit f
Rp.chr1.0003 hypothetical protein EVAR_64278_1 PREDICTED: Bombyx mandarina uncharacterized LOC114246253 (LOC114246253), transcript variant X2, mRNA - - DNA helicase activity - - -
Rp.chr1.0005 Retrovirus-related Pol polyprotein from type-1 retrotransposable element R1 2 - - - Reverse transcriptase (RNA-dependent DNA polymerase) IPR000477: Reverse transcriptase domain - -
Rp.chr1.0006 - - - - - IPR005135: Endonuclease/exonuclease/phosphatase; IPR036691: Endonuclease/exonuclease/phosphatase superfamily - -
Rp.chr1.0007 piggyBac transposable element-derived protein 4-like; hypothetical protein AGLY_017479 - - - DDE superfamily endonuclease IPR029526: PiggyBac transposable element-derived protein - -
Rp.chr1.0008 hypothetical protein GE061_11589 - - - - - -
I want a command that takes every upregulated gene in file1 and outputs its annotation from file2 next to the matching geneID, in the same order as file1, e.g.:
geneID Nr Nt SwissProt KOG eggNOG Interpro GO KEGG
Rp.chr4.1864 hexamerin Riptortus clavatus mRNA for cyanoprotein alpha subunit precursor, complete cds - - Hemocyanin, all-alpha domain IPR000896: Hemocyanin/hexamerin middle domain; IPR005203: Hemocyanin, C-terminal; IPR005204: Hemocyanin, N-terminal; IPR008922: Uncharacterised domain, di-copper centre; IPR013788: Hemocyanin/hexamerin; IPR014756: Immunoglobulin E-set; IPR036697: Hemocyanin, N-terminal domain superfamily; IPR037020: Hemocyanin, C-terminal domain superfamily - -
Rp.chr1.1957 cuticle protein 7-like - Cuticle protein 19 - pupal cuticle protein IPR000618: Insect cuticle protein GO:0005576: extracellular region; GO:0007275: multicellular organism development; GO:0008010: structural constituent of chitin-based larval cuticle; GO:0031012: extracellular matrix; GO:0040003: chitin-based cuticle development -
Rp.chr4.2000 prophenoloxidase PREDICTED: Acyrthosiphon pisum phenoloxidase 1 (LOC100160034), mRNA Hemocyanin F chain; Phenoloxidase 1 - Common central domain of tyrosinase IPR000896: Hemocyanin/hexamerin middle domain; IPR002227: Tyrosinase copper-binding domain; IPR005203: Hemocyanin, C-terminal; IPR005204: Hemocyanin, N-terminal; IPR008922: Uncharacterised domain, di-copper centre; IPR013788: Hemocyanin/hexamerin; IPR014756: Immunoglobulin E-set; IPR036697: Hemocyanin, N-terminal domain superfamily; IPR037020: Hemocyanin, C-terminal domain superfamily GO:0004503: monophenol monooxygenase activity; GO:0005576: extracellular region; GO:0005615: extracellular space; GO:0006583: melanin biosynthetic process from tyrosine; GO:0035011: melanotic encapsulation of foreign target; GO:0036263: L-DOPA monooxygenase activity; GO:0036264: dopamine monooxygenase activity; GO:0042417: dopamine metabolic process; GO:0050830: defense response to Gram-positive bacterium; GO:0050832: defense response to fungus; GO:0055114: oxidation-reduction process -
Rp.chrX.1597 chitooligosaccharidolytic beta-N-acetylglucosaminidase isoform X1 Riptortus pedestris mRNA for beta-hexosaminidase, partial cds, sequence id: Rped-0394, expressed in midgut Probable beta-hexosaminidase fdl; Chitooligosaccharidolytic beta-N-acetylglucosaminidase KOG2499: Beta-N-acetylhexosaminidase beta-acetyl hexosaminidase like IPR015883: Glycoside hydrolase family 20, catalytic domain; IPR017853: Glycoside hydrolase superfamily; IPR025705: Beta-hexosaminidase; IPR029018: Beta-hexosaminidase-like, domain 2; IPR029019: Beta-hexosaminidase, eukaryotic type, N-terminal GO:0005623: cell; GO:0005886: plasma membrane; GO:0005975: carbohydrate metabolic process; GO:0006032: chitin catabolic process; GO:0006491: N-glycan processing; GO:0006517: protein deglycosylation; GO:0016063: rhodopsin biosynthetic process; GO:0016231: beta-N-acetylglucosaminidase activity; GO:0048069: eye pigmentation; GO:0071944: cell periphery K12373: HEXA_B;hexosaminidase [EC:3.2.1.52]
Rp.chr4.1782 hypothetical protein GE061_16316 - - - - - -
Rp.chr4.1865 hexamerin Riptortus clavatus mRNA for cyanoprotein beta subunit precursor, complete cds - - Hemocyanin, all-alpha domain IPR000896: Hemocyanin/hexamerin middle domain; IPR005203: Hemocyanin, C-terminal; IPR005204: Hemocyanin, N-terminal; IPR008922: Uncharacterised domain, di-copper centre; IPR013788: Hemocyanin/hexamerin; IPR014756: Immunoglobulin E-set; IPR036697: Hemocyanin, N-terminal domain superfamily; IPR037020: Hemocyanin, C-terminal domain superfamily - -
I've tried several options in WSL and R, such as join, awk, and grep, or something like:
comm -1 -3 <(sort gene_list.csv) <(sort upregulated_genes.csv) > upreg_genes.csv
or
df1<- read.csv("gene_list.csv")
df2<- read.csv("upregulated_genes.csv")
exporttab <- merge(x=df1, y=df2, by.x='geneID', by.y='gene_list', fill=-9999)
write.csv(exporttab, "known_genes.csv", row.names=FALSE)
However, I can't get anything to work and have run out of options online. Please help.
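To make the goal concrete, the lookup described above could be sketched as a two-file awk pass. This is only a sketch under assumptions that are not confirmed here: both files are tab-separated, the geneID is always the first column, and the ID list may carry Windows line endings because it was created outside WSL. The function name `annotate_genes` is hypothetical.

```shell
# annotate_genes ANNOTATION_FILE ID_FILE
# Hypothetical helper: prints the annotation row for every ID in ID_FILE,
# in ID_FILE order. Assumes tab-separated input with geneID in column 1.
annotate_genes() {
    awk -F'\t' '
        { sub(/\r$/, "") }                  # drop a trailing carriage return, if any
        NR == FNR { row[$1] = $0; next }    # 1st file: store each row, keyed by geneID
        ($1 in row) { print row[$1] }       # 2nd file: emit the stored row in list order
    ' "$1" "$2"
}
```

With the file names used above, this would be called as `annotate_genes gene_list.csv upregulated_genes.csv > known_genes.csv`. Because both files share the `geneID` header, the header row is carried over as well, and IDs without an annotation row are silently skipped. If the files are really comma-separated, the annotation text itself contains commas, so a CSV-aware tool would be needed instead. For comparison, `comm` compares whole lines, so a bare ID never matches a full annotation line, and `comm -1 -3` prints only the lines unique to the second file.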