View Gene Families with OrthoMCL

Before running OrthoMCL, it is necessary to remove repeats by comparing the predicted proteins against the Repbase transposable-element proteins:

blastall -p blastp -i RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70.fasta -d /opt/RepeatMasker/RepBase17.01_REPET.embl/repbase17.01_aaSeq_cleaned_TE.fa -e 1e-1 -o RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70.fastaVsRepbase17.01_aaSeq_cleaned_TE.fa -a 2 &

Extract the IDs of the queries with no hits from the BLAST output

grep -e 'Query=' -e 'No hits' RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70.fastaVsRepbase17.01_aaSeq_cleaned_TE.fa | grep -B 1 'No hits' | grep 'Query=' | awk '{print $2}' > RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70_NoTransposons.ids
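The grep chain above can also be written as a single awk pass. This is only a sketch: it assumes the legacy blastall text report, where every record starts with a `Query= <id>` line and queries without hits carry a `No hits found` line. The function name `extract_no_hit_ids` is hypothetical and not part of any tool used on this page.

```shell
# Hypothetical helper: print the IDs of queries that have no hits
# in a legacy blastall text report (assumed layout, see above).
extract_no_hit_ids() {
    awk '
        /^Query=/       { id = $2 }     # remember the ID of the current query
        /No hits found/ { print id }    # report it when BLAST found nothing
    ' "$1"
}
```

It would be called as `extract_no_hit_ids <blast report> > <ids file>`, with the same input and output files as the command above.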

cdbfasta RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70.fasta
cdbyank RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70.fasta.cidx < RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70_NoTransposons.ids > RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70_NoTransposons.ids.fasta
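A quick sanity check after the extraction is to confirm that the number of retrieved sequences matches the number of requested IDs. The sketch below assumes one ID per line in the IDs file and one `>` header per record in the FASTA file; `check_extracted_count` is a hypothetical name.

```shell
# Hypothetical check: compare requested IDs with retrieved FASTA records.
check_extracted_count() {
    ids_file=$1
    fasta_file=$2
    n_ids=$(grep -c . "$ids_file" || true)         # non-empty lines = requested IDs
    n_seqs=$(grep -c '^>' "$fasta_file" || true)   # header lines = retrieved sequences
    echo "ids=$n_ids seqs=$n_seqs"
    [ "$n_ids" -eq "$n_seqs" ]                     # non-zero exit status on mismatch
}
```

The function prints both counts and returns a non-zero status when they differ, so it can be chained with `&&` before the OrthoMCL step.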

Run OrthoMCL 1.4

sudo /opt/orthomcl/ --mode 1 --fa_files "RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70_NoTransposons.ids.fasta"

Count genes by clusters

grep 'ORTH' all_orthomcl.out | awk '{print $1 $2}' | sed 's/(/\t/' | sed 's/genes,1//' > Hv_peps70_OrthoMCL_clusterCount.txt

File:Hv peps70

Protein Annotation of the 20 most abundant Families

We ran BLASTp (NCBI web interface) against nr for the 20 most abundant families. Most families gave hits against P. graminis and other fungi, and also against retrotransposons, especially from plants. The most abundant families all appear to contain retrotransposons, so it is advisable to wait until the sequences have been masked and then run OrthoMCL again.
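The 20 most abundant families can be listed from the cluster count file with a numeric sort. This is a minimal sketch, assuming the two-column, tab-separated layout (cluster name, gene count) written to Hv_peps70_OrthoMCL_clusterCount.txt; `top_families` is a hypothetical name.

```shell
# Hypothetical helper: list the N largest families (default 20).
# $1: tab-separated file with cluster name and gene count
# $2: number of families to print
top_families() {
    tab=$(printf '\t')
    sort -t "$tab" -k2,2nr "$1" | head -n "${2:-20}"   # largest gene count first
}
```

For example, `top_families Hv_peps70_OrthoMCL_clusterCount.txt` prints the 20 clusters with the most genes.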

Result before Masking

ORTHOMCL7 (75 genes) = NAD+-dependent aldehyde dehydrogenase-like (ALDH-like) superfamily. The ALDH-like group belongs to the ALDH superfamily of NAD+-dependent enzymes, which, in general, oxidize a wide range of endogenous and exogenous aliphatic and aromatic aldehydes to their corresponding carboxylic acids and play an important role in detoxification. This group includes families ALDH18, ALDH19, and ALDH20 and represents such proteins as gamma-glutamyl phosphate reductase, LuxC-like acyl-CoA reductase, and coenzyme A acylating aldehyde dehydrogenase. All of these proteins have a conserved cysteine that aligns with the catalytic cysteine of the ALDH group.
ORTHOMCL10 (42 genes) = Catalytic domain of the Protein Serine/Threonine Kinase, Yank1
Serine/Threonine Kinases (STKs), Yank1 or STK32A subfamily, catalytic (c) domain. STKs catalyze the transfer of the gamma-phosphoryl group from ATP to serine/threonine residues on protein substrates. The Yank1 subfamily is part of a larger superfamily that includes the catalytic domains of other protein STKs, protein tyrosine kinases, RIO kinases, aminoglycoside phosphotransferase, choline kinase, and phosphoinositide 3-kinase. This subfamily contains uncharacterized STKs with similarity to the human protein designated Yank1 or STK32A.
ORTHOMCL14 = inner membrane magnesium transporter mrs2 (P. graminis)
ORTHOMCL18 = Helicase superfamily c-terminal domain; associated with DEXDc-, DEAD-, and DEAH-box proteins, yeast initiation factor 4A, Ski2p, and Hepatitis C virus NS3 helicases; this domain is found in a wide variety of helicases and helicase related proteins; may not be an autonomously folding unit, but an integral part of the helicase; 4 helicase superfamilies at present according to the organization of their signature motifs; all helicases share the ability to unwind nucleic acid duplexes with a distinct directional polarity; they utilize the free energy from nucleoside triphosphate hydrolysis to fuel their translocation along DNA, unwinding the duplex in the process.
ORTHOMCL19 = DDE superfamily endonuclease
This family of proteins is related to pfam00665, and its members are probably endonucleases of the DDE superfamily. Transposase proteins are necessary for efficient DNA transposition. This domain is a member of the DDE superfamily, which contains three carboxylate residues that are believed to be responsible for coordinating metal ions needed for catalysis. The catalytic activity of this enzyme involves DNA cleavage at a specific site followed by a strand transfer reaction.

Run OrthoMCL 2.0

orthomclInstallSchema /opt/orthomclSoftware-v2.0.3/doc/OrthoMCLEngine/Main/orthomcl.config

mkdir orthoMCL

mkdir orthoMCL/compliantFasta

orthomclAdjustFasta Hv ../../RNASeqCufflinksThirdHybridAssemblyCLC_saccharomyces_cerevisiae_S288C.augustus_up70_NoTransposons.ids.fasta 1

orthomclFilterFasta compliantFasta/ 10 20

sed 's/Hv|/Hv_/g' goodProteins.fasta > tmpgoodproteins.fasta

mv tmpgoodproteins.fasta goodProteins.fasta

formatdb -i goodProteins.fasta -p T -o T

blastall -p blastp -i goodProteins.fasta -d goodProteins.fasta -e 1e-5 -F 'm S' -v 10000 -b 10000 -z 18234 -m 8 -o

sed 's/Hv_/Hv|/g' >

orthomclBlastParser compliantFasta/ > similarSequences.txt

orthomclLoadBlast /opt/orthomclSoftware-v2.0.3/doc/OrthoMCLEngine/Main/orthomcl.config similarSequences.txt

orthomclPairs /opt/orthomclSoftware-v2.0.3/doc/OrthoMCLEngine/Main/orthomcl.config orthomcl_pairs.log cleanup=no

orthomclDumpPairsFiles /opt/orthomclSoftware-v2.0.3/doc/OrthoMCLEngine/Main/orthomcl.config

mcl mclInput --abc -I 1.5 -o mclOutput

orthomclMclToGroups hvas 10000 < mclOutput > groups.txt

n=9999; while read line; do echo -n "hvas$((n=$((n + 1)))) "; echo "$line" | tr -cd "H" | wc -c ; done < groups.txt > groups.txt.count
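The loop above counts the letter "H" on each line, which matches the number of members only if no protein identifier contains an additional uppercase "H". An alternative sketch counts whitespace-separated fields instead. It assumes each line of groups.txt has the form `<group ID>: <member> <member> ...`; the name `count_group_members` is hypothetical.

```shell
# Hypothetical helper: print "<group ID> <number of members>" for each
# line of groups.txt (assumed layout: "hvas10000: Hv|id1 Hv|id2 ...").
count_group_members() {
    awk '{
        id = $1
        sub(/:$/, "", id)     # drop the trailing colon from the group ID
        print id, NF - 1      # every field after the ID is one member
    }' "$1"
}
```

Running `count_group_members groups.txt > groups.txt.count` gives the same two-column layout as the loop, taking the group ID from the file instead of regenerating it with a counter.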