Description of DroID - the Drosophila Interactions Database (Version 4.0)
The Drosophila Interactions Database (DroID) assembles gene and protein interaction data from a variety of sources into one location. All of the data in DroID can be accessed and downloaded, in part or in whole, at the DroID home page, http://www.droidb.org, using an easy-to-use web interface. The data can also be searched, integrated, graphed, and downloaded using IM Browser.
This database currently includes gene-gene and
protein-protein interactions. Although a gene may encode
multiple proteins, the methods used to detect protein
interactions rarely record which protein variant from a
gene was used. Thus, protein interactions are
represented here by pairs of genes. The precise way to
interpret a protein interaction represented as "gene 1 -
gene 2" is that one or more proteins encoded by gene 1
interact with one or more proteins encoded by gene 2.
The gene identifiers used in this database are FlyBase
gene numbers (FBgn).
We updated the Drosophila Interactions Database on
February 16, 2007. In this version, we made sure that
every FlyBase gene ID (FBgn) used in the database was a
primary FBgn at the time of updating, and we removed
FBgn's that are possibly ambiguous. If an old FBgn had
been split into two new primary FBgn's, we deleted the
records involving it. Refer to the FlyBase documentation for more
information about primary and secondary FBgn's.
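The update rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to build DroID: the function names, the shape of the secondary-to-primary lookup table, and the FBgn values shown are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional, Set, Tuple


def resolve_fbgn(
    fbgn: str,
    primary: Set[str],
    secondary_to_primary: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the primary FBgn for `fbgn`, or None if it is unknown or ambiguous."""
    if fbgn in primary:
        return fbgn
    targets = secondary_to_primary.get(fbgn, set())
    if len(targets) == 1:
        # A secondary FBgn that maps to exactly one primary FBgn is safe to update.
        return next(iter(targets))
    # Unknown ID, or an old FBgn that was split into several primary FBgn's.
    return None


def update_interactions(
    pairs: Iterable[Tuple[str, str]],
    primary: Set[str],
    secondary_to_primary: Dict[str, Set[str]],
) -> List[Tuple[str, str]]:
    """Map both genes of every pair to primary FBgn's; drop pairs that cannot be resolved."""
    updated = []
    for gene1, gene2 in pairs:
        p1 = resolve_fbgn(gene1, primary, secondary_to_primary)
        p2 = resolve_fbgn(gene2, primary, secondary_to_primary)
        if p1 is not None and p2 is not None:
            updated.append((p1, p2))
    return updated


# Hypothetical identifiers, for illustration only.
primary_ids = {"FBgn0000001", "FBgn0000002", "FBgn0000003", "FBgn0000004"}
secondary_map = {
    "FBgn0009001": {"FBgn0000002"},                 # renamed: one primary FBgn
    "FBgn0009002": {"FBgn0000003", "FBgn0000004"},  # split: ambiguous
}
old_pairs = [("FBgn0000001", "FBgn0009001"), ("FBgn0000001", "FBgn0009002")]
new_pairs = update_interactions(old_pairs, primary_ids, secondary_map)
```

In this sketch the first pair is kept with its secondary FBgn replaced by the primary one, and the second pair is dropped because its old FBgn maps to two primary FBgn's.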
Below is a brief description of the various data sets currently available and how they are assembled. Definitions of the fields in each data set can be found further below.
Protein-protein interactions
Finley YTH - Includes protein interaction
data generated in the Finley
laboratory using the LexA yeast two-hybrid system,
mostly from high throughput screens. The project is
described here.
Data versions are as follows.
Finley YTH v1.0 - 08/01/2004 - 423 interactions
detected in a pilot screen using randomly selected
Drosophila "bait" BD proteins. A list of the BD
proteins used is here.
(Zhong, Patel, Zhang, Mangiola, Stanyon, Finley,
unpublished).
Finley YTH v2.5 - 12/10/2004 - Added 1,814
interactions detected in screens with 152 proteins
related to cell cycle regulators. This data is
described in Stanyon et al., 2004, Genome Biology,
5(12):R96. PMID: 15575970
Finley YTH v2.6 - 2/16/2007 - Secondary FBgn's
mapped to primary FBgn's. Ambiguous FBgn's removed.
Curagen YTH - Protein interactions detected in a high throughput yeast two-hybrid screen conducted at Curagen (New Haven, CT). The current version (V2.0) contains 20,248 interactions involving 6,919 proteins, or nearly half of the proteome. All of the interactions were assigned confidence scores, with roughly one quarter of them falling into the high confidence set (scores >0.5). This data was described in Giot et al., 2003, Science 302, 1727-1736. PMID: 14605208
Hybrigenics YTH - Protein interactions detected in high throughput yeast two-hybrid screens conducted at Hybrigenics (Paris, France). They used 102 bait proteins to detect >2,300 interactions, and assigned 710 of these to a high confidence group. This data was described in Formstecher et al., 2005, Genome Research 15, 376-384. PMID: 15710747.
Other physical protein-protein interactions - These are experimentally derived physical interactions other than those from the three major YTH data sets above. These interactions are collected from large databases (BioGRID, DIP, IntAct, MINT). The original source database and its information are available for each interaction.
Genetic interactions
Genetic Interactions - Includes gene-gene interactions downloaded from FlyBase. These represent interactions between alleles of two genes. For example, an allele of one gene may enhance or suppress the phenotype of an allele of another gene. Alternatively, the combination of two alleles may result in a "synthetic" phenotype not observed for either of the individual alleles.
Interolog data
Predicted interactions between Drosophila proteins based on experimental evidence for interactions between orthologous proteins in other species. We collected and integrated interactions for yeast, worm, and human from online interaction databases. Proteins in the resulting interaction sets were then mapped to fly orthologs using InParanoid (version 5.1, January 2007), an orthology mapping algorithm. The dates on which the original data was downloaded are noted in each table.
Yeast Interologs - Yeast interactions were downloaded from BOND, BioGRID, DIP, IntAct, MINT, and MIPS. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original yeast interaction and the associated PubMed IDs.
Worm Interologs - Worm interactions were downloaded from BOND, BioGRID, DIP, IntAct, and MINT. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original worm interaction and the associated PubMed IDs.
Human Interologs - Human interactions were downloaded from BOND, BioGRID, DIP, HPRD, IntAct, MINT, and PDZBase. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original human interaction and the associated PubMed IDs.
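The interolog mapping described above can be illustrated with a minimal Python sketch. It assumes the orthology assignments (for example, parsed from InParanoid output) are already available as a dictionary from source-species protein to fly genes. The function name, the dictionary layout, the identifiers, and the choice to emit every combination when a protein has several fly orthologs are assumptions made for illustration, not a description of how DroID was built.

```python
from itertools import product
from typing import Dict, Iterable, List, Set, Tuple


def predict_interologs(
    source_interactions: Iterable[Tuple[str, str]],
    ortholog_map: Dict[str, List[str]],
) -> Set[Tuple[str, str]]:
    """Transfer interactions from another species to fly gene pairs via orthologs."""
    predicted = set()
    for protein_a, protein_b in source_interactions:
        fly_a = ortholog_map.get(protein_a, [])
        fly_b = ortholog_map.get(protein_b, [])
        # An interaction is transferred only when both partners have a fly ortholog.
        for pair in product(fly_a, fly_b):
            predicted.add(pair)
    return predicted


# Hypothetical identifiers, for illustration only.
yeast_interactions = [("yeastA", "yeastB"), ("yeastA", "yeastC")]
yeast_to_fly = {
    "yeastA": ["FBgn0000001"],
    "yeastB": ["FBgn0000002", "FBgn0000003"],
    # "yeastC" has no fly ortholog, so its interaction is not transferred.
}
interologs = predict_interologs(yeast_interactions, yeast_to_fly)
```

Here the single yeast interaction with orthologs on both sides yields two predicted fly gene pairs, while the interaction involving the unmapped protein yields none.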
Table Definitions
The Drosophila Interactions Database contains two types of tables. Most tables store interaction data; one table stores Drosophila gene attribute data. Table column names (used in downloaded text files), their short descriptive names (used in IM Browser when right-clicking an interaction and choosing 'Edge attributes'), and their explanations are provided below for reference.
Finley Yeast Two Hybrid Data
- FBGN_GENE1_BD (FBgn BD) - GENE1 was fused to a DNA Binding Domain.
- FBGN_GENE2_AD (FBgn AD) - GENE2 was fused to an Activation Domain.
- GENE1_INTERACTIONS_AS_BD (Interactions as BD) - Number of interactions in which GENE1 was fused to a DNA Binding Domain.
- GENE1_INTERACTIONS_TOTAL (Total Interactions for BD) - Number of interactions involving GENE1.
- GENE2_INTERACTIONS_AS_AD (Interactions as AD) - Number of interactions in which GENE2 was fused to an Activation Domain.
- GENE2_INTERACTIONS_TOTAL (Total Interactions for AD) - Number of interactions involving GENE2.
- SCREEN (Screen) - Original interaction screen. Note that interactions may have been detected in multiple screens, but only the original is listed. The total number of times the interaction was detected is given in "ISTS_RFCS" and "MATRIX_DETECTIONS".
- REFERENCE (Reference) - Literature reference (PubMed ID) for this data set.
- DATE1 (Date) - Date this data was first published.
- C_LEU (C_LEU) - Integer representing LEU2 reporter gene signal strength. LEU2 reporter activity was scored by growth in the absence of leucine, on a scale of 0 to 3. The background activity due to the BD alone was subtracted.
- C_LACZ (C_LACZ) - Integer representing lacZ reporter gene signal strength. LacZ reporter activity was scored as the intensity of blue color on X-Gal plates, on a scale of 0 (white) to 5 (dark blue). The background activity due to the BD alone was subtracted.
- C_SUM (Strength) - Sum of the C_LEU and C_LACZ scores after the background has been subtracted, taken as an indicator of overall two-hybrid reporter activity (range 0-8).
- MATRIX (Matrix) - Interaction screening method in which the final interaction is verified in a one-on-one mating. Indicates whether this interaction was detected in a matrix screen (Yes or No). A single detection is reproducible, and thus the number of times detected is not particularly relevant. Occasionally, the same interaction was detected in a different screen, in which case the number of Matrix_Detections is greater than 2.
- IST (IST) - Interaction Sequence Tag. Indicates whether this interaction was detected in a library screen (Yes or No). In such a screen, after mating a BD strain with the AD library, individual yeast clones are selected based on reporter expression and the interacting AD fusion is sequenced. The same AD fusion can be identified several times; multiple ISTs for a given interaction are less likely to represent a false positive than a single IST.
- MATRIX_DETECTIONS (Matrix Detections) - The number of times this interaction was detected in one-on-one "matrix" assays.
- ISTS_RFCS (ISTS_RFCS) - This is essentially the total number of AD clones that were identified for the particular interaction. It is the sum of the number of ISTs and clones with identical restriction fragment class (RFC) as the IST clone.
- DATA_VERSION (Version) - Version of current data.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
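As a small worked example of the reporter scores defined above, the following Python sketch recomputes C_SUM from C_LEU and C_LACZ and checks a row for internal consistency. The function names and the sample row are hypothetical; the only facts taken from the definitions are the score ranges (0-3 and 0-5) and that C_SUM is their sum (0-8). It also assumes values arrive as text, as they would when read from a downloaded file.

```python
from typing import Mapping


def reporter_strength(c_leu: int, c_lacz: int) -> int:
    """Return C_SUM, the combined two-hybrid reporter score (0-8)."""
    if not 0 <= c_leu <= 3:
        raise ValueError(f"C_LEU must be between 0 and 3, got {c_leu}")
    if not 0 <= c_lacz <= 5:
        raise ValueError(f"C_LACZ must be between 0 and 5, got {c_lacz}")
    return c_leu + c_lacz


def is_consistent(row: Mapping[str, str]) -> bool:
    """Check that the stored C_SUM equals C_LEU + C_LACZ for one table row."""
    expected = reporter_strength(int(row["C_LEU"]), int(row["C_LACZ"]))
    return expected == int(row["C_SUM"])


# Hypothetical row, for illustration only.
sample_row = {"C_LEU": "2", "C_LACZ": "4", "C_SUM": "6"}
row_ok = is_consistent(sample_row)
```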
Curagen Yeast Two Hybrid Data
- FBGN_GENE1_BD (FBgn BD) - see definition for Finley Yeast Two Hybrid Data.
- FBGN_GENE2_AD (FBgn AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTERACTIONS_AS_BD (Interactions as BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTERACTIONS_TOTAL (Total Interactions for BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_AS_AD (Interactions as AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_TOTAL (Total Interactions for AD) - see definition for Finley Yeast Two Hybrid Data.
- SCREEN (Screen) - see definition for Finley Yeast Two Hybrid Data.
- REFERENCE (Reference) - see definition for Finley Yeast Two Hybrid Data.
- DATE1 (Date) - see definition for Finley Yeast Two Hybrid Data.
- CDNA (CDNA) - Indicates whether the interaction was detected by screening a cDNA library. Curagen conducted library screens by mating bait strains either with a cDNA library or a pool of ~11,00 individually cloned full length ORFs referred to as the "collection".
- COLLECTION (Collection) - Indicates whether the interaction was detected by screening the collection of full-length clones (see above).
- HEXPERT (HEXPERT) - Indicates whether the interaction was part of a training set generated by human experts. 1 or 0 indicates whether it was in the true positive or false positive training set, respectively.
- YEXPERT (YEXPERT) - Indicates whether the interaction was part of a training set generated by a bioinformatics approach. 1 or 0 indicates whether it was in the true positive or false positive training set, respectively.
- CEXPERT (CEXPERT) - Indicates whether the interaction was part of a training set generated by combining the human and bioinformatics sets. 1 or 0 indicates whether it was in the true positive or false positive training set, respectively.
- CURAGEN_CONFIDENCE (Curagen Confidence) - Confidence score (0-1) generated using a statistical model that determined attributes that correlate with the likelihood of being in the true or false positive training set. The dividing line between high confidence and low confidence interactions was set to 0.5.
- ISTS_RFCS (ISTS_RFCS) - see definition for Finley Yeast Two Hybrid Data.
- DATA_VERSION (Version) - Version of current data.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
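The high confidence subset of the Curagen data can be selected with the 0.5 dividing line given for CURAGEN_CONFIDENCE. The Python sketch below is a minimal illustration: it assumes rows are held as dictionaries keyed by the column names above with values as text, and the function name and sample rows are hypothetical.

```python
from typing import Dict, Iterable, List


def high_confidence(
    rows: Iterable[Dict[str, str]], threshold: float = 0.5
) -> List[Dict[str, str]]:
    """Keep interactions whose CURAGEN_CONFIDENCE score is above the threshold."""
    return [row for row in rows if float(row["CURAGEN_CONFIDENCE"]) > threshold]


# Hypothetical rows, for illustration only.
curagen_rows = [
    {"FBGN_GENE1_BD": "FBgn0000001", "FBGN_GENE2_AD": "FBgn0000002", "CURAGEN_CONFIDENCE": "0.91"},
    {"FBGN_GENE1_BD": "FBgn0000001", "FBGN_GENE2_AD": "FBgn0000003", "CURAGEN_CONFIDENCE": "0.12"},
    {"FBGN_GENE1_BD": "FBgn0000004", "FBGN_GENE2_AD": "FBgn0000002", "CURAGEN_CONFIDENCE": "0.5"},
]
confident_rows = high_confidence(curagen_rows)
```

A score of exactly 0.5 is excluded here because the comparison is strict; whether the boundary value belongs to the high confidence set is an assumption of this sketch.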
Hybrigenics Yeast Two Hybrid Data
- FBGN_GENE1_BD (FBgn BD) - see definition for Finley Yeast Two Hybrid Data.
- FBGN_GENE2_AD (FBgn AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTERACTIONS_AS_BD (Interactions as BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTERACTIONS_TOTAL (Total Interactions for BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_AS_AD (Interactions as AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_TOTAL (Total Interactions for AD) - see definition for Finley Yeast Two Hybrid Data.
- SCREEN (Screen) - see definition for Finley Yeast Two Hybrid Data.
- REFERENCE (Reference) - see definition for Finley Yeast Two Hybrid Data.
- DATE1 (Date) - see definition for Finley Yeast Two Hybrid Data.
- IST (IST) - see definition for Finley Yeast Two Hybrid Data.
- ISTS_RFCS (ISTS_RFCS) - see definition for Finley Yeast Two Hybrid Data.
- DATA_VERSION (Version) - Version of current data.
- PMID (PM ID) - PubMed ID of the article describing this data.
- URL_PUBMED (URL PubMed) - Web link to the article.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
Genetic Interactions
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene.
- REFERENCE (Reference) - FlyBase reference IDs associated with this interaction.
- DATA_VERSION (Version) - Version of current data.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
Yeast Interologs
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene in the predicted interaction.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene in the predicted interaction.
- YEAST_ORF1 (Yeast ORF1) - Yeast ORF corresponding to the first fly gene.
- YEAST_ORF2 (Yeast ORF2) - Yeast ORF corresponding to the second fly gene.
- YEAST_UNIPROT1 (Yeast uniprot_1) - Yeast UniProtKB accession number corresponding to the first fly gene.
- YEAST_UNIPROT2 (Yeast uniprot_2) - Yeast UniProtKB accession number corresponding to the second fly gene.
- ORTHOLOG_METHOD (Ortholog Method) - Orthology mapping method. InParanoid version 5.1, January 2007 was used to build tables in the current database.
- INTERACTION_PUBMEDS (Interaction Pubmed IDs) - PubMed IDs of articles discussing this interaction.
- INTERACTION_DETECT_METHODS (Interaction Detection Methods) - Methods employed in detecting this interaction.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
- ORIGINAL_INTERACTION_SOURCE (Original interaction source) - Names of source databases containing this interaction, including date data was downloaded from the source databases.
- ORTHOLOG_FLY_GENE1_SCORE (Ortholog Fly gene1 score) - Score assigned by InParanoid to denote confidence of orthology mapping.
- ORTHOLOG_YEAST_ORF1_SCORE (Ortholog Yeast ORF_1 score) - Score assigned by InParanoid to denote confidence of orthology mapping.
- ORTHOLOG_FLY_GENE2_SCORE (Ortholog Fly gene2 score) - Score assigned by InParanoid to denote confidence of orthology mapping.
- ORTHOLOG_YEAST_ORF2_SCORE (Ortholog Yeast ORF_2 score) - Score assigned by InParanoid to denote confidence of orthology mapping.
Worm Interologs
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene in the predicted interaction.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene in the predicted interaction.
- WORM_GENE1 (Worm gene_1) - Worm gene id corresponding to the first fly gene.
- WORM_GENE2 (Worm gene_2) - Worm gene ID corresponding to the second fly gene.
- WORM_PROTEIN1_UNIPROT (Worm uniprot_1) - Worm UniProtKB accession number corresponding to the first fly gene.
- WORM_PROTEIN2_UNIPROT (Worm uniprot_2) - Worm UniProtKB accession number corresponding to the second fly gene.
- ORTHOLOG_METHOD (Ortholog Method) - See definition for Yeast Interologs.
- INTERACTION_PUBMEDS (Interaction Pubmed IDs) - See definition for Yeast Interologs.
- INTERACTION_DETECT_METHODS (Interaction Detection Methods) - See definition for Yeast Interologs.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
- ORIGINAL_INTERACTION_SOURCE (Original interaction source) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE1_SCORE (Ortholog Fly gene1 score) - See definition for Yeast Interologs.
- ORTHOLOG_WORM_GENE1_SCORE (Ortholog Worm gene1 score) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE2_SCORE (Ortholog Fly gene2 score) - See definition for Yeast Interologs.
- ORTHOLOG_WORM_GENE2_SCORE (Ortholog Worm gene2 score) - See definition for Yeast Interologs.
Human Interologs
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene in the predicted interaction.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene in the predicted interaction.
- HUMAN_PROTEIN1_ENSEMBL (Human Ensembl ID_1) - Human protein Ensembl ID corresponding to the first fly gene.
- HUMAN_PROTEIN2_ENSEMBL (Human Ensembl ID_2) - Human protein Ensembl ID corresponding to the second fly gene.
- HUMAN_PROTEIN1_UNIPROT (Human Uniprot AC_1) - Human protein UniProtKB accession number corresponding to the first fly gene.
- HUMAN_PROTEIN2_UNIPROT (Human Uniprot AC_2) - Human protein UniProtKB accession number corresponding to the second fly gene.
- ORTHOLOG_METHOD (Ortholog Method) - See definition for Yeast Interologs.
- INTERACTION_PUBMEDS (Interaction Pubmed IDs) - See definition for Yeast Interologs.
- INTERACTION_DETECT_METHODS (Interaction Detection Methods) - See definition for Yeast Interologs.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
- ORIGINAL_INTERACTION_SOURCE (Original interaction source) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE1_SCORE (Ortholog Fly gene1 score) - See definition for Yeast Interologs.
- ORTHOLOG_HUMAN_PROTEIN1_SCORE (Ortholog Human protein1 score) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE2_SCORE (Ortholog Fly gene2 score) - See definition for Yeast Interologs.
- ORTHOLOG_HUMAN_PROTEIN2_SCORE (Ortholog Human protein2 score) - See definition for Yeast Interologs.
Gene Attributes
- FLY_GENE (Primary FlyBase ID) - Primary FlyBase FBgn.
- SYMBOL (Symbol) - Gene symbol.
- FULL_NAME (Full Name) - Full name of the gene.
- URL (URL) - Web link to FlyBase page describing this gene.
- SECONDARY_FBGNS (Secondary FlyBase ID) - Secondary FlyBase FBgns associated with the primary FlyBase FBgns.
- GENE_CLASS (Class of Gene) - Class of the gene.
- GO_MOLECULAR_FUNCTION (GO Molecular Function) - Gene Ontology (GO) Molecular Function annotations, formatted as GO_id(GO_evidence)===GO_term,GO_id(GO_evidence)===GO_term...
- GO_BIOLOGICAL_PROCESS (GO Biological Process) - GO Biological Process annotations, same format as Molecular Function.
- GO_CELLULAR_PROCESS (GO Cellular Component) - GO Cellular Component annotations, same format as Molecular Functions.
- SYNONYMS (Synonyms) - Synonyms of the gene.
- PROTEIN_DOMAINS (Protein Domains) - Protein domain annotations obtained from Interpro.
- CG_SYMBOLS (CG Symbol) - CG symbols associated with this gene.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
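The GO annotation fields in the Gene Attributes table use the GO_id(GO_evidence)===GO_term format described above, with entries separated by commas. The Python sketch below shows one way such a field might be parsed. It assumes GO identifiers have the standard form "GO:" followed by seven digits, and it splits only on commas that are followed by a new GO identifier, because GO term names can themselves contain commas. The function name and the sample value are illustrative and were not taken from a DroID download.

```python
import re
from typing import List, Tuple

# One entry: GO id, evidence code in parentheses, "===", then the term name.
# The term runs up to the next ",GO:nnnnnnn(" or to the end of the field.
_GO_ENTRY = re.compile(r"(GO:\d{7})\(([^)]*)\)===(.*?)(?=,GO:\d{7}\(|$)")


def parse_go_field(value: str) -> List[Tuple[str, str, str]]:
    """Split a GO annotation field into (GO id, evidence code, GO term) tuples."""
    return [
        (match.group(1), match.group(2), match.group(3).strip())
        for match in _GO_ENTRY.finditer(value)
    ]


# Illustrative value in the documented format.
sample_field = (
    "GO:0003677(IDA)===DNA binding,"
    "GO:0006355(IEA)===regulation of transcription, DNA-templated"
)
annotations = parse_go_field(sample_field)
```

An empty field yields an empty list, so genes without annotations need no special handling in this sketch.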