Description of DroID - the Drosophila Interactions Database (Version 5.0)
The Drosophila Interactions Database (DroID) assembles gene or protein interaction data from a variety of sources into one location. All of the data in DroID can be accessed and downloaded in part or whole at the DroID home page, http://www.droidb.org. The data also can be searched, integrated, graphed, and downloaded using IM Browser.
This database currently includes gene-gene and protein-protein interactions (more interaction types are coming soon). Although a gene may encode multiple proteins, the methods used to detect protein interactions rarely record which protein variant from a gene was used. Thus, protein interactions are represented here by pairs of genes. The precise way to interpret a protein interaction represented as "gene 1 - gene 2" is that one or more proteins encoded by gene 1 interact with one or more proteins encoded by gene 2.
The gene identifiers used in this database are Flybase Gene Numbers, FBgn.
DroID is updated periodically. The current version is described in this document. Previous versions are described on the version history page and are available to download.
Version 5.0 - We updated DroID on 17 July 2009. Perrimon co-AP data was added in September 2009. As in the previous version, we made an effort to ensure that every Flybase gene ID (FBgn) used in the database is a protein coding gene (according to Flybase) at the time of updating, and to remove FBgn's that are possibly ambiguous. If an old FBgn was split into two new primary FBgn's, we deleted the records involving it. Because of this, some data sets may have fewer interactions (or genes) than in the previous version of DroID. Refer to Flybase Document for more information about primary and secondary FBgn's.
| Data set | Number of interactions | Number of genes |
|---|---|---|
| Curagen yeast two-hybrid | 20050 | 6833 |
| Finley Lab yeast two-hybrid | 3161 | 1338 |
| Hybrigenics yeast two-hybrid | 1850 | 1276 |
| Perrimon Lab co-AP/MS | 5031 | 1194 |
| Other physical interactions | 1062 | 713 |
| Human interologs | 42174 | 3943 |
| Yeast interologs | 69985 | 2635 |
| Worm interologs | 2740 | 1504 |
| Genetic interactions | 5385 | 1660 |
Total number of interactions: 144,352
Total number of genes: 9,524
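The FBgn clean-up described for Version 5.0 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual DroID update script: the function names, the input shapes (a list of secondary-to-primary pairs and a set of current protein-coding primary FBgn's), and the FBgn values used in the example are all hypothetical.

```python
from collections import defaultdict

def build_primary_lookup(secondary_to_primary_pairs):
    """Map each secondary FBgn to the set of primary FBgn's it now points to."""
    lookup = defaultdict(set)
    for secondary, primary in secondary_to_primary_pairs:
        lookup[secondary].add(primary)
    return lookup

def reconcile_interactions(interactions, primary_ids, lookup):
    """Rewrite gene pairs to primary FBgn's and drop pairs that cannot be resolved.

    primary_ids is assumed to hold the current protein-coding primary FBgn's.
    """
    def resolve(fbgn):
        if fbgn in primary_ids:
            return fbgn
        targets = lookup.get(fbgn, set())
        # An old ID that split into two or more primaries is ambiguous.
        if len(targets) != 1:
            return None
        target = next(iter(targets))
        return target if target in primary_ids else None

    kept = []
    for gene1, gene2 in interactions:
        r1, r2 = resolve(gene1), resolve(gene2)
        if r1 is not None and r2 is not None:
            kept.append((r1, r2))
    return kept

# Hypothetical identifiers, for illustration only.
lookup = build_primary_lookup([
    ("FBgn0000010", "FBgn0000001"),  # simple rename
    ("FBgn0000020", "FBgn0000002"),  # split into two primaries
    ("FBgn0000020", "FBgn0000003"),
])
primary = {"FBgn0000001", "FBgn0000002", "FBgn0000003"}
pairs = [
    ("FBgn0000010", "FBgn0000002"),  # kept after remapping
    ("FBgn0000020", "FBgn0000001"),  # dropped: ambiguous split
]
print(reconcile_interactions(pairs, primary, lookup))
```

Dropping rather than guessing at split identifiers is what makes it possible for a data set to shrink between versions.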
Below is a brief description of the various data sets. Definitions of the fields in each data set can be found further below.
Protein-protein interactions
Finley YTH - Includes protein interaction data generated in the Finley laboratory using the LexA yeast two-hybrid system, mostly from high throughput screens. The project is described here and is ongoing. Data versions are as follows.
Finley YTH v1.0 - 08/01/2004 - 423 interactions detected in a pilot screen using randomly selected Drosophila "bait" BD proteins. A list of the BD proteins used is here. (Zhong, Patel, Zhang, Mangiola, Stanyon, Finley, unpublished).
Finley YTH v2.5 - 12/10/2004 - Added 1,814 interactions detected in screens with 152 proteins related to cell cycle regulators. This data is described in Stanyon et al., 2004, Genome Biology, 5(12):R96. PMID: 15575970
Finley YTH v2.6 - 2/16/2007 - Secondary FBgn's mapped to primary FBgn's. Ambiguous FBgn's removed.
Finley YTH v3.0 - 7/2/2008 - Added results from a Y2H screen that tested computationally predicted protein-protein interactions. Described in Schwartz et al., 2009 (PMID: 19079254). Two different types of predictions were tested, distinguished by the value of the SCREEN field: either "Test of combined evidence predictions (JY) 9_2006" or "Test of conservation-based predictions from Sharan 2005 PMID:1568750". A number of random pairs were also tested and found positive, indicated by "Test of random pairs 9_2006".
Curagen YTH - Protein interactions detected in a high throughput yeast two-hybrid screen conducted at Curagen (New Haven, CT). The current version (V3.0) contains 20,182 interactions involving 6,875 proteins, or nearly half of the proteome. All of the interactions were assigned confidence scores, with roughly one quarter of them falling into the high confidence set (scores >0.5). This data was described in Giot et al., 2003, Science 302, 1727-1736. PMID: 14605208
Hybrigenics YTH - Protein interactions detected in high throughput yeast two-hybrid screens conducted at Hybrigenics (Paris, France). They used 102 bait proteins to detect >2,300 interactions, and assigned 710 of these to a high confidence group. This data was described in Formstecher et al., 2005, Genome Research 15, 376-384. PMID: 15710747. Hybrigenics provides interaction data based on internal coding sequence ids, some of which could not be mapped to protein coding FBgns.
Perrimon coAP - (This data will be available soon). Protein interactions determined in large-scale co-affinity purification (co-AP)/MS screens in the Perrimon Lab. C-terminally TAP-tagged proteins were expressed in stably transfected S2R+ cells, complexes were affinity purified, and associated proteins were determined by LC/MS/MS. Detailed field descriptions are below.
Perrimon CoAP v1.0 - 7/1/2009 - 5031 interactions among 1194 proteins determined using 15 canonical components of RTK/Ras/ERK pathways as baits. Binary interactions were determined using the hub and spoke model. The data includes dataset-specific confidence scores (PPI Index). 710 interactions fell into the high confidence set. Friedman, Perrimon, et al., submitted. PMID: pending
Other physical protein-protein interactions - These are experimentally derived physical interactions other than those from the three major YTH datasets above. These interactions are collected from the large databases (BioGRID, IntAct, MINT) at each refresh of DroID. The original database source and information are available for each interaction, including links to the original publications.
Genetic interactions
Genetic Interactions - Includes gene-gene interactions downloaded from Flybase. These represent interactions between two gene alleles. For example, an allele of one gene may enhance or suppress the phenotype of an allele in another gene. Alternatively, the combination of two alleles may result in a "synthetic" phenotype not observed for either of the individual alleles.
Interolog data
Predicted interactions between Drosophila proteins based on experimental evidence for interactions between orthologous proteins in other species. At each refresh of DroID we collect interactions for yeast, worm, and human from online interaction databases (noted below). Proteins for each species are mapped to Fly orthologs using InParanoid, which is an orthology mapping algorithm. The dates that original data was downloaded are noted in each table.
Yeast Interologs - Yeast interactions were downloaded from BioGRID, IntAct, MINT, and MIPS. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original yeast interaction and the associated PubMed IDs.
Worm Interologs - Worm interactions were downloaded from BioGRID, IntAct, and MINT. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original worm interaction and the associated PubMed IDs.
Human Interologs - Human interactions were downloaded from BioGRID, HPRD, IntAct, MINT, Reactome and PDZBase. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original human interaction and the associated PubMed IDs.
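The interolog idea described above can be illustrated with a small sketch: each interaction observed in another species is projected onto every combination of fly orthologs of its two partners. This is an illustration only and not DroID's pipeline; the function name, the shape of the ortholog map (in DroID the mapping comes from InParanoid), and the placeholder identifiers are assumptions.

```python
from itertools import product

def predict_interologs(source_interactions, ortholog_map):
    """Project source-species interactions onto fly gene pairs.

    source_interactions: iterable of (protein_a, protein_b) in the source species.
    ortholog_map: dict from a source protein to a set of fly FBgn's.
    Returns a dict from an (unordered, sorted) fly gene pair to the list of
    source interactions that support it.
    """
    predicted = {}
    for a, b in source_interactions:
        fly_a = sorted(ortholog_map.get(a, ()))
        fly_b = sorted(ortholog_map.get(b, ()))
        # No fly ortholog for either partner means no prediction.
        for gene1, gene2 in product(fly_a, fly_b):
            key = tuple(sorted((gene1, gene2)))
            predicted.setdefault(key, []).append((a, b))
    return predicted

# Placeholder identifiers; ORF_C has no fly ortholog in this toy map.
yeast_interactions = [("ORF_A", "ORF_B"), ("ORF_A", "ORF_C")]
orthologs = {
    "ORF_A": {"FBgn0000001"},
    "ORF_B": {"FBgn0000002", "FBgn0000003"},
}
for pair, evidence in sorted(predict_interologs(yeast_interactions, orthologs).items()):
    print(pair, evidence)
```

Keeping the supporting source interactions alongside each predicted pair mirrors the way each interolog is listed with its source databases and PubMed IDs.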
Table Definitions
The Drosophila Interactions Database contains two types of tables. Most tables store interaction data; other tables store Drosophila gene attribute data. Table column names (used in downloaded text files), their short descriptive names (used in IM Browser when right clicking an interaction and choosing 'Edge attributes'), and their explanations are provided below for reference.
Finley Yeast Two Hybrid Data
- FBGN_GENE1_BD (FBgn BD) - GENE1 was fused to a DNA Binding Domain.
- FBGN_GENE2_AD (FBgn AD) - GENE2 was fused to an Activation Domain.
- GENE1_INTERACTIONS_AS_BD (Interactions as BD) - Number of interactions in which GENE1 was fused to a DNA Binding Domain.
- GENE1_INTERACTIONS_TOTAL (Total Interactions for BD) - Number of interactions involving GENE1.
- GENE2_INTERACTIONS_AS_AD (Interactions as AD) - Number of interactions in which GENE2 was fused to an Activation Domain.
- GENE2_INTERACITONS_TOTAL (Total Interactions for AD) - Number of interactions involving GENE2.
- SCREEN (Screen) - Original interaction screen. Note that interactions may have been detected in multiple screens, but only the original is listed. The total number of times the interaction was detected across all screens is given in "ISTS_RFCS" and "MATRIX_DETECTIONS".
- REFERENCE (Reference) - Literature reference (PubMed ID) for this data set.
- DATE1 (Date) - Date this data was first published.
- C_LEU (C_LEU) - Integer numbers representing leu2 reporter gene signal strength. The LEU2 reporter activity was scored by growth in the absence of leucine, on a 0-3 scale. The background activity due to the BD alone was subtracted.
- C_LACZ (C_LACZ) - Integer numbers representing lacZ reporter gene signal strength. LacZ reporter activity was scored as the intensity of blue color on X-Gal plates on a scale of 0 (white) to 5 (dark blue). The background activity due to the BD alone was subtracted.
- C_SUM (Strength) - C_SUM is a sum of the leu and lacZ scores after the background has been subtracted, which is taken as an indicator of overall two-hybrid reporter activity (range 0-8).
- MATRIX (Matrix) - Interaction screening method in which the final interaction is verified in a one-on-one mating. Indicates whether this interaction was detected in a matrix screen (Yes or No). A single detection is reproducible, and thus the number of times detected is not particularly relevant. Occasionally, the same interaction was detected in a different screen, in which case the number of Matrix_Detections is greater than 2.
- IST (IST) - Interaction Sequence Tag. Indicates whether this interaction was detected in a library screen (Yes or No). In such a screen, after mating a BD strain with the AD library, individual yeast clones are selected based on reporter expression and the interacting AD fusion is sequenced. The same AD fusion can be identified several times; multiple ISTs for a given interaction are less likely to represent a false positive than a single IST.
- MATRIX_DETECTIONS (Matrix Detections) - The number of times this interaction was detected in one-on-one "matrix" assays.
- ISTS_RFCS (ISTS_RFCS) - This is essentially the total number of AD clones that were identified for the particular interaction. It is the sum of the number of ISTs and clones with identical restriction fragment class (RFC) as the IST clone.
- DATA_VERSION (Version) - Version of current data.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
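As a small worked example of the reporter scores defined above, C_SUM can be recomputed from C_LEU and C_LACZ. This sketch assumes that the values in a downloaded file are already background-subtracted integers in the stated ranges (0-3 and 0-5); the function names are hypothetical.

```python
def reporter_strength(c_leu, c_lacz):
    """Return C_SUM, the sum of the LEU2 (0-3) and lacZ (0-5) scores."""
    if not 0 <= c_leu <= 3:
        raise ValueError(f"C_LEU out of range 0-3: {c_leu}")
    if not 0 <= c_lacz <= 5:
        raise ValueError(f"C_LACZ out of range 0-5: {c_lacz}")
    return c_leu + c_lacz

def strongest_first(rows):
    """Sort rows (dicts keyed by column name) by overall reporter activity."""
    return sorted(
        rows,
        key=lambda row: reporter_strength(int(row["C_LEU"]), int(row["C_LACZ"])),
        reverse=True,
    )

print(reporter_strength(3, 5))  # maximum possible C_SUM, 8
```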
Curagen Yeast Two Hybrid Data
- FBGN_GENE1_BD (FBgn BD) - see definition for Finley Yeast Two Hybrid Data.
- FBGN_GENE2_AD (FBgn AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTEARCTIONS_AS_BD (Interactions as BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTERACTIONS_TOTAL (Total Interactions for BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_AS_AD (Interactions as AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_TOTAL (Total Interactions for AD) - see definition for Finley Yeast Two Hybrid Data.
- SCREEN (Screen) - see definition for Finley Yeast Two Hybrid Data.
- REFERENCE (Reference) - see definition for Finley Yeast Two Hybrid Data.
- DATE1 (Date) - see definition for Finley Yeast Two Hybrid Data.
- CDNA (CDNA) - Indicates whether the interaction was detected by screening a cDNA library. Curagen conducted library screens by mating bait strains either with a cDNA library or a pool of ~11,00 individually cloned full length ORFs referred to as the "collection".
- COLLECTION (Collection) - Indicates whether the interaction was detected by screening the collection of full-length clones (see above).
- HEXPERT (HEXPERT) - Indicates whether the interaction was part of a training set generated by human experts. 1 or 0 indicates whether it was in the true positive or false positive training set, respectively.
- YEXPERT (YEXPERT) - Indicates whether the interaction was part of a training set generated by a bioinformatics approach. 1 or 0 indicates whether it was in the true positive or false positive training set, respectively.
- CEXPERT (CEXPERT) - Indicates whether the interaction was part of a training set generated by combining the human and bioinformatics sets. 1 or 0 indicates whether it was in the true positive or false positive training set, respectively.
- CURAGEN_CONFIDENCE (Curagen Confidence) - Confidence score (0-1) generated using a statistical model that determined attributes that correlate with the likelihood of being in the true or false positive training set. The dividing line between high confidence and low confidence interactions was set to 0.5.
- ISTS_RFCS (ISTS_RFCS) - see definition for Finley Yeast Two Hybrid Data.
- DATA_VERSION (Version) - Version of current data.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
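The 0.5 dividing line on CURAGEN_CONFIDENCE can be applied to a downloaded table along these lines. The sketch assumes a tab-delimited text file with a header row of column names, which may not match the actual download format; the sample rows are invented, and treating a score of exactly 0.5 as low confidence follows the ">0.5" wording used for the high confidence set.

```python
import csv
import io

def split_by_confidence(rows, threshold=0.5):
    """Partition rows into (high, low) lists using CURAGEN_CONFIDENCE."""
    high, low = [], []
    for row in rows:
        score = float(row["CURAGEN_CONFIDENCE"])
        (high if score > threshold else low).append(row)
    return high, low

# Invented three-row excerpt; a real file has many more columns.
sample = (
    "FBGN_GENE1_BD\tFBGN_GENE2_AD\tCURAGEN_CONFIDENCE\n"
    "FBgn0000001\tFBgn0000002\t0.91\n"
    "FBgn0000003\tFBgn0000004\t0.50\n"
    "FBgn0000005\tFBgn0000006\t0.12\n"
)
rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
high, low = split_by_confidence(rows)
print(len(high), len(low))  # prints: 1 2
```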
Hybrigenics Yeast Two Hybrid Data
- FBGN_GENE1_BD (FBgn BD) - see definition for Finley Yeast Two Hybrid Data.
- FBGN_GENE2_AD (FBgn AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTEARCTIONS_AS_BD (Interactions as BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE1_INTERACTIONS_TOTAL (Total Interactions for BD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_AS_AD (Interactions as AD) - see definition for Finley Yeast Two Hybrid Data.
- GENE2_INTERACTIONS_TOTAL (Total Interactions for AD) - see definition for Finley Yeast Two Hybrid Data.
- SCREEN (Screen) - see definition for Finley Yeast Two Hybrid Data.
- REFERENCE (Reference) - see definition for Finley Yeast Two Hybrid Data.
- DATE1 (Date) - see definition for Finley Yeast Two Hybrid Data.
- IST (IST) - see definition for Finley Yeast Two Hybrid Data.
- ISTS_RFCS (ISTS_RFCS) - see definition for Finley Yeast Two Hybrid Data.
- DATA_VERSION (Version) - Version of current data.
- PMID (PM ID) - PubMed ID of the article describing this data.
- URL_PUBMED (URL PubMed) - Web link to the article.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
Perrimon coAP (coming soon)
- FBgn_Bait: Flybase gene ID for the gene encoding the bait protein.
- FBgn_Interactor: Flybase gene ID for the gene encoding the interacting protein.
- Method: method used to detect the interaction; PSI-MI term
- Reference: PubMed ID (PMID) if/when available.
- Screen: name of the screen(s) in which this interaction was detected
- Date: date this table was updated.
- Version: version of this table.
- Conditions: experimental conditions in which this interaction was detected (number of separate samples in which the interaction was detected). "Baseline" – Drosophila S2R+ cells untreated. "Insulin" – Drosophila S2R+ cells after 10 minutes of insulin stimulation. "Spitz/EGF" – Drosophila S2R+ cells after 10 minutes of sSpitz/EGF stimulation.
- Samples: total number of experimental samples in which this interaction was detected.
- Baits_total: Number of baits that this interactor interacted with.
- Author Confidence: Categorized by the authors as “low” or “high” confidence. Interactions involving interactors isolated with >6 baits were placed in the “low” confidence set, except for 13 canonical interactors. The remaining interactions were categorized as “low” or “high” based on an arbitrary PPI index threshold of 10% False Discovery Rate. 13 canonical-canonical interactions with interactors interacting with >6 baits were manually placed in the “high” confidence group for clarity.
- PPI_index: Author scoring system. The index was initially calculated based on all interactions using a machine learning algorithm to weight importance of total spectral count, number of samples, and absence in control pulldowns. Interactions involving interactors isolated with >6 baits were then removed and the index was recalculated for and based on the remaining interactions.
- Total_SpectralCount: sum of spectral counts for all experimental conditions.
- Ave_SpectralCount_BaitNorm: sample-averaged spectral count normalized by bait spectra.
- Ave_SpectralCount_Norm: sample-averaged spectral count normalized by sample total spectra.
- Ave_TIC_BaitNorm: sample-averaged average total ion current (TIC) normalized by bait average TIC.
- Control_Samples: number of control samples in which this interactor was detected. A control is a pull-down with the affinity tag only and no bait protein.
- Control_Total_SpectralCount: sum of spectral counts for this interactor in all control pull downs.
- Control_Ave_SpectralCount_Norm: sample-averaged spectral count normalized by sample total spectra for this interactor in control pull downs.
- Ratio_Insulin_to_Baseline: Log (2) transformed ratio of average TIC values in samples from cells treated for 10 min. with insulin over cells not treated (baseline) for this interaction.
- Ratio_sSpitz_to_Baseline: Log (2) transformed ratio of average TIC values in samples from cells treated for 10 min. with sSpitz/EGF over cells not treated (baseline) for this interaction.
- Ratio_sSpitz_to_Insulin: Log (2) transformed ratio of average TIC values in samples from cells treated for 10 min. with sSpitz/EGF over cells treated for 10 min. with insulin for this interaction.
- Dynamic: The interaction was considered dynamic if the log(2) transformed ratio of average TIC values was >1 or <-1 when comparing insulin, sSpitz/EGF, or baseline treatments.
- Dynamic_Condition: Ratios which showed 2-fold change in average TIC. “S10” is sSpitz/EGF treatment; “I10” is insulin treatment; “0” is baseline.
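The three ratio fields and the Dynamic flag above can be related with a short sketch. It assumes average TIC values per condition are available as plain numbers and returns None when a value is missing or zero; that handling, and the function names, are assumptions rather than the authors' procedure.

```python
import math

def log2_ratio(numerator_tic, denominator_tic):
    """Log2 of the ratio of two average TIC values, or None if undefined."""
    if not numerator_tic or not denominator_tic:
        return None
    return math.log2(numerator_tic / denominator_tic)

def condition_ratios(baseline, insulin, sspitz):
    """Compute the three pairwise log2 ratios named in the field list."""
    return {
        "Ratio_Insulin_to_Baseline": log2_ratio(insulin, baseline),
        "Ratio_sSpitz_to_Baseline": log2_ratio(sspitz, baseline),
        "Ratio_sSpitz_to_Insulin": log2_ratio(sspitz, insulin),
    }

def is_dynamic(ratios):
    """True if any defined log2 ratio is >1 or <-1 (more than a 2-fold change)."""
    return any(r is not None and abs(r) > 1 for r in ratios.values())

# Invented TIC values: insulin gives a 4-fold increase over baseline.
ratios = condition_ratios(baseline=100.0, insulin=400.0, sspitz=120.0)
print(ratios["Ratio_Insulin_to_Baseline"], is_dynamic(ratios))
```

A log2 ratio of exactly 1 or -1 is not flagged here, since the definition uses strict inequalities.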
Genetic Interactions
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene.
- REFERENCE (Reference) - Flybase reference ids associated with this interaction.
- DATA_VERSION (Version) - Version of current data.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
Yeast Interologs
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene in the predicted interaction.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene in the predicted interaction.
- YEAST_ORF1 (Yeast ORF1) - Yeast ORF corresponding to the first fly gene.
- YEAST_ORF2 (Yeast ORF2) - Yeast ORF corresponding to the second fly gene.
- YEAST_UNIPROT1 (Yeast uniprot_1) - Yeast UniProtKB accession number corresponding to the first fly gene.
- YEAST_UNIPROT2 (Yeast uniprot_2) - Yeast UniProtKB accession number corresponding to the second fly gene.
- ORTHOLOG_METHOD (Ortholog Method) - Orthology mapping method. InParanoid version 5.1, January 2007 was used to build tables in the current database.
- INTERACTION_PUBMEDS (Interaction Pubmed IDs) - PubMed IDs of articles discussing this interaction.
- INTERACTION_DETECT_METHODS (Interaction Detection Methods) - Methods employed in detecting this interaction.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
- ORIGINAL_INTERACTION_SOURCE (Original interaction source) - Names of source databases containing this interaction, including date data was downloaded from the source databases.
- ORTHOLOG_FLY_GENE1_SCORE (Ortholog Fly gene1 score) - Scores assigned by InParanoid to denote confidence of orthology mapping.
- ORTHOLOG_YEAST_ORF1_SCORE (Ortholog Yeast ORF_1 score) - Scores assigned by InParanoid to denote confidence of orthology mapping.
- ORTHOLOG_FLY_GENE2_SCORE (Ortholog Fly gene2 score) - Scores assigned by InParanoid to denote confidence of orthology mapping.
- ORTHOLOG_YEAST_ORF2_SCORE (Ortholog Yeast ORF_2 score) - Scores assigned by InParanoid to denote confidence of orthology mapping.
Worm Interologs
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene in the predicted interaction.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene in the predicted interaction.
- WORM_GENE1 (Worm gene_1) - Worm gene id corresponding to the first fly gene.
- WORM_GENE2 (Worm gene_2) - Worm gene id corresponding to the second fly gene.
- WORM_PROTEIN1_UNIPROT (Worm uniprot_1) - Worm UniProtKB accession number corresponding to the first fly gene.
- WORM_PROTEIN2_UNIPROT (Worm uniprot_2) - Worm UniProtKB accession number corresponding to the second fly gene.
- ORTHOLOG_METHOD (Ortholog Method) - See definition for Yeast Interologs.
- INTERACTION_PUBMEDS (Interaction Pubmed IDs) - See definition for Yeast Interologs.
- INTERACTION_DETECT_METHODS (Interaction Detection Methods) - See definition for Yeast Interologs.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
- ORIGINAL_INTERACTION_SOURCE (Original interaction source) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE1_SCORE (Ortholog Fly gene1 score) - See definition for Yeast Interologs.
- ORTHOLOG_WORM_GENE1_SCORE (Ortholog Worm gene1 score) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE2_SCORE (Ortholog Fly gene2 score) - See definition for Yeast Interologs.
- ORTHOLOG_WORM_GENE2_SCORE (Ortholog Worm gene2 score) - See definition for Yeast Interologs.
Human Interologs
- FLY_GENE1 (FBgn Gene1) - Fly FBgn of the first gene in the predicted interaction.
- FLY_GENE2 (FBgn Gene2) - Fly FBgn of the second gene in the predicted interaction.
- HUMAN_PROTEIN1_ENSEMBL (Human Ensembl ID_1) - Human protein Ensembl ID corresponding to the first fly gene.
- HUMAN_PROTEIN2_ENSEMBL (Human Ensembl ID_2) - Human protein Ensembl ID corresponding to the second fly gene.
- HUMAN_PROTEIN1_UNIPROT (Human Uniprot AC_1) - Human protein UniprotKB accession number corresponding to the first fly gene.
- HUMAN_PROTEIN2_UNIPROT (Human Uniprot AC_2) - Human protein UniprotKB accession number corresponding to the second fly gene.
- ORTHOLOG_METHOD (Ortholog Method) - See definition for Yeast Interologs.
- INTERACTION_PUBMEDS (Interaction Pubmed IDs) - See definition for Yeast Interologs.
- INTERACTION_DETECT_METHODS (Interaction Detection Methods) - See definition for Yeast Interologs.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
- ORIGINAL_INTERACTION_SOURCE (Original interaction source) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE1_SCORE (Ortholog Fly gene1 score) - See definition for Yeast Interologs.
- ORTHOLOG_HUMAN_PROTEIN1_SCORE (Ortholog Human protein1 score) - See definition for Yeast Interologs.
- ORTHOLOG_FLY_GENE2_SCORE (Ortholog Fly gene2 score) - See definition for Yeast Interologs.
- ORTHOLOG_HUMAN_PROTEIN2_SCORE (Ortholog Human protein2 score) - See definition for Yeast Interologs.
Gene Attributes
- FLY_GENE (Primary FlyBase ID) - Primary FlyBase FBgn.
- SYMBOL (Symbol) - Gene symbol.
- FULL_NAME (Full Name) - Full name of the gene.
- URL (URL) - Web link to FlyBase page describing this gene.
- SECONDARY_FBGNS (Secondary FlyBase ID) - Secondary FlyBase FBgns associated with the primary FlyBase FBgns.
- GENE_CLASS (Class of Gene) - Class of the gene.
- GO_MOLECULAR_FUNCTION (GO Molecular Function) - Gene Ontology (GO) Molecular Function annotations. It was formatted as GO_id(GO_evidence)===GO_term,GO_id(GO_evidence)===GO_term... .
- GO_BIOLOGICAL_PROCESS (GO Biological Process) - GO Biological Process annotations, same format as Molecular Functions.
- GO_CELLULAR_PROCESS (GO Cellular Component) - GO Cellular Component annotations, same format as Molecular Functions.
- SYNONYMS (Synonyms) - Synonyms of the gene.
- PROTEIN_DOMAINS (Protein Domains) - Protein domain annotations obtained from Interpro.
- CG_SYMBOLS (CG Symbol) - CG symbols associated with this gene.
- DATE_LAST_UPDATED (Date last updated) - Date of most recent update.
- DATA_VERSION (Version) - Version of current data.
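The GO annotation format described above, GO_id(GO_evidence)===GO_term with entries separated by commas, can be parsed with a sketch like the following. It assumes GO ids have the usual "GO:" plus seven digits form and locates entries by that pattern rather than by splitting on commas, because GO term names can themselves contain commas. The function name and the evidence codes in the example are illustrative.

```python
import re

# Start of one entry: GO id, evidence code in parentheses, then "===".
_ENTRY_START = re.compile(r"(GO:\d{7})\(([^)]*)\)===")

def parse_go_field(field):
    """Return a list of (go_id, evidence, term) tuples from one GO column value."""
    matches = list(_ENTRY_START.finditer(field))
    entries = []
    for i, match in enumerate(matches):
        # The term runs until the next entry starts, or to the end of the field.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(field)
        term = field[match.end():end].strip().rstrip(",").strip()
        entries.append((match.group(1), match.group(2), term))
    return entries

example = "GO:0005515(IPI)===protein binding,GO:0003677(IDA)===DNA binding"
for go_id, evidence, term in parse_go_field(example):
    print(go_id, evidence, term)
```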