Description of DroID - the Drosophila Interactions Database (Data version 2014_10 updated 2 October 2014)
DroID assembles gene or protein interaction data from a variety of sources into one location. Drosophila interactome data in DroID can be accessed and downloaded at the DroID home page, http://www.droidb.org. The data can also be searched, integrated, graphed, and downloaded using IM Browser or the DroID Cytoscape plugin.
DroID is updated periodically. The current version is described in this document. Previous versions are described and available for downloading on the version history page.
All Flybase gene IDs have been updated to Flybase release FB2014_05. As in previous versions, FBgn's were removed if they were ambiguous. If an old FBgn split into two new primary FBgns, interaction records involving it were deleted. Because of this it is possible for some data sets to have fewer interactions (or genes) than the previous version of DroID. Refer to Flybase Document for more information about primary and secondary FBgn's.
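The ID update rule described above can be sketched as follows. This is a minimal illustration and not DroID's actual pipeline: the mapping format, the function name `update_interactions`, and the placeholder IDs are all hypothetical.

```python
def update_interactions(interactions, old_to_primary):
    """Map each gene ID to its current primary ID and drop ambiguous records.

    interactions:   iterable of (id_a, id_b) pairs using possibly outdated IDs.
    old_to_primary: dict mapping an old ID to the set of primary IDs it now
                    corresponds to (hypothetical format; a split ID maps to 2+).
    """
    updated = set()
    for a, b in interactions:
        targets_a = old_to_primary.get(a, set())
        targets_b = old_to_primary.get(b, set())
        # An ID that split into several primary IDs (or is unknown) is
        # ambiguous, so the whole interaction record is deleted.
        if len(targets_a) != 1 or len(targets_b) != 1:
            continue
        (new_a,) = targets_a
        (new_b,) = targets_b
        # Using a set also merges records that become identical after mapping.
        updated.add((new_a, new_b))
    return updated


# Placeholder IDs, not real FBgns.
mapping = {
    "old1": {"new1"},
    "old2": {"new2a", "new2b"},  # split into two primary IDs
    "new3": {"new3"},
}
records = [("old1", "new3"), ("old2", "new3")]
print(update_interactions(records, mapping))
```

Both the deletion of split IDs and the merging of records that map to the same pair can reduce the interaction count between versions.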
|Data set||Number of interactions||Number of genes|
|PPI: Curagen yeast two-hybrid||
|PPI: Finley Lab yeast two-hybrid||
|PPI: Hybrigenics yeast two-hybrid||
|PPI: Perrimon Lab co-AP/MS||
|PPI: DPIM co-AP/MS||61262||4981|
|PPI: From other databases*||
|PPI: Human interologs**||
|PPI: Yeast interologs||
|PPI: Worm interologs||
|PDI: Transcription Factor-gene||
|GI: Genetic Interactions - Flybase||
* Protein-protein interactions from all other major databases
** As of DroID v2014_10, interactions from Reactome are no longer used to calculate human interologs.
Total number of unique interactions: 513,335
Total number of genes: 15,148
Below is a brief description of the various data sets. Definitions of the fields in each data set can be found here.
Normalized gene expression values - the Percent Max or pmax scale
As of DroID v2014_01, DroID includes normalized gene expression values that can be used to filter interaction data, as described in Murali et al. 2014 (PMID: 24913703). Based on FlyAtlas gene expression data, a gene is expressed in a particular tissue at some percentage of its maximal level across all tissues. In DroID, each gene has a 'percent maximum' or pmax value for each tissue. Likewise, based on modENCODE RNA-seq data, a gene is expressed at a particular developmental time point at some percentage of its maximal level across all time points. Thus, in DroID each gene also has a pmax value for each developmental time point. Filtering interactome data using the pmax values can reveal gene and protein networks that are likely to operate in specific tissues or developmental time points.
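As a rough sketch of the pmax idea, the snippet below scales one gene's expression values to a percentage of its maximum and then filters interactions by a pmax threshold. The function names, the requirement that both genes pass the threshold, and the handling of genes with no detectable expression are assumptions made for illustration; see Murali et al. 2014 for the actual procedure.

```python
def pmax(expression_by_condition):
    """Return each condition's expression as a percentage of the gene's maximum."""
    peak = max(expression_by_condition.values())
    if peak <= 0:
        # Assumption: a gene with no detectable expression gets 0 everywhere.
        return {cond: 0.0 for cond in expression_by_condition}
    return {cond: 100.0 * value / peak
            for cond, value in expression_by_condition.items()}


def filter_by_pmax(interactions, pmax_table, condition, threshold):
    """Keep interactions whose two genes both reach `threshold` in `condition`.

    pmax_table: dict of gene -> {condition: pmax value} (hypothetical layout).
    """
    return [
        (a, b) for a, b in interactions
        if pmax_table.get(a, {}).get(condition, 0.0) >= threshold
        and pmax_table.get(b, {}).get(condition, 0.0) >= threshold
    ]


# Made-up expression values for two placeholder genes.
table = {
    "geneA": pmax({"testis": 50.0, "ovary": 200.0, "brain": 0.0}),
    "geneB": pmax({"testis": 10.0, "ovary": 90.0, "brain": 100.0}),
}
print(table["geneA"])
print(filter_by_pmax([("geneA", "geneB")], table, "ovary", 50.0))
```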
Gene expression correlation values
As of DroID v2013_02, gene expression correlation values are calculated based on developmental expression profiles from modENCODE and tissue expression profiles from FlyAtlas. Both expression data sets can also be used to filter interaction data (see below). The Weighted Correlation value for each gene pair provides a measure of how frequently the two genes are expressed together across tissues or developmental time. The Weighted Correlation values can be displayed and used to filter lists of gene or protein interactions.
Gene expression data from modENCODE and FlyAtlas
First added to DroID v2011_08. Interaction data can be filtered using the gene expression data from the modENCODE project (PMID: 21179090) or from FlyAtlas (PMID: 17534367). The modENCODE data are from RNA-seq for 30 developmental time points from early embryos to adult males and females. Expression levels are represented as "fragments (sequenced) per kilobase of transcript per million fragments mapped" or FPKM. Interaction data can be filtered at different FPKM values for each developmental time point by selecting the "Life Cycle Expression" link at the top of a list of interactions. More information at modENCODE.org. FlyAtlas data are from DNA microarray-based gene expression profiling of ~25 tissues. More information at flyatlas.org.
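For readers unfamiliar with the "fragments per kilobase of transcript per million fragments mapped" unit, the standard arithmetic is sketched below. The function name and the example counts are illustrative only and do not come from the modENCODE data.

```python
def fragments_per_kb_per_million(fragments, transcript_length_bp, total_mapped_fragments):
    """Normalize a fragment count by transcript length (kb) and library size (millions)."""
    # fragments / (length / 1e3) / (total / 1e6) == fragments * 1e9 / (length * total)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


# Illustrative numbers: 500 fragments on a 2 kb transcript
# in a library of 25 million mapped fragments.
value = fragments_per_kb_per_million(500, 2000, 25_000_000)
print(value)
```

Normalizing by both transcript length and library size makes values comparable between genes of different lengths and between time points sequenced to different depths.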
Transcription Factor (TF) - Gene Interactions
First added to DroID v2010_08. This table contains interactions between TFs and specific genes that they may regulate. The current release includes high quality, curated TF-gene interactions from the REDFly database (PMID: 18039705) and experimentally determined TF-gene interactions from the modENCODE project (PMID: 21177974). These are a subclass of protein-DNA interactions (PDI) for which there is experimental evidence that the TF binds to the gene and regulates its transcription. The modENCODE interactions have been inferred from the binding profiles of TFs in the genomic regions of the target genes using genome-wide location analysis (ChIP-chip and ChIP-seq). Links to REDFly and modENCODE and the original literature citations are included in the data.
microRNA - Gene Interactions
First added to DroID v2010_08. This table includes putative regulatory interactions between microRNAs (miRs) and their target genes. Since miRs regulate their targets by base pairing with target RNA, in molecular interaction terms these are RNA-RNA interactions (RRI). The miR-Gene interactions currently in DroID include three sets of predictions: interactions predicted by TargetScanFly based on base complementarity in the 3’ UTRs of target genes (PMID: 15652477; PMID: 17989254); MinoTar interactions, which are predicted conserved miR target sites within protein-coding regions (PMID: 20729470); and modENCODE interactions (PMID: 21177974), which are predicted based on the genome-wide occurrence of evolutionarily conserved miR seed motifs.
Phenotype and Gene Expression terms
First added to DroID v2010_08. Genes can be searched and interaction data can be filtered based on gene expression and phenotype terms from Flybase controlled vocabularies (CV). To search for genes on the DroID Home/Search page, click on the "Phenotype" or "Gene Expression" check box and enter a term, such as "female sterile" or "eye disc". Similarly, these terms can be used to filter interactions on a list of interactions page. Phenotype and Gene Expression data are from Flybase. For a small set of genes the phenotype terms were modified from Flybase to enable efficient searching and filtering. These are: FBgn0004635 FBgn0003731 FBgn0011674 FBgn0000492 FBgn0000014 FBgn0003205 FBgn0000490 FBgn0004644 FBgn0003944 FBgn0000463 FBgn0004647 FBgn0002973 FBgn0004009.
Genetic interactions - from Flybase
Gene-gene interactions downloaded from Flybase. These represent interactions between two gene alleles. For example, an allele of one gene may enhance or suppress the phenotype of an allele in another gene. Alternatively, the combination of two alleles may result in a "synthetic" phenotype not observed for either of the individual alleles.
DroID includes protein-protein (PPI) and protein-DNA (PDI; i.e., transcription factor-gene) interactions. Although a gene may encode multiple proteins, the methods used to detect PPI and PDI rarely record which protein variant from a gene was used. Thus, interactions involving proteins are represented by pairs of genes. The precise way to interpret a protein interaction represented as "gene 1 - gene 2" is that one or more proteins encoded by gene 1 interact with one or more proteins encoded by gene 2. The gene identifiers used in this database are Flybase Gene Numbers, FBgn.
Protein-protein interactions from other databases - these are experimentally derived physical interactions other than those from the major high throughput datasets listed separately below. These interactions are collected from other large databases (BioGRID, IntAct, MINT, and BIND) at each refresh of DroID. As of DroID v2014_10, interactions curated by MINT are obtained from IntAct and not from MINT directly. The original database source and information are available for each interaction. This includes links to original publications for each interaction.
Perrimon coAP complex - Protein interactions determined in large-scale co-affinity purification (co-AP)/MS screens in the Perrimon Lab. The co-complex data was converted to binary interactions using the hub-spoke model, where baits are predicted to interact with each of the co-purified proteins.
Perrimon CoAP complex data - 11/23/2011. Data from Friedman et al., 2011. Includes 384 interactions among 252 proteins determined using 15 canonical components of RTK/Ras/ERK pathways as baits. C-terminally TAP-tagged bait proteins were expressed in stably transfected S2R+ cells at baseline or stimulated with either insulin or EGF, complexes were affinity purified, and associated proteins were determined by LC-MS/MS. The data includes dataset-specific confidence scores (SAINT scores). This dataset includes interactions above the author-determined SAINT score cutoff of 0.83 and FDR of 7.2%. Friedman et al. correlated this interactome data with RNAi screens designed to detect genes required for EGF- or insulin-stimulated ERK activation. The interaction data can be searched or filtered using the RNAi data with IM Browser. Friedman et al., Science Signaling 25 October 2011 (PMID: 22028469).
DPIM coAP complex - Protein interactions determined in large-scale co-affinity purification (co-AP)/MS screens by the Drosophila Protein Interaction Mapping (DPIM) project, a collaboration among the laboratories of Spyros Artavanis-Tsakonas, Steven Gygi, Susan Celniker, and K. Vijay Raghavan (DPIM web site). The co-complex data was converted to binary interactions using the hub-spoke model, where baits are predicted to interact with each of the co-purified proteins.
DPIM coAP complex data - 11/16/2011. Data from Guruharsha et al., 2011 acquired from DPIM 11/16/2011. Includes 61586 interactions among 4996 proteins. C-terminally FLAG-HA-tagged bait proteins were expressed in transiently transfected S2R+ cells, the bait and associated proteins were immunoaffinity purified with anti-HA resin, and proteins were identified by LC-MS/MS. 4,273 full-length bait clones were used, 3,488 of which resulted in successful purifications. This dataset includes all interactions involving proteins that interacted with fewer than 10% (~320) of all bait proteins. The data includes dataset-specific confidence scores (HGSCores). Guruharsha et al., Cell 28 October 2011 (PMID: 22036573).
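The hub-spoke conversion used for the co-AP/MS datasets above can be illustrated with a short sketch. The input layout, the function name `hub_spoke`, and the decision to skip bait self-pairs are assumptions made here, not details taken from the original publications.

```python
def hub_spoke(purifications):
    """Convert co-complex data to binary bait-prey pairs.

    purifications: dict mapping each bait to the proteins co-purified with it
                   (hypothetical input layout).
    """
    pairs = set()
    for bait, preys in purifications.items():
        for prey in preys:
            # Assumption: the bait recovering itself is not reported as an interaction.
            if prey != bait:
                pairs.add((bait, prey))
    return pairs


# Placeholder protein names.
example = {
    "baitA": ["p1", "p2", "baitA"],
    "baitB": ["p1"],
}
print(sorted(hub_spoke(example)))
```

Note that the model links each bait only to its co-purified proteins; it does not predict interactions among the co-purified proteins themselves (here, no `p1`-`p2` pair is produced).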
Finley YTH - Includes protein interaction data generated in the Finley laboratory using the LexA yeast two-hybrid system, mostly from high throughput screens. The project is described here and is ongoing. Data versions are as follows.
Finley YTH v1.0 - 08/01/2004 - 423 interactions detected in a pilot screen using randomly selected Drosophila "bait" BD proteins. A list of the BD proteins used is here. (Zhong, Patel, Zhang, Mangiola, Stanyon, Finley, unpublished).
Finley YTH v2.5 - 12/10/2004 - Added 1,814 interactions detected in screens with 152 proteins related to cell cycle regulators. This data is described in Stanyon et al., 2004, Genome Biology, 5(12):R96. (PMID: 15575970)
Finley YTH v2.6 - 2/16/2007 - Secondary FBgn's mapped to primary FBgn's. Ambiguous FBgn's removed.
Finley YTH v3.0 - 7/2/2008 - Added results from a Y2H screen that tested computationally predicted protein-protein interactions. Described in Schwartz et al., 2009 (PMID: 19079254). Two different types of predictions were tested, distinguished by data in the SCREEN field: either "Test of combined evidence predictions (JY) 9_2006" or "Test of conservation-based predictions from Sharan 2005 PMID:1568750". A number of random pairs were also tested and found positive, indicated by "Test of random pairs 9_2006".
Finley YTH v4.0 - 9/18/2010 - Added results from two ongoing projects, including ~4,000 interactions detected in a genome-wide screen using baits with no previously detected YTH interactions ("Untouched Proteome 2010" screen) and ~2,000 interactions detected in tests of interactions originally reported using the Gal4 system by Hybrigenics or Curagen (see below) ("Gal4 retests in LexA system" screen). These two datasets are unpublished. When using them please cite the DroID web site.
Curagen YTH - Protein interactions detected in a high throughput yeast two-hybrid screen conducted at Curagen (New Haven, CT) in collaboration with the Finley lab. All of the interactions were assigned dataset-specific confidence scores, with roughly one quarter of them falling into the high confidence set (scores >0.5). This data was described in Giot et al., 2003, Science 302, 1727-1736. PMID: 14605208
Hybrigenics YTH - Protein interactions detected in high throughput yeast two-hybrid screens conducted at Hybrigenics (Paris, France). They used 102 bait proteins to detect >2,300 interactions, and assigned 710 of these to a high confidence group. This data was described in Formstecher et al., 2005, Genome Research 15, 376-384. PMID: 15710747. Hybrigenics provides interaction data based on internal coding sequence ids, some of which could not be mapped to protein coding FBgns.
Predicted interactions between Drosophila proteins based on experimental evidence for interactions between orthologous proteins in other species. At each refresh of DroID we collect interactions for yeast, worm, and human from online interaction databases (noted below). Proteins for each species are mapped to Fly orthologs using InParanoid, which is an orthology mapping algorithm. The dates that original data was downloaded are noted in each table.
Yeast Interologs - Yeast interactions were downloaded from BioGRID, IntAct, and MINT. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original yeast interaction and the associated PubMed IDs.
Worm Interologs - Worm interactions were downloaded from BioGRID, IntAct, and MINT. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original worm interaction and the associated PubMed IDs.
Human Interologs - Human interactions were downloaded from BioGRID, HPRD, IntAct, MINT, and Reactome. As of DroID v2014_10, interactions from Reactome are no longer used to calculate human interologs. The integrated interaction set was then mapped to Fly interologs using InParanoid, see above. For each interolog, IM Browser lists the source databases containing the original human interaction and the associated PubMed IDs.
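The interolog mapping step described in this section can be sketched roughly as follows. The ortholog table here is a plain dictionary standing in for InParanoid output, and expanding every combination when a protein has several fly orthologs is an assumption; DroID's actual handling of such cases is not described in this document.

```python
from itertools import product


def map_interologs(source_interactions, orthologs):
    """Project interactions from another species onto fly genes.

    source_interactions: iterable of (protein_a, protein_b) pairs from yeast,
                         worm, or human.
    orthologs:           dict mapping a source protein to a set of fly gene IDs
                         (hypothetical stand-in for an InParanoid table).
    """
    predicted = set()
    for a, b in source_interactions:
        # Interactions with an unmapped partner yield no interolog.
        for fly_a, fly_b in product(orthologs.get(a, ()), orthologs.get(b, ())):
            # Store pairs in sorted order so A-B and B-A count once.
            predicted.add(tuple(sorted((fly_a, fly_b))))
    return predicted


# Placeholder identifiers, not real gene IDs.
ortholog_table = {"Y1": {"flyA"}, "Y2": {"flyB", "flyC"}}
yeast_pairs = [("Y1", "Y2"), ("Y1", "Y3")]
print(sorted(map_interologs(yeast_pairs, ortholog_table)))
```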