Badapple
(Bioassay-Data Associative Promiscuity Pattern Learning Engine)
About
This web app analyzes each input query molecule by searching a database of bioactivity data experimentally
produced by NIH screening centers (data from PubChem). For each scaffold in the query molecule, the Badapple promiscuity score (pScore)
is computed according to the following scaffold scoring formula:
score = ((sActive) / (sTested + median(sTested)) *
(aActive) / (aTested + median(aTested)) *
(wActive) / (wTested + median(wTested)) *
1e5)
where:
sTested = # tested substances containing this scaffold
sActive = # active substances containing this scaffold
aTested = # assays with tested compounds containing this scaffold
aActive = # assays with active compounds containing this scaffold
wTested = # wells (samples) containing this scaffold
wActive = # active wells (samples) containing this scaffold
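The formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the code used by Badapple itself: the names `ScaffoldStats`, `Medians`, and `pscore` are hypothetical, and the counts and medians in the example are made up rather than taken from any Badapple database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaffoldStats:
    """Per-scaffold counts, named after the terms in the formula."""
    s_tested: int  # sTested
    s_active: int  # sActive
    a_tested: int  # aTested
    a_active: int  # aActive
    w_tested: int  # wTested
    w_active: int  # wActive


@dataclass(frozen=True)
class Medians:
    """Database-wide medians of sTested, aTested, and wTested."""
    s_tested: float
    a_tested: float
    w_tested: float


def pscore(stats: ScaffoldStats, medians: Medians) -> float:
    """Evaluate the scaffold scoring formula as written above.

    Assumes each (tested + median) denominator is positive.
    """
    s_term = stats.s_active / (stats.s_tested + medians.s_tested)
    a_term = stats.a_active / (stats.a_tested + medians.a_tested)
    w_term = stats.w_active / (stats.w_tested + medians.w_tested)
    return s_term * a_term * w_term * 1e5


# Made-up inputs, chosen only so the arithmetic is easy to follow:
# (5 / 10) * (4 / 8) * (10 / 40) * 1e5
example = ScaffoldStats(s_tested=8, s_active=5, a_tested=6, a_active=4,
                        w_tested=30, w_active=10)
example_medians = Medians(s_tested=2, a_tested=2, w_tested=10)
print(pscore(example, example_medians))  # 6250.0 for these made-up inputs
```

Adding the median to each denominator means a scaffold with little evidence (small tested counts) cannot reach a high score from a handful of active results.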
The inDrug flag indicates whether the corresponding scaffold exists in any approved drug*.
A high pScore for an inDrug scaffold thus represents conflicting evidence, but the existence of an
approved drug is normally the much stronger evidence.
Note that benzene is not treated as a scaffold because it is such a common substructure.
Interpreting Scores
The table below provides an overview of how to interpret different pScores provided by Badapple.
pScore range | advisory |
~ | unknown; no data |
0-99 | low pScore; no indication |
100-299 | moderate pScore; weak indication of promiscuity |
≥300 | high pScore; strong indication of promiscuity |
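For scripts that post-process scores, the advisory ranges can be applied with a small lookup like the one below. This is a sketch, not part of Badapple: the function name `advisory` is hypothetical, a missing score is represented as `None`, and non-integer scores that fall between the listed ranges (for example 99.5) are assumed to belong to the lower band.

```python
from typing import Optional


def advisory(pscore: Optional[float]) -> str:
    """Map a pScore to the advisory text from the interpretation table."""
    if pscore is None:
        # Shown as "~" in the table: the scaffold has no score.
        return "unknown; no data"
    if pscore < 100:
        return "low pScore; no indication"
    if pscore < 300:
        return "moderate pScore; weak indication of promiscuity"
    return "high pScore; strong indication of promiscuity"


for value in (None, 42, 150, 300):
    print(value, "->", advisory(value))
```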
Programmatic access
For programmatic access to Badapple, please see the API Docs Page.
For large-scale use, or to explore the Badapple data in more detail, please see the local installation guide.
Badapple Paper
For more information about Badapple please see the following paper:
Badapple: promiscuity patterns from noisy evidence
Although the original Badapple project was developed several years ago, we have re-created the original Badapple database and website using updated free and open-source software. See the next section for more details.
Databases: badapple vs badapple_classic vs badapple2
Three distinct databases have been developed under the Badapple project so far. They can be summarized as follows:
- badapple: The original Badapple database developed several years ago which resulted in the publication of Badapple: promiscuity patterns from noisy evidence. This DB is not available as part of the webapp because it relies on outdated and proprietary code.
- badapple_classic: Updated version of badapple, built using identical assay records and other data.
- badapple2: Newest version of Badapple, which incorporates 83 additional assay records.
For more information on the differences between these databases please see the next two sections.
What's different between badapple and badapple_classic?
The only differences between badapple_classic and badapple are:
We have performed several analyses to confirm that badapple_classic and badapple align closely.
Details on the analyses we've done can be found here.
What's different between badapple_classic and badapple2?
The most significant differences between badapple_classic and badapple2 are:
- badapple_classic uses the same set of 823 assays as the original badapple DB, whereas badapple2 incorporates 83 additional assay records (906 total).
- In addition to the 83 new assay records, badapple2 also uses updated versions of the original 823 assay records.
- PubChem is constantly updated. Substances, compounds, and bioassays can be modified or removed over time. One particular way these updates have impacted badapple2 is that some PubChem compounds have been removed since the creation of badapple (e.g., CIDs 6212642, 24761676, and 24762101). Thus, some scaffolds which are in badapple_classic may no longer be in badapple2 (e.g., the scaffold with SMILES 'C1=C2CCCC=C2Nc2ccccc21').
- badapple & badapple_classic date cutoff: 2017-08-14
- badapple2 date cutoff: 2024-11-26
- badapple_classic was restricted to substances from MLSMR, whereas badapple2 incorporates bioactivity data from all compounds tested in at least 50 unique assays.
- Omitting this restriction often leads to an artificially reduced sActive/sTested ratio.
- This restriction was not necessary in badapple_classic due to the MLSMR filter.
- Because of this restriction, many scaffolds in badapple2 have a null pScore. However, even for these cases the website still provides inDrug information as well as the assays where the scaffold was present in an active substance. Users can download a local version of badapple2 (see instructions above) to explore bioactivity data from these cases and can test different NAT restrictions by following the instructions here.
- badapple2 uses the badapple_classic medians to normalize pScores, so that our criteria for what constitutes a "high" amount of evidence remain consistent.
- Note that because we no longer use the MLSMR filter, there are many scaffolds present in badapple2 which have a low amount of evidence. Using the badapple2 medians to normalize scores would significantly lower the bar for what is considered enough evidence to assign a high pScore.
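The effect of the choice of medians can be seen by scoring the same scaffold counts twice, once with larger medians and once with smaller ones. The sketch below restates the scoring formula from the About section so that it runs on its own; the function name, the counts, and both sets of medians are hypothetical values picked for illustration, not the actual badapple_classic or badapple2 medians.

```python
from typing import Tuple


def pscore(s_active: int, s_tested: int,
           a_active: int, a_tested: int,
           w_active: int, w_tested: int,
           medians: Tuple[float, float, float]) -> float:
    """Scaffold scoring formula; medians = (sTested, aTested, wTested) medians."""
    med_s, med_a, med_w = medians
    return ((s_active / (s_tested + med_s))
            * (a_active / (a_tested + med_a))
            * (w_active / (w_tested + med_w))
            * 1e5)


# Hypothetical counts for a scaffold with only a small amount of evidence.
counts = dict(s_active=5, s_tested=8, a_active=4, a_tested=6,
              w_active=10, w_tested=30)

# Hypothetical medians: the second set stands in for a database that
# contains many low-evidence scaffolds, which pulls the medians down.
LARGER_MEDIANS = (2.0, 2.0, 10.0)
SMALLER_MEDIANS = (1.0, 1.0, 5.0)

score_larger = pscore(**counts, medians=LARGER_MEDIANS)
score_smaller = pscore(**counts, medians=SMALLER_MEDIANS)

# Identical counts score higher when the medians are smaller, which is
# why changing the medians would shift the meaning of a "high" pScore.
print(score_larger, score_smaller)
```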
In addition to the items above, badapple2 stores information not present in badapple_classic, including:
- Biological target information for each assay
- A record of the specific approved drug(s) that contain a scaffold, if inDrug is true
- Assay descriptors (description and protocol text + annotations from BARD)
If you select badapple2 in the webapp, details on the associated biological targets and approved drugs are available for each scaffold.
Help
If you have any issues or questions please raise them here.
Code Availability
All of our code, including the DBs, API, and UI, is publicly available. Please see the links below.
Authors and Acknowledgement
This project was developed within the UNM School of Medicine, Dept. of Internal Medicine, Translational Informatics Division.
We would like to thank Cristian Bologa for his guidance, as well as Oleg Ursu, Tudor Oprea,
Christopher A. Lipinski, and Larry Sklar for their previous efforts on this project.
We would also like to acknowledge the developers of the many open-source software packages that have been vital to the success of this project. We'd like to especially acknowledge the following projects: