Badapple

(Bioassay-Data Associative Promiscuity Pattern Learning Engine)

About

This web app analyzes each input query molecule by searching a database of bioactivity data experimentally produced by NIH screening centers (data from PubChem). For each scaffold in the query molecule, the Badapple promiscuity score (pScore) is computed according to the following scaffold scoring formula:

score = ((sActive) / (sTested + median(sTested)) *
        (aActive) / (aTested + median(aTested)) *
        (wActive) / (wTested + median(wTested)) *
        1e5)

where:

sTested = # tested substances containing this scaffold

sActive = # active substances containing this scaffold

aTested = # assays with tested compounds containing this scaffold

aActive = # assays with active compounds containing this scaffold

wTested = # wells (samples) containing this scaffold

wActive = # active wells (samples) containing this scaffold
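As an illustration of how the terms combine, here is a minimal Python sketch of the formula above. The function name, parameter names, and the median values are hypothetical and chosen only for this example; the medians used by the app come from the database and are not given on this page.

```python
def pscore(s_active, s_tested, a_active, a_tested, w_active, w_tested,
           median_s_tested, median_a_tested, median_w_tested):
    """Scaffold promiscuity score following the formula above."""
    substance_term = s_active / (s_tested + median_s_tested)
    assay_term = a_active / (a_tested + median_a_tested)
    well_term = w_active / (w_tested + median_w_tested)
    return substance_term * assay_term * well_term * 1e5


# Hypothetical medians, chosen only to make the arithmetic easy to follow.
MEDIANS = dict(median_s_tested=10, median_a_tested=5, median_w_tested=20)

# Scaffold with a fair amount of data: each of the three terms equals 0.1.
well_studied = pscore(10, 90, 5, 45, 20, 180, **MEDIANS)

# Scaffold that was active every time it was tested, but tested only once.
sparse = pscore(1, 1, 1, 1, 1, 1, **MEDIANS)

print(round(well_studied, 2), round(sparse, 2))
```

Under these assumed medians, the sparsely tested scaffold scores lower than the well-studied one even though it was active in every test. The medians in the denominators damp scores that rest on little evidence.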

The inDrug flag indicates whether the corresponding scaffold exists in any approved drug*. A high score for an inDrug scaffold thus represents conflicting evidence, but the existence of an approved drug is normally much stronger evidence.

* This analysis applies only to drugs with 5 or fewer ring systems. Drugs with many ring systems (e.g., venetoclax) are excluded.

Note that benzene is not considered a scaffold because it is such a common substructure.

Interpreting Scores

The table below provides an overview of how to interpret different pScores provided by Badapple.

pScore range	advisory
~	unknown; no data
0-99	low pScore; no indication
100-299	moderate pScore; weak indication of promiscuity
≥300	high pScore; strong indication of promiscuity
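The table can be expressed as a small lookup function. This is a sketch, not the app's own code. It assumes that a missing score (shown as `~`) is represented by `None`, and that fractional scores between 99 and 100 fall in the low band; the function name is hypothetical.

```python
from typing import Optional


def advisory(pscore: Optional[float]) -> str:
    """Map a pScore to the advisory text from the table above."""
    if pscore is None:  # assumption: None stands for "~" (no data)
        return "unknown; no data"
    if pscore < 100:
        return "low pScore; no indication"
    if pscore < 300:
        return "moderate pScore; weak indication of promiscuity"
    return "high pScore; strong indication of promiscuity"
```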

Programmatic access

For programmatic access to Badapple, please see the API Docs Page.

For large use cases, or if one wishes to explore the Badapple data in more detail, please see the local installation guide.

Badapple Paper

For more information about Badapple please see the following paper:

Badapple: promiscuity patterns from noisy evidence

Although the original Badapple project was developed several years ago, we have re-created the original Badapple database and website using updated free and open-source software. See the next section for more details.

Databases: badapple vs badapple_classic vs badapple2

Three distinct databases have been developed under the Badapple project so far. They can be summarized as follows:

badapple: The original Badapple database, developed several years ago, which resulted in the publication of Badapple: promiscuity patterns from noisy evidence. This DB is not available as part of the webapp because it relies on outdated and proprietary code.
badapple_classic: Updated version of badapple, built using identical assay records and other data.
badapple2: Newest version of Badapple, which incorporates 83 additional assay records.

For more information on the differences between these databases please see the next two sections.

What's different between badapple and badapple_classic?

The only differences between badapple_classic and badapple are:

badapple_classic uses the RDKit-based HierS algorithm from ScaffoldGraph rather than our Chemaxon-based version of HierS.
badapple_classic generates canonical SMILES using RDKit, whereas badapple used Open Babel.
badapple_classic counts ring systems using RingSystemFinder (RDKit-based), whereas badapple used RawRingsystemCount (Chemaxon-based).
badapple_classic uses a newer version of PostgreSQL.

We have performed several analyses to confirm that badapple_classic and badapple align closely. Details on the analyses we've done can be found here.

What's different between badapple_classic and badapple2?

The most significant differences between badapple_classic and badapple2 are:

badapple_classic uses the same set of (823) assays as the original badapple DB, whereas badapple2 incorporates 83 additional assay records (906 total).
- In both cases, assay records are restricted to HTS (>= 20k compounds) and come from NIH centers.
- You can compare the files badapple_classic_tested.aid and badapple2_tested.aid to see the exact difference.
In addition to the 83 new assay records, badapple2 also uses updated versions of the original 823 assay records.
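One way to compare the two assay lists is a set difference. The sketch below assumes each `.aid` file holds one integer PubChem assay ID (AID) per line; that format is an assumption and is not stated on this page. The helper names are hypothetical.

```python
from pathlib import Path


def read_aids(path):
    """Return the set of assay IDs in a file (assumed: one integer AID per line)."""
    return {int(token) for token in Path(path).read_text().split()}


def compare_aid_files(classic_path, badapple2_path):
    """Return (added, removed) AIDs going from the classic list to the badapple2 list."""
    classic = read_aids(classic_path)
    newer = read_aids(badapple2_path)
    return sorted(newer - classic), sorted(classic - newer)
```

With the real files in the working directory, calling `compare_aid_files("badapple_classic_tested.aid", "badapple2_tested.aid")` would return the assay IDs present only in badapple2, followed by any present only in badapple_classic.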

PubChem is constantly updated. Substances, compounds, and bioassays can be modified/removed over time. One particular way these updates have impacted badapple2 is that some PubChem compounds have been removed since the creation of badapple (e.g., CIDs 6212642, 24761676, and 24762101). Thus, some scaffolds which are in badapple_classic may no longer be in badapple2 (e.g., the scaffold with SMILES 'C1=C2CCCC=C2Nc2ccccc21').
badapple & badapple_classic date cutoff: 2017-08-14
badapple2 date cutoff: 2024-11-26

badapple_classic was restricted to substances from MLSMR, whereas badapple2 incorporates bioactivity data from all compounds tested in at least 50 unique assays.
- Omitting this 50-assay minimum often leads to an artificially reduced sActive/sTested ratio.
- This restriction was not necessary in badapple_classic due to the MLSMR filter.
- Because of this restriction, many scaffolds in badapple2 have a null pScore. However, even for these cases the website still provides inDrug information as well as assays where the scaffold was present in an active substance. Users can download a local version of badapple2 (see instructions above) to explore bioactivity data from these cases and can test different NAT restrictions by following the instructions here.
badapple2 uses the badapple_classic medians to normalize pScores in order to ensure that our criteria for what constitutes a "high" amount of evidence remain consistent.
- Note that because we no longer use the MLSMR filter, there are many scaffolds present in badapple2 which have a low amount of evidence. Using the badapple2 medians to normalize scores would significantly lower the bar for what is considered enough evidence to assign a high pScore.
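The minimum-assay restriction described above can be sketched as a simple filter. This is an illustration only, not the project's pipeline. It assumes the input is an iterable of `(cid, aid)` pairs, one per compound-assay test, and reads the threshold as a count of distinct assays per compound; the function and parameter names are hypothetical.

```python
from collections import defaultdict


def compounds_meeting_threshold(records, min_assays=50):
    """Return the CIDs that were tested in at least `min_assays` distinct assays.

    `records` is an iterable of (cid, aid) pairs; repeated pairs count once.
    """
    assays_by_cid = defaultdict(set)
    for cid, aid in records:
        assays_by_cid[cid].add(aid)
    return {cid for cid, aids in assays_by_cid.items() if len(aids) >= min_assays}
```

Changing `min_assays` is one way to see how a stricter or looser threshold alters the set of compounds that contribute to scaffold statistics.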

In addition to the items above, badapple2 stores information not present in badapple_classic, including:

Biological target information for each assay
A record of the specific approved drug(s) a scaffold is present in if inDrug is true.
Assay descriptors (description and protocol text + annotations from BARD)

If you select badapple2 in the webapp, details on the associated biological targets and approved drugs are available for each scaffold.

Help

If you have any issues or questions please raise them here.

Code Availability

All of our code, including the DBs, API, and UI, is publicly available. Please see the links below.

Authors and Acknowledgement

This project was developed within the UNM School of Medicine, Dept. of Internal Medicine, Translational Informatics Division.

Lead Developer: Jack Ringer
Supervision: Jeremy Yang

We would like to thank Cristian Bologa for his guidance, as well as Oleg Ursu, Tudor Oprea, Christopher A. Lipinski, and Larry Sklar for their previous efforts on this project.

We would also like to acknowledge the developers of the many open-source software packages that have been vital to the success of this project. We'd like to especially acknowledge the following projects: