Flag aggregator compounds during HTS Triage – Enhanced visualization with TIBCO Spotfire® Connector for KNIME
Co-authored by Maxime Guitet
After joining Discngine in March, alongside my Business Development activities for 3decision, I became a new member of the Discngine KNIME Guild. I discovered KNIME during my post-doc and have never stopped using it. For example, when I worked at Evotec, I developed a global workflow to calibrate docking systems before using them for virtual screening. I presented this work on 1 June 2022 during the KNIME Data Talks: Drug Discovery - From Hit Generation to the Clinic (link to my slides, to my talk and to the journal article). I have seen medicinal chemists use Spotfire and KNIME intensively, but they have to jump from one tool to the other to achieve their goals, which is painful.
That’s why Discngine developed a product called Connector, which links various scientific software packages such as TIBCO Spotfire®, BIOVIA Pipeline Pilot, KNIME Server and Schrödinger LiveDesign.
Today, I will show you how easy it is to integrate a KNIME workflow into Spotfire as a Data Function thanks to the TIBCO Spotfire® Connector for KNIME.
Recently, Christophe Molina from PIKAÏROS published, jointly with Infogene and Sanofi, a paper called Isometric Stratified Ensembles: A Partial and Incremental Adaptive Applicability Domain and Consensus-Based Classification Strategy for Highly Imbalanced Data Sets with Application to Colloidal Aggregation. This article presents a complex and advanced quantitative structure-interference relationship (QSIR) strategy for the prediction of colloidal aggregators, also called small colloidal aggregating molecules (SCAMs). SCAMs are the most common source of false positives in high-throughput screening (HTS) campaigns. Computational methods are thus needed to flag potential SCAMs during HTS triage [like SCAM Detective]. Aggregators are often hydrophobic and highly conjugated molecules. However, a very diverse range of compounds has been reported to aggregate, and it is difficult to predict whether a compound will aggregate just by looking at it. A small-molecule aggregate, or large colloid, adsorbs protein to its surface and inhibits enzyme activity by causing partial denaturation [from Aggregator Advisor].
The related KNIME workflows are available for download here: https://pikairos.eu/download/isometric-stratified-ensembles-ise-reset/.
Step by Step Guide
To visualize in Spotfire the potential aggregators flagged by the KNIME workflow, I am going to walk you through the necessary steps.
The first step is to adapt the KNIME workflow: replace the SDF Reader with the SBDFReader for your structure input data, and the Excel Sheet Appender with the SBDFWriter for the output data. These two nodes were developed by Discngine and are available here: https://docs.discngine.com/public/docs/connector-spo-kn-download
Then you need to adapt the workflows so that they output only the predictions and related statistics.
Finally, deploy them on your KNIME Server. To do so, build a main workflow that adapts and connects the three workflows using the Call Workflow (Table Based), Container Input (Table) and Container Output (Table) nodes. The SBDF file uploaded from Spotfire is sanitized by the pre-processing workflow; a case switch then lets you choose, from within Spotfire, which model to run.
To register the Data Function in Spotfire, simply follow the procedure described in the documentation. First, browse the server workflow repository and select your newly created workflow — 0_MAIN.
Upon selection, the workflow details are loaded, and the KNIME Data Function interface shows you what inputs the workflow takes and what outputs will be generated. In this case, you need a column with identifiers, one with molecular structures, the model you want to run and the path to your workflow.
Finally, to select which model you want to use, add a Text Area to the document that controls a Document Property. This Document Property will be used later as an input to the Data Function. The three models described in the original paper, which are based on several datasets, are included in our Discngine development. They are called ABL, PSIC and UCSF. You can find information on their related datasets and assays directly from PubChem in the table below:
The following table, generated by the KNIME Colloidal Aggregation workflow, is important for understanding what is being done and how to interpret the predictions. It shows the AUC values for each consensus level and applicability domain level. The prediction value itself, between 0 and 1, is not sufficient: you should also carefully consider first the consensus level (CL) and second the applicability domain level (ADL). The higher the CL and the lower the ADL, the more reliable the prediction. In this table, the colors, ranging from dark blue to intense red, qualitatively represent AUC values from 0.5 to 0.93, respectively. The marginal cardinal distribution of the compounds is added to the axes of the AUC ISE-map, as explained in the original paper.
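The ordering rule described here (higher CL first, then lower ADL) can be written as a simple sort key. The Python snippet below is only a sketch and is not part of the published workflows: the `Prediction` record, its field names and the sample values are hypothetical and chosen for illustration.

```python
from typing import NamedTuple, Tuple


class Prediction(NamedTuple):
    compound_id: str  # hypothetical identifier
    p: float          # predicted probability, between 0 and 1
    cl: int           # consensus level (higher is better)
    adl: int          # applicability domain level (lower is better)


def reliability_key(pred: Prediction) -> Tuple[int, int]:
    """Sort key: highest CL first, then lowest ADL."""
    return (-pred.cl, pred.adl)


# Made-up values, for illustration only.
predictions = [
    Prediction("cpd-1", 0.42, 0, 1),
    Prediction("cpd-2", 0.03, 7, 0),
    Prediction("cpd-3", 0.91, 7, 2),
    Prediction("cpd-4", 0.66, 4, 0),
]

# Most trustworthy predictions come first.
ranked = sorted(predictions, key=reliability_key)
print([pred.compound_id for pred in ranked])
```

Sorting this way keeps the probability value out of the reliability ranking, which matches the advice to look at CL and ADL before trusting P.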
Now let’s imagine that your HTS output data is already loaded into Spotfire (here we loaded data from an HTS on choline transporter). You want to predict which compounds could be aggregators, i.e. false positive compounds. You can use the workflow you previously registered as a Data Function.
To do so, simply add the Data Function to the Spotfire Document, fill in the form that pops up to indicate which columns contain your compound identifiers and structures, bind the model selection to the Document Property previously created for the Text Area, and run the Data Function.
The molecule structures and identifiers are pushed to KNIME Server, where the workflow applies the model you selected in the Text Area (here PSIC). The prediction data are then pushed back to Spotfire.
During the HTS triage, you can use these predictions to add a flag, which lets you filter your compounds properly and highlight clusters containing aggregator compounds. You can do whatever you want with this data, because you have access to all of TIBCO Spotfire®’s functionalities.
Here is a visualization example combining Consensus Level (CL), Applicability Domain Level (ADL) and Probability (P) as calculated by the workflows described and provided with the original paper. You can see that at the highest consensus level, compounds predicted as aggregators and non-aggregators are well separated and the prediction values are close to 0 or 1. As the consensus level decreases, the probability values are still separated but more spread out between 0 and 1. Finally, for consensus levels 0 to 3, the probability values are continuous, so we are less confident in the given predictions.
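One way to turn these observations into a triage flag is sketched below in Python. The low-confidence band (CL 0 to 3) comes from the paragraph above, and 0.5 is the probability threshold used in this post. Treating a probability of exactly 0.5 as an aggregator, as well as the function name and the label strings, are assumptions made for illustration.

```python
P_THRESHOLD = 0.5  # probability threshold used in this post
LOW_CL_MAX = 3     # CL 0 to 3: predictions we are less confident in


def triage_flag(p: float, cl: int) -> str:
    """Return a human-readable flag for one compound."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Assumption: p == 0.5 is counted as an aggregator.
    label = "aggregator" if p >= P_THRESHOLD else "non-aggregator"
    if cl <= LOW_CL_MAX:
        return f"{label} (low confidence)"
    return label


print(triage_flag(0.97, 7))
print(triage_flag(0.42, 0))
```

In Spotfire itself, an equivalent flag could be built as a calculated column; the Python form is only meant to make the rule explicit.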
Let’s try another visualization that can be useful during your HTS triage. After clustering your active molecules, you can represent each cluster as a rectangle whose size reflects the number of compounds it contains. Then you can color each rectangle by the average of its probability values.
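The two numbers behind this view, cluster size and average probability, are straightforward to compute. The following Python sketch shows the aggregation under the assumption that each compound is available as a `(cluster_id, probability)` pair; the function name and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def summarize_clusters(rows: Iterable[Tuple[int, float]]) -> Dict[int, dict]:
    """Group probabilities by cluster and return size and mean per cluster."""
    by_cluster: Dict[int, List[float]] = defaultdict(list)
    for cluster_id, p in rows:
        by_cluster[cluster_id].append(p)
    return {
        cluster_id: {"size": len(ps), "mean_p": mean(ps)}
        for cluster_id, ps in by_cluster.items()
    }


# Made-up rows, for illustration only.
rows = [(1, 0.25), (1, 0.75), (2, 1.0)]
summary = summarize_clusters(rows)
print(summary)
```

The `size` value would drive the rectangle area and `mean_p` its color.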
The main issue here is to ensure the validity of each prediction value. Here comes the power of Spotfire: interactivity! Let’s prepare a document with a table and the two previous visualization types. Hide inactive compounds to make viewing easier. Then click on a cluster with a low average probability value, such as cluster 46. Immediately, you will see the details of each compound in that cluster. You can easily see that 5 of the 6 compounds have CL=7 and ADL=0. Therefore, you can trust the prediction that this cluster is not full of aggregators and that the compounds are likely active against your protein target.
For cluster 79, among the nine molecules, one is predicted as a non-aggregator, but its probability value is close to the threshold of 0.5 (P=0.42, CL=0 and ADL=1). You can doubt this prediction, because the eight other molecules in the same cluster are predicted as aggregators. You can hypothesize that these nine compounds appear active only because of their high likelihood of being SCAMs. They are most likely false positives from your HTS campaign, so you would not want to proceed further with them.
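The reasoning applied to cluster 79 can be sketched as a small check: count the predicted aggregators in a cluster and list the members whose label disagrees with the majority while their consensus level is low. This Python snippet is an illustration only. The "doubtful" rule, the dictionary keys and the function name are assumptions, and only the last row of the sample data (P=0.42, CL=0, ADL=1) comes from the text; the other eight rows are placeholders.

```python
from typing import Dict, List

P_THRESHOLD = 0.5  # probability threshold quoted in the text
LOW_CL_MAX = 3     # CL 0 to 3: the less confident band


def review_cluster(members: List[dict]) -> Dict[str, object]:
    """Summarize one cluster and list predictions that deserve a second look.

    A member is "doubtful" when its label disagrees with the cluster
    majority and its consensus level is low (assumed rule). With an
    exact tie, the majority is treated as non-aggregator.
    """
    n_agg = sum(1 for m in members if m["p"] >= P_THRESHOLD)
    majority_is_agg = 2 * n_agg > len(members)
    doubtful = [
        m["id"]
        for m in members
        if (m["p"] >= P_THRESHOLD) != majority_is_agg and m["cl"] <= LOW_CL_MAX
    ]
    return {
        "size": len(members),
        "predicted_aggregators": n_agg,
        "doubtful": doubtful,
    }


# Nine members, as in cluster 79. The first eight rows are placeholders.
cluster_79 = [{"id": f"m{i}", "p": 0.9, "cl": 7, "adl": 0} for i in range(1, 9)]
cluster_79.append({"id": "m9", "p": 0.42, "cl": 0, "adl": 1})

report = review_cluster(cluster_79)
print(report)
```

A check like this does not replace the interactive inspection in Spotfire; it only makes explicit which members drive the doubt.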
Conclusion
In conclusion, using Discngine’s Spotfire Connector for KNIME, you can quickly implement publicly available KNIME workflows, call them from Spotfire and see their results there. This allows you to combine the computational capabilities of KNIME with the visualization power of TIBCO Spotfire®. Furthermore, the Data Function model provided by Spotfire allows you to register any workflow from your KNIME Server and run it in any Spotfire Document. The adapted workflows used here are available on demand at contact@discngine.com.
Moreover, as Discngine is an official KNIME Partner and an official TIBCO Spotfire Partner, don’t hesitate to contact us via this form if you would like more information on these products and on the related services Discngine can offer you.