{ "cells": [ { "cell_type": "markdown", "id": "every-mortgage", "metadata": {}, "source": [ "# Docking Oracle and Generative Leaderboard Demo" ] }, { "cell_type": "markdown", "id": "beneficial-supervisor", "metadata": {}, "source": [ "Wenhao Gao (whgao@mit.edu)" ] }, { "cell_type": "markdown", "id": "given-creativity", "metadata": {}, "source": [ "# Outline\n", "- Docking oracle\n", "    - What is docking\n", "    - Why we prefer docking\n", "    - How to use the docking oracle in TDC\n", "- Generative leaderboard\n", "    - Docking benchmark group\n", "    - Train your own model with the docking oracle\n", "    - Submit to the leaderboard" ] }, { "cell_type": "markdown", "id": "blank-hunger", "metadata": {}, "source": [ "## Molecular design is an optimization problem" ] }, { "cell_type": "markdown", "id": "restricted-repeat", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "compound-sapphire", "metadata": {}, "source": [ "## Docking Oracle" ] }, { "cell_type": "markdown", "id": "becoming-matrix", "metadata": {}, "source": [ "* What is docking?\n", "\n", "Molecular docking is a computational approach for estimating the binding affinity between a ligand (a small molecule, treated as flexible) and a receptor (a protein target, treated as rigid). It typically has two components: a scoring function that estimates the free energy of a given conformation, and a search algorithm that searches for the conformation with the strongest binding. A stronger binding affinity makes the ligand more likely to be a potent inhibitor (recall the lock-and-key model)." ] }, { "cell_type": "markdown", "id": "rising-spice", "metadata": {}, "source": [ "Bender, B. J., Gahbauer, S., Luttens, A., Lyu, J., Webb, C. M., Stein, R. M., ... & Shoichet, B. K. (2021). A practical guide to large-scale docking. Nature Protocols, 1-34." ] }, { "cell_type": "markdown", "id": "continent-blink", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "multiple-refund", "metadata": {}, "source": [ "* Why do we use docking?\n", "\n", "1. 
Docking is used in practice as a first step in virtual screening for drug discovery, which makes it a more useful and more challenging oracle.\n", "2. Docking serves as an affordable proxy for resource-intensive oracles (docking: ~30 s/mol; QED: ~ms/mol)." ] }, { "cell_type": "markdown", "id": "durable-representation", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "adapted-rocket", "metadata": {}, "source": [ "## How to run docking (traditional vs. TDC)?" ] }, { "cell_type": "markdown", "id": "hazardous-fancy", "metadata": {}, "source": [ "Traditional docking simulation: https://vina.scripps.edu/manual/#linux" ] }, { "cell_type": "markdown", "id": "better-feeling", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "fuzzy-spare", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "weekly-daisy", "metadata": {}, "source": [ "We provide easy access to docking through PyScreener, with AutoDock Vina as the backend. Install PyScreener by following the instructions at https://github.com/coleygroup/pyscreener" ] }, { "cell_type": "markdown", "id": "raising-median", "metadata": {}, "source": [ "We provide 9 curated targets, identified by PDB ID: 3pbl, 1iep, 2rgp, 3eml, 3ny8, 4rlu, 4unn, 5mo4, 7l11. Using them is similar to using other oracles:" ] }, { "cell_type": "markdown", "id": "conditional-average", "metadata": {}, "source": [ "| PDB ID | Description | Source |\n", "| ----------- | ----------- | ----------- |\n", "| [1iep](https://www.rcsb.org/structure/1IEP) | c-Abl kinase, complex with imatinib. | [vina tutorial](https://autodock-vina.readthedocs.io/en/latest/docking_python.html) |\n", "| [2rgp](https://www.rcsb.org/structure/2RGP) | Epidermal growth factor receptor, complex with hydrazone. | [SBMolGen](https://chemrxiv.org/engage/chemrxiv/article-details/60c75725842e65c7acdb4638) |\n", "| [3eml](https://www.rcsb.org/structure/3EML) | Human A2A adenosine receptor, complex with ZM241385. 
| [SBMolGen](https://chemrxiv.org/engage/chemrxiv/article-details/60c75725842e65c7acdb4638) |\n", "| [3ny8](https://www.rcsb.org/structure/3NY8) | Beta2 adrenergic receptor, complex with the inverse agonist ICI 118,551. | [SBMolGen](https://chemrxiv.org/engage/chemrxiv/article-details/60c75725842e65c7acdb4638) |\n", "| [3pbl](https://www.rcsb.org/structure/3PBL) | Human dopamine D3 receptor, complex with eticlopride. | [DUD.E](http://dude.docking.org/targets/drd3) |\n", "| [4rlu](https://www.rcsb.org/structure/4RLU) | Beta-hydroxyacyl-ACP dehydratase HadAB, complex with a flavonoid inhibitor. | [A 3D generative model for structure-based drug design](https://proceedings.neurips.cc/paper/2021/hash/314450613369e0ee72d0da7f6fee773c-Abstract.html) |\n", "| [4unn](https://www.rcsb.org/structure/4UNN) | M. tuberculosis thymidylate kinase (Mtb Tmk), complex with an inhibitor. | [MolPAL](https://pubs.rsc.org/en/content/articlehtml/2021/sc/d0sc06805e) |\n", "| [5mo4](https://www.rcsb.org/structure/5MO4) | ABL1 kinase, complex with asciminib. | [L-Net](https://arxiv.org/abs/2104.08474) |\n", "| [7l11](https://www.rcsb.org/structure/7L11) | The main protease (M$^{pro}$) of SARS-CoV-2, complex with a known inhibitor. 
| [Zhang, Chun-Hui, et al.](https://pubs.acs.org/doi/abs/10.1021/acscentsci.1c00039) |" ] }, { "cell_type": "code", "execution_count": 1, "id": "comic-thesis", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Found local copy...\n", "2022-01-24 23:17:41,348\tINFO services.py:1253 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266\u001b[39m\u001b[22m\n", "Docking: 100%|██████████| 1/1 [00:23<00:00, 23.08s/ligand]\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Docking score: -8.0 kcal/mol\n", "Consumed time: 23.086 s\n" ] } ], "source": [ "from tdc import Oracle\n", "import time\n", "\n", "oracle = Oracle(name='3pbl_docking')\n", "t1 = time.time()\n", "print(f\"Docking score: {oracle('CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1')} kcal/mol\")\n", "print(f\"Consumed time: {time.time() - t1:.3f} s\")" ] }, { "cell_type": "markdown", "id": "proprietary-marketplace", "metadata": {}, "source": [ "## Visualize the result" ] }, { "cell_type": "code", "execution_count": 2, "id": "quick-bennett", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import py3Dmol\n", "import os\n", "\n", "pdbid = '3pbl'\n", "\n", "# conda install openbabel -c conda-forge\n", "os.system(\"obabel -i pdbqt out.pdbqt -o pdb -O \" + pdbid + \"_ligand.pdb\")\n", "os.system(\"obabel -i pdbqt oracle/\" + pdbid + \".pdbqt -o pdb -O\" + pdbid + \"_receptor.pdb\")\n", "os.system(\"obabel -i pdbqt out.pdbqt -o xyz -O \" + pdbid + \"_ligand.xyz\")" ] }, { "cell_type": "code", "execution_count": 3, "id": "relative-authorization", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "7.394807692307693 22.16196153846154 23.601230769230767\n" ] }, { "data": { "application/3dmoljs_load.v0": "
\n

You appear to be running in JupyterLab (or JavaScript failed to load for some other reason). You need to install the 3dmol extension:
\n jupyter labextension install jupyterlab_3dmol

\n
\n", "text/html": [ "
\n", "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "lines = []\n", "for line in open(pdbid + \"_ligand.xyz\", 'rt'): \n", " lines.append(line.strip())\n", "lines = lines[2:]\n", "\n", "xs = [float(line.split()[1]) for line in lines]\n", "ys = [float(line.split()[2]) for line in lines]\n", "zs = [float(line.split()[3]) for line in lines]\n", "center_x = sum(xs)/len(lines)\n", "center_y = sum(ys)/len(lines)\n", "center_z = sum(zs)/len(lines)\n", "print(center_x, center_y, center_z)\n", "\n", "def visbox2(objeto, center, box_size): \n", " objeto.addBox({'center':{'x': center[0],'y': center[1],'z': center[2]}, \n", " 'dimensions': {'w': box_size[0],'h': box_size[1],'d': box_size[2]},'color':'blue','opacity': 0.5})\n", "\n", "def complxvis(objeto,protein_name,ligand_name):\n", " mol1 = open(protein_name, 'r').read()\n", " mol2 = open(ligand_name, 'r').read()\n", " objeto.addModel(mol1,'pdb')\n", " objeto.setStyle({'cartoon': {'color':'spectrum'}})\n", " objeto.addModel(mol2,'pdb')\n", " objeto.setStyle({'model':1},{'stick':{}})\n", "\n", "def vismol(center=[center_x, center_y, center_z], box_size=[20, 20, 20]): \n", " mol_view = py3Dmol.view(1200, 800) \n", " visbox2(mol_view, center, box_size)\n", " complxvis(mol_view, pdbid + '_receptor.pdb', pdbid + '_ligand.pdb')\n", " mol_view.setBackgroundColor('white')\n", " mol_view.rotate(90, {'x':0,'y':1,'z':0},viewer=(0,1));\n", " mol_view.zoomTo() \n", " mol_view.show()\n", "\n", "vismol()" ] }, { "cell_type": "markdown", "id": "micro-point", "metadata": {}, "source": [ "## Others" ] }, { "cell_type": "code", "execution_count": 4, "id": "bridal-softball", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Found local copy...\n", "2022-01-24 23:18:09,718\tINFO services.py:1253 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266\u001b[39m\u001b[22m\n", "Docking: 0%| | 0/1 [00:00" ] }, { "cell_type": "markdown", "id": "applied-celtic", "metadata": {}, 
"source": [ "### Step 1: Initialize the docking benchmark group class" ] }, { "cell_type": "code", "execution_count": 8, "id": "innocent-insured", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['1iep', '2rgp', '3eml', '3ny8', '4rlu', '4unn', '5mo4', '7l11', '3pbl']" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from tdc import utils\n", "utils.retrieve_benchmark_names('Docking_Group')" ] }, { "cell_type": "code", "execution_count": 9, "id": "level-domestic", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Found local copy...\n", "Found local copy...\n", "2022-01-24 23:19:27,496\tINFO services.py:1253 -- View the Ray dashboard at \u001b[1m\u001b[32mhttp://127.0.0.1:8266\u001b[39m\u001b[22m\n" ] } ], "source": [ "from tdc.benchmark_group import docking_group\n", "group = docking_group(path='./data')\n", "\n", "benchmark = group.get('3pbl', num_max_call = 5000) \n", "oracle_fct, data, name = benchmark['oracle'], benchmark['data'], benchmark['name'] " ] }, { "cell_type": "code", "execution_count": 10, "id": "broad-healing", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " smiles\n", "0 CC(C)(C)c1ccc2occ(CC(=O)Nc3ccccc3F)c2c1\n", "1 C[C@@H]1CC(Nc2cncc(-c3nncn3C)c2)C[C@@H](C)C1\n", "2 N#Cc1ccc(-c2ccc(O[C@@H](C(=O)N3CCCC3)c3ccccc3)...\n", "3 CCOC(=O)[C@@H]1CCCN(C(=O)c2nc(-c3ccc(C)cc3)n3c...\n", "4 N#CC1=C(SCC(=O)Nc2cccc(Cl)c2)N=C([O-])[C@H](C#..." ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.head()" ] }, { "cell_type": "code", "execution_count": 11, "id": "extreme-rebate", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Docking: 100%|██████████| 1/1 [00:24<00:00, 24.91s/ligand]\n" ] }, { "data": { "text/plain": [ "-8.4" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "oracle_fct(data.smiles[0])" ] }, { "cell_type": "code", "execution_count": 12, "id": "authentic-journey", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'3pbl'" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "name" ] }, { "cell_type": "markdown", "id": "gross-imaging", "metadata": {}, "source": [ "### Step 2: Train your model (a genetic algorithm with SynNet)" ] }, { "cell_type": "markdown", "id": "signed-detroit", "metadata": {}, "source": [ "Gao, W., Mercado, R., & Coley, C. W. Amortized Tree Generation for Bottom-up Synthesis Planning and Synthesizable Molecular Design. ICLR 2022." 
] }, { "cell_type": "markdown", "id": "figured-mileage", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "adopted-tutorial", "metadata": {}, "source": [ "" ] }, { "cell_type": "code", "execution_count": 13, "id": "completed-barcelona", "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "with open('opt_drd3_5000.json', 'r') as f:\n", " pred_smiles = json.load(f)" ] }, { "cell_type": "code", "execution_count": 14, "id": "younger-three", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'5000': {'O=c1c2c(F)c(-c3cccc4ccccc34)ccc2ncn1CC12CCC(CC1)OC2': 12.3,\n", " 'O=c1c2c(Br)cccc2ncn1Cc1cc(F)c(-c2n~c3c(C4=NNN=N4)cccc3o2)cc1F': 12.2,\n", " 'O=C1OC(=O)C23CCOCC12N=NN3CC12CC(C3=NC(c4cncc5ccccc45)=NN3)(CO1)C2': 12.2,\n", " 'FC(F)(F)c1cncc(CN2CCc3cc(-c4cccc5ccccc45)ccc32)c1': 12.0,\n", " 'Cc1cc(N)n2ncc(-c3nc(-c4ccc(-c5cccc6ccccc56)cc4)n[nH]3)c2n1': 12.0,\n", " 'CC(=O)C(c1cccc(C(=O)c2cccc3ccccc23)c1)c1noc(-c2c[nH]nc2C2CCCCC2)n1': 12.0,\n", " 'Fc1cc(Br)c(CNc2nc3cc(-c4cccc5ccccc45)ccc3n2Cc2ccccc2)c(F)c1F': 12.0,\n", " 'O=C(Cl)Cc1cn(-c2ccc3cc(-c4ccccc4F)[nH]c3c2)c2cccnc12': 11.9,\n", " 'CC1(C)Cc2cc3cc(-c4cccc5ccccc45)ccc3nc2CC1=O': 11.8,\n", " 'O=C(CNc1ccc2c(-c3ccc(F)cc3)c(-c3ccccc3)[nH]c2c1)c1cccc2ccccc12': 11.8,\n", " 'Cc1ccc2c(=O)-n(Cc3ccc(NC(=O)C(F)(F)c4cccc5ccccc45)cc3)cnc2c1F': 11.8,\n", " 'Cc1cc(-c2cccc(NC(=O)NC3CCCCCC3(F)F)c2)c2ccccc2c1': 11.8,\n", " 'Nc1nc(-c2ccccc2)c(-c2ccc3cc(C(=O)N4CC5CCC4CN5)[nH]c3c2F)s1': 11.7,\n", " 'Cn1ncc2c1CCCn1c-2nc2ccc(-c3cccc(C(=O)C(C)(F)F)c3)cc2c1=O': 11.7,\n", " 'NCC(=O)c1cccc(C(=O)c2nc(-c3noc(-c4ccccc4C4CCCC4)n3)ccc2F)c1': 11.7,\n", " 'CC(F)(F)c1ccccc1-c1ccc2c(c1)C(NC(=O)Nc1cccc3ccccc13)CC2': 11.7,\n", " 'CC(C)n1c(-c2c(F)ccc3c2OCC3)n~c2cc(-c3cccc(C4NC(=O)CC4=O)c3)ccc21': 11.6,\n", " 'NCC(COc1cccc2c(-c3ccc4ccc(Cl)nc4c3)cccc12)c1cccc(Br)c1': 11.6,\n", " 'Cc1c(N)cc(-c2ccc3nc4n(c(=O)c3c2)C2(CCCCC2)C4)cc1C(F)(F)F': 11.5,\n", " 'O=C(NCc1cc(F)cc2ccccc12)Nc1ccc(Oc2ccccc2)cc1': 11.5,\n", " 
'O=C(Nc1cc(F)c(Cl)c(-c2ccnc(C(F)(F)F)c2)c1)Nc1cccc2ccccc12': 11.4,\n", " 'O=C(Nc1ccc(C(=O)c2cccc(F)c2)c2ccccc12)NC1CCCc2cn[nH]c21': 11.4,\n", " 'O=C(Nc1ccc2cc(-c3ccccc3Cl)ccc2c1)C1CCCc2[nH]ncc21': 11.4,\n", " 'O=C1Cc2cccc(-c3cccc(-c4nc(NC(=O)Nc5cccc6ccccc56)no4)c3)c2N1': 11.4,\n", " 'Brc1c(CNc2cccc3[nH]c(-c4ccccc4-c4ccccc4)cc23)ccc2ccccc12': 11.4,\n", " 'Nc1ccc(NC(=O)NCc2cccc(-c3cccc4ccccc34)c2)cc1': 11.3,\n", " 'Cc1sc(C(=O)O)cc1Cn1cnc2ccc(-c3cccc(C4(F)CC4(F)F)c3)cc2c1=O': 11.3,\n", " 'O=C(Nc1ccc(-c2cccc3ccccc23)cc1)N1CCCCCCC1': 11.3,\n", " 'NC(=O)C1CCCN(c2ccc(CNC(=O)Nc3cc(Cl)nc4ccccc34)cc2)C1': 11.3,\n", " 'O=S1(=O)Cc2c([nH]c3c(F)cc(C#CCNc4cccc5ccccc45)cc23)-c2cc(F)ccc21': 11.3,\n", " 'O=c1c2cc(F)ccc2ncn-1Cc1cccc(Oc2ccc3ccccc3c2)c1': 11.3,\n", " 'O=C(Nc1cccc(-c2cccc3cc[nH]c23)c1)Nc1ccnn1Cc1ccccc1': 11.2,\n", " 'Cc1ccc(NC(=O)Nc2ccc(-c3cccc4ccccc34)c(N)n2)cc1-c1ccncc1': 11.2,\n", " 'Cn1ncc(F)c1-c1ccc2c(c1)CCC2NC(=O)Nc1cccc2ccccc12': 11.2,\n", " 'Nc1cc(O)cc(NCCc2cccc(F)c2C2=NNC(c3cccc4ccccc34)=N2)n1': 11.2,\n", " 'O=C(Nc1nc2ccccc2s1)c1ccc2[nH]c(-c3ccccc3-c3ccccc3)cc2c1': 11.2,\n", " 'Cc1ccnc(NC(=O)Nc2ccc(-c3cccc4ccccc34)cc2F)c1C(N)=S': 11.2,\n", " 'CC(=O)c1ccc(F)c(-c2ccc3c(NCc4ccccc4-n4cccn4)ncnc3c2)c1': 11.2,\n", " 'Cc1cc(NC(=O)Nc2cccc3ccccc23)ccc1-c1ccc2c(c1)C(=O)CCCO2': 11.2,\n", " 'O=c1c2cc(-c3cccc4ccccc34)c(F)cc2ncn1C1CCc2nnnn2C1': 11.2,\n", " 'Cn1ccc(NC(=O)Nc2cccc(-c3ccc(F)c(C4=NNN=N4)c3)c2)nc1=O': 11.2,\n", " 'Cc1cc2c(cc1C)C(=O)NC(c1cccc(-c3cc(F)cc(-c4cc(N)n[nH]4)c3)c1F)=N2': 11.1,\n", " 'CCC1CCc2nc(NS(=O)(=O)Cc3cc(-c4cccc5ccccc45)ccn3)sc2C1': 11.1,\n", " 'O=C(Nc1cccc(C2(C(F)(F)F)N=N2)c1)Nc1cccc2ccccc12': 11.1,\n", " 'CC1Cc2ccc(F)cc2CN1C(=S)Nc1ccc(-c2cccc3ccccc23)cc1': 11.1,\n", " 'Cc1nnc(C(=O)c2c[nH]c3nc(-c4ccc5c(c4)OCC5)ccc23)cc1-c1ccccc1': 11.1,\n", " 'CC(c1nc(-c2cccc(C(O)(c3cccc4ccccc34)C3CCC3)c2)n[nH]1)c1ncccc1F': 11.1,\n", " 'O=C(Nc1ncccc1OCc1ccccc1)Nc1cccc2ccccc12': 11.1,\n", " 'CC(NC(=O)Nc1cccc(-c2cccc3ccccc23)c1)c1cccc(F)c1': 11.1,\n", " 
'Nc1cccc2cc(-c3noc(CN4Cc5ccccc5C4)n3)ccc12': 11.1,\n", " 'Cc1ccccc1Oc1cccc(NC(=O)Nc2cccc3ccccc23)c1': 11.1,\n", " 'O=C(Cc1ccc(-c2noc(CC3(Sc4ccccc4)CCC3)n2)c2ccccc12)OC1CCc2cc(F)ccc21': 11.0,\n", " 'CCC(=O)c1cccc(-c2ccc(F)c(-c3nc(-c4ccnc(Oc5ccccc5Cl)c4)no3)c2)c1F': 11.0,\n", " 'Cc1ccc(Cl)cc1CC1CCN(C(=O)Nc2cccc3ccccc23)CC1': 11.0,\n", " 'Cc1cc(n2ncc3cc(-c4cccc5ccccc45)ccc32)cc(C)n1': 11.0,\n", " 'O=C(NCc1ccccc1)Nc1ccc(-c2cccc3cc(O)ncc23)cc1': 11.0,\n", " 'CC1(C)CNc2ncc(-c3cccc(OCC4CC(=O)Nc5ccccc54)c3)cc2O1': 11.0,\n", " 'Cn1nccc1C1CCCN1C(=O)NCc1cccc(-c2cccc3ccccc23)c1': 11.0,\n", " 'OC(c1cccc(OC2CCCC2)c1)c1cccc2ccc(CNc3cnn4ccccc34)nc12': 11.0,\n", " 'Fc1cccc(F)c1-c1cc2cc3ccn(Cc4ccccc4)c3cc2[nH]1': 11.0,\n", " 'Cc1cc2cc(-c3cccc(CN4CCCCC4Cc4ccccc4F)c3O)ccc2cn1': 11.0,\n", " 'Cc1nc(Cl)c(-c2n~c3cc(-c4cccc(S(=O)(=O)C(F)(F)F)c4)ccc3o2)c(Cl)n1': 10.9,\n", " 'O=C(Nc1ccc(Oc2ccnc3ccccc23)c(F)c1)c1cccc(F)c1Br': 10.9,\n", " 'Cc1ccc(CNC(=O)N2CCC(c3cccc4ccccc34)CC2)c(C)c1': 10.9,\n", " 'Cc1nc(Cn2cnc3cc(-c4cnc5ccccc5c4)cc(Cl)c3c2=O)ccc1C(=O)O': 10.9,\n", " 'Fc1cccc(-c2ccc(-c3cccc4c3cnn4c3cccc4ccc(Cl)nc43)cc2)c1': 10.9,\n", " 'O=C(Nc1cc(Cl)ccc1F)N1CCCCC1Cc1cccc(-c2cncc3ccccc23)c1': 10.9,\n", " 'CCC(F)c1nc(-c2cccc3c(-c4cccc(C(F)(F)F)c4)cccc23)no1': 10.9,\n", " 'O=C(Nc1ncc(-c2ccccc2)cn1)Nc1cccc2ccccc12': 10.9,\n", " 'O=c1[nH]c(-c2ccnc(OCc3ccccc3)c2)nc2cc(Cl)ccc12': 10.9,\n", " 'O=C(NCC1(O)Cc2ccccc2C1)c1cc(F)cc2cc(-c3cccc(Cl)c3)[nH]c12': 10.9,\n", " 'O=C(NCc1cc2cc(F)ccc2s1)Nc1cccc2ccccc12': 10.9,\n", " 'Oc1c(-c2noc(-c3cc4sccc4n3CC(F)(F)F)n2)cnc2c(Br)cccc12': 10.8,\n", " 'Fc1cccc(-c2ccc3c(cnn3c3cccc4ccccc43)c2)c1': 10.8,\n", " 'Cc1nsc(NC(=O)Nc2cccc(-c3cc(F)c4ccccc4c3)c2)n1': 10.8,\n", " 'Cn1c(-c2cc(-c3ccc4ccccc4c3)n[nH]2)n~c2ccc(Br)cc21': 10.8,\n", " 'CCn1c(-c2cc(-c3cc4c(ccc5[nH]ncc54)o3)n(C)n2)n~c2c(F)cccc21': 10.8,\n", " 'CC(F)(F)c1ccc(-c2nc(-c3ccc(Oc4cccc5ccccc45)cc3)no2)nc1': 10.8,\n", " 'CC(F)(F)c1cccc(COC(=O)c2c[nH]c3c(-c4ccc(F)c(C5CC5)c4)cccc23)n1': 10.8,\n", " 
'Cc1cc(Cl)cc2c1oc(-c1cccc(C#Cc3cccc(CC4CCCN4)c3)n1)n~2': 10.7,\n", " 'Cc1nc(C(F)(F)F)ccc1-c1cccc(CC=CCCc2ccnc3ccccc23)c1': 10.7,\n", " 'O=C(NCc1cccc(C2(C(F)(F)F)N=N2)c1)Nc1cccc2ccccc12': 10.7,\n", " 'Cc1nn(-c2ccc3c(c2)CCC2=Nc4cc(Cl)ccc4C(=O)N23)c2ncccc12': 10.7,\n", " 'O=C(Nc1cccc2c1OCCO2)N1CC(Cc2ccccc2C(F)(F)F)C1': 10.7,\n", " 'Cc1cc(C2=NN(C3=C(c4cccc(F)c4)C(=O)NC3=O)N=N2)nc2ccccc12': 10.7,\n", " 'Oc1c(-c2nc(-c3cccc(CC4CCCCN4)c3)no2)ncc2ccccc12': 10.7,\n", " 'c1ccc(Cc2ccccc2-c2ccc3cc(-c4nn[nH]n4)ccc3c2)cc1': 10.7,\n", " 'CC(C(=O)CC(=O)C1CCC(C2=NC(Cc3cc(Cl)ccc3F)=NN2)N1)c1ccccc1': 10.6,\n", " 'O=C1NCCc2ccc(CNc3cccc(-c4nc(-c5ccccc5CC5CCCCCN5)no4)c3)cc21': 10.6,\n", " 'NC(c1cccc(-c2noc(-c3ccc(C4CC(F)(F)C4)cc3)n2)c1)C(F)(F)F': 10.6,\n", " 'Cc1ccc(CNC(=O)Nc2cccc(-c3cccc4ccccc34)c2)cn1': 10.6,\n", " 'Cc1cccc(-c2cc3cc(OCC(=O)C45C(=O)NC4C4CCC5C4)ccc3[nH]2)c1': 10.6,\n", " 'CC(C)(c1ccc(F)cc1)C1(NC(=O)Nc2cccc(-c3cccc4ccccc34)c2)CC1': 10.6,\n", " 'O=C(c1ccc(-n2cc(-c3cccc(CBr)c3)cn2)c(F)c1)c1cccc2ccccc12': 10.6,\n", " 'COC(=O)c1ccc(F)c(COc2ccc(-c3cccc4ccccc34)c(C(F)(F)F)n2)c1': 10.6,\n", " 'Cc1cc(C(F)(F)F)n2ncc(-c3nc(-c4cccc(-c5ccc(F)cc5)c4)no3)c2n1': 10.6,\n", " 'Cc1nc2c(-c3ccc(COC(=O)c4nc(C(F)(F)F)n5ccccc45)c(N)c3)cccc2o1': 10.6,\n", " 'Cn1c(-c2csc(CC3CCNCC3)n2)n~c2cc(-c3cc4ccccc4cc3Br)ccc21': 10.6,\n", " 'COc1c(Cl)cccc1-c1ccc(C(=O)O)c(NC(=O)Cc2ccc(F)cc2C(F)(F)F)c1': 10.5,\n", " 'Cc1cc(Br)c2oc(-c3cccc(CNc4ccc(C5CC5)cc4)c3Br)n~c2c1': 10.5}}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pred_smiles" ] }, { "cell_type": "markdown", "id": "metropolitan-found", "metadata": {}, "source": [ "### Step 3: Evaluate the result" ] }, { "cell_type": "code", "execution_count": 15, "id": "competent-sapphire", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The input is a dictionary, expected to have SMILES string as key and docking score as value!\n", "---- Calculating average docking scores ----\n", 
"---- Calculating synthetic accessibility score ----\n", "Found local copy...\n", "---- Calculating molecular filters scores ----\n", "MolFilter is using the following filters:\n", "Rule_Glaxo: True\n", "Rule_PAINS: True\n", "Rule_SureChEMBL: True\n", "---- Calculating diversity ----\n", "---- Calculating novelty ----\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Consumed time: 45.081 s\n" ] } ], "source": [ "results = {}\n", "results[name] = pred_smiles \n", "\n", "# submission-ready results\n", "t1 = time.time()\n", "out = group.evaluate(results)\n", "print(f\"Consumed time: {time.time() - t1:.3f} s\")" ] }, { "cell_type": "code", "execution_count": 16, "id": "prime-motorcycle", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['docking_scores_dict', 'top100', 'top10', 'top1', 'sa_dict', 'sa', 'pass_list', '%pass', 'top1_%pass', 'diversity', 'novelty', 'top smiles'])" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "i = 0\n", "out['3pbl']['5000'].keys()" ] }, { "cell_type": "code", "execution_count": 28, "id": "pregnant-static", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "top smiles\n", "['O=c1c2c(F)c(-c3cccc4ccccc34)ccc2ncn1CC12CCC(CC1)OC2', 'O=c1c2c(Br)cccc2ncn1Cc1cc(F)c(-c2n~c3c(C4=NNN=N4)cccc3o2)cc1F', 'O=C1OC(=O)C23CCOCC12N=NN3CC12CC(C3=NC(c4cncc5ccccc45)=NN3)(CO1)C2', 'FC(F)(F)c1cncc(CN2CCc3cc(-c4cccc5ccccc45)ccc32)c1', 'Cc1cc(N)n2ncc(-c3nc(-c4ccc(-c5cccc6ccccc56)cc4)n[nH]3)c2n1', 'CC(=O)C(c1cccc(C(=O)c2cccc3ccccc23)c1)c1noc(-c2c[nH]nc2C2CCCCC2)n1', 'Fc1cc(Br)c(CNc2nc3cc(-c4cccc5ccccc45)ccc3n2Cc2ccccc2)c(F)c1F', 'O=C(Cl)Cc1cn(-c2ccc3cc(-c4ccccc4F)[nH]c3c2)c2cccnc12', 'CC1(C)Cc2cc3cc(-c4cccc5ccccc45)ccc3nc2CC1=O', 'O=C(CNc1ccc2c(-c3ccc(F)cc3)c(-c3ccccc3)[nH]c2c1)c1cccc2ccccc12', 'Cc1ccc2c(=O)-n(Cc3ccc(NC(=O)C(F)(F)c4cccc5ccccc45)cc3)cnc2c1F', 'Cc1cc(-c2cccc(NC(=O)NC3CCCCCC3(F)F)c2)c2ccccc2c1', 
'Nc1nc(-c2ccccc2)c(-c2ccc3cc(C(=O)N4CC5CCC4CN5)[nH]c3c2F)s1', 'Cn1ncc2c1CCCn1c-2nc2ccc(-c3cccc(C(=O)C(C)(F)F)c3)cc2c1=O', 'NCC(=O)c1cccc(C(=O)c2nc(-c3noc(-c4ccccc4C4CCCC4)n3)ccc2F)c1', 'CC(F)(F)c1ccccc1-c1ccc2c(c1)C(NC(=O)Nc1cccc3ccccc13)CC2', 'CC(C)n1c(-c2c(F)ccc3c2OCC3)n~c2cc(-c3cccc(C4NC(=O)CC4=O)c3)ccc21', 'NCC(COc1cccc2c(-c3ccc4ccc(Cl)nc4c3)cccc12)c1cccc(Br)c1', 'Cc1c(N)cc(-c2ccc3nc4n(c(=O)c3c2)C2(CCCCC2)C4)cc1C(F)(F)F', 'O=C(NCc1cc(F)cc2ccccc12)Nc1ccc(Oc2ccccc2)cc1', 'O=C(Nc1cc(F)c(Cl)c(-c2ccnc(C(F)(F)F)c2)c1)Nc1cccc2ccccc12', 'O=C(Nc1ccc(C(=O)c2cccc(F)c2)c2ccccc12)NC1CCCc2cn[nH]c21', 'O=C(Nc1ccc2cc(-c3ccccc3Cl)ccc2c1)C1CCCc2[nH]ncc21', 'O=C1Cc2cccc(-c3cccc(-c4nc(NC(=O)Nc5cccc6ccccc56)no4)c3)c2N1', 'Brc1c(CNc2cccc3[nH]c(-c4ccccc4-c4ccccc4)cc23)ccc2ccccc12', 'Nc1ccc(NC(=O)NCc2cccc(-c3cccc4ccccc34)c2)cc1', 'Cc1sc(C(=O)O)cc1Cn1cnc2ccc(-c3cccc(C4(F)CC4(F)F)c3)cc2c1=O', 'O=C(Nc1ccc(-c2cccc3ccccc23)cc1)N1CCCCCCC1', 'NC(=O)C1CCCN(c2ccc(CNC(=O)Nc3cc(Cl)nc4ccccc34)cc2)C1', 'O=S1(=O)Cc2c([nH]c3c(F)cc(C#CCNc4cccc5ccccc45)cc23)-c2cc(F)ccc21', 'O=c1c2cc(F)ccc2ncn-1Cc1cccc(Oc2ccc3ccccc3c2)c1', 'O=C(Nc1cccc(-c2cccc3cc[nH]c23)c1)Nc1ccnn1Cc1ccccc1', 'Cc1ccc(NC(=O)Nc2ccc(-c3cccc4ccccc34)c(N)n2)cc1-c1ccncc1', 'Cn1ncc(F)c1-c1ccc2c(c1)CCC2NC(=O)Nc1cccc2ccccc12', 'Nc1cc(O)cc(NCCc2cccc(F)c2C2=NNC(c3cccc4ccccc34)=N2)n1', 'O=C(Nc1nc2ccccc2s1)c1ccc2[nH]c(-c3ccccc3-c3ccccc3)cc2c1', 'Cc1ccnc(NC(=O)Nc2ccc(-c3cccc4ccccc34)cc2F)c1C(N)=S', 'CC(=O)c1ccc(F)c(-c2ccc3c(NCc4ccccc4-n4cccn4)ncnc3c2)c1', 'Cc1cc(NC(=O)Nc2cccc3ccccc23)ccc1-c1ccc2c(c1)C(=O)CCCO2', 'O=c1c2cc(-c3cccc4ccccc34)c(F)cc2ncn1C1CCc2nnnn2C1', 'Cn1ccc(NC(=O)Nc2cccc(-c3ccc(F)c(C4=NNN=N4)c3)c2)nc1=O', 'Cc1cc2c(cc1C)C(=O)NC(c1cccc(-c3cc(F)cc(-c4cc(N)n[nH]4)c3)c1F)=N2', 'CCC1CCc2nc(NS(=O)(=O)Cc3cc(-c4cccc5ccccc45)ccn3)sc2C1', 'O=C(Nc1cccc(C2(C(F)(F)F)N=N2)c1)Nc1cccc2ccccc12', 'CC1Cc2ccc(F)cc2CN1C(=S)Nc1ccc(-c2cccc3ccccc23)cc1', 'Cc1nnc(C(=O)c2c[nH]c3nc(-c4ccc5c(c4)OCC5)ccc23)cc1-c1ccccc1', 
'CC(c1nc(-c2cccc(C(O)(c3cccc4ccccc34)C3CCC3)c2)n[nH]1)c1ncccc1F', 'O=C(Nc1ncccc1OCc1ccccc1)Nc1cccc2ccccc12', 'CC(NC(=O)Nc1cccc(-c2cccc3ccccc23)c1)c1cccc(F)c1', 'Nc1cccc2cc(-c3noc(CN4Cc5ccccc5C4)n3)ccc12', 'Cc1ccccc1Oc1cccc(NC(=O)Nc2cccc3ccccc23)c1', 'O=C(Cc1ccc(-c2noc(CC3(Sc4ccccc4)CCC3)n2)c2ccccc12)OC1CCc2cc(F)ccc21', 'CCC(=O)c1cccc(-c2ccc(F)c(-c3nc(-c4ccnc(Oc5ccccc5Cl)c4)no3)c2)c1F', 'Cc1ccc(Cl)cc1CC1CCN(C(=O)Nc2cccc3ccccc23)CC1', 'Cc1cc(n2ncc3cc(-c4cccc5ccccc45)ccc32)cc(C)n1', 'O=C(NCc1ccccc1)Nc1ccc(-c2cccc3cc(O)ncc23)cc1', 'CC1(C)CNc2ncc(-c3cccc(OCC4CC(=O)Nc5ccccc54)c3)cc2O1', 'Cn1nccc1C1CCCN1C(=O)NCc1cccc(-c2cccc3ccccc23)c1', 'OC(c1cccc(OC2CCCC2)c1)c1cccc2ccc(CNc3cnn4ccccc34)nc12', 'Fc1cccc(F)c1-c1cc2cc3ccn(Cc4ccccc4)c3cc2[nH]1', 'Cc1cc2cc(-c3cccc(CN4CCCCC4Cc4ccccc4F)c3O)ccc2cn1', 'Cc1nc(Cl)c(-c2n~c3cc(-c4cccc(S(=O)(=O)C(F)(F)F)c4)ccc3o2)c(Cl)n1', 'O=C(Nc1ccc(Oc2ccnc3ccccc23)c(F)c1)c1cccc(F)c1Br', 'Cc1ccc(CNC(=O)N2CCC(c3cccc4ccccc34)CC2)c(C)c1', 'Cc1nc(Cn2cnc3cc(-c4cnc5ccccc5c4)cc(Cl)c3c2=O)ccc1C(=O)O', 'Fc1cccc(-c2ccc(-c3cccc4c3cnn4c3cccc4ccc(Cl)nc43)cc2)c1', 'O=C(Nc1cc(Cl)ccc1F)N1CCCCC1Cc1cccc(-c2cncc3ccccc23)c1', 'CCC(F)c1nc(-c2cccc3c(-c4cccc(C(F)(F)F)c4)cccc23)no1', 'O=C(Nc1ncc(-c2ccccc2)cn1)Nc1cccc2ccccc12', 'O=c1[nH]c(-c2ccnc(OCc3ccccc3)c2)nc2cc(Cl)ccc12', 'O=C(NCC1(O)Cc2ccccc2C1)c1cc(F)cc2cc(-c3cccc(Cl)c3)[nH]c12', 'O=C(NCc1cc2cc(F)ccc2s1)Nc1cccc2ccccc12', 'Oc1c(-c2noc(-c3cc4sccc4n3CC(F)(F)F)n2)cnc2c(Br)cccc12', 'Fc1cccc(-c2ccc3c(cnn3c3cccc4ccccc43)c2)c1', 'Cc1nsc(NC(=O)Nc2cccc(-c3cc(F)c4ccccc4c3)c2)n1', 'Cn1c(-c2cc(-c3ccc4ccccc4c3)n[nH]2)n~c2ccc(Br)cc21', 'CCn1c(-c2cc(-c3cc4c(ccc5[nH]ncc54)o3)n(C)n2)n~c2c(F)cccc21', 'CC(F)(F)c1ccc(-c2nc(-c3ccc(Oc4cccc5ccccc45)cc3)no2)nc1', 'CC(F)(F)c1cccc(COC(=O)c2c[nH]c3c(-c4ccc(F)c(C5CC5)c4)cccc23)n1', 'Cc1cc(Cl)cc2c1oc(-c1cccc(C#Cc3cccc(CC4CCCN4)c3)n1)n~2', 'Cc1nc(C(F)(F)F)ccc1-c1cccc(CC=CCCc2ccnc3ccccc23)c1', 'O=C(NCc1cccc(C2(C(F)(F)F)N=N2)c1)Nc1cccc2ccccc12', 
'Cc1nn(-c2ccc3c(c2)CCC2=Nc4cc(Cl)ccc4C(=O)N23)c2ncccc12', 'O=C(Nc1cccc2c1OCCO2)N1CC(Cc2ccccc2C(F)(F)F)C1', 'Cc1cc(C2=NN(C3=C(c4cccc(F)c4)C(=O)NC3=O)N=N2)nc2ccccc12', 'Oc1c(-c2nc(-c3cccc(CC4CCCCN4)c3)no2)ncc2ccccc12', 'c1ccc(Cc2ccccc2-c2ccc3cc(-c4nn[nH]n4)ccc3c2)cc1', 'CC(C(=O)CC(=O)C1CCC(C2=NC(Cc3cc(Cl)ccc3F)=NN2)N1)c1ccccc1', 'O=C1NCCc2ccc(CNc3cccc(-c4nc(-c5ccccc5CC5CCCCCN5)no4)c3)cc21', 'NC(c1cccc(-c2noc(-c3ccc(C4CC(F)(F)C4)cc3)n2)c1)C(F)(F)F', 'Cc1ccc(CNC(=O)Nc2cccc(-c3cccc4ccccc34)c2)cn1', 'Cc1cccc(-c2cc3cc(OCC(=O)C45C(=O)NC4C4CCC5C4)ccc3[nH]2)c1', 'CC(C)(c1ccc(F)cc1)C1(NC(=O)Nc2cccc(-c3cccc4ccccc34)c2)CC1', 'O=C(c1ccc(-n2cc(-c3cccc(CBr)c3)cn2)c(F)c1)c1cccc2ccccc12', 'COC(=O)c1ccc(F)c(COc2ccc(-c3cccc4ccccc34)c(C(F)(F)F)n2)c1', 'Cc1cc(C(F)(F)F)n2ncc(-c3nc(-c4cccc(-c5ccc(F)cc5)c4)no3)c2n1', 'Cc1nc2c(-c3ccc(COC(=O)c4nc(C(F)(F)F)n5ccccc45)c(N)c3)cccc2o1', 'Cn1c(-c2csc(CC3CCNCC3)n2)n~c2cc(-c3cc4ccccc4cc3Br)ccc21', 'COc1c(Cl)cccc1-c1ccc(C(=O)O)c(NC(=O)Cc2ccc(F)cc2C(F)(F)F)c1', 'Cc1cc(Br)c2oc(-c3cccc(CNc4ccc(C5CC5)cc4)c3Br)n~c2c1']\n" ] } ], "source": [ "print(list(out['3pbl']['5000'].keys())[i])\n", "print(out['3pbl']['5000'][list(out['3pbl']['5000'].keys())[i]])\n", "i += 1" ] }, { "cell_type": "markdown", "id": "awful-thought", "metadata": {}, "source": [ "### Submit to leaderboard" ] }, { "cell_type": "markdown", "id": "disciplinary-paris", "metadata": {}, "source": [ "https://tdcommons.ai/benchmark/overview/#step-by-step-instructions" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }