{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# TDC 102: Data Functions\n",
"\n",
"[Kexin](https://twitter.com/KexinHuang5)\n",
"\n",
"Welcome to the TDC community! This tutorial covers the basics of TDC data functions. By the end, you will be able to use most of the supported functions.\n",
"\n",
"We assume you are already familiar with installation and the data loaders. If not, please visit [TDC 101 Data Loaders](https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_101_Data_Loader.ipynb) first!\n",
"\n",
"First, we introduce data splits. The data splitting function splits a dataset into training, validation, and test sets so that machine learning practitioners can train, tune, and evaluate their models. It is called directly on the data loader object and takes three main inputs:\n",
"\n",
"* `method`: the splitting scheme. TDC provides several splitting schemes that reflect realistic evaluation settings (details in the sections below). The default is a random split.\n",
"\n",
"* `seed`: the random seed. TDC uses a benchmark seed by default for fair comparison.\n",
"\n",
"* `frac`: the train/validation/test fractions. The default is [0.7, 0.1, 0.2].\n",
"\n",
"Because the default TDC data format is a Pandas DataFrame, the function returns a dictionary with the keys 'train', 'valid', and 'test', each mapping to that set's DataFrame.\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n"
]
},
{
"data": {
"text/html": [
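"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr>\n",
"      <th></th>\n",
"      <th>Drug_ID</th>\n",
"      <th>Drug</th>\n",
"      <th>Y</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>VLA-4 antagonist 3</td>\n",
"      <td>S1CN(S(=O)(=O)c2cn(nc2)C)[C@H](C(=O)N[C@@H](Cc...</td>\n",
"      <td>-5.17</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>Astilbin</td>\n",
"      <td>O1[C@@H](C)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]1O[...</td>\n",
"      <td>-6.82</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"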
],
"text/plain": [
" Drug_ID Drug Y\n",
"0 VLA-4 antagonist 3 S1CN(S(=O)(=O)c2cn(nc2)C)[C@H](C(=O)N[C@@H](Cc... -5.17\n",
"1 Astilbin O1[C@@H](C)[C@H](O)[C@@H](O)[C@@H](O)[C@@H]1O[... -6.82"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from tdc.single_pred import ADME\n",
"data = ADME(name = 'Caco2_Wang')\n",
"split = data.get_split(method = 'random')\n",
"split['test'].head(2)"
]
},
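{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the roles of `seed` and `frac` more concrete, here is a minimal sketch of a seeded random split in plain Python. It only illustrates the idea and is not TDC's implementation: the helper `random_split`, the toy records, and the seed value are all made up for this example.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"def random_split(records, frac=(0.7, 0.1, 0.2), seed=42):\n",
"    '''Shuffle records with a fixed seed, then cut them into three sets.'''\n",
"    # Hypothetical helper for illustration; not part of TDC.\n",
"    shuffled = list(records)\n",
"    random.Random(seed).shuffle(shuffled)  # same seed, same order\n",
"    n_train = round(frac[0] * len(shuffled))\n",
"    n_valid = round(frac[1] * len(shuffled))\n",
"    return {\n",
"        'train': shuffled[:n_train],\n",
"        'valid': shuffled[n_train:n_train + n_valid],\n",
"        'test': shuffled[n_train + n_valid:],  # whatever is left over\n",
"    }\n",
"\n",
"\n",
"# Toy records that only mimic the shape of a TDC table.\n",
"toy = [{'Drug_ID': i, 'Y': -5.0 - 0.01 * i} for i in range(100)]\n",
"split = random_split(toy)\n",
"sizes = {name: len(rows) for name, rows in split.items()}\n",
"print(sizes)\n",
"```\n",
"\n",
"Because the shuffle is seeded, calling the helper twice with the same seed gives the same split, which is what makes results comparable across runs."
]
},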
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TDC also provides two realistic splits. The first, for compound property prediction tasks, is based on molecular scaffolds, so that the train/validation/test sets are structurally more distinct from one another. Note that the scaffold split requires the RDKit package; you can find installation instructions [here](https://www.rdkit.org/docs/Install.html). For example, to perform a scaffold split on the Caco2 data:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 910/910 [00:00<00:00, 1029.21it/s]\n"
]
},
{
"data": {
"text/html": [
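"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr>\n",
"      <th></th>\n",
"      <th>Drug_ID</th>\n",
"      <th>Drug</th>\n",
"      <th>Y</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>PNU-184421</td>\n",
"      <td>Fc1cc(N2C[C@@H](OC2=O)CNC(=O)C)ccc1-c1ccc(nc1)C#N</td>\n",
"      <td>-4.378375</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>PNU-184470</td>\n",
"      <td>S=C(NC[C@@H]1OC(=O)N(C1)c1cc(F)c(N2CCCS(=O)(=O...</td>\n",
"      <td>-4.639136</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>"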
],
"text/plain": [
" Drug_ID Drug Y\n",
"0 PNU-184421 Fc1cc(N2C[C@@H](OC2=O)CNC(=O)C)ccc1-c1ccc(nc1)C#N -4.378375\n",
"1 PNU-184470 S=C(NC[C@@H]1OC(=O)N(C1)c1cc(F)c(N2CCCS(=O)(=O... -4.639136"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"split = data.get_split(method = 'scaffold')\n",
"split['test'].head(2)"
]
},
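{
"cell_type": "markdown",
"metadata": {},
"source": [
"The core idea of a scaffold split is that molecules sharing a scaffold never end up in different sets. The sketch below shows that idea in plain Python, assuming each record already carries a scaffold label. The `scaffold` field, the helper `group_split`, and the largest-groups-first rule are assumptions made for illustration; TDC computes scaffolds with RDKit and may assign groups differently.\n",
"\n",
"```python\n",
"from collections import defaultdict\n",
"\n",
"\n",
"def group_split(records, key, frac=(0.7, 0.1, 0.2)):\n",
"    '''Split records so that all records sharing a key stay together.'''\n",
"    # Hypothetical helper for illustration; not part of TDC.\n",
"    groups = defaultdict(list)\n",
"    for rec in records:\n",
"        groups[rec[key]].append(rec)\n",
"    # Place the largest groups first (an assumed, commonly used heuristic).\n",
"    ordered = sorted(groups.values(), key=len, reverse=True)\n",
"    train_cap = frac[0] * len(records)\n",
"    valid_cap = frac[1] * len(records)\n",
"    split = {'train': [], 'valid': [], 'test': []}\n",
"    for group in ordered:\n",
"        if len(split['train']) + len(group) <= train_cap:\n",
"            split['train'].extend(group)\n",
"        elif len(split['valid']) + len(group) <= valid_cap:\n",
"            split['valid'].extend(group)\n",
"        else:\n",
"            split['test'].extend(group)\n",
"    return split\n",
"\n",
"\n",
"# Toy molecules with made-up scaffold labels (13 distinct scaffolds).\n",
"toy = [\n",
"    {'Drug': 'mol_%d' % i, 'scaffold': 'scaf_%d' % (i % 13)}\n",
"    for i in range(100)\n",
"]\n",
"split = group_split(toy, key='scaffold')\n",
"scaffolds = {\n",
"    name: {rec['scaffold'] for rec in rows} for name, rows in split.items()\n",
"}\n",
"```\n",
"\n",
"Since whole groups are moved at once, the set sizes only approximate `frac`. That is the price of keeping the test scaffolds unseen during training."
]
},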
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In addition to the scaffold split, TDC includes a cold-start split for multi-instance prediction problems such as DTI, GDA, DrugRes, and MTI, which involve two entity types. It first splits the entities of one type into train/valid/test, then assigns every pair associated with an entity to that entity's set, which yields the final splits. To use it, set `column_name` to the entity you want to split on. For example, to perform a cold drug split on the DTI prediction task:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n"
]
}
],
"source": [
"from tdc.multi_pred import DTI\n",
"data = DTI(name = 'DAVIS')\n",
"split = data.get_split(method = 'cold_split', column_name = 'Drug')"
]
},
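{
"cell_type": "markdown",
"metadata": {},
"source": [
"The two steps of the cold-start split described above (split the entities, then move their pairs) can be sketched in plain Python as follows. This is an illustration under stated assumptions, not TDC's code: the helper `cold_split`, the toy drug and target names, and the seed value are hypothetical.\n",
"\n",
"```python\n",
"import random\n",
"\n",
"\n",
"def cold_split(pairs, column_name, frac=(0.7, 0.1, 0.2), seed=42):\n",
"    '''Split pairs so that each entity in column_name lands in one set only.'''\n",
"    # Hypothetical helper for illustration; not part of TDC.\n",
"    # Step 1: split the unique entities (sorted first so the shuffle is reproducible).\n",
"    entities = sorted({pair[column_name] for pair in pairs})\n",
"    random.Random(seed).shuffle(entities)\n",
"    n_train = round(frac[0] * len(entities))\n",
"    n_valid = round(frac[1] * len(entities))\n",
"    assignment = {}\n",
"    for i, entity in enumerate(entities):\n",
"        if i < n_train:\n",
"            assignment[entity] = 'train'\n",
"        elif i < n_train + n_valid:\n",
"            assignment[entity] = 'valid'\n",
"        else:\n",
"            assignment[entity] = 'test'\n",
"    # Step 2: every pair follows the entity it contains.\n",
"    split = {'train': [], 'valid': [], 'test': []}\n",
"    for pair in pairs:\n",
"        split[assignment[pair[column_name]]].append(pair)\n",
"    return split\n",
"\n",
"\n",
"# Toy interaction table: 10 drugs, each paired with 3 targets.\n",
"pairs = [\n",
"    {'Drug': 'drug_%d' % d, 'Target': 'target_%d' % t}\n",
"    for d in range(10)\n",
"    for t in range(3)\n",
"]\n",
"split = cold_split(pairs, column_name='Drug')\n",
"drugs = {name: {p['Drug'] for p in rows} for name, rows in split.items()}\n",
"```\n",
"\n",
"Note that only the chosen column is held out: in this toy example every target still appears in all three sets, while no drug is shared between them."
]
},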
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Next, we cover several ways to manipulate labels. As an example, we load the Caco2 data from ADME, a single-instance prediction task:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n"
]
}
],
"source": [
"from tdc.single_pred import ADME\n",
"data = ADME(name = 'Caco2_Wang')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can visualize the label distribution by typing:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"scrolled": true
},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAA/e0lEQVR4nO3dd3yV5fn48c+VQQZkQEISkgBhI0vEyHAgdYKj1G+pda9ftcrX2dpdW9t+7f5+66iLtta9cYtWQFEURMLeECCQsBNIyJ7X74/nBE+SEzLIWTnX+/U6r+Tcz7qes67nuZ/7uW9RVYwxxoSuMH8HYIwxxr8sERhjTIizRGCMMSHOEoExxoQ4SwTGGBPiLBEYY0yIs0QQQERkkYh8z9fLupY/S0S2dHZ5D+v7QESud/1/g4h83oXrvlpEPuqq9XVgu2eIyDYRKRORb/l6+8Z4iyUCLxCRPBE5z99xNBKR+0WkVkRKXY+tIvJ3EenXOI+qLlbVEe1c1/NtzaeqM1T1mS6IPUtEVEQi3Nb9gqpecKLr7oTfAn9X1V6q+panGUTkKhHJcSWLfa6EeKa3AhKRySIyX0QOi8ghEXnN/X1tZZkpInJURMLdyv7RStkT3oq9ldjyRKTS9TktFpElInKriLTrt8rT58VLcfpkO75iiSB0vKKqcUAf4DIgDVjR1o9GR4mju36uBgIbWpsoIj8AHgR+D6QCA4DHgJlejKk3MAfIcsVXCvy7jWVygHBgglvZWcDeZmVTgc+6KtAOuNT1WR0I/BH4CfAvP8QROlTVHl38APKA8zyU9wbeAw4BR1z/Z7pNXwT8AfgKKAHeBvq4TZ8MLAGKgTXAtGbLfq+VeO4Hnm9WFu5ax19dz6cBBW7TfwLswflh2QKcC0wHaoBaoAxY47btB4AvgEpgqHs8wA2uaY+49mszcG5rr5d7vMBuQF3bKwOmuNb3udv8pwPLXeteDpze7HX5nWv7pcBHQPJx3rubgVzgMPAOkO4q3w40uPavDIhqtlyCq/w7x1n3RGCp6/3bB/wd6OE2fTQw37XtA8DPXeVROAlmr+vxYPPtu61jAlDajs/oQuCHrv9TgB3Ab5qVKZDZjrgVuBXYhvO5fhQQt8/Z/wKFwE7gdtf8Ee397ri23wCMcT2/GFgFHAXygfvd5vX0eRkCfAwUueJ4AUg83mfdVR4G/NT13hcBr+L6Pnrajr9/d07k0V2P3AJVGM7R2kCco8VKnC+Vu+uAm4B0oA54GEBEMoD3gf/BOaq/F5grIn07E4iq1uMkmrOaTxOREThf2NPUOTK7EMhT1Q9xjnZfUad65GS3xa4FbgHigF0eNjkJ58cmGfg18IaI9GlHqFNdfxNd21zaLNY+OK/Lw0AS8H/A+yKS5DbbVcCNOD9uPXBeuxZE5BycRHw50M+1Hy8DqOoQnC//pa44qpstPgWIBt48zr7UA/fgvAZTcJLrbNe244AFwIc47/1QnB9rgF/gHASMB07G+WH8ZSvbmMpxzlrcfMbXr+1U4HPXw71sp6oWHC9uN5cAp7niuxznMwNOYp3hin0C8K12xNaEqn4FFPD1Z7Uc53uSiJMUbnO7ZuPp8yI472s6cBLQH+dgo9XPumsdd7riPdu1bGOSa207QcsSgQ+papGqzlXVClUtxTmKPrvZbM+p6npVLQfuAy531dteA8xT1Xmq2qCq83FO8S86gZD24iSV5upxjkJHiUikquap6vY21vW0qm5Q1TpVrfUw/SDwoKrWquorOEdeF59A7I0uBrap6nOubb+Ec8Zxqds8/1bVrapaiXNUN76VdV0NPKWqK10/9D8DpohIVjviSAIKVbWutRlUdYWqfumKMw94kq/f/0uA/ar6v6papaqlqrrMLa7fqupBVT2Ec+R+bfP1i8g44FfAj9oR76fAmSIiOD+wi3GO+ie7lX3ajrgb/VFVi1V1N/AJX7/GlwMPqWqBqh7BqerpjGOfVVVdpKrrXN
+DtcBLHuI5RlVzVXW+qla7Xr//c5v/eJ/17wO/cMVejZM8ZnWX6wLuLBH4kIjEisiTIrJLRI7iHJUlul+gwznVbbQLiMQ5EhsIfMd1Aa1YRIqBM3GOXDsrA6caoglVzQXuxvngHxSRl0UkvY115bcxfY+6zqldduEcZZ2odFqegezC2bdG+93+rwB6tWddqlqGUyWQ0cr87oqA5OP9SIjIcBF5T0T2u97/3+O8t+AcpbaWbJvvY4vXTkSGAh8Ad6nq4nbE+yXO6zAG5+h2sWt/893KPmtH3I1ae43TafrZaOtz0ppjn1URmSQin7gujpfgVEs1j+cYEUlxfYb3uOJ/vnH+Nj7rA4E33b5vm3ASR2on9yFgWSLwrR8CI4BJqhrP16eX4jZPf7f/B+DUxxfifIGeU9VEt0dPVe3UEZbrgu6lOEeCLajqi6p6Js6XQYE/NU5qZZVtdWOb4TrSbDQA5ygPnFP9WLdpaR1Y715XjO4G4NT5dlSTdYlIT5wj/fasaylQxfGrPh7HOVsZ5nr/f87X730+Tl12m3HR9LVDRAbiVCv9TlWfa0esqGoVzvWUS4B+qrrZNWmxq2wcX18oPl7cbdmHc52hUf/WZmyNiJyGkwgamyC/iHP9pr+qJgBPuMXj6fPyB1f5OFf817jNf7zPej4wo9l3LlpV97SynaBlicB7IkUk2u0RgVN/XgkUu+q2f+1huWtEZJSIxOI0V3zdVZ//PHCpiFwoIuGudU4TkUwP62iViESKyEk4p9NpOKfJzecZISLniEgUzo9bJc6REDgXMbM60TIoBbjTtf3v4NTVznNNWw1c4ZqWDcxyW+4QzoXCwa2sdx4w3NVsM0JEvguMwrkQ31EvAjeKyHjXvv8eWOaqDjkuVS3BqZZ5VES+5Tr7ixSRGSLyZ9dscTgXOMtEZCRwm9sq3gPSRORuEYkSkTgRmeSa9hLwSxHpKyLJru08D8euHX0MPKqqHW3q+RnO0fASt7LPXWX73apIjhd3W14F7hKRDBFJxLkw2y4iEi8il+Bcp3leVde5xXNYVatEZCLONaBGnj4vcTgXdItdr9exqrM2PutPAA+4Ei2u17+xBVhbn8ugYonAe+bhfKgaH/fjtPaIwTnC/xLnwmBzzwFP45xqR+NcsEJV83GaIf4c50OYj/OBbu97+F0RKcNp+fEOTlXGqaq618O8UTh1uYWuOFJc2wV4zfW3SERWtnPbAMuAYa51PgDMUtUi17T7cI6Gj+DUf7/YuJCqVrjm/8J1ij7ZfaWudVyCc7ZVBPwYuERVCzsQW+O6FrpimYtzJDsEuKIDy/8f8AOcC7mN79HtwFuuWe7F+dEqBf4BvOK2bClwPs5Z2n6cFjjfcE3+H5zrQWuBdcBKVxnA93B+jH4tzr0LZa73uT0+xXlv3W/2+9xV5t5stNW42+EfOC211uK09JmH0wii/jjLvCsipTiv3y9wDlZudJs+G/ita55f4SQboNXPy29wLlSX4DQseMNtXcf7rD+E8135yLWtL3EaPbT5uQw2jU28jDHG60RkBvCEqjavzjN+ZGcExhivEZEYEbnIVW2XgVMderwmtsYPLBEY002JyAb36iK3x9W+DAOnauYITtXQJpzqHBNArGrIGGNCnJ0RGGNMiAu6O+SSk5M1KyvL32EY4zNbipzewUckNe0cdour0/ARbfYZawysWLGiUFU9dkkTdIkgKyuLnJwcf4dhjM9Me3oaAItuWNS03ClmUdNiYzwSEU99gDnTgu0aQXZ2tloiMP7wyCOPkJub6/PtloaVAhDXENekfOdO51aJQYNa7V2hSw0dOpQ77rjDJ9syXU9EVqhqtqdpQXdGYIy/5Obmsnr9Jupj29NpqjdUNHkWXuHcj7dix/Huzeoa4RUtuqQy3YglAmM6oD62D5UjT6TD1447HOF0QtqnblKT8siVHwNQO/Icr8cQs3le2zOZoGWJwJgAVxjp/OA3TwS1ZTH+CMd0Q9Z81BhjQp
wlgiD1yCOP8Mgjj/g7DGO6vVD4rlnVUJDyR+sVY0JRKHzX7IzAx4qKirjzzjspKipq9/SioiJmz57Nbbfd1upyxhjTWZYIfOyZZ55h3bp1PPvss+2e/swzz7Bx40Y2bdrU6nKm+xpceTuDK29vUR6dVEJ0UokfIjLdjSUCHyoqKuLDDz9EVfnwww9bHN17mt5Y1uiDDz6ws4IQE0EcEcS1KJfwBiS8wQ8Rme7GrhH40DPPPENDg/PFra+v59lnn+Wee+457nRVpba29tg8tbW1PPvss+zZs4fKykruuusu3+5ECMvNzSWsxvd34hdFOMNKJ9Wd1aS8rtx3zUfDqo6Sm1sakp+33NxcYmK6d1PdoDgjEJFbRCRHRHIOHTrk73A6bcGCBdTV1QFQV1fH/Pnz25y+YMEC3LsBUdUWy5nurSjyc4oiP29RXlseTW15tB8iMt1NUJwRqOocYA44fQ35OZxOO++885g3bx51dXVERERw/vnntzldVXn33XePJQMR4fzzzycvLw+Ahx56yNe7EbLuuusuVuw44O8w/KIhOp6hg1ND8vMWCmdBQXFG0F1cf/31hIU5L3l4eDjXXXddm9Ovv/56IiMjj80TGRnZYjljjDkRlgh8KCkpienTpyMiTJ8+naSkpDanN5Y1mjFjRovljDHmRARF1VB3cv3115OXl9fqUb2n6ddffz25ubmoqp0NGGO6nCUCH0tKSuLhhx/u0PSkpCQee+yxJmVDhw71Snwm8Ayt/IHH8pjkYgAqfRhLKAqF75olgiBlA4SEjjCiWpkQtO0mgkoofNcsERgT4A5FLASgb925TcprS2P9EY7phiwRGBPgjkR+BbRMBHWVrZwpGNNB1mrIGGNCnJ0RGNMB4RWHfT5sY9gIp2+pmC1Nt1tZ7/Q/5It4nDGLU72+HeMflgiMaSd/tR7Ji+4BwKmDm/4Qf5qf7Cov9EEUqSHReiZUiXs/NsEgOztbc3Jy/B2GMT4z7elpACy6YVHTcqeYRU2LjfFIRFaoarbHacGWCETkELDL33G4JAO+OBzzF9u/4Gb7F9y6ev8GqmpfTxOCLhEEEhHJaS3Ddge2f8HN9i+4+XL/rNWQMcaEOEsExhgT4iwRnJg5/g7Ay2z/gpvtX3Dz2f7ZNQJjjAlxdkZgjDEhzhKBMcaEOEsExhgT4iwRGGNMiLNEYIwxIc4SgTHGhDhLBMYYE+IsERhjTIizRGCMMSHOEoExxoQ4SwTGGBPiLBEYY0yIs0RgjDEhzhKBMcaEuAh/B9BRycnJmpWV5e8wjPGZLUVbABiRNKJpuVPMiBHNlzCmpRUrVhS2NmZx0CWCrKwscnJy/B2GMT4z7elpACy6YVHTcqeYRU2LjfFIRHa1Ns2qhowxJsR5NRGIyHQR2SIiuSLy01bmmSYiq0Vkg4h86s14jDHGtOS1qiERCQceBc4HCoDlIvKOqm50mycReAyYrqq7RSTFW/EYY4zxzJvXCCYCuaq6A0BEXgZmAhvd5rkKeENVdwOo6kEvxmNMUJp39TzP5Z6LQ0JtbS0FBQVUVVX5O5SAEx0dTWZmJpGRke1expuJIAPId3teAExqNs9wIFJEFgFxwEOq+qwXYzIm6MRGxnou91wcEgoKCoiLiyMrKwsR8Xc4AUNVKSoqoqCggEGDBrV7OW9eI/D07miz5xHAqcDFwIXAfSIyvMWKRG4RkRwRyTl06FDXR2pMAHts+WM8tvyxluWPOY9QVFVVRVJSkiWBZkSEpKSkDp8pefOMoADo7/Y8E9jrYZ5CVS0HykXkM+BkYKv7TKo6B5gDkJ2d3TyZGNOtvbrhVQBmnza7ablTzOzZzZdonxeX7e7wMldNGtC5jXmBJQHPOvO6ePOMYDkwTEQGiUgP4ArgnWbzvA2cJSIRIhKLU3W0yYsxGWOMacZriUBV64Dbgf/g/Li/qqobRORWEbnVNc8m4ENgLfAV8E9VXe+tmIwxpiuICNdee+2x53V1dfTt25dLLr
mkQ+uZNm3asRtkL7roIoqLi7syzHbz6p3FqjoPmNes7Ilmz/8C/MWbcRhjTFfq2bMn69evp7KykpiYGObPn09GRsYJrXOeH5uBBV0XE8YY4+43725g496jXbrOUenx/PrS0cedZ8aMGbz//vvMmjWLl156iSuvvJLFixcDUF5ezh133MG6deuoq6vj/vvvZ+bMmVRWVnLjjTeyceNGTjrpJCorK4+tr7H7nOTkZL71rW+Rn59PVVUVd911F7fccgsAvXr14q677uK9994jJiaGt99+m9TU1BPeX+tiwpgAt+iGRS36GQKnjyHrZ8h/rrjiCl5++WWqqqpYu3YtkyZ93Tr+gQce4JxzzmH58uV88skn/OhHP6K8vJzHH3+c2NhY1q5dyy9+8QtWrFjhcd1PPfUUK1asICcnh4cffpiioiLASTCTJ09mzZo1TJ06lX/84x9dsi92RmCMCWptHbl7y7hx48jLy+Oll17ioosuajLto48+4p133uGvf/0r4DR33b17N5999hl33nnnseXHjRvncd0PP/wwb775JgD5+fls27aNpKQkevTocew6xKmnnsr8+fO7ZF8sERgT4P66xPkxuff0e5uWO8Xce2/zJYyvfPOb3+Tee+9l0aJFx47awbmxa+7cuYzw0Ed4W807Fy1axIIFC1i6dCmxsbFMmzbt2H0BkZGRx5YPDw+nrq6uS/bDqoaMCXDvbX2P97a+17L8Pedh/Oemm27iV7/6FWPHjm1SfuGFF/LII4+g6tz2tGrVKgCmTp3KCy+8AMD69etZu3Zti3WWlJTQu3dvYmNj2bx5M19++aWX98ISgTHGdFpmZiZ33XVXi/L77ruP2tpaxo0bx5gxY7jvvvsAuO222ygrK2PcuHH8+c9/ZuLEiS2WnT59OnV1dYwbN4777ruPyZMne30/rGrIGGM6qKysrEXZtGnTmOYaLSgmJoYnn3yyxTwxMTG8/PLLHteZl5d37P8PPvigze3OmjWLWbNmdSDq1tkZgTHGhDg7IzAmwMVExngu91xsTIdZIjAmwH1wtedqglZqD05IeXUdK3cfIfdgGVW19cT2iGBcZgJjMhKIDLcKhO7KEoExhgZVlm4vYuHmA1TVNpASF0VCTCSHyqp5bUUBi7Ye4uoA6nnUdC1LBMYEuN99+jsA7jv7vqblTjH33dd8iY6prq3ntRUFbNx3lOGpvZgxph+p8dGA0x5+y4FS5q4o4LFF26mpayCzd/tHxAmkbqtN6+xcz5gAt3DnQhbuXNiyfKHzOBGVNfX864udbN5/lIvH9uP6KVnHkgA4Nz+NTIvn9nOG0bNHOM8t3UVxRc2JbdQEHEsExoSo6rp6/r1kJ/uKq7h60kDOGJrc6l2vCTGRXDsli5r6Bl5enk+Dhvb4UF3VDXWgsERgTAhqaFBeX1HAniOVXDVpACf1i29zmbT4aC4dl87uwxUszzvsgygDl3s31ECXdEPtT5YIjAlBj3ycy4a9R5k+Jq1dSaDRKQMSGZzck/9s2E9pVa0XI+yYadNaPhrHc66o8Dz96aed6YWFLae1R2M31MCxbqgblZeXc9NNN3Haaadxyimn8PbbbwPOTWNnnXUWEyZMYMKECSxZsgRw+heaNm0as2bNYuTIkVx99dXHuqfwBUsExgS4pNgkkmKTWpYnOY+O+nD9fv62YCun9E/kzKHJHVpWRJg5PoPq2gYWbyvs+Ma7kc50Q52SksL8+fNZuXIlr7zyyrGeSMHpj+jBBx9k48aN7Nixgy+++MJn+2KthowJcHMvn+u53HPxcW3ef5QfvLqa8f0T+dYpGZ0a6LxvXBTj+yeybGcRZw1LJi46suOBdLHjjcsQG3v86cnJnRvXoTPdUKenp3P77bezevVqwsPD2bp167FlJk6cSGZmJgDjx48nLy+PM888s+OBdYIlAmNCxOHyGr73TA69oiJ48tpTWbjpYKfX9Y2RKazOL2bxtkIuGtuvC6MMLh3thvr+++8nNT
WVNWvW0NDQQHT01y20oqKijv3flV1Mt4dXq4ZEZLqIbBGRXBH5qYfp00SkRERWux6/8mY8xgSjny34GT9b8LOW5T9zHu1RW9/A7BdWcLC0mjnXZTdpItoZyb2iGJeZwPK8w1TX1Z/QuoJZR7uhLikpoV+/foSFhfHcc89RXx8Yr53XEoGIhAOPAjOAUcCVIjLKw6yLVXW86/Fbb8VjTLBaWrCUpQVLW5YvdR7t8dt3N/LljsP86dtjGd8/sUvimjI4ieq6BlbnF3fJ+oJRR7uhnj17Ns888wyTJ09m69at9OzZ09che+TNqqGJQK6q7gAQkZeBmcBGL27TGNPMi8t289yXu7hl6mAuOyWzy9bbv08s6QnRLNtxmIlZfTp1vSFYdbYb6mHDhjUZjOYPf/hDi2UB/v73v3dtwG3wZtVQBpDv9rzAVdbcFBFZIyIfiIjHwUdF5BYRyRGRnEOHDnkjVmO6pa92HuZXb6/n7OF9+cn0kV26bhFh0uAk9h+tYldRRZeu2/iWNxOBp8OD5g1jVwIDVfVk4BHgLU8rUtU5qpqtqtl9+/bt2iiN6abyCsv5/nM5DOgTy8NXnkJ4WNcfsZ+cmUiP8DBW5R/p8nUb3/FmIigA+rs9zwT2us+gqkdVtcz1/zwgUkQ61rDZmG4uMz6TzPiWVTqZmc7DkyPlNdz49HJEhKduOI2EGO808ewREcao9HjW7Smhtr7BK9tojS9vuAomnXldvHmNYDkwTEQGAXuAK4Cr3GcQkTTggKqqiEzESUxFLdZkTAh7/r+e91zuuZjqunq+/9wK9hRX8uL3JpGV7N0Lkqf0T2R1fjFb9pcyJiPBq9tqFB0dTVFREUlJSSF1baItqkpRUVGTZqnt4bVEoKp1InI78B8gHHhKVTeIyK2u6U8As4DbRKQOqASuUEvzxnRaXX0DP3h1DV/lHeaRK08hO6uP17c5JKUXcVERrMov9lkiyMzMpKCgALtm2FJ0dPSxG9Pay6s3lLmqe+Y1K3vC7f+/A769PG5MkLn7w7sBeHD6g03LnWIedBXXNyg/fG0N76/dx88vGsmlJ6f7JL4wEcZlJrBs52Gqa+uJigz3+jYjIyMZNGiQ17cTKuzOYmMC3Or9qz2XuxXXNyg/em0Nb6/ey48uHMEtU4f4JLZGo9MT+GJ7EZsPlHJyZqJPt21OnHU6Z0yQq61v4EevreGNVXv4wfnD+e9vDPV5DAOSYukVFcGGPSU+37Y5cXZGYEwQq29o4Kanl7N4WyE/PH84d5w7zC9xhIkwKj2eVbuPUFvfYAPdBxl7t4wJUjX1DWzYe5Ql24v407fH+i0JNBqTnkBtvbLtQMu7bk1gszMCYwLc8KThLcq2Hihle/VRNL6Bf12fzbQRKX6IrKlByT2Jighjy4GjjEpv/2A3xv8sERjjAy8u292h+a+aNODY/3MundNk2pc7irj52RwGzAzn3zecdqzJZke30dXCw4ShKb3YeqAMVbX2/UHEqoaMCSLvr93Hdf/6ipS4KN6cfbrP2u2314jUOEoqazlwtNrfoZgOsDMCYwLcLe/eAsDpfX7Gb97byIQBvfnX9dn8+O4eAMyZc7ylfWt4ahzgVF2lJZzYmAfGdywRGBPgthZtZc+RSj46sJELRqXy8JWnEB0ZjtsohwEjPiaSfgnRbDlQytTh1kFksLCqIWMC3L6SSvKPVPBfEzJ4/JpTifbBnbsnYnhqHLuKyqmqDYzRt0zbLBEYE8CeW5rHrqIKknpF8edvj/NKV9JdbXhqHA0KuQetGWmwsERgTIB6a9Ue7nt7A71jezC0by8iguQmrQF9YomODGPrgVJ/h2Laya4RGBOA1uQX8+O5a5k0qA99+59JhIczgfHjfR9XezjNSOPYeqDUmpEGCUsExgSY0qpavv/cCvr2iuLxa06lT88pHudr7HU0EI1IjWP9nhI27Su1m8uCQHCcaxoTIuoaGnhx2W6KK2uYc92p9OnZw98hdcrw1F4AfL
rVxgsIBpYIjAkgH6zbz67DFfxl1smMTnduFrvmjWu45o1rWsx7zTXOIxDFRUeSEhfFku2F/g7FtINVDRkTILbsP8rSHUWcPiSpyaAyBUcLPM5f4Lk4YAxJ6cXyvMNU19UTFRHYTV5DnZ0RGBMASqtqeX3lHtLio7lwdJq/w+kSQ/v2oqq2gVW7i/0dimmDV88IRGQ68BDOmMX/VNU/tjLfacCXwHdV9XVvxmRMoFFV5q4soLq2nu+dOYjI8LAmHcgddPXb07xTuYNHU1zlB30XbAdkJfUkTGDJ9iImD07ydzjmOLx2RiAi4cCjwAxgFHCliIxqZb4/4Qxyb0zI+XJHEVsPlDFjTBqp8d2nf56YHuGMzUxkSa5dJwh03qwamgjkquoOVa0BXgZmepjvDmAuEJiHNcZ40f6jVXywfj8jUuNaPWoemjiBoYkTWpaPrWbo2MDu5fP0IUmszi+mvLrO36GY4/Bm1VAGkO/2vACY5D6DiGQAlwHnAKd5MRZjAk5tfQOvLs8nKjKc/5qQ0eqNV1eM+Inn8tmBPz7wGUOSeXzRdr7KO8w3AmDwHOOZN88IPH2qtdnzB4GfqOpxe6cSkVtEJEdEcg4dsnbJpnuYt24f+49WMWtCBnHRkf4Oxyuys3rTIzzMqocCnDfPCAqA/m7PM4G9zebJBl52HQklAxeJSJ2qvuU+k6rOAeYAZGdnN08mxgSdtQXFLNt5mLOGJjMi7fh33j646vsA3H3Kk03Lf5rslP8xcH9koyPDmTAwkSXbi/wdijkOb54RLAeGicggEekBXAG84z6Dqg5S1SxVzQJeB2Y3TwLGdDdFZdW8uWoP/XvHcEE7moqW1RRTVlPcsrwkjLKSwG8BfsaQZDbuO8qR8hp/h2Ja4bUzAlWtE5HbcVoDhQNPqeoGEbnVNf0Jb23bmI44kfGEO6q2voGXl+cTJsIVEwcERbfSJ+r0oUn873xYuqOIi8b283c4xgOv3kegqvOAec3KPCYAVb3Bm7EY42+N9wvsKa7kmkkD6R0bnP0IddS4zER69ghnyfZCSwQBKvDPK43pJj7aeIC1BSVcMCo1pHrkjAwPY9LgJJbk2nWCQGV9DRnjAws2HeDTrYc4LasPZ3dwLN/RSad7Lj+tqitC84nThyTx8eaD7CuppF9CjL/DMc1YIjDGi2rqGrj/3Q18vPkgpw7szczx6R0eqOWyoXd5Lr/paFeE6BOnD3FaOC3JLeLbp2b6ORrTXLuqhkRkrohcLCJWlWRMO23ce5TvPLGEF5ftZuqwvlx2SgZhITpa18i0OPr07MEX1i11QGrvGcHjwI3AwyLyGvC0qm72XljGBKeGBmVVfjHPLc3j3bX7SIyJ5LGrJ1BcUdvpdf4p5zoAfpL9bNPyu50qpp88GPg3WYaFCVNc1wls+MrA065EoKoLgAUikgBcCcwXkXzgH8Dzqtr5T7kxQaywrJp1e0rYsKeE9XuOsmL3EQ6VVtMrKoIbTs/ijnOGkhjbo8NNVN3V1nvuT6i2Orh+TE8fmsT76/axs7CcwX17+Tsc46bd1whEJAm4BrgWWAW8AJwJXA9M80ZwxgSiVbuPMHdlAYu2HKLgSOWx8kHJPZkyOIlzRqZwzkkpxHfTbiM6q/E6wRfbiywRBJh2JQIReQMYCTwHXKqq+1yTXhGRHG8FZ0wg2VdSybtr9pJXVEFMZDhnDkvmhtOzGJORwKj0ePvhb0NWUizpCdEs3V7ItZMH+jsc46a9ZwT/dN0cdoyIRKlqtapmeyEuYwKGqvJ5biH/2bCfmMhwfn3pKC7P7k/PKGt01xEiwpQhyXy8+QANDUpYCNxVHSza+0n+H5rdIQwsBVp2km5MN9Kgyjtr9vLVzsOMSY/nW+MzuPGMQT6N4ZSUczyXn1npsTyQnTE0ibkrC9i0/yij0xP8HY5xOW4iEJE0nHEFYkTkFL7uWjoeiPVybMb4lary/rp9fLXzMFOH9eWC0al+af558aDvey6/utTHkZ
w49/sJLBEEjrbOCC4EbsDpQvr/3MpLgZ97KSZjAsIXuYUs3V7EGUOSuHB0qjV57AJpCdEM7tuTL7YXcvPUwf4Ox7gcNxGo6jPAMyLybVWd66OYjPG7XUXlfLhhP6PT45kxtp9fk8D/LPsuAL+c9ErT8tucEb9++XhwjfJ6xpBk5q4soLa+gchwu0c1EBz3XRCRa1z/ZonID5o/fBCfMT5XVVvPK8vzSYiJ5NsTMkP2bmBvOX1IEhU19azJL/Z3KMalrXTc0/W3FxDn4WFMt/Phhv2UVNZyxWkDiI4M93c43c7kwUmIYKOWBZC2qoaedP39jW/CMca/8grL+WrnYc4YkkT/PtYewht69+zBqH7xfJFbyJ3nDvN3OIb2dzr3ZxGJF5FIEVkoIoVu1UbGdAsNqry7di+JMZGcNyrV3+F0a2cMTWbV7mIqa+r9HYqh/fcRXKCqPxaRy3AGpf8O8AnwvNciM8bH1hYUs6+kisuz+xMV0XqV0In0G9QZk/pd7Ln83AqfxtGVpgxJYs5nO8jZdZizhnVsfAbT9dqbCBrvnb8IeElVD1tTOtOd1NU3MH/jAdITohmXGVjt288fcJ3n8lllPo6k60zM6kNEmLBke5ElggDQ3rZb74rIZiAbWCgifYE2h0cSkekiskVEckXkpx6mzxSRtSKyWkRyROTMjoVvTNdYtvMwRypquXBMWsC1Eqqur6S6vuVdxNVVQnVVYMXaXj2jIhjfP5EluTY+QSBoVyJQ1Z8CU4BsV5fT5cDM4y0jIuHAo8AMYBRwpYiMajbbQuBkVR0P3AT8s0PRG9MFqmrr+WTLQYam9GJYSuA1hvtLzg38JeeGluX39OUv9wTv0fTpQ5NZt6eEkkrrxd7fOnI3x0nAd0XkOmAWcEEb808EclV1h6rWAC/TLHmoapmqqutpT0Axxsc+23aIipp6Lhyd5u9QQsoZQ5JoUFi2w5qR+lt7u6F+DhgCrAYaL/Mr8Gxry+D0UZTv9rwAmORh3ZcBfwBSAI9XxUTkFuAWgAEDBrQnZGPapaSiliXbixibkUBGog2q3tWOd2G9rr6ByHDhqS/yKCyrOVZ+1ST7jvtaey8WZwOj3I7e28NT5WWL5VX1TeBNEZkK/A44z8M8c4A5ANnZ2XbWYLrMs0vzqKlrYNqI4K1iCVYR4WEMSu5J7sHg6zyvu2lv1dB6oKPnzQVAf7fnmcDe1mZW1c+AISKS3MHtGNMpFTV1PPXFTkamxdEvwc4G/GFYShyFZTUcKa9pe2bjNe09I0gGNorIV8CxAVRV9ZvHWWY5MExEBgF7gCuAq9xnEJGhwHZVVRGZAPQArMLQ+MRLX+VzpKKW72b3b3tmPzorY5bn8ovLfRxJ1xuW2gvWwdaDpUwalOTvcEJWexPB/R1dsarWicjtwH+AcOApVd0gIre6pj8BfBu4TkRqgUrgux2sfjKmU6rr6vnHZzuYPLgPA5J6tr2AH52d+R3P5ZcEfyLo2yuKxJhIth0os0TgR+1KBKr6qYgMBIap6gIRicX5cW9ruXk0G9nMlQAa//8T8KeOhWzMiXtz5R72H63iL98ZR/7hwB7pq7TmMABxPfo0LS92anbjEht8HlNXERGGpfZibUEJ9Q1KuA1f6Rft7WvoZuB14ElXUQbwlpdiMsar6uobePzT7YzLTODMoYF/SeqhVbfx0KrbWpb/LJmHfhb48bdlWEoc1XUN7D4cvF1mBLv2Xiz+b+AM4CiAqm7Dae5pTNCZt34/u4oqmD1tiI06FgCGpvQiTGCbtR7ym/YmgmrXTWEAiEgEdvOXCUKqymOf5DKkb08uGGU3kAWC6Mhw+veOZduB4O07Kdi1NxF8KiI/xxnE/nzgNeBd74VljHd8vPkgm/eXMnvaUMKsPjpgDEvtxd7iSsqq6/wdSkhqbyL4KXAIWAd8H+cC8C+9FZQx3qCqPPpJLhmJMXxzfLq/wzFuhqXEoUDuQTsr8If2thpqEJG3gLdU9ZB3QzLGO5
btPMzK3cX8bubooBo0/dwBnseAOve/us+PZkbvGGJ7hLP1gF0n8IfjJgJxrqT9Grgdp8sIEZF64BFV/a0P4jOmyzz6SS7JvaL4ToDfQNbclH6Xei4/v/u0sgkTYXhqHFsPlFozUj9o67DobpzWQqepapKq9sHpOO4MEbnH28EZ01VW7T7C4m2FfO+sQUE3IH1R5V6KKlv2zlJ0IJyiA8G1L8czMi2Oipp6Vu4+4u9QQk5bieA64EpV3dlYoKo7gGtc04wJCo98nEvv2EiunTzQ36F02ONr7+HxtS2Pux6/P4nH7+8+d+MOT40jXIQFGw/4O5SQ01YiiFTVFkMIua4TRHqY35iAs66ghI83H+R7Zw2mZ1R7e1UxvhYdGc6g5J4s2GSJwNfaSgTH6xLQugs0QeHhj7cRHx3BdVOC72wg1IzsF8f2Q+XsLAz+fpSCSVuJ4GQROerhUQqM9UWAxpyIjXuPMn/jAW46cxBx0XYSG+hOSosHYKGdFfjUcROBqoararyHR5yq2rfKBLy/f7KNuKgIbjx9kL9DMe3Qu2cPRqTGWfWQj1mFqem21hWUMG/dfu44ZygJscF73HLRoJs9l1/VPdvcnzcqhSc+3UFJRW1Qv2/BJHjuqjGmg/78n830jo3k5qmD/R3KCZmQch4TUlqM4MqEsyqZcFZgd6HdGeeelEp9g7Jo60F/hxIyLBGYbunzbYUs3lbIf39jKPFBfm1gb9l29pZtb1m+K4K9u7rfSf34zESSe/VgvjUj9RlLBKbbaWhQ/vThZjISY7i2G7QUemrDz3lqw89blv+xD0/9sY+HJYJbWJhw/qg0Pt58kKraen+HExK63+GECWkvLtvN2oJi1u0pYdapmcxdscffIZlOuGhsGi99tZtFWw4xfYx1F+5tdkZgupW6+gbmbzxAWnw04/sn+jsc00lTBifROzaSD9bv83coIcGriUBEpovIFhHJFZGfeph+tYisdT2WiMjJ3ozHdH+LcwspKq9h+pg0wmz0saAVER7GhaPTWLjJqod8wWuJQETCgUeBGcAo4EoRGdVstp3A2ao6DvgdMMdb8ZjuL/9wBYu2HGR0ejzDU+P8HY45QReN7UdZdR2fbbWe773Nm9cIJgK5rk7qEJGXgZnAxsYZVHWJ2/xfAplejMd0c795dyOCcPHYfv4OpUt9a8gdnstvLPFxJL41ZUgSibGRzFu3jwtG23UCb/JmIsgA8t2eF+B0Yd2a/wd84GmCiNwC3AIwYMCArorPdCMLNx1gwaYDTB+dRmJsD3+H06XGJJ/puXxitY8j8a3I8DAuGJXKvHX7qaqtD7ruw4OJN68ReKqg9TjgvYh8AycR/MTTdFWdo6rZqprdt2/fLgzRdAcVNXXc/+4Ghqb04vSh3adb5kZ5RzeQd3RDy/KtkeRtDe57JNrSWD20eFuLTpBNF/JmIigA3IeCygRajK4hIuOAfwIzVbXIi/GYbuqB9zdRcKSSB741hoiw7tcQ7vlNv+X5TS0HBHz+b715/m+9/RCR75wxNJmEGKd6yHiPN781y4FhIjJIRHoAVwDvuM8gIgOAN4BrVXWrF2Mx3dQnWw7ywrLdfO/MQUwa3P3OBkJdZHgY00en8dGG/VTU1Pk7nG7La4lAVetwxjr+D7AJeFVVN4jIrSJyq2u2XwFJwGMislpEcrwVj+l+jpTX8OPX1zIiNY4fXjDC3+EYL7lsQgblNfV8tMG6nPAWr95ZrKrzgHnNyp5w+/97wPe8GYPpnlSVX761nuKKGp6+8TS7kNiNvLhsd5PnDaokxkby6Ce5VNS0vKfgqknWgOREdb8KVRMSnv9yF++v28c95w9ndHqCv8MxXhQmwin9E8k9WMbRqlp/h9MtWV9DJujk5B3mN+9u5NyRKdw6dYi/w/G6y4f/yHP5bcW+DcSPTunfm0+2HGJNfjFnDbOWg13NEoEJKgePVnHbCyvJ7B3D/313PGFh3b8bieG9sz2XjwudYcOT46Lo3z
uGVbstEXiDVQ2ZoFFdV8/sF1ZSVlXHk9dmkxDTvdvQN9p6JIetR1q2o9i6tgdb13avm+eOZ/yA3uw/WsW+ku43GI+/2RmBCWiNFw4bVHk1J5+1BSVccVp/Vuw6wopdR/wcnW+8uvUvAPxy0itNyx9PdMofD42RvMZlJDBv7T5W7S6m39gYf4fTrdgZgQkK/1m/n7UFJVw4Oo1xmYn+Dsf4Qc+oCEakxbE6v5j6Bo+dFJhOskRgAt4XuYUszi1k8uA+TB2W7O9wjB9lZ/WmrLqOjfuO+juUbsUSgQloq3YfYd66fYzqF88l49IRG2MgpA1PjSMxJpLlOw/7O5RuxRKBCVjvr93H6ysKGJTck++e1t8GmjGEiZCd1ZvcQ2UUlXXv3ld9yS4Wm4D00Yb93PXyKgYkxXLdlCwiw0P3mOWak37lufye0LhY3lz2wD58vPkgy/MOM31M9xp7wl8sEZiAs3DTAW5/cRWjMxKYeXI6PSJCNwkAZMWP9lw+PDTvso2PiWRkWjwrdh3hvFGp/g6nWwjtb5gJOB+s28f3n1vBSf3iePbGidaHELC+8HPWF37esvyrKNZ/FeWHiPxv4qA+lNfUs3GvXTTuCnZGYHymeWdiza3JL+a1Fflk9o5l5vgM3rc+6AF4a/sjQMuRyt76t9PH0piJoXEfgbuhKb3oHRvJMrto3CXsjMAEhJW7jvBqTj4Dk3py4+lZdiZgjitMhImDkthZWG5nBV3AEoHxu692HmbuygKGpPTi+ilZRFkSMO0wMasPkeHCvz7f6e9Qgp4lAuNXS7YX8tbqPQxPjePayQND/sKwab+YHuGcOrAP76zZw8GjVf4OJ6jZt874zadbD/HeWudmsasnDQjpJqKmc84YkkRdg/Ls0l3+DiWo2cVi43Oqyocb9rN4WyHjMhP4zqn9CQ+B7qQ766bRv/dc/lO7UJrUK4rzT0rl+WW7mP2NIcT2sJ+0zvDqIZiITBeRLSKSKyI/9TB9pIgsFZFqEbnXm7GYwNCgypur9rB4m9N30OXZlgTakt5rCOm9Wg7Akz6wjvSBNqD7zVMHU1xRy9yVe/wdStDyWiIQkXDgUWAGMAq4UkRGNZvtMHAn8FdvxWECR119Ay99tZucXUf4xogULh2Xbt1GtMPKgwtYeXBBy/LFMaxcbN0xZw/szcmZCfxr8Q7rlbSTvHlGMBHIVdUdqloDvAzMdJ9BVQ+q6nIgNG+RDCFHq2p5ZmkeG/Ye5eKx/Th/VKp1INdO83b+g3k7/9Gy/MU45r0Y54eIAouIcOvZQ8grquDdNXv9HU5Q8mYiyADy3Z4XuMo6TERuEZEcEck5dOhQlwRnfCf/cAXffmwJOwvLmXVqJmcMta6kTde6cHQaI9PieHjhNjsr6ARvJgJPh3udeodUdY6qZqtqdt++Nl5pMFm5+wiXPfYFB45WceMZg5gwoLe/QzLdUFiYcNe5w9hRWM47a+xaQUd5MxEUAP3dnmcCdt4WIlSVF5bt4oo5XxLbI4I3Zp/BkL69/B2W6cYazwoeWZhLXX2Dv8MJKt5MBMuBYSIySER6AFcA73hxeyZAlFbVcufLq/nFm+uZPDiJt/77DIamWBIw3hUWJtx9nnNW8O5aO+bsCK81ulXVOhG5HfgPEA48paobRORW1/QnRCQNyAHigQYRuRsYparWeUiQWpJbyM/fXEf+kUp+PH0Et04dQpg1Dz0ht437m+fy+4t8HEngu2BUGif1i+fhhblcMi7dblJsJ6/efaGq84B5zcqecPt/P06VkQlyB0ur+P37m3hr9V4G9InlpZsnM3FQH3+H1S0kxaR7Lk+t93EkgS8sTLj3guH8v2dyeG7pLm46c5C/QwoKdhueOSEHS6v49xd5PL90F9V1Ddx57jBmTxtivYd2oaX73gVgSr9Lm5bPj3XKz6/weUyB7JyRKZw1LJm/LdjKzPHpJPUKzTEbOsISgemwhgZl5e4jzF25h7krC6itb2DGmDTuvWAEg+
2CcJdbuPt5oGUiWPiG81pbImhKRPjVJaOY/tBi/m/+Vh64bKy/Qwp4lghMuxw8WsXK3UdYsr2ID9fv52BpNT0iwvj2hExumTqYQck9/R2iMccMc/Vm++zSPK6eNJBR6fH+DimgWSIwx6gqReU17DhUzvZDZew4VMaOQ+VsOVBKwZFKAKIjw5g2PIUZY9M4Z2QKcdGRfo7aGM/uOW84b6/ew2/e3cDLt0y2O9mPwxJBCFJVDpVVs2V/6bFH7qEyth8s42jV152Y9YgIY1BST07OTOSG07M4dWBvRqcnHBszoK2hJ43xp4TYSH54wQh++dZ6Xl9RwHey+7e9UIiyRBACquvqWZNfwvK8w+TkHWZtQQlF5TXHpif3imJYSi8uPTmdwX17MaRvT4b07UV6Yoz1DGqC2lUTB/DO6r387r2NTB3el9T4aH+HFJBENbj65cjOztacnBx/hxHw5ny2g837jrJ5fym5B8uocd1pmRIXRf8+saTFR5OWEE1qfDS9oux4IJCV1jjjDsT1aNoct7TYOTOLSwztu2ivmjTguNN3FpYz/cHPmDIkiX/fcFrIVhGJyApVzfY0zX4BupHquno+3nSQuSv38PHmAzQoJMREcsqARIanxjGwTyyx9qMfdJongGPlIZ4A2mtQck9+cfFJ/OrtDfz7izy7t8AD+1UIAm3VxZdW1bJs52GW7TxMeXUdcdERnDE0mZMzE+mXEB2yR0DdxacFrwFwduZ3mpa/57TUOvuScp/HFEjac60qXIST0uJ4YN4mDpVW85MZI30QWfCwRBDEjpTX8MmWg6zKL6a+QRmZFsfkwUkMTellA750I4v3vA60TASL37dE0F4iwrcnZPLoolxeWLaLG8/MIiXOrhc0skQQhI5U1LBoy0FW7DpCmAjZA3tzxpBkkuPsDkpjWhMbFcE1kwfyxKfbufnZFbx08yQb49jFXoUgUlJZyydbDrIi7wgITByUxNnD+5IQY235jWmPfgkxXHHaAF5YtovZL6xkzrXZx5pDhzJLBEGgtKqWT7ce4qudh1GFU7N6M214XxJje/g7NGOCzkn94vn9ZWP56RvrmP3CCh69egJREaHdN5YlggBWXFHDk5/t4J+uQblPGdCbc0ak0LunJQBjTsQVEwdQ16D88q313PT0ch67+tSQPrO2+wgC0NGqWv79eR7/XLyDspo6xmYkcN7IVLsGEKKq653uPaLCY5qWVzkNAqKig+s7HAga7z14Y2UBP5m7lgF9Ynny2lMZmhLn58i8x+4jCBL7S6r49xc7eXHZbkqr67hgVCo/uGA4K3cV+zs040fNE8CxcksAJ+y/JmSSkRjD7BdWcukjX/DLS07iytMGhNxgSpYI/KyhQflyZxEvf5XPB+v3Ud+gXDwune9PHcyYjAQASwQhbv7uZwE4f8B1Tctfd7qhPn9Wmc9j6k4mDU7ig7vO4p5XneFV31y5h/suGcXJ/RP9HZrPWCLwA1Vl8/5SPly/n7dW72FXUQXx0U7TtpvOGET/PrH+DtEEkGX73gdaJoJlC53PiSWCE5cSH83z/28Sr60o4I8fbGbmo19w7sgUrp0ykKnD+nb7MwRLBD5SXFHDsp2HWbq9iE+2HGRXUQUiMHlQEvecN5zpY9JsVC9j/EhEuDy7PzPGpPHU53k892UeC/99kP59Yvj2hEymjUhhbEZCt+yI0auJQESmAw/hDF7/T1X9Y7Pp4pp+EVAB3KCqK70Zk7fV1jewt7iSXUUVbD1Qyoa9R1m/p4TcQ2WoOv35D+gTy2XjMxjZL4646Egqaup5Y+Uef4duTMhoq1uKvnFR3HnuMDbuPcpXOw/z0MJtPLhgG71jI5k8OIkxGQmMSo9nZFocqXHRQX/G4LVEICLhwKPA+UABsFxE3lHVjW6zzQCGuR6TgMddf31CVWlQqG9QGtR5OP87dff1qlTV1lNZU09lbT0VNc7/FTX1FFfWcLishqLyGg6X13CotJr8IxXsLa6kwe0aXl
p8NKPT47n05HSmDEni5MxEXl9R4KtdNMZ0UkRYGOMyExmXmciFo1P5PLeQT7ceIifvCB+s339svshwIS0hmozEGNITY0juFUVCTCTxMZEkuv5GR4QRFRlOVESY83D7PzI8DBEIE3E98Hn/YN48I5gI5KrqDgAReRmYCbgngpnAs+q0Yf1SRBJFpJ+q7uvqYD5cv4+7X1lNQwPOD74qXdFyNi4qgt49e5DcqwfZA3sz4JQM+veJZUCfWIak9CLZBs42Jugl9Ypi5vgMZo7PAJwm3pv2HmXbwTL2Fleyp7iSvcWVfLm9iMMVNVTVnnjPsF8nBycxhAncfNZgfnjBiBNed3PeTAQZQL7b8wJaHu17micDaJIIROQW4BbX0zIR2dK1oXZaMlDo7yC8yPYvgFzNQM/lk1tdJKj2rxN8tn9X+2IjLbXYv3tdj07y/AHCu4nA07lN82Pw9syDqs4B5nRFUF1JRHJau0GjO7D9C262f8HNl/vnzd6WCgD3QUIzgb2dmMcYY4wXeTMRLAeGicggEekBXAG802yed4DrxDEZKPHG9QFjjDGt81rVkKrWicjtwH9wmo8+paobRORW1/QngHk4TUdzcZqP3uiteLwk4KqrupjtX3Cz/QtuPtu/oOt0zhhjTNeyERmMMSbEWSIwxpgQZ4ngBInIeBH5UkRWi0iOiEz0d0xdSURece3bahHJE5HV/o6pq4nIHSKyRUQ2iMif/R1PVxKR+0Vkj9t7eJG/Y/IGEblXRFREkv0dS1cSkd+JyFrXe/eRiKR7ZTt2jeDEiMhHwN9U9QPXl+zHqjrNz2F5hYj8L07Lrt/6O5auIiLfAH4BXKyq1SKSoqoH/R1XVxGR+4EyVf2rv2PxFhHpD/wTGAmcqqrd5iY6EYlX1aOu/+8ERqnqrV29HTsjOHEKxLv+T6Cb3gfh6iDwcuAlf8fSxW4D/qiq1QDdKQmEkL8BP8bDzajBrjEJuPTES/toieDE3Q38RUTygb8CP/NvOF5zFnBAVbf5O5AuNhw4S0SWicinInKavwPygttd1QtPiUhvfwfTlUTkm8AeVV3j71i8RUQecP2+XA38yivbsKqhtonIAiDNw6RfAOcCn6rqXBG5HLhFVc/zaYAn6Hj7p6pvu+Z5HKcTwf/1aXBdoI337wHgY+Au4DTgFWCwBtEXo439+xKnvxoFfgf0U9WbfBjeCWtj/34OXKCqJSKSB2QHW9VQe75/rvl+BkSr6q+7PIYg+rwHJBEpARJVVV3VJyWqGt/WcsFERCKAPTj1r92qD20R+RCnamiR6/l2YLKqHvJrYF4gIlnAe6o6xt+xdAURGQssxLkZFb7uomaiqu5vdcEgJSIDgfe98f5Z1dCJ2wuc7fr/HKC7VZ0AnAds7m5JwOUtnPcNERkO9KAb9dgpIv3cnl4GrPdXLF1NVdepaoqqZqlqFk7fZRO6UxIQkWFuT78JbPbGdmyoyhN3M/CQ66i5iq+7y+5OrqD7XSRu9BTwlIisB2qA64OpWqgd/iwi43GqhvKA7/s1GtNRfxSREUADsAvo8hZDYFVDxhgT8qxqyBhjQpwlAmOMCXGWCIwxJsRZIjDGmBBnicAYY0KcJQJjTpBrqNXPRWSGW9nlrpvVjAl41nzUmC4gImOA14BTcIZmXQ1MV9Xt/ozLmPawRGBMF3GNZVCO00tkqar+zs8hGdMulgiM6SIi0hNYiXOHcnZj19bGBDrrYsKYLqKq5SLyCs5AMJYETNCwi8XGdK0G18OYoGGJwBhjQpwlAmOMCXF2sdgYY0KcnREYY0yIs0RgjDEhzhKBMcaEOEsExhgT4iwRGGNMiLNEYIwxIc4SgTHGhLj/D9zttE3de7qDAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"data.label_distribution()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also binarize the labels if you want to do binary classification instead. For example, for Caco-2, values higher than -4.7 are considered good, so we can binarize by typing:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Binariztion using threshold -4.7, default, we assume the smaller values are 1 and larger ones is 0, you can change the order by 'binarize(order = 'ascending')'\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEICAYAAABS0fM3AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAABAF0lEQVR4nO3deXhU5dn48e+djSxkgSRAIEBYBFlEQGRRq7TaCojVVupua21fXKrVtravttra2v2t/blVlLZW64p1F7EFrSjKIvsqm2wJhC2QjSSQ5f79cU5wjFkmYWbOLPfnuubK5Kz3zJyZ+5znec7ziKpijDEmdsV5HYAxxhhvWSIwxpgYZ4nAGGNinCUCY4yJcZYIjDEmxlkiMMaYGGeJIIyIyHwR+W6o13XX/4KIbOro+s1s7y0R+Zb7/FoR+SCA275KROYGanvt2O+ZIrJFRCpF5OJQ79+YYLFEEAQiskNEzvM6jkYico+I1IpIhfvYLCIPi0he4zKqukBVB/u5rafbWk5VJ6vqkwGIvUBEVEQSfLb9jKp+5US33QG/Ah5W1c6q+mpzC4jIlSKyzE0WxW5CPCtYAYnIeBGZJyKHROSAiPzL93NtYZ0JIlIuIvE+0/7awrRHgxV7C7HtEJFq9zgtFZGFInKDiPj1W9Xc8RKkOEOyn1CxRBA7ZqlqOtAV+BrQA1je1o9Ge4kjWo+rvsD6lmaKyA+B+4HfAt2BPsAjwEVBjKkLMBMocOOrAP7RxjrLgHhgtM+0LwB7mkw7G3g/UIG2w4XusdoX+D3wv8DfPYgjdqiqPQL8AHYA5zUzvQswGzgAHHaf5/vMnw/8DvgIKANeA7r6zB8PLARKgdXAxCbrfreFeO4Bnm4yLd7dxp/c/ycCRT7z/xfYjfPDsgk4F5gEHANqgUpgtc++fwN8CFQDA33jAa515z3kvq6NwLktvV++8QK7AHX3VwlMcLf3gc/yZwBL3W0vBc5o8r7c6+6/ApgL5LTy2f0PsBU4BLwO9HSnfwI0uK+vEujUZL1Md/o3Wtn2WGCR+/kVAw8DST7zhwHz3H3vA37qTu+Ek2D2uI/7m+7fZxujgQo/jtF3gB+5z7sB24BfNpmmQL4fcStwA7AF57j+CyA+x9l9wEFgO3Czu3yCv98dd/8NwHD3/wuAlUA5UAjc47Nsc8fLAOC/QIkbxzNAVmvHujs9DrjD/exLgBdwv4/N7cfr350TeUTrmVu4isM5W+uLc7ZYjfOl8vVN4DqgJ1AHPAggIr2AN4Ff45zV3w68JCK5HQlEVetxEs0Xms4TkcE4X9jT1TkzOx/Yoar/xjnbnaVO8cipPqtdA0wH0oGdzexyHM6PTQ7wC+BlEenqR6hnu3+z3H0uahJrV5z35UEgG/gz8KaIZPssdiXwbZwftySc9+5zRORLOIn4UiDPfR3PA6jqAJwv/4VuHEebrD4BSAZeaeW11AM/wHkPJuAk15vcfacDbwP/xvnsB+L8WAP8DOckYCRwKs4P410t7ONsWrlq8fE+n763ZwMfuA/fadtVtai1uH1MBU5347sU55gBJ7FOdmMfDVzsR2yfoaofAUV8eqwewfmeZOEkhRt96myaO14E53PtCQwBeuOcbLR4rLvb+L4b7znuuo1JrqX9RCxLBCGkqiWq+pKqVqlqBc5Z9DlNFntKVdep6hHgbuBSt9z2amCOqs5R1QZVnYdziT/lBELag5NUmqrHOQsdKiKJqrpDVT9pY1tPqOp6Va1T1dpm5u8H7lfVWlWdhXPmdcEJxN7oAmCLqj7l7vs5nCuOC32W+YeqblbVapyzupEtbOsq4HFVXeH+0N8JTBCRAj/iyAYOqmpdSwuo6nJVXezGuQN4jE8//6nAXlW9T1VrVLVCVZf4xPUrVd2vqgdwztyvabp9ERkB/Bz4sR/xvgecJSKC8wO7AOesf7zPtPf8iLvR71W1VFV3Ae/y6Xt8KfCAqhap6mGcop6OOH6squp8VV
3rfg/WAM81E89xqrpVVeep6lH3/fuzz/KtHevXAz9zYz+KkzymRUu9gC9LBCEkIqki8piI7BSRcpyzsizfCjqcS91GO4FEnDOxvsA33Aq0UhEpBc7COXPtqF44xRCfoapbgdtwDvz9IvK8iPRsY1uFbczfre41tWsnzlnWierJ569AduK8tkZ7fZ5XAZ392ZaqVuIUCfRqYXlfJUBOaz8SIjJIRGaLyF738/8tzmcLzllqS8m26Wv83HsnIgOBt4BbVXWBH/EuxnkfhuOc3S5wX2+hz7T3/Yi7UUvvcU8+e2y0dZy05PixKiLjRORdt3K8DKdYqmk8x4lIN/cY3u3G/3Tj8m0c632BV3y+bx/jJI7uHXwNYcsSQWj9CBgMjFPVDD69vBSfZXr7PO+DUx5/EOcL9JSqZvk80lS1Q2dYboXuhThngp+jqs+q6lk4XwYF/tA4q4VNttWNbS/3TLNRH5yzPHAu9VN95vVox3b3uDH66oNT5tten9mWiKThnOn7s61FQA2tF33MwLlaOcn9/H/Kp599IU5Zdptx8dn3DhHpi1OsdK+qPuVHrKhqDU59ylQgT1U3urMWuNNG8GlFcWtxt6UYp56hUe+WFmyJiJyOkwgamyA/i1N/01tVM4FHfeJp7nj5nTt9hBv/1T7Lt3asFwKTm3znklV1dwv7iViWCIInUUSSfR4JOOXn1UCpW7b9i2bWu1pEhopIKk5zxRfd8vyngQtF5HwRiXe3OVFE8pvZRotEJFFEhuBcTvfAuUxuusxgEfmSiHTC+XGrxjkTAqcSs6ADLYO6Ad939/8NnLLaOe68VcDl7rwxwDSf9Q7gVBT2b2G7c4BBbrPNBBG5DBiKUxHfXs8C3xaRke5r/y2wxC0OaZWqluEUy/xFRC52r/4SRWSyiPzRXSwdp4KzUkROBm702cRsoIeI3CYinUQkXUTGufOeA+4SkVwRyXH38zQcrzv6L/AXVW1vU8/3cc6GF/pM+8CdtteniKS1uNvyAnCriPQSkSycilm/iEiGiEzFqad5WlXX+sRzSFVrRGQsTh1Qo+aOl3ScCt1S9/06XnTWxrH+KPAbN9Hivv+NLcDaOi4jiiWC4JmDc1A1Pu7Bae2RgnOGvxinYrCpp4AncC61k3EqrFDVQpxmiD/FOQgLcQ5ofz/Dy0SkEqflx+s4RRmnqeqeZpbthFOWe9CNo5u7X4B/uX9LRGSFn/sGWAKc5G7zN8A0VS1x592NczZ8GKf8+9nGlVS1yl3+Q/cSfbzvRt1tTMW52ioBfgJMVdWD7YitcVvvuLG8hHMmOwC4vB3r/xn4IU5FbuNndDPwqrvI7Tg/WhXAX4FZPutWAF/GuUrbi9MC54vu7F/j1AetAdYCK9xpAN/F+TH6hTj3LlS6n7M/3sP5bH1v9vvAnebbbLTFuP3wV5yWWmtwWvrMwWkEUd/KOm+ISAXO+/cznJOVb/vMvwn4lbvMz3GSDdDi8fJLnIrqMpyGBS/7bKu1Y/0BnO/KXHdfi3EaPbR5XEaaxiZexhgTdCIyGXhUVZsW5xkP2RWBMSZoRCRFRKa4xXa9cIpDW2tiazxgicCYKCUi632Li3weV4UyDJyimcM4RUMf4xTnmDBiRUPGGBPj7IrAGGNiXMTdIZeTk6MFBQVehxFRNrmdSw9us2/RyLGpxHlRg7Oj6EUZE0TLly8/qKrNdkkTcYmgoKCAZcuWeR1GRJk40fk7f76XUQTWxCcmAjD/2vmexmFMpBCR5voAAyIwEZj2y8+/kfLycm69tcW78CNO/zjnPp5bb73V40hMqA0cOJBbbrnF6zCiiiWCGHD48C4qj1SxfFtr9/BEqiqvAzAhFF/1ua6xTABYIogB1dWdOVaXwrGTT6Sj0vByKMHpmLNr3bg2ljTRJGXjnLYXMu1miSAGHDmSSV19dDUTPpj4X8ASgTGBYM1HjTEmxtkVQQxoaFAEja5+c01MiqspZ/fuFsf+MR1kVwQxwO4eN9FCGm
qprq72OoyoY4nAGGNinBUNxYD09F3UNzRw5PiAaJGvf/XNXodgTNSwRBADROoQia7ioQTSvQ7BmKhhiSAGHD3ahYYoqycoSXCGWs6u+4LHkRgT+SIiEYjIdGA6QJ8+fTyOJvIcPdrF6xACriTRGV3REoExJy4iKotVdaaqjlHVMbm5zXaeZ4wxpoMiIhEYY4wJHksExhgT4ywRGGNMjIuIymJzYjIzd1FbH133EQys/qHXIRgPaFwiKSkpXocRdeyKIAYkJyci8eJ1GAEVRyfi6OR1GCbEGpIz6NWrl9dhRB27IogBR45kcrQ2ugalOZDwDgC5ded6HIkxkc8SQQyoru4cdeMRHE78CLBEYEwgWNGQMcbEOLsiiBGCkhxFw/zFDS4BIGVT9Lwm0zZnzOLuXocRdSwRxID6+n7U1dVxWv+DXocSMDuSkwA4rb/9KMSW7gwcONDrIKKOJYIY0LnznQA88IDHgQTQ6idWA/DAtVH0oozxiETa6FUicgDY6XEYOUCknV5bzKERaTFHWrxgMXdUX1VttrO2iEsE4UBElqnqGK/jaA+LOTQiLeZIixcs5mCwVkPGGBPjLBEYY0yMs0TQMTO9DqADLObQiLSYIy1esJgDzuoIjDEmxtkVgTHGxDhLBMYYE+MsERhjTIyzRGCMMTHOEoExxsQ4SwTGGBPjLBEYY0yMs0RgjDExzhKBMcbEOEsExhgT4ywRGGNMjLNEYIwxMc4SgTHGxDhLBMYYE+MibvD6nJwcLSgo8DqMiLJpk/N38GBv4wikTSXOixqcHUUvypggWr58+cGWxiyOuERQUFDAsmXLvA4jokyc6PydP9/LKAJr4hMTAZh/7XxP4zAmUojIzpbmWdGQMcbEOEsExhgT4ywRGGNMjIu4OgLTfnPmeB1B4M25KgpflPFbbW0tRUVF1NTUeB1K2ElOTiY/P5/ExES/1wlaIhCRZOB9oJO7nxdV9RdNlpkIvAZsdye9rKq/ClZMsSo11esIAi81MQpflPFbUVER6enpFBQUICJehxM2VJWSkhKKioro16+f3+sF84rgKPAlVa0UkUTgAxF5S1UXN1lugapODWIcMe+RR5y/N93kbRyB9MhS50XddHoUvSjjt5qaGksCzRARsrOzOXDgQLvWC1oiUFUFKt1/E92HBmt/pmUvvAD7y2vIOm2/16Ecd+W4Pie0/gvrXwAsEcQySwLN68j7EtTKYhGJF5FVwH5gnqouaWaxCSKyWkTeEpFhwYzHGGPM5wU1EahqvaqOBPKBsSIyvMkiK4C+qnoq8BDwanPbEZHpIrJMRJa195LHGGMCTUS45pprjv9fV1dHbm4uU6e2r5R74sSJx2+QnTJlCqWlpYEM028haT6qqqXAfGBSk+nlqlrpPp8DJIpITjPrz1TVMao6Jje32TukjTEmZNLS0li3bh3V1dUAzJs3j169ep3QNufMmUNWVlYAomu/YLYaygVqVbVURFKA84A/NFmmB7BPVVVExuIkppJgxWSMiT6/fGM9G/aUB3SbQ3tm8IsLWy+pnjx5Mm+++SbTpk3jueee44orrmDBggUAHDlyhFtuuYW1a9dSV1fHPffcw0UXXUR1dTXf/va32bBhA0OGDDmeSODT7nNycnK4+OKLKSwspKamhltvvZXp06cD0LlzZ2699VZmz55NSkoKr732Gt27dz/h1xvMK4I84F0RWQMsxakjmC0iN4jIDe4y04B1IrIaeBC43K1kNgE0fz7cNSN8KooDYf61862fIeOpyy+/nOeff56amhrWrFnDuHHjjs/7zW9+w5e+9CWWLl3Ku+++y49//GOOHDnCjBkzSE1NZc2aNfzsZz9j+fLlzW778ccfZ/ny5SxbtowHH3yQkhLn/PjIkSOMHz+e1atXc/bZZ/PXv/41IK8lmK2G1gCjmpn+qM/zh4GHgxWDMSb6tXXmHiwjRoxgx44dPPfcc0yZMuUz8+bOncvrr7/On/70J8Bp7rpr1y7ef/
99vv/97x9ff8SIEc1u+8EHH+SVV14BoLCwkC1btpCdnU1SUtLxeojTTjuNefPmBeS12J3FMeBPf4KVu9K54KoKr0MJmD8tdL5gt59xu8eRmFj21a9+ldtvv5358+cfP2sH58aul156icHN9P3eVvPO+fPn8/bbb7No0SJSU1OZOHHi8TuoExMTj68fHx9PXV1dQF6H9TUUA2bPhpUfpHgdRkDN3jyb2Ztnex2GiXHXXXcdP//5zznllFM+M/3888/noYceorGke+XKlQCcffbZPPPMMwCsW7eONWvWfG6bZWVldOnShdTUVDZu3MjixU3vwQ08SwTGGNNB+fn53HrrrZ+bfvfdd1NbW8uIESMYPnw4d999NwA33ngjlZWVjBgxgj/+8Y+MHTv2c+tOmjSJuro6RowYwd1338348eOD/jqsaMgYY9qpsrLyc9MmTpzIRHcUqJSUFB577LHPLZOSksLzzz/f7DZ37Nhx/Plbb73V5n6nTZvGtGnT2hF1y+yKwBhjYpxdEcSAlBRIPBpdrXJTEqOrzsMYL1kiiAFvvQXPLomurjneuqr5S2djTPtZ0ZAxxsQ4SwQx4N574ZXHM7wOI6Dufe9e7n3vXq/DMCYqWCKIAe+8A+uXJnsdRkC9s/0d3tn+jtdhGBMVLBEYY0w7Baob6nBhicAYY9opGN1Qe8kSgTEm4k2c+PlH41jdVVXNz3/iCWf+wYOfn+ePxm6ogePdUDc6cuQI1113HaeffjqjRo3itddeA5ybxr7whS8wevRoRo8ezcKFCwGnf6GJEycybdo0Tj75ZK666ipC2RGzJYIYkJ0NnTMbvA4joLJTs8lOzfY6DBPDOtINdbdu3Zg3bx4rVqxg1qxZx3siBac/ovvvv58NGzawbds2Pvzww5C9FruPIAa89BI8u+Sg12EE1EuXvuR1CCaMzJ/f8rzU1Nbn5+S0Pr8lHemGumfPntx8882sWrWK+Ph4Nm/efHydsWPHkp+fD8DIkSPZsWMHZ511VvsD64BgjlCWDLwPdHL386Kq/qLJMgI8AEwBqoBrVXVFsGIyxphAam831Pfccw/du3dn9erVNDQ0kJz8aWu+Tp06HX8eyC6m/RHMoqGjwJfcgelHApNEpGk3epOBk9zHdGBGEOOJWXfeCc8/kul1GAF159t3cufbd3odholx7e2GuqysjLy8POLi4njqqaeor68PeczNCVoiUEdjV3mJ7qNp7cdFwD/dZRcDWSKSF6yYYtWiRbB1bae2F4wgi4oWsahokddhmBjX3m6ob7rpJp588knGjx/P5s2bSUtLC3XIzQpqHYGIxAPLgYHAX1R1SZNFegGFPv8XudOKm2xnOs4VA3369AlavMYY44+OdkN90kknfWYwmt/97nefWxfg4YdDO4JvUFsNqWq9qo4E8oGxIjK8ySLNjdn2uTZTqjpTVceo6pjc3NwgRGqMMbErJM1HVbUUmA9MajKrCOjt838+sCcUMRljjHEELRGISK6IZLnPU4DzgI1NFnsd+KY4xgNlqlqMCaj8fOjaLTwqpQIlPyOf/Ix8r8MwHgrlDVeRpCPvSzDrCPKAJ916gjjgBVWdLSI3AKjqo8AcnKajW3Gaj347iPHErKefhmeXlLS9YAR5+utPex2C8VBycjIlJSVkZ2fjtEI34CSBkpKSzzRL9UfQEoGqrgFGNTP9UZ/nCnwvWDEYY6JTfn4+RUVFHDgQXQMuBUJycvLxG9P8ZXcWx4DbboNNe7O45gelXocSMLf9+zYA7p90v6dxGG8kJibSr18/r8OIGpYIYsCqVbC/PMnrMAJq1d5VXodgTNSwTueMMSbGWSIwxpgYZ4nAGGNinNURxIBBgyBuf+h6MgyFQdmDvA7BmKhhiSAGzJwJzy455HUYATXzwpleh2BM1LCiIWOMiXGWCGLA9Onwt9919TqMgJr+xnSmvzHd6zCMiQpWNBQDNm+G/eXR9VFvLtnc9kLGGL/YFYExxsQ4SwTGGB
PjLBEYY0yMi66CY9OskSNh095jXocRUCN7jPQ6BGOiRtASgYj0Bv4J9AAagJmq+kCTZSYCrwHb3Ukvq+qvghVTrLr/fnh2SanXYQSU9TpqTOAE84qgDviRqq4QkXRguYjMU9UNTZZboKpTgxiHMcaYVgStjkBVi1V1hfu8AvgY6BWs/ZmWXX01PPKLbK/DCKirX76aq1++2uswjIkKIaksFpECnNHKljQze4KIrBaRt0RkWAvrTxeRZSKyzEYkar+iIji0P97rMAKqqLyIovIir8MwJioEPRGISGfgJeA2VS1vMnsF0FdVTwUeAl5tbhuqOlNVx6jqmNzc3KDGa4wxsSaoiUBEEnGSwDOq+nLT+aparqqV7vM5QKKI5AQzJmOMMZ8VtEQgIgL8HfhYVf/cwjI93OUQkbFuPCXBiinWbNhTzi3PrWRV4WF2Hari6cU7+eRApddhGWPCTDBbDZ0JXAOsFZFV7rSfAn0AVPVRYBpwo4jUAdXA5aqqQYwpJqgqf/j3Jh57/xMykhPpP6wHlTX1FB2u4u8flDO6TxYXj+xFQnzk3k84IX+C1yEYEzWClghU9QNA2ljmYeDhYMUQixoalDtfXsusZYVcNqY3P50yhMzURJ5dsova+sG8u2k/8zcd4MjReq4c14fECE0Gvzvvd16HYEzUiMxfAdOih/67lVnLCrn5iwP5/SWnkJmaeHxeYnwcXxnag4tG9mTTvgpeW7Xbw0iNMeHCEkEU+XDrQe5/ZzNfG9WLH31lEG71C5dcAvff8Wkd/Lh+2XxxcDdW7Cplxc7DXoV7Qi554RIueeESr8MwJipYIogSVcfq+PG/VjMgtzO/vnj48SQAUFIClWWf/ajPHdKNfjlpvL56D6VVkdcPUUlVCSVV1q7AmECwRBAlHnxnK3vKavj9108hrVPbVT9xIlwyOp8GVd5atzcEERpjwpUlgiiw7UAlf1uwjWmn5TOmwP8hKbumJXH2oFzW7i5jmzUrNSZmWSKIAve/vYXE+Dj+d9LJ7V73nEG5ZKYkMnfDPqzlrjGxycYjiHBb9lXwxpo9XH/2AHLTOzW7zLnnwpqimmbnJcbHMXFwLq+t2sPW/ZWc1D09mOEGzLn9zvU6BGOihiWCCPfAO1tITYxn+tn9W1zm7rvh2SVNu3n61Gl9uzB/0wHe/ngfA7t1/kxFc7i6+5y7vQ7BmKjhV9GQiLwkIheIiBUlhZHCQ1XMWVvM1eP70jUtqcPbSYiL45xBuRQermZHSVUAIzTGRAJ/f9hnAFcCW0Tk9yLS/sJoE3BPLtyBiPCtMwpaXW7yZPjDba332jq6TxdSEuNZ+MnBAEYYPJOfmczkZyZ7HYYxUcGvRKCqb6vqVcBoYAcwT0QWisi33R5GTYhVHq1j1tJCppySR8+slFaXra6G2qOtF/ckJcQxtl9XNuwp5/CR8L+voLq2muraaq/DMCYq+F3UIyLZwLXAd4GVwAM4iWFeUCIzrXplRREVR+u47syCgG1zfP9sRGDxdrtRy5hY4m8dwcvAAiAVuFBVv6qqs1T1FqBzMAM0n6eqPPtRIcN6ZjCyd1bAtpuZksiQvAxW7DxMXUNDwLZrjAlv/l4R/E1Vh6rq71S1GEBEOgGo6pigRWeataaojI+Ly7libJ+At/A5vaArR47V83FxRUC3a4wJX/4mgl83M21RayuISG8ReVdEPhaR9SJyazPLiIg8KCJbRWSNiIz2M56Y9vzSXaQkxnPRyJ5+LT91Kow6y7/y9IHdOpOVksjSHYdOJMSgmzpoKlMHTfU6DGOiQqv3EYhID6AXkCIio/h0fIEMnGKi1tQBP1LVFSKSDiwXkXmqusFnmcnASe5jHE7rpHHtfxmxo/pYPW+sLuaCEXmkJ/tXT3/77fDsEv/O8ONEOK1vF97ZuJ/DVcfoktrxZqnBdPsZt3sdgjFRo60bys7HqSDOB3yHm6zAGW2sRW4RUrH7vEJEPsZJKr6J4CLgn+6oZItFJEtE8h
qLn8znzd2wl8qjdVwyOj9o+xjVx0kEa4rKOGdQ681OjTGRr9VEoKpPAk+KyCWq+lJHdyIiBcAoYEmTWb2AQp//i9xpn0kEIjIdmA7Qp0+fjoYRFV5esZteWSmM6+d/53ITJ8L+8m7cNWO/X8t3TUuiT9dUVheWhm0imPjERADmXzvf0ziMiQat1hGIyNXu0wIR+WHThz87EJHOwEvAbaratJ+D5mo6P9fzmarOVNUxqjomNzc8f5hCYX9FDQu2HODiUT2JiwtuNxCn5meyt7yGveXN91FkjIkebVUWp7l/OwPpzTxa5d5s9hLwjKq+3MwiRUBvn//zgT1tbTdWzV5dTIPC10b1Cvq+hvfKJE5gTWFp0PdljPFWW0VDj7l/f9neDYvTrvHvwMeq+ucWFnsduFlEnsepJC6z+oGWvbm2mCF5GQzsFvweQtOTExmQ25nVRaV8eWj3iOiIzhjTMf7eUPZHEckQkUQReUdEDvoUG7XkTOAa4Esissp9TBGRG0TkBneZOcA2YCvwV+Cmjr6QaLentJrlOw9zwSk9QrbPU3tncbiqlsJD1hGdMdHM326ov6KqPxGRr+EU53wDeBd4uqUVVPUDmq8D8F1Gge/5GUNMaxxOcsopee1e99JLYen29v+YD83LICFOWFVURp/stLZXCKFLh13qdQjGRA1/E0Fjg/UpwHOqesiKCkLrzTV7GJKXQf/c9vfocdNN8OyS9g9FmZwYz8l5GazdXcYFp+QRH+QK6va46XS7eDQmUPy9s/gNEdkIjAHeEZFcwJqThMie0mpW7Cpl6oj2Xw0AVFXB0ZqO/Yifmp/JkaN1bD94pEPrB0tVbRVVtVZkZUwg+NsN9R3ABGCMqtYCR3BuBjMhMGetU3/ekWIhgClT4P9+0LFmtyd1SycxXli/p6xD6wfLlGemMOWZKV6HYUxUaM9QlUNw7ifwXeefAY7HNGPO2mKG5mXQLyf05fRJCXEM6p7OhuJyLjy1J3FWJGhM1PG31dBTwJ+As4DT3Yf1OhoCjcVCF3SwWCgQhvXMpKKmjiJrPWRMVPL3imAMMNRt5WNCqLG10AUdLBYKhJN7pBMvwvo95WHXesgYc+L8rSxeB4SuAbs57j/r93Jyj3QKPCgWapScGM+AbmmsLy7HzgWMiT7+XhHkABtE5CPgaONEVf1qUKIyABw6coxlOw5x8xcHntB2rr0WFn1yYq1+hvXM5JWVu9lbXkNeZutjJIfCtSOv9ToEY6KGv4ngnmAGYZr3zsf7aFD4yrATuxi79lpIWnJiiWBIXgavrtzN+j3llgiMiTL+Nh99D9gBJLrPlwIrghiXAeZu2EfPzGSG9cw4oe0cPAgVpf6WAjavc6cECnLSwqYZ6cGqgxysOuh1GMZEBX9bDf0P8CLwmDupF/BqkGIyOCORLdhyICAdvk2bBg/cmXPCMQ3rmcG+8qMcrDja9sJBNu2FaUx7YZrXYRgTFfw9TfweTidy5QCqugXoFqygDCzYcoCa2oYTLhYKpKF5zpXJhuKmw0oYYyKZv4ngqKoea/zHvanMmo8E0dwN+8hITmBsO0YiC7as1CR6ZiVbIjAmyvibCN4TkZ/iDGL/ZeBfwBvBCyu21dU38M7H+/jSyd1IjD+xsv1AG5qXQeGhKipqar0OxRgTIP7+ytwBHADWAtfjjCNwV2sriMjjIrJfRNa1MH+iiJT5jFXw8/YEHs2W7zzM4apavjw0fIqFGg3Ny0SBjcUVXodijAkQv5qPqmqDiLwKvKqqB/zc9hPAw7TeH9ECVZ3q5/ZixtwN+0iKj+OcwYEZn/nGG+GDLe3vhro53TM60TUtiQ3F5ZzuYbHVjWNu9GzfxkSbVhOBO9zkL4CbcQaZERGpBx5S1V+1tq6qvi8iBYEKNFaoKnM37OXMgdl07tSePgFbdtllUL8kMP0EiQhDeqSzePshjtbW0ykxPiDbba/Lhl/myX6NiUZtFQ3dhtNa6HRVzVbVrjhjC58pIj8IwP
4niMhqEXlLRIa1tJCITBeRZSKy7MABfy9IItOmfRUUHqoOaLFQYSGU7AvcD/bQnpnUNyib9wfmKqMjCssKKSwr9Gz/xkSTthLBN4ErVHV74wRV3QZc7c47ESuAvqp6KvAQrdyXoKozVXWMqo7JzQ1McUm4mrt+HyJw3tDAtc695hqYcU92wLbXp2sqqUnxfOxh66FrXrmGa165xrP9GxNN2ip7SFTVz92+qaoHRCSxuRX8parlPs/niMgjIpLT3P5iydwNexnVO4tu6cleh9Ki+DhhSI8M1heXUd+gYTWEpQm8Z5fs8jqEz7hyXB+vQ4g6bV0RHOvgvDaJSA+3DgIRGevGUnIi24x0u0urWbe7PCxbCzU1tGcGNbUNYTeEpTGm/dq6IjhVRJq7/heg1VNWEXkOmAjkiEgRTqVzIoCqPgpMA24UkTqgGrg81sc7mLfeGXvg/GHdPY6kbQO7dSYxXthQXMbAbp29DscYcwJaTQSq2uEaRlW9oo35D+M0LzWuuRv2MbBbZ/rnhv8Pa2J8HCd1S+fj4gouHKEn3B+SMcY7gWmfaE7Y4SPHWLL9ENef3T/g2/7Rj+C9TYG/AWxoXgYbisvZU1pDry6h7Zr6RxN+FNL9GRPNLBGEif9u3E99gwalk7kLL4SKbtUB3+7JPdIRYENxWcgTwYWDLwzp/oyJZuHVkU0Mm7thLz0ykhnRKzPg2960CfbsDHzOT3XHKPCiE7pNBzex6eCmkO/XmGhkiSAMVB+r573NztgDcUFoinn99fD474PTHcTQPGeMgpLK0I5RcP3s67l+9vUh3acx0coSQRj4dOyB8G8t1JSNUWBM5LNEEAbmbthHenIC4/oF7u7fUOmSlkRepo1RYEwks0TgscaxB849uRtJCZH5cQzJy2BXSRWVR+u8DsUY0wGR+csTRZbucMYeCKchKdtraF6GO0aBXRUYE4ms+ajH5m7YS1JCHOcMCl5nenfdBf/9uCxo28/LTCYrNZENxeWMKQjNGAV3nd3quEjGmHawROAhVWXu+n18YWAOaQEae6A5550H+9OD16pHRBial8FH2w9xtK6eTgnBH6PgvP7nBX0fxsQKKxry0IbicnaXVge9tdCqVbBj8wl1FtumIXkZ1DUoW/aFZoyCVXtXsWrvqpDsy5hoZ4nAQ/9xxx44d0hwE8Ftt8HT/69LUPdRkJ1GSmLoxii47d+3cdu/bwvJvoyJdpYIPPTW2mJO79uVnM6dvA7lhMXHCSf3SOfjveXUNTR4HY4xph0sEXhk094KtuyvZOqpeV6HEjCn5GdSU9vAVg+HsDTGtF/QEoGIPC4i+0VkXQvzRUQeFJGtIrJGREYHK5Zw9OaaPcQJTBoeuc1GmxrYrTPJiXGsLQpeCyVjTOAF84rgCWBSK/MnAye5j+nAjCDGElZUldlrixnXLzush6Rsr4S4OIblZbKhuJzaeiseMiZSBK3Noqq+LyIFrSxyEfBPd1SyxSKSJSJ5qlocrJjCxca9FWw7cITrzuwXkv399rcwd31pSPZ1Sn4my3cdZuv+Soa4/RAFw2/P/W3Qtm3CT0VNLYWHqiivqaOmtp7szkkM65nJgNw0GxQpALy8j6AXUOjzf5E77XOJQESm41w10KdP5A9cPdstFpocomKhM86AHfEnNMS03wbkdiY1KZ41RaVBTQRn9D4jaNs24aGuoYFVu0pZsv0Qu0s/HU/j9dV7jj/vmZnMtDG9+eaEvlHR6MIrXiaC5tJ4s2MWq+pMYCbAmDFjInpcY1XlzTXFnDEgh+wQHbgLF8Lm9UkMGhH8ZBAfJwzrmcHqojJq6xtIjA9O6ePCwoWAJYRo9XFxOW+uLebQkWN0z+jE+cN60C8njS6piVw6pjf7K46yctdh5qzby0P/3cLfF2zjpi8OZPrZ/YN2zEUzLxNBEdDb5/98YE8Ly0aN9XvK2VFSxfXnDAjZPn/6U9hfnsVdM/aHZH+n9Mpi6Y7DbNpbwfAgDLQD8NN3fg
rA/GvnB2X7xhtH6+qZvbqY5bsO0y29E9eeUcBJ3Tp/pvinS1oSXdKSGNwjncvH9mHr/kr++O+N/N9/NjF3/V4euHwUBTlpHr6KyONl6nwd+Kbbemg8UBYL9QOz1xQTHydMiuBO5trSLyeNtKR41u621kPGf6VVx5j5/jZW7DrMxMG53PylgQzqnt5mHcDAbp2Z+c0xPHLVaHYequKiv3zIB1sOhijq6BDM5qPPAYuAwSJSJCLfEZEbROQGd5E5wDZgK/BX4KZgxRIuGhqU11ft5qyBOXRJS/I6nKCJjxOG98pk495yamrrvQ7HRICDlUd59L1POHTkGN86o4CvDO1BQlz7fp6mnJLH6987ix4ZyVz7j494bdXuIEUbfYLZauiKNuYr8L1g7T8cLdpWwp6yGu6YMsTrUIJuVO8slmw/xLrdZSHrkdREpoOVR/nbgm3UNyjXnz2AHpkdb1LdJzuVF2+cwHefXMZts1ZxtK6BS8f0bnvFGGe1KiH00vIi0pMT+MrQyBuSsr16d00lp3MSK3aVeh2KCWO+SeA7X+h/QkmgUXpyIk9eN5azBuZwx0tr+M/6vQGINLpZIgiRyqN1vLVuL1NH9CQ5MfjdNPu6/364+geHQ7pPEWF0ny7sKDnCoSOBb610/6T7uX/S/QHfrgmd8ppa/v7Bduoak0BG4G6uTE6M59GrT+OU/CxueW4lS7aVBGzb0cgSQYi8tbaY6tp6pp3WK+T7HjkSCgbVhn6/vbMQYMWuwCehkT1GMrLHyIBv14TGsboGnlq0k+pj9Vx3Zr+AJoFGaZ0S+Me1p9O7SwrffXIZG/bYCHotsUQQIi8uL6JfThqj+wS3O+jmvP02rPso9DfbZKUmMSC3Myt3HaZBA3v7x9vb3ubtbW8HdJsmNBpU+dfyQvaUVnPZ6b3pmZUStH11TUviqe+Mo3NyAt99cikHK4M3QFMks0QQAoWHqliy/RBfH9XLk9vhf/1rePUfwWnP35bRfbM4XFXLjpIjAd3ur9//Nb9+/9cB3aYJjXkb9rF+TzmTh/cI6t3njXpmpTDzmjGUHDnG955ZYf1gNcMSQQi8vMJpxva10aEvFvLa0LxMOiXEsWJnqdehmDCwYU8Z720+wOkFXThzYE7I9ntKfia/v+QUlmw/xG/e/Dhk+40UlgiCrL5BeWFZIWcMyCa/S6rX4YRcUkIcp/TKZN3uMrunIMYdOnKMF1cU0SsrhQtH9Az51fHXRuXz3bP68cTCHbywrLDtFWKIJYIge+fjfewureabE/p6HYpnxvbryrH6hqBUGpvIUFffwHMf7QLgirF9SPCoP6A7Jp/MmQOzufvVdazfY3e+N7JEEGT/XLSTvMxkzgvyuMThLL9LKr27pLB4W0nAK41NZJizbi+7S6uZNjqfrh7eVZ8QH8cDl4+iS2oSNz2zgvKa0LemC0eWCIJo6/4KPth6kKvH9/XsDAjgscfgujsOebZ/gAkDsjlYeSxgw1g+NvUxHpv6WEC2ZYJr7e4yFm8r4cwB2Qzt6U2jBV85nTvx8JWj2H24mh//azVqJyeWCILpqUU7SYqP47LTvb3FffBg6Nm3ztMYhvfKpHOnBBZ9EpgbewbnDGZwzuCAbMsET0nlUV5eUUTvLimcH0bDso4p6Modk0/mP+v38bcF270Ox3OWCIKkoqaWF5cXMXVEnucDZrzxBqxYELy22v5IiIvj9IKubN5XQUkA2nK/sekN3tj0RgAiM8FSW9/Asx/tIk6Ey8f2aXcncsH2nbP6MWlYD37/740s3eHtFbPXwuuTiSKvrNzNkWP1fOuMAq9D4b77YM6z6V6Hwbh+XRGBxQG43f++Rfdx36L7AhCVCZY31xZTXFbDN07Lp0tq+PW2KyL88Rsj6N0lhe89s4IDFbF7s5klgiCob1Ce+HAHp+ZncmrvLK/DCRsZKYkM6+mMaVx51NuiKhNcqwtL+Wj7Ib5wUg4nh+CmsY7KSE7kkatOo6y6llufX0l9Q2
zWFwQ1EYjIJBHZJCJbReSOZuZPFJEyEVnlPn4ezHhC5T/r97Lt4BGmnx26UcgixVkDc6ipbeDZJTu9DsUEycGKo7yyajd9uqbylaHhUy/QkqE9M7j34uEs/KSE+9/e7HU4ngjmwDTxwF+AycBQ4AoRGdrMogtUdaT7+FWw4gkVVeUv726lf04ak8Kocixc9O6aysDczvx1wXa7wSwKHatr4JmPdpIQJ1x+em/i40LfpUpHXDqmN5eN6c1D/93Ku5tCM6RrOAnmFcFYYKuqblPVY8DzwEVB3F9YmL/5AOv3lHPDOQMi5ksQaucMzuVAxVH+ZXd3RhVV5dVVu9lffpTLxvQmKwzrBVrzy4uGMSQvgx/MWkXR4SqvwwmpYCaCXoDvN73IndbUBBFZLSJviciwIMYTdKrKn+duJr9LChePCp9+hZ56Cm68J3z6Y++fk8aYvl34y7ufdPiq4KmvPcVTX3sqwJGZE7Fk+yFWFZZy7pBunNTd+8YJ7ZWcGM+Mq0ZTX69875kVHK2LnSvWYCaC5k6Hm9bErAD6quqpwEPAq81uSGS6iCwTkWUHDhwIbJQB9J/1+1i7u4xbzz2JpITwqYfv3Ruyu4fPQS0i/Ogrg9lbXsPTiztWV9A7sze9M20IwnBReKiKN9cUM7h7OhMHd/M6nA4ryEnj/75xKquLymKqc7pg/loVAb7f1Hxgj+8CqlquqpXu8zlAooh8rktCVZ2pqmNUdUxubm4QQ+64uvoG7pu7if65aXwtjK4GAGbNgkXzwqvDuwkDsjlrYA6PzP+Eig7c5j9r3SxmrZsVhMhMex05WsezH+0iIyWBb4zJJ86DrtYDadLwHvzPF/rxz0U7eW3Vbq/DCYlgJoKlwEki0k9EkoDLgdd9FxCRHuJ2QSgiY914wqcMox2eW1rIlv2V/OT8wZ52J9GcGTPgnZc7ex3G5/xk0mAOHTnGw+9ubfe6M5bNYMayGUGIyrRHgzq96x45WseVY/uSmpTgdUgB8ZNJJ3N6QRfufHktW/dXeB1O0AXtF0tV64Cbgf8AHwMvqOp6EblBRG5wF5sGrBOR1cCDwOUagR1/lFXX8ue5mxjfvyvnD7OWQv4akZ/FJaPz+ccHO9gZ4IFrTGi8tbaYLfsrufDUnvTq4u3d64GUGB/Hw1eOJjUpnhueXsGRKL/vJainrqo6R1UHqeoAVf2NO+1RVX3Uff6wqg5T1VNVdbyqLgxmPMFy39xNlFbXctcFQz0ZgSyS/WTSYBLihV++scE6/4owi7eV8OEnJUwYkM3pBV29Difgumck88Dlo9h2oJI7Xl4b1cdneJVhRKAVuw7z1OKdfGtCAcN7ed+zYqTpnpHMD84bxH837mfO2r1eh2P8tHlfBbPX7GFw93QuOCXP63CC5syBOdx+/mDeWL2HP/5nk9fhBI0lghNwtK6en768lu7pyfzoK4O8DidiffvMAob3yuAXr6+ntOqY1+GYNuwprea5j3bRLT2Zy0/vHfGVw2258ZwBXDmuDzPmf8KTC3d4HU5QWCI4AX+eu5mNeyv4zdeGk56c6HU4LXrxRbj1dwe9DqNFCfFx/P7rIyitOsbPXlnn1yX4i5e+yIuXvhiC6Iyv/eU1/OPD7SQnxvPNCX3plBjvdUhBJyLce9Fwvjy0O/e8sZ45a4u9DingLBF00MJPDjJzwTauHNeHc8N89LGcHEjPavA6jFYN75XJD78yiDfXFvPi8qI2l89JzSEnNXSDnxtnbIHHP9yOiPCdM/tF3J3DJyI+TnjoilGM7tOFW59fybwN+7wOKaAsEXRAcVk1339uJf1y0rjrgiFeh9OmJ56A92aneR1Gm64/ewDj+3fl7tfWsWFPeavLPrHqCZ5Y9URoAjMcrjrG3z/cTm29ct1Z/chJ93aMDS8kJ8bz+LWnM7RnJjc+vZx/r4ueKwNLBO1UU1vPjU+voPpYPTOvOS0i2k0/8QQseDP8E0F8nPDgFaPITEnk+qeXcehIy/UFlg
hCZ29ZDY+953QHct2Z/eiRkex1SJ7JTEnkqe+MZUR+Jt97diWz1+xpe6UIYImgHeoblFufX8nqolLuu/RUBnaLvP5Uwl239GRmXH0a+8qP8p0nl1J9LHy6xohFS7aVMHPBJwBMP3tAVN0r0FEZyYn88zvjGN0ni1ueW8nfFmyL+Kallgj81NCg/OyVtfxn/T7uvmAok4ZHb5M5r43u04UHLx/JqsJSbnxmuXVX7ZFXVhZxzeMf0blTItefMyCmrwSa6twpgSevG8v5Q3vw6zc/5s6X13KsLrzr4VpjicAP9Q3KnS+v5fmlhdz8xYFcd1Y/r0OKepOG5/Hbr53C/E0HuP4pSwahVFNbz50vr+UHs1YzMj+L68/uH5ZDTXotNSmBR64azfe+OIDnlxZyzd+XsLesxuuwOsQSQRuqjtVx/VPLmLWskO+fe5LdLxBCV4ztwx8uOYX3txzg8pmLORiAQe9N67bur+SSGQt57qNd3HDOAJ79n3GkdQr/ejCvxMUJPz7/ZO6/bCRriso4//73eWN15NUb2Cfcik8OVHLT0yvYsr+Cey8axjUTCrwOqUPmzIFZS8O3++7WXHZ6HzJTkrht1kq++tAHPHTlaE7r24U5V83xOrSoUlNbzyPvbmXGe5+Q1imBv39rTNg3iw4nF4/qxam9s/jBrFXc8pzTvPSuC4bQLUKK0+yKoBkNDcqTC3dw4UMfcKDyKE98e2zEJgGA1FTolBy5lVmThvfgX9efQXy8cOlji/jzvM3ESydSE8Ora+1I1NCg/HvdXibd/z4P/ncrU0f05O0fnmNJoAP65aTx4g0T+MF5g3hrXTET/zSfB9/ZEhENHiTSarvHjBmjy5YtC9r2P9p+iN+8uYHVRWWcPSiXP14ygh6ZkZHVW/LII7B0+yG+PK3S61COu3Jcn3avU1Zdyz2vr+eVlbtJypzHF0/uxoyL7iTOhgRtt4YG5d/r9/LgO1vYuLeC/jlp/Oqi4Zx10udv0nt2yS4PImxZR46dUNtZcoTfv7WRt9btpXtGJ647sx+Xj+1DZop3PRCIyHJVHdPsPEsETmXwvA17+fsH21m64zA9MpL538mDuXhkr6joTXTiRKdrgLtmhM+g3CfyZZ6/aT9f/9f5VB2r45wuf+HmLw3kK8O60ykh+rs7OFFFh6t4aflu/rW8kKLD1fTPSeOWcwdy4YieLY6jYYmg4z7afoj75m5iyfZDpCbFM+20fL4+Op9T8zND/tvSWiKI2TqChgZlZWEpczfs5c01xRQdria/Swp3XTCEq8b1JSXJflTC1cTB3RiRn0lJ5TFqjzZwy3MryUxJ5MJT87hoZC9G9c4Ku8GBvKKqbCguZ/6mA7y7cT/Ldx1GFc4amMMdk09m8vA84u2KKmjG9uvKrOsnsG53GY9/uJ3nPtrFPxftJL9LChecksfZg3I5rW8Xkj3usymoiUBEJgEPAPHA31T1903mizt/ClAFXKuqK4IRS01tPet2l7FyVykrCw+zdMdhDlQcJSFOmDAgm59NGcKXh3a3H5AIkt05iXk3ncMHWw/y8ooiXlxexNOLd9G5UwJj+3VlXL+uDMnL4OQe6eSmd4qKq7vWVB+rZ+ehI+w4WMXHxeWsKixldVEppVXOUKDDe2Vw67knccnofHp3tfqVUBreK5M/XzqSX0wd5px8ri3m7x9s57H3t5EUH8fI3lmckp/JkLwMhuSl0zc7jc4hbK0VtD2JSDzwF+DLOOMXLxWR11V1g89ik4GT3Mc4YIb7N+DmrC3mhy+sBqB31xQm9M/mSyd344snd/O03M6cmPg44ZxBuZwzKJeKmlre33yQhZ8cZNEnJfx346dFYZkpifTMSqFHRid6ZKbQIyOZrp2TyEhOoHMn55HWKYHkxDgS4uJITIgjMU5IjHeeJ8QJCXGCiNCYTkRod3JRVRrUGeKxvkE//dsA9e5zVaVelZraBqqP1VNdW0f1sQaqa+upOlZHWXUtJZXHOHTkGIeqjn
Gg4ii7SqrYW/5pG3YRGNQtnfOH9mBMQRfOGZxLt/TIruuKBpmpiXxjTG++MaY3FTW1LNtxmEXbSliy/RBPL97JUZ+b0hqP2V5ZKeRlJpPdOYnx/bMZ3z874HEFM+WMBbaq6jYAEXkeuAjwTQQXAf90h6dcLCJZIpKnqgHvzekLJ+Xy12+OYWTvLHJjsMOsWJCenMgFI/K4YIRz1/ehI8fYuLecjcUVfHKgkr1lNewtr2FNURklrfRj1FGNOUH4NEGIz3Tnhz9w+8tMSSQ7LYnszkmcOTCHguxU+uakUZCdSv/cziE9ozTtl56cyBfdk1GAuvoGdpQcYePeCgoPVbO7tIo9pTUUHqrio+0llNfUcfMXNeISQS+g0Of/Ij5/tt/cMr2AzyQCEZkOTHf/rRQRr4cKygHCt4P/5uVcNT58Yr7Kv8XafJ/l22FX3BNpx0akxctVERgzAYr5x3+AH3d89b4tzQhmImjuG9r0fMifZVDVmcDMQAQVCCKyrKXa93BlMYdGpMUcafGCxRwMwawZLQJ6+/yfDzS999qfZYwxxgRRMBPBUuAkEeknIknA5cDrTZZ5HfimOMYDZcGoHzDGGNOyoBUNqWqdiNwM/Aen+ejjqrpeRG5w5z8KzMFpOroVp/not4MVT4CFTTFVO1jMoRFpMUdavGAxB1zE3VlsjDEmsOzuKWOMiXGWCIwxJsZZImiBiHQVkXkissX926WZZXqLyLsi8rGIrBeRW33m3SMiu0VklfuYEsRYJ4nIJhHZKiJ3NDNfRORBd/4aERnt77oexXuVG+caEVkoIqf6zNshImvd9zR43dC2P+aJIlLm83n/3N91PYz5xz7xrhORehHp6s4L+fssIo+LyH4RWdfC/LA6jv2MOeyO5Wapqj2aeQB/BO5wn98B/KGZZfKA0e7zdGAzMNT9/x7g9hDEGQ98AvQHkoDVjTH4LDMFeAvnvo3xwBJ/1/Uo3jOALu7zyY3xuv/vAHJCfCz4E/NEYHZH1vUq5ibLXwj81+P3+WxgNLCuhflhcxy3I+awOpZbetgVQcsuAp50nz8JXNx0AVUtVreTPFWtAD7GuTM6lI535aGqx4DGrjx8He/KQ1UXA1kikufnuiGPV1UXquph99/FOPeXeOlE3icv3uOO7PcK4LkQxNUiVX0fONTKIuF0HANtxxyGx3KzLBG0rLu69zS4f7u1trCIFACjgCU+k292Lwkfb65oKUBa6qbDn2X8WTfQ2rvP7+CcBTZSYK6ILHe7HgkFf2OeICKrReQtERnWznUDze/9ikgqMAl4yWeyF+9zW8LpOO6IcDiWmxXTvVKJyNtAj2Zm/ayd2+mM8yW6TVXL3ckzgHtxPux7gfuA6zoebcu7b2aav115+NXFR4D5vU8R+SLOl+csn8lnquoeEekGzBORje5ZWTD5E/MKoK+qVrr1Qa/i9KrrxXtMO/d7IfChqvqe2XrxPrclnI7jdgmjY7lZMZ0IVPW8luaJyD5xe0J1Lz+bHd5LRBJxksAzqvqyz7b3+SzzV2B24CL/jBPpyiPJj3UDza9uRURkBPA3YLKqljROV9U97t/9IvIKTrFAsL88bcbscwKAqs4RkUdEJMefdYOkPfu9nCbFQh69z20Jp+PYb2F2LDfP60qKcH0A/8dnK4v/2MwyAvwTuL+ZeXk+z38APB+kOBOAbUA/Pq0oG9ZkmQv4bCXbR/6u61G8fXDuNj+jyfQ0IN3n+UJgUgiOBX9i7sGnN2iOBXa573fI3+P2fLZAJk4Zd5rX77O7vwJarngNm+O4HTGH1bHc4mvwasfh/gCygXeALe7fru70nsAc9/lZOJega4BV7mOKO+8pYK0773V8EkMQYp2C02LpE+Bn7rQbgBvc54IzSNAnbkxjWls3BO9tW/H+DTjs854uc6f3d7/kq4H1oYrXz5hvdmNajVMpeEZr64ZDzO7/19LkJMWr9xnnqqQYqMU5+/9OOB/HfsYcdsdycw/rYsIYY2KctR
oyxpgYZ4nAGGNinCUCY4yJcZYIjDEmxlkiMMaYGGeJwBhjYpwlAmOMiXH/H7+sWpNva6hfAAAAAElFTkSuQmCC\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"data.binarize(threshold = -4.7, order = 'ascending')\n",
"data.label_distribution()"
]
},
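{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough picture of what the `binarize` call above does to the labels, here is a minimal pure-Python sketch. It is not TDC's implementation: `binarize_labels` is a hypothetical helper, the third label value is a toy number, and the sketch assumes that `order = 'ascending'` maps values above the threshold to 1 and the rest to 0.\n",
"\n",
"```python\n",
"def binarize_labels(values, threshold, order='ascending'):\n",
"    # Assumption: 'ascending' marks values above the threshold as positive (1).\n",
"    if order == 'ascending':\n",
"        return [1 if v > threshold else 0 for v in values]\n",
"    # Assumption: 'descending' marks values below the threshold as positive (1).\n",
"    if order == 'descending':\n",
"        return [1 if v < threshold else 0 for v in values]\n",
"    raise ValueError('order must be ascending or descending')\n",
"\n",
"# Two label values from the table above plus one toy value.\n",
"labels = binarize_labels([-5.17, -6.82, -4.2], threshold=-4.7)\n",
"print(labels)  # [0, 0, 1]\n",
"```\n",
"\n",
"Under this assumption, only the value above -4.7 becomes a positive label."
]
},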
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The labels are now binary, and the label distribution plot shows the relative sizes of the positive and negative classes.\n",
"\n",
"TDC also supports data balancing. In high-throughput screening data, for example, most samples are negative, and training on such highly imbalanced data can make ML training unstable.\n",
"\n",
"We use the SARS-CoV-2 3CL protease screening dataset to illustrate. First, let's load it:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYUAAAEWCAYAAACJ0YulAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAAxIUlEQVR4nO3dd5xUhbn/8c+zBZaFpe1SpLlWrAiIijXcaKISiRq9MXZiwRITydX8YiyJN5ro9ZpYL9xo9GJXYhcxEYhYoiKgiCgCFgSULrK7wMKW5/fHOTsOy+zuLFOX+b5fr3nNzCnPec6Zc85z+pi7IyIiApCX6QRERCR7qCiIiEiEioKIiESoKIiISISKgoiIRKgoiIhIhIqCiIhE7BBFwcymm9kF6e437P9IM1uwvf3HiPeSmZ0bfh5tZm8kMfaZZvZysuK1YriHm9kiM6sys5PSPXzJTmb2oZmNyHQezTGzcjNzMyvIdC7pklVFwcwWm9kxmc6jgZldb2Y1ZlYZvhaa2d1mtlNDN+7+ursPjDPWwy115+7Hu/sDSch9m5nZ3R9x9+8nGns7/B642907ufuzjVua2RFm9qaZrTezr83sX2Z2UKNuOoZFZXKM/heb2aaw/Qozm2BmnaLa9zOzp8xsTTiMD8xsdFT7duHvs8jMNoTx7jez8uZGysxOD7u1Rs0LzGyVmZ1gZsPNbEo4XqvN7G/R808zscea2WdmVmFmX5nZbdG/pQV+YWbzwpyXhbH3D9tPMLMbm4jtYT9VZvalmf3ZzPJbyileUfNeVfhaaWaTzOx70d25+77uPj1Zw82EqHmv0sy+Cefji80srnVruopOa4aTVUUhSz3h7iVAd+BkoDcwO54FuzXChXxH/T12Bj6M1cLMOgOTgLsIpnFf4D+BzY06PTVs9v0mpv0od+8EDAaGAL+JavcQsDTMoxQ4B1gZ1f5J4IfAGUAX4ABgNnB0C+P1DNAV+E6j5scBDvwd6AbcA5SHw68E/q+FuAAvAEPdvTOwX5jTL6La3wFcHjbrDuwJPAv8II7YAAeE0+togvG+sHEHSVhRdQ2HcQAwBXgmuhjvQEaF64idgZuBXwP3ZTalBLh71ryAxcAxMZp3I1hxrAbWhZ/7RbWfDtwEvAOsB54Duke1Hw68CXwDvA+MaNTvBU3kcz3wcKNm+WGMW8PvI4BlUe1/DXxJsPAvIFjojgO2ADVAFfB+1LD/APwL2ATsHp0PMDpsd1c4Xh8DRzc1vaLzBZYQrJiqwtehYbw3oro/DJgZxp4JHNZoutwQDr8SeBkoa+a3uxD4BPgaeB7oEzb/FKgPx68KaN+ov2HAN3HMG/8Mp9W7wJXNzTfALcCLUd+rgMFNxD0mzK1/M8PuE47T1+E4XhjV7h7g/kbdTwT+3ESsoUBlK5eLUmAqMC78vgdQBxzcTD8TgBubaOfA7lHf/wbcTVC4HDg/nH9eI9hwvBb4AlgFPAh0aSHfhjgFjZpfSVCM8xr/bsDBwFsEy+jyMJ92jXK+FFgUzo83ALuF/VSE0zy6+5jzY1Ssi8NY64D/ASxq+b4VWAN8Bvws1rg0t84Kx6Ue2C/8/gPgvTDPpcD1Ud3GWk53I5jf14Z5PEJQYJtcx4TN84CrCJa5teE06d7UcJr8/Vozc6b6FWsCRy0UpwDFQEk4Ez8b1X56OJH2AzoCT/HtyrFvOIFGhhPte+H3HlH9xl0Uwua/B2aEn0cQFgVgYPijN6wQy4HdmooVDnsJsC9QABSybVGoBX4ZtjuNYAXePdb0YuuiUE6jmZmookCwdbkOODsc9unh99Ko3D4l2ALtEH6/uYnp9F2CmXco0J6giL3W0u8atusc/h4PAMcD3WJ0M4BgIdsHuAKY29R8A/QDPgDuiGo/laC4/QQY0Kjfm4FXW5gvXwXGAUUEeyKr+XZBPJxgYe8Qfu9CUGQGNxFrLPB2nMvDGWFsD4d5QNj8YuCLFvqdQBxFIZymKwgKQc
M88yDBctQBOI9g5bor0Al4GniohWFvM++FzXcNm+8d43c7kGDjrSDsfz4wtlHOz4fzy74Ee43TwphdgI+Ac+OcH51gw7JrOG+tBo6LmrYfA/0JlpFXYo1LHOusJcAlUeuI/QnWP4MICuNJzSynuxOsp9oDPQiK8+1xrGPGAm8TLAPtgb8AjzX3m8Qcp3hmznS9mprAMbobDKyL+j6dqBUWwYy+haDq/7rxTAz8I2oGmk7ri8LFwKKoH7yhKOxOsDV1DFDYUqxw2L+P0Sy6KHxFuBUTNnsHODvW9KJ1ReFs4J1Gw34LGB2Vx7VR7S4F/t7EdLoPuCXqeyeCvaLyeH5XYG+CldgygiL4PNArqv21wJzwcx+CreQhjeabKoItJydYWXSNat+NYOX/YdjvHOCgsN29wOPN5NY/7KckqtlNwISo74uAM8LPFxLuCcaINYhgy/XIVi4XexBsGfcOv19DC4WFlotCBcFGwKfAjQQrrIZ5ZteobqcBl0Z9Hxj+tk2uXGLNe2HzorD54S3NFwQruGca5Xx41PfZwK+jvv+Jb1ecLc2PDhwR1X4icFX4+Z/AxVHtvh9rXBrNe7GKwtvANU30cztwW3PTqlH3JwHvhZ+bW8fMZ+sjCTs1/FbxDKfh1SaOYZtZsZn9xcy+MLMKgsrZtdHJsaVRn78g2LIuIzjO9+/hSaBvzOwb4AiCCba9+hIs3Ftx908IZubrgVVm9riZ9Wkh1tIW2n/p4S8c+oJgxZioPmGsaF8QjFuDFVGfNxIsXC3Gcvcqgq3/vk10vxV3n+/uo929H8HeXh+CBafBOQS70Lj7VwRb7uc2CnOSB8d1RwB7Efz2DfHXuftV7r4v0IugKDwbniBeS/PzQh/ga3evjGrWeDo9GOYIQbHd5kIBM9sdeAm43N1fb2Z423D3RQQFbVzYqKWc4zHU3bu5+27ufq2710e1i54nG88nXxCsZHptxzAbptk2y46Z7RmejF4RLuN/JOo3DEWfB9oU43vD/BnP/NjUvN2Hbdcl2yOyjjCzQ8zslfBCg/UEG5WNxy3CzHqG644vw2nxcEP3LaxjdiY4b9OwnptPsEHTqt+qTRQFgkMGA4FDPDjxdlTYPPqqj/5RnwcQVMg1BD/wQ+7eNerV0d1v3p5EwpPBo4CYC7a7P+ruRxD8QA78V0OrJkI21bxB30ZXtwwg2HsA2EBwSK1B71bE/SrMMdoAgsNwrbVVLDPrSHDIr9Wx3P1jgq3c/cJYhxFsKf8mXGGsAA4BTo91ItTdXw37v7WJ+GvCdn0IDg9MBQ42s37NjFt3MyuJatZ4Oj0IHG1mhxIcAnk0OoCZ7RwO5wZ3f6jJkW9eAcGxZgi23vuZ2bDtjNWS6Hmn8XwygGBvbiWtdzLBVm6sS7jHExy22SNcxq9m6+W7NRKZH5ez7bqkVSy4cq4v0HA5+aMEe7/93b0L8L98O26xltObwuaDwmlxVlT3za1jlgLHN1rXFbn7l00MJ6ZsLAqFZlYU9SogOI+wCfjGzLoDv4vR31lmto+ZFRMc83/S3esIquwoMzvWzPLDmCOaWQnEZGaFZrY38BjByvfPMboZaGbfNbP2QHWYc13YeiVQvh1XGPUEfhEO/98JDrU0XJY5B/hJ2G4YwRU6DVYTHIfftYm4k4E9zewMCy6hPI3gsNukVuYHwUz/UzMbHI77HwnOuSxuqUcz28vMrmj4PcysP8H5jbfDTs4luHJlH4LDhoMJCkYxwTmIWG4Hvmdmg8OY/2Vm+4XjWQJcAnzi7mvdfSrfXhlzYEM34WWF57n7UoKLFG4K551BBMffH2kYmLt/QbACeAyY4u6RrVAz60twSOJ/3P1/W5oeUf1dYGY9w8/7EFxNNS0c3iKCvYbHwnm5XZjbT8zsqqgwDfN7w6tdvMOP8hjwSzPbxYLLfP9IcEVebSvGpZeZXUaw3P6m0V5JgxKCQ1pVZrYXwW+0vbZ7fiQ4lPQLCy5j7kZw4jYuZtbZzE4AHi
c4jPtB2KqEYG+z2swOJjhX1CDWclpCcDj0m3D++VXUMJpbx/wv8IdwIwQz62FmJzYznNhaOr6UzhfB8Tlv9LqRYKtuejihFgIXEXV8jK2vPqoguJyvLCruIQSHHL4OJ86LhCccafmcQsMVQxsIjh2PA/pGdTOCb88pDApzqAyHNYlvTwiVEqw41gHvNjVsYl99dDfBCeaFwPejut0VmBHm9yJwJ1HnLQiK42qCKzqGs+3VR0cQHJtdH74fESuPqFzeiDWdwvYXExyfbhjv6KvDFtP0seO+BAvil+E0/pLgBFlngmPQ6wgu+Wvc3ziCwh8zPsGW51Ph57vC364qnB6TCE92hu3bEVwG+0mYwxfAX6PmkX5hP1+H43hxjHxGE8yTpzVq/ju2vuqjCqiKY1n4P4INiQ3h+P03UBTV3gguSf2Q4PDHl8ATwL5h+wlsuyw1nE9yoq4+iopZzrbnofKA3xJsha4m2Mja5mKAJuI0LDerCDZCjouxvDecaD6KYE+himAv/PdsPa9ulTPBsjQ66vuNwF/jnB8bx5pAeP6FYI/sNoLDTZ8T39VHmwiW+fUE5+V+BuRHdXNqOE9VhrncTfPL6b4Ey2MVwYbfFcS3jskD/oNgT6wyHP8/NjWcpn6/hsuwREREsvLwkYiIZIiKgkiGWPDsn6oYrzMznVtLLHiOVqzcY965Lm2HDh+JiEhEm3jyX1lZmZeXl2c6jbRaEF60N7DFR+21EGdtEGhgaYKBRKTNmT179hp379GaftpEUSgvL2fWrFmZTiOtRowI3qdPTzDOhCDQ9NEJBhKRNsfMWn3zXZsoCrno2GPvCj/9PKE4D528vfdKiUguUlHIUm+88ffwU2JFoX+X/i13JCISUlHIUps2NfWYodZ5Yt4TAJy232lJiSciOzYVhSy1YUOXpMQZP2s8oKIgIvHRfQoiIhKhPYUsVVcX65lhIiKppaKQtXRToYikn4qCiOzQRjTc9ANMT/TGnxygopClOnZcnJQ4T/74yaTEEZHcoBPNWcqsDrO6ljtsQVlxGWXFTf7zn8gOLXovIdZ32Zb2FLLUli3dkxJnwpwJAIwePDop8URkx5a1RcHMxgBjAAYMaPXfpLZ5W7Z0S0ocFQURaY2sPXzk7ve4+zB3H9ajR6se8iciItspa4uCiIikn4qCiOywGl+CqktSW6aiICIiEVl7ojnXlZQsTkqcyWdOTkockbZKeweto6KQpYqLi5ITp7A4KXFEJDeoKGSpZD06e9zMcQBcetClSYknIjs2nVPIUps2dUrKH+1M/HAiEz+cmISMRCQXqCiIiEiEioKIiETonEKW6tBh/0ynICI5SEUha/080wmISA4y9+z/hy8zWw18kaRwZcCaJMVKtbaSq/JMrraSJ7SdXHM1z53dvVUPj2sTRSGZzGyWuw/LdB7xaCu5Ks/kait5QtvJVXnGTyeaRUQkQkVBREQicrEo3JPpBFqhreSqPJOrreQJbSdX5RmnnDunICIiTcvFPQUREWmCioKIiESoKIiISISKgoiIRKgoiIhIhIqCiIhEqCiIiEiEioKIiESoKIiISISKgoiIRKgoiIhIhIqCiIhEqCiIiEiEioKIiEQUZDqBeJSVlXl5eXmm00irBQuC94EDE4yzNgg0sDTBQCLS5syePXtNa/+juU0UhfLycmbNmpXpNNJqxIjgffr0BONMCAJNH51gIBFpc8zsi9b2o8NHIiISoaIgIiIRKgoiIhLRJs4p5KLJk5MU58wkBRLJoJqaGpYtW0Z1dXWmU8lKRUVF9OvXj8LCwoRjqShkqeLiJMUpTFIgkQxatmwZJSUllJeXY2aZTieruDtr165l2bJl7LLLLgnH0+GjLDVuXPBKOM7McYybmYRAIhlUXV1NaWmpCkIMZkZpaWnS9qK0p5ClJk6EVRXVdD1wVcz2ZxwyIL44H04E4NKDLk1abiKZoILQtGROG+0piIhIhIqCiEgczIyzzz478r22tpYePX
pwwgkntCrOiBEjIjfjjhw5km+++SaZaSZMh49EROLQsWNH5s2bx6ZNm+jQoQNTpkyhb9++CcWcnKzLDJNIRUFE2pT/fOFDPvqqIqkx9+nTmd+N2rfF7o4//nhefPFFTj31VB577DFOP/10Xn/9dQA2bNjAz3/+cz744ANqa2u5/vrrOfHEE9m0aRM//elP+eijj9h7773ZtGlTJF7DI3zKyso46aSTWLp0KdXV1Vx++eWMGTMGgE6dOnH55ZczadIkOnTowHPPPUevXr2SOv7RdPgoS02fDteOj32SuVVxRk/Xc49EkuQnP/kJjz/+ONXV1cydO5dDDjkk0u4Pf/gD3/3ud5k5cyavvPIKv/rVr9iwYQPjx4+nuLiYuXPncs011zB79uyYse+//35mz57NrFmzuPPOO1m7di0QFJvhw4fz/vvvc9RRR3HvvfemdBxTtqdgZv2BB4HeQD1wj7vfYWbXAxcCq8NOr3b37NuHEpGsFM8WfaoMGjSIxYsX89hjjzFy5Mit2r388ss8//zz3HrrrUBwGe2SJUt47bXX+MUvfhHpf9CgQTFj33nnnTzzzDMALF26lEWLFlFaWkq7du0i5y0OPPBApkyZkqrRA1J7+KgWuMLd3zWzEmC2mTWMzW3ufmsKh93m3XorvLekhB+cWZlYnDeDyXzlYVcmIy2RnPfDH/6QK6+8kunTp0e25iG4ieypp55iYIzn3bd0yej06dOZOnUqb731FsXFxYwYMSJy30FhYWGk//z8fGpra5M4NttK2eEjd1/u7u+GnyuB+UBiZ2VyyKRJ8N4bHRKPs3ASkxZOSkJGIgJw3nnn8dvf/pb9999/q+bHHnssd911F+4OwHvvvQfAUUcdxSOPPALAvHnzmDt37jYx169fT7du3SguLubjjz/m7bffTvFYNC0t5xTMrBwYAswIG11mZnPN7H4z69ZEP2PMbJaZzVq9enWsTkRE0q5fv35cfvnl2zS/7rrrqKmpYdCgQey3335cd911AFxyySVUVVUxaNAgbrnlFg4++OBt+j3uuOOora1l0KBBXHfddQwfPjzl49EUa6hqKRuAWSfgVeAP7v60mfUC1gAO3ADs5O7nNRdj2LBhnot/srOqorrJk83x3tGsP9mRHcH8+fPZe++9M51GVos1jcxstrsPa02clO4pmFkh8BTwiLs/DeDuK929zt3rgXuBbcumiIhkRMqKggVnRu4D5rv7n6Oa7xTV2cnAvFTl0JZ16ACF7RPfi+tQ2IEOhYmfmxCR3JDKq48OB84GPjCzOWGzq4HTzWwwweGjxcBFKcyhzXrpJXh0RuLnUl4686UkZCMiuSJlRcHd3wBiXYelexJERLKU7mjOUjfcAM/c3znxOK/ewA2v3pCEjEQkF6goZKlp0+DDmUWJx/l8GtM+n5aEjEQkF6goiIjEIVmPzs52KgoiInGIfnQ2kJRHZ2cjFQURaXNGjNj21fCf5hs3xm4/YULQfs2abdvFq+HR2UDk0dkNNmzYwHnnncdBBx3EkCFDeO655wBYvHgxRx55JEOHDmXo0KG8+eabQPC8oxEjRnDqqaey1157ceaZZ5Lqm4njoaKQpUpLoVOX+sTjFJdSWlyahIxEZHsend2zZ0+mTJnCu+++yxNPPBF5YioEz0e6/fbb+eijj/jss8/417/+lYnR2or+ZCdLPfUUPDpjTeJxfvxUErIRyS7Tpzfdrri4+fZlZc23b872PDq7T58+XHbZZcyZM4f8/HwWLlwY6efggw+mX79+AAwePJjFixdzxBFHbF9ySaKiICLSCq19dPb1119Pr169eP/996mvr6eo6NurCtu3bx/5nI7HYsdDh4+y1G9+A4+P65J4nKm/4TdTf5OEjEQEWv/o7PXr17PTTjuRl5fHQw89RF1dXdpzbg0VhSz11lvwyQftW+6wpTjL3uKtZW8lISMRgdY/OvvSSy/lgQceYPjw4SxcuJCOHTumO+VWSfmjs5NBj87elh
6dLblEj85uWZt4dLaIiLQtKgoiIhKhq4+yVL9+ULsm8RNS/Tr3S0I2Ipnn7pE/sJetJfM0gIpClnr4YXh0xtqWO2wpzo8eTkI2IplVVFTE2rVrKS0tVWFoxN1Zu3btVpe6JkJFQUSyXr9+/Vi2bBmrVyf+x1M7oqKioshNcIlSUchSY8fCghVdOfuX3yQW5+9jAbj9uNsTTUkkYwoLC9lll10ynUZOUFHIUnPmwKqKdonHWTEn4Rgikjt09ZGIiESoKIiISISKgoiIROicQpbac0/IW5X4ExP3LN0zCdmISK5IWVEws/7Ag0BvoB64x93vMLPuwBNAObAY+LG7r0tVHm3VPffAozO+TjzOqHuSkI2I5IpUHj6qBa5w972B4cDPzGwf4CpgmrvvAUwLv4uISBZIWVFw9+Xu/m74uRKYD/QFTgQeCDt7ADgpVTm0ZWPGwF9v6p54nBfGMOaFMUnISERyQVrOKZhZOTAEmAH0cvflEBQOM+vZRD9jgDEAAwbE95joHcnChbCqIvGfZ+HahS13JCISSvnVR2bWCXgKGOvuFfH25+73uPswdx/Wo0eP1CUoIiIRKS0KZlZIUBAecfenw8YrzWynsP1OQOx/kRERkbRLWVGw4FGG9wHz3f3PUa2eB84NP58LPJeqHEREpHVSeU7hcOBs4AMzmxM2uxq4GZhoZucDS4B/T2EObdbgwbBgxZbE4/QenHAMEckd+o/mLPbojCVNtov3P5pFJHfpP5pFRCQhKgpZ6qyzYNzvShOP8/RZnPX0WUnISERygZ59lKWWLYOvK/ITj1OxLAnZiEiu0J6CiIhEqCiIiEiEioKIiETonEKWOvRQ+PCrzYnH6XdoErIRkVyhopClbroJHp2xPvE4x9yUhGxEJFfo8JGIiESoKGSpU06B268qSzzOxFM4ZeIpSchIRHKBDh9lqbVroaoi8Zq9duPaJGQjIrlCewoiIhKhoiAiIhEqCiIiEqFzClnq6KNh7rLqxOPscnQSshGRXKGikKWuuw4enRH3X1o3Hec71yUhGxHJFXEdPjKzp8zsB2amw00iIjuweFfy44EzgEVmdrOZ7ZXCnAQ4/nj4r7E9Eo/zyPEc/8jxSchIRHJBXEXB3ae6+5nAUGAxMMXM3jSzn5pZYSoTzFWbNkHNZks8Ts0mNtVsSkJGIpIL4j4cZGalwGjgAuA94A6CIjElJZmJiEjaxXWi2cyeBvYCHgJGufvysNUTZjYrVcmJiEh6xXv10V/dfXJ0AzNr7+6b3X1YCvISEZEMiPfw0Y0xmr3VXA9mdr+ZrTKzeVHNrjezL81sTvga2Zpkc8kJJ8CQIxI/F3DCnidwwp4nJCEjEckFze4pmFlvoC/QwcyGAA1nPjsDxS3EngDcDTzYqPlt7n5r61PNLVdeCY/OqEw8zmFXJiEbEckVLR0+Opbg5HI/4M9RzSuBq5vr0d1fM7PyRJITEZH0arYouPsDwANmdoq7P5WkYV5mZucAs4Ar3H1drI7MbAwwBmDAgAFJGnTbMWIErKroybXjVyUWZ8IIAKaPnp5wTiKy42v2nIKZnRV+LDez/2j82o7hjQd2AwYDy4E/NdWhu9/j7sPcfViPHonfxCUiIi1r6fBRx/C9UzIG5u4rGz6b2b3ApGTEFRGR5Gjp8NFfwvf/TMbAzGynqHscTgbmNde9iIikV7wPxLvFzDqbWaGZTTOzNVGHlprq5zGCy1YHmtkyMzsfuMXMPjCzucC/Ab9MeAxERCRp4r157fvu/v/M7GRgGfDvwCvAw0314O6nx2h8X+tTzE0//jHM/Hxj4nH2/XESshGRXBFvUWh46N1I4DF3/9os8Ye1SdMuvRQenVGVeJyDLk1CNiKSK+K9o/kFM/sYGAZMM7MeQOJ/CyZN2rgRNlcnXng31mxkY03iexwikhvifXT2VcChwDB3rwE2ACemMrFcN3Ik/PcvE78Ud+QjIxn5iJ4mIiLxac3fce5NcL9CdD+NH2EhIi
JtWLyPzn6I4KazOUBd2NhRURAR2aHEu6cwDNjH3T2VyYiISGbFe6J5HtA7lYmIiEjmxbunUAZ8ZGbvAJsbGrr7D1OSlTB6NLz16YbE4wwenXAMEckd8RaF61OZhGxr9GhoN0NFQUTSK66i4O6vmtnOwB7uPtXMioH81KaW29asgcpv8ijpWp9YnI1rACgrLktGWiKyg4v32UcXAk8Cfwkb9QWeTVFOApx6Ktzxm8RX5KdOPJVTJ56ahIxEJBfEe6L5Z8DhQAWAuy8CeqYqKRERyYx4i8Jmd9/S8CW8gU2Xp4qI7GDiLQqvmtnVQAcz+x7wN+CF1KUlIiKZEG9RuApYDXwAXARMBq5NVVIiIpIZ8V59VG9mzwLPuvvq1KYkAJdcAm8sSvzR2ZcMuyQJ2YhIrmi2KFjwpwm/Ay4DLGxUB9zl7r9PQ34567TToG5G4o+8Pm2/05KQjYjkipYOH40luOroIHcvdffuwCHA4Wamv9JMoaVLYe3KxG8FWbp+KUvXL01CRiKSC1oqCucAp7v75w0N3P0z4KywnaTI2WfD+OtLE4/zzNmc/czZSchIRHJBS0Wh0N3XNG4YnlcojNG9iIi0YS0VhS3b2U5ERNqglq4+OsDMKmI0N6AoBfmIiEgGNbun4O757t45xqvE3Zs9fGRm95vZKjObF9Wsu5lNMbNF4Xu3ZI2IiIgkLt6b17bHBOC4Rs2uAqa5+x7AtPC7xHDFFTDyjMrE4xx6BVccekUSMhKRXBDv/ym0mru/ZmbljRqfCIwIPz8ATAd+naoc2rJRo6Cy56bE4wwclYRsRCRXpHJPIZZe7r4cIHxv8kmrZjbGzGaZ2azVq3PvJuoFC+CrLxKv2QvWLGDBmgVJyEhEckG6i0Lc3P0edx/m7sN69OiR6XTS7qKL4P6buyceZ9JFXDTpoiRkJCK5IN1FYaWZ7QQQvq9K8/BFRKQZ6S4KzwPnhp/PBZ5L8/BFRKQZKSsKZvYY8BYw0MyWmdn5wM3A98xsEfC98LuIiGSJVF59dHoTrY5O1TBFRCQxKSsKkphrr4V/zl+feJyj9F9IIhI/FYUsdcwxsKpkc+Jxdj0mCdmISK7I2ktSc92cObB4YeIPop2zYg5zVsxJOI6I5AYVhSw1diw8fFvij4Ya+/exjP372ITjiEhuUFEQEZEIFQUREYlQURARkQgVBRERidAlqVnqj3+Elz/8JvE4R/8x8WREJGeoKGSpww6DxfmJ/w32Yf0PS0I2IpIrdPgoS735Jiyc2y7xOEvf5M2lbyYhIxHJBSoKWerqq2Hi+K6Jx5l2NVdPuzrxhEQkJ6goiIhIhIqCiIhEqCiIiEiEioKIiEToktQsdfvtMPmDdYnHOe72hGOISO5QUchSgwfDR5trEo/Te3DCMUQkd+jwUZaaOhXmvdM+8TifTWXqZ1OTkJGI5ALtKWSpG2+EVRVd2O/gVYnFee1GQP/AJiLx0Z6CiIhEqCiIiEhERg4fmdlioBKoA2rdfVgm8hARka1l8pzCv7n7mgwOX0REGtGJ5iyyunIzU+evZP7yCvqNyqOwopo3Fjm79uhEn64dtivmX074S5KzFJEdWaaKggMvm5kDf3H3exp3YGZjgDEAAwYMSHN66fXp6ir+9PIC/vHhSurqnU7tCyjr1I61bGHRvFoA+nbtwNF79WRg7xLMLO7YA8sGpiptEdkBZaooHO7uX5lZT2CKmX3s7q9FdxAWinsAhg0b5plIMtVq6+q565+fcPcrn1BUkMcFR+7Cj4b0Y89enZg0yXh1wWr2+G4F875cz5ufruXBt79gYK8SfjS0b9zDeGHBCwCMGjgqVaMhIjuQjBQFd/8qfF9lZs8ABwOvNd/XjmVt1WYufng2Mxev4+QhfbnmB3tT1unbm9X+9CdYVVHCtUdu4tDdyjh4l1Le+nQNL3+0krv/+QkH7tyNYeXdWxzOn976E6CiICLxSfslqWbW0cxKGj4D3wfmpTuPTPp8zQZOGvcv5i
5bzx0/Gcxtpw3eqiDEkp9nHLFHDy4ZsRvtCvI4468zeHHu8jRlLCK5IhN7Cr2AZ8Lj4gXAo+7+9wzkkRFL1m7kjHvfZnNtPY+PGc6QAd1a1f9OXTpwyXd246UPV/Dzx96l3ocw6oA+KcpWRHJN2ouCu38GHJDu4WaDZes2cvq9b7Oppo5HLxjOPn06b1ec4vYFPHT+wYy+fya/fGIORYX5fG+fXknOVkRyke5oTpMV66s5/d63qayu4eHzD9nugtCguF0B940exr59u/CzR97ltYWrk5SpiOQyFYU02LC5lvMmzGTdhhoePP8Q9uvbpcV+HnoILrl+bbPdlBQV8sBPD2K3np0Y89As5i77Zts4Jz/EQyc/tL2pi0iOUVFIsbp65/LH5/DxigruPmMIg/t3jau//v2htFddi911LW7Hg+cdTGnH9lzwwCyWr9+0dZwu/enfpf/2pC4iOUhFIcVufmk+U+ev5Hej9mXEwJ5x9/fEE/DWlOK4uu1R0p77Rx/Exi11nD9hFhs2134bZ94TPDHviVbnLSK5SUUhhR6dsYR7X/+ccw/dmXMPK29Vv+PHw7SnO8Xd/cDeJdx9xhA+XlHB5Y+/R119cL/f+FnjGT9rfKuGLSK5S0UhRd5YtIbrnpvHiIE9uO6EfdIyzBEDe3L9D/dl6vxV3PzS/LQMU0R2LHogXgp8sqqSSx6Zze49OnHX6UMoyE9f7T3n0HI+XVXFva9/zi5l8e9piIiAikLSfb1hC+dNmEX7gjzuGz2MkqLCtOdw3Qn7sOTrjVz33DxK+tfQtUP6cxCRtkmHj5KouqaOCx+cxYqKau49Zxj9usV3ojjZCvLzuOuMoQzsVcKilZVs3NLyVUwiIqA9haSpr3f+Y+Ic3l2yjv85Y2irH1/R2JNPwlOzt/8/iDq1L+D+0Qfxg7uvp251cPNc7y5FCeUkks0enbGkyXZnHLJjP34/mbSnkCQ3vTSfyR+s4JqRezNy/50SjldWBiVd6xOK0btLEQ+fdwybt3TkpxNmUhV1qaqISCwqCklw/xufRy49Pf+IXZISc8IEeHVSx4TjzFj5NCccupCFKyv52SPvUlOXWKERkR2bikKCHn77C34/6SOO3bcXvx21b6v+Fa05EybA6y8mXhQmzJnA2yue4saT9uPVhasZ+/gcalUYRKQJOqeQgIkzl3Lts/P47l49uev0oeTnJacgpMLpBw9gw+ZabnxxPmZw+2mD03qprIi0DSoK2+lvs5by66fnctSePRh35lDaFWT/CvaCI3el3p0/Tv4YM+O2Hx+gwiAiW1FRaCV3585pn3Db1IUcsXsZ95x9IEWF+ZlOK25jjtqNeoebX/qYik013H3GkIzcSyEi2Umbia2wpbaeK/82l9umLuSUof24f/RBbaogNLj4O7tx04/251+frOGU8W+y9OuNmU5JRLKE9hTitGTtRn45cQ6zv1jHL4/Zk18cvXvSTirHMnkyPDEz8T/OmXzm5JjNTz94AAO6F3Pxw7M5edy/uO20wRy5R4+EhycibZv2FFrg7jz+zhKOu+M1Fq6s5M7Th3D5MXuktCAAFBdD+yJPPE5hMcWFse+sPnz3Mp659HC6dCjk7Pve4epnPtC9DCI5TkWhGfO+XM8597/DVU9/wOD+XfnH2KP44QF90jLsceNgypOJP9Bu3MxxjJs5rsn2u/fsxIu/OJILj9yFx95ZwnG3v8ZLHyynvj7xgiQibY+KQgyfrKrkskff5YS73mDusvX8btQ+PHz+IfTp2iFtOUycCDOmJf7spIkfTmTihxOb7aaoMJ9rfrAPf7voUNoX5HHJI8G4T/1oJe4qDiK5ROcUQlWba5n8wXImzlzKrC/WUdwun8v+bXcuPGpXuuTIU0aHlXfnH2OP4vn3v+KOaYu44MFZ7FxazI+G9OPkIX0ZUJqZB/yJRKuuqWNVxWZWVFSzoqKaVRXVrFhfzYzPv6Zqcy1bauvZUldPTV09NXVOfb3zp5cXUJ
BvFOTlUZBvdGxXQNfiQrp0CF7dOrajd+ciencpYqcuwXtZx/bkZfG9R6mSkaJgZscBdwD5wF/d/eZ057B+Yw0fr6hgxudf869P1vDuknXU1Dm79ujIVcfvxakH9qOsU/t0p5VxBfl5/GhoP0Yd0IdJc7/iydnLuH3aQm6bupC9epcwfNdShu9ayuD+XenVuX3Kz61I7qitq2fthi2srtzMivXVrKysZuX6YMW/smIzK8Mi8M3Gmm36LSrMo2O7AkqKCujUvoDCfKMwP4/CgjzyzNitR0fq6p2aOqe2vp6q6lrWb6ph0aoq1m+qYd2GLdQ2OmRakGf0CgtF785F9OpcRK/O7endJfjc0KxDu7Z3BWJz0l4UzCwf+B/ge8AyYKaZPe/uHyV7WJ+sqmTBiipWVVazqjKYqVZVbOaTVVWsqKgO84F9+3TmvCN24fv79GLogG5a0QGF+XmcPKQfJw/px1ffbOL597/ijUVreHzmEia8uRiAkqIC9ujZif7di+nRqT1lJe3p0ak9pZ3a0bF9AR0K8ykqzKOoMD/8nE9+nmEGeWbhC03vLOLuNBwx9IbvgDs437Yjqlm9B5drb6mtZ3NtXfgebK1vrqlnU00tldXBq2pzLZXVNVRW1/LNxhrWVG0OX1v4esOWbfLJMyjrFKyI+3cvZlh5t6gVdFFkBd25qIDH3lna5Hi19JTU+npn7YYtrAiL0Ir1m1i+PtgDWb6+mvnLK3hlwaqYj6HvXFRA7y5F9ChpH9nz6By+d+3QjpKiAoqiloWigm8/ty/IozA/j7w8Iz/PKMgLlov8vMwtG5nYUzgY+MTdPwMws8eBE4GkF4UH3vyCh97+AoDCfKNHp/b06FzEobuVMrB3CQN7lzC4X1e6dWyX7EHvUPp07cDF39mNi7+zG5tr65i7bD3zl1ewaGUVC1dW8t6Sb1hduZlNNdv/vw15UYXCLCjWRnYXi2yuZd+u2D1ceQPe6Dtbr/TTJT/P6NQ+OHxT1qk9u5R15KDy7pRFbVg0bJ2XdWqXlrvu8/KMHiXt6VHSnv3pErMbd6dqcy0rwz2XhgKyKtyDWVO1hZUVwZ7H+o01bEnCM8by84z7zh3GiIE9E44VL0v3iUQzOxU4zt0vCL+fDRzi7pc16m4MMCb8OhBYkKQUyoDt/6OC9GoruSrP5GoreULbyTVX89zZ3Vt1A1Im9hRibV9tU5nc/R7gnqQP3GyWuw9LdtxUaCu5Ks/kait5QtvJVXnGLxOXpC4D+kd97wd8lYE8RESkkUwUhZnAHma2i5m1A34CPJ+BPEREpJG0Hz5y91ozuwz4B8Elqfe7+4dpTCHph6RSqK3kqjyTq63kCW0nV+UZp7SfaBYRkeylx1yIiEiEioKIiETs8EXBzLqb2RQzWxS+d4vRTX8ze8XM5pvZh2Z2eRrzO87MFpjZJ2Z2VYz2ZmZ3hu3nmtnQdOUWI5eWcj0zzHGumb1pZgdkY55R3R1kZnXhvTNpF0+eZjbCzOaE8+Wr6c4xzKGl372Lmb1gZu+Hef40Q3neb2arzGxeE+2zYlmKI8/MLkfBbe077gu4Bbgq/HwV8F8xutkJGBp+LgEWAvukIbd84FNgV6Ad8H7j4QIjgZcI7u8YDszI0HSMJ9fDgG7h5+MzkWs8eUZ1909gMnBqNuYJdCW4039A+L1nluZ5dcNyBfQAvgbaZSDXo4ChwLwm2mfLstRSnhldjnb4PQWCR2g8EH5+ADipcQfuvtzd3w0/VwLzgb5pyC3yyA933wI0PPIj2onAgx54G+hqZjulIbfGWszV3d9093Xh17cJ7kFJt3imKcDPgaeAVelMLko8eZ4BPO3uSwDcPRO5xpOnAyUWPKinE0FRSPu/Nbn7a+Gwm5IVy1JLeWZ6OcqFotDL3ZdDsPIHmn2IiJmVA0OAGalPjb5A9FO8lrFtMYqnm3RobR7nE2yVpVuLeZpZX+Bk4H/TmFdj8UzPPYFuZjbdzGab2Tlpy+5b8eR5N7
A3wU2oHwCXu3viD/5JvmxZlloj7cvRDvF/CmY2Fegdo9U1rYzTiWDrcay7VyQjt5YGGaNZ42uE43osSBrEnYeZ/RvBzHxESjOKLZ48bwd+7e51GXxCazx5FgAHAkcDHYC3zOxtd1+Y6uSixJPnscAc4LvAbsAUM3s9TctQa2TLshSXTC1HO0RRcPdjmmpnZivNbCd3Xx7uKsbcBTezQoKC8Ii7P52iVBuL55Ef2fJYkLjyMLNBwF+B4919bZpyixZPnsOAx8OCUAaMNLNad382LRkG4v3t17j7BmCDmb0GHEBwzitd4snzp8DNHhwE/8TMPgf2At5JT4pxy5ZlqUWZXI5y4fDR88C54edzgecadxAeC70PmO/uf05jbvE88uN54JzwyonhwPqGw2Fp1mKuZjYAeBo4O81bs9FazNPdd3H3cncvB54ELk1zQYgrT4J59UgzKzCzYuAQgvNd2ZbnEoK9GcysF8FTjT9La5bxyZZlqVkZX44ycfY9nS+gFJgGLArfu4fN+wCTw89HEOxGziXYDZ4DjExTfiMJtvw+Ba4Jm10MXBx+NoI/JfqU4HjtsAxOy5Zy/SuwLmoazsrGPBt1O4EMXH0Ub57ArwiuQJpHcFgz6/IMl6WXw/lzHnBWhvJ8DFgO1BDsFZyfjctSHHlmdDnSYy5ERCQiFw4fiYhInFQUREQkQkVBREQiVBRERCRCRUFERCJUFERaKbzO/Q0zOz6q2Y/N7O+ZzEskGXRJqsh2MLP9gL8RPCcrn+B68uPc/dNM5iWSKBUFke1kZrcAG4COQKW735DhlEQSpqIgsp3MrCPwLrCF4O7YzRlOSSRhO8QD8UQywd03mNkTQJUKguwodKJZJDH14Utkh6CiICIiESoKIiISoRPNIiISoT0FERGJUFEQEZEIFQUREYlQURARkQgVBRERiVBREBGRCBUFERGJ+P9mmrX6B+BSDQAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from tdc.single_pred import HTS\n",
"data = HTS(name = 'SARSCoV2_3CLPro_Diamond')\n",
"data.label_distribution()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The labels are highly imbalanced. To balance the data by oversampling the minority (positive) class, use:"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" Oversample of minority class is used. \n"
]
}
],
"source": [
"data_df = data.balanced(oversample = True, seed = 42)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because balancing adds or removes data points, `balanced` returns a separate pandas DataFrame and leaves the original data loader intact. To visualize the label distribution of an array of labels, use the utility function:"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEWCAYAAABrDZDcAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/d3fzzAAAACXBIWXMAAAsTAAALEwEAmpwYAAA500lEQVR4nO3dd3gc9bX/8fdZSVazZMmSbKtLtsG4V1xocai2gdCc0AIBAqZcWi4kP5JcCKk3lwRCIKEnARwwnWDAFBMw3QZZcq/CRcVNLpJt9XJ+f+yaCFm2VvLuzpbzep59tNqZnfl4vdqzM/MtoqoYY4yJXC6nAxhjjHGWFQJjjIlwVgiMMSbCWSEwxpgIZ4XAGGMinBUCY4yJcFYITMgRkQUicnWgn+t5/okisranz+9ke2+JyA88968QkU98uO1LReRdX23PhC8rBMYxIrJJRE51OscBInK3iDSLyD7PbZ2I/EVEMg+so6ofq+oQL7f1z67WU9XpqvqUD7IXiIiKSHS7bT+jqqcf6bZN+LNCYMw3Pa+qSUBf4DxgALC4fTHwBXGzvz8TFOyNaIKOiKSKyBsiUiUiezz3czqsNkhEvhCRGhF5TUT6tnv+ZBH5TESqRWSpiEztbgZVbVbVlcCFQBVwm2fbU0Wkot2+/p+IVHqOINaKyCkiMg34GXChiOwXkaWedReIyG9F5FOgDhjYyakqEZEHPf+uNSJySrsF3ziC6nDU8ZHnZ7Vnn1M6nmoSkeNE5EvPtr8UkePaLVsgIr8WkU89/5Z3RSS9u6+bCU1WCEwwcgH/APKBPKAe+EuHdS4HrgKygBbgAQARyQbeBH6D+1v97cDLIpLRkyCq2gq8BpzYcZmIDAFuBI71HEWcAWxS1beB3+E+uuitqqPbPe0yYBaQBGzuZJeTgA1AOvAL4JX2Re4wTvL8TPHs8/MOWfvifl0eANKA+4A3RSSt3WqXAFcC/YBeuF87EwGsEJigo6q7VPVlVa1T1X3Ab4FvdVhttqquUNVa4E7geyISBXwfmKeq81S1TVXnA0XAjCOItAV3UemoFYgFholIjKpuUtWvutjWk6q6UlVbVLW5k+U7gPs9RyTPA2uBM48g+wFnAutVdbZn33OANcDZ7db5h6quU9V64AVgjA/2a0KAFQITdEQkQUQeFZHNIrIX92mPFM8H/QHl7e5vBmJwf4vOB77rOS1ULSLVwAnAkZzjzwZ2d3xQVUuBW4G7gR0i8pyIZHWxrfIullfqN0eC3Iz7qOdIZXHwEchm3P+2A7a1u18H9PbBfk0IsEJggtFtwBBgkqom85/THtJundx29/OAZmAn7g/a2aqa0u6WqKq/70kQzwXds4GPO1uuqs+q6gm4C5AC/3dg0SE22dVwv9ki0v7fmYf7iASgFkhot2xAN7a7xZOxvTygsovnmQhghcA4LUZE4trdonGfP6/HfeGzL+5z5R19X0SGiUgC8CvgJc/5/H8CZ4vIGSIS5dnm1E4uNh+WiMSIyFBgDu4P3Ps6WWeIiJwsIrFAgydzq2fxdqCgBy2D+gE3e/b/XWAoMM+zbAlwkWfZBGBmu+dVAW3AwENsdx5wtIhcIiLRInIhMAx4o5v5TBiyQmCcNg/3B+iB293A/UA87m/4C4G3O3nebOBJ3Kcz4oCbAVS1HDgHd6udKtxHCD/G+/f6hSKyH6gG5gK7gPGquqWTdWOB33tybsP9If4zz7IXPT93iUixl/sGWAQc5dnmb4GZqrrLs+xOYBCwB/gl8OyBJ6lqnWf9Tz2nxCa336hnG2fhPtraBfwEOEtVd3YjmwlTYhPTGGNMZLMjAmOMiXBWCIwxJsJZITDGmAhnhcAYYyJcdNerBJf09HQtKChwOoYxB1lS7h6dekxul4OTGhNwixcv3qmqnQ61EnKFoKCggKKiIqdjGHOQlFunAlB0/wJHcxjTGRHpbGwrIAQLgemeBx98kNLSUqdjRIQpm92Ddd5yyy0OJw
l/gwcP5qabbnI6RtiwQhDmSktLWbJiNa0J3gxgaY5EVF0NAIs32J+VP0XVHTTskzlC9o6NAK0Jfak/5kgG3zTecC17GYA2e639Kn7NvK5XMt1ihcAYH2l27QEgqov1jAk21nzUGGMinB0RhLnKykpcDXVOxzDGZ1wNe6msbHE6RlixI4IwV19fj7R1NhGWMaFJ2pqpr693OkZYsUJgjDERzk4NGeMjCeKeMKzR4RzGdJcVAmN8xBVlB9gmNFkhMMZHGpvsorwJTSFRCERkFjALIC8vz+E0xnSu2VUNWD8CE3pC4lhWVR9T1QmqOiEjo9PB84wxxvRQSBQCY4wx/mOFwBhjIpwVAmOMiXAhcbHY9Fx8fDz7mtTpGBEh0ZUJQIPDOcKdumKIj493OkZYsSOCMJednU1bXLLTMSKCuARxidMxwl5bXDLZ2dlOxwgrdkRgjI80NNU6HcGYHrFCYIyPtLjcM5RZPwITauzUkDHGRDg7IogAUXW7bXq/AKjFfVHeXmv/cs9Z3N/pGGHFCkGYGzx4sNMRIsbbZRUAnJRnH1L+1d/e1z4mqqHVtHDChAlaVFTkdAxjDpJy61QAqu9f4GgOYzojIotVdUKny0KtEIhIFbDZwQjpwE4H998TljlwQjG3ZQ4MpzPnq2qng7WFXCFwmogUHaqqBivLHDihmNsyB0YwZ7ZWQ8YYE+GsEBhjTISzQtB9jzkdoAcsc+CEYm7LHBhBm9muERhjTISzIwJjjIlwVgiMMSbCWSEwxpgIZ4XAGGMinBUCY4yJcFYIjDEmwlkhMMaYCGeFwBhjIpwVAmOMiXBWCIwxJsJZITDGmAhnhcAYYyKcFQJjjIlwVgiMMSbCRTsdoLvS09O1oKDA6RjGHGRJ+VoAxuQOcTiJMQdbvHjxzkPNWRxyhaCgoICioiKnYxhzkJRbpwJQdP8CR3MY0xkR2XyoZXZqyBhjIpwVAmOMiXBWCIwxJsL57RqBiMQBHwGxnv28pKq/6LCOAH8GZgB1wBWqWuyvTMb4U+kv5zkdIWI0NzdTUVFBQ0OD01GCTlxcHDk5OcTExHj9HH9eLG4ETlbV/SISA3wiIm+p6sJ260wHjvLcJgEPe34aE3LS+yQ4HSFiVFRUkJSUREFBAe7vkwZAVdm1axcVFRUUFhZ6/Ty/nRpSt/2eX2M8N+2w2jnA0551FwIpIpLpr0zG+NNF9z7ERfc+5HSMiNDQ0EBaWpoVgQ5EhLS0tG4fKfm1+aiIRAGLgcHAX1V1UYdVsoHydr9XeB7b2mE7s4BZAHl5eX7LG66eXVTmyH4vmRRZ/1dvl7/guXeDozkCyan31thkrAgcQk9eF79eLFbVVlUdA+QAE0VkRIdVOkvc8agBVX1MVSeo6oSMjE77QxhjjOmhgLQaUtVqYAEwrcOiCiC33e85wJZAZDLGmJ4SES677LKvf29paSEjI4OzzjqrW9uZOnXq1x1kZ8yYQXV1tS9jes1vhUBEMkQkxXM/HjgVWNNhtbnA5eI2GahR1a0YY0wQS0xMZMWKFdTX1wMwf/58srOzj2ib8+bNIyUlxQfpus+f1wgygac81wlcwAuq+oaIXAegqo8A83A3HS3F3Xz0Sj/mMcaEoV++vpJVW/b6dJvDspL5xdnDD7vO9OnTefPNN5k5cyZz5szh4osv5uOPPwagtraWm266ieXLl9PS0sLdd9/NOeecQ319PVdeeSWrVq1i6NChXxcS+M/wOenp6Zx77rmUl5fT0NDALbfcwqxZswDo3bs3t9xyC2+88Qbx8fG89tpr9O/f/4j/vf5sNbRMVceq6ihVHaGqv/I8/oinCBxoWfRfqjpIVUeqqg0iZEJW9f0LqLZxhiLGRRddxHPPPUdDQwPLli1j0qT/tHz/7W9/y8knn8yXX37JBx98wI9//GNqa2t5+OGHSUhIYNmyZfz85z9n8eLFnW7773//O4sXL6aoqIgHHniAXbt2Ae4CM3nyZJYuXcpJJ53E448/7p
N/S8gNOmeMMe119c3dX0aNGsWmTZuYM2cOM2bM+Mayd999l7lz5/LHP/4RcDd3LSsr46OPPuLmm2/++vmjRo3qdNsPPPAAr776KgDl5eWsX7+etLQ0evXq9fV1iPHjxzN//nyf/FusEBjjI2f9r/uP/o2f3u5wEhMo3/nOd7j99ttZsGDB19/awd2x6+WXX2bIkIOHJO+qeeeCBQt47733+Pzzz0lISGDq1Klf9wuIiYn5+vlRUVG0tLT45N9hYw0Z4yOfbH+DT7a/4XQME0BXXXUVd911FyNHjvzG42eccQYPPvggqu7W8CUlJQCcdNJJPPPMMwCsWLGCZcuWHbTNmpoaUlNTSUhIYM2aNSxcuPCgdXzNCoExxvRQTk4Ot9xyy0GP33nnnTQ3NzNq1ChGjBjBnXfeCcD111/P/v37GTVqFPfccw8TJ0486LnTpk2jpaWFUaNGceeddzJ58mS//zvkQMUKFRMmTFCbmKZ7rGdxYByYmCaSLhg717O4lqFDhzqy71CwevXqg14fEVmsqhM6W9+OCIwxJsLZxWJjfCRG4p2OYEyPWCEwxkeq/vSW0xGM6RE7NWSMMRHOCoExPnLKr37NKb/6tdMxjOk2OzVkjI8s3v1vz707Hc1hTHfZEYExxnSTr4ahDhZWCIwxppv8MQy1k6wQGGNC3tSpB98e8kwfXVfX+fInn3Qv37nz4GXeODAMNfD1MNQH1NbWctVVV3HssccyduxYXnvtNQA2bdrEiSeeyLhx4xg3bhyfffYZ4B5faOrUqcycOZNjjjmGSy+9lEB29rVCYIyPJLrSSHSlOR3DBEhPhqHu168f8+fPp7i4mOeff/7rkUjBPR7R/fffz6pVq9iwYQOffvppwP4tdrHYGB+pvO9lpyNErAULDr0sIeHwy9PTD7/8UHoyDHVWVhY33ngjS5YsISoqinXr1n39nIkTJ5KTkwPAmDFj2LRpEyeccEL3g/WAFQJjjOmh7g5Dfffdd9O/f3+WLl1KW1sbcXFxXy+LjY39+r4vh5j2hp0aMsZHptz5U6bc+VOnY5gA6u4w1DU1NWRmZuJyuZg9ezatra0Bz9wZf05enysiH4jIahFZKSIHjdUqIlNFpEZElnhud/krjzH+tnrf56ze97nTMUwAdXcY6htuuIGnnnqKyZMns27dOhITEwMduVN+G4ZaRDKBTFUtFpEkYDFwrqquarfOVOB2VfW68a0NQ919Ngx1YNgw1IFjw1AfXtAMQ62qW1W12HN/H7AaCN2GtsYYE6YCco1ARAqAscCiThZPEZGlIvKWiHQ6C7WIzBKRIhEpqqqq8mdUY4yJOH4vBCLSG3gZuFVV93ZYXAzkq+po4EHgX51tQ1UfU9UJqjohIyPDr3mN6anUqBxSo3KcjhExQm12xUDpyevi1+ajIhKDuwg8o6qvdFzevjCo6jwReUhE0lV1pz9zGeMPG+/9p9MRIkZcXBy7du0iLS0NEXE6TtBQVXbt2vWNZqne8FshEPf/zt+A1ap63yHWGQBsV1UVkYm4j1B2dbauMcYckJOTQ0VFBXaq+GBxcXFfd0zzlj+PCI4HLgOWi8gSz2M/A/IAVPURYCZwvYi0APXARWrHeyZEjf3prQCU/O/9juaIBDExMRQWFjodI2z4rRCo6ifAYY/ZVPUvwF/8lcGYQNpYv8TpCMb0iPUsNsaYCGeFwBhjIpwVAmOMiXA2+qgxPpLZ62inIxjTI1YIjPGR1fc85nQEY3rETg0ZY0yEs0JgjI8M/ckshv5kltMxjOk2OzVkjI9sbVrX9UrGBCE7IjDGmAhnhcAYYyKcFQJjjIlwdo3AGB8pjB/jdARjesQKgTE+YqOOmlBlp4aMMSbCWSEwxkcKb/s+hbd93+kYxnSbnRoyxkf2tFY4HcGYHrEjAmOMiXBWCIwxJsJZITDGmAjnt0IgIrki8oGIrBaRlSJySyfriIg8ICKlIrJMRMb5K48x/jY0aQpDk6Y4HcOYbvPnxe
IW4DZVLRaRJGCxiMxX1VXt1pkOHOW5TQIe9vw0JuR8/uv/dTqCMT3ityMCVd2qqsWe+/uA1UB2h9XOAZ5Wt4VAiohk+iuTMcaYgwXkGoGIFABjgUUdFmUD5e1+r+DgYoGIzBKRIhEpqqqq8ltOY45E9n9fQPZ/X+B0DGO6ze+FQER6Ay8Dt6rq3o6LO3mKHvSA6mOqOkFVJ2RkZPgjpjFHrLZtF7Vtu5yOYUy3+bUQiEgM7iLwjKq+0skqFUBuu99zgC3+zGSMMeab/NlqSIC/AatV9b5DrDYXuNzTemgyUKOqW/2VyRhjzMH82WroeOAyYLmILPE89jMgD0BVHwHmATOAUqAOuNKPeYwxxnTCb4VAVT+h82sA7ddR4L/8lcGYQBrf9xSnIxjTIzbonDE+8u+77nQ6gjE94tU1AhF5WUTOFBEbksIYY8KMtx/sDwOXAOtF5PcicowfMxkTkjJ+NJ2MH013OoYx3eZVIVDV91T1UmAcsAmYLyKficiVniaixkS8Zq2nWeudjmFMt3l9qkdE0oArgKuBEuDPuAvDfL8kM8YYExBeXSwWkVeAY4DZwNnt2vo/LyJF/gpnjDHG/7xtNfSEqs5r/4CIxKpqo6pO8EMu4yO79jdSXddEQq9oekXbtX4T2lrblP2NLWzf20C/pFjc/VbNkfK2EPwGd+ev9j7HfWrIBBlV5eXiSp7+fBPLKmoAiHYJQwYkcfIx/cjsE+9wwvB0Qv+znI4QtvY2NPPv1TtYUVlDfXMr//f2GvLTEvjehFyuPrGQ2OgopyOGtMMWAhEZgHs00HgRGct/OoglAwl+zmZ6oKaumVueL2HB2iqGZibz4zOGULp9P9v2NbC0vJqHPviK04b158Sj0u3blI+98dPbnY4QllZv3ctLiytobm1jRHYf8tMSGJ2TwvtrdvCHd9Yyd8kWHvr+OAZl9HY6asjq6ojgDNwXiHOA9uMF7cM9XIQJIjV1zXz/b4tYu20fd589jMunFOByCc8uKgPg5CH9+NeSSt5euY2G5lZOG9bfioEJassqqnmhqJzMPvF8b0IuGUmxAFwyKY+rTijk/TXb+fGLy7josYXMuWYyg/tZMeiJw540VtWnVPXbwBWq+u12t+8cYjRR45CmljaueupL1m7bx6OXjeeK4wtxub75IZ8YG83FE/M4tiCVBeuq+KR0p0Npw1PKrVNJuXWq0zHCxvod+3ihqJy8vglcfWLh10WgvZOP6c/z105GFS55fCE79jU4kDT0HbYQiMj3PXcLROS/O94CkM946XfzVrN48x7uu3A03z6m3yHXc4lwzphshmcl887KbWzaWRvAlMZ4p6a+mee/LCe9dyw/mFJw2GsAg/sl8c+rJ7K3oZmbni2hpbUtgEnDQ1fNSBI9P3sDSZ3cTBD49+rtPPnZJq46vpCzRmV1ub5LhAvG5ZCa0Ivnviyjobk1ACmN8U6bKi8UldPSplwyKY/YmK4vBB8zIJnfnTeSRRt389cPvgpAyvBy2GsEqvqo5+cvAxPHdFdtYwt3/msFR/fvzR3TvR/5Iy4miguPzeXhBV/x7qptfGf0QTOEGuOI4s172LizlvPGZtMvKc7r550/LocP1lbx1w9KOXNUpl0v6AZvB527R0SSRSRGRP4tIjvbnTYyDrpv/jq21DTwv+eP7HY/gZzUBKYMSmPRht2U767zU0JjvLe/sYW3VmyjIC2RCfmp3X7+XWcNIy7Gxc9fXY57lHvjDW8/OU73zDd8Fu7pJY8Gfuy3VMYrm3bW8tRnm7h4Yi7j8/v2aBunDe1PUlw0by7fan84R2ha7veYlvs9p2OEtPfX7KCxpZVzxmT1qEVbRlIsP5l2DIs27ubdVdv9kDA8eVsIDgwsNwOYo6q7/ZTHdMMf311LTJSLH516dI+3ERsTxSnH9Kdsdx2rt+7zYbrI89xtN/DcbTc4HSNk7drfyBcbdzGhoC/9k70/JdTRRcfmMigjkXveXmMXjr3kbSF4XU
TWABOAf4tIBmDttBy0vKKGN5Zt5eoTC+l3BH80AOPyU0nvHcs7q7bRZkcFPbazpo6dNXaKrafmr95OlEs4+TCt3rwRHeXiJ9OO4auqWl5aXOGjdOHN22Go7wCmABNUtRmoBc453HNE5O8iskNEVhxi+VQRqRGRJZ7bXd0NH8kefH89yXHRzDpp4BFvK8olnDasP1X7GllRWeODdJFp8C9mMPgXM5yOEZKq9jWyvKKG4walkxx35CPbnz6sP6NzU3howVd2VOCF7lxdHApcKCKXAzOB07tY/0lgWhfrfKyqYzy3X3UjS0Rbt30f767azhXHF5Lkgz8agOFZyWT0juXDdVV2rcAE3EfrqoiOEo4fnO6T7YkIN357MGW763hj2daunxDhvG01NBv4I3ACcKzndthRR1X1I8CuJfjBQx+UktAriiuPK/DZNl0ifOvoDLbWNLB2u10rMIFTXddESfkeJhT0pXes76ZRP+WYfgzpn8RfPyilrc2+3ByOt0cEE4DjVfUGVb3Jc7vZB/ufIiJLReQtERl+qJVEZJaIFIlIUVVVlQ92G7q2723gjWVbuXhiHqmJvXy67dG5KfSJj7GhJ0xAff7VLgBO9NHRwAEul3Dd1IGs37Hf3tNd8LYQrAAG+HjfxUC+qo4GHgT+dagVVfUxVZ2gqhMyMjJ8HCO0PLuojFZVLp+S7/NtR7mEyYV92VBVy4691hbA+F9TSxtFm/cwLDOZlATffrEBOHNkFum9Y3n6800+33Y48bYQpAOrROQdEZl74HYkO1bVvaq633N/HhAjIr79ShBmmlraePaLMqYenUF+WmLXT+iB8QV9iXIJCzfu8sv2w9nMwVcwc/AVTscIKcsqqqlvbmXyoDS/bL9XtIuLJ+by7zU7rNPkYXh7Qu5uX+/YM9fBdlVVEZmIuyjZp89hvL1yG1X7Grnch9cGOuodG82o7D4Ul1Vz+rABxHkxzotxe+LGK5yOEFJUlYUbdtEvKZZCP32xAfeQ1Q8t+Ip/LtzMT2cM9dt+Qpm3zUc/BDYBMZ77X+I+tXNIIjIH9yxmQ0SkQkR+KCLXich1nlVmAitEZCnwAHCRWnOVw3r6s03kpyXwraP8e3ps8sA0mlraKCmv9ut+ws3a8p2sLbdz0d4q313HlpoGpgxK8+u8GJl94jljeH+e+7Kc+iYbYLEz3k5efw0wC+gLDMI9a9kjwCmHeo6qXny4barqX4C/eJ00wq2orKFo8x7+58yhB80z4Gu5fRPISY1n4YZdTC7sa5PXeGnSvTMBqL5/gbNBQsTCjbuJjXYxJjfF7/u6fEoB85Zv4/WlW/jesbl+31+o8fYawX8BxwN7AVR1PXBk3f9Mt8z+fDNxMS6+Oz4wb+LJhWlU7Wtkg81XYPxgX0MzyytqGJefGpD5hicV9mVI/ySe/GyT9ZPphLeFoFFVmw78IiLRgL2aAbK/sYW5S7dwzuhs+iT4pgNZV0bm9CGhVxQLN9hlG+N7xZv30KrK5EL/XCTuSES4bEo+q7buZWmF9Z7vyNtC8KGI/Az3JPanAS8Cr/svlmnvreVbqW9u5XvH5gRsnzFRLsbmprBm6z7qGlsCtl8T/lSV4vJq8vsmdDr9pL98Z0wWcTEuXiwqD9g+Q4W3heAOoApYDlwLzAP+x1+hzDe9WlJJfloC4/K6Pz77kRiXn0qrKksqqgO6XxPeKqvrqdrXyNgAv5+T42KYNnwAc5dusVn5OvC21VAb7g5fN6jqTFV93Fr4BEZldT2fb9jFeWOzA37RNrNPPFl94igu2xPQ/YaqHwy/nh8Mv97pGEGvpLyaaJcwMrtPwPc9c3wu+xpabK6CDrqavF5E5G4R2QmsAdaKSJWNFBo4/yqpRBXOHxu400LtjctPZUt1A1tr6h3Zfyj58zUX8udrLnQ6RlBrbVOWlVdzzIAk4nsFvo/KcYPSyE6Jt9NDHXR1RHAr7tZCx6pqmqr2BSYBx4vIj/wdLtKpKq+WVDIhP5W8tA
RHMozOSSFKhOLNdlTQlUWry1m02j5gDmf99n3UNrUG/LTQAS6XcMG4bD4p3WlfbtrpqhBcDlysqhsPPKCqG4Dve5YZP1peWUPpjv2cP86ZowGAxNhojslMYkl5Na02guNhnfHoZZzx6GVOxwhqxeXVJPSK4uj+SY5lmDk+F1V4pbjSsQzBpqtCEKOqB3WVVNUq/jN9pfGTV4or6RXt4syRmY7mGJ+XSm1TK2u32fDUpufqm1pZs3Wv+yjTz50iDycvLYFJhX15sajc+hR4dFUImnq4zByh5tY25i7dwqlD+wWs78ChHNU/iaTYaBbbRWNzBFZU1tDSpozNS3E6CjPH57BpVx2L7ZQn0HUhGC0iezu57QNGBiJgpPpwbRW7a5scu0jcXpRLGJ2bwrpt1qfA9Fxx+R4yeseSnRLvdBSmj8wkPiaKV0rs9BB0UQhUNUpVkzu5JamqnRryo1dLKumb2ItvDQmO+RfG5qXQqsoym9PY9MDu2iY276pjbF5KUIxd1Ts2mjOG9+eNpVtobLE+Bd2Zs9gESE1dM/NXb+c7o7OIiQqO/6LMPvEMSI6jxE4PHdJN42/jpvG3OR0jKJWUu983gRhgzlvnjcthb0ML76/e4XQUx/luglDjM28u30pTSxvnj8t2Oso3jM1L4a0V29i5r5H0AA4NECp+fdnZTkcISqrKkrJqBqYn+mUWsp46flAa/ZJieaWkkukON8hwWnB83TTf8GpJBYP79Xak5+XhjM5JQfjPtzvzTW99uZa3vlzrdIygU767jl21TY71HTiU6CgX54zJYsHaHeyujey2L1YIgkzZrjq+3LTHkSElupIcH8Pgfr1ZUl5NmzW7O8jFz1zLxc9c63SMoFNcXk1MlDAiK9npKAc5b2wOza3KG8u2OB3FUVYIgsyrJZWIwLljg+u00AFjclPYU9fM5l02/6vpWktrG8srahiWmUxsEE57OiwrmWMGJEV85zIrBEFEVXmlpILJhWlB0cSuM8Oz+tArymUXjY1X1mzbR32zc0NKeOP8cdksKa9mQ9V+p6M4xm+FQET+LiI7RGTFIZaLiDwgIqUiskxExvkrS6goLtvD5l11QXeRuL1e0S6GZyWzYksNza1tTscxQW5JeTVJsdEMyujtdJRDOmdMNi5xH41HKn8eETwJTDvM8unAUZ7bLOBhP2YJCa8UVxIX4wr6Fgxj81JpaG5j9da9TkcxQayusYW12/YxOtfZISW60j85juMHp/NqSSVtETqelt8Kgap+BOw+zCrnAE+r20IgRUSC+xPQjxpbWnl96RbOGD6A3rHB3ap3YEYiyXHRLCmvdjpKUPnZCf/Dz06w+ZoOWFZZQ6sGx5ASXTl/XDYVe+r5ctPhPrLCl5OfONlA+zF7KzyPbe24oojMwn3UQF5eXkDCBdr7q3ewt6HF0ZFGveUSYUxuCp+U7mR/Y0vQF65A+cnMU52OEFRKyvYwIDmOzD7Beb2rvTOGDyCh1wpeLalk0sDAzKMcTJy8WNzZsWKnx2Wq+piqTlDVCRkZwTHkgq+9UlJJv6RYjh8UGm/CMXmptCkss2ksv/b8h0t4/sMlTscIClX7GinfUx8SRwMACb2imTZiAG8u3xqR01g6WQgqgNx2v+cAEdmYd3dtEx+s2cE5Y7KIDpIhJboyIDmOrD5xlJRVOx0laFz76q1c++qtTscICkvK9yC4OyGGivPH5rCvoYX3VkfeNJZOfurMBS73tB6aDNSo6kGnhSLBG8u20NKmIXFaqL0xealUVtezY2+D01FMEGlTpaS8msH9epMcHzpjU04ZlMaA5DhejcA+Bf5sPjoH+BwYIiIVIvJDEblORK7zrDIP2ACUAo8DN/grS7B7ubiSoZnJDM0Mvp6XhzM6pw8ucU9GbswBm3fVUV3XHDKnhQ6IcgnnjM1iwboqdu5vdDpOQPmz1dDFqpqpqjGqmqOqf1PVR1T1Ec9yVdX/UtVBqjpSVYv8lSWYle7Yz9Lyas4P0p
7Eh5MUZ0NOmIOVlO2hV7SLYZnBNVaWN84fm0Nrm/L60sg6Sx0aJ6TD2MvFFV9/EwlFY3NTqalvZuPOWqejmCDQ1NLG8soaRmb1oVd06H28DBmQxPCs5IjrXGbt/hzU2qa8UlzB1KMz6JcU53ScHhmamUxstIuSsuqg7j0aCP93+u+cjuC4lVtqaGxpY1x+8A4p0ZXzxmbzmzdXU7pjH4P7JTkdJyBCr2SHkU9Kd7J9byMzx4fWReL2ekW7GJHVhxVbamhqiewhJ66dcRzXzjjO6RiOKi7bQ9/EXuSnJTgdpce+MyYLlxBRA9FZIXDQS4srSEmI4eSh/ZyOckTG5qXQ1GJDTjw67zMenfeZ0zEcs6euiQ1VtYzNS8EVZEOod0e/pDhOOjqDf0XQkBNWCBxSU9/MOyu3cc7oLGKjg2943u4oSE8kJT4m4ies+X/v/oz/9+7PnI7hmJKyahQYlxu6p4UOOG9sNltqGli4cZfTUQLCCoFD3lzmno5y5vjcrlcOci4RRuemsH77fvY1NDsdxzhAVSku28PA9ERSE4NnOsqeOn2Ye8yvSOlTYIXAIS8tLmdI/yRGZIdW34FDGZubggJLrU9BRNq8q47dtU0hfZG4vfheUUwfMYB5y7dS3xT+Q05YIXBA6Y79FJdVc8H44JuOsqf6JceRnRJPSXk1an0KIs5iT9+BEVmh13fgUM4bl01tUytvrwz/AQ+sEDjg2UVlxERJyA0p0ZXx+alsrWmgYk+901FMADU0t7KsoppR2aHZd+BQJhemkZ+WwJxF5V2vHOLC538tRDQ0t/LS4nKmjcgkvXes03F8akxuCr2iXHyxMTLHdH/0vPt59Lz7nY4RcCXl1TS3KpMKQ2PkXG+5XMLFE/P4YtNu1m/f53Qcv7JCEGBvLtvK3oYWLpkYfvMqxMVEMTo3hWWV1RFxXrWjC781hgu/NcbpGAGlqnyxcRfZKfFkpwb/vAPd9d3xOcRECc8sKnM6il9ZIQiwZxZtZmBGIpMH9nU6il9MLOxLc6tGZFPSe156j3tees/pGAFVtruO7XsbmVgYnu/ntN6xTBuRySvFFWH95cYKQQCt3rqX4rJqLpmYFzYXiTvKToknJzWeLzbujriLxr/75Df87pPfOB0joBZt3E1stItROeFzkbijSyflsbehhTeWhe9AdFYIAujZRWX0inaF9JAS3phY0Jcd+xr5clPkHRVEkt21TayorGFsXkrId4o8nEmFfRmUkcizX4Tv6SErBAFS29jCqyWVnDUyk5SE0O9wczijclKIi3HxzKLNTkcxfvTy4gpa2pSJBeF1kbgjEeGSSfmUlFWzckuN03H8wgpBgLxcXMH+xhYunZzvdBS/6xXtYmxuKvOWb2W7zV4WllrblNkLN5PfN4EBfUJz5NzuuGBcNnExLp76bJPTUfzCCkEAtLYpT3y8kXF5KYwPk56XXTluUBotbRq2fziR7p2V2yjbXcfxg9OdjhIQKQm9+N6EXP5VsiUsp2a1QhAA73r+aK45caDTUQImrXcs04YP4J8LN1Pb2OJ0nICYc+mjzLn0Uadj+J2q8uhHG8hPS2BYVngMkeKNq44vpLmtjac+3+R0FJ/zayEQkWkislZESkXkjk6WTxWRGhFZ4rnd5c88Tnn84w3k9U3g9OEDnI4SUNecNJC9DS28UBT+PTMBph87hOnHDnE6ht8Vbd7D0vJqrj6hMKSHm+6ugvREzhg2gH8uLAu7Lzf+nLw+CvgrMB0YBlwsIsM6WfVjVR3juf3KX3mcsnjzborLqrn6xEKiXJHzRwMwLi+VCfmp/O2TjbS0hv+kNXfOfp07Z7/udAy/e+yjDaQmxITFyLnddc1JA6mpb+bFMPty488jgolAqapuUNUm4DngHD/uLyg99tEGUhJiwr7J6KFcc9JAKvbU8/bKbU5H8bsHF9/Lg4vvdTqGX22o2s97q7dz2eR84nuFb5PRQxmfn8r4/FT+9ulGWsNo0hp/FoJsoH
3ZrPA81tEUEVkqIm+JyHA/5gm4jTtreXfVdr4/KZ+EXpE5PfSpQ/tTmJ7I4x9tiLgOZuHoiU82EhPl4rIpBU5Hccw1JxZSvruet1aEz6ik/iwEnZ0H6fhJUAzkq+po4EHgX51uSGSWiBSJSFFVVZVvU/rRn99bR2y0ix8cV+B0FMdEuYRZJw1kaUUNC9aGzv+dOVjFnjpeLCpn5vgcMpLCa8DE7jht2AAGZiTy5/fWh81RgT8LQQXQ/iRiDvCNPtqquldV93vuzwNiROSg9miq+piqTlDVCRkZGX6M7Dtrtu3ltaVbuPL4woj+owGYOT6HvL4J/OGdtREzB2w4+vN76xERbjp5sNNRHBXlEm47bQjrd+zntSXhMYOZPwvBl8BRIlIoIr2Ai4C57VcQkQHiGXRHRCZ68oTFJKH3vruO3r2iufakyGkyeigxUS7++7SjWbV1L/PC6HA6kpTu2M/LxRVcPjmfzD7hN8pod00fMYBhmcnc/956mlpCvyGE3wqBqrYANwLvAKuBF1R1pYhcJyLXeVabCawQkaXAA8BFGgYnkpeUVzN/1XZmnTQw7IeT8NbZo7MY0j+J+95dF7YtiN65djbvXDvb6Rh+8af564iPieL6qYOcjhIUXC7hx2cMoWx3XVg0j/ZrPwJVnaeqR6vqIFX9reexR1T1Ec/9v6jqcFUdraqTVfUzf+YJlD++s5a+ib248oRCp6MEjSiXcNvpR7NhZy2vhOmE4JOG5jJpaPg1qVxRWcOby7fywxMHkhZmkykdialDMhifn8qD76+noTm0h6i2nsU+9mnpTj4p3ckNUwfROzYyWwodymnD+jM6N4X731sX8n84nbnl8ee55fHnnY7hc394Zy0pCTFcfaJ9sWlPRLj99CFs39sY8kOpWCHwoebWNn75+kpyUuP5fgQMLtddIsId045hS00Dj3z4ldNxfO6plQ/z1MqHnY7hU/9evZ0P11Vx47cHkxwX43ScoDNlUBrfHpLBg++XhvQYRFYIfGj255tZt30/d541jLiYyOts440pg9I4a1QmDy/4ivLddU7HMYfR0NzKL19fxeB+vSO6CXRX7jp7OE0tbfz+rTVOR+kxKwQ+srWmnvvmr+PEo9I5fVh/p+MEtZ+fORSXCL+Yu9I6mQWxhz4opWx3Hb/8znBiouyj4lAK0xO5+sRCXimp5LPSnU7H6RH73/UBVeXnr66gpa2N3547MmynofSVzD7x3H7GEN5fs4PXloTv9H+hbNWWvTy04CvOH5sdMUNNH4mbTj6K/LQE7nhleUjObWyFwAdeLank/TU7uP30IeSlJTgdJyRccVwB4/JSuPv1lSF9bjUcNbW08ZOXl5KSEMOdZ3U2TqTpKL5XFP97/kjKdtdxzzuhd4rICsERKttVx12vreTYglSuPN5aVXgryiXcM3M0Dc2t3Pbi0rDocbzotpdYdNtLTsc4YvfOX8uKyr385tyRpCZaPxhvHTconR9Myecfn25iwdodTsfpFisER6CxpZWbnytBBO6/aGzEDTN9pAb3681dZw3n4/U7efSjDU7HOWJDctMZkhvap1E+WlfFox9u4OKJeUwbEVnzZ/jCT2cMZUj/JG5/cWlITdNqheAI3D13FUvKq7nnglFkp1i3+564eGIuZ47M5A/vrOHj9aE9KN3Vf3mSq//ypNMxeqxsVx03zSlhSP8k7rJTQj0SFxPFAxePpa6plev/uZjGltC4XmCFoIee/nwTc74o4/qpg5g+MtPpOCFLRLhn5iiO6pfETXNK2FC13+lIPfZS6ZO8VPqk0zF6ZG9DM7NmFwHw2OXjI3KuAV8ZMiCJP8wcTXFZNf/z6oqQaBlnhaAH3l6xlV/MXcmpQ/tx++nhPzWhvyXGRvP45ROIEuHyv39hF48DrKG5lVlPF1G6Yz9/vWQc+WmJTkcKeWeOyuTmU47ixcUV/Gn+OqfjdMkKQTe9v2Y7N89ZwtjcFB68eJxdF/CRvLQE/nHlseyube
LSJxZRta/R6UgRobGllRufLWbhht388bujOeGo0L7GEUx+dOpRXDghlwfeL+XhBcHdk94KQTe8vWIr185ezDGZSfzjiol2+Oxjo3JS+NsPjqViTz0XPfY5W2vqnY4U1uqaWrh29mLeW72D35w7gnPHdjaBoOkpEeG3543gnDFZ/N/ba/jze+uD9jSRFQIvqCpPfLyB658pZkR2H2b/cBJ9EmzcFX+YMiiNp66ayI69jZz7109ZUVnjdKSwtGNvAxc9tpCP1lXx+/NH2thYfhId5eK+743hgnE5/Om9ddzx8vKgnL/ACkEX9je2cMtzS/jNm6s5Y9gA5lwzmT7xVgT8aWJhX168fgpRIlzw8Gc890VZ0H6Taq/0l/Mo/eU8p2N06bOvdjLjgU9Yv30/j18+gYsm5jkdKaxFuYQ/fncUN588mOeLyrnwsc+prA6uo10rBIfxaelOpt3/EW8s28KPzxjCQ5eOs8HkAuSYAcnMvekEji3oyx2vLGfW7MVB3y47vU8C6X2Ct2d5XVMLv3p9FZc+sYjk+Gj+9V/Hc8pQGxcrEESE/z7d/Rmyfvt+pv3pI57/sixoOlLagPmd2LyrlnveXsuby7dSmJ7I89dO4diCvk7HijjpvWN56qqJ/O2TDdz77jq+/ccF/PCEQq45aWBQDol80b0PAfDcbTc4nOSbWtuUuUsruefttWytaeCyyfncMf0YEm2+jICbMTKT4VnJ/OSlZfy/l5fzz4Vl/PiMIZx4VLqjY5TZO8FDVSku28MzC8t4bekWekW5+NGpR3PttwbaUYCDolzCrJMGcfqwAfzh3bU8+H4pT3++matPKOS7E3IZ0CfO6Yhfe7v8Bc+94CgE+xtbeOHLcv7x2UbKd9czMrsPD1w81r7UOCw/LZE510zm1ZJK7pu/jsv//gWTB/bl6hMGMnVIBtEOjPQa0YVAVSndsZ/3Vu/gpcXlfFVVS0KvKK44roBrTxpIv+Tg+ZCJdAXpifz1knFc/60a7n13LffOX8ef3lvHt47O4LxxOZw4ON3GxcHdJ+DDdVW8s2Ib81dtZ19jCxPyU/n5jGGcPqw/LmvuHBRcLuGC8TmcNTqTOYvK+OuCr7j66SIykmI5f1w204YPYFROSsCap/u1EIjINODPQBTwhKr+vsNy8SyfAdQBV6hqsb/y1Da2sGbbXlZt2cvSiho+Wb+TbZ7zzhPyU7nngkHMGJVpU0wGsRHZffjHlRPZtLOWlxZX8NLiCj6Y4x7vaXhWMscNSmd4VjLDs5IpTO8d1v082tqUyup61m3fR3HZHoo3V1NSvoeG5jb6xMdwxogBXDopj7F5qU5HNYcQGx3FFccXcunkfD5Ys4MXiip44uONPPrhBpLjojl+cDrj8lIZlpXMsMxkv33Z8dsnnohEAX8FTgMqgC9FZK6qrmq32nTgKM9tEvCw56fPzV26hVueK+FA45PUhBiOG5TOCUelc+JR6eSkBu9FPnOwgvREbj9jCD867WiWVlTzyfqdfLJ+J//4dCPNre7/5JgoYUCfODL7xJPVJ47MlHjSEnuRHBdDUlw0yfEx9I6NJjbGRUyUi15R7p8xUUJMtPv3KJcguC/2uX/S7XO5qkqbus/Vt+mBm/v3jssamlupa3Lf6ptaqWtqoa6pld21TVTtb2Tnvkaq9jeyraaBjTtrafQ0RYxyCcMyk7no2DxOHdqfSQP72mQyISQmysXpwwdw+vAB7K5tcs99vt49//lbK7Z9vd613xrIT6cP9fn+/fnVdyJQqqobAETkOeAcoH0hOAd4Wt1tAxeKSIqIZKrqVl+HGZndhx+dejTDMpMZlpVMZp84m0AmDES5hHF5qYzLS+XmU46iqaWNr6r2s3rrXtZt38/Wmnq2VjdQtHkP25ZtpcWHrTRE+EaR2NvQjABH/89boNDq+dD3VcvXaJeQkRRLRlIsOanxnHhUOgMzejO4X29GZPWxDo5hom9iL84encXZo7MA2F3bxOqt7jMZw7
OS/bJPfxaCbKC83e8VHPxtv7N1soFvFAIRmQXM8vy6X0TW+jZqt6QDoTYfnSOZLz2yp4fi64xC+npm+C23nwYqCLnX+tIQzIzzmQ/Za9CfhaCzr9sdvxt5sw6q+hjwmC9CHSkRKVLVCU7n6A7LHDihmNsyB0YwZ/bnScQKILfd7zlAxwlqvVnHGGOMH/mzEHwJHCUihSLSC7gImNthnbnA5eI2Gajxx/UBY4wxh+a3U0Oq2iIiNwLv4G4++ndVXSki13mWPwLMw910tBR389Er/ZXHh4LiFFU3WebACcXcljkwgjazhMJgXsYYY/zHGhobY0yEs0JgjDERzgpBJ0Skr4jMF5H1np8H9dEXkVwR+UBEVovIShG5pd2yu0WkUkSWeG4z/Jh1moisFZFSEbmjk+UiIg94li8TkXHePtfBzJd6si4Tkc9EZHS7ZZtEZLnndS0KosxTRaSm3f/5Xd4+18HMP26Xd4WItIpIX88yp17nv4vIDhFZcYjlwfh+7ipz0L2fD6KqdutwA+4B7vDcvwP4v07WyQTGee4nAeuAYZ7f7wZuD0DOKNx9jAYCvYClBzK0W2cG8BbuPhuTgUXePtfBzMcBqZ770w9k9vy+CUgP8PvBm8xTgTd68lynMndY/2zgfSdfZ89+TwLGASsOsTyo3s9eZg6q93NnNzsi6Nw5wFOe+08B53ZcQVW3qmeAPFXdB6zG3Ss6kL4exkNVm4ADw3i09/UwHqq6EEgRkUwvn+tIZlX9TFX3eH5diLt/iZOO5LUK2te5g4uBOQHIdViq+hGw+zCrBNv7ucvMQfh+PogVgs71V09/Bs/PfodbWUQKgLHAonYP3+g5FPx7Z6eWfORQQ3R4s443z/WH7u73h7i/AR6gwLsistgz9EggeJt5iogsFZG3RGR4N5/ra17vV0QSgGnAy+0eduJ19kawvZ+7KxjezweJ2PGWReQ9YEAni37eze30xv0HdKuq7vU8/DDwa9z/yb8G7gWu6nnaQ+++k8e8HcbDq+E9/MDr/YrIt3H/4ZzQ7uHjVXWLiPQD5ovIGs83Mn/yJnMxkK+q+z3XhP6Fe1TdoH+dcZ8W+lRV23+rdeJ19kawvZ+9FkTv54NEbCFQ1VMPtUxEtotnFFTPYeeOQ6wXg7sIPKOqr7Tb9vZ26zwOvOG75N9wJMN49PLiuf7g1bAiIjIKeAKYrqq7Djyuqls8P3eIyKu4Twn4+w+ny8ztvgSgqvNE5CERSffmuX7Snf1eRIfTQg69zt4ItvezV4Ls/Xwwpy9SBOMN+APfvFh8TyfrCPA0cH8nyzLb3f8R8JyfckYDG4BC/nOBbHiHdc7kmxfXvvD2uQ5mzsPd2/y4Do8nAknt7n8GTAuSzAP4TwfNiUCZ5zUP2tfZs14f3Oe3E51+ndvtv4BDX3gNqvezl5mD6v3caUYndhrsNyAN+Dew3vOzr+fxLGCe5/4JuA89lwFLPLcZnmWzgeWeZXNpVxj8kHUG7hZLXwE/9zx2HXCd577gniDoK0+mCYd7boBe364yPwHsafe6FnkeH+j5A18KrAyyzDd6Mi3FfUHwuMM9Nxgye36/gg5fVBx+nefgHoa+Gfe3/x+GwPu5q8xB937ueLMhJowxJsJZqyFjjIlwVgiMMSbCWSEwxpgIZ4XAGGMinBUCY4yJcFYIjDlCnhExPxGR6e0e+56IvO1kLmO8Zc1HjfEBERkBvIh7zKko3O3Fp6nqV07mMsYbVgiM8RERuQeoxd1LdJ+q/trhSMZ4xQqBMT4iIom4B59rwt3jtdHhSMZ4JWIHnTPG11S1VkSeB/ZbETChxC4WG+NbbZ6bMSHDCoExxkQ4KwTGGBPh7GKxMcZEODsiMMaYCGeFwBhjIpwVAmOMiXBWCIwxJsJZITDGmAhnhcAYYyKcFQJjjIlw/x9A3KtUjEMcwwAAAABJRU5ErkJggg==\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from tdc import utils\n",
"utils.label_dist(data_df.Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For datasets whose binding affinities are reported in nM, we also provide a label unit conversion, since standard ML practice is to transform such values to a log scale (the p scale, e.g. pKd) for easier regression:"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n",
"To log space...\n"
]
}
],
"source": [
"from tdc.multi_pred import DTI\n",
"data = DTI(name = 'DAVIS')\n",
"data.convert_to_log()"
]
},
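{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of what a p-scale transform does, the sketch below converts nM values to molar and takes the negative base-10 logarithm. The helper name `nm_to_p` is hypothetical, and this is the textbook definition rather than necessarily the exact formula that `convert_to_log` applies.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"def nm_to_p(affinity_nm):\n",
"    # Hypothetical helper: 1 nM = 1e-9 M, so convert to molar first,\n",
"    # then take the negative base-10 logarithm.\n",
"    molar = np.asarray(affinity_nm, dtype=float) * 1e-9\n",
"    return -np.log10(molar)\n",
"\n",
"# 1 nM maps to about 9 and 1000 nM (1 uM) maps to about 6.\n",
"print(nm_to_p([1.0, 1000.0]))\n",
"```\n",
"\n",
"Stronger binders (smaller nM values) end up with larger p values, and the compressed range is typically easier to regress on."
]
},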
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For datasets with relational labels, you can look up the meaning of each label with:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n"
]
},
{
"data": {
"text/plain": [
"'#Drug1 may increase the photosensitizing activities of #Drug2.'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from tdc.multi_pred import DDI\n",
"from tdc.utils import get_label_map\n",
"data = DDI(name = 'DrugBank')\n",
"get_label_map(name = 'DrugBank', task = 'DDI')[1]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For interactions between two entities, such as drug-target interactions, some users may want to formulate the problem as network prediction rather than chemical modeling. We therefore provide a function that transforms the pairwise tabular data into a graph edge list. For example, to construct a graph for the DTI DAVIS dataset with threshold 30, splitting enabled, train/validation/test fractions of 70%/10%/20%, the benchmark seed, and descending order:"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n"
]
}
],
"source": [
"from tdc.multi_pred import DTI\n",
"data = DTI(name = 'DAVIS')"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"The dataset label consists of affinity scores. Binarization using threshold 30 is conducted to construct the positive edges in the network. Adjust the threshold by to_graph(threshold = X)\n"
]
},
{
"data": {
"text/plain": [
"dict_keys(['edge_list', 'neg_edges', 'split'])"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph = data.to_graph(threshold = 30, \n",
" format = 'edge_list', \n",
" split = True, \n",
" frac = [0.7, 0.1, 0.2], \n",
" seed = 42, \n",
" order = 'descending')\n",
"graph.keys()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"TDC can also convert the data to the graph formats of popular graph ML libraries such as DGL and PyTorch Geometric. The corresponding package must be installed to use each format:"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"The dataset label consists of affinity scores. Binarization using threshold 30 is conducted to construct the positive edges in the network. Adjust the threshold by to_graph(threshold = X)\n",
"Using backend: pytorch\n"
]
},
{
"data": {
"text/plain": [
"dict_keys(['dgl_graph', 'index_to_entities', 'split'])"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph = data.to_graph(threshold = 30, format = 'dgl', split = True, frac = [0.7, 0.1, 0.2], seed = 42, order = 'descending')\n",
"graph.keys()"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"The dataset label consists of affinity scores. Binarization using threshold 30 is conducted to construct the positive edges in the network. Adjust the threshold by to_graph(threshold = X)\n"
]
},
{
"data": {
"text/plain": [
"dict_keys(['pyg_graph', 'index_to_entities', 'split'])"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"graph = data.to_graph(threshold = 30, format = 'pyg', split = True, frac = [0.7, 0.1, 0.2], seed = 42, order = 'descending')\n",
"graph.keys()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Some interaction datasets provide only positive samples. To do binary classification, we have to construct negative samples. One popular approach is to treat unobserved pairs as negatives. This is usually reasonable for biomedical data, where a random pair is far more likely to be a negative than a hit. For datasets built from pairwise experimental assays (e.g. HuRI), pairs sampled from the unobserved set are also true negatives. We use HuRI as an example:"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Found local copy...\n",
"Loading...\n",
"Done!\n"
]
}
],
"source": [
"from tdc.multi_pred import PPI\n",
"data = PPI(name = 'HuRI')\n",
"data.neg_sample(frac = 1)"
]
},
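{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea of treating unobserved pairs as negatives can be sketched in plain Python. The toy pairs below are made up for illustration, and reading `frac` as the number of negatives drawn per positive is an assumption about `neg_sample`, not something this tutorial states.\n",
"\n",
"```python\n",
"import itertools\n",
"import random\n",
"\n",
"# Toy positive interactions (hypothetical entity names).\n",
"positives = {('A', 'B'), ('A', 'C'), ('B', 'D')}\n",
"entities = sorted({e for pair in positives for e in pair})\n",
"\n",
"# Every unordered pair that was never observed is a candidate negative.\n",
"candidates = [p for p in itertools.combinations(entities, 2)\n",
"              if p not in positives and p[::-1] not in positives]\n",
"\n",
"frac = 1  # assumed meaning: negatives per positive\n",
"random.seed(42)\n",
"k = min(len(candidates), int(frac * len(positives)))\n",
"negatives = random.sample(candidates, k)\n",
"print(negatives)\n",
"```\n",
"\n",
"With four entities and three observed pairs, exactly three unobserved pairs remain, so all of them are drawn here; on real data the candidate pool is far larger than the positive set."
]
},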
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also adjust the negative sample fraction to reflect a realistic ratio between positives and negatives by tuning the `frac` parameter.\n",
"\n",
"Next, we provide two database-query functions that retrieve features for drugs and proteins/genes. For drugs, the default feature for many ML models is the SMILES string; for proteins, it is the amino acid sequence. You can get the SMILES string from a PubChem CID with:"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'CCOC1=CC(=C(C=C1C=CC(=O)O)Br)OCC'"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from tdc.utils import cid2smiles\n",
"cid2smiles(2248631)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can get the amino acid sequence from a UniProt ID with:"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'MKTLLLTLVVVTIVCLDLGYTLKCHNTQLPFIYNTCPEGKNLCFKATLKFPLKFPVKRGCAATCPRSSSLVKVVCCKTDKCN'"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from tdc.utils import uniprot2seq\n",
"uniprot2seq('P49122')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluator\n",
"\n",
"In addition to these data processing helpers, TDC includes evaluators for the various therapeutics tasks. There are two kinds: the first evaluates prediction accuracy, and the second evaluates the quality of molecules generated by distribution learning. All of the metrics are listed on the [data evaluation page](https://zitniklab.hms.harvard.edu/TDC/functions/data_evaluation/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, we show the evaluators that measure prediction accuracy. They include:\n",
"\n",
"* `Precision` is the fraction of predicted positives that are truly positive.\n",
"\n",
"* `Recall` is the fraction of true positives that are predicted as positive.\n",
"\n",
"* `F1` score is the harmonic mean of precision and recall.\n",
"\n",
"* `ROC-AUC` (Area Under the Receiver Operating Characteristic Curve). The ROC curve summarizes the trade-off between the true positive rate and the false positive rate of a predictive model across different probability thresholds.\n",
"\n",
"* `PR-AUC` (Precision-Recall Area Under Curve). The precision-recall curve summarizes the trade-off between the true positive rate (recall) and the positive predictive value (precision) of a predictive model across different probability thresholds.\n",
"\n",
"* `Accuracy` computes subset accuracy: the set of labels predicted for a sample must exactly match the corresponding set of true labels.\n",
"\n",
"* `MSE` (Mean Squared Error) measures the average of the squared errors.\n",
"\n",
"* `MAE` (Mean Absolute Error) measures the average absolute error between paired observations.\n",
"\n",
"* `r2` is the coefficient of determination, a regression score.\n",
"\n",
"* `macro-f1` assesses problems with multiple binary labels or multiple classes. The macro F1-score gives the same importance to each label/class, so it will be low for models that perform well on the common classes but poorly on the rare ones.\n",
"\n",
"* `micro-f1` assesses multi-label binary problems. It measures the F1-score of the aggregated contributions of all classes. Micro-averaging puts more emphasis on the common labels in the data set, since it gives each sample the same importance.\n",
"\n",
"For instance, given a list/array of binary true values `y_true` and a list/array of real-valued predicted scores `y_pred`, you can compute the ROC-AUC and PR-AUC as follows. Only the value of the last expression, the PR-AUC, is displayed:"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.7333333333333334"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y_true = [0, 1, 1, 0, 1]\n",
"y_pred = [0.2, 0.2, 0.2, 0.8, 0.89]\n",
"\n",
"from tdc import Evaluator\n",
"evaluator = Evaluator(name = 'ROC-AUC')\n",
"evaluator(y_true, y_pred)\n",
"\n",
"evaluator2 = Evaluator(name = 'PR-AUC')\n",
"evaluator2(y_true, y_pred)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Then, we show the second one, the evaluators that evaluate the quality of the molecules generated by distribution learning.\n",
"\n",
"It includes \n",
"\n",
"* `Validity` is the percentage of the valid molecules in the whole generated molecules. Some SMILES can not be transformed into realistic molecules due to syntactic error. \n",
"\n",
"* `Novelty` is the percentage of the canonical SMILES that are not in the training set of molecule generation. \n",
"\n",
"* `Diversity` \n",
"\n",
"* `Uniqueness` is the percentage of different canonical SMILES in the whole generated molecules. \n",
"\n",
"* `fcd_distance` (Frechet ChemNet Distance) is a measure of how close distributions of generated molecules are to the distribution of molecules in the training set. \n",
"\n",
"* `kl_divergence` KL divergence measures the distance between two probability distribution. Then the probability distributions of a series of physicochemical descriptors for the training set and generated molecules are compared. \n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Assume we have a batch of molecules:"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"smiles_lst = ['CC(C)(C)[C@H]1CCc2c(sc(NC(=O)COc3ccc(Cl)cc3)c2C(N)=O)C1', \\\n",
" 'C[C@@H]1CCc2c(sc(NC(=O)c3ccco3)c2C(N)=O)C1', \\\n",
" 'CCNC(=O)c1ccc(NC(=O)N2CC[C@H](C)[C@H](O)C2)c(C)c1', \\\n",
" 'C[C@@H]1CCN(C(=O)CCCc2ccccc2)C[C@@H]1O']"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Validity function will screen all the molecules and check the fraction of chemical validity of the set:"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0\n"
]
}
],
"source": [
"validity = Evaluator(name = 'validity')\n",
"print(validity(smiles_lst))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Novelty measures the fraction of molecules that are different from training set. Low novelty means the models is overfitted. Here we use the same list of molecules to illustrate. Thus, as they are the same, the novelty should be 0."
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [],
"source": [
"novelty = Evaluator(name = 'novelty')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.0\n"
]
}
],
"source": [
"print(novelty(generated_smiles_lst = smiles_lst, training_smiles_lst = smiles_lst))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Diversity measures the internal distance among the generated molecules. Low diversity means all molecules in the set are pretty similar, which means that the generation does not generate meaningful molecules, i.e., mode collapse."
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"diversity = Evaluator(name = 'diversity')"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.2454994457536287\n"
]
}
],
"source": [
"print(diversity(smiles_lst))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Uniqueness is another metric that can detect mode collapse. This function measures the the fraction of unique molecules in a generated moelcule pool."
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [],
"source": [
"unique = Evaluator(name = 'uniqueness')"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.3333333333333333\n"
]
}
],
"source": [
"print(unique(smiles_lst + smiles_lst + smiles_lst))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"KL-divergence directly measures deviation between two distributions. "
]
},
{
"cell_type": "code",
"execution_count": 28,
"metadata": {},
"outputs": [],
"source": [
"kl_divergence = Evaluator(name = 'KL_Divergence')"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0\n"
]
}
],
"source": [
"print(kl_divergence(generated_smiles_lst = smiles_lst, training_smiles_lst = smiles_lst))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"FCD distance (Fréchet ChemNet Distance) is a measure of distance between two distribution based on activations of the penultimate layer of a neural network, ChemNet that trained to predict biological activities of drugs. Note FCD correlates with other metrics. For example, if the generated structures are not diverse enough or the model produces too many duplicates, FCD will decrease, since the variance is smaller. Values of this metric are non-negative, lower is better."
]
},
{
"cell_type": "code",
"execution_count": 30,
"metadata": {},
"outputs": [],
"source": [
"fcd = Evaluator(name = 'fcd_distance')"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:From /Users/futianfan/anaconda3/envs/tdc/lib/python3.7/site-packages/fcd/FCD.py:196: Model.predict_generator (from tensorflow.python.keras.engine.training) is deprecated and will be removed in a future version.\n",
"Instructions for updating:\n",
"Please use Model.predict, which supports generators.\n",
"1.0000051849132214\n"
]
}
],
"source": [
"print(fcd(generated_smiles_lst = smiles_lst, training_smiles_lst = smiles_lst))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's most of the currently supported data functions! For molecular generation oracles, we provide a separate tutorials (TDC 105) to do that.\n",
"\n",
"**Please [contact us](mailto:kexinhuang@hsph.harvard.edu) if you want to contribute to include useful data functions!**\n",
"\n",
"In the next set of tutorials, we are going to cover \n",
"\n",
"* [TDC 103 Part 1: Datasets - Small Molecules](https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_103.1_Datasets_Small_Molecules.ipynb)\n",
"\n",
"* [TDC 103 Part 2: Datasets - Biologics](https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_103.2_Datasets_Biologics.ipynb)\n",
"\n",
"* [TDC 104 ML Model Examples with DeepPurpose](https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_104_ML_Model_DeepPurpose.ipynb)\n",
"\n",
"* [TDC 105 Molecular Oracles](https://github.com/mims-harvard/TDC/blob/master/tutorials/TDC_105_Oracles.ipynb)\n",
"\n",
"Check them out!"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}