{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Binary extension MLP for ordinal regression and deep learning -- cement strength dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This tutorial explains how to train a deep neural network (here: multilayer perceptron) with the binary extension method by Niu at al. 2016 for ordinal regression. \n",
    "\n",
    "**Paper reference:**\n",
    "\n",
    "- Niu, Zhenxing, Mo Zhou, Le Wang, Xinbo Gao, and Gang Hua. \"[Ordinal regression with multiple output cnn for age estimation](https://openaccess.thecvf.com/content_cvpr_2016/papers/Niu_Ordinal_Regression_With_CVPR_2016_paper.pdf).\" In Proceedings of the IEEE conference on computer vision and pattern recognition."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0 -- Obtaining and preparing the cement_strength dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will be using the cement_strength dataset from [https://github.com/gagolews/ordinal_regression_data/blob/master/cement_strength.csv](https://github.com/gagolews/ordinal_regression_data/blob/master/cement_strength.csv).\n",
    "\n",
    "First, we are going to download and prepare the and save it as CSV files locally. This is a general procedure that is not specific to CORN.\n",
    "\n",
    "This dataset has 5 ordinal labels (1, 2, 3, 4, and 5). Note that the method requires labels to be starting at 0, which is why we subtract \"1\" from the label column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of features: 8\n",
      "Number of examples: 998\n",
      "Labels: [0 1 2 3 4]\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "data_df = pd.read_csv(\"https://raw.githubusercontent.com/gagolews/ordinal_regression_data/master/cement_strength.csv\")\n",
    "\n",
    "data_df[\"response\"] = data_df[\"response\"]-1 # labels should start at 0\n",
    "\n",
    "data_labels = data_df[\"response\"]\n",
    "data_features = data_df.loc[:, [\"V1\", \"V2\", \"V3\", \"V4\", \"V5\", \"V6\", \"V7\", \"V8\"]]\n",
    "\n",
    "print('Number of features:', data_features.shape[1])\n",
    "print('Number of examples:', data_features.shape[0])\n",
    "print('Labels:', np.unique(data_labels.values))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Split into training and test data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    data_features.values,\n",
    "    data_labels.values,\n",
    "    test_size=0.2,\n",
    "    random_state=1,\n",
    "    stratify=data_labels.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Standardize features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "\n",
    "sc = StandardScaler()\n",
    "X_train_std = sc.fit_transform(X_train)\n",
    "X_test_std = sc.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 -- Setting up the dataset and dataloader"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we set up the data set and data loaders using PyTorch utilities. This is a general procedure that is not specific to the method."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training on cuda:0\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "\n",
    "##########################\n",
    "### SETTINGS\n",
    "##########################\n",
    "\n",
    "# Hyperparameters\n",
    "random_seed = 1\n",
    "learning_rate = 0.05\n",
    "num_epochs = 50\n",
    "batch_size = 128\n",
    "\n",
    "# Architecture\n",
    "NUM_CLASSES = 10\n",
    "\n",
    "# Other\n",
    "DEVICE = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
    "print('Training on', DEVICE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.data import Dataset\n",
    "\n",
    "\n",
    "class MyDataset(Dataset):\n",
    "\n",
    "    def __init__(self, feature_array, label_array, dtype=np.float32):\n",
    "    \n",
    "        self.features = feature_array.astype(np.float32)\n",
    "        self.labels = label_array\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        inputs = self.features[index]\n",
    "        label = self.labels[index]\n",
    "        return inputs, label\n",
    "\n",
    "    def __len__(self):\n",
    "        return self.labels.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Input batch dimensions: torch.Size([128, 8])\n",
      "Input label dimensions: torch.Size([128])\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "\n",
    "# Note transforms.ToTensor() scales input images\n",
    "# to 0-1 range\n",
    "train_dataset = MyDataset(X_train_std, y_train)\n",
    "test_dataset = MyDataset(X_test_std, y_test)\n",
    "\n",
    "\n",
    "train_loader = DataLoader(dataset=train_dataset,\n",
    "                          batch_size=batch_size,\n",
    "                          shuffle=True, # want to shuffle the dataset\n",
    "                          num_workers=0) # number processes/CPUs to use\n",
    "\n",
    "test_loader = DataLoader(dataset=test_dataset,\n",
    "                         batch_size=batch_size,\n",
    "                         shuffle=False,\n",
    "                         num_workers=0)\n",
    "\n",
    "# Checking the dataset\n",
    "for inputs, labels in train_loader:  \n",
    "    print('Input batch dimensions:', inputs.shape)\n",
    "    print('Input label dimensions:', labels.shape)\n",
    "    break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 - Equipping MLP with a modified output layer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we are using the output layer as it was originally implemented by Niu et al. This is actually very similar to a standard output layer as we can see below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MLP(torch.nn.Module):\n",
    "\n",
    "    def __init__(self, in_features, num_classes, num_hidden_1=300, num_hidden_2=300):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.num_classes = num_classes\n",
    "        self.my_network = torch.nn.Sequential(\n",
    "            \n",
    "            # 1st hidden layer\n",
    "            torch.nn.Linear(in_features, num_hidden_1, bias=False),\n",
    "            torch.nn.LeakyReLU(),\n",
    "            torch.nn.Dropout(0.2),\n",
    "            torch.nn.BatchNorm1d(num_hidden_1),\n",
    "            \n",
    "            # 2nd hidden layer\n",
    "            torch.nn.Linear(num_hidden_1, num_hidden_2, bias=False),\n",
    "            torch.nn.LeakyReLU(),\n",
    "            torch.nn.Dropout(0.2),\n",
    "            torch.nn.BatchNorm1d(num_hidden_2),\n",
    "        )\n",
    "        \n",
    "        ### Specify Niu et al. layer\n",
    "        self.fc = torch.nn.Linear(num_hidden_2, (num_classes-1)*2)\n",
    "        ###--------------------------------------------------------------------###\n",
    "        \n",
    "    def forward(self, x):\n",
    "        x = self.my_network(x) \n",
    "        logits = self.fc(x)\n",
    "        \n",
    "        ### Reshape logits for Niu et al. loss\n",
    "        logits = logits.view(-1, (self.num_classes-1), 2)\n",
    "        ###--------------------------------------------------------------------###        \n",
    "        return logits\n",
    "    \n",
    "    \n",
    "    \n",
    "torch.manual_seed(random_seed)\n",
    "model = MLP(in_features=8, num_classes=NUM_CLASSES)\n",
    "model.to(DEVICE)\n",
    "\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3 - Using the extended binary loss for model training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "During training, all you need to do is to \n",
    "\n",
    "\n",
    "\n",
    "\n",
    "1) convert the integer class labels into the extended binary label format using the `levels_from_labelbatch` provided via `coral_pytorch`:\n",
    "\n",
    "```python\n",
    "        levels = levels_from_labelbatch(class_labels, \n",
    "                                        num_classes=NUM_CLASSES)\n",
    "```\n",
    "\n",
    "2) Apply the extended binary loss:\n",
    "\n",
    "```python\n",
    "def niu_et_al_loss(logits, levels):\n",
    "    val = (-torch.sum((F.log_softmax(logits, dim=2)[:, :, 1]*levels\n",
    "                      + F.log_softmax(logits, dim=2)[:, :, 0]*(1-levels)), dim=1))\n",
    "    return torch.mean(val)\n",
    "\n",
    "loss = niu_et_al_loss(logits, levels)\n",
    "```\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "#!pip install coral-pytorch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "tags": []
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Epoch: 001/050 | Batch 000/007 | Loss: 6.8903\n",
      "Epoch: 002/050 | Batch 000/007 | Loss: 2.6211\n",
      "Epoch: 003/050 | Batch 000/007 | Loss: 2.8005\n",
      "Epoch: 004/050 | Batch 000/007 | Loss: 1.1442\n",
      "Epoch: 005/050 | Batch 000/007 | Loss: 0.7746\n",
      "Epoch: 006/050 | Batch 000/007 | Loss: 1.3391\n",
      "Epoch: 007/050 | Batch 000/007 | Loss: 0.9629\n",
      "Epoch: 008/050 | Batch 000/007 | Loss: 0.8995\n",
      "Epoch: 009/050 | Batch 000/007 | Loss: 1.2059\n",
      "Epoch: 010/050 | Batch 000/007 | Loss: 0.9444\n",
      "Epoch: 011/050 | Batch 000/007 | Loss: 0.9605\n",
      "Epoch: 012/050 | Batch 000/007 | Loss: 0.8075\n",
      "Epoch: 013/050 | Batch 000/007 | Loss: 0.7014\n",
      "Epoch: 014/050 | Batch 000/007 | Loss: 0.8392\n",
      "Epoch: 015/050 | Batch 000/007 | Loss: 0.6986\n",
      "Epoch: 016/050 | Batch 000/007 | Loss: 0.5953\n",
      "Epoch: 017/050 | Batch 000/007 | Loss: 0.5665\n",
      "Epoch: 018/050 | Batch 000/007 | Loss: 0.8357\n",
      "Epoch: 019/050 | Batch 000/007 | Loss: 0.6777\n",
      "Epoch: 020/050 | Batch 000/007 | Loss: 0.6479\n",
      "Epoch: 021/050 | Batch 000/007 | Loss: 0.7408\n",
      "Epoch: 022/050 | Batch 000/007 | Loss: 0.6845\n",
      "Epoch: 023/050 | Batch 000/007 | Loss: 0.6907\n",
      "Epoch: 024/050 | Batch 000/007 | Loss: 0.8215\n",
      "Epoch: 025/050 | Batch 000/007 | Loss: 0.6827\n",
      "Epoch: 026/050 | Batch 000/007 | Loss: 0.6347\n",
      "Epoch: 027/050 | Batch 000/007 | Loss: 0.7414\n",
      "Epoch: 028/050 | Batch 000/007 | Loss: 0.5251\n",
      "Epoch: 029/050 | Batch 000/007 | Loss: 0.6075\n",
      "Epoch: 030/050 | Batch 000/007 | Loss: 0.4576\n",
      "Epoch: 031/050 | Batch 000/007 | Loss: 0.5426\n",
      "Epoch: 032/050 | Batch 000/007 | Loss: 0.8073\n",
      "Epoch: 033/050 | Batch 000/007 | Loss: 0.7842\n",
      "Epoch: 034/050 | Batch 000/007 | Loss: 0.7738\n",
      "Epoch: 035/050 | Batch 000/007 | Loss: 0.8197\n",
      "Epoch: 036/050 | Batch 000/007 | Loss: 0.8191\n",
      "Epoch: 037/050 | Batch 000/007 | Loss: 0.7680\n",
      "Epoch: 038/050 | Batch 000/007 | Loss: 0.5169\n",
      "Epoch: 039/050 | Batch 000/007 | Loss: 0.7163\n",
      "Epoch: 040/050 | Batch 000/007 | Loss: 0.6532\n",
      "Epoch: 041/050 | Batch 000/007 | Loss: 0.5271\n",
      "Epoch: 042/050 | Batch 000/007 | Loss: 0.4311\n",
      "Epoch: 043/050 | Batch 000/007 | Loss: 0.7078\n",
      "Epoch: 044/050 | Batch 000/007 | Loss: 0.5553\n",
      "Epoch: 045/050 | Batch 000/007 | Loss: 0.5620\n",
      "Epoch: 046/050 | Batch 000/007 | Loss: 0.5877\n",
      "Epoch: 047/050 | Batch 000/007 | Loss: 0.5003\n",
      "Epoch: 048/050 | Batch 000/007 | Loss: 0.5545\n",
      "Epoch: 049/050 | Batch 000/007 | Loss: 0.4634\n",
      "Epoch: 050/050 | Batch 000/007 | Loss: 0.4013\n"
     ]
    }
   ],
   "source": [
    "import torch.nn.functional as F\n",
    "from coral_pytorch.dataset import levels_from_labelbatch\n",
    "\n",
    "\n",
    "def niu_et_al_loss(logits, levels):\n",
    "    val = (-torch.sum((F.log_softmax(logits, dim=2)[:, :, 1]*levels\n",
    "                      + F.log_softmax(logits, dim=2)[:, :, 0]*(1-levels)), dim=1))\n",
    "    return torch.mean(val)\n",
    "\n",
    "\n",
    "for epoch in range(num_epochs):\n",
    "    \n",
    "    model = model.train()\n",
    "    for batch_idx, (features, class_labels) in enumerate(train_loader):\n",
    "\n",
    "        ##### Convert class labels for extended binary loss\n",
    "        levels = levels_from_labelbatch(class_labels, \n",
    "                                        num_classes=model.num_classes)\n",
    "        ###--------------------------------------------------------------------###\n",
    "\n",
    "        features = features.to(DEVICE)\n",
    "        levels = levels.to(DEVICE)\n",
    "        logits = model(features)\n",
    "        \n",
    "        #### Niu et al. loss loss \n",
    "        loss = niu_et_al_loss(logits, levels)\n",
    "        ###--------------------------------------------------------------------###   \n",
    "        \n",
    "        \n",
    "        optimizer.zero_grad()\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        \n",
    "        ### LOGGING\n",
    "        if not batch_idx % 200:\n",
    "            print ('Epoch: %03d/%03d | Batch %03d/%03d | Loss: %.4f' \n",
    "                   %(epoch+1, num_epochs, batch_idx, \n",
    "                     len(train_loader), loss))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "## 4 -- Evaluate model\n",
    "\n",
    "Finally, after model training, we can evaluate the performance of the model. For example, via the mean absolute error and mean squared error measures.\n",
    "\n",
    "For this, we are going to use the `niu_logits_to_label` utility function below.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "def niu_logits_to_labels(logits):\n",
    "    probas = F.softmax(logits, dim=2)[:, :, 1]\n",
    "    predict_levels = probas > 0.5\n",
    "    predicted_labels = torch.sum(predict_levels, dim=1)\n",
    "    return predicted_labels\n",
    "\n",
    "\n",
    "def compute_mae_and_mse(model, data_loader, device):\n",
    "\n",
    "    with torch.no_grad():\n",
    "    \n",
    "        mae, mse, acc, num_examples = 0., 0., 0., 0\n",
    "\n",
    "        for i, (features, targets) in enumerate(data_loader):\n",
    "\n",
    "            features = features.to(device)\n",
    "            targets = targets.float().to(device)\n",
    "\n",
    "            logits = model(features)\n",
    "            predicted_labels = niu_logits_to_labels(logits)\n",
    "\n",
    "            num_examples += targets.size(0)\n",
    "            mae += torch.sum(torch.abs(predicted_labels - targets))\n",
    "            mse += torch.sum((predicted_labels - targets)**2)\n",
    "\n",
    "        mae = mae / num_examples\n",
    "        mse = mse / num_examples\n",
    "        return mae, mse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_mae, train_mse = compute_mae_and_mse(model, train_loader, DEVICE)\n",
    "test_mae, test_mse = compute_mae_and_mse(model, test_loader, DEVICE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mean absolute error (train/test): 0.29 | 0.31\n",
      "Mean squared error (train/test): 0.32 | 0.36\n"
     ]
    }
   ],
   "source": [
    "print(f'Mean absolute error (train/test): {train_mae:.2f} | {test_mae:.2f}')\n",
    "print(f'Mean squared error (train/test): {train_mse:.2f} | {test_mse:.2f}')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}