{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Squared-error reformulation for ordinal regression and deep learning -- cement strength dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Implementation of a method for ordinal regression by Beckham and Pal 2016.\n",
    "\n",
    "**Paper reference:**\n",
    "\n",
    "- Beckham, Christopher, and Christopher Pal. \"[A simple squared-error reformulation for ordinal classification](https://arxiv.org/abs/1612.00775).\" arXiv preprint arXiv:1612.00775 (2016)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 0 -- Obtaining and preparing the cement_strength dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We will be using the cement_strength dataset from [https://github.com/gagolews/ordinal_regression_data/blob/master/cement_strength.csv](https://github.com/gagolews/ordinal_regression_data/blob/master/cement_strength.csv).\n",
    "\n",
    "First, we are going to download and prepare the and save it as CSV files locally. This is a general procedure that is not specific to CORN.\n",
    "\n",
    "This dataset has 5 ordinal labels (1, 2, 3, 4, and 5). Note that we require labels to be starting at 0, which is why we subtract \"1\" from the label column."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of features: 8\n",
      "Number of examples: 998\n",
      "Labels: [0 1 2 3 4]\n"
     ]
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "data_df = pd.read_csv(\"https://raw.githubusercontent.com/gagolews/ordinal_regression_data/master/cement_strength.csv\")\n",
    " \n",
    "data_df[\"response\"] = data_df[\"response\"]-1 # labels should start at 0\n",
    "\n",
    "data_labels = data_df[\"response\"]\n",
    "data_features = data_df.loc[:, [\"V1\", \"V2\", \"V3\", \"V4\", \"V5\", \"V6\", \"V7\", \"V8\"]]\n",
    "\n",
    "print('Number of features:', data_features.shape[1])\n",
    "print('Number of examples:', data_features.shape[0])\n",
    "print('Labels:', np.unique(data_labels.values))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "tags": []
   },
   "source": [
    "### Split into training and test data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.model_selection import train_test_split\n",
    "\n",
    "\n",
    "X_train, X_test, y_train, y_test = train_test_split(\n",
    "    data_features.values,\n",
    "    data_labels.values,\n",
    "    test_size=0.2,\n",
    "    random_state=1,\n",
    "    stratify=data_labels.values)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Standardize features"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.preprocessing import StandardScaler\n",
    "\n",
    "\n",
    "sc = StandardScaler()\n",
    "X_train_std = sc.fit_transform(X_train)\n",
    "X_test_std = sc.transform(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1 -- Setting up the dataset and dataloader"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we set up the data set and data loaders. This is a general procedure that is not specific to CORN. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Training on cuda:0\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "\n",
    "\n",
    "##########################\n",
    "### SETTINGS\n",
    "##########################\n",
    "\n",
    "# Hyperparameters\n",
    "random_seed = 1\n",
    "learning_rate = 0.001\n",
    "num_epochs = 50\n",
    "batch_size = 128\n",
    "\n",
    "# Architecture\n",
    "NUM_CLASSES = 5\n",
    "\n",
    "# Other\n",
    "DEVICE = torch.device(\"cuda:0\" if torch.cuda.is_available() else \"cpu\")\n",
    "print('Training on', DEVICE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "from torch.utils.data import Dataset\n",
    "\n",
    "\n",
    "class MyDataset(Dataset):\n",
    "\n",
    "    def __init__(self, feature_array, label_array, dtype=np.float32):\n",
    "    \n",
    "        self.features = feature_array.astype(np.float32)\n",
    "        self.labels = label_array\n",
    "\n",
    "    def __getitem__(self, index):\n",
    "        inputs = self.features[index]\n",
    "        label = self.labels[index]\n",
    "        return inputs, label\n",
    "\n",
    "    def __len__(self):\n",
    "        return self.labels.shape[0]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Input batch dimensions: torch.Size([128, 8])\n",
      "Input label dimensions: torch.Size([128])\n"
     ]
    }
   ],
   "source": [
    "import torch\n",
    "from torch.utils.data import DataLoader\n",
    "\n",
    "\n",
    "# Note transforms.ToTensor() scales input images\n",
    "# to 0-1 range\n",
    "train_dataset = MyDataset(X_train_std, y_train)\n",
    "test_dataset = MyDataset(X_test_std, y_test)\n",
    "\n",
    "\n",
    "train_loader = DataLoader(dataset=train_dataset,\n",
    "                          batch_size=batch_size,\n",
    "                          shuffle=True, # want to shuffle the dataset\n",
    "                          num_workers=0) # number processes/CPUs to use\n",
    "\n",
    "test_loader = DataLoader(dataset=test_dataset,\n",
    "                         batch_size=batch_size,\n",
    "                         shuffle=False,\n",
    "                         num_workers=0)\n",
    "\n",
    "# Checking the dataset\n",
    "for inputs, labels in train_loader:  \n",
    "    print('Input batch dimensions:', inputs.shape)\n",
    "    print('Input label dimensions:', labels.shape)\n",
    "    break"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2 - Implementing an MLP with an additional parameter layer `a`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this section, we are implementing a simple MLP for ordinal regression. To implement the Beckham et al. method, we add the parameter layer `a` as `self.a`, which is used to compute the predictions for the loss function later in the training loop:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "class MLP(torch.nn.Module):\n",
    "\n",
    "    def __init__(self, in_features, num_classes, num_hidden_1=300, num_hidden_2=300):\n",
    "        super().__init__()\n",
    "        \n",
    "        self.num_classes = num_classes\n",
    "        self.my_network = torch.nn.Sequential(\n",
    "            \n",
    "            # 1st hidden layer\n",
    "            torch.nn.Linear(in_features, num_hidden_1, bias=False),\n",
    "            torch.nn.LeakyReLU(),\n",
    "            torch.nn.Dropout(0.2),\n",
    "            torch.nn.BatchNorm1d(num_hidden_1),\n",
    "            \n",
    "            # 2nd hidden layer\n",
    "            torch.nn.Linear(num_hidden_1, num_hidden_2, bias=False),\n",
    "            torch.nn.LeakyReLU(),\n",
    "            torch.nn.Dropout(0.2),\n",
    "            torch.nn.BatchNorm1d(num_hidden_2),\n",
    "            \n",
    "            # Output layer\n",
    "            torch.nn.Linear(num_hidden_2, num_classes)\n",
    "        )\n",
    "        \n",
    "        # -----------------------------------------------------\n",
    "        # Beckham 2016-specific parameter layer\n",
    "        self.a = torch.nn.Parameter(torch.zeros(\n",
    "            num_classes).float().normal_(0.0, 0.1).view(-1, 1))\n",
    "        # -----------------------------------------------------\n",
    "                \n",
    "    def forward(self, x):\n",
    "        logits = self.my_network(x)\n",
    "        return logits\n",
    "    \n",
    "    \n",
    "    \n",
    "torch.manual_seed(random_seed)\n",
    "model = MLP(in_features=8, num_classes=NUM_CLASSES)\n",
    "model.to(DEVICE)\n",
    "\n",
    "optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3 - Using the reformulated squared error loss loss for model training"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We define the 2 functions `squared_error` and `beckham_logits_to_predictions` to implement the method as shown below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "model.a initial value Parameter containing:\n",
      "tensor([[-0.1132],\n",
      "        [-0.1207],\n",
      "        [-0.1813],\n",
      "        [-0.1094],\n",
      "        [ 0.0962]], device='cuda:0', requires_grad=True)\n",
      "Epoch: 001/050 | Batch 000/007 | Cost: 1.5981\n",
      "Epoch: 002/050 | Batch 000/007 | Cost: 1.4069\n",
      "Epoch: 003/050 | Batch 000/007 | Cost: 1.4926\n",
      "Epoch: 004/050 | Batch 000/007 | Cost: 1.2280\n",
      "Epoch: 005/050 | Batch 000/007 | Cost: 1.2744\n",
      "Epoch: 006/050 | Batch 000/007 | Cost: 1.2107\n",
      "Epoch: 007/050 | Batch 000/007 | Cost: 1.3584\n",
      "Epoch: 008/050 | Batch 000/007 | Cost: 1.1555\n",
      "Epoch: 009/050 | Batch 000/007 | Cost: 1.1060\n",
      "Epoch: 010/050 | Batch 000/007 | Cost: 1.2784\n",
      "Epoch: 011/050 | Batch 000/007 | Cost: 1.1050\n",
      "Epoch: 012/050 | Batch 000/007 | Cost: 1.1555\n",
      "Epoch: 013/050 | Batch 000/007 | Cost: 1.2180\n",
      "Epoch: 014/050 | Batch 000/007 | Cost: 1.1857\n",
      "Epoch: 015/050 | Batch 000/007 | Cost: 1.3390\n",
      "Epoch: 016/050 | Batch 000/007 | Cost: 1.1243\n",
      "Epoch: 017/050 | Batch 000/007 | Cost: 0.9613\n",
      "Epoch: 018/050 | Batch 000/007 | Cost: 1.0661\n",
      "Epoch: 019/050 | Batch 000/007 | Cost: 1.1771\n",
      "Epoch: 020/050 | Batch 000/007 | Cost: 0.8850\n",
      "Epoch: 021/050 | Batch 000/007 | Cost: 0.8132\n",
      "Epoch: 022/050 | Batch 000/007 | Cost: 1.1182\n",
      "Epoch: 023/050 | Batch 000/007 | Cost: 1.1167\n",
      "Epoch: 024/050 | Batch 000/007 | Cost: 1.0908\n",
      "Epoch: 025/050 | Batch 000/007 | Cost: 1.0855\n",
      "Epoch: 026/050 | Batch 000/007 | Cost: 1.0150\n",
      "Epoch: 027/050 | Batch 000/007 | Cost: 1.0790\n",
      "Epoch: 028/050 | Batch 000/007 | Cost: 1.0963\n",
      "Epoch: 029/050 | Batch 000/007 | Cost: 1.0859\n",
      "Epoch: 030/050 | Batch 000/007 | Cost: 0.9183\n",
      "Epoch: 031/050 | Batch 000/007 | Cost: 0.8739\n",
      "Epoch: 032/050 | Batch 000/007 | Cost: 0.9620\n",
      "Epoch: 033/050 | Batch 000/007 | Cost: 1.0171\n",
      "Epoch: 034/050 | Batch 000/007 | Cost: 1.0702\n",
      "Epoch: 035/050 | Batch 000/007 | Cost: 1.0548\n",
      "Epoch: 036/050 | Batch 000/007 | Cost: 0.9164\n",
      "Epoch: 037/050 | Batch 000/007 | Cost: 1.0524\n",
      "Epoch: 038/050 | Batch 000/007 | Cost: 0.9839\n",
      "Epoch: 039/050 | Batch 000/007 | Cost: 1.0588\n",
      "Epoch: 040/050 | Batch 000/007 | Cost: 0.9665\n",
      "Epoch: 041/050 | Batch 000/007 | Cost: 1.0466\n",
      "Epoch: 042/050 | Batch 000/007 | Cost: 0.8587\n",
      "Epoch: 043/050 | Batch 000/007 | Cost: 0.9876\n",
      "Epoch: 044/050 | Batch 000/007 | Cost: 0.9188\n",
      "Epoch: 045/050 | Batch 000/007 | Cost: 0.9107\n",
      "Epoch: 046/050 | Batch 000/007 | Cost: 0.8107\n",
      "Epoch: 047/050 | Batch 000/007 | Cost: 1.0285\n",
      "Epoch: 048/050 | Batch 000/007 | Cost: 0.8575\n",
      "Epoch: 049/050 | Batch 000/007 | Cost: 0.9975\n",
      "Epoch: 050/050 | Batch 000/007 | Cost: 0.9574\n",
      "model.a final value Parameter containing:\n",
      "tensor([[-0.1958],\n",
      "        [-0.1812],\n",
      "        [-0.5243],\n",
      "        [-0.1669],\n",
      "        [ 0.4120]], device='cuda:0', requires_grad=True)\n"
     ]
    }
   ],
   "source": [
    "def squared_error(targets, predictions):\n",
    "    return torch.mean((targets.float() - predictions)**2)\n",
    "\n",
    "def beckham_logits_to_predictions(logits, model, num_classes):\n",
    "    probas = torch.softmax(logits, dim=1)\n",
    "    predictions = ((num_classes-1)\n",
    "                   * torch.sigmoid(probas.mm(model.a).view(-1)))\n",
    "    return predictions\n",
    "\n",
    "\n",
    "print('model.a initial value', model.a)\n",
    "for epoch in range(num_epochs):\n",
    "    \n",
    "    model = model.train()\n",
    "    for batch_idx, (features, class_labels) in enumerate(train_loader):\n",
    "\n",
    "        class_labels = class_labels.to(DEVICE)\n",
    "        features = features.to(DEVICE)\n",
    "        logits = model(features)\n",
    "        \n",
    "        #### Beckham 2016-specific loss----------------------------------------### \n",
    "        predictions = beckham_logits_to_predictions(logits, model, model.num_classes)\n",
    "        loss = squared_error(predictions, class_labels)\n",
    "        ###--------------------------------------------------------------------###   \n",
    "        \n",
    "        optimizer.zero_grad()\n",
    "        loss.backward()\n",
    "        optimizer.step()\n",
    "        \n",
    "        ### LOGGING\n",
    "        if not batch_idx % 200:\n",
    "            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f' \n",
    "                   %(epoch+1, num_epochs, batch_idx, \n",
    "                     len(train_loader), loss))\n",
    "            \n",
    "print('model.a final value', model.a)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4 -- Evaluate model\n",
    "\n",
    "Finally, after model training, we can evaluate the performance of the model. For example, via the mean absolute error and mean squared error measures.\n",
    "\n",
    "For this, we are going to use the `beckham_logits_to_labels` to convert the logits into ordinal class labels as shown below:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [],
   "source": [
    "def beckham_logits_to_labels(logits, model, num_classes):\n",
    "    predictions = beckham_logits_to_predictions(logits, model, num_classes)\n",
    "    return torch.round(predictions).float()\n",
    "    \n",
    "\n",
    "def compute_mae_and_mse(model, data_loader, device):\n",
    "\n",
    "    with torch.no_grad():\n",
    "    \n",
    "        mae, mse, acc, num_examples = 0., 0., 0., 0\n",
    "\n",
    "        for i, (features, targets) in enumerate(data_loader):\n",
    "\n",
    "            features = features.to(device)\n",
    "            targets = targets.float().to(device)\n",
    "\n",
    "            logits = model(features)\n",
    "            predicted_labels = beckham_logits_to_labels(logits, model, model.num_classes)\n",
    "\n",
    "            num_examples += targets.size(0)\n",
    "            mae += torch.sum(torch.abs(predicted_labels - targets))\n",
    "            mse += torch.sum((predicted_labels - targets)**2)\n",
    "\n",
    "        mae = mae / num_examples\n",
    "        mse = mse / num_examples\n",
    "        return mae, mse"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_mae, train_mse = compute_mae_and_mse(model, train_loader, DEVICE)\n",
    "test_mae, test_mse = compute_mae_and_mse(model, test_loader, DEVICE)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Mean absolute error (train/test): 0.66 | 0.69\n",
      "Mean squared error (train/test): 0.85 | 0.89\n"
     ]
    }
   ],
   "source": [
    "print(f'Mean absolute error (train/test): {train_mae:.2f} | {test_mae:.2f}')\n",
    "print(f'Mean squared error (train/test): {train_mse:.2f} | {test_mse:.2f}')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}