{
  "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# TorchOpt as Meta-Optimizer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "[<img align=\"left\" src=\"https://colab.research.google.com/assets/colab-badge.svg\">](https://colab.research.google.com/github/metaopt/torchopt/blob/main/tutorials/3_Meta_Optimizer.ipynb)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this tutorial, we show how to use TorchOpt as a differentiable optimizer through the traditional PyTorch optimization API. TorchOpt also provides several other APIs that make meta-learning algorithms easier to implement."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Basic API for Differentiable Optimizer\n",
    "\n",
    "`MetaOptimizer` is the main class for our differentiable optimizer. Combined with the functional optimizers `torchopt.sgd` and `torchopt.adam` introduced in Tutorial 1, it gives us the high-level APIs `torchopt.MetaSGD` and `torchopt.MetaAdam`. We will discuss how this combination works with `torchopt.chain` in Section 3. Consider the problem below."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Assume a tensor $x$ is a meta-parameter and $a$ is a normal parameter (such as a network parameter). We have the inner loss $\\mathcal{L}^{\\textrm{in}} = a_0 \\cdot x^2$, and we update $a$ using the gradient $\\frac{\\partial \\mathcal{L}^{\\textrm{in}}}{\\partial a_0} = x^2$, which gives $a_1 = a_0 - \\eta \\, \\frac{\\partial \\mathcal{L}^{\\textrm{in}}}{\\partial a_0} = a_0 - \\eta \\, x^2$, where $\\eta$ is the learning rate. Then we compute the outer loss $\\mathcal{L}^{\\textrm{out}} = a_1 \\cdot x^2$. The gradient of the outer loss with respect to $x$ is therefore:\n",
    "\n",
    "$$\n",
    "\\begin{split}\n",
    "        \\frac{\\partial \\mathcal{L}^{\\textrm{out}}}{\\partial x}\n",
    "    & = \\frac{\\partial (a_1 \\cdot x^2)}{\\partial x} \\\\\n",
    "    & = \\frac{\\partial a_1}{\\partial x} \\cdot x^2 + a_1 \\cdot \\frac{\\partial (x^2)}{\\partial x} \\\\\n",
    "    & = \\frac{\\partial (a_0 - \\eta \\, x^2)}{\\partial x} \\cdot x^2 + (a_0 - \\eta \\, x^2) \\cdot 2 x \\\\\n",
    "    & = (- \\eta \\cdot 2 x) \\cdot x^2 + (a_0 - \\eta \\, x^2) \\cdot 2 x \\\\\n",
    "    & = - 4 \\, \\eta \\, x^3 + 2 \\, a_0 \\, x\n",
    "\\end{split}\n",
    "$$"
   ]
  },
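  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick sanity check, the derivation above can be evaluated at concrete values. The numbers $a_0 = 1$, $\\eta = 1$ and $x = 2$ are chosen here only for illustration. Treating $a_1$ as a constant, as a non-differentiable optimizer step would, drops the $\\frac{\\partial a_1}{\\partial x}$ term:\n",
    "\n",
    "$$\n",
    "\\begin{split}\n",
    "        a_1\n",
    "    & = a_0 - \\eta \\, x^2 = 1 - 1 \\cdot 2^2 = -3 \\\\\n",
    "        \\left. \\frac{\\partial \\mathcal{L}^{\\textrm{out}}}{\\partial x} \\right|_{a_1 \\, \\textrm{fixed}}\n",
    "    & = a_1 \\cdot 2 x = (-3) \\cdot 2 \\cdot 2 = -12 \\\\\n",
    "        \\frac{\\partial \\mathcal{L}^{\\textrm{out}}}{\\partial x}\n",
    "    & = - 4 \\, \\eta \\, x^3 + 2 \\, a_0 \\, x = -32 + 4 = -28\n",
    "\\end{split}\n",
    "$$\n",
    "\n",
    "The difference, $(- \\eta \\cdot 2 x) \\cdot x^2 = -16$, is the contribution that flows back through the optimizer update, and it is only available when the update itself is differentiable."
   ]
  },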
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Given the analytical solution above, let's verify it with TorchOpt. Define the network first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.display import display\n",
    "\n",
    "import torch\n",
    "import torch.nn as nn\n",
    "import torch.nn.functional as F\n",
    "\n",
    "import torchopt\n",
    "\n",
    "\n",
    "class Net(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.a = nn.Parameter(torch.tensor(1.0), requires_grad=True)\n",
    "\n",
    "    def forward(self, x):\n",
    "        return self.a * (x**2)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then we declare the network (parameterized by `a`) and the meta-parameter `x`. Do not forget to set the flag `requires_grad=True` for `x`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "net = Net()\n",
    "x = nn.Parameter(torch.tensor(2.0), requires_grad=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Next we declare the meta-optimizer. Here we show two equivalent ways of defining it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Low-level API\n",
    "optim = torchopt.MetaOptimizer(net, torchopt.sgd(lr=1.0))\n",
    "\n",
    "# High-level API\n",
    "optim = torchopt.MetaSGD(net, lr=1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The meta-optimizer takes the network as input and uses the `step` method to update the network (parameterized by `a`). Finally, we show how a bi-level process works."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x.grad = tensor(-28.)\n"
     ]
    }
   ],
   "source": [
    "inner_loss = net(x)\n",
    "optim.step(inner_loss)\n",
    "\n",
    "outer_loss = net(x)\n",
    "outer_loss.backward()\n",
    "# x.grad = - 4 * lr * x^3 + 2 * a_0 * x\n",
    "#        = - 4 * 1 * 2^3 + 2 * 1 * 2\n",
    "#        = -32 + 4\n",
    "#        = -28\n",
    "print(f'x.grad = {x.grad!r}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 1.1 Track the Gradient of Momentum\n",
    "\n",
    "Note that most modern optimizers include moment terms in the gradient update (essentially only SGD with `momentum=0` does not). We provide an option for users to choose whether to also track the meta-gradient through the moment terms. The default option is `moment_requires_grad=True`."
   ]
  },
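  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the moment terms concrete, here is a sketch of the first step of textbook Adam for the parameter $a$. The symbols $m$, $v$, $\\beta_1$, $\\beta_2$ and $\\epsilon$ follow the standard Adam formulation and are assumptions made for illustration; TorchOpt's implementation may differ in details.\n",
    "\n",
    "$$\n",
    "\\begin{split}\n",
    "        g_1 & = \\frac{\\partial \\mathcal{L}^{\\textrm{in}}}{\\partial a_0} \\\\\n",
    "        m_1 & = \\beta_1 \\, m_0 + (1 - \\beta_1) \\, g_1 \\\\\n",
    "        v_1 & = \\beta_2 \\, v_0 + (1 - \\beta_2) \\, g_1^2 \\\\\n",
    "        a_1 & = a_0 - \\eta \\, \\frac{m_1 / (1 - \\beta_1)}{\\sqrt{v_1 / (1 - \\beta_2)} + \\epsilon}\n",
    "\\end{split}\n",
    "$$\n",
    "\n",
    "Here $m_0$ and $v_0$ are the initial moment states. Because $g_1$ depends on the meta-parameter $x$, the moments $m_1$ and $v_1$ depend on $x$ as well, so the update to $a$ carries a dependence on $x$ through them."
   ]
  },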
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- When you do not track the meta-gradient through moment (`moment_requires_grad=False`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<graphviz.graphs.Digraph object at 0x7fbc7e823310>\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n<!-- Generated by graphviz version 2.42.3 (20191010.1750)\n -->\n<!-- Title: %3 Pages: 1 -->\n<svg width=\"344pt\" height=\"962pt\"\n viewBox=\"0.00 0.00 343.50 962.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 958)\">\n<title>%3</title>\n<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-958 339.5,-958 339.5,4 -4,4\"/>\n<!-- 140447553047184 -->\n<g id=\"node1\" class=\"node\">\n<title>140447553047184</title>\n<polygon fill=\"#caff70\" stroke=\"black\" points=\"179,-30 102,-30 102,0 179,0 179,-30\"/>\n<text text-anchor=\"middle\" x=\"140.5\" y=\"-18\" font-family=\"monospace\" font-size=\"10.00\">outer_loss</text>\n<text text-anchor=\"middle\" x=\"140.5\" y=\"-7\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553041216 -->\n<g id=\"node2\" class=\"node\">\n<title>140447553041216</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"197,-85 84,-85 84,-66 197,-66 197,-85\"/>\n<text text-anchor=\"middle\" x=\"140.5\" y=\"-73\" font-family=\"monospace\" font-size=\"10.00\">MseLossBackward0</text>\n</g>\n<!-- 140447553041216&#45;&gt;140447553047184 -->\n<g id=\"edge26\" class=\"edge\">\n<title>140447553041216&#45;&gt;140447553047184</title>\n<path fill=\"none\" stroke=\"black\" d=\"M140.5,-65.87C140.5,-59.11 140.5,-49.35 140.5,-40.26\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"144,-40.11 140.5,-30.11 137,-40.11 144,-40.11\"/>\n</g>\n<!-- 140447553042896 -->\n<g id=\"node3\" class=\"node\">\n<title>140447553042896</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"185,-140 96,-140 96,-121 185,-121 185,-140\"/>\n<text text-anchor=\"middle\" x=\"140.5\" y=\"-128\" font-family=\"monospace\" 
font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553042896&#45;&gt;140447553041216 -->\n<g id=\"edge1\" class=\"edge\">\n<title>140447553042896&#45;&gt;140447553041216</title>\n<path fill=\"none\" stroke=\"black\" d=\"M140.5,-120.75C140.5,-113.8 140.5,-103.85 140.5,-95.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"144,-95.09 140.5,-85.09 137,-95.09 144,-95.09\"/>\n</g>\n<!-- 140447553019088 -->\n<g id=\"node4\" class=\"node\">\n<title>140447553019088</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"208,-217 119,-217 119,-176 208,-176 208,-217\"/>\n<text text-anchor=\"middle\" x=\"163.5\" y=\"-205\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n<text text-anchor=\"middle\" x=\"163.5\" y=\"-194\" font-family=\"monospace\" font-size=\"10.00\">step1.a</text>\n<text text-anchor=\"middle\" x=\"163.5\" y=\"-183\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553019088&#45;&gt;140447553042896 -->\n<g id=\"edge2\" class=\"edge\">\n<title>140447553019088&#45;&gt;140447553042896</title>\n<path fill=\"none\" stroke=\"black\" d=\"M156.47,-175.95C153.5,-167.67 150.05,-158.07 147.12,-149.92\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"150.32,-148.49 143.65,-140.26 143.74,-150.86 150.32,-148.49\"/>\n</g>\n<!-- 140447553041072 -->\n<g id=\"node5\" class=\"node\">\n<title>140447553041072</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"333,-822 232,-822 232,-803 333,-803 333,-822\"/>\n<text text-anchor=\"middle\" x=\"282.5\" y=\"-810\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447553041072&#45;&gt;140447553019088 -->\n<g id=\"edge3\" class=\"edge\">\n<title>140447553041072&#45;&gt;140447553019088</title>\n<path fill=\"none\" stroke=\"black\" d=\"M290.09,-802.96C304.75,-785.57 335.5,-744.23 335.5,-703.5 335.5,-703.5 335.5,-703.5 335.5,-316.5 335.5,-258.46 268.44,-226.5 218.07,-210.69\"/>\n<polygon fill=\"black\" stroke=\"black\" 
points=\"218.83,-207.26 208.24,-207.74 216.81,-213.97 218.83,-207.26\"/>\n</g>\n<!-- 140447553043664 -->\n<g id=\"node13\" class=\"node\">\n<title>140447553043664</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"271,-767 182,-767 182,-748 271,-748 271,-767\"/>\n<text text-anchor=\"middle\" x=\"226.5\" y=\"-755\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553041072&#45;&gt;140447553043664 -->\n<g id=\"edge12\" class=\"edge\">\n<title>140447553041072&#45;&gt;140447553043664</title>\n<path fill=\"none\" stroke=\"black\" d=\"M273.5,-802.98C265.31,-795.23 252.99,-783.58 243.03,-774.14\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"245.33,-771.5 235.66,-767.17 240.52,-776.59 245.33,-771.5\"/>\n</g>\n<!-- 140447553045344 -->\n<g id=\"node6\" class=\"node\">\n<title>140447553045344</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"312,-888 253,-888 253,-858 312,-858 312,-888\"/>\n<text text-anchor=\"middle\" x=\"282.5\" y=\"-876\" font-family=\"monospace\" font-size=\"10.00\">step0.a</text>\n<text text-anchor=\"middle\" x=\"282.5\" y=\"-865\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553045344&#45;&gt;140447553041072 -->\n<g id=\"edge4\" class=\"edge\">\n<title>140447553045344&#45;&gt;140447553041072</title>\n<path fill=\"none\" stroke=\"black\" d=\"M282.5,-857.84C282.5,-850.21 282.5,-840.7 282.5,-832.45\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"286,-832.27 282.5,-822.27 279,-832.27 286,-832.27\"/>\n</g>\n<!-- 140447553041120 -->\n<g id=\"node7\" class=\"node\">\n<title>140447553041120</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"208,-272 119,-272 119,-253 208,-253 208,-272\"/>\n<text text-anchor=\"middle\" x=\"163.5\" y=\"-260\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553041120&#45;&gt;140447553019088 -->\n<g id=\"edge5\" 
class=\"edge\">\n<title>140447553041120&#45;&gt;140447553019088</title>\n<path fill=\"none\" stroke=\"black\" d=\"M163.5,-252.87C163.5,-246.22 163.5,-236.63 163.5,-227.28\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"167,-227.01 163.5,-217.01 160,-227.01 167,-227.01\"/>\n</g>\n<!-- 140447553043040 -->\n<g id=\"node8\" class=\"node\">\n<title>140447553043040</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"208,-327 119,-327 119,-308 208,-308 208,-327\"/>\n<text text-anchor=\"middle\" x=\"163.5\" y=\"-315\" font-family=\"monospace\" font-size=\"10.00\">DivBackward0</text>\n</g>\n<!-- 140447553043040&#45;&gt;140447553041120 -->\n<g id=\"edge6\" class=\"edge\">\n<title>140447553043040&#45;&gt;140447553041120</title>\n<path fill=\"none\" stroke=\"black\" d=\"M163.5,-307.75C163.5,-300.8 163.5,-290.85 163.5,-282.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"167,-282.09 163.5,-272.09 160,-282.09 167,-282.09\"/>\n</g>\n<!-- 140447553043184 -->\n<g id=\"node9\" class=\"node\">\n<title>140447553043184</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"164,-492 75,-492 75,-473 164,-473 164,-492\"/>\n<text text-anchor=\"middle\" x=\"119.5\" y=\"-480\" font-family=\"monospace\" font-size=\"10.00\">DivBackward0</text>\n</g>\n<!-- 140447553043184&#45;&gt;140447553043040 -->\n<g id=\"edge7\" class=\"edge\">\n<title>140447553043184&#45;&gt;140447553043040</title>\n<path fill=\"none\" stroke=\"black\" d=\"M121.52,-472.83C126.12,-453.19 137.92,-403.83 149.5,-363 151.93,-354.43 154.86,-345.01 157.41,-337.05\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"160.8,-337.91 160.56,-327.32 154.15,-335.75 160.8,-337.91\"/>\n</g>\n<!-- 140447553043328 -->\n<g id=\"node10\" class=\"node\">\n<title>140447553043328</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"162,-602 73,-602 73,-583 162,-583 162,-602\"/>\n<text text-anchor=\"middle\" x=\"117.5\" y=\"-590\" font-family=\"monospace\" 
font-size=\"10.00\">AddBackward0</text>\n</g>\n<!-- 140447553043328&#45;&gt;140447553043184 -->\n<g id=\"edge8\" class=\"edge\">\n<title>140447553043328&#45;&gt;140447553043184</title>\n<path fill=\"none\" stroke=\"black\" d=\"M117.66,-582.66C117.99,-565.17 118.72,-525.8 119.15,-502.27\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"122.65,-502.22 119.34,-492.16 115.66,-502.09 122.65,-502.22\"/>\n</g>\n<!-- 140447553043424 -->\n<g id=\"node11\" class=\"node\">\n<title>140447553043424</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"271,-657 182,-657 182,-638 271,-638 271,-657\"/>\n<text text-anchor=\"middle\" x=\"226.5\" y=\"-645\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553043424&#45;&gt;140447553043328 -->\n<g id=\"edge9\" class=\"edge\">\n<title>140447553043424&#45;&gt;140447553043328</title>\n<path fill=\"none\" stroke=\"black\" d=\"M208.99,-637.98C191.53,-629.5 164.49,-616.35 144.33,-606.54\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"145.86,-603.4 135.33,-602.17 142.79,-609.69 145.86,-603.4\"/>\n</g>\n<!-- 140447553043856 -->\n<g id=\"node21\" class=\"node\">\n<title>140447553043856</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"293,-602 180,-602 180,-583 293,-583 293,-602\"/>\n<text text-anchor=\"middle\" x=\"236.5\" y=\"-590\" font-family=\"monospace\" font-size=\"10.00\">AddcmulBackward0</text>\n</g>\n<!-- 140447553043424&#45;&gt;140447553043856 -->\n<g id=\"edge22\" class=\"edge\">\n<title>140447553043424&#45;&gt;140447553043856</title>\n<path fill=\"none\" stroke=\"black\" d=\"M222.98,-637.75C222.8,-630.72 224.28,-620.62 226.58,-611.84\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"229.99,-612.68 229.75,-602.09 223.33,-610.52 229.99,-612.68\"/>\n</g>\n<!-- 140447553043424&#45;&gt;140447553043856 -->\n<g id=\"edge23\" class=\"edge\">\n<title>140447553043424&#45;&gt;140447553043856</title>\n<path fill=\"none\" stroke=\"black\" 
d=\"M233.32,-637.75C236.12,-630.8 238.44,-620.85 239.46,-612.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"242.96,-612.26 240.01,-602.09 235.97,-611.88 242.96,-612.26\"/>\n</g>\n<!-- 140447553043520 -->\n<g id=\"node12\" class=\"node\">\n<title>140447553043520</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"307,-712 146,-712 146,-693 307,-693 307,-712\"/>\n<text text-anchor=\"middle\" x=\"226.5\" y=\"-700\" font-family=\"monospace\" font-size=\"10.00\">MseLossBackwardBackward0</text>\n</g>\n<!-- 140447553043520&#45;&gt;140447553043424 -->\n<g id=\"edge10\" class=\"edge\">\n<title>140447553043520&#45;&gt;140447553043424</title>\n<path fill=\"none\" stroke=\"black\" d=\"M226.5,-692.75C226.5,-685.8 226.5,-675.85 226.5,-667.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"230,-667.09 226.5,-657.09 223,-667.09 230,-667.09\"/>\n</g>\n<!-- 140447553043664&#45;&gt;140447553043520 -->\n<g id=\"edge11\" class=\"edge\">\n<title>140447553043664&#45;&gt;140447553043520</title>\n<path fill=\"none\" stroke=\"black\" d=\"M226.5,-747.75C226.5,-740.8 226.5,-730.85 226.5,-722.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"230,-722.09 226.5,-712.09 223,-722.09 230,-722.09\"/>\n</g>\n<!-- 140447553043472 -->\n<g id=\"node14\" class=\"node\">\n<title>140447553043472</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"162,-822 73,-822 73,-803 162,-803 162,-822\"/>\n<text text-anchor=\"middle\" x=\"117.5\" y=\"-810\" font-family=\"monospace\" font-size=\"10.00\">PowBackward0</text>\n</g>\n<!-- 140447553043472&#45;&gt;140447553043424 -->\n<g id=\"edge16\" class=\"edge\">\n<title>140447553043472&#45;&gt;140447553043424</title>\n<path fill=\"none\" stroke=\"black\" d=\"M116.38,-802.74C114.16,-781.61 111.4,-727.04 136.5,-693 147.65,-677.87 165.38,-667.59 182.16,-660.76\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"183.74,-663.91 191.88,-657.13 181.29,-657.35 183.74,-663.91\"/>\n</g>\n<!-- 
140447553043472&#45;&gt;140447553043664 -->\n<g id=\"edge13\" class=\"edge\">\n<title>140447553043472&#45;&gt;140447553043664</title>\n<path fill=\"none\" stroke=\"black\" d=\"M135.01,-802.98C152.47,-794.5 179.51,-781.35 199.67,-771.54\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"201.21,-774.69 208.67,-767.17 198.14,-768.4 201.21,-774.69\"/>\n</g>\n<!-- 140447553043808 -->\n<g id=\"node15\" class=\"node\">\n<title>140447553043808</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"132,-882.5 31,-882.5 31,-863.5 132,-863.5 132,-882.5\"/>\n<text text-anchor=\"middle\" x=\"81.5\" y=\"-870.5\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447553043808&#45;&gt;140447553043472 -->\n<g id=\"edge14\" class=\"edge\">\n<title>140447553043808&#45;&gt;140447553043472</title>\n<path fill=\"none\" stroke=\"black\" d=\"M86.81,-863.37C92.08,-854.81 100.29,-841.47 106.89,-830.74\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"109.91,-832.52 112.17,-822.17 103.94,-828.85 109.91,-832.52\"/>\n</g>\n<!-- 140447553041264 -->\n<g id=\"node22\" class=\"node\">\n<title>140447553041264</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"89,-767 0,-767 0,-748 89,-748 89,-767\"/>\n<text text-anchor=\"middle\" x=\"44.5\" y=\"-755\" font-family=\"monospace\" font-size=\"10.00\">PowBackward0</text>\n</g>\n<!-- 140447553043808&#45;&gt;140447553041264 -->\n<g id=\"edge25\" class=\"edge\">\n<title>140447553043808&#45;&gt;140447553041264</title>\n<path fill=\"none\" stroke=\"black\" d=\"M78.17,-863.2C74.44,-853.25 68.29,-836.55 63.5,-822 58.54,-806.95 53.42,-789.73 49.75,-777.01\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"53.05,-775.82 46.93,-767.17 46.32,-777.75 53.05,-775.82\"/>\n</g>\n<!-- 140447553045584 -->\n<g id=\"node16\" class=\"node\">\n<title>140447553045584</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"108.5,-954 54.5,-954 54.5,-924 108.5,-924 108.5,-954\"/>\n<text 
text-anchor=\"middle\" x=\"81.5\" y=\"-942\" font-family=\"monospace\" font-size=\"10.00\">x</text>\n<text text-anchor=\"middle\" x=\"81.5\" y=\"-931\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553045584&#45;&gt;140447553043808 -->\n<g id=\"edge15\" class=\"edge\">\n<title>140447553045584&#45;&gt;140447553043808</title>\n<path fill=\"none\" stroke=\"black\" d=\"M81.5,-923.8C81.5,-914.7 81.5,-902.79 81.5,-892.9\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"85,-892.84 81.5,-882.84 78,-892.84 85,-892.84\"/>\n</g>\n<!-- 140447553043136 -->\n<g id=\"node17\" class=\"node\">\n<title>140447553043136</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"247,-382 158,-382 158,-363 247,-363 247,-382\"/>\n<text text-anchor=\"middle\" x=\"202.5\" y=\"-370\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n</g>\n<!-- 140447553043136&#45;&gt;140447553043040 -->\n<g id=\"edge17\" class=\"edge\">\n<title>140447553043136&#45;&gt;140447553043040</title>\n<path fill=\"none\" stroke=\"black\" d=\"M196.06,-362.75C190.61,-355.34 182.64,-344.5 175.94,-335.41\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"178.57,-333.07 169.82,-327.09 172.93,-337.22 178.57,-333.07\"/>\n</g>\n<!-- 140447553043232 -->\n<g id=\"node18\" class=\"node\">\n<title>140447553043232</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"267,-437 172,-437 172,-418 267,-418 267,-437\"/>\n<text text-anchor=\"middle\" x=\"219.5\" y=\"-425\" font-family=\"monospace\" font-size=\"10.00\">SqrtBackward0</text>\n</g>\n<!-- 140447553043232&#45;&gt;140447553043136 -->\n<g id=\"edge18\" class=\"edge\">\n<title>140447553043232&#45;&gt;140447553043136</title>\n<path fill=\"none\" stroke=\"black\" d=\"M216.69,-417.75C214.44,-410.72 211.2,-400.62 208.38,-391.84\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"211.64,-390.54 205.25,-382.09 204.98,-392.68 211.64,-390.54\"/>\n</g>\n<!-- 140447553043760 -->\n<g id=\"node19\" 
class=\"node\">\n<title>140447553043760</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"274,-492 185,-492 185,-473 274,-473 274,-492\"/>\n<text text-anchor=\"middle\" x=\"229.5\" y=\"-480\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n</g>\n<!-- 140447553043760&#45;&gt;140447553043232 -->\n<g id=\"edge19\" class=\"edge\">\n<title>140447553043760&#45;&gt;140447553043232</title>\n<path fill=\"none\" stroke=\"black\" d=\"M227.85,-472.75C226.54,-465.8 224.66,-455.85 223.02,-447.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"226.41,-446.27 221.12,-437.09 219.54,-447.56 226.41,-446.27\"/>\n</g>\n<!-- 140447553043904 -->\n<g id=\"node20\" class=\"node\">\n<title>140447553043904</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"275,-547 186,-547 186,-528 275,-528 275,-547\"/>\n<text text-anchor=\"middle\" x=\"230.5\" y=\"-535\" font-family=\"monospace\" font-size=\"10.00\">DivBackward0</text>\n</g>\n<!-- 140447553043904&#45;&gt;140447553043760 -->\n<g id=\"edge20\" class=\"edge\">\n<title>140447553043904&#45;&gt;140447553043760</title>\n<path fill=\"none\" stroke=\"black\" d=\"M230.33,-527.75C230.2,-520.8 230.02,-510.85 229.85,-502.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"233.35,-502.02 229.66,-492.09 226.35,-502.15 233.35,-502.02\"/>\n</g>\n<!-- 140447553043856&#45;&gt;140447553043904 -->\n<g id=\"edge21\" class=\"edge\">\n<title>140447553043856&#45;&gt;140447553043904</title>\n<path fill=\"none\" stroke=\"black\" d=\"M235.51,-582.75C234.72,-575.8 233.6,-565.85 232.61,-557.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"236.08,-556.63 231.47,-547.09 229.12,-557.42 236.08,-556.63\"/>\n</g>\n<!-- 140447553041264&#45;&gt;140447553042896 -->\n<g id=\"edge24\" class=\"edge\">\n<title>140447553041264&#45;&gt;140447553042896</title>\n<path fill=\"none\" stroke=\"black\" d=\"M44.5,-747.82C44.5,-729.48 44.5,-685.44 44.5,-648.5 44.5,-648.5 44.5,-648.5 44.5,-261.5 44.5,-211.41 91.19,-167.96 
119.45,-146.25\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"121.61,-149 127.55,-140.23 117.44,-143.38 121.61,-149\"/>\n</g>\n</g>\n</svg>\n"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "net = Net()\n",
    "x = nn.Parameter(torch.tensor(2.0), requires_grad=True)\n",
    "y = torch.tensor(1.0)\n",
    "\n",
    "optim = torchopt.MetaAdam(net, lr=1.0, moment_requires_grad=False)\n",
    "\n",
    "net_state_0 = torchopt.extract_state_dict(net, enable_visual=True, visual_prefix='step0.')\n",
    "inner_loss = F.mse_loss(net(x), y)\n",
    "optim.step(inner_loss)\n",
    "net_state_1 = torchopt.extract_state_dict(net, enable_visual=True, visual_prefix='step1.')\n",
    "\n",
    "outer_loss = F.mse_loss(net(x), y)\n",
    "display(\n",
    "    torchopt.visual.make_dot(\n",
    "        outer_loss, params=[net_state_0, net_state_1, {'x': x, 'outer_loss': outer_loss}]\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- When you track the meta-gradient through moment (`moment_requires_grad=True`, default for `torchopt.MetaAdam`)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<graphviz.graphs.Digraph object at 0x7fbc7e8238e0>\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n<!-- Generated by graphviz version 2.42.3 (20191010.1750)\n -->\n<!-- Title: %3 Pages: 1 -->\n<svg width=\"509pt\" height=\"974pt\"\n viewBox=\"0.00 0.00 508.50 974.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 970)\">\n<title>%3</title>\n<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-970 504.5,-970 504.5,4 -4,4\"/>\n<!-- 140447553148704 -->\n<g id=\"node1\" class=\"node\">\n<title>140447553148704</title>\n<polygon fill=\"#caff70\" stroke=\"black\" points=\"323.5,-30 246.5,-30 246.5,0 323.5,0 323.5,-30\"/>\n<text text-anchor=\"middle\" x=\"285\" y=\"-18\" font-family=\"monospace\" font-size=\"10.00\">outer_loss</text>\n<text text-anchor=\"middle\" x=\"285\" y=\"-7\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553041024 -->\n<g id=\"node2\" class=\"node\">\n<title>140447553041024</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"341.5,-85 228.5,-85 228.5,-66 341.5,-66 341.5,-85\"/>\n<text text-anchor=\"middle\" x=\"285\" y=\"-73\" font-family=\"monospace\" font-size=\"10.00\">MseLossBackward0</text>\n</g>\n<!-- 140447553041024&#45;&gt;140447553148704 -->\n<g id=\"edge32\" class=\"edge\">\n<title>140447553041024&#45;&gt;140447553148704</title>\n<path fill=\"none\" stroke=\"black\" d=\"M285,-65.87C285,-59.11 285,-49.35 285,-40.26\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"288.5,-40.11 285,-30.11 281.5,-40.11 288.5,-40.11\"/>\n</g>\n<!-- 140447553043424 -->\n<g id=\"node3\" class=\"node\">\n<title>140447553043424</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"329.5,-140 240.5,-140 240.5,-121 329.5,-121 329.5,-140\"/>\n<text text-anchor=\"middle\" x=\"285\" y=\"-128\" 
font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553043424&#45;&gt;140447553041024 -->\n<g id=\"edge1\" class=\"edge\">\n<title>140447553043424&#45;&gt;140447553041024</title>\n<path fill=\"none\" stroke=\"black\" d=\"M285,-120.75C285,-113.8 285,-103.85 285,-95.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"288.5,-95.09 285,-85.09 281.5,-95.09 288.5,-95.09\"/>\n</g>\n<!-- 140450536407152 -->\n<g id=\"node4\" class=\"node\">\n<title>140450536407152</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"232.5,-217 143.5,-217 143.5,-176 232.5,-176 232.5,-217\"/>\n<text text-anchor=\"middle\" x=\"188\" y=\"-205\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n<text text-anchor=\"middle\" x=\"188\" y=\"-194\" font-family=\"monospace\" font-size=\"10.00\">step1.a</text>\n<text text-anchor=\"middle\" x=\"188\" y=\"-183\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140450536407152&#45;&gt;140447553043424 -->\n<g id=\"edge2\" class=\"edge\">\n<title>140450536407152&#45;&gt;140447553043424</title>\n<path fill=\"none\" stroke=\"black\" d=\"M217.63,-175.95C232.39,-166.21 249.91,-154.65 263.38,-145.76\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"265.56,-148.52 271.98,-140.09 261.7,-142.68 265.56,-148.52\"/>\n</g>\n<!-- 140447553041264 -->\n<g id=\"node5\" class=\"node\">\n<title>140447553041264</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"162.5,-834 61.5,-834 61.5,-815 162.5,-815 162.5,-834\"/>\n<text text-anchor=\"middle\" x=\"112\" y=\"-822\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447553041264&#45;&gt;140450536407152 -->\n<g id=\"edge3\" class=\"edge\">\n<title>140447553041264&#45;&gt;140450536407152</title>\n<path fill=\"none\" stroke=\"black\" d=\"M94.01,-814.75C62.35,-797.94 0,-757.93 0,-703.5 0,-703.5 0,-703.5 0,-316.5 0,-252.73 78.18,-221.72 133.71,-207.74\"/>\n<polygon fill=\"black\" stroke=\"black\" 
points=\"134.54,-211.14 143.45,-205.4 132.91,-204.33 134.54,-211.14\"/>\n</g>\n<!-- 140447553019232 -->\n<g id=\"node16\" class=\"node\">\n<title>140447553019232</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"272.5,-773 183.5,-773 183.5,-754 272.5,-754 272.5,-773\"/>\n<text text-anchor=\"middle\" x=\"228\" y=\"-761\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553041264&#45;&gt;140447553019232 -->\n<g id=\"edge15\" class=\"edge\">\n<title>140447553041264&#45;&gt;140447553019232</title>\n<path fill=\"none\" stroke=\"black\" d=\"M129.12,-814.79C148.33,-805.02 179.72,-789.05 201.98,-777.73\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"203.86,-780.7 211.19,-773.05 200.69,-774.46 203.86,-780.7\"/>\n</g>\n<!-- 140447553148064 -->\n<g id=\"node6\" class=\"node\">\n<title>140447553148064</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"141.5,-900 82.5,-900 82.5,-870 141.5,-870 141.5,-900\"/>\n<text text-anchor=\"middle\" x=\"112\" y=\"-888\" font-family=\"monospace\" font-size=\"10.00\">step0.a</text>\n<text text-anchor=\"middle\" x=\"112\" y=\"-877\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553148064&#45;&gt;140447553041264 -->\n<g id=\"edge4\" class=\"edge\">\n<title>140447553148064&#45;&gt;140447553041264</title>\n<path fill=\"none\" stroke=\"black\" d=\"M112,-869.84C112,-862.21 112,-852.7 112,-844.45\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"115.5,-844.27 112,-834.27 108.5,-844.27 115.5,-844.27\"/>\n</g>\n<!-- 140447553041216 -->\n<g id=\"node7\" class=\"node\">\n<title>140447553041216</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"232.5,-272 143.5,-272 143.5,-253 232.5,-253 232.5,-272\"/>\n<text text-anchor=\"middle\" x=\"188\" y=\"-260\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553041216&#45;&gt;140450536407152 -->\n<g id=\"edge5\" 
class=\"edge\">\n<title>140447553041216&#45;&gt;140450536407152</title>\n<path fill=\"none\" stroke=\"black\" d=\"M188,-252.87C188,-246.22 188,-236.63 188,-227.28\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"191.5,-227.01 188,-217.01 184.5,-227.01 191.5,-227.01\"/>\n</g>\n<!-- 140447553041312 -->\n<g id=\"node8\" class=\"node\">\n<title>140447553041312</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"232.5,-327 143.5,-327 143.5,-308 232.5,-308 232.5,-327\"/>\n<text text-anchor=\"middle\" x=\"188\" y=\"-315\" font-family=\"monospace\" font-size=\"10.00\">DivBackward0</text>\n</g>\n<!-- 140447553041312&#45;&gt;140447553041216 -->\n<g id=\"edge6\" class=\"edge\">\n<title>140447553041312&#45;&gt;140447553041216</title>\n<path fill=\"none\" stroke=\"black\" d=\"M188,-307.75C188,-300.8 188,-290.85 188,-282.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"191.5,-282.09 188,-272.09 184.5,-282.09 191.5,-282.09\"/>\n</g>\n<!-- 140447553041408 -->\n<g id=\"node9\" class=\"node\">\n<title>140447553041408</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"176.5,-437 87.5,-437 87.5,-418 176.5,-418 176.5,-437\"/>\n<text text-anchor=\"middle\" x=\"132\" y=\"-425\" font-family=\"monospace\" font-size=\"10.00\">DivBackward0</text>\n</g>\n<!-- 140447553041408&#45;&gt;140447553041312 -->\n<g id=\"edge7\" class=\"edge\">\n<title>140447553041408&#45;&gt;140447553041312</title>\n<path fill=\"none\" stroke=\"black\" d=\"M136.58,-417.66C145.78,-399.93 166.62,-359.73 178.76,-336.32\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"182.01,-337.65 183.51,-327.16 175.8,-334.42 182.01,-337.65\"/>\n</g>\n<!-- 140447553043376 -->\n<g id=\"node10\" class=\"node\">\n<title>140447553043376</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"173.5,-602 84.5,-602 84.5,-583 173.5,-583 173.5,-602\"/>\n<text text-anchor=\"middle\" x=\"129\" y=\"-590\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n</g>\n<!-- 
140447553043376&#45;&gt;140447553041408 -->\n<g id=\"edge8\" class=\"edge\">\n<title>140447553043376&#45;&gt;140447553041408</title>\n<path fill=\"none\" stroke=\"black\" d=\"M129.16,-582.74C129.63,-557.31 131,-483.08 131.65,-447.69\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"135.15,-447.37 131.84,-437.31 128.15,-447.24 135.15,-447.37\"/>\n</g>\n<!-- 140447553041168 -->\n<g id=\"node11\" class=\"node\">\n<title>140447553041168</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"158.5,-657 69.5,-657 69.5,-638 158.5,-638 158.5,-657\"/>\n<text text-anchor=\"middle\" x=\"114\" y=\"-645\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553041168&#45;&gt;140447553043376 -->\n<g id=\"edge9\" class=\"edge\">\n<title>140447553041168&#45;&gt;140447553043376</title>\n<path fill=\"none\" stroke=\"black\" d=\"M116.48,-637.75C118.46,-630.72 121.32,-620.62 123.81,-611.84\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"127.21,-612.66 126.57,-602.09 120.48,-610.76 127.21,-612.66\"/>\n</g>\n<!-- 140447553042272 -->\n<g id=\"node12\" class=\"node\">\n<title>140447553042272</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"129.5,-712 28.5,-712 28.5,-693 129.5,-693 129.5,-712\"/>\n<text text-anchor=\"middle\" x=\"79\" y=\"-700\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447553042272&#45;&gt;140447553041168 -->\n<g id=\"edge10\" class=\"edge\">\n<title>140447553042272&#45;&gt;140447553041168</title>\n<path fill=\"none\" stroke=\"black\" d=\"M84.78,-692.75C89.62,-685.42 96.68,-674.73 102.64,-665.7\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"105.74,-667.36 108.33,-657.09 99.9,-663.5 105.74,-667.36\"/>\n</g>\n<!-- 140450290826352 -->\n<g id=\"node13\" class=\"node\">\n<title>140450290826352</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"106,-779 52,-779 52,-748 106,-748 106,-779\"/>\n<text text-anchor=\"middle\" x=\"79\" y=\"-755\" 
font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140450290826352&#45;&gt;140447553042272 -->\n<g id=\"edge11\" class=\"edge\">\n<title>140450290826352&#45;&gt;140447553042272</title>\n<path fill=\"none\" stroke=\"black\" d=\"M79,-747.92C79,-740.22 79,-730.69 79,-722.43\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"82.5,-722.25 79,-712.25 75.5,-722.25 82.5,-722.25\"/>\n</g>\n<!-- 140447553044432 -->\n<g id=\"node14\" class=\"node\">\n<title>140447553044432</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"292.5,-657 203.5,-657 203.5,-638 292.5,-638 292.5,-657\"/>\n<text text-anchor=\"middle\" x=\"248\" y=\"-645\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553044432&#45;&gt;140447553043376 -->\n<g id=\"edge12\" class=\"edge\">\n<title>140447553044432&#45;&gt;140447553043376</title>\n<path fill=\"none\" stroke=\"black\" d=\"M228.88,-637.98C209.65,-629.42 179.77,-616.11 157.69,-606.28\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"159.03,-603.04 148.47,-602.17 156.18,-609.44 159.03,-603.04\"/>\n</g>\n<!-- 140447553018320 -->\n<g id=\"node24\" class=\"node\">\n<title>140447553018320</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"304.5,-602 191.5,-602 191.5,-583 304.5,-583 304.5,-602\"/>\n<text text-anchor=\"middle\" x=\"248\" y=\"-590\" font-family=\"monospace\" font-size=\"10.00\">AddcmulBackward0</text>\n</g>\n<!-- 140447553044432&#45;&gt;140447553018320 -->\n<g id=\"edge28\" class=\"edge\">\n<title>140447553044432&#45;&gt;140447553018320</title>\n<path fill=\"none\" stroke=\"black\" d=\"M242.83,-637.75C241.34,-630.8 240.9,-620.85 241.52,-612.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"245.01,-612.47 242.87,-602.09 238.07,-611.53 245.01,-612.47\"/>\n</g>\n<!-- 140447553044432&#45;&gt;140447553018320 -->\n<g id=\"edge29\" class=\"edge\">\n<title>140447553044432&#45;&gt;140447553018320</title>\n<path fill=\"none\" stroke=\"black\" 
d=\"M253.17,-637.75C254.66,-630.8 255.1,-620.85 254.48,-612.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"257.93,-611.53 253.13,-602.09 250.99,-612.47 257.93,-611.53\"/>\n</g>\n<!-- 140447553042080 -->\n<g id=\"node15\" class=\"node\">\n<title>140447553042080</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"308.5,-712 147.5,-712 147.5,-693 308.5,-693 308.5,-712\"/>\n<text text-anchor=\"middle\" x=\"228\" y=\"-700\" font-family=\"monospace\" font-size=\"10.00\">MseLossBackwardBackward0</text>\n</g>\n<!-- 140447553042080&#45;&gt;140447553044432 -->\n<g id=\"edge13\" class=\"edge\">\n<title>140447553042080&#45;&gt;140447553044432</title>\n<path fill=\"none\" stroke=\"black\" d=\"M231.3,-692.75C233.98,-685.65 237.85,-675.4 241.19,-666.56\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"244.5,-667.68 244.76,-657.09 237.95,-665.21 244.5,-667.68\"/>\n</g>\n<!-- 140447553019232&#45;&gt;140447553042080 -->\n<g id=\"edge14\" class=\"edge\">\n<title>140447553019232&#45;&gt;140447553042080</title>\n<path fill=\"none\" stroke=\"black\" d=\"M228,-753.79C228,-745.6 228,-733.06 228,-722.55\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"231.5,-722.24 228,-712.24 224.5,-722.24 231.5,-722.24\"/>\n</g>\n<!-- 140447553019088 -->\n<g id=\"node17\" class=\"node\">\n<title>140447553019088</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"381.5,-834 292.5,-834 292.5,-815 381.5,-815 381.5,-834\"/>\n<text text-anchor=\"middle\" x=\"337\" y=\"-822\" font-family=\"monospace\" font-size=\"10.00\">PowBackward0</text>\n</g>\n<!-- 140447553019088&#45;&gt;140447553044432 -->\n<g id=\"edge19\" class=\"edge\">\n<title>140447553019088&#45;&gt;140447553044432</title>\n<path fill=\"none\" stroke=\"black\" d=\"M338.08,-814.72C340.35,-792.38 343.51,-732.4 318,-693 309.03,-679.15 294.07,-668.79 280.28,-661.56\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"281.53,-658.27 271.01,-657.05 278.47,-664.57 281.53,-658.27\"/>\n</g>\n<!-- 
140447553019088&#45;&gt;140447553019232 -->\n<g id=\"edge16\" class=\"edge\">\n<title>140447553019088&#45;&gt;140447553019232</title>\n<path fill=\"none\" stroke=\"black\" d=\"M320.92,-814.79C302.94,-805.07 273.63,-789.2 252.73,-777.89\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"254.26,-774.73 243.79,-773.05 250.92,-780.89 254.26,-774.73\"/>\n</g>\n<!-- 140447553018464 -->\n<g id=\"node18\" class=\"node\">\n<title>140447553018464</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"466.5,-894.5 365.5,-894.5 365.5,-875.5 466.5,-875.5 466.5,-894.5\"/>\n<text text-anchor=\"middle\" x=\"416\" y=\"-882.5\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447553018464&#45;&gt;140447553019088 -->\n<g id=\"edge17\" class=\"edge\">\n<title>140447553018464&#45;&gt;140447553019088</title>\n<path fill=\"none\" stroke=\"black\" d=\"M404.34,-875.37C391.86,-866.12 371.82,-851.28 356.84,-840.19\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"358.82,-837.31 348.7,-834.17 354.66,-842.93 358.82,-837.31\"/>\n</g>\n<!-- 140447553043328 -->\n<g id=\"node28\" class=\"node\">\n<title>140447553043328</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"500.5,-492 411.5,-492 411.5,-473 500.5,-473 500.5,-492\"/>\n<text text-anchor=\"middle\" x=\"456\" y=\"-480\" font-family=\"monospace\" font-size=\"10.00\">PowBackward0</text>\n</g>\n<!-- 140447553018464&#45;&gt;140447553043328 -->\n<g id=\"edge31\" class=\"edge\">\n<title>140447553018464&#45;&gt;140447553043328</title>\n<path fill=\"none\" stroke=\"black\" d=\"M426.98,-875.42C448.69,-857.4 495,-813.25 495,-764.5 495,-764.5 495,-764.5 495,-591.5 495,-557.92 478.12,-521.78 466.57,-500.98\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"469.46,-498.98 461.42,-492.08 463.4,-502.49 469.46,-498.98\"/>\n</g>\n<!-- 140447553148144 -->\n<g id=\"node19\" class=\"node\">\n<title>140447553148144</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"443,-966 389,-966 
389,-936 443,-936 443,-966\"/>\n<text text-anchor=\"middle\" x=\"416\" y=\"-954\" font-family=\"monospace\" font-size=\"10.00\">x</text>\n<text text-anchor=\"middle\" x=\"416\" y=\"-943\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553148144&#45;&gt;140447553018464 -->\n<g id=\"edge18\" class=\"edge\">\n<title>140447553148144&#45;&gt;140447553018464</title>\n<path fill=\"none\" stroke=\"black\" d=\"M416,-935.8C416,-926.7 416,-914.79 416,-904.9\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"419.5,-904.84 416,-894.84 412.5,-904.84 419.5,-904.84\"/>\n</g>\n<!-- 140447553041456 -->\n<g id=\"node20\" class=\"node\">\n<title>140447553041456</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"264.5,-382 175.5,-382 175.5,-363 264.5,-363 264.5,-382\"/>\n<text text-anchor=\"middle\" x=\"220\" y=\"-370\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n</g>\n<!-- 140447553041456&#45;&gt;140447553041312 -->\n<g id=\"edge20\" class=\"edge\">\n<title>140447553041456&#45;&gt;140447553041312</title>\n<path fill=\"none\" stroke=\"black\" d=\"M214.72,-362.75C210.29,-355.42 203.84,-344.73 198.38,-335.7\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"201.35,-333.84 193.19,-327.09 195.36,-337.46 201.35,-333.84\"/>\n</g>\n<!-- 140447553041360 -->\n<g id=\"node21\" class=\"node\">\n<title>140447553041360</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"290.5,-437 195.5,-437 195.5,-418 290.5,-418 290.5,-437\"/>\n<text text-anchor=\"middle\" x=\"243\" y=\"-425\" font-family=\"monospace\" font-size=\"10.00\">SqrtBackward0</text>\n</g>\n<!-- 140447553041360&#45;&gt;140447553041456 -->\n<g id=\"edge21\" class=\"edge\">\n<title>140447553041360&#45;&gt;140447553041456</title>\n<path fill=\"none\" stroke=\"black\" d=\"M239.2,-417.75C236.09,-410.57 231.58,-400.18 227.71,-391.27\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"230.92,-389.87 223.73,-382.09 224.5,-392.66 230.92,-389.87\"/>\n</g>\n<!-- 
140447553015920 -->\n<g id=\"node22\" class=\"node\">\n<title>140447553015920</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"288.5,-492 199.5,-492 199.5,-473 288.5,-473 288.5,-492\"/>\n<text text-anchor=\"middle\" x=\"244\" y=\"-480\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n</g>\n<!-- 140447553015920&#45;&gt;140447553041360 -->\n<g id=\"edge22\" class=\"edge\">\n<title>140447553015920&#45;&gt;140447553041360</title>\n<path fill=\"none\" stroke=\"black\" d=\"M243.83,-472.75C243.7,-465.8 243.52,-455.85 243.35,-447.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"246.85,-447.02 243.16,-437.09 239.85,-447.15 246.85,-447.02\"/>\n</g>\n<!-- 140447553018560 -->\n<g id=\"node23\" class=\"node\">\n<title>140447553018560</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"288.5,-547 199.5,-547 199.5,-528 288.5,-528 288.5,-547\"/>\n<text text-anchor=\"middle\" x=\"244\" y=\"-535\" font-family=\"monospace\" font-size=\"10.00\">DivBackward0</text>\n</g>\n<!-- 140447553018560&#45;&gt;140447553015920 -->\n<g id=\"edge23\" class=\"edge\">\n<title>140447553018560&#45;&gt;140447553015920</title>\n<path fill=\"none\" stroke=\"black\" d=\"M244,-527.75C244,-520.8 244,-510.85 244,-502.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"247.5,-502.09 244,-492.09 240.5,-502.09 247.5,-502.09\"/>\n</g>\n<!-- 140447553018320&#45;&gt;140447553018560 -->\n<g id=\"edge24\" class=\"edge\">\n<title>140447553018320&#45;&gt;140447553018560</title>\n<path fill=\"none\" stroke=\"black\" d=\"M247.34,-582.75C246.82,-575.8 246.06,-565.85 245.41,-557.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"248.89,-556.8 244.65,-547.09 241.91,-557.32 248.89,-556.8\"/>\n</g>\n<!-- 140447553018272 -->\n<g id=\"node25\" class=\"node\">\n<title>140447553018272</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"420.5,-657 331.5,-657 331.5,-638 420.5,-638 420.5,-657\"/>\n<text text-anchor=\"middle\" x=\"376\" y=\"-645\" 
font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447553018272&#45;&gt;140447553018320 -->\n<g id=\"edge25\" class=\"edge\">\n<title>140447553018272&#45;&gt;140447553018320</title>\n<path fill=\"none\" stroke=\"black\" d=\"M355.43,-637.98C334.57,-629.34 302.03,-615.87 278.23,-606.02\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"279.52,-602.76 268.94,-602.17 276.84,-609.23 279.52,-602.76\"/>\n</g>\n<!-- 140447553018944 -->\n<g id=\"node26\" class=\"node\">\n<title>140447553018944</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"466.5,-712 365.5,-712 365.5,-693 466.5,-693 466.5,-712\"/>\n<text text-anchor=\"middle\" x=\"416\" y=\"-700\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447553018944&#45;&gt;140447553018272 -->\n<g id=\"edge26\" class=\"edge\">\n<title>140447553018944&#45;&gt;140447553018272</title>\n<path fill=\"none\" stroke=\"black\" d=\"M409.39,-692.75C403.74,-685.26 395.46,-674.28 388.55,-665.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"391.3,-662.96 382.48,-657.09 385.71,-667.18 391.3,-662.96\"/>\n</g>\n<!-- 140450290824272 -->\n<g id=\"node27\" class=\"node\">\n<title>140450290824272</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"443,-779 389,-779 389,-748 443,-748 443,-779\"/>\n<text text-anchor=\"middle\" x=\"416\" y=\"-755\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140450290824272&#45;&gt;140447553018944 -->\n<g id=\"edge27\" class=\"edge\">\n<title>140450290824272&#45;&gt;140447553018944</title>\n<path fill=\"none\" stroke=\"black\" d=\"M416,-747.92C416,-740.22 416,-730.69 416,-722.43\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"419.5,-722.25 416,-712.25 412.5,-722.25 419.5,-722.25\"/>\n</g>\n<!-- 140447553043328&#45;&gt;140447553043424 -->\n<g id=\"edge30\" class=\"edge\">\n<title>140447553043328&#45;&gt;140447553043424</title>\n<path fill=\"none\" stroke=\"black\" 
d=\"M450.44,-472.94C439.45,-455.18 416,-412.69 416,-373.5 416,-373.5 416,-373.5 416,-261.5 416,-204.49 352.72,-163.93 314.12,-144.49\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"315.42,-141.23 304.9,-140.01 312.36,-147.53 315.42,-141.23\"/>\n</g>\n</g>\n</svg>\n"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "net = Net()\n",
    "x = nn.Parameter(torch.tensor(2.0), requires_grad=True)\n",
    "y = torch.tensor(1.0)\n",
    "\n",
    "optim = torchopt.MetaAdam(net, lr=1.0, moment_requires_grad=True)\n",
    "\n",
    "net_state_0 = torchopt.extract_state_dict(net, enable_visual=True, visual_prefix='step0.')\n",
    "inner_loss = F.mse_loss(net(x), y)\n",
    "optim.step(inner_loss)\n",
    "net_state_1 = torchopt.extract_state_dict(net, enable_visual=True, visual_prefix='step1.')\n",
    "\n",
    "outer_loss = F.mse_loss(net(x), y)\n",
    "display(\n",
    "    torchopt.visual.make_dot(\n",
    "        outer_loss, params=[net_state_0, net_state_1, {'x': x, 'outer_loss': outer_loss}]\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can see that the additional moment terms are added into the computational graph when we set `moment_requires_grad=True`."
   ]
  },
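  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see why the moments enter the graph, it can help to recall the textbook Adam update, sketched here for the inner step on $a$ with gradient $g_t = \\frac{\\partial \\mathcal{L}^{\\textrm{in}}}{\\partial a_{t-1}}$. The symbols $\\beta_1$, $\\beta_2$, $\\epsilon$, $m_t$ and $v_t$ are the usual Adam notation and are assumptions of this sketch, not names taken from the TorchOpt code; the exact placement of $\\epsilon$ and of the bias correction in TorchOpt may differ.\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "    m_t       & = \\beta_1 \\, m_{t-1} + (1 - \\beta_1) \\, g_t \\\\\n",
    "    v_t       & = \\beta_2 \\, v_{t-1} + (1 - \\beta_2) \\, g_t^2 \\\\\n",
    "    \\hat{m}_t & = \\frac{m_t}{1 - \\beta_1^t}, \\qquad \\hat{v}_t = \\frac{v_t}{1 - \\beta_2^t} \\\\\n",
    "    a_t       & = a_{t-1} - \\eta \\, \\frac{\\hat{m}_t}{\\sqrt{\\hat{v}_t} + \\epsilon}\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "Since $g_t$ depends on the meta-parameter $x$, so do $m_t$, $v_t$ and the updated $a_t$. Under this reading, setting `moment_requires_grad=True` additionally makes the initial moments $m_0$ and $v_0$ leaves of the graph, which would account for the extra leaf nodes in the visualization above."
   ]
  },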
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Extract and Recover"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.1 Basic API\n",
    "\n",
    "We observe that how to reinitialize the inner-loop parameter in a new bi-level process vary in different meta-learning algorithms. For instance, in algorithm like Model-Agnostic Meta-Learning (MAML) ([arXiv:1703.03400](https://arxiv.org/abs/1703.03400)), every time a new task comes, we need to reset the parameters to the initial ones. In other cases such as Meta-Gradient Reinforcement Learning (MGRL) ([arXiv:1805.09801](https://arxiv.org/abs/1805.09801)), the inner-loop network parameter just inherit previous updated parameter to continue the new bi-level process.\n",
    "\n",
    "We provide the `torchopt.extract_state_dict` and `torchopt.recover_state_dict` functions to extract and restore the state of network and optimizer. By default, the extracted state dictionary is a reference (this design is for accumulating gradient of multi-task batch training, MAML for example). You can also set `by='copy'` to extract the copy of the state dictionary or set `by='deepcopy'` to have a detached copy."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a = tensor(-1.0000, grad_fn=<AddBackward0>)\n",
      "a = tensor(-1.0000, grad_fn=<AddBackward0>)\n"
     ]
    }
   ],
   "source": [
    "net = Net()\n",
    "x = nn.Parameter(torch.tensor(2.0), requires_grad=True)\n",
    "\n",
    "optim = torchopt.MetaAdam(net, lr=1.0)\n",
    "\n",
    "# Get the reference of state dictionary\n",
    "init_net_state = torchopt.extract_state_dict(net, by='reference')\n",
    "init_optim_state = torchopt.extract_state_dict(optim, by='reference')\n",
    "# If set `detach_buffers=True`, the parameters are referenced as references while buffers are detached copies\n",
    "init_net_state = torchopt.extract_state_dict(net, by='reference', detach_buffers=True)\n",
    "\n",
    "# Set `copy` to get the copy of the state dictionary\n",
    "init_net_state_copy = torchopt.extract_state_dict(net, by='copy')\n",
    "init_optim_state_copy = torchopt.extract_state_dict(optim, by='copy')\n",
    "\n",
    "# Set `deepcopy` to get the detached copy of state dictionary\n",
    "init_net_state_deepcopy = torchopt.extract_state_dict(net, by='deepcopy')\n",
    "init_optim_state_deepcopy = torchopt.extract_state_dict(optim, by='deepcopy')\n",
    "\n",
    "# Conduct 2 inner-loop optimization\n",
    "for i in range(2):\n",
    "    inner_loss = net(x)\n",
    "    optim.step(inner_loss)\n",
    "\n",
    "print(f'a = {net.a!r}')\n",
    "\n",
    "# Recover and reconduct 2 inner-loop optimization\n",
    "torchopt.recover_state_dict(net, init_net_state)\n",
    "torchopt.recover_state_dict(optim, init_optim_state)\n",
    "\n",
    "for i in range(2):\n",
    "    inner_loss = net(x)\n",
    "    optim.step(inner_loss)\n",
    "\n",
    "print(f'a = {net.a!r}')  # the same result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 2.2 Multi-task Example with `extract_state_dict` and `recover_state_dict`\n",
    "\n",
    "Let's move to another more complex setting. Meta-Learning algorithms always fix network on several different tasks and accumulate outer loss of each task to the meta-gradient.\n",
    "\n",
    "Assume $x$ is a meta-parameter and $a$ is a normal parameter. We firstly update $a$ use inner loss $\\mathcal{L}_1^{\\textrm{in}} = a_0 \\cdot x^2$ to $a_1$. Then we use $a_1$ to compute the outer loss $\\mathcal{L}_1^{\\textrm{out}} = a_1 \\cdot x^2$ and backpropagate it. Then we use $a_0$ to compute the inner loss $\\mathcal{L}_2^{\\textrm{in}} = a_0 \\cdot x$ and update $a_0$ to $a_2 = a_0 - \\eta \\, \\frac{\\partial \\mathcal{L}_2^{\\textrm{in}}}{\\partial a_0} = a_0 - \\eta \\, x$. Then we compute outer loss $\\mathcal{L}_2^{\\textrm{out}} = a_2 \\cdot x$ and backpropagate it. So the accumulated meta-gradient would be:\n",
    "\n",
    "$$\n",
    "\\begin{split}\n",
    "        \\frac{\\partial \\mathcal{L}_1^{\\textrm{out}}}{\\partial x} + \\frac{\\partial \\mathcal{L}_2^{\\textrm{out}}}{\\partial x}\n",
    "    & = (- 4 \\, \\eta \\, x^3 + 2 \\, a_0 \\, x) + \\frac{\\partial (a_2 \\cdot x)}{\\partial x} \\\\\n",
    "    & = (- 4 \\, \\eta \\, x^3 + 2 \\, a_0 \\, x) + (\\frac{\\partial a_2}{\\partial x} \\cdot x + a_2) \\\\\n",
    "    & = (- 4 \\, \\eta \\, x^3 + 2 \\, a_0 \\, x) + [\\frac{\\partial (a_0 - \\eta \\, x)}{\\partial x} \\cdot x + (a_0 - \\eta \\, x)] \\\\\n",
    "    & = (- 4 \\, \\eta \\, x^3 + 2 \\, a_0 \\, x) + [(- \\eta) \\cdot x + (a_0 - \\eta \\, x)] \\\\\n",
    "    & = (- 4 \\, \\eta \\, x^3 + 2 \\, a_0 \\, x) + (- 2 \\, \\eta \\, x + a_0)\n",
    "\\end{split}\n",
    "$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's define the network and variables first."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Net2Tasks(nn.Module):\n",
    "    def __init__(self):\n",
    "        super().__init__()\n",
    "        self.a = nn.Parameter(torch.tensor(1.0), requires_grad=True)\n",
    "\n",
    "    def task1(self, x):\n",
    "        return self.a * x**2\n",
    "\n",
    "    def task2(self, x):\n",
    "        return self.a * x\n",
    "\n",
    "\n",
    "net = Net2Tasks()\n",
    "x = nn.Parameter(torch.tensor(2.0), requires_grad=True)\n",
    "\n",
    "optim = torchopt.MetaSGD(net, lr=1.0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Once we call `step` method of `MetaOptimizer`, the parameters of the network would be changed. We should use `torchopt.extract_state_dict` to extract state and use `torchopt.recover_state_dict` to recover the state. Note that if we use optimizers that have momentum buffers, we should also extract and recover them, vanilla SGD does not have momentum buffers so code `init_optim_state = torchopt.extract_state_dict(optim)` and `torchopt.recover_state_dict(optim, init_optim_state)` have no effect."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "init_optim_state = ((EmptyState(),),)\n",
      "Task 1: x.grad = tensor(-28.)\n",
      "Accumulated: x.grad = tensor(-31.)\n"
     ]
    }
   ],
   "source": [
    "# Get the reference of state dictionary\n",
    "init_net_state = torchopt.extract_state_dict(net, by='reference')\n",
    "init_optim_state = torchopt.extract_state_dict(optim, by='reference')\n",
    "# The `state_dict` is empty for vanilla SGD optimizer\n",
    "print(f'init_optim_state = {init_optim_state!r}')\n",
    "\n",
    "inner_loss_1 = net.task1(x)\n",
    "optim.step(inner_loss_1)\n",
    "outer_loss_1 = net.task1(x)\n",
    "outer_loss_1.backward()\n",
    "print(f'Task 1: x.grad = {x.grad!r}')\n",
    "\n",
    "torchopt.recover_state_dict(net, init_net_state)\n",
    "torchopt.recover_state_dict(optim, init_optim_state)\n",
    "inner_loss_2 = net.task2(x)\n",
    "optim.step(inner_loss_2)\n",
    "outer_loss_2 = net.task2(x)\n",
    "outer_loss_2.backward()\n",
    "\n",
    "# `extract_state_dict`` extracts the reference so gradient accumulates\n",
    "# x.grad = (- 4 * lr * x^3 + 2 * a_0 * x) + (- 2 * lr * x + a_0)\n",
    "#        = (- 4 * 1 * 2^3 + 2 * 1 * 2) + (- 2 * 1 * 2 + 1)\n",
    "#        = -28 - 3\n",
    "#        = -31\n",
    "print(f'Accumulated: x.grad = {x.grad!r}')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Gradient Transformation in `MetaOptimizer`"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also use some gradient normalization tricks in our `MetaOptimizer`. In fact `MetaOptimizer` decedents like `MetaSGD` are specializations of `MetaOptimizer`. Specifically, `MetaSGD(net, lr=1.)` is `MetaOptimizer(net, alias.sgd(lr=1., moment_requires_grad=True))`, where flag `moment_requires_grad=True` means the momentums are created with flag `requires_grad=True` so the momentums will also be the part of the computation graph.\n",
    "\n",
    "In the designing of TorchOpt, we treat these functions as derivations of `combine.chain`. So we can build our own chain like `combine.chain(clip.clip_grad_norm(max_norm=1.), sgd(lr=1., requires_grad=True))` to clip the gradient and update parameters using `sgd`.\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "        \\frac{\\partial \\mathcal{L}^{\\textrm{out}}}{\\partial x}\n",
    "    & = \\frac{\\partial (a_1 \\cdot x^2)}{\\partial x} \\\\\n",
    "    & = \\frac{\\partial a_1}{\\partial x} \\cdot x^2 + a_1 \\cdot \\frac{\\partial (x^2)}{\\partial x} \\\\\n",
    "    & = \\frac{\\partial (a_0 - \\eta \\, g)}{\\partial x} \\cdot x^2 + (a_0 - \\eta \\, g) \\cdot 2 x                                  & \\qquad (g \\propto \\frac{\\partial \\mathcal{L}^{\\textrm{in}}}{\\partial a_0} = x^2, \\  {\\lVert g \\rVert}_2 \\le G_{\\max}) \\\\\n",
    "    & = \\frac{\\partial (a_0 - \\eta \\, \\beta^{-1} \\, x^2)}{\\partial x} \\cdot x^2 + (a_0 - \\eta \\, \\beta^{-1} \\, x^2) \\cdot 2 x  & \\qquad (g = \\beta^{-1} \\, x^2, \\   \\beta > 0, \\  {\\lVert g \\rVert}_2 \\le G_{\\max}) \\\\\n",
    "    & = (- \\beta^{-1} \\, \\eta \\cdot 2 x) \\cdot x^2 + (a_0 - \\beta^{-1} \\, \\eta \\, x^2) \\cdot 2 x \\\\\n",
    "    & = - 4 \\, \\beta^{-1} \\, \\eta \\, x^3 + 2 \\, a_0 \\, x\n",
    "\\end{aligned}\n",
    "$$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "x.grad = tensor(-12.0000)\n"
     ]
    }
   ],
   "source": [
    "net = Net()\n",
    "x = nn.Parameter(torch.tensor(2.0), requires_grad=True)\n",
    "\n",
    "optim_impl = torchopt.combine.chain(\n",
    "    torchopt.clip.clip_grad_norm(max_norm=2.0),\n",
    "    torchopt.sgd(lr=1.0, moment_requires_grad=True),\n",
    ")\n",
    "optim = torchopt.MetaOptimizer(net, optim_impl)\n",
    "\n",
    "inner_loss = net(x)\n",
    "optim.step(inner_loss)\n",
    "\n",
    "outer_loss = net(x)\n",
    "outer_loss.backward()\n",
    "# Since `max_norm` is 2 and the gradient is x^2, so the scale = x^2 / 2 = 2^2 / 2 = 2\n",
    "# x.grad = - 4 * lr * x^3 / scale + 2 * a_0 * x\n",
    "#        = - 4 * 1 * 2^3 / 2 + 2 * 1 * 2\n",
    "#        = -16 + 4\n",
    "#        = -12\n",
    "print(f'x.grad = {x.grad!r}')"
   ]
  },
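  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a check on the numbers above, the scale $\\beta$ can be written out explicitly. This sketch assumes the usual norm-clipping rule, in which the gradient is rescaled only when its norm exceeds $G_{\\max}$, and it ignores any small numerical stabilizer in the denominator. As in the derivation above, $\\beta$ is treated as a constant when differentiating with respect to $x$.\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "    \\beta\n",
    "    & = \\max \\left( 1, \\frac{{\\lVert x^2 \\rVert}_2}{G_{\\max}} \\right) = \\max \\left( 1, \\frac{2^2}{2} \\right) = 2 \\\\\n",
    "    \\frac{\\partial \\mathcal{L}^{\\textrm{out}}}{\\partial x}\n",
    "    & = - 4 \\, \\beta^{-1} \\, \\eta \\, x^3 + 2 \\, a_0 \\, x = - 4 \\cdot \\frac{1}{2} \\cdot 1 \\cdot 2^3 + 2 \\cdot 1 \\cdot 2 = -12\n",
    "\\end{aligned}\n",
    "$$"
   ]
  },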
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Learning Rate Scheduler\n",
    "\n",
    "TorchOpt also provides implementation of learning rate scheduler, which can be used as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "functional_adam = torchopt.adam(\n",
    "    lr=torchopt.schedule.linear_schedule(\n",
    "        init_value=1e-3, end_value=1e-4, transition_steps=10000, transition_begin=2000\n",
    "    )\n",
    ")\n",
    "\n",
    "adam = torchopt.Adam(\n",
    "    net.parameters(),\n",
    "    lr=torchopt.schedule.linear_schedule(\n",
    "        init_value=1e-3, end_value=1e-4, transition_steps=10000, transition_begin=2000\n",
    "    ),\n",
    ")\n",
    "\n",
    "meta_adam = torchopt.MetaAdam(\n",
    "    net,\n",
    "    lr=torchopt.schedule.linear_schedule(\n",
    "        init_value=1e-3, end_value=1e-4, transition_steps=10000, transition_begin=2000\n",
    "    ),\n",
    ")"
   ]
  },
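  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a rough sketch of what the schedule above computes, assume that `linear_schedule` interpolates linearly from `init_value` to `end_value` over `transition_steps` steps, starting at step `transition_begin`. The symbols $\\eta_0$, $\\eta_T$, $T$ and $t_0$ below are shorthand introduced here for those four arguments, not TorchOpt names. Under that assumption, the learning rate at step $t$ would be:\n",
    "\n",
    "$$\n",
    "\\eta (t) = \\eta_0 + (\\eta_T - \\eta_0) \\cdot \\min \\left( \\max \\left( \\frac{t - t_0}{T}, 0 \\right), 1 \\right)\n",
    "$$\n",
    "\n",
    "With the values used above ($\\eta_0 = 10^{-3}$, $\\eta_T = 10^{-4}$, $T = 10000$, $t_0 = 2000$), the learning rate would stay at $10^{-3}$ for the first 2000 steps, equal $10^{-3} + (10^{-4} - 10^{-3}) \\cdot 0.5 = 5.5 \\times 10^{-4}$ at step 7000, and stay at $10^{-4}$ from step 12000 onward."
   ]
  },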
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 5. Accelerated Optimizer\n",
    "\n",
    "Users can use accelerated optimizer by setting the `use_accelerated_op=True`. Currently we only support the Adam optimizer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Check whether the `accelerated_op` is available:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "torchopt.accelerated_op_available(torch.device('cpu'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n"
     ]
    }
   ],
   "source": [
    "torchopt.accelerated_op_available(torch.device('cuda'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<graphviz.graphs.Digraph object at 0x7fbd302aafd0>\n"
     ]
    },
    {
     "data": {
      "image/svg+xml": "<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n<!DOCTYPE svg PUBLIC \"-//W3C//DTD SVG 1.1//EN\"\n \"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd\">\n<!-- Generated by graphviz version 2.42.3 (20191010.1750)\n -->\n<!-- Title: %3 Pages: 1 -->\n<svg width=\"542pt\" height=\"778pt\"\n viewBox=\"0.00 0.00 542.00 778.00\" xmlns=\"http://www.w3.org/2000/svg\" xmlns:xlink=\"http://www.w3.org/1999/xlink\">\n<g id=\"graph0\" class=\"graph\" transform=\"scale(1 1) rotate(0) translate(4 774)\">\n<title>%3</title>\n<polygon fill=\"white\" stroke=\"transparent\" points=\"-4,4 -4,-774 538,-774 538,4 -4,4\"/>\n<!-- 140450290825712 -->\n<g id=\"node1\" class=\"node\">\n<title>140450290825712</title>\n<polygon fill=\"#caff70\" stroke=\"black\" points=\"454.5,-30 377.5,-30 377.5,0 454.5,0 454.5,-30\"/>\n<text text-anchor=\"middle\" x=\"416\" y=\"-18\" font-family=\"monospace\" font-size=\"10.00\">outer_loss</text>\n<text text-anchor=\"middle\" x=\"416\" y=\"-7\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140450533650240 -->\n<g id=\"node2\" class=\"node\">\n<title>140450533650240</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"472.5,-85 359.5,-85 359.5,-66 472.5,-66 472.5,-85\"/>\n<text text-anchor=\"middle\" x=\"416\" y=\"-73\" font-family=\"monospace\" font-size=\"10.00\">MseLossBackward0</text>\n</g>\n<!-- 140450533650240&#45;&gt;140450290825712 -->\n<g id=\"edge31\" class=\"edge\">\n<title>140450533650240&#45;&gt;140450290825712</title>\n<path fill=\"none\" stroke=\"black\" d=\"M416,-65.87C416,-59.11 416,-49.35 416,-40.26\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"419.5,-40.11 416,-30.11 412.5,-40.11 419.5,-40.11\"/>\n</g>\n<!-- 140450533648560 -->\n<g id=\"node3\" class=\"node\">\n<title>140450533648560</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"460.5,-140 371.5,-140 371.5,-121 460.5,-121 460.5,-140\"/>\n<text text-anchor=\"middle\" x=\"416\" y=\"-128\" 
font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140450533648560&#45;&gt;140450533650240 -->\n<g id=\"edge1\" class=\"edge\">\n<title>140450533648560&#45;&gt;140450533650240</title>\n<path fill=\"none\" stroke=\"black\" d=\"M416,-120.75C416,-113.8 416,-103.85 416,-95.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"419.5,-95.09 416,-85.09 412.5,-95.09 419.5,-95.09\"/>\n</g>\n<!-- 140450533647456 -->\n<g id=\"node4\" class=\"node\">\n<title>140450533647456</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"336.5,-217 247.5,-217 247.5,-176 336.5,-176 336.5,-217\"/>\n<text text-anchor=\"middle\" x=\"292\" y=\"-205\" font-family=\"monospace\" font-size=\"10.00\">AddBackward0</text>\n<text text-anchor=\"middle\" x=\"292\" y=\"-194\" font-family=\"monospace\" font-size=\"10.00\">step1.a</text>\n<text text-anchor=\"middle\" x=\"292\" y=\"-183\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140450533647456&#45;&gt;140450533648560 -->\n<g id=\"edge2\" class=\"edge\">\n<title>140450533647456&#45;&gt;140450533648560</title>\n<path fill=\"none\" stroke=\"black\" d=\"M329.88,-175.95C349.47,-165.84 372.87,-153.76 390.33,-144.75\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"392.07,-147.79 399.35,-140.09 388.86,-141.57 392.07,-147.79\"/>\n</g>\n<!-- 140447435136640 -->\n<g id=\"node5\" class=\"node\">\n<title>140447435136640</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"154.5,-638 53.5,-638 53.5,-619 154.5,-619 154.5,-638\"/>\n<text text-anchor=\"middle\" x=\"104\" y=\"-626\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447435136640&#45;&gt;140450533647456 -->\n<g id=\"edge3\" class=\"edge\">\n<title>140447435136640&#45;&gt;140450533647456</title>\n<path fill=\"none\" stroke=\"black\" d=\"M86.83,-618.83C57.29,-602.54 0,-564.4 0,-513.5 0,-513.5 0,-513.5 0,-316.5 0,-265.8 152.86,-226.17 237.38,-208.13\"/>\n<polygon fill=\"black\" stroke=\"black\" 
points=\"238.27,-211.52 247.33,-206.04 236.83,-204.67 238.27,-211.52\"/>\n</g>\n<!-- 140450533648416 -->\n<g id=\"node12\" class=\"node\">\n<title>140450533648416</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"254.5,-583 165.5,-583 165.5,-564 254.5,-564 254.5,-583\"/>\n<text text-anchor=\"middle\" x=\"210\" y=\"-571\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447435136640&#45;&gt;140450533648416 -->\n<g id=\"edge11\" class=\"edge\">\n<title>140447435136640&#45;&gt;140450533648416</title>\n<path fill=\"none\" stroke=\"black\" d=\"M121.03,-618.98C137.93,-610.54 164.06,-597.47 183.64,-587.68\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"185.28,-590.77 192.66,-583.17 182.15,-584.51 185.28,-590.77\"/>\n</g>\n<!-- 140447435236512 -->\n<g id=\"node6\" class=\"node\">\n<title>140447435236512</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"133.5,-704 74.5,-704 74.5,-674 133.5,-674 133.5,-704\"/>\n<text text-anchor=\"middle\" x=\"104\" y=\"-692\" font-family=\"monospace\" font-size=\"10.00\">step0.a</text>\n<text text-anchor=\"middle\" x=\"104\" y=\"-681\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447435236512&#45;&gt;140447435136640 -->\n<g id=\"edge4\" class=\"edge\">\n<title>140447435236512&#45;&gt;140447435136640</title>\n<path fill=\"none\" stroke=\"black\" d=\"M104,-673.84C104,-666.21 104,-656.7 104,-648.45\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"107.5,-648.27 104,-638.27 100.5,-648.27 107.5,-648.27\"/>\n</g>\n<!-- 140447435136688 -->\n<g id=\"node7\" class=\"node\">\n<title>140447435136688</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"336.5,-272 247.5,-272 247.5,-253 336.5,-253 336.5,-272\"/>\n<text text-anchor=\"middle\" x=\"292\" y=\"-260\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 140447435136688&#45;&gt;140450533647456 -->\n<g id=\"edge5\" 
class=\"edge\">\n<title>140447435136688&#45;&gt;140450533647456</title>\n<path fill=\"none\" stroke=\"black\" d=\"M292,-252.87C292,-246.22 292,-236.63 292,-227.28\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"295.5,-227.01 292,-217.01 288.5,-227.01 295.5,-227.01\"/>\n</g>\n<!-- 140447554132144 -->\n<g id=\"node8\" class=\"node\">\n<title>140447554132144</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"351.5,-327 232.5,-327 232.5,-308 351.5,-308 351.5,-327\"/>\n<text text-anchor=\"middle\" x=\"292\" y=\"-315\" font-family=\"monospace\" font-size=\"10.00\">UpdatesOpBackward</text>\n</g>\n<!-- 140447554132144&#45;&gt;140447435136688 -->\n<g id=\"edge6\" class=\"edge\">\n<title>140447554132144&#45;&gt;140447435136688</title>\n<path fill=\"none\" stroke=\"black\" d=\"M292,-307.75C292,-300.8 292,-290.85 292,-282.13\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"295.5,-282.09 292,-272.09 288.5,-282.09 295.5,-282.09\"/>\n</g>\n<!-- 140447554131664 -->\n<g id=\"node9\" class=\"node\">\n<title>140447554131664</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"426.5,-388 337.5,-388 337.5,-369 426.5,-369 426.5,-388\"/>\n<text text-anchor=\"middle\" x=\"382\" y=\"-376\" font-family=\"monospace\" font-size=\"10.00\">MuOpBackward</text>\n</g>\n<!-- 140447554131664&#45;&gt;140447554132144 -->\n<g id=\"edge7\" class=\"edge\">\n<title>140447554131664&#45;&gt;140447554132144</title>\n<path fill=\"none\" stroke=\"black\" d=\"M368.72,-368.79C354.28,-359.33 330.97,-344.05 313.83,-332.81\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"315.32,-329.6 305.04,-327.05 311.49,-335.46 315.32,-329.6\"/>\n</g>\n<!-- 140447435134816 -->\n<g id=\"node10\" class=\"node\">\n<title>140447435134816</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"275.5,-455 186.5,-455 186.5,-436 275.5,-436 275.5,-455\"/>\n<text text-anchor=\"middle\" x=\"231\" y=\"-443\" font-family=\"monospace\" font-size=\"10.00\">MulBackward0</text>\n</g>\n<!-- 
140447435134816&#45;&gt;140447554131664 -->\n<g id=\"edge8\" class=\"edge\">\n<title>140447435134816&#45;&gt;140447554131664</title>\n<path fill=\"none\" stroke=\"black\" d=\"M251.05,-435.87C277,-424.7 322.42,-405.15 352.36,-392.26\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"354.06,-395.34 361.87,-388.17 351.3,-388.91 354.06,-395.34\"/>\n</g>\n<!-- 140447554131904 -->\n<g id=\"node19\" class=\"node\">\n<title>140447554131904</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"174.5,-388 85.5,-388 85.5,-369 174.5,-369 174.5,-388\"/>\n<text text-anchor=\"middle\" x=\"130\" y=\"-376\" font-family=\"monospace\" font-size=\"10.00\">NuOpBackward</text>\n</g>\n<!-- 140447435134816&#45;&gt;140447554131904 -->\n<g id=\"edge21\" class=\"edge\">\n<title>140447435134816&#45;&gt;140447554131904</title>\n<path fill=\"none\" stroke=\"black\" d=\"M217.38,-435.73C200.57,-424.92 171.77,-406.38 151.86,-393.57\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"153.71,-390.6 143.41,-388.13 149.92,-396.48 153.71,-390.6\"/>\n</g>\n<!-- 140450533648992 -->\n<g id=\"node11\" class=\"node\">\n<title>140450533648992</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"290.5,-522 129.5,-522 129.5,-503 290.5,-503 290.5,-522\"/>\n<text text-anchor=\"middle\" x=\"210\" y=\"-510\" font-family=\"monospace\" font-size=\"10.00\">MseLossBackwardBackward0</text>\n</g>\n<!-- 140450533648992&#45;&gt;140447435134816 -->\n<g id=\"edge9\" class=\"edge\">\n<title>140450533648992&#45;&gt;140447435134816</title>\n<path fill=\"none\" stroke=\"black\" d=\"M212.83,-502.73C215.95,-493.09 221.05,-477.3 225.05,-464.91\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"228.47,-465.72 228.21,-455.13 221.81,-463.57 228.47,-465.72\"/>\n</g>\n<!-- 140450533648416&#45;&gt;140450533648992 -->\n<g id=\"edge10\" class=\"edge\">\n<title>140450533648416&#45;&gt;140450533648992</title>\n<path fill=\"none\" stroke=\"black\" d=\"M210,-563.79C210,-555.6 210,-543.06 210,-532.55\"/>\n<polygon 
fill=\"black\" stroke=\"black\" points=\"213.5,-532.24 210,-522.24 206.5,-532.24 213.5,-532.24\"/>\n</g>\n<!-- 140450533646448 -->\n<g id=\"node13\" class=\"node\">\n<title>140450533646448</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"363.5,-638 274.5,-638 274.5,-619 363.5,-619 363.5,-638\"/>\n<text text-anchor=\"middle\" x=\"319\" y=\"-626\" font-family=\"monospace\" font-size=\"10.00\">PowBackward0</text>\n</g>\n<!-- 140450533646448&#45;&gt;140447435134816 -->\n<g id=\"edge15\" class=\"edge\">\n<title>140450533646448&#45;&gt;140447435134816</title>\n<path fill=\"none\" stroke=\"black\" d=\"M319.92,-618.81C321.83,-596.71 324.2,-537.21 300,-497 290.46,-481.14 273.84,-468.75 259.31,-460.21\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"260.6,-456.93 250.15,-455.16 257.21,-463.06 260.6,-456.93\"/>\n</g>\n<!-- 140450533646448&#45;&gt;140450533648416 -->\n<g id=\"edge12\" class=\"edge\">\n<title>140450533646448&#45;&gt;140450533648416</title>\n<path fill=\"none\" stroke=\"black\" d=\"M301.49,-618.98C284.03,-610.5 256.99,-597.35 236.83,-587.54\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"238.36,-584.4 227.83,-583.17 235.29,-590.69 238.36,-584.4\"/>\n</g>\n<!-- 140447553018176 -->\n<g id=\"node14\" class=\"node\">\n<title>140447553018176</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"423.5,-698.5 322.5,-698.5 322.5,-679.5 423.5,-679.5 423.5,-698.5\"/>\n<text text-anchor=\"middle\" x=\"373\" y=\"-686.5\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447553018176&#45;&gt;140450533646448 -->\n<g id=\"edge13\" class=\"edge\">\n<title>140447553018176&#45;&gt;140450533646448</title>\n<path fill=\"none\" stroke=\"black\" d=\"M365.03,-679.37C356.9,-670.55 344.07,-656.66 334.02,-645.77\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"336.36,-643.14 327,-638.17 331.21,-647.89 336.36,-643.14\"/>\n</g>\n<!-- 140447435135536 -->\n<g id=\"node25\" 
class=\"node\">\n<title>140447435135536</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"508.5,-583 419.5,-583 419.5,-564 508.5,-564 508.5,-583\"/>\n<text text-anchor=\"middle\" x=\"464\" y=\"-571\" font-family=\"monospace\" font-size=\"10.00\">PowBackward0</text>\n</g>\n<!-- 140447553018176&#45;&gt;140447435135536 -->\n<g id=\"edge30\" class=\"edge\">\n<title>140447553018176&#45;&gt;140447435135536</title>\n<path fill=\"none\" stroke=\"black\" d=\"M379.84,-679.47C394.86,-660.74 430.95,-615.72 450.64,-591.16\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"453.56,-593.12 457.08,-583.13 448.09,-588.74 453.56,-593.12\"/>\n</g>\n<!-- 140447553045424 -->\n<g id=\"node15\" class=\"node\">\n<title>140447553045424</title>\n<polygon fill=\"lightblue\" stroke=\"black\" points=\"400,-770 346,-770 346,-740 400,-740 400,-770\"/>\n<text text-anchor=\"middle\" x=\"373\" y=\"-758\" font-family=\"monospace\" font-size=\"10.00\">x</text>\n<text text-anchor=\"middle\" x=\"373\" y=\"-747\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553045424&#45;&gt;140447553018176 -->\n<g id=\"edge14\" class=\"edge\">\n<title>140447553045424&#45;&gt;140447553018176</title>\n<path fill=\"none\" stroke=\"black\" d=\"M373,-739.8C373,-730.7 373,-718.79 373,-708.9\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"376.5,-708.84 373,-698.84 369.5,-708.84 376.5,-708.84\"/>\n</g>\n<!-- 140447435136592 -->\n<g id=\"node16\" class=\"node\">\n<title>140447435136592</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"505.5,-455 404.5,-455 404.5,-436 505.5,-436 505.5,-455\"/>\n<text text-anchor=\"middle\" x=\"455\" y=\"-443\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140447435136592&#45;&gt;140447554131664 -->\n<g id=\"edge16\" class=\"edge\">\n<title>140447435136592&#45;&gt;140447554131664</title>\n<path fill=\"none\" stroke=\"black\" d=\"M445.15,-435.73C433.44,-425.31 413.68,-407.71 
399.38,-394.97\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"401.49,-392.16 391.69,-388.13 396.83,-397.39 401.49,-392.16\"/>\n</g>\n<!-- 140447552973856 -->\n<g id=\"node17\" class=\"node\">\n<title>140447552973856</title>\n<polygon fill=\"orange\" stroke=\"black\" points=\"442,-528 388,-528 388,-497 442,-497 442,-528\"/>\n<text text-anchor=\"middle\" x=\"415\" y=\"-504\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447552973856&#45;&gt;140447554131664 -->\n<g id=\"edge19\" class=\"edge\">\n<title>140447552973856&#45;&gt;140447554131664</title>\n<path fill=\"none\" stroke=\"black\" d=\"M408.59,-496.72C404.49,-486.78 399.34,-473.29 396,-461 390.26,-439.84 386.38,-414.91 384.19,-398.25\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"387.65,-397.68 382.94,-388.18 380.7,-398.54 387.65,-397.68\"/>\n</g>\n<!-- 140447552973856&#45;&gt;140447435136592 -->\n<g id=\"edge17\" class=\"edge\">\n<title>140447552973856&#45;&gt;140447435136592</title>\n<path fill=\"none\" stroke=\"black\" d=\"M424.08,-496.75C430.15,-486.89 438.16,-473.87 444.5,-463.56\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"447.5,-465.37 449.76,-455.02 441.54,-461.7 447.5,-465.37\"/>\n</g>\n<!-- 140447553044544 -->\n<g id=\"node18\" class=\"node\">\n<title>140447553044544</title>\n<polygon fill=\"orange\" stroke=\"black\" points=\"348,-461 294,-461 294,-430 348,-430 348,-461\"/>\n<text text-anchor=\"middle\" x=\"321\" y=\"-437\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553044544&#45;&gt;140447554131664 -->\n<g id=\"edge18\" class=\"edge\">\n<title>140447553044544&#45;&gt;140447554131664</title>\n<path fill=\"none\" stroke=\"black\" d=\"M334.84,-429.75C344.48,-419.48 357.31,-405.81 367.16,-395.31\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"369.71,-397.71 374.01,-388.02 364.61,-392.92 369.71,-397.71\"/>\n</g>\n<!-- 140447553044544&#45;&gt;140447554131904 -->\n<g id=\"edge24\" 
class=\"edge\">\n<title>140447553044544&#45;&gt;140447554131904</title>\n<path fill=\"none\" stroke=\"black\" d=\"M293.95,-433.38C290.94,-432.21 287.91,-431.06 285,-430 245.4,-415.59 199.43,-400.86 167.9,-391.06\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"168.78,-387.67 158.2,-388.05 166.72,-394.35 168.78,-387.67\"/>\n</g>\n<!-- 140447554131904&#45;&gt;140447554132144 -->\n<g id=\"edge20\" class=\"edge\">\n<title>140447554131904&#45;&gt;140447554132144</title>\n<path fill=\"none\" stroke=\"black\" d=\"M153.56,-368.92C181.27,-358.83 227.47,-342 258.81,-330.59\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"260.22,-333.8 268.42,-327.09 257.82,-327.22 260.22,-333.8\"/>\n</g>\n<!-- 140450533648896 -->\n<g id=\"node20\" class=\"node\">\n<title>140450533648896</title>\n<polygon fill=\"lightgrey\" stroke=\"black\" points=\"129.5,-455 28.5,-455 28.5,-436 129.5,-436 129.5,-455\"/>\n<text text-anchor=\"middle\" x=\"79\" y=\"-443\" font-family=\"monospace\" font-size=\"10.00\">AccumulateGrad</text>\n</g>\n<!-- 140450533648896&#45;&gt;140447554131904 -->\n<g id=\"edge22\" class=\"edge\">\n<title>140450533648896&#45;&gt;140447554131904</title>\n<path fill=\"none\" stroke=\"black\" d=\"M85.88,-435.73C93.83,-425.6 107.1,-408.69 117.01,-396.06\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"119.81,-398.16 123.23,-388.13 114.3,-393.83 119.81,-398.16\"/>\n</g>\n<!-- 140447435236752 -->\n<g id=\"node21\" class=\"node\">\n<title>140447435236752</title>\n<polygon fill=\"orange\" stroke=\"black\" points=\"108,-528 54,-528 54,-497 108,-497 108,-528\"/>\n<text text-anchor=\"middle\" x=\"81\" y=\"-504\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447435236752&#45;&gt;140447554131904 -->\n<g id=\"edge25\" class=\"edge\">\n<title>140447435236752&#45;&gt;140447554131904</title>\n<path fill=\"none\" stroke=\"black\" d=\"M105.47,-496.99C117.69,-488.27 131.27,-475.95 138,-461 147.09,-440.79 142.15,-414.98 136.89,-397.88\"/>\n<polygon 
fill=\"black\" stroke=\"black\" points=\"140.08,-396.38 133.53,-388.05 133.45,-398.64 140.08,-396.38\"/>\n</g>\n<!-- 140447435236752&#45;&gt;140450533648896 -->\n<g id=\"edge23\" class=\"edge\">\n<title>140447435236752&#45;&gt;140450533648896</title>\n<path fill=\"none\" stroke=\"black\" d=\"M80.55,-496.75C80.26,-487.39 79.88,-475.19 79.57,-465.16\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"83.07,-464.91 79.26,-455.02 76.07,-465.12 83.07,-464.91\"/>\n</g>\n<!-- 140447553045904 -->\n<g id=\"node22\" class=\"node\">\n<title>140447553045904</title>\n<polygon fill=\"orange\" stroke=\"black\" points=\"247,-394 193,-394 193,-363 247,-363 247,-394\"/>\n<text text-anchor=\"middle\" x=\"220\" y=\"-370\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447553045904&#45;&gt;140447554132144 -->\n<g id=\"edge26\" class=\"edge\">\n<title>140447553045904&#45;&gt;140447554132144</title>\n<path fill=\"none\" stroke=\"black\" d=\"M237.8,-362.92C248.68,-354 262.58,-342.61 273.58,-333.6\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"275.81,-336.29 281.32,-327.25 271.37,-330.88 275.81,-336.29\"/>\n</g>\n<!-- 140447435237152 -->\n<g id=\"node23\" class=\"node\">\n<title>140447435237152</title>\n<polygon fill=\"orange\" stroke=\"black\" points=\"319,-394 265,-394 265,-363 319,-363 319,-394\"/>\n<text text-anchor=\"middle\" x=\"292\" y=\"-370\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447435237152&#45;&gt;140447554132144 -->\n<g id=\"edge27\" class=\"edge\">\n<title>140447435237152&#45;&gt;140447554132144</title>\n<path fill=\"none\" stroke=\"black\" d=\"M292,-362.92C292,-355.22 292,-345.69 292,-337.43\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"295.5,-337.25 292,-327.25 288.5,-337.25 295.5,-337.25\"/>\n</g>\n<!-- 140447435237232 -->\n<g id=\"node24\" class=\"node\">\n<title>140447435237232</title>\n<polygon fill=\"orange\" stroke=\"black\" points=\"499,-394 445,-394 445,-363 499,-363 499,-394\"/>\n<text 
text-anchor=\"middle\" x=\"472\" y=\"-370\" font-family=\"monospace\" font-size=\"10.00\">()</text>\n</g>\n<!-- 140447435237232&#45;&gt;140447554132144 -->\n<g id=\"edge28\" class=\"edge\">\n<title>140447435237232&#45;&gt;140447554132144</title>\n<path fill=\"none\" stroke=\"black\" d=\"M444.97,-366.33C441.95,-365.17 438.92,-364.04 436,-363 401.26,-350.66 361.14,-338.42 332.09,-329.92\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"332.88,-326.5 322.3,-327.07 330.93,-333.22 332.88,-326.5\"/>\n</g>\n<!-- 140447435135536&#45;&gt;140450533648560 -->\n<g id=\"edge29\" class=\"edge\">\n<title>140447435135536&#45;&gt;140450533648560</title>\n<path fill=\"none\" stroke=\"black\" d=\"M472.87,-563.95C491.7,-544.81 534,-496.3 534,-446.5 534,-446.5 534,-446.5 534,-261.5 534,-207.17 476.8,-165.48 442.05,-145.16\"/>\n<polygon fill=\"black\" stroke=\"black\" points=\"443.34,-141.87 432.91,-140 439.9,-147.96 443.34,-141.87\"/>\n</g>\n</g>\n</svg>\n"
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "net = Net().to(device='cuda')\n",
    "x = nn.Parameter(torch.tensor(2.0, device=torch.device('cuda')), requires_grad=True)\n",
    "y = torch.tensor(1.0, device=torch.device('cuda'))\n",
    "\n",
    "optim = torchopt.MetaAdam(net, lr=1.0, moment_requires_grad=True, use_accelerated_op=True)\n",
    "\n",
    "net_state_0 = torchopt.extract_state_dict(\n",
    "    net, by='reference', enable_visual=True, visual_prefix='step0.'\n",
    ")\n",
    "inner_loss = F.mse_loss(net(x), y)\n",
    "optim.step(inner_loss)\n",
    "net_state_1 = torchopt.extract_state_dict(\n",
    "    net, by='reference', enable_visual=True, visual_prefix='step1.'\n",
    ")\n",
    "\n",
    "outer_loss = F.mse_loss(net(x), y)\n",
    "display(\n",
    "    torchopt.visual.make_dot(\n",
    "        outer_loss, params=[net_state_0, net_state_1, {'x': x, 'outer_loss': outer_loss}]\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 6. Known Issues\n",
    "\n",
    "Here we record some common issues faced by users when using the meta-optimizer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**1. Getting `NaN` errors when using `MetaAdam` or other meta-optimizers.**\n",
    "\n",
    "The `NaN` error is caused by the numerical instability of `Adam` in meta-learning. `Adam`'s computation contains a `sqrt` operation. Backpropagating through the `Adam` operator introduces the second derivative of the `sqrt` operation, which is not numerically stable, i.e. ${\\left. \\frac{d^2 \\sqrt{x}}{{dx}^2} \\right\\rvert}_{x = 0} = \\texttt{NaN}$. You can also refer to issue [facebookresearch/higher#125](https://github.com/facebookresearch/higher/issues/125).\n",
    "\n",
    "For this problem, TorchOpt has two recommended solutions.\n",
    "\n",
    "* Fold the `sqrt` operation into the whole update equation and compute the derivative of the output with respect to the input manually, which eliminates the second derivative of the `sqrt` operation. You can achieve this by setting the flag `use_accelerated_op=True`; follow the instructions in the [Functional Optimizer](1_Functional_Optimizer.ipynb) notebook and in this Meta-Optimizer notebook."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "inner_optim = torchopt.MetaAdam(net, lr=1.0, use_accelerated_op=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "* Register a hook on the first-order gradients. During backpropagation, the `NaN` gradients are set to 0, which has a similar effect to the first solution but is much slower."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "impl = torchopt.chain(torchopt.hook.register_hook(torchopt.hook.zero_nan_hook), torchopt.adam(1e-1))\n",
    "inner_optim = torchopt.MetaOptimizer(net, impl)"
   ]
  },
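  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The source of the `NaN` in issue 1 can be sketched with a short derivation. It assumes the textbook Adam update with bias-corrected moments $\\hat{m}$ and $\\hat{v}$ and a stabilizer $\\epsilon$; these symbols are illustrative, and TorchOpt's actual implementation may differ in detail. The update $u$ and its derivative with respect to $\\hat{v}$ are:\n",
    "\n",
    "$$\n",
    "\\begin{split}\n",
    "    u & = \\frac{\\hat{m}}{\\sqrt{\\hat{v}} + \\epsilon}, \\\\\n",
    "    \\frac{\\partial u}{\\partial \\hat{v}} & = - \\frac{\\hat{m}}{{(\\sqrt{\\hat{v}} + \\epsilon)}^2} \\cdot \\frac{1}{2 \\sqrt{\\hat{v}}}.\n",
    "\\end{split}\n",
    "$$\n",
    "\n",
    "The factor $\\frac{1}{2 \\sqrt{\\hat{v}}}$ comes from differentiating the `sqrt` and diverges as $\\hat{v} \\to 0$. For example, when a gradient entry is exactly zero at the first step, both $\\hat{m}$ and $\\hat{v}$ are zero, so automatic differentiation ends up evaluating a product of the form $0 \\cdot \\infty$, which is `NaN` in floating-point arithmetic. Both solutions above keep this value from propagating into the meta-gradient."
   ]
  },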
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**2. Getting the `Trying to backward through the graph a second time` error when conducting multiple meta-optimization steps.**\n",
    "\n",
    "Please refer to the tutorial notebook [Stop Gradient](4_Stop_Gradient.ipynb) for more guidance."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.15"
  },
  "vscode": {
   "interpreter": {
    "hash": "2a8cc1ff2cbc47027bf9993941710d9ab9175f14080903d9c7c432ee63d681da"
   }
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}