{ "cells": [ { "attachments": {}, "cell_type": "markdown", "metadata": { "id": "iJUj0TiKtGJS" }, "source": [ "# Access GrandTour Data using HuggingFace 🤗\n", "© 2025 ETH Zurich\n", " \n", " [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/leggedrobotics/grand_tour_dataset/blob/main/examples/%5B0%5D_Accessing_GrandTour_Data.ipynb)\n", "\n", "\n", "## Overview\n", "> GrandTour data is avaialable in two formats, hosted on two platforms:\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", "
Format Hosted on
\"ROSROS Bags\"RSLKleinkram
\"ZarrZARR\"HuggingHuggingFace
\n", "\n", "> This notebook explains how to download the zarr/png converted dataset hosted on Huggingface.\n", ">\n", "> \n", "> 💡 Please refer to the `examples_hugging_face/explore.ipynb` on how to use the data.\n", " \n", "## Downloading\n", "> We provide the entire dataset on HuggingFace in `.zarr`, `.png`, and `.yaml` format.\n", "> \n", "> To avoid checking in +1M individual files on the HuggingHub, we created a tar-ball `.tar` for each topic per mission.\n", "\n", "> HuggingFace has an easy-to-use Python download API called `huggingface_hub`.\n", "> It is possible to download directly from the [GrandTour HuggingFace repo UI](https://huggingface.co/leggedrobotics), but we strongly reccomend making use of `huggingface_hub`, as it manages caching files, interrupted downloads and smart fetching of updated files.\n", "\n", "> First, install `huggingface_hub` which requires you to have an HuggingFace account. You can create one for free at [huggingface.co](https://huggingface.co/)." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m A new release of pip is available: \u001b[0m\u001b[31;49m24.0\u001b[0m\u001b[39;49m -> \u001b[0m\u001b[32;49m25.1.1\u001b[0m\n", "\u001b[1m[\u001b[0m\u001b[34;49mnotice\u001b[0m\u001b[1;39;49m]\u001b[0m\u001b[39;49m To update, run: \u001b[0m\u001b[32;49mpip install --upgrade pip\u001b[0m\n" ] } ], "source": [ "! pip install -q huggingface_hub # Should be already installed when following the README.md and uv installation!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Then, login using the cli. This will store authentication tokens on your PC and allow you to use the API to download data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", " _| _| _| _| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _|_|_|_| _|_| _|_|_| _|_|_|_|\n", " _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|\n", " _|_|_|_| _| _| _| _|_| _| _|_| _| _| _| _| _| _|_| _|_|_| _|_|_|_| _| _|_|_|\n", " _| _| _| _| _| _| _| _| _| _| _|_| _| _| _| _| _| _| _|\n", " _| _| _|_| _|_|_| _|_|_| _|_|_| _| _| _|_|_| _| _| _| _|_|_| _|_|_|_|\n", "\n", " A token is already saved on your machine. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "> Now you can download a mission of your choice. The next tutorial - _[1] Exploring GrandTour Data_ - uses 2024-10-01-11-29-55, so we download it here ahead of time." ] },
{ "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ff45acd359a344588e67883ed6cb7ea8", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Fetching 135 files: 0%|          | 0/135 [00:00<?, ?it/s]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from huggingface_hub import snapshot_download\n", "\n", "# Mission used in the next tutorial.\n", "mission = \"2024-10-01-11-29-55\"\n", "\n", "# Only fetch the files that belong to this mission.\n", "allow_patterns = [f\"{mission}/*\"]\n", "\n", "# Returns the local cache directory that contains the downloaded snapshot.\n", "hugging_face_data_cache_path = snapshot_download(\n", "    repo_id=\"leggedrobotics/grand_tour_dataset\",\n", "    repo_type=\"dataset\",\n", "    allow_patterns=allow_patterns,\n", ")" ] },
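{ "cell_type": "markdown", "metadata": {}, "source": [ "> To download a different mission, you need its name first. A minimal sketch for discovering the available missions by listing the repository contents (this assumes mission folders sit at the top level of the repo):" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from huggingface_hub import list_repo_files\n", "\n", "# Every file path starts with its mission folder; collect the unique top-level names.\n", "files = list_repo_files(\"leggedrobotics/grand_tour_dataset\", repo_type=\"dataset\")\n", "missions = sorted({f.split(\"/\")[0] for f in files if \"/\" in f})\n", "print(missions)" ] },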
{ "cell_type": "markdown", "metadata": {}, "source": [ "> The downloaded data comes packaged in `.tar` files, which must be extracted before they can be used.\n", "> We recommend extracting to a destination of your choice outside the HuggingFace cache directory:" ] },
{ "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Data will be extracted to: /home/jonfrey/grand_tour_dataset\n" ] } ], "source": [ "from pathlib import Path\n", "\n", "# Define the destination directory\n", "dataset_folder = Path(\"~/grand_tour_dataset\").expanduser()\n", "dataset_folder.mkdir(parents=True, exist_ok=True)\n", "\n", "# Print for confirmation\n", "print(f\"Data will be extracted to: {dataset_folder}\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "> Define a `.tar` extractor helper function and extract the files:" ] },
{ "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "/home/jonfrey/grand_tour_dataset\n", "Moved data from /home/jonfrey/.cache/huggingface/hub/datasets--leggedrobotics--grand_tour_dataset/snapshots/a6c80c525d6690a1060204a9cd2bc4abaf7eae78 to /home/jonfrey/grand_tour_dataset !\n" ] } ], "source": [ "import shutil\n", "import tarfile\n", "import re\n", "from pathlib import Path\n", "\n", "def move_dataset(cache, dataset_folder, allow_patterns=(\"*\",)):\n", "    \"\"\"Extract matching .tar files from the cache into dataset_folder and copy all other matching files.\"\"\"\n", "\n", "    def convert_glob_patterns_to_regex(glob_patterns):\n", "        regex_parts = []\n", "        for pat in glob_patterns:\n", "            # Escape regex special characters except for * and ?\n", "            pat = re.escape(pat)\n", "            # Convert escaped glob wildcards to regex equivalents\n", "            pat = pat.replace(r'\\*', '.*').replace(r'\\?', '.')\n", "            # Make sure it matches full paths\n", "            regex_parts.append(f\".*{pat}$\")\n", "\n", "        # Join the individual patterns with |\n", "        combined = \"|\".join(regex_parts)\n", "        return re.compile(combined)\n", "\n", "    pattern = convert_glob_patterns_to_regex(allow_patterns)\n", "    files = [f for f in Path(cache).rglob(\"*\") if pattern.match(str(f))]\n", "    tar_files = [f for f in files if f.suffix == \".tar\"]\n", "\n", "    # Extract each tarball at its relative location inside the destination folder.\n", "    for source_path in tar_files:\n", "        dest_path = dataset_folder / source_path.relative_to(cache)\n", "        dest_path.parent.mkdir(parents=True, exist_ok=True)\n", "\n", "        try:\n", "            with tarfile.open(source_path, \"r\") as tar:\n", "                tar.extractall(path=dest_path.parent)\n", "        except tarfile.ReadError as e:\n", "            print(f\"Error opening or extracting tar file '{source_path}': {e}\")\n", "        except Exception as e:\n", "            print(f\"An unexpected error occurred while processing {source_path}: {e}\")\n", "\n", "    # Copy all non-tar files (e.g. .yaml metadata) unchanged.\n", "    other_files = [f for f in files if f.suffix != \".tar\" and f.is_file()]\n", "    for source_path in other_files:\n", "        dest_path = dataset_folder / source_path.relative_to(cache)\n", "        dest_path.parent.mkdir(parents=True, exist_ok=True)\n", "        shutil.copy2(source_path, dest_path)\n", "\n", "    print(f\"Moved data from {cache} to {dataset_folder} !\")\n", "\n", "print(dataset_folder)\n", "move_dataset(hugging_face_data_cache_path, dataset_folder, allow_patterns=allow_patterns)" ] },
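{ "cell_type": "markdown", "metadata": {}, "source": [ "> As a quick sanity check, you can list what was extracted for the mission. A minimal sketch, reusing the `mission` and `dataset_folder` variables from the cells above:" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# List the top-level entries extracted for this mission.\n", "for entry in sorted((dataset_folder / mission).iterdir()):\n", "    print(entry.relative_to(dataset_folder))" ] },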
{ "cell_type": "markdown", "metadata": {}, "source": [ "> You should now be able to load the dataset in `.zarr` format and inspect the contents:" ] },
{ "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['adis_imu', 'dlio_map_odometry', 'zed2i_depth_confidence_image', 'zed2i_right_images', 'zed2i_depth_image', 'zed2i_left_images', 'alphasense_right', 'depth_camera_rear_lower', 'stim320_gyroscope_temperature', 'dlio_hesai_points_undistorted', 'ap20_imu', 'hesai_undistorted', 'dlio_tf', 'livox_imu', 'gnss_raw_cpt7_ie_tc', 'anymal_imu', 'anymal_state_battery', 'hesai', 'stim320_accelerometer_temperature', 'cpt7_ie_rt_tf', 'depth_camera_front_lower', 'velodyne', 'prism_position', 'velodyne_undist', 'cpt7_ie_rt_odometry', 'depth_camera_left', 'alphasense_front_center', 'anymal_state_actuator', 'cpt7_ie_tc_tf', 'depth_camera_rear_upper', 'livox_points_undistorted', 'anymal_command_twist', 'cpt7_imu', 'depth_camera_front_upper', 'anymal_state_state_estimator', 'gnss_raw_cpt7_ie_rt', 'alphasense_front_left', 'livox_points', 'zed2i_vio_map', 'alphasense_left', 'stim320_imu', 'alphasense_front_right', 'cpt7_ie_tc_odometry', 'hdr_front', 'alphasense_imu', 'navsatfix_cpt7_ie_tc', 'hdr_left', 'depth_camera_right', 'anymal_state_odometry', 'hdr_right']\n" ] } ], "source": [ "import zarr.storage\n", "\n", "# Open the extracted mission as a zarr group and list the available topics.\n", "store = zarr.storage.LocalStore(dataset_folder / mission / \"data\")\n", "root = zarr.group(store=store)\n", "\n", "print(list(root.keys()))" ] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.12" } }, "nbformat": 4, "nbformat_minor": 4 }