{ "cells": [ { "cell_type": "markdown", "id": "6fe78d59-1f8b-41fb-b8db-9927b8ed049e", "metadata": {}, "source": [ "[IOAI 2025 (Beijing, China), Individual Contest](https://ioai-official.org/china-2025)\n", "\n", "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/IOAI-official/IOAI-2025/blob/main/Individual-Contest/Antique/Antique.ipynb)" ] }, { "cell_type": "markdown", "id": "a4c6054c-d42b-4c2b-bb79-deb64d936c24", "metadata": {}, "source": [ "# Antique Painting Authentication\n", "\n", "## 1. Problem Description\n", "\n", "You have studied Artificial Intelligence for quite some time. An old friend of your father, a famous archeologist and art critic, heard about this and asked for your help. You need to design an algorithm that classifies antique paintings as either authentic pieces or replicas.\n", "\n", "Because professional authentication is expensive, the research team has obtained authenticity labels for only a small portion of the paintings; for the majority of samples, the authenticity remains unknown. The paintings' digital features are known to exhibit strong structural patterns. You are tasked with leveraging all available samples, including those with unknown labels, to train a model that classifies the authenticity of antique paintings.\n", "\n", "## 2. Dataset\n", "\n", "The dataset consists of a training set, a validation set, and a test set, each containing 500 independent samples.\n", "\n", "1. **Training Set (`training_set.csv`)**:\n", "\n", "   - The first five columns contain the digital features of each antique painting.\n", "   - The sixth column contains the label: 1 for authentic, -1 for replica, and 0 for unknown.\n", "\n", "   The training set is used to train your models and can be accessed and downloaded directly during the competition.\n", "\n", "2. 
**Validation Set (`validation_set.csv`)**:\n", "   - Same format as the training set, but without the label column.\n", "\n", "   The validation set is used to calculate the Leaderboard A score and is not directly accessible during the competition.\n", "\n", "3. **Test Set (`test_set.csv`)**:\n", "   - Same format as the training set, but without the label column.\n", "\n", "   The test set is used to calculate the Leaderboard B score and is not directly accessible during the competition.\n", "\n", "## 3. Task\n", "\n", "Your task is to train a model that predicts the authenticity of the paintings in the validation and test sets, despite the large number of unlabeled training samples.\n", "\n", "## 4. Submission\n", "\n", "Contestants need to submit a notebook file named `submission.ipynb`. The notebook should output a zip file named `submission.zip`, which should contain the following two files:\n", "\n", "1. `submissionA.csv`: The model's predicted labels on the validation set, one per line (each either -1 or 1), with no header.\n", "2. `submissionB.csv`: The model's predicted labels on the test set, one per line (each either -1 or 1), with no header.\n", "\n", "The grading machine reads `submission.zip` and calculates the scores. The submission files must strictly follow the above format and naming; otherwise, the system will not be able to read them correctly.\n", "\n", "Details about the submission procedure are provided in the baseline notebook. Contestants are encouraged to refer to it for guidance.\n", "\n", "## 5. Score\n", "\n", "The evaluation metric is **classification accuracy**, defined as the proportion of correctly predicted samples among all evaluated samples.\n", "\n", "## 6. 
Baseline and Training Set\n", "\n", "- Below you can find the baseline solution.\n", "- The training data is in the `training_set` folder.\n", "- The highest score achieved by the Scientific Committee on this task is 0.98 on Leaderboard B; this score is used for score unification.\n", "- The baseline score achieved by the Scientific Committee on this task is 0.46 on Leaderboard B; this score is used for score unification." ] }, { "cell_type": "markdown", "id": "44bf0dce", "metadata": {}, "source": [ "### Train Your Model" ] }, { "cell_type": "code", "execution_count": null, "id": "3acb09be", "metadata": {}, "outputs": [], "source": [ "import os\n", "import sys\n", "\n", "# 1. Get the current working directory\n", "current_dir = os.getcwd()\n", "\n", "# 2. Check if the path contains \"Individual-Contest/Antique\" and trim it to that point\n", "if \"Individual-Contest/Antique\" in current_dir:\n", "    root_index = current_dir.index(\"Individual-Contest/Antique\") + len(\"Individual-Contest/Antique\")\n", "    project_root = current_dir[:root_index]\n", "else:\n", "    raise Exception(\"Project root directory not found. Please check the folder structure.\")\n", "\n", "# 3. Change working directory to the project root\n", "os.chdir(project_root)\n", "print(\"Working directory set to:\", os.getcwd())\n", "\n", "# 4. 
Add module search path (e.g., where metrics.py is located)\n", "sys.path.append(os.path.join(project_root, \"Scoring\"))" ] }, { "cell_type": "code", "execution_count": null, "id": "03dae883", "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "from sklearn.svm import SVC\n", "\n", "TRAIN_PATH = \"./training_set/\"\n", "train = pd.read_csv(TRAIN_PATH + \"training_set.csv\")\n", "\n", "# Features: the first five columns; labels: the sixth column (1 = authentic, -1 = replica, 0 = unknown)\n", "X = np.array(train.iloc[:, :5])\n", "y = np.array(train.iloc[:, 5])\n", "\n", "# Naive baseline: assign a random label (-1 or 1) to every unlabeled sample\n", "np.random.seed(42)\n", "y[y == 0] = np.random.choice([-1, 1], size=(y == 0).sum())\n", "\n", "# Fit an RBF-kernel SVM on all training samples, including the randomly labeled ones\n", "svm_binary_model = SVC(kernel='rbf', C=1.0, gamma='scale', random_state=42)\n", "svm_binary_model.fit(X, y)" ] }, { "cell_type": "markdown", "id": "a2049ba4", "metadata": {}, "source": [ "### Make Predictions on the Validation and Test Sets" ] }, { "cell_type": "code", "execution_count": null, "id": "c69d9d92", "metadata": {}, "outputs": [], "source": [ "VAL_DATA_PATH = \"./Solution/validation_set/\"\n", "TEST_DATA_PATH = \"./Solution/test_set/\"\n", "\n", "# Validation and test files have the same five feature columns, without the label column\n", "testA = np.array(pd.read_csv(VAL_DATA_PATH + \"validation_set.csv\"))\n", "testB = np.array(pd.read_csv(TEST_DATA_PATH + \"test_set.csv\"))\n", "\n", "predA = svm_binary_model.predict(testA)\n", "predB = svm_binary_model.predict(testB)" ] }, { "cell_type": "markdown", "id": "3e2141d8", "metadata": {}, "source": [ "### Generate `submission.zip` for Submission" ] }, { "cell_type": "code", "execution_count": null, "id": "342e6ddb", "metadata": {}, "outputs": [], "source": [ "import zipfile\n", "import os\n", "\n", "# Write one prediction (-1 or 1) per line, without index or header\n", "submissionA = pd.DataFrame(predA)\n", "submissionA.to_csv(\"./Scoring/submissionA.csv\", index=False, header=False)\n", "\n", "submissionB = pd.DataFrame(predB)\n", "submissionB.to_csv(\"./Scoring/submissionB.csv\", index=False, header=False)\n", "\n", "files_to_zip = ['./Scoring/submissionA.csv', './Scoring/submissionB.csv']\n", "zip_filename = './Scoring/submission.zip'\n", "\n", "with 
zipfile.ZipFile(zip_filename, 'w') as zipf:\n", "    for file in files_to_zip:\n", "        # Store each CSV at the archive root (no directory prefix)\n", "        zipf.write(file, os.path.basename(file))\n", "\n", "print(f'{zip_filename} was created successfully!')" ] }, { "cell_type": "markdown", "id": "25da701b", "metadata": {}, "source": [ "### Evaluate the Model Performance" ] }, { "cell_type": "code", "execution_count": null, "id": "269246ef", "metadata": {}, "outputs": [], "source": [ "%run Scoring/metrics.py" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.9" } }, "nbformat": 4, "nbformat_minor": 5 }