ITFormer
This repository provides the official open-source implementation of ITFormer (Instruct Time Transformer), a novel framework for temporal-textual multimodal question answering (QA).
Overview
ITFormer answers natural-language questions about time-series data. The repository includes both inference and training scripts.
Our work introduces a large-scale multitask dataset (EngineMT-QA) and demonstrates ITFormer's strong performance in bridging time series data with natural language understanding. The 0.5B model is the lightest of the released sizes and still achieves strong performance.
Features
- 📊 Pre-trained Models: Ready-to-use ITFormer models (0.5B, 3B, 7B) available on Hugging Face.
- 🚀 Lightweight & Efficient: The 0.5B model offers strong temporal QA capabilities and easy deployment.
- 🎯 One-Click Scripts: Automated scripts for pre-training, SFT, and parallel inference.
- 📈 High Performance: State-of-the-art results on temporal-textual QA benchmarks.
- 🌐 Distributed Support: Fully compatible with accelerate for multi-GPU training and inference.
Quick Start
1. Organize Directory Structure
After downloading models and datasets, organize your files as follows:
ITFormer-ICML25/
├── dataset/
│   └── datasets/               # Place EngineMT-QA dataset files here
│       ├── time_series_data.h5
│       ├── train_qa.jsonl
│       └── test_qa.jsonl
├── LLM/                        # Base Qwen2.5-Instruct models
├── checkpoints/
│   └── ITFormer-0.5B/          # ITFormer model checkpoints
├── scripts/                    # One-click automation scripts
│   ├── run_pretrain.sh
│   ├── run_sft.sh
│   └── run_inference.sh
├── accelerate_config.yaml      # Configuration for distributed execution
└── yaml/
    └── infer.yaml              # Inference configuration
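Before launching anything, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the repository: the helper name find_missing is hypothetical, and the path list is copied from the tree.

```python
from pathlib import Path

# Paths taken from the directory tree above, relative to the repository root.
EXPECTED_PATHS = [
    "dataset/datasets/time_series_data.h5",
    "dataset/datasets/train_qa.jsonl",
    "dataset/datasets/test_qa.jsonl",
    "LLM",
    "checkpoints/ITFormer-0.5B",
    "scripts/run_pretrain.sh",
    "scripts/run_sft.sh",
    "scripts/run_inference.sh",
    "accelerate_config.yaml",
    "yaml/infer.yaml",
]


def find_missing(root="."):
    """Return the expected paths that do not exist under `root`."""
    base = Path(root)
    return [p for p in EXPECTED_PATHS if not (base / p).exists()]


if __name__ == "__main__":
    missing = find_missing()
    if missing:
        print("Missing files or directories:")
        for path in missing:
            print(f"  {path}")
    else:
        print("Directory layout looks complete.")
```

Run it from the repository root; an empty result means every path listed in the tree is present.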
2. Run Inference
Inference can run in parallel across multiple GPUs via accelerate; results from all GPUs are aggregated automatically.
# Using the automated script (Recommended)
bash scripts/run_inference.sh
# Or launch manually via accelerate
accelerate launch --config_file accelerate_config.yaml inference.py --config yaml/infer.yaml
The inference script will:
- Load ITFormer and the corresponding Qwen2.5-Instruct.
- Distribute data across all available GPUs.
- Aggregate and save results to inference_results/ and output_result_all.json.
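The distribute-then-aggregate pattern described in the steps above can be illustrated with a small standalone sketch. This is not the repository's implementation: the round-robin split, the helper names, and the result fields (index, question, answer) are all assumptions chosen for illustration, and the actual schema of output_result_all.json is not documented here.

```python
import json


def shard_indices(num_samples, world_size, rank):
    """Round-robin split: rank r takes samples r, r + world_size, ..."""
    return list(range(rank, num_samples, world_size))


def merge_rank_results(per_rank_results):
    """Merge per-rank result lists into one list ordered by sample index."""
    merged = [item for results in per_rank_results for item in results]
    return sorted(merged, key=lambda item: item["index"])


if __name__ == "__main__":
    questions = ["q0", "q1", "q2", "q3", "q4"]
    world_size = 2  # number of simulated GPUs

    # Each simulated rank "answers" only the samples in its own shard.
    per_rank = []
    for rank in range(world_size):
        shard = shard_indices(len(questions), world_size, rank)
        per_rank.append(
            [{"index": i, "question": questions[i], "answer": f"answer-{i}"}
             for i in shard]
        )

    # After merging, results are back in the original sample order.
    print(json.dumps(merge_rank_results(per_rank), indent=2))
```

Sorting by sample index after the merge keeps the combined file independent of how many GPUs produced it.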
Training
We provide a streamlined training pipeline using accelerate. Ensure your accelerate_config.yaml is properly configured for your hardware.
A. Pre-training (Time-Series Encoder)
Stage A focuses on pre-training the TimeSeriesEncoder using masked modeling.
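As a rough illustration of masked modeling on a time series, the sketch below hides a random subset of time steps and scores a reconstruction only on the hidden positions. The mask ratio, the zero fill value, and the mean-squared-error loss are assumptions for illustration; the objective used by the repository's pre-training script may differ.

```python
import random


def mask_series(series, mask_ratio=0.25, mask_value=0.0, seed=None):
    """Randomly hide a fraction of time steps in a non-empty series.

    Returns the masked copy and the sorted list of hidden positions.
    """
    rng = random.Random(seed)
    num_masked = max(1, int(len(series) * mask_ratio))
    masked_positions = sorted(rng.sample(range(len(series)), num_masked))
    masked = list(series)
    for pos in masked_positions:
        masked[pos] = mask_value
    return masked, masked_positions


def masked_mse(prediction, target, masked_positions):
    """Mean squared error computed only over the hidden positions."""
    errors = [(prediction[p] - target[p]) ** 2 for p in masked_positions]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    signal = [0.1, 0.4, 0.9, 1.3, 1.1, 0.7, 0.2, -0.1]
    masked, positions = mask_series(signal, mask_ratio=0.25, seed=0)
    # Stand-in for an encoder's output: the masked input itself is used as
    # the "reconstruction", so the loss reflects the values that were hidden.
    loss = masked_mse(masked, signal, positions)
    print(positions, loss)
```

In a real training loop the prediction would come from the encoder, and the loss over masked positions would be what the optimizer minimizes.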
# One-click pre-training
bash scripts/run_pretrain.sh
B. Supervised Fine-Tuning (SFT)
Stage B performs end-to-end SFT, bridging the time-series encoder with the LLM via ITFormer.
# One-click SFT (Requires pre-trained ts_encoder weights)
bash scripts/run_sft.sh
Key Parameters in SFT:
- --it_d_model, --it_n_heads, --it_layers: Configuration for the ITFormer module.
- --load_ts_encoder: Path to the weights generated in Stage A.
- --llm_model_path: Path to the base Qwen2.5-Instruct model.
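To show how these flags might fit together, here is a minimal argparse sketch. Only the flag names come from the list above; the types, default values, the example paths, and the divisibility check (a common requirement of multi-head attention implementations) are assumptions, not the training script's actual behavior.

```python
import argparse


def build_sft_parser():
    """Parser exposing the SFT flags named in this README (illustrative)."""
    parser = argparse.ArgumentParser(description="SFT argument sketch")
    # ITFormer module configuration. Types and defaults are assumptions.
    parser.add_argument("--it_d_model", type=int, default=512)
    parser.add_argument("--it_n_heads", type=int, default=8)
    parser.add_argument("--it_layers", type=int, default=2)
    # Weights produced by Stage A pre-training.
    parser.add_argument("--load_ts_encoder", type=str, required=True)
    # Directory of the base Qwen2.5-Instruct model.
    parser.add_argument("--llm_model_path", type=str, required=True)
    return parser


def validate(args):
    """Check that the hidden size splits evenly across attention heads."""
    if args.it_d_model % args.it_n_heads != 0:
        raise ValueError("--it_d_model must be divisible by --it_n_heads")
    return args


if __name__ == "__main__":
    # Hypothetical paths, passed explicitly so the demo ignores sys.argv.
    demo_args = build_sft_parser().parse_args([
        "--load_ts_encoder", "checkpoints/ts_encoder.pt",
        "--llm_model_path", "LLM/qwen2.5-instruct",
    ])
    print(validate(demo_args))
```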
Model Architecture
ITFormer leverages an Instruction-aware Time Series Transformer to align temporal features with textual queries before feeding them into a Large Language Model. The framework is designed to be parameter-efficient, freezing the LLM and TS Encoder during SFT while training only the ITFormer and projection layers.
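The freezing scheme described above amounts to selecting which parameters receive gradient updates. The sketch below shows that selection over parameter names, using only the standard library. The module prefixes (llm., ts_encoder., itformer., proj.) and the parameter counts are hypothetical toy values, not figures from the repository.

```python
# Hypothetical top-level module prefixes; the repository's names may differ.
FROZEN_PREFIXES = ("llm.", "ts_encoder.")


def is_trainable(param_name):
    """A parameter is trainable unless it belongs to a frozen module."""
    return not param_name.startswith(FROZEN_PREFIXES)


def summarize(param_counts):
    """Return (trainable, total) parameter counts for a name -> size mapping."""
    trainable = sum(n for name, n in param_counts.items() if is_trainable(name))
    total = sum(param_counts.values())
    return trainable, total


if __name__ == "__main__":
    # Toy sizes chosen only to make the arithmetic easy to follow.
    toy_counts = {
        "llm.layers.0.weight": 1000,
        "ts_encoder.conv.weight": 100,
        "itformer.attn.weight": 10,
        "proj.weight": 5,
    }
    trainable, total = summarize(toy_counts)
    print(f"trainable {trainable} of {total} parameters")
```

In PyTorch, the same selection is typically applied by setting requires_grad to False on the frozen parameters so the optimizer only updates the rest.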
Citation
If you use this code in your research, please cite:
@inproceedings{wang2025itformer,
title={ITFormer: Bridging Time Series and Natural Language for Multi-Modal QA with Large-Scale Multitask Dataset},
author={Yilin Wang and Peixuan Lei and Jie Song and Yuzhe Hao and Tao Chen and Yuxuan Zhang and Lei Jia and Yuanxiang Li and Zhongyu Wei},
booktitle={International Conference on Machine Learning (ICML)},
year={2025}
}
License
MIT License. See the LICENSE file for details.