[Summary] A Decoder-Only Foundation Model for Time-Series Forecasting
TL;DR
TimesFM is a 200M-parameter decoder-only transformer trained on ~100B timepoints. It treats time-series patches the way LLMs treat tokens. In zero-shot evaluation it matches or beats supervised state-of-the-art on standard benchmarks, at a fraction of the cost of LLM-based approaches such as LLMTime.

Motivation
Classical methods (ARIMA, ETS) are fit per series and cannot transfer across datasets. LLMTime repurposes GPT-3/LLaMA-2 as zero-shot forecasters but is expensive and underperforms supervised models. NLP and CV have foundation models, but time series is harder: there is no discrete vocabulary, context length, horizon, and granularity vary, and far less public data is available...
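The "patches as tokens" idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the patch length of 32 matches the input patch length TimesFM reports, but the zero-padding convention here is an assumption made for simplicity.

```python
import numpy as np

def patchify(series: np.ndarray, patch_len: int = 32) -> np.ndarray:
    """Split a 1-D series into non-overlapping patches that play the
    role of tokens. Pads the front with zeros so the length divides
    evenly by patch_len (padding convention is an assumption here)."""
    pad = (-len(series)) % patch_len
    padded = np.concatenate([np.zeros(pad), series])
    return padded.reshape(-1, patch_len)

series = np.arange(100, dtype=float)
patches = patchify(series)
print(patches.shape)  # (4, 32): four "tokens" of 32 timepoints each
```

Each patch would then be embedded by an MLP and fed to the decoder-only transformer in place of a token embedding.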