Accurate early-season crop type classification is crucial for crop production estimation and the monitoring of agricultural parcels. However, the complexity of plant growth patterns and their spatio-temporal variability present significant challenges. While current deep learning-based methods show promise in crop type classification from single- and multi-modal time series, most rely on a single modality, such as satellite optical remote sensing data or crop rotation patterns. We propose a novel approach that fuses multimodal information into a model for improved accuracy and robustness across multiple crop seasons and countries. The approach relies on three modalities: remote sensing time series from Sentinel-2 and Landsat 8 observations, parcel crop rotation, and local crop distribution. To evaluate our approach, we release a new annotated dataset of 7.4 million agricultural parcels in France (FR) and the Netherlands (NL). We associate each parcel with time series of surface reflectance (Red and NIR) and biophysical variables (LAI, FAPAR). Additionally, we propose a new approach to automatically aggregate crop types into a hierarchical class structure for meaningful model evaluation, and a novel data-augmentation technique for early-season classification. The multimodal approach was assessed at different semantic aggregation levels, with the number of classes ranging from 151 to 8 crop types or groups. Accuracy ranged from 91% to 95% for the NL dataset and from 85% to 89% for the FR dataset. Pre-training on a dataset improves transferability between countries, allowing for cross-domain and cross-label prediction, and improves the robustness of performance in a few-shot setting from FR to NL, i.e., when the domain changes and significantly new labels appear.
Our proposed approach outperforms comparable methods by enabling deep learning methods to use the often-overlooked spatio-temporal context of parcels, resulting in increased precision and generalization capacity.