Significant wave height (SWH) is a key parameter for monitoring sea state. Accurate, long-lead SWH forecasting is important for maritime shipping and coastal engineering. This study proposes an attention-based transformer model for SWH forecasting. The transformer captures contextual information and dependencies within the input sequences and produces continuous multi-step time series forecasts. Wave scale classification is then carried out on the forecasts, and the results are compared with two machine-learning baselines, the gated recurrent unit (GRU) and long short-term memory (LSTM) models, and with the numerical wave model of the Key Laboratory of MArine Science and NUmerical Modeling (MASNUM). The results show that the machine-learning models outperform MASNUM at lead times up to 72 h, with the transformer performing best. For continuous 12 h, 24 h, 36 h, 48 h, 72 h, and 96 h forecasting, the average mean absolute errors (MAEs) on the test sets were 0.139 m, 0.186 m, 0.223 m, 0.254 m, 0.302 m, and 0.329 m, respectively, and the corresponding wave scale classification accuracies were 91.1%, 99.4%, 86%, 83.3%, 78.9%, and 77.5%. These results indicate that the transformer model can deliver continuous and accurate SWH forecasts, together with accurate wave scale classification and early warning of waves, providing technical support for wave monitoring.
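The two evaluation metrics reported above, MAE and wave scale classification accuracy, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The text does not state which wave scale is used, so the class boundaries below assume the WMO sea state code (0 to 9). The function names and the sample heights are hypothetical.

```python
import numpy as np

# Upper edges (in metres) of sea state codes 0-8; code 9 covers heights above 14 m.
# Assumption: WMO sea state boundaries. The source does not name its wave scale.
SEA_STATE_EDGES_M = np.array([0.0, 0.1, 0.5, 1.25, 2.5, 4.0, 6.0, 9.0, 14.0])


def mean_absolute_error(observed, forecast):
    """Mean absolute difference between forecast and observed SWH, in metres."""
    observed = np.asarray(observed, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(forecast - observed)))


def wave_scale(swh_m):
    """Map SWH values (m) to integer wave scale classes 0-9."""
    # right=True assigns a height exactly on an edge to the lower class.
    return np.digitize(np.asarray(swh_m, dtype=float), SEA_STATE_EDGES_M, right=True)


def classification_accuracy(observed, forecast):
    """Fraction of time steps where forecast and observed classes agree."""
    return float(np.mean(wave_scale(observed) == wave_scale(forecast)))


# Hypothetical sample values, for illustration only (not data from the study).
observed_swh = [0.4, 1.0, 1.3, 2.0, 3.0]
forecast_swh = [0.45, 1.1, 1.2, 2.2, 2.4]

mae = mean_absolute_error(observed_swh, forecast_swh)
accuracy = classification_accuracy(observed_swh, forecast_swh)
print(f"MAE = {mae:.3f} m, wave scale accuracy = {accuracy:.1%}")
```

Note that a small absolute error can still change the class when the observed height lies close to a boundary (1.3 m observed versus 1.2 m forecast in the sample), which is why the two metrics are reported separately.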