Using observations from aircraft, surface stations and satellites, we comprehensively evaluate multi-model simulations of carbon monoxide (CO) and ozone (O₃) in the Arctic and over lower-latitude emission regions, as part of the POLARCAT Model Inter-comparison Project (POLMIP). Evaluation of eleven atmospheric models with chemistry shows that they generally underestimate CO throughout the Arctic troposphere, with the largest biases found during winter and spring. Negative CO biases are also found throughout the Northern Hemisphere, with multi-model mean gross errors (9–12%) suggesting that the models perform similarly over Asia, North America and Europe. A multi-model annual mean tropospheric OH (10.8 ± 0.6 × 10⁵ molec cm⁻³) is found to be slightly h...