Evaluation of retrieval systems is mostly limited to laboratory settings and rarely considers changes of performance over time. This article presents an evaluation of retrieval systems for internal Web site search systems between the years 2006 and 2011. A holistic evaluation methodology for real Web sites was developed which includes tests for functionality, search quality, and user interaction. Among other sites, one set of 20 Web site search systems was evaluated three times in different years and no substantial improvement could be shown. It is surprising that the communication between site and user still leads to very poor results in many cases. Overall, the quality of these search systems could be improved, and several areas for impro...