Previous research efforts to optimize the energy usage of HVAC systems require either that mathematical models of the HVAC systems be built or that substantial historical operational data be available for learning the optimal operational settings. We introduce a model-free control policy that begins learning optimal settings with no prior historical data and optimizes HVAC operations. The control policy is an adaptive hybrid metaheuristic that uses real-time data stored in building automation systems (e.g., gas/electricity consumption, weather, and occupancy) to find optimal setpoints at the building level, and then controls the setpoints accordingly. The algorithm consists of a metaheuristic (k-nearest neighbor stochastic hill climbing), machine learning (r...