The energy consumed by commercial buildings for heating and cooling has increased significantly. Exploiting the potential flexibility of commercial buildings is an effective way to cope with the uncertainty introduced by the high penetration of renewable generation units. Because the thermal parameters of commercial buildings are difficult to identify on-site, deep reinforcement learning (DRL) methods are integrated into the energy management problem. However, time delays commonly occur in district heating systems due to the heat transfer process; these delays desynchronize the state and the action and pose challenges to conventional DRL algorithms. In this paper, the asynchrony between the state and the action is eliminated by expanding the state space to a larger but partially observed space, and the dispatch model for the district heating system is formulated as a partially observable Markov decision process (POMDP). Based on the finite difference method, this paper proposes a novel memory-augmented (MA) DRL method with a dueling network structure to cope with the time delay caused by the heat transfer process. The selection of the memory size is derived mathematically for a given accuracy requirement. Results from a case study of an industrial park demonstrate the satisfactory performance of the proposed method.
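The memory augmentation referred to above can be pictured as stacking a fixed window of past measurements and control actions onto the current observation, so that the delayed thermal effect of an action remains visible to the agent. The sketch below is one illustrative construction under that reading, not the paper's implementation; the class name, the dimensions, and the zero-padding at episode reset are assumptions, and `memory_size` stands in for the history length whose selection the paper derives analytically.

```python
from collections import deque

import numpy as np


class MemoryAugmentedState:
    """Illustrative sketch (assumed, not the paper's code): augment the
    current observation with the last `memory_size` observation/action
    pairs so a DRL agent can account for the delay between an action
    and its effect on the heating system."""

    def __init__(self, obs_dim: int, act_dim: int, memory_size: int):
        self.obs_dim = obs_dim
        self.act_dim = act_dim
        self.memory_size = memory_size
        # Fixed-length FIFO buffer of past (observation, action) pairs;
        # appending beyond maxlen automatically drops the oldest entry.
        self.buffer = deque(maxlen=memory_size)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        # Assumption: pad the history with zeros at the start of an episode.
        self.buffer.clear()
        for _ in range(self.memory_size):
            self.buffer.append(np.zeros(self.obs_dim + self.act_dim))
        return self.augment(obs)

    def step(self, obs: np.ndarray, prev_action: np.ndarray) -> np.ndarray:
        # Record the observation together with the action taken at the
        # previous step, then build the augmented state the policy sees.
        self.buffer.append(np.concatenate([obs, prev_action]))
        return self.augment(obs)

    def augment(self, obs: np.ndarray) -> np.ndarray:
        # Augmented state = current observation + flattened history window.
        history = np.concatenate(list(self.buffer))
        return np.concatenate([obs, history])


# Example usage with hypothetical dimensions:
aug = MemoryAugmentedState(obs_dim=4, act_dim=1, memory_size=6)
s0 = aug.reset(np.zeros(4))                          # length 4 + 6*(4+1) = 34
s1 = aug.step(np.random.rand(4), np.array([0.5]))    # same length each step
```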