Rapid and accurate structural damage assessment after an earthquake is important for efficient emergency management. The widespread deployment of surveillance cameras offers a new way to improve assessment efficiency. However, directly assessing structural seismic damage from videos captured by indoor surveillance cameras during earthquakes remains challenging. In this study, we elaborate on the concept of estimating the structural natural frequency from the inter-story relative pixel displacement. Furthermore, we propose a strategy for post-earthquake structural damage assessment that integrates computer vision and time-frequency analysis. The approach aims to overcome the difficulties inherent in earthquake damage assessment and to improve emergency response. The relative pixel displacement between the camera and fixed features on the floor is extracted from videos using the Harris corner detection and Kanade–Lucas–Tomasi algorithms. The structural natural frequency is estimated using the synchroextracting transform-enhanced empirical wavelet transform. A seismic damage index based on the natural frequency shift is defined and calculated for damage assessment. A shake table experiment on a small-scale steel model is conducted to verify the accuracy and feasibility of the approach, and its practicality is further verified using data from a full-scale reinforced concrete benchmark model experiment. The results demonstrate that the approach can accurately and efficiently evaluate structural damage after an earthquake based on video captured by surveillance cameras during the earthquake. The error of the acquired damage index is less than 0.1. Future work will apply more advanced algorithms to further reduce this error.
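
The idea of a damage index based on natural frequency shift can be illustrated with a minimal Python sketch. It is not the method of this study: a plain discrete Fourier transform peak search stands in for the synchroextracting transform-enhanced empirical wavelet transform, and the index form 1 - (f_after / f_before)^2 is an assumed, commonly used stiffness-loss expression, since the abstract does not give the definition. The function names, the 50 Hz sampling rate, and the synthetic 4.0 Hz and 3.2 Hz displacement signals are all hypothetical.

```python
import math


def dominant_frequency(signal, fs, f_min=0.2):
    """Return the frequency (Hz) of the largest DFT magnitude at or above f_min.

    A naive O(n^2) DFT is used so the sketch needs only the standard library.
    This is a stand-in for the time-frequency analysis named in the abstract.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [s - mean for s in signal]  # remove the static pixel offset
    best_f, best_mag = 0.0, -1.0
    for k in range(1, n // 2 + 1):
        f = k * fs / n
        if f < f_min:  # skip very low frequencies (slow drift)
            continue
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = -sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        mag = math.hypot(re, im)
        if mag > best_mag:
            best_f, best_mag = f, mag
    return best_f


def damage_index(f_before, f_after):
    """Assumed index: relative stiffness loss implied by the frequency drop."""
    return 1.0 - (f_after / f_before) ** 2


# Synthetic relative pixel displacements (hypothetical values, in pixels).
fs = 50.0   # assumed camera frame rate in frames per second
n = 500     # 10 s of video, giving a 0.1 Hz frequency resolution
before = [3.0 * math.sin(2 * math.pi * 4.0 * t / fs) for t in range(n)]
after = [3.0 * math.sin(2 * math.pi * 3.2 * t / fs) for t in range(n)]

f0 = dominant_frequency(before, fs)
f1 = dominant_frequency(after, fs)
di = damage_index(f0, f1)
print(f"f_before={f0:.2f} Hz, f_after={f1:.2f} Hz, damage index={di:.2f}")
```

In this sketch a drop in the estimated natural frequency from 4.0 Hz to 3.2 Hz gives an index of about 0.36. With real video, the input signals would be the relative pixel displacements tracked between the camera and the floor features, and the frequency resolution would depend on the frame rate and the length of the recording.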