Android malware is increasing significantly, and app privacy data leaks occur frequently, posing a serious threat to users' property and information security. New malware evolves rapidly and appears in many variants, so current detection methods still face three key problems: (1) structural features of Android samples are difficult to acquire; (2) the behavioral structure of malware is weakly represented; (3) detection models have poor robustness. To address these limitations, we propose MPRLDroid, a new malware detection framework based on reinforcement learning. First, MPRLDroid extracts structural features from Android apps and constructs a heterogeneous information network from the semantic call structure among apps, APIs, and permissions. Next, the model uses reinforcement learning to adaptively generate a meta-path for each sample and combines it with a graph attention network to learn effective node representations. Finally, the low-dimensional node vectors are passed to the downstream detection task for classification, and the change in classification performance serves as the reward for reinforcement learning. Experimental results show that MPRLDroid, with reinforcement learning integrated, outperforms the baseline models and is more robust than the other models.
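To make the notions of a heterogeneous information network and a meta-path concrete, the following is a minimal sketch, not the MPRLDroid implementation. It assumes a toy network with three node types (app, API, permission) and undirected app-API and app-permission edges. The sample data and the names `build_hin` and `metapath_neighbors` are hypothetical, and the meta-path is fixed by hand here, whereas the framework described above would select it per sample with reinforcement learning.

```python
from collections import defaultdict

# Hypothetical toy data: APIs called and permissions requested by each app.
app_api = {
    "app1": {"sendTextMessage", "getDeviceId"},
    "app2": {"getDeviceId"},
    "app3": {"openConnection"},
}
app_perm = {
    "app1": {"SEND_SMS"},
    "app2": {"READ_PHONE_STATE"},
    "app3": {"SEND_SMS", "INTERNET"},
}


def build_hin(app_api, app_perm):
    """Build an undirected typed graph; nodes are (type, name) tuples."""
    hin = defaultdict(set)
    for app, apis in app_api.items():
        for api in apis:
            hin[("app", app)].add(("api", api))
            hin[("api", api)].add(("app", app))
    for app, perms in app_perm.items():
        for perm in perms:
            hin[("app", app)].add(("perm", perm))
            hin[("perm", perm)].add(("app", app))
    return dict(hin)


def metapath_neighbors(hin, start, metapath):
    """Return nodes reachable from `start` by following the node-type sequence `metapath`."""
    if start[0] != metapath[0]:
        raise ValueError("start node type must match the first meta-path type")
    frontier = {start}
    for node_type in metapath[1:]:
        # Keep only neighbours whose type matches the next step of the meta-path.
        frontier = {
            nbr
            for node in frontier
            for nbr in hin.get(node, ())
            if nbr[0] == node_type
        }
    frontier.discard(start)  # a node is not its own meta-path neighbour
    return frontier


hin = build_hin(app_api, app_perm)
via_api = metapath_neighbors(hin, ("app", "app1"), ("app", "api", "app"))
via_perm = metapath_neighbors(hin, ("app", "app1"), ("app", "perm", "app"))
print(via_api, via_perm)
```

Different meta-paths yield different neighbourhoods for the same app, which is why the choice of meta-path affects the node representations that a graph attention network would aggregate.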