The terahertz (THz) band can be used for label-free detection of malignant cells with a metasurface-based cross-polarization converter sensor. Here, a metamaterial-based polarization converter is designed that can be tuned by varying the chemical potential of graphene. Dual-band circularly polarized (CP) waves are generated by an asymmetric Z-slotted graphene patch on a silicon dioxide substrate. A Z-shaped vanadium dioxide (VO2) conductor placed in the slot converts the metasurface from a CP converter into a cross-polarization converter. The design is optimized using the asynchronous advantage actor-critic (A3C) method, a reinforcement learning algorithm. The device operates on the principle of refractive index (RI) sensing, in which the frequency shift of the maximum polarization conversion peak is detected in the presence of the target analyte. The insulator-to-metal phase transition of VO2 alters its dielectric properties, such as permittivity and conductivity, through changes in its electron orbital structure. Raising the temperature of VO2 drives this phase transition and thereby tunes the polarization conversion. Together with the graphene chemical potential, this provides two degrees of freedom for tuning the polarization conversion. The shift in the maximum polarization conversion frequency is proportional to the RI difference between malignant and healthy cells. The proposed metasurface achieves a peak sensitivity of 420 GHz/RIU and a maximum polarization conversion efficiency of 98.3%. Finite-integration-based numerical simulations are used to investigate the operating principle of the proposed polarization converter.
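
As a rough illustration of the RI-sensing relation stated above, the following minimal Python sketch treats the peak shift as linear in the RI difference, so that sensitivity is S = Δf/Δn. The function names and the two refractive index values are hypothetical placeholders chosen for illustration, not values reported in this work; only the 420 GHz/RIU figure comes from the text.

```python
def sensitivity_ghz_per_riu(f_sample_ghz, f_reference_ghz, n_sample, n_reference):
    """Sensitivity S = delta_f / delta_n, in GHz per refractive index unit (RIU)."""
    delta_n = n_sample - n_reference
    if delta_n == 0:
        raise ValueError("refractive indices must differ to define a sensitivity")
    return (f_sample_ghz - f_reference_ghz) / delta_n


def peak_shift_ghz(sensitivity, n_sample, n_reference):
    """Magnitude of the conversion-peak shift, assuming it is linear in the RI difference."""
    return abs(sensitivity * (n_sample - n_reference))


if __name__ == "__main__":
    PEAK_SENSITIVITY = 420.0  # GHz/RIU, peak value stated in the text
    # Hypothetical RI values, used only to show the arithmetic.
    n_healthy, n_malignant = 1.36, 1.38
    shift = peak_shift_ghz(PEAK_SENSITIVITY, n_malignant, n_healthy)
    print(f"Expected peak shift: {shift:.1f} GHz")
```

Under these assumed values, an RI difference of 0.02 corresponds to a peak shift of about 8.4 GHz. The direction of the shift depends on the structure and is not modeled here.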