Sign languages differ dramatically from spoken languages in their linguistic articulators (the hands and face vs. the vocal tract) and in how they are perceived (visually vs. auditorily), differences which can affect how they are processed in the brain. This review focuses on the neural network involved in sign language comprehension, from processing the initial visual input to parsing meaningful sentences. We describe how the signer's brain decodes the visual signed signal into distinct, linguistically relevant representations (e.g., handshapes and movements), primarily in occipital and posterior temporal regions. These representations are converted into stable sign-based phonological representations in posterior temporal and parietal regions, which in turn activate lexical-semantic representations. The higher-level processes that create combinatorial semantic-syntactic constructions from these lexical representations are subserved by a frontotemporal network of regions which overlaps with the network for spoken languages. In broad outline, this network is partially specific to the visual modality and partially supramodal. Important avenues for future research include identifying and characterising patterns of activation and connectivity within macroanatomical regions that appear to serve multiple functional roles in sign language comprehension.