Globally, cyberattacks are growing and mutating each month. Intelligent Intrusion Network Detection Systems are developed to analyze and detect anomalous traffic to face these threats. A way to address this is by using network flows, an aggregated version of communications between devices. Network Flow datasets are used to train Artificial Intelligence (AI) models to classify specific attacks. Training these models requires threat samples usually generated synthetically in labs as capturing them on operational network is a challenging task. As threats are fast-evolving, new network flows are continuously developed and shared. However, using old datasets is still a popular procedure when testing models, hindering a more comprehensive characterization of the advantages and opportunities of recent solutions on new attacks. Moreover, a standardized benchmark is missing rendering a poor comparison between the models produced by algorithms. To address these gaps, we present a benchmark with fourteen recent and preprocessed datasets and study seven categories of algorithms for Network Intrusion Detection based on Network Flows. We provide a centralized source of pre-processed datasets to researchers for easy download. All dataset are also provided with a train, validation and test split to allow a straightforward and fair comparison between existing and new solutions. We selected open state-of-the-art publicly available algorithms, representatives of diverse approaches. We carried out an experimental comparison using the Macro F1 score of these algorithms. Our results highlight each model operation on dataset scenarios and provide guidance on competitive solutions. Finally, we discuss the main characteristics of the models and benchmarks, focusing on practical implications and recommendations for practitioners and researchers.