Object detection is one of the most fundamental computer vision tasks. Existing object detectors rely heavily on dense object candidates, such as k anchor boxes pre-defined on every grid cell of an image feature map of size H×W. In this paper, we present Sparse R-CNN, a simple and sparse method for object detection in images. In our method, a fixed sparse set of N learned object proposals is provided to the object recognition head to perform classification and localization. By replacing the HWk (up to hundreds of thousands) hand-designed object candidates with N (e.g., 100) learnable proposals, Sparse R-CNN makes all effort related to object candidate design and one-to-many label assignment obsolete. More importantly, Sparse R-CNN outputs its predictions directly, without non-maximum suppression (NMS) post-processing, and thus establishes an end-to-end object detection framework. Sparse R-CNN achieves accuracy, run-time, and training convergence performance that are highly competitive with well-established detector baselines on the challenging COCO and CrowdHuman datasets. We hope that our work inspires a rethinking of the dense-prior convention in object detectors and the design of new high-performance detectors.
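
To make the HWk versus N comparison concrete, the following minimal sketch counts the candidates for a single feature map. The values of H, W, and k are assumptions chosen only for illustration, and the helper name `dense_candidate_count` is hypothetical; only N = 100 is taken from the text above.

```python
def dense_candidate_count(height, width, anchors_per_cell):
    """Count dense candidates: k anchor boxes on each cell of an H x W feature map."""
    return height * width * anchors_per_cell


# Assumed feature-map size and anchor count (illustrative, not from the paper).
H, W, k = 100, 152, 9
# Example number of learnable proposals, as given in the text.
N = 100

dense = dense_candidate_count(H, W, k)
print(f"dense candidates (HWk): {dense}")
print(f"learnable proposals (N): {N}")
print(f"reduction factor: {dense / N:.0f}x")
```

Under these assumed sizes, the dense candidate count already exceeds one hundred thousand for a single feature map, while the sparse set stays fixed at N regardless of image resolution.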