REEF: Retainable Evaluator Execution Framework

Proc ACM SIGMOD Int Conf Manag Data. 2015 May-Jun:2015:1343-1355. doi: 10.1145/2723372.2742793.

Abstract

Resource Managers like Apache YARN have emerged as a critical layer in the cloud computing system stack, but the developer abstractions for leasing cluster resources and instantiating application logic are very low-level. This flexibility comes at a high cost in terms of developer effort, as each application must repeatedly tackle the same challenges (e.g., fault-tolerance, task scheduling and coordination) and re-implement common mechanisms (e.g., caching, bulk-data transfers). This paper presents REEF, a development framework that provides a control-plane for scheduling and coordinating task-level (data-plane) work on cluster resources obtained from a Resource Manager. REEF provides mechanisms that facilitate resource re-use for data caching, and state management abstractions that greatly ease the development of elastic data processing work-flows on cloud platforms that support a Resource Manager service. REEF is being used to develop several commercial offerings such as the Azure Stream Analytics service. Furthermore, we demonstrate REEF development of a distributed shell application, a machine learning algorithm, and a port of the CORFU [4] system. REEF is also currently an Apache Incubator project that has attracted contributors from several instititutions.

Keywords: Big Data; Database; Distributed Systems; High Performance Computing; Machine Learning.