Revisiting predictions of movie economic success: random Forest applied to profits

Thaís Luiza Donega E Souza; Marislei Nishijima; Ricardo Pires

doi:10.1007/s11042-023-15169-4


Multimed Tools Appl. 2023 Mar 28:1-24. doi: 10.1007/s11042-023-15169-4. Online ahead of print.

Authors

Thaís Luiza Donega E Souza¹, Marislei Nishijima², Ricardo Pires³

Affiliations

¹ Information Systems Department, University of São Paulo, 1000, Arlindo Béttio - Ermelino Matarazzo, 03828-000, Room: L1 - 327, São Paulo, SP Brazil.
² University of São Paulo, Institute of International Relationships, Av. Prof. Lúcio Martins Rodrigues, Tv. 4 e 5, Cidade Universitária, São Paulo, SP 05508-020 Brazil.
³ Department of Electricity, Federal Institute of São Paulo, R. Pedro Vicente, 625 - Canindé, São Paulo, SP 01109-010 Brazil.

Abstract

Previous studies have employed machine learning tools to classify films by success, with the aim of reducing the uncertainty of film production. We revisited this literature to address three relevant issues in classifying films by economic success. First, we compared results from shorter and longer time samples, to study possible changes in consumption patterns driven mainly by technological change, and compared all films with wide-released films. Second, we measured economic success with inflation-adjusted profits instead of the usual nominal box-office revenues. Third, we employed a smaller set of features, restricted to those available at production time, to help producers manage contingencies, since little or nothing can be done once a film is in theaters. Following the literature, we chose three classifiers (Random Forest, Support Vector Machine, and Neural Network) and designed sub-datasets to train the models and compare their performance. Our dataset includes all films with budgets disclosed on the Box Office Mojo website, resulting in 3,167 movies released in theaters worldwide between 1980 and 2019. The Random Forest results outperform those of previous similar studies across different time samplings, including a less common, larger sample, and the best data sample reaches about 97% in both accuracy and F1-score.
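Two ingredients of the abstract are easy to misread: a success target built from inflation-adjusted profits rather than nominal revenues, and evaluation by accuracy and F1-score. The Python sketch below illustrates both under stated assumptions; it is not the authors' pipeline. The price-index values, the `Film` fields, the median split used to define "success", and the helper names (`real_profit`, `time_window`, `success_labels`, `accuracy_and_f1`) are all hypothetical. In a real study the stand-in predictions would come from a classifier, for example scikit-learn's `RandomForestClassifier`, trained on production-time features.

```python
from dataclasses import dataclass
from statistics import median

# Hypothetical price-index values (illustrative only, not official CPI data).
PRICE_INDEX = {1980: 82.4, 2000: 172.2, 2019: 255.7}
BASE_YEAR = 2019


@dataclass
class Film:
    title: str
    year: int       # release year
    budget: float   # nominal production budget
    gross: float    # nominal worldwide box-office revenue


def real_profit(film, index=PRICE_INDEX, base_year=BASE_YEAR):
    """Nominal profit (gross - budget) restated in base-year prices."""
    return (film.gross - film.budget) * index[base_year] / index[film.year]


def time_window(films, start, end):
    """Sub-dataset of films released between start and end (inclusive)."""
    return [f for f in films if start <= f.year <= end]


def success_labels(films):
    """Assumed target: 1 if real profit is above the sample median, else 0."""
    profits = [real_profit(f) for f in films]
    cut = median(profits)
    return [1 if p > cut else 0 for p in profits]


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1-score for the positive ('success') class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Toy sample (hypothetical films and figures).
films = [
    Film("A", 1980, 5e6, 15e6),
    Film("B", 2019, 100e6, 120e6),
    Film("C", 2000, 50e6, 40e6),
    Film("D", 2019, 10e6, 25e6),
]
labels = success_labels(films)   # [1, 1, 0, 0]
predictions = [1, 0, 0, 0]       # stand-in for a classifier's output
print(labels, accuracy_and_f1(labels, predictions))
```

In this toy sample, film A's nominal profit ranks third, but restated in 2019 prices it ranks first, so deflation changes which films fall on the "success" side of the split. A break-even threshold (profit above zero) would not be affected, because in this sketch budget and gross share the same release-year index; deflation matters when films from different years are compared against a common threshold.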

Supplementary information: The online version contains supplementary material available at 10.1007/s11042-023-15169-4.

Keywords: Classification; Machine learning; Movie market; Movie success; Profit; Regime change.

© The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.