Burst and Memory-aware Transformer: capturing temporal heterogeneity

Front Comput Neurosci. 2023 Dec 12:17:1292842. doi: 10.3389/fncom.2023.1292842. eCollection 2023.

Abstract

Burst patterns, characterized by their temporal heterogeneity, have been observed across a wide range of domains, from neuronal firing to various facets of human activity. Recent research on event-sequence prediction has leveraged Transformers based on the Hawkes process, using a self-attention mechanism to capture long-term temporal dependencies. To handle bursty temporal patterns effectively, we propose the Burst and Memory-aware Transformer (BMT), a model designed to explicitly address temporal heterogeneity. BMT embeds the burstiness parameter and the memory coefficient into the self-attention module, enriching the learning process with information derived from bursty patterns. Furthermore, we employ a novel loss function that optimizes the burstiness and memory-coefficient values, as well as their discretized one-hot representations, both individually and jointly. Numerical experiments on diverse synthetic and real-world datasets show that BMT predicts event times and intensity functions more accurately than existing models and control groups. In particular, BMT performs remarkably well on temporally heterogeneous data, such as sequences with power-law inter-event time distributions. Our findings suggest that incorporating burst-related parameters helps the Transformer comprehend heterogeneous event sequences, leading to improved predictive performance.
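For readers unfamiliar with the burst-related quantities mentioned above, the sketch below computes the burstiness parameter B = (σ − μ)/(σ + μ) and the memory coefficient M (the Pearson correlation between consecutive inter-event times) from a sequence of event timestamps, following the widely used definitions of Goh and Barabási. This is a minimal illustration only; the function names are hypothetical, and whether BMT uses exactly these formulations internally is an assumption, not a statement of the paper's implementation.

```python
import numpy as np

def burstiness(inter_event_times: np.ndarray) -> float:
    """Burstiness parameter B = (sigma - mu) / (sigma + mu).

    B approaches 1 for highly bursty sequences, is near 0 for
    Poisson-like sequences, and approaches -1 for regular ones
    (standard definition; assumed here, not quoted from the paper).
    """
    mu = inter_event_times.mean()
    sigma = inter_event_times.std()
    return (sigma - mu) / (sigma + mu)

def memory_coefficient(inter_event_times: np.ndarray) -> float:
    """Memory coefficient M: Pearson correlation between
    consecutive inter-event times (tau_i, tau_{i+1})."""
    tau_prev, tau_next = inter_event_times[:-1], inter_event_times[1:]
    return np.corrcoef(tau_prev, tau_next)[0, 1]

# Example: a synthetic event sequence with power-law-like
# (heavy-tailed) inter-event times, i.e., a temporally heterogeneous case.
rng = np.random.default_rng(0)
taus = rng.pareto(1.5, size=1000)           # heavy-tailed inter-event times
timestamps = np.cumsum(taus)
ievts = np.diff(timestamps, prepend=0.0)    # recover inter-event times

print(f"B = {burstiness(ievts):.3f}, M = {memory_coefficient(ievts):.3f}")
```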

Keywords: Transformer; burst; event sequence; inter-event time; self-attention; temporal heterogeneity; temporal point process; timestamp.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by an Electronics and Telecommunications Research Institute (ETRI) grant funded by the Korean government (23ZS1100, Core Technology Research for Self-Improving Integrated Artificial Intelligence System).