In this paper, we study the optimal consumption and strategic asset allocation decisions of an investor with a finite planning horizon. We use a Q-learning approach to maximize the expected utility of consumption. The first part of the paper describes, at a conceptual level, how Q-learning is implemented in a discrete state-action space and, for a simplified setting, illustrates how the technique relates to dynamic programming. The second part explores different generalization methods and, as an alternative to implementations based on neural networks, proposes combining Q-learning with self-organizing maps (SOMs). The resulting policy is compared with alternative strategies.
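The following sketch illustrates how Q-learning relates to dynamic programming in a discrete state-action space. It uses a hypothetical toy problem. Wealth is an integer on a grid from 0 to `W`, and consumption is the only decision. Savings earn one of two equally likely gross returns, utility is the square root of consumption, and all remaining wealth is consumed in the final period. None of these choices, parameters, or function names come from the paper, and the SOM generalization is not illustrated.

Dynamic programming computes `Q_t(w, c) = u(c) + β E[V_{t+1}(w')]` by backward induction using the exact expectation. The Q-learning version replaces that expectation with sampled returns and applies the standard update `Q ← Q + α (target − Q)` with `α = 1/n`. To keep the example short, it sweeps over all state-action pairs backward in time instead of simulating episodes with exploration. It therefore shows the update rule rather than a full online agent.

```python
import math
import random

# Hypothetical toy parameters (not taken from the paper)
T = 3                 # decision periods 0..T-1; all wealth is consumed at period T
W = 10                # wealth grid 0..W
BETA = 0.95           # discount factor
RETURNS = (0.9, 1.3)  # two equally likely gross returns on savings


def utility(c):
    return math.sqrt(c)


def next_wealth(w, c, r):
    # Savings grow by gross return r, rounded and capped so wealth stays on the grid
    return min(W, int(round((w - c) * r)))


def dynamic_programming():
    """Backward induction with the exact expectation over returns."""
    V = [[0.0] * (W + 1) for _ in range(T + 1)]
    Q = [[[0.0] * (W + 1) for _ in range(W + 1)] for _ in range(T)]
    V[T] = [utility(w) for w in range(W + 1)]  # terminal period: consume everything
    for t in range(T - 1, -1, -1):
        for w in range(W + 1):
            for c in range(w + 1):  # feasible consumption 0..w
                expected = sum(0.5 * V[t + 1][next_wealth(w, c, r)] for r in RETURNS)
                Q[t][w][c] = utility(c) + BETA * expected
            V[t][w] = max(Q[t][w][: w + 1])
    return Q, V


def q_learning(samples_per_pair=2000, seed=0):
    """Sample-based Q-learning updates, swept backward over the finite horizon."""
    rng = random.Random(seed)
    Q = [[[0.0] * (W + 1) for _ in range(W + 1)] for _ in range(T)]

    def greedy_value(t, w):
        if t == T:
            return utility(w)  # terminal period: consume everything
        return max(Q[t][w][: w + 1])

    for t in range(T - 1, -1, -1):
        for w in range(W + 1):
            for c in range(w + 1):
                for n in range(1, samples_per_pair + 1):
                    r = rng.choice(RETURNS)  # sampled return replaces the expectation
                    target = utility(c) + BETA * greedy_value(t + 1, next_wealth(w, c, r))
                    alpha = 1.0 / n  # step size 1/n gives a running sample mean
                    Q[t][w][c] += alpha * (target - Q[t][w][c])
    return Q


if __name__ == "__main__":
    Q_dp, V_dp = dynamic_programming()
    Q_ql = q_learning()
    gap = max(
        abs(Q_dp[t][w][c] - Q_ql[t][w][c])
        for t in range(T) for w in range(W + 1) for c in range(w + 1)
    )
    print(f"max |Q_DP - Q_learning| = {gap:.4f}")
```

The printed gap between the sampled and exact action values is not fixed in advance. It depends on the random seed and shrinks as `samples_per_pair` grows, because each Q-learning estimate is a running mean of sampled targets.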