This paper addresses the selection of optimal spatio-spectral features, which is key to high-performance motor imagery (MI) classification, itself a central topic in EEG-based brain-computer interfaces. We propose a novel method that first formulates feature selection as maximizing the mutual information between class labels and features. It then uses a robust estimate of mutual information, within a filter-bank and common spatial pattern feature extraction framework, to select an effective feature set. We assessed the proposed method on both BCI Competition IV Set I and a separate data set collected in our lab from seven healthy subjects. The results indicate that the method is effective in selecting optimal spatio-spectral features for classification.
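To make the selection criterion concrete, the following is a minimal sketch of ranking features by their estimated mutual information with the class labels. It is an illustration under stated assumptions, not the estimator used in this work: the histogram (plug-in) estimate stands in for the robust estimate mentioned above, scoring each feature individually simplifies the set-level objective, and the names `mutual_information`, `select_features`, and `n_bins` are hypothetical. The columns of `X` are synthetic stand-ins for filter-bank common spatial pattern features.

```python
import numpy as np


def mutual_information(feature, labels, n_bins=8):
    """Histogram (plug-in) estimate of I(feature; labels), in bits.

    Assumption: a simple stand-in for a robust estimator.
    """
    feature = np.asarray(feature, dtype=float).ravel()
    labels = np.asarray(labels).ravel()

    # Discretize the continuous feature into equal-width bins (0 .. n_bins-1).
    edges = np.histogram_bin_edges(feature, bins=n_bins)
    binned = np.digitize(feature, edges[1:-1])

    # Build the joint distribution over (feature bin, class).
    classes, class_idx = np.unique(labels, return_inverse=True)
    joint = np.zeros((n_bins, classes.size))
    np.add.at(joint, (binned, class_idx), 1.0)
    joint /= joint.sum()

    # Marginals and the sum over non-empty cells of p * log2(p / (px * py)).
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))


def select_features(X, y, k):
    """Return indices of the k features with the highest estimated score."""
    scores = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    order = np.argsort(scores)[::-1]  # descending by score
    return order[:k], scores


# Synthetic example: 200 trials, 6 candidate features, two classes.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 100)
X = rng.normal(size=(200, 6))
X[:, 2] += 3.0 * y  # only feature 2 carries class information

selected, scores = select_features(X, y, k=2)
print("selected feature indices:", selected)
```

Per-feature ranking of this kind ignores redundancy between features, so it should be read only as a picture of the objective, not as the selection procedure itself.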