Intrusive speech quality estimation of the subjective mean opinion score (MOS) often involves mapping a raw similarity score, extracted from the differences between the clean and degraded utterances, onto MOS with a fitted mapping function. More recent models such as support vector regression (SVR) or deep neural networks use multidimensional input, which allows for more accurate predictions than one-dimensional (1-D) mappings, but these models do not guarantee the monotonic relationship expected between similarity and quality. We investigate a multidimensional mapping function based on deep lattice networks (DLNs), which enforce monotonicity constraints, using input features provided by ViSQOL. The DLN improved the speech mapping, achieving a mean squared error of 0.24 on a mixture of datasets that includes voice over IP (VoIP) and codec degradations and outperforming the 1-D fitted functions and SVR, as well as PESQ and POLQA. Additionally, we show that the DLN can learn a quantile function that is well calibrated and provides a useful measure of uncertainty. The quantile function improves the mapping of data-driven similarity representations to human-interpretable scales, for example by yielding quantile intervals for predictions instead of point estimates.
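To make "well calibrated" concrete: a predicted τ-quantile of MOS is calibrated when roughly a fraction τ of the observed MOS values fall at or below it. The sketch below is not the authors' implementation; it assumes the common quantile-regression setup, computing the standard pinball (quantile) loss and an empirical coverage check. The function names and the toy MOS and prediction values are hypothetical, and this abstract does not state which loss the DLN's quantile function was trained with.

```python
from typing import Sequence


def pinball_loss(y_true: Sequence[float], y_pred: Sequence[float], tau: float) -> float:
    """Mean pinball (quantile) loss for a quantile level tau in (0, 1).

    Hypothetical helper: a standard quantile-regression loss, assumed here
    for illustration only.
    """
    if not 0.0 < tau < 1.0:
        raise ValueError("tau must lie strictly between 0 and 1")
    total = 0.0
    for y, q in zip(y_true, y_pred):
        diff = y - q
        # Under-prediction (y > q) costs tau per unit;
        # over-prediction (y < q) costs (1 - tau) per unit.
        total += max(tau * diff, (tau - 1.0) * diff)
    return total / len(y_true)


def empirical_coverage(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Fraction of observations at or below the predicted quantile.

    For a calibrated tau-quantile this should be close to tau.
    """
    hits = sum(1 for y, q in zip(y_true, y_pred) if y <= q)
    return hits / len(y_true)


if __name__ == "__main__":
    # Toy, made-up values: observed MOS and predicted 0.9-quantiles.
    mos = [3.1, 4.2, 2.5, 3.8, 4.6]
    q90 = [3.6, 4.5, 3.0, 4.1, 4.4]
    print("coverage:", empirical_coverage(mos, q90))
    print("pinball loss:", pinball_loss(mos, q90, 0.9))
```

Pairing a low and a high quantile (for example τ = 0.1 and τ = 0.9) in the same way gives a prediction interval rather than a single point estimate of MOS.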