This article reports on a study undertaken to validate an assessment tool measuring medical students' ability to integrate clinical skills and scientific knowledge within the patient encounter. One hundred forty first-year medical students at the State University of New York at Buffalo examined a standardized patient presenting with either acute lower back pain or gastroesophageal reflux disease (GERD). Forty-eight clinical exams were scored by two raters to test the interrater reliability of the instrument. Results were promising but mixed: the tool displayed high internal consistency, but a generalizability study indicated that a significant amount of variance in student scores was attributable to the faculty raters. It is recommended that future studies include a rater training workshop and examine additional cases in an effort to expand the flexibility of the instrument.
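The two psychometric quantities the abstract reports on can be illustrated with a minimal sketch: Cronbach's alpha for internal consistency, and ANOVA-based variance components for a generalizability analysis. This is not the study's actual analysis code; it assumes a fully crossed students × raters design with one score per cell, and all names and data are illustrative.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students, n_items) score matrix."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)       # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)   # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

def rater_variance_components(scores):
    """Variance components for a fully crossed students x raters design
    (one score per cell), estimated from ANOVA expected mean squares."""
    scores = np.asarray(scores, dtype=float)
    n_s, n_r = scores.shape
    grand = scores.mean()
    # Mean squares for the student and rater main effects
    ms_student = n_r * ((scores.mean(axis=1) - grand) ** 2).sum() / (n_s - 1)
    ms_rater = n_s * ((scores.mean(axis=0) - grand) ** 2).sum() / (n_r - 1)
    # Residual (student x rater interaction, confounded with error)
    resid = (scores - scores.mean(axis=1, keepdims=True)
                    - scores.mean(axis=0, keepdims=True) + grand)
    ms_resid = (resid ** 2).sum() / ((n_s - 1) * (n_r - 1))
    # Solve the expected-mean-square equations; clamp at zero
    var_student = max((ms_student - ms_resid) / n_r, 0.0)
    var_rater = max((ms_rater - ms_resid) / n_s, 0.0)
    return var_student, var_rater, ms_resid
```

A large rater component relative to the student component is the pattern the study describes: systematic differences between faculty raters contributing substantially to score variance, which motivates the recommended rater training.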