Controlling speech dialog using an additional sensor