opensourcejason.info
spoken context
Project Summary
- Motivating opinion: speech will be the dominant mode of interaction
- Enable spoken interaction to simplify interface to phone features
- For example: vocal annotation of audio-visual content
Spoken Access to Apps
- focus on calendar apps
- emphasize domain/language portability
Constructing a new domain
- Communications loop:
- language generation -> speech synthesis
- speech synthesis -> human hearing
- human speech -> speech recognition
- speech recognition -> language understanding
- language understanding -> context resolution
- context resolution -> dialog management
- dialog management -> language generation
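
The loop above can be sketched as a chain of functions, one per stage. This is only an illustration: every function name and the keyword-spotting logic are hypothetical stand-ins, and a real system would call a speech recognizer, a parser, and a synthesizer at the marked points.

```python
def recognize(audio):
    # Stand-in for speech recognition: treat the "audio" as its transcript.
    return audio.lower().strip()

def understand(text):
    # Stand-in for language understanding: keyword spotting into a frame.
    words = text.split()
    frame = {}
    if "schedule" in words:
        frame["action"] = "schedule"
    elif "cancel" in words:
        frame["action"] = "cancel"
    if "that" in words or "it" in words:
        frame["event"] = "<anaphor>"  # reference still to be resolved
    return frame

def resolve_context(frame, history):
    # Fill an unresolved reference with the most recently mentioned event.
    resolved = dict(frame)
    if resolved.get("event") == "<anaphor>" and history:
        resolved["event"] = history[-1]
    return resolved

def manage_dialog(frame):
    # Decide the system's next move from the resolved frame.
    if frame.get("action") == "cancel" and "event" in frame:
        return {"act": "confirm_cancel", "event": frame["event"]}
    return {"act": "clarify"}

def generate(reply):
    # Language generation from the dialog act.
    if reply["act"] == "confirm_cancel":
        return f"Do you want to cancel {reply['event']}?"
    return "Sorry, what would you like to do?"

def synthesize(text):
    # Stand-in for speech synthesis: return the text that would be spoken.
    return text

def one_turn(audio, history):
    frame = resolve_context(understand(recognize(audio)), history)
    return synthesize(generate(manage_dialog(frame)))

print(one_turn("Cancel that one", ["the 3pm meeting"]))
```

Keeping each stage behind its own function is what would make a new domain or language a matter of swapping individual stages rather than rebuilding the loop.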
Challenges
- provide adequate expression flexibility
- incorporate context info from previous session
- acquire training data for language variability
Example Dialogue
- Speaker understanding
- speech generation
- complex queries: "schedule after", "when is", "cancel that one"
- confirmation sub-dialogues
- conflict resolution, anaphoric references
- Re-adapted to Mandarin
- Same representation, mapped to different languages
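
One way to picture "same representation, mapped to different languages" is a language-independent frame rendered through per-language templates. The frame fields, the lexicon, and the template wording below are all assumptions made for illustration, not the system's actual representation.

```python
# Hypothetical per-language surface forms for frame values.
LEXICON = {
    "meeting": {"en": "meeting", "zh": "会议"},
    "15:00": {"en": "3 pm", "zh": "下午3点"},
}

# Hypothetical templates keyed by (dialog act, language).
TEMPLATES = {
    ("confirm_schedule", "en"): "Scheduled a {title} at {time}.",
    ("confirm_schedule", "zh"): "已安排{time}的{title}。",
}

def generate(frame, lang):
    """Render one language-independent frame in the requested language."""
    template = TEMPLATES[(frame["act"], lang)]
    return template.format(
        title=LEXICON[frame["title"]][lang],
        time=LEXICON[frame["time"]][lang],
    )

frame = {"act": "confirm_schedule", "title": "meeting", "time": "15:00"}
print(generate(frame, "en"))
print(generate(frame, "zh"))
```

Only the lexicon and templates differ between the two languages; the frame and the dialog logic that produced it stay the same.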
Content Annotation and Retrieval
- Retrieve photos using speech
- Client-server architecture
- Client side: speech annotation, photo metadata
- Server side: speech processing, photo storage
- Use N-best list for recognition (vacation destinations)
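
The outline does not say how the N-best list is used, so the following is one plausible sketch rather than the actual method: prefer the best-scoring hypothesis that mentions a term already known from the photo annotations (for example a vacation destination), and fall back to the top hypothesis otherwise. The function name, scores, and term set are hypothetical.

```python
def pick_hypothesis(nbest, known_terms):
    """Choose a recognition hypothesis from an N-best list.

    nbest: non-empty list of (text, score) pairs, higher score = more likely.
    known_terms: lowercase words seen in stored annotations or metadata.
    """
    ranked = sorted(nbest, key=lambda h: h[1], reverse=True)
    for text, _score in ranked:
        if any(word in known_terms for word in text.lower().split()):
            return text
    return ranked[0][0]  # nothing matched: keep the recognizer's top choice

nbest = [("photos from whales", -1.0), ("photos from wales", -1.3)]
print(pick_hypothesis(nbest, {"wales", "paris"}))
```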
Recognition and parsing
- Query carrier phrase: context-free grammars
- Free-form annotation: n-gram model
- Metadata terms: context-free grammars
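
The split between a grammar-driven carrier phrase and a free-form remainder can be illustrated with a toy context-free grammar. The rules and vocabulary below are invented for the example. The grammar is deliberately non-recursive so that it can be expanded exhaustively, and the remainder would go to the n-gram recognizer, which is not modeled here.

```python
from itertools import product

# Toy grammar: keys are non-terminals, anything else is a terminal word.
GRAMMAR = {
    "CARRIER": [["VERB", "OBJ", "PREP"]],
    "VERB": [["show", "me"], ["find"]],
    "OBJ": [["photos"], ["pictures"]],
    "PREP": [["of"], ["from"]],
}

def expand(symbol):
    """Return every word sequence derivable from symbol (non-recursive grammar)."""
    if symbol not in GRAMMAR:
        return [[symbol]]  # terminal
    results = []
    for rule in GRAMMAR[symbol]:
        parts = [expand(s) for s in rule]
        for combo in product(*parts):
            results.append([word for part in combo for word in part])
    return results

# Try longer carrier phrases first so the longest match wins.
CARRIERS = sorted(expand("CARRIER"), key=len, reverse=True)

def split_query(utterance):
    """Split an utterance into (carrier phrase, free-form remainder)."""
    words = utterance.lower().split()
    for carrier in CARRIERS:
        if words[:len(carrier)] == carrier:
            return " ".join(carrier), " ".join(words[len(carrier):])
    return None, " ".join(words)

print(split_query("Show me photos from our trip to Paris"))
```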
Dimensions
- Dialogue context
- Remember preceding dialogue
- When to relax constraints
- Confirm/override using follow-ups
- Important for sentence fragments
- Personalization
- Adapt acoustic models to speaker
- Augment vocab with contacts, appts, etc
- Capture info from peered resources
- Situation context
- locality
- time zone
- info clusters (i.e., temporal locality)
- GUI
- Multi-modal
- small screen - important to exploit context
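
Two of the dialogue-context points above, interpreting sentence fragments against the preceding dialogue and deciding when to relax constraints, can be sketched with plain dictionaries. The frame fields, the relaxation order, and both function names are assumptions for illustration only.

```python
def merge_fragment(previous, fragment):
    """Interpret a follow-up fragment by inheriting slots from the prior frame.

    Slots given in the fragment override the remembered ones.
    """
    merged = dict(previous)
    merged.update(fragment)
    return merged

def search_with_relaxation(events, constraints, relax_order):
    """Drop constraints one at a time, in relax_order, until something matches.

    Returns (matching events, list of constraints that were dropped).
    """
    active = dict(constraints)
    pending = [key for key in relax_order if key in active]
    dropped = []
    while True:
        hits = [e for e in events
                if all(e.get(k) == v for k, v in active.items())]
        if hits or not pending:
            return hits, dropped
        key = pending.pop(0)
        del active[key]
        dropped.append(key)

calendar = [{"title": "dentist", "day": "tuesday", "time": "3pm"}]
previous = {"title": "dentist", "day": "tuesday"}
query = merge_fragment(previous, {"time": "4pm"})  # follow-up: "at 4pm?"
print(search_with_relaxation(calendar, query, ["time", "day"]))
```

Returning the dropped constraints lets the system say what it relaxed, which is where a confirmation follow-up would fit.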
Summary, Future Work
- Calendar and Photo Stuff
- English and Mandarin
- Move language processing from server to mobile