What is the difference between a written and a spoken form?

A vocabulary entry has exactly one written form. This is the spelling that is automatically inserted into the transcript whenever one of the entry's spoken forms is said in the audio or video track. Up to ten spoken forms can be stored per entry.

A brief example:

Written form: gnocchi

Spoken forms: nyohki; nokey; nochi; gnotschi
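
The relationship can also be pictured as a simple lookup: each spoken form points to the single written form of its entry. The Python sketch below is only an illustration of that idea, not how the transcription service works internally. The names `build_lookup` and `apply_vocabulary` are hypothetical, and matching whole words in text is an assumption made for the example, since real recognition works on audio rather than on text.

```python
import re

MAX_SPOKEN_FORMS = 10  # limit per vocabulary entry, as stated above


def build_lookup(vocabulary):
    """Map every spoken form (lowercased) to its single written form.

    `vocabulary` is a hypothetical structure: {written_form: [spoken_forms]}.
    """
    lookup = {}
    for written, spoken_forms in vocabulary.items():
        if len(spoken_forms) > MAX_SPOKEN_FORMS:
            raise ValueError(
                f"'{written}' has more than {MAX_SPOKEN_FORMS} spoken forms"
            )
        for spoken in spoken_forms:
            lookup[spoken.lower()] = written
    return lookup


def apply_vocabulary(text, lookup):
    """Replace each word that matches a spoken form with its written form."""
    def replace(match):
        word = match.group(0)
        # Words without a matching spoken form are left unchanged.
        return lookup.get(word.lower(), word)

    return re.sub(r"\w+", replace, text)


# The entry from the example above.
vocabulary = {"gnocchi": ["nyohki", "nokey", "nochi", "gnotschi"]}
lookup = build_lookup(vocabulary)

print(apply_vocabulary("We ordered nokey with sage butter.", lookup))
# prints: We ordered gnocchi with sage butter.
```

Whichever of the four spoken forms appears, the result always contains the same written form, which is why an entry needs only one written form but can have several spoken forms.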