

Using the phonetic system, you can enter the phonemes that make up a word, which lets the synthesizer pick the correct sequence of diphones to reconstruct it. For example, the word "sing" (IPA: sɪŋ, entered using the VOCALOID Phonetic System) can be synthesized by concatenating the diphone sequence "#-s, s~ɪ, ɪ~ŋ, ŋ-#". The synthesizer takes a series of sustained sounds and diphone and triphone samples from a sample library, as specified by the phonetic input, and reassembles them according to how the word is pronounced phonetically.
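The mapping from phonemes to the sequence given for "sing" can be sketched in a few lines of Python. This is only an illustration of the idea, not VOCALOID's actual implementation. The function name `to_diphones` is hypothetical. The sketch also assumes that "#" marks the silence at a word boundary, that "-" joins a phoneme to silence, and that "~" joins two phonemes, as the notation in the example suggests.

```python
from typing import List

SILENCE = "#"  # assumed marker for the silence at a word boundary


def to_diphones(phonemes: List[str]) -> List[str]:
    """Return the diphone labels covering a phoneme sequence.

    Hypothetical helper for illustration only; the separators follow
    the notation used in the "sing" example.
    """
    # Pad with silence so the first and last phonemes get a transition too.
    padded = [SILENCE, *phonemes, SILENCE]
    units = []
    for left, right in zip(padded, padded[1:]):
        # "-" for transitions to or from silence, "~" between two phonemes.
        sep = "-" if SILENCE in (left, right) else "~"
        units.append(f"{left}{sep}{right}")
    return units


print(to_diphones(["s", "ɪ", "ŋ"]))
# prints ['#-s', 's~ɪ', 'ɪ~ŋ', 'ŋ-#']
```

Each adjacent pair of sounds yields one unit, so a word of n phonemes needs n + 1 diphones once the leading and trailing silences are counted.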

VOCALOID uses a method called Frequency-domain Singing Articulation Splicing and Shaping, a kind of concatenative synthesis. The libraries consist of various sounds recorded and separated for use with the software. The samples are gathered by recording the provider reading out a script in various keys. The recording is then transferred into a library from which the VOCALOIDs pull their results. For Japanese, the script is much simpler, and each phonetic sample can be divided across the notes with little trouble. For English VOCALOIDs, however, the phonetic data has to be separated by cutting sections out of the recorded samples, because some sounds simply cannot be gathered unless they are spoken as part of a word. This makes separating sounds for the English VOCALOIDs much harder. As such, Japanese VOCALOIDs are often more precise than English ones in their diphone sounds.

Note: The following applies to the VOCALOID2 system onwards. While both programs work in a similar fashion, some things may not apply to the original VOCALOID or may work differently there than in VOCALOID2.

2.2.2 Coarticulation, Assimilation and Phoneme Combinations
