Task Description
Song translation aims to translate the lyrics of a song so that the song can be performed in the target language. Compared with lyrics interpretation, song translation is harder: in addition to capturing the meaning, it must also obey musical prosody constraints to preserve singability and intelligibility.
Linguists and musicians point out that the rhythm of the lyrics should match the rhythm of the melody for the song to sound natural. For tonal languages such as Chinese, the pitch flow created by the speech tones should also align with the melodic contour. A mismatch between speech tones and musical tones can cause listeners to misunderstand the lyrics.
As shown in the figure above, in tonal languages such as Chinese, characters with the same phonemes but different tones have different meanings. For example, mā (妈, “mother”) and mǎ (马, “horse”) differ only in tone.
Some well-known misunderstandings caused by this mismatch are listed in Misunderstanding Examples. The first of them, “六眼飞鱼” (literally “six-eyed flying fish”), has even become a term that people sometimes use to refer to this kind of misunderstanding.
Song translation meets a practical need in both commercial activities, such as producing overseas editions of musicals, operas, and movie theme songs (see Song Translation by Human below), and amateur activities, such as covering popular songs in another language.
Our Method
GagaST
Given the dearth of suitable parallel data for learning these constraints, we propose an unsupervised song translation pipeline, Guided AliGnment for Automatic Song Translation (GagaST), which works in two stages:
- We pretrain a translation model and adapt it to the lyrics domain by jointly training on non-parallel lyrics data;
- At test time, we use the melody to guide decoding.
Some of our results and comparisons with other methods are listed below.
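One simple way a melody can guide decoding is a length constraint: a Chinese line sung one character per note must contain exactly as many characters as the melody has notes. The sketch below shows a generic length-constrained beam search under that assumption. It is not GagaST's actual decoder; the `next_logprobs` model interface, the `EOS` token, and the toy scores are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

EOS = "</s>"  # hypothetical end-of-sequence token

# Hypothetical model interface: maps a prefix (list of tokens) to
# next-token log-probabilities.
NextLogProbs = Callable[[List[str]], Dict[str, float]]


def length_constrained_beam_search(
    next_logprobs: NextLogProbs, num_notes: int, beam_size: int = 4
) -> List[str]:
    """Return the best hypothesis with exactly `num_notes` tokens.

    Assumes one token = one Chinese character = one sung note.
    EOS is masked out while a hypothesis is shorter than the melody,
    and its score is added once the hypothesis is as long as the melody.
    """
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for _ in range(num_notes):
        candidates: List[Tuple[float, List[str]]] = []
        for score, prefix in beams:
            for token, logp in next_logprobs(prefix).items():
                if token == EOS:
                    continue  # too early to stop: some notes have no syllable yet
                candidates.append((score + logp, prefix + [token]))
        # Keep only the highest-scoring partial hypotheses.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_size]

    # Every surviving hypothesis has num_notes tokens; now score ending here.
    finished = [
        (score + next_logprobs(prefix).get(EOS, -math.inf), prefix)
        for score, prefix in beams
    ]
    return max(finished, key=lambda c: c[0])[1]


# Toy scores indexed by prefix length (illustrative, not a real model).
TOY = [
    {"我": -0.1, "你": -2.0, EOS: -3.0},
    {"爱": -0.2, "是": -1.0, EOS: -0.1},
    {"你": -0.3, "他": -1.5, EOS: -0.05},
    {EOS: -0.01, "们": -2.0},
]


def toy_model(prefix: List[str]) -> Dict[str, float]:
    return TOY[len(prefix)] if len(prefix) < len(TOY) else {EOS: 0.0}


print("".join(length_constrained_beam_search(toy_model, num_notes=3)))  # 我爱你
```

With these toy scores, an unconstrained greedy decoder would stop after “我”, because EOS outscores every other token at the second step; masking EOS until three notes are filled yields a three-character line instead.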
Demo (audio)
For each demo song, we attach the sung audio and the music sheet with lyrics for all three versions (the GagaST audio is sung by one of the authors of this paper):
- English version: the original version of the song.
- GagaST (full constraints): our method with both the length constraint and the prosodic constraints, pretrained on data from both the news commentary (WMT) domain and the lyrics domain.
- GagaST (w/ len constraint, w/o prosodic constraints): our method with only the length constraint and without the prosodic constraints, pretrained on data from both the news commentary (WMT) domain and the lyrics domain.
《As the Deer》
English version | music sheet
GagaST (full constraints) | music sheet
GagaST (w/ len constraint, w/o prosodic constraints) | music sheet
《Autumn In New York》 (Jazz)
English version | music sheet
GagaST (full constraints) | music sheet
GagaST (w/ len constraint, w/o prosodic constraints) | music sheet
《A World Without Danger》 (Pop)
English version | music sheet
GagaST (full constraints) | music sheet
GagaST (w/ len constraint, w/o prosodic constraints) | music sheet
Case Study
Here we present some case analyses comparing 1) Google Translate, 2) the proposed GagaST system with only the length constraint, and 3) the GagaST system with both the length and prosodic constraints. For each of the three systems we compute the pitch alignment score, as well as the BLEU score against human-translated lyrics (translated without aligning to the music).
Based on these examples, we can see that:
- our proposed constraints and pretraining strategies indeed improve the alignment between lyrics and music;
- the lyrics translated by GagaST resemble actual lyrics, and the language is somewhat “poetic” without dramatic changes in meaning.
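To make the idea of a pitch alignment score concrete, the sketch below scores how often the direction of movement between adjacent tones (rising, level, falling) matches the direction of movement between the notes they are sung on. This is an illustrative metric, not necessarily the score used in the case studies: the per-tone pitch levels in `TONE_LEVEL` are assumed values on the 1-5 Chao scale, and one note per syllable is assumed.

```python
from typing import Sequence

# Assumed representative pitch level (Chao 1-5 scale) for each Mandarin tone.
# Tone 3 uses its low "half-third" realization; the values are illustrative.
TONE_LEVEL = {1: 5, 2: 4, 3: 1, 4: 3}


def _direction(a: float, b: float) -> int:
    """Return -1 for falling, 0 for level, +1 for rising."""
    return (b > a) - (b < a)


def tone_melody_alignment(tones: Sequence[int], notes: Sequence[int]) -> float:
    """Fraction of adjacent syllable pairs whose tone movement matches the melody.

    tones: Mandarin tone numbers (1-4), one per syllable.
    notes: MIDI pitch of the note each syllable is sung on.
    """
    if len(tones) != len(notes):
        raise ValueError("expected exactly one note per syllable")
    if len(tones) < 2:
        return 1.0  # nothing to compare
    tone_pairs = zip(tones, tones[1:])
    note_pairs = zip(notes, notes[1:])
    matches = sum(
        _direction(TONE_LEVEL[t1], TONE_LEVEL[t2]) == _direction(n1, n2)
        for (t1, t2), (n1, n2) in zip(tone_pairs, note_pairs)
    )
    return matches / (len(tones) - 1)


print(tone_melody_alignment([1, 4, 3], [67, 64, 60]))  # 1.0: both fall twice
print(tone_melody_alignment([1, 4, 3], [60, 64, 67]))  # 0.0: melody rises instead
```

Comparing directions rather than absolute pitches reflects that a melody can be transposed freely; what matters for intelligibility is whether the melody moves the same way the spoken tones would.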
Case 1
Case 2
Case 3
Case 4
Case 5
Misunderstanding Examples
Lyrics are sung on the pitches of the music, not on the pitches of the spoken words. What the audience hears is the syllables combined with the musical tone. In a tonal language, if the pitch flow of the music differs greatly from the tones of the words, the audience may interpret a word as a totally unrelated one.
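The mechanism above can be illustrated with a crude heuristic that guesses which tone a listener hears from the pitch contour a syllable is actually sung on. This is a hypothetical sketch, not a perception model and not part of GagaST: the function name `perceived_tone`, the tolerance `tol`, and the contour-to-tone rules are all assumptions.

```python
from typing import Sequence


def perceived_tone(pitches: Sequence[float], tol: float = 0.5) -> int:
    """Guess which Mandarin tone a listener might hear from a sung contour.

    pitches: pitch values (e.g. MIDI semitones) sampled over one syllable.
    tol: minimum movement in semitones that counts as a change (assumed).
    Heuristic rules: falling then rising -> tone 3, rising -> tone 2,
    falling -> tone 4, otherwise level -> tone 1.
    """
    start, end, low = pitches[0], pitches[-1], min(pitches)
    # A dip: the contour goes clearly below both its start and its end.
    if low < start - tol and low < end - tol:
        return 3
    if end - start > tol:
        return 2
    if start - end > tol:
        return 4
    return 1


# A syllable whose dictionary tone is 2 (rising), sung on a falling figure,
# would be heard as tone 4 under this heuristic.
print(perceived_tone([64, 60]))  # 4
```

Under these rules a syllable held on a single note is heard as level, so its original tone is carried only by how the melody moves from one note to the next.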
Example 1
Example 2
Example 3
Example 4
Example 5
Song Translation by Human
There are overseas editions of many musicals, operas, and movie theme songs. Three officially human-translated Mandarin versions of Disney songs are listed below:
We transcribe the lyrics of these Mandarin versions and compute BLEU against a human translation of the original English lyrics made without singing in mind; the BLEU score is 12.8 (over 87 lines). This shows that song translation is hard even for humans: it is difficult to preserve all of the meaning while fitting the music in another language. For example, in “Let It Go” the original line is “It’s funny how some distance makes everything seem small”, while the Mandarin version sings “这一点点的距离 让一切变精致”, which means “this little distance makes everything look exquisite”.
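For readers unfamiliar with the metric, the sketch below is a minimal corpus-level BLEU with one reference per line, tokenizing Chinese into characters. The character tokenization and the absence of smoothing are assumptions; this sketch is not the script behind the 12.8 figure, and reported numbers are usually computed with a standard tool such as sacreBLEU.

```python
import math
from collections import Counter
from typing import List, Sequence


def _ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams (as tuples) in a token sequence."""
    return Counter(tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: List[str], refs: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU (0-100), one reference per line.

    Assumes Chinese text is tokenized into characters (whitespace dropped).
    No smoothing: returns 0.0 if any n-gram order has no matches.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h = [c for c in hyp if not c.isspace()]
        r = [c for c in ref if not c.isspace()]
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = _ngrams(h, n), _ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # many times as it appears in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: only penalize hypotheses shorter than the references.
    brevity_penalty = min(1.0, math.exp(1 - ref_len / hyp_len))
    return 100 * brevity_penalty * math.exp(log_precision)


print(corpus_bleu(["这一点点的距离"], ["这一点点的距离"]))  # 100.0
```

Because BLEU rewards exact n-gram overlap, a singable translation that paraphrases to fit the melody, like the “Let It Go” line above, can score low even when it is a good song translation.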