Audio Watermarking Through Parametric Signal Representations
Abstract
"Digital watermarking" refers to hiding binary information in signals such as audio recordings, images, or video clips. Whereas the dominant audio techniques hide information through small, perceptually masked changes in amplitude, this thesis presents an alternative scheme based on frequency modulation. Salient sinusoids are identified in the short-time spectra of the input signal and parametrized by slowly time-varying frequency envelopes. To carry the watermark, these frequency envelopes are modified by quantization index modulation (QIM), a technique that rounds each value to a quantization grid selected by the bit being embedded. The resulting frequency shifts are intended to be unobjectionable even where they are noticeable; to this end, the quantization codebooks are designed around the sensitivity of the human ear to pitch changes. Sinusoids are then synthesized from the modified frequency envelopes and superposed with the other time-frequency components to form the watermarked signal. Given the watermarked signal, the decoder estimates the sinusoid frequencies by interpolating log-magnitude spectra, and the hidden bits are extracted by a maximum-likelihood method that optimally combines individual binary decisions. The scheme is tested against common signal operations, including audio compression, low-pass filtering, reverberation, pitch scaling, and changes in playback speed. It shows adequate robustness for existing applications such as audio content authentication. The scheme also shows potential for assisting sound source segregation, an application not previously explored in the literature.
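The embed/extract chain summarized above can be sketched for a single stationary sinusoid. This is a minimal illustration under stated assumptions, not the thesis implementation: it uses binary QIM with two grids offset by half a step (dither modulation), a grid uniform in cents as a stand-in for the pitch-sensitivity-aware codebooks, Hann-windowed quadratic interpolation as the log-magnitude interpolation, and a per-sinusoid minimum-distance decision in place of the maximum-likelihood combination of decisions along a frequency envelope. The names `STEP_CENTS`, `F_REF`, `qim_embed`, `qim_decode` and `estimate_peak_hz`, and all numeric values, are hypothetical.

```python
import numpy as np

# Hypothetical parameters chosen for illustration; not taken from the thesis.
STEP_CENTS = 10.0   # QIM step size on a log-frequency (cents) axis
F_REF = 440.0       # reference frequency of the cents axis, in Hz


def hz_to_cents(f_hz):
    """Map a frequency in Hz to cents relative to F_REF."""
    return 1200.0 * np.log2(f_hz / F_REF)


def cents_to_hz(cents):
    """Inverse of hz_to_cents."""
    return F_REF * 2.0 ** (cents / 1200.0)


def _nearest_on_grid(cents, bit, step):
    """Nearest point of the grid for `bit`: offset 0 for bit 0, step/2 for bit 1."""
    offset = 0.0 if bit == 0 else step / 2.0
    return step * np.round((cents - offset) / step) + offset


def qim_embed(f_hz, bit, step=STEP_CENTS):
    """Embed one bit by rounding the frequency onto that bit's grid.

    The shift is at most step/2 cents."""
    return cents_to_hz(_nearest_on_grid(hz_to_cents(f_hz), bit, step))


def qim_decode(f_hz, step=STEP_CENTS):
    """Minimum-distance decision: return the bit whose grid lies closest."""
    c = hz_to_cents(f_hz)
    dists = [abs(c - _nearest_on_grid(c, b, step)) for b in (0, 1)]
    return int(np.argmin(dists))


def estimate_peak_hz(x, fs):
    """Estimate a sinusoid's frequency by quadratic interpolation of the
    log-magnitude spectrum around its largest Hann-windowed peak."""
    n = len(x)
    mag = np.abs(np.fft.rfft(x * np.hanning(n)))
    k = int(np.argmax(mag[1:-1])) + 1           # peak bin, excluding the edges
    a, b, c = np.log(mag[k - 1:k + 2] + 1e-12)  # left, centre, right log-magnitudes
    p = 0.5 * (a - c) / (a - 2.0 * b + c)       # fractional-bin offset of the vertex
    return (k + p) * fs / n


if __name__ == "__main__":
    fs, n = 16000, 16384
    t = np.arange(n) / fs
    for f0, bit in [(440.0, 0), (1000.0, 1), (1500.0, 1), (2300.0, 0)]:
        fw = qim_embed(f0, bit)                          # watermarked frequency
        x = np.sin(2.0 * np.pi * fw * t + 0.3)           # synthesized sinusoid
        print(f0, bit, round(float(fw), 2), qim_decode(estimate_peak_hz(x, fs)))
```

With a 10-cent step, a sinusoid is shifted by at most 5 cents, and its bit is still decoded correctly as long as the frequency estimate is off by less than a quarter step (2.5 cents). A finer grid reduces audible detuning but shrinks that margin, which is the trade-off a pitch-aware codebook has to balance.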