Testing
Aural Tests
As our primary objective was to make the changes inaudible, we tested all of our algorithms aurally.
We initially tested the algorithms on a 440 Hz tone to ensure that they were working as expected. (We did not test the FMA on the tone, since a pure tone contains only a single frequency and therefore offers no other frequency components to modify.)
We continued our aural testing with a suite of six songs from different genres: classical, hip-hop, oldies, pop, rock, and techno. We adjusted all thresholds and predefined constants until the changes were aurally imperceptible; working within these limits, we then tuned the constants to maximize bitrate, accuracy, and noise resilience.
The following figure details the particular songs chosen and their overall frequency spectra.
Bitrates
Our test suite used a CD-quality sampling rate of 44,100 Hz, which amounts to 220,500 samples for a 5-second clip. Ideally, with no noise, it would be possible to use a segment length of only 2 samples. That setup translates to 220,500/2 = 110,250 segments in 5 seconds, or 110,250/5 = 22,050 bits/sec; in other words, at CD quality we cannot exceed a data rate of about 22 kbits/sec.
In practice we found that MATLAB was unable to handle this amount of data. We were, however, able to reach roughly 4800 segments, or 46 samples per segment. These values translate to 220,500/46 ≈ 4793 segments in 5 seconds, or about 958 bits/sec; in other words, we reliably demonstrated a data rate of roughly 1 kbit/sec.
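As a sanity check, the arithmetic above can be reproduced in a few lines of MATLAB; the clip length and segment sizes are the values quoted above, and one encoded bit per segment is assumed:

```matlab
% Back-of-the-envelope bitrate check, assuming one encoded bit per segment.
fs          = 44100;                  % CD-quality sampling rate (Hz)
clip_len    = 5;                      % clip length (seconds)
num_samples = fs * clip_len;          % 220500 samples

seg_len_ideal = 2;                    % smallest possible segment (no noise)
ideal_rate = (num_samples / seg_len_ideal) / clip_len      % 22050 bits/sec

seg_len_used = 46;                    % what MATLAB handled in practice
actual_rate = floor(num_samples / seg_len_used) / clip_len % ~958 bits/sec
```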
Power Ratios
To measure how much we had changed each signal by encoding bits, we took the ratio of the power of the original signal to the power of the output signal.
$$\text{power ratio} = \frac{P_{\text{original}}}{P_{\text{marked}}} = \frac{\sum_{n} x[n]^2}{\sum_{n} y[n]^2}$$

where $x[n]$ is the original signal and $y[n]$ is the marked output signal.
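A minimal MATLAB sketch of this measurement, assuming `original` and `marked` are the unmodified and watermarked signals as vectors of equal length (the variable names are ours):

```matlab
% Power ratio of the original signal to the marked (output) signal.
% Values close to 1 indicate the watermark barely changed the signal.
power_ratio = sum(original.^2) / sum(marked.^2);
```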
We found these ratios for two different input characters: ‘@’ and ‘w’. Because ‘@’ is encoded as 100 0000 in ASCII (a single one-bit), its power ratios measure the minimum amount of change we make to our signals. Because ‘w’ is encoded as 111 0111 (six one-bits), its power ratios measure the maximum amount of change we make to our signals.
| Genre | FMA ‘@’ | FMA ‘w’ | PSA ‘@’ | PSA ‘w’ | EA ‘@’ | EA ‘w’ |
|---|---|---|---|---|---|---|
| classical | 1.0052 | 1.0362 | 1.0056 | 1.0352 | 0.9992 | 0.9955 |
| hip-hop | 1.0079 | 1.0507 | 1.0068 | 1.0413 | 0.9970 | 0.9818 |
| oldies | 1.0133 | 1.0747 | 1.0069 | 1.0425 | 0.9986 | 0.9897 |
| pop | 1.0115 | 1.0776 | 1.0063 | 1.0388 | 0.9975 | 0.9842 |
| rock | 1.0131 | 1.0628 | 1.0072 | 1.0419 | 0.9975 | 0.9888 |
| techno | 1.0155 | 1.0897 | 1.0077 | 1.0463 | 0.9951 | 0.9723 |

Table 1. Power ratios for each algorithm when encoding one 1 per seven bits (‘@’) and one 0 per seven bits (‘w’).
The most important feature of these results is that all of our power ratios are very close to one, indicating that we have not changed the signal very much.
We also see some variation across the different songs because which values are changed and by how much depends on the song; for example, with the PSA, the delay causes us to drop samples at the end of the segment, and the power in the dropped samples depends on the song.
As expected, the power ratios for ‘w’ are further from one, since more one-bits are encoded. Because an added echo can be either constructive or destructive, the EA’s power ratio does not track the number of one-bits as closely as the FMA’s and PSA’s do. This also explains why the power ratios for the EA are generally lower than those for the FMA and PSA.
Finally, for the FMA and PSA the power of the marked signal was lower than the power of the original signal. For the FMA, this decline was expected because we scaled frequencies down, decreasing the power in the frequency spectrum, which, by Parseval’s theorem, corresponds to decreasing the power of the signal. For the PSA, the decline was also expected because the PSA delays the signal in various segments, dropping samples from the marked signal. The EA was the only case in which the marked signal had greater power than the original signal, because its echoes were more constructive than destructive.
Uses
Secure Storage
Hiding security-sensitive information in music epitomizes the idea of "security through obscurity". Even if an intruder gains access to the marked music files, there may be no external indication that the files contain hidden data rather than simply music. Playing the files as intended produces regular, non-suspicious music that is nearly indistinguishable from the unmodified recording. Even if a would-be eavesdropper realizes that the music contains encoded bits, there remains the problem of locating those bits within the signal and deciphering their meaning.
Covert Communication
The same reasoning can be applied to music that is openly broadcast between parties wishing to communicate in secret. An unsuspecting listener just hears music, but the desired audience has the tools to extract the hidden message. This is not just for spies, of course. A system could be designed where a radio station broadcasts music encoded with information about the song that is currently playing. A special receiver interprets the hidden code and offers the listener the option to buy the current song or other songs by that artist. Regardless, listeners with or without the special receiver do not perceive any loss of sound quality compared to a regular radio broadcast.
Copy Control
Individual copies of a piece of music could be labeled with imperceptible watermarks containing serial numbers. The watermarking algorithms designed in this project would have to be modified slightly, but they could be used to verify a signal's compliance with the following rule: a copy is legitimate only if it carries the official watermark and/or a serial number that has actually been sold. Each customer's purchase of the song would be assigned a serial number, so if multiple copies bearing the same serial number are discovered where they should not be, it is clear which user is responsible for breaking the rules.
Future Work
Recover the Encoded Message without Original Signal
All three encoding and decoding schemes require that the modified output signal be compared to the original signal to attempt to recover the encoded message. Obtaining the original signal can be cumbersome in practice and may present logistical problems. Fortunately, this requirement can be lifted with a slight design change.
Detect Whether a Signal has been Watermarked
This project could also be furthered by creating a decoding process which takes in a signal and a message and attempts to discover whether the signal has been marked with that message.
Survive Cropping Attacks
Our algorithms could survive cropping if we set up a matched filter in the decoder. First we would determine where the marked signal is located in the original signal using cross-correlation. Then we could crop the original signal in order to compare it to the marked signal and recreate the message (without, of course, the bits lost in the crop).
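A sketch of that alignment step using MATLAB's xcorr is below; here `original` is the full unmodified signal and `cropped` is the cropped, marked clip, which is assumed to lie entirely within the original:

```matlab
% Matched-filter-style alignment: find where the cropped clip sits in the
% original signal, then cut the original to match before decoding.
[c, lags] = xcorr(original, cropped);       % cross-correlation over all lags
[~, idx]  = max(abs(c));                    % peak marks the best alignment
offset    = lags(idx);                      % samples preceding the clip
aligned   = original(offset + 1 : offset + length(cropped));
% 'aligned' and 'cropped' can now go through the usual comparison and
% decoding; any bits that fell outside the crop are unrecoverable.
```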
Increase Security by Pseudorandom Sequences
By first encoding the message with a pseudorandom sequence, we could increase its security. If the encoder uses a pseudorandom sequence to select which segments are encoded, the decoder can recover the original message only if it also has the sequence. Thus, the encoder and decoder must have some key-sharing mechanism.
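One way this selection could look in MATLAB; the key value and counts below are placeholders, not values from our implementation:

```matlab
% Pseudorandomly choose which segments carry message bits.  Encoder and
% decoder seed the generator with the same shared key, so both sides derive
% the same selection; an eavesdropper without the key cannot.
shared_key   = 12345;                 % placeholder for the shared secret
num_segments = 4793;                  % total segments in the clip
num_bits     = 56;                    % message bits to hide

rng(shared_key);
marked_segs = sort(randperm(num_segments, num_bits));  % segments that get a bit
```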
A second way to increase security with pseudorandom sequences is to vary the segment length. In the first step of each encoding process, the original signal is cut into segments of equal time length. Instead, the length of each segment could be varied according to a pseudorandom sequence known by both the transmitter and the receiver. Without this sequence key, a potential eavesdropper would have great difficulty even finding the changes in the modified sound, let alone interpreting them.
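A sketch of how those segment boundaries might be generated from a keyed pseudorandom sequence; the segment-length range here is arbitrary and not taken from our implementation:

```matlab
% Vary the segment lengths pseudorandomly.  Transmitter and receiver
% regenerate the identical boundary list from the shared key.
shared_key    = 12345;                % placeholder for the shared secret
total_samples = 220500;               % 5 seconds at 44100 Hz
rng(shared_key);

boundaries = 0;
while boundaries(end) < total_samples
    seg_len    = randi([32 64]);      % arbitrary range of segment lengths
    boundaries = [boundaries, boundaries(end) + seg_len]; %#ok<AGROW>
end
boundaries(end) = total_samples;      % clamp the final segment
% Segment k occupies samples boundaries(k)+1 : boundaries(k+1).
```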
Encode on both Audio Channels
All three encoding processes currently only encode on one channel of a stereo audio signal. Both channels may be used to store additional information, at the cost of degrading the sound quality further. The effectiveness of this strategy is limited because the human brain is comparatively good at discerning differences in sound between the two ears.
Use Error Correcting Codes
This project focuses more heavily on the design of the encoding and decoding systems than on the contents of the transmitted message. However, system performance in the presence of noise might improve if some form of error-correcting code were used. If single-bit errors are evenly distributed throughout a decoded message, an error-correcting code will improve accuracy. The trade-off is that fewer unique bits can be encoded, so the message length must be reduced.
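For instance, a Hamming(7,4) code corrects any single-bit error in each seven-bit codeword at the cost of nearly doubling the number of bits that must be embedded. A self-contained MATLAB sketch of such a code (illustrative only, not part of our implementation):

```matlab
% Hamming(7,4): 4 message bits become 7 transmitted bits; the decoder can
% correct any single flipped bit per codeword.
G = [1 0 0 0 1 1 0;                   % generator matrix (systematic form)
     0 1 0 0 1 0 1;
     0 0 1 0 0 1 1;
     0 0 0 1 1 1 1];
H = [1 1 0 1 1 0 0;                   % matching parity-check matrix
     1 0 1 1 0 1 0;
     0 1 1 1 0 0 1];

msg      = [1 0 1 1];                 % 4 message bits
codeword = mod(msg * G, 2);           % 7 bits to embed in the signal

received    = codeword;
received(3) = ~received(3);           % simulate a single-bit decoding error
syndrome    = mod(H * received', 2);  % nonzero syndrome flags an error
err_pos     = find(all(H == syndrome, 1));   % column of H matching the syndrome
received(err_pos) = ~received(err_pos);      % flip the erroneous bit back
decoded     = received(1:4);          % recovered message bits
```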
Extend Findings to Speech Signals
This project focused exclusively on hiding digital data within music files. Perhaps a more practical use would be to apply these results to speech signals. Human speech typically covers a smaller frequency range than music and also typically lacks harmonic resonance. It is not clear how well the encoding and decoding schemes will perform when applied to speech.