Click on a result to hear it. These are the raw, unfiltered signals.
| Setup | Single point | Average | Delay & Sum | Ours | Input |
|---|---|---|---|---|---|

**Single point.** A baseline in which the audio is recovered from the vibration of a single point on the object's surface. The result is often noisy and distorted, because one point cannot capture the full spatial dynamics of the vibration.

**Average.** A simple average of the vibration signals from all measurement points. Averaging reduces some uncorrelated noise, but it ignores phase differences and the modal behavior of the object: points that vibrate in antiphase cancel each other, which degrades the recovered signal.
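
A toy numerical illustration of this cancellation, assuming a single vibration mode whose shape changes sign across a nodal line. The four-point mode shape, the 440 Hz test tone, and the noise level are made-up values, not measurements from this page:

```python
import numpy as np

FS = 8000                               # sample rate in Hz (assumed)
t = np.arange(FS) / FS                  # one second of samples
source = np.sin(2 * np.pi * 440.0 * t)  # stand-in for the played audio

# Hypothetical mode shape sampled at four points: two points sit on each side of
# a nodal line, so their vibrations are in antiphase.
mode_shape = np.array([1.0, 0.8, -0.9, -0.9])
rng = np.random.default_rng(0)
noise = 0.1 * rng.standard_normal((mode_shape.size, t.size))
signals = mode_shape[:, None] * source[None, :] + noise

average = signals.mean(axis=0)

def rms(x):
    """Root-mean-square level of a 1-D signal."""
    return float(np.sqrt(np.mean(x ** 2)))

# The mode-shape weights sum to zero, so the source term cancels in the average
# and only the (reduced) uncorrelated noise remains.
print(rms(signals[0]))  # single point: tone plus its full noise
print(rms(average))     # average: residual noise only
```

Because these hypothetical weights sum to zero, the tone vanishes from the average and only noise, reduced by about the square root of the number of points, remains; the single point keeps the tone along with all of its noise.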

**Delay & Sum.** Delay-and-sum beamforming shifts each signal by a single propagation delay so that all signals align at a reference point before they are summed. This works when the difference between signals is well modeled by a uniform time shift. Applied to our speckle measurements, it yields low-pass-filtered audio: the global shifts "lock" onto the dominant low frequencies, while the high frequencies remain misaligned and may cancel out.
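
A minimal sketch of this effect, assuming a two-channel toy signal in which the delay between channels depends on frequency. The 200 Hz and 1000 Hz tones, their amplitudes, and the 5- and 9-sample lags are hypothetical. The single global delay is chosen by brute-force circular cross-correlation, which is one common way to pick it and not necessarily the procedure used here:

```python
import numpy as np

FS = 8000            # sample rate in Hz (assumed)
N = FS               # one second: an integer number of periods for both tones
n = np.arange(N)

def tone(freq):
    return np.sin(2 * np.pi * freq * n / FS)

low, high = tone(200.0), 0.1 * tone(1000.0)  # dominant low band, weaker high band

# Hypothetical measurement: the low band reaches channel 1 five samples late,
# while the high band arrives with an extra half-period (4 samples) of lag, so
# the inter-channel delay is frequency dependent rather than one uniform shift.
ch0 = low + high
ch1 = np.roll(low, 5) + np.roll(high, 9)

def estimate_delay(ref, x, max_lag=20):
    """Lag k (samples) maximising the circular cross-correlation sum(x * roll(ref, k))."""
    lags = np.arange(-max_lag, max_lag + 1)
    scores = [np.dot(x, np.roll(ref, k)) for k in lags]
    return int(lags[int(np.argmax(scores))])

def delay_and_sum(ref, x):
    """Undo one global delay on x, then average it with the reference channel."""
    k = estimate_delay(ref, x)
    return k, 0.5 * (ref + np.roll(x, -k))

def amplitude(x, freq):
    """Amplitude of the sinusoidal component of x at freq (exact for whole periods)."""
    return 2.0 / N * abs(np.dot(x, np.exp(-2j * np.pi * freq * n / FS)))

k, y = delay_and_sum(ch0, ch1)
print(k)                              # 5: the shift locks onto the 200 Hz band
print(round(amplitude(y, 200.0), 3))  # 1.0: low band preserved
print(round(amplitude(y, 1000.0), 3)) # 0.0: high band cancelled
```

The estimated shift matches the dominant 200 Hz band exactly, which leaves the 1000 Hz band half a period out of step; summing then removes it, producing the low-pass behavior described above.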

**Ours.** Our proposed modal-guided method first estimates the object's modal basis from the multi-point vibration data, then uses this physical prior to guide the extraction of the sound source from the structural vibrations. This yields significantly clearer and more accurate recovered audio.

**Input.** The ground-truth audio played in the room, which causes the object to vibrate. It serves as the reference for evaluating the quality of the audio recovered by each method.
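
The page does not state which metric is used for this comparison. As one common reference-based option, the sketch below computes the scale-invariant signal-to-distortion ratio (SI-SDR) between the ground-truth input and a recovered signal; the function name `si_sdr` and the synthetic test signals are illustrative assumptions:

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Best-fitting scaled copy of the reference contained in the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10((np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps))

# Illustrative signals: a 440 Hz "input" and a quieter, noisy "recovery" of it.
FS = 8000
t = np.arange(FS) / FS
reference = np.sin(2 * np.pi * 440.0 * t)
rng = np.random.default_rng(0)
recovered = 0.5 * reference + np.sqrt(0.0125) * rng.standard_normal(t.size)

print(f"{si_sdr(reference, recovered):.1f} dB")  # close to 10 dB; the 0.5 gain is ignored
```

SI-SDR ignores overall gain but not timing, so a recovered signal would need to be time-aligned with the input before scoring.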