Recovered Audio Results (Figure 5)

Click on a result to hear it. These are the raw, unfiltered signals.

Setup	Single point	Average	Delay & Sum	Ours	Input
drum (Fig. 4)	--:--	--:--	--:--	--:--	--:--
picture frame	--:--	--:--	--:--	--:--	--:--
laptop	--:--	--:--	--:--	--:--	--:--
trash can	--:--	--:--	--:--	--:--	--:--
guitar	--:--	--:--	--:--	--:--	--:--
wooden binder	--:--	--:--	--:--	--:--	--:--
plastic plate	--:--	--:--	--:--	--:--	--:--
drum (stereo)	--:--	--:--	--:--	--:--	--:--
yoga foam	--:--	--:--	--:--	--:--	--:--
physio ball	--:--	--:--	--:--	--:--	--:--
balloon	--:--	--:--	--:--	--:--	--:--

Single point

This baseline recovers the audio from the vibrations of a single point on the object's surface. The result is often noisy and distorted, because one point cannot capture the full spatial dynamics of the vibrations.

Average

This method takes a simple average of the vibration signals from all measurement points. While it can reduce some uncorrelated noise, it does not account for phase differences or the object's modal behavior, which leads to signal cancellation and suboptimal recovery.
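
The trade-off described above can be illustrated with a small synthetic sketch. Everything in it is an assumption made for illustration (the sample rate, the 440 Hz tone, the noise level, and the hypothetical `signs` vector standing in for points on opposite sides of a nodal line); it is not data from the experiments.

```python
import numpy as np

fs = 8000                            # assumed sample rate in Hz
t = np.arange(fs) / fs               # one second of samples
source = np.sin(2 * np.pi * 440 * t)

# Case 1: all points move in phase, each with independent noise.
rng = np.random.default_rng(0)
noise = 0.5 * rng.standard_normal((4, t.size))
in_phase = source + noise            # broadcasts to shape (4, N)
avg_in_phase = in_phase.mean(axis=0)
noise_before = np.std(in_phase[0] - source)
noise_after = np.std(avg_in_phase - source)   # roughly halved for 4 points

# Case 2: hypothetical mode shape where two points move against the other two.
signs = np.array([1.0, 1.0, -1.0, -1.0])
anti_phase = signs[:, None] * source[None, :]
avg_anti_phase = anti_phase.mean(axis=0)

single_rms = np.sqrt(np.mean(anti_phase[0] ** 2))
average_rms = np.sqrt(np.mean(avg_anti_phase ** 2))   # signal cancels
```

In the first case averaging lowers the noise floor; in the second the opposite-phase points cancel and the averaged signal vanishes even though every individual point carries the sound.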

Delay & Sum

Delay-and-sum beamforming shifts each signal by a single propagation delay so that the signals align at a reference point before summation. This works when the difference between signals is well modeled by a uniform time shift. Applied to our speckle measurements, the procedure yields low-pass-filtered audio, since the global shifts "lock" onto the dominant lower frequencies. The high frequencies, in contrast, are not aligned and may cancel out.
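
A minimal sketch of the uniform-shift case, assuming integer sample delays estimated by cross-correlation against one reference channel. The helper names `estimate_delay` and `delay_and_sum`, the synthetic broadband source, and the chosen delays are all hypothetical; the page does not describe how the delays were estimated for this baseline.

```python
import numpy as np

def estimate_delay(signal, reference):
    """Integer lag (in samples) that best aligns `signal` with `reference`."""
    xcorr = np.correlate(signal, reference, mode="full")
    # In "full" mode, zero lag sits at index len(reference) - 1.
    return int(np.argmax(xcorr)) - (len(reference) - 1)

def delay_and_sum(signals, ref_index=0):
    """Shift every channel by one global delay, then average the channels."""
    reference = signals[ref_index]
    delays = [estimate_delay(s, reference) for s in signals]
    # np.roll wraps around at the edges; acceptable for this toy example.
    aligned = [np.roll(s, -d) for s, d in zip(signals, delays)]
    return np.mean(aligned, axis=0), delays

# Synthetic test: the same broadband source seen with three different delays.
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
true_delays = [0, 5, 12]
signals = np.stack([np.roll(source, d) for d in true_delays])

recovered, delays = delay_and_sum(signals)
```

Here each channel really is a pure time shift of the source, so one delay per channel is enough. When different frequencies need different shifts, a single delay per channel can only align the dominant band.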

Ours

This is our proposed modal-guided method. We first estimate the object's modal basis from the multi-point vibration data. We then use this physical prior to guide the extraction of the sound source from the structural vibrations, resulting in a significantly clearer and more accurate audio recovery.
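
The page does not specify how the modal basis is estimated or applied, so the following is not the proposed method. It is a toy sketch that swaps in a plain SVD on synthetic rank-one data, only to illustrate why a spatial basis can help where a plain average fails. The mode shape `phi`, the noise level, and the tone are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 4000                                  # assumed sample rate in Hz
t = np.arange(fs) / fs
source = np.sin(2 * np.pi * 300 * t)

# Hypothetical mode shape: half the points move against the other half.
phi = np.array([1.0, 0.8, 0.5, -0.5, -0.8, -1.0])
noise = 0.2 * rng.standard_normal((phi.size, t.size))
X = phi[:, None] * source[None, :] + noise   # shape (points, time)

# Stand-in for basis estimation: leading left singular vector of the data.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
mode = U[:, 0]                               # spatial pattern, up to sign

# Project the measurements onto the estimated pattern.
recovered = mode @ X                         # source up to sign and scale
plain_average = X.mean(axis=0)               # signal cancels, noise remains

corr_recovered = abs(np.corrcoef(recovered, source)[0, 1])
corr_average = abs(np.corrcoef(plain_average, source)[0, 1])
```

Weighting each point by the estimated spatial pattern makes opposite-phase points reinforce each other rather than cancel, which is the intuition behind using a modal prior instead of a uniform average.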

Input

This is the ground-truth audio signal that was played in the room, causing the object to vibrate. It serves as the reference for evaluating the quality of the audio recovered by the different methods.