In this section, we complement our Final Report Section 4.4 with the following reconstruction results.
After performing the STFT, we cannot simply use Method 1, for two reasons: interpolating in the frequency spectra corrupts the phase information before reconstruction, and our NN could in theory produce complex-valued output before reconstruction.
Method 1: The original audio waveform (ISTFT on STFT)
1024, 50%, Cat-C3
1024, 75%, Cat-C3
Method 2: ISTFT on Abs(STFT)
The reconstructed audio waveform using ISTFT on Abs(STFT)
1024, 50%, Cat-C3
1024, 75%, Cat-C3
Method 3: GL* on Abs(STFT)
The reconstructed audio waveform using Griffin-Lim on Abs(STFT)
1024, 50%, Cat-C3
1024, 75%, Cat-C3 [Chosen one]
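The three reconstruction methods above can be compared with a minimal NumPy sketch. It is an illustration under stated assumptions, not the code used to produce the audio: the periodic Hann window, the helper names (`stft`, `istft`, `griffin_lim`, `spectral_error`), the 32 iterations, and the synthetic C3-like tone standing in for the Cat-C3 sample are all hypothetical choices.

```python
import numpy as np

N_FFT = 1024                      # frame size used in the listings above
HOP = N_FFT // 4                  # 75% overlap; use N_FFT // 2 for 50%
WIN = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window (an assumption)

def stft(x):
    """Centered STFT, returned as a (frames, bins) complex array."""
    xp = np.pad(x, N_FFT // 2)
    n_frames = 1 + (len(xp) - N_FFT) // HOP
    frames = np.stack([xp[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(Z):
    """Windowed overlap-add inverse of stft(); returns HOP * (frames - 1) samples."""
    frames = np.fft.irfft(Z, n=N_FFT, axis=1) * WIN
    out = np.zeros(N_FFT + HOP * (len(Z) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    out /= np.maximum(norm, 1e-8)
    return out[N_FFT // 2:len(out) - N_FFT // 2]

def griffin_lim(mag, n_iter=32, seed=0):
    """Estimate a waveform whose STFT magnitude approximates `mag`."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))  # random initial phase
    for _ in range(n_iter):
        x = istft(mag * phase)                  # project onto valid waveforms
        phase = np.exp(1j * np.angle(stft(x)))  # keep the phase, restore mag
    return istft(mag * phase)

def spectral_error(x, mag):
    """Relative distance between |STFT(x)| and the target magnitude."""
    return np.linalg.norm(np.abs(stft(x)) - mag) / np.linalg.norm(mag)

# Synthetic stand-in for the audio: a C3-like harmonic tone (not the Cat-C3 sample).
sr = 16000
t = np.arange(64 * HOP) / sr
x = sum(np.sin(2 * np.pi * 130.81 * k * t) / k for k in (1, 2, 3))

Z = stft(x)
mag = np.abs(Z)
x1 = istft(Z)          # Method 1: ISTFT on STFT
x2 = istft(mag)        # Method 2: ISTFT on Abs(STFT), i.e. phase set to zero
x3 = griffin_lim(mag)  # Method 3: Griffin-Lim on Abs(STFT)

print("Method 1 max waveform error:", np.max(np.abs(x1 - x)))
print("Method 2 spectral error:", spectral_error(x2, mag))
print("Method 3 spectral error:", spectral_error(x3, mag))
```

Method 1 needs the true phase, which is no longer available once the spectra have been modified. Methods 2 and 3 start from magnitudes only, and Griffin-Lim replaces the zero phase of Method 2 with an iteratively estimated one.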
2. Pitch Shift Methods
In this section, we complement our Final Report Sections 4.5 and 5.1 with the following reconstruction results.
METHOD 1: Frequency Domain Attempt - Interpolation of FFT Freq Values
-> Using 1024, 0.75, Cat-C3, +5
GL on Abs(STFT)
GL on Abs(Expected STFT) (the label)
GL on Abs(Pitch shifted STFT)
-> Using 2048, 0.75, Cat-C3, +5
GL on Abs(STFT)
GL on Abs(Expected STFT) (the label)
GL on Abs(Pitch shifted STFT)
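As a rough illustration of the frequency-domain attempt, the sketch below rescales the frequency axis of a magnitude spectrogram by linear interpolation. The function name `shift_magnitude`, the (frames, bins) layout, and the use of `np.interp` are assumptions for illustration and not necessarily the interpolation used for the audio above. Only magnitudes are shifted, so the phase still has to be estimated afterwards, for example with Griffin-Lim.

```python
import numpy as np

def shift_magnitude(mag, semitones):
    """Scale the frequency axis of a (frames, bins) magnitude array.

    A shift of n semitones multiplies every frequency by 2**(n/12), so
    output bin k takes its value from source bin k / ratio.
    """
    ratio = 2.0 ** (semitones / 12.0)
    bins = np.arange(mag.shape[1])
    src = bins / ratio
    # Positions past the last source bin (downward shifts) are filled with 0.
    return np.stack([np.interp(src, bins, frame, right=0.0) for frame in mag])

# Toy spectrum: one frame with a single peak at bin 40 of 513 (n_fft = 1024).
mag = np.zeros((1, 513))
mag[0, 40] = 1.0

octave_up = shift_magnitude(mag, 12)
print(np.argmax(octave_up[0]))  # 80: one octave doubles the bin index

plus_five = shift_magnitude(mag, 5)  # the +5 setting used above
print(np.argmax(plus_five[0]))
```

Because the source position `k / ratio` is rarely an integer, each peak is smeared across neighbouring bins, which is one way the interpolation can alter the spectrum before reconstruction.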
METHOD 2: Phase Vocoder + Resampling method
-> Using 1024, 0.75, Cat-C3, +5
Original wav (constructed with GL)
Expected wav (constructed with GL)
Resample then stretch
Stretch then resample [Chosen one].
This is audibly indistinguishable from the resample-then-stretch audio above, but stretching before resampling better preserved the spectral shape of the original signal.
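The stretch-then-resample order can be sketched in NumPy as follows. This is a simplified illustration, not the implementation behind the audio above: the helper names, the periodic Hann window, the basic phase vocoder, and the linear-interpolation resampler (which has no anti-aliasing filter) are all assumptions, and the 220 Hz test tone is a synthetic stand-in.

```python
import numpy as np

N_FFT, HOP = 1024, 256            # 1024-sample frames, 75% overlap
WIN = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window (an assumption)

def stft(x):
    """Centered STFT, returned as a (frames, bins) complex array."""
    xp = np.pad(x, N_FFT // 2)
    n_frames = 1 + (len(xp) - N_FFT) // HOP
    frames = np.stack([xp[i * HOP:i * HOP + N_FFT] * WIN for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(Z):
    """Windowed overlap-add inverse of stft()."""
    frames = np.fft.irfft(Z, n=N_FFT, axis=1) * WIN
    out = np.zeros(N_FFT + HOP * (len(Z) - 1))
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * HOP:i * HOP + N_FFT] += frame
        norm[i * HOP:i * HOP + N_FFT] += WIN ** 2
    out /= np.maximum(norm, 1e-8)
    return out[N_FFT // 2:len(out) - N_FFT // 2]

def phase_vocoder(Z, rate):
    """Time-stretch an STFT: rate < 1 lengthens it, rate > 1 shortens it."""
    steps = np.arange(0, len(Z) - 1, rate)  # fractional analysis-frame positions
    omega = 2 * np.pi * HOP * np.arange(Z.shape[1]) / N_FFT  # nominal advance per hop
    phase_acc = np.angle(Z[0])
    out = np.empty((len(steps), Z.shape[1]), dtype=complex)
    for t, step in enumerate(steps):
        i = int(step)
        frac = step - i
        mag = (1 - frac) * np.abs(Z[i]) + frac * np.abs(Z[i + 1])
        out[t] = mag * np.exp(1j * phase_acc)
        # Deviation of the measured phase advance from the nominal one, wrapped.
        dphi = np.angle(Z[i + 1]) - np.angle(Z[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase_acc += omega + dphi
    return out

def pitch_shift(x, semitones):
    """Stretch by the pitch ratio first, then resample back to the original length."""
    ratio = 2.0 ** (semitones / 12.0)
    stretched = istft(phase_vocoder(stft(x), 1.0 / ratio))  # longer, same pitch
    positions = np.arange(0, len(stretched) - 1, ratio)     # read `ratio` times faster
    return np.interp(positions, np.arange(len(stretched)), stretched)

# Synthetic test tone (not the Cat-C3 sample): 220 Hz shifted up by 5 semitones.
sr = 16000
x = np.sin(2 * np.pi * 220 * np.arange(sr) / sr)
y = pitch_shift(x, 5)
spectrum = np.abs(np.fft.rfft(y * np.hanning(len(y))))
print("dominant frequency (Hz):", np.argmax(spectrum) * sr / len(y))
```

Swapping the two steps inside `pitch_shift` gives the resample-then-stretch variant; in this sketch only the order changes, not the ratio.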
3. ANN Training Audio Results
In this section, we complement our Final Report Sections 4.5 and 5.2 with the following reconstruction results.
Example 1: Up pitch from E3 to G3
Original audio file (original)
Expected pitch shifted audio file
Before training, pitch shifted audio file
After Architecture 0 training
After Architecture 1 training
After Architecture 2 training
After Architecture 3 training
Example 2: Down pitch from E3 to Db3
Original audio file (original)
Expected pitch shifted audio file
Before training, pitch shifted audio file
After Architecture 0 training
After Architecture 1 training
After Architecture 2 training
After Architecture 3 training
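For reference, both training examples move E3 by three semitones, up to G3 and down to Db3. The short sketch below works out the intervals and frequencies, assuming equal temperament with A4 = 440 Hz and scientific pitch notation (C4 = MIDI note 60); the helper names are hypothetical.

```python
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}

def midi_number(note):
    """Convert a note name such as 'E3' or 'Db3' to a MIDI note number."""
    offset = NOTE_OFFSETS[note[0].upper()]
    rest = note[1:]
    while rest and rest[0] in "#b":
        offset += 1 if rest[0] == "#" else -1
        rest = rest[1:]
    return 12 * (int(rest) + 1) + offset

def semitone_interval(source, target):
    """Signed number of semitones from `source` to `target`."""
    return midi_number(target) - midi_number(source)

def frequency_hz(note, a4=440.0):
    """Equal-tempered frequency of a note, with A4 (MIDI 69) at `a4` Hz."""
    return a4 * 2.0 ** ((midi_number(note) - 69) / 12.0)

for target in ("G3", "Db3"):
    n = semitone_interval("E3", target)
    print(f"E3 -> {target}: {n:+d} semitones, ratio {2 ** (n / 12):.4f}, "
          f"{frequency_hz('E3'):.2f} Hz -> {frequency_hz(target):.2f} Hz")
```

A shift of n semitones corresponds to a frequency ratio of 2^(n/12), so the up-shift and down-shift examples use reciprocal ratios.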
4. Our Product Demonstration (Testing Results of ANN Training)
In this section, we complement our Final Report Sections 4.5 and 5.2 with the following reconstruction results.
Speech Demo
Original audio file
For the following, shift pitch up by 3:
Using conventional pitch shift (loss of realism)
Using Architecture 0
Using Architecture 1
No Result.
Using Architecture 2
No Result.
Using Architecture 3 [Chosen one]
Singing Demo
Original audio file
For the following, shift pitch up by 3:
Using conventional pitch shift (loss of realism)
Using Architecture 0
Using Architecture 1
No Result.
Using Architecture 2
No Result.
Using Architecture 3 [Chosen one]
Speech Demo 2
Original audio file
For the following, using Architecture 3, we observe that pitch information is lost.