AUTO-TUNE
Reflection
Frequency Detection:
- Peak detection for the phase vocoder worked well. Unfortunately, we could not find a good way to use the peak information to shift the pitch of the sound while preserving its quality.
- In the future, we need to look into less computationally intensive methods of converting to the frequency domain. The computational cost of some of our approaches significantly limited us.
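The peak-detection step can be illustrated with a minimal pure-Python sketch. It is not the project's code: it assumes a single analysis frame, a periodic Hann window, a naive O(N²) DFT in place of an FFT, and a hypothetical relative threshold `rel_threshold` for deciding which local maxima count as peaks. Bin k corresponds to k · sample_rate / N Hz.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n; tapers frame edges to reduce leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def magnitude_spectrum(frame):
    """|DFT| of a Hann-windowed frame for bins 0..N/2.

    Naive O(N^2) DFT for clarity; a real implementation would use an FFT.
    """
    n = len(frame)
    xw = [a * w for a, w in zip(frame, hann(n))]
    mags = []
    for k in range(n // 2 + 1):
        s = sum(xw[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(s))
    return mags


def find_peaks(frame, sample_rate, rel_threshold=0.1):
    """Frequencies (Hz) of local spectral maxima above rel_threshold * max.

    rel_threshold is a hypothetical tuning parameter, not from the project.
    """
    mags = magnitude_spectrum(frame)
    floor = rel_threshold * max(mags)
    n = len(frame)
    peaks = []
    for k in range(1, len(mags) - 1):
        if mags[k] > floor and mags[k] >= mags[k - 1] and mags[k] > mags[k + 1]:
            peaks.append(k * sample_rate / n)  # convert bin index to Hz
    return peaks
```

With a 256-sample frame at 8 kHz the bins are 31.25 Hz apart, which is why resolving closely spaced pitches requires longer frames and therefore more computation.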
Denoising:
- Denoising worked well: we increased the SNR from 15 dB to 30 dB.
- However, there are some problems: although the signal was preserved and the noise suppressed, some important features of the signal were lost. Despite our efforts to keep the instrumental features of the signal, some of them were removed, mainly high-frequency, low-intensity components.
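The SNR figures above compare a processed signal against a clean reference. The exact measurement used in the project is not stated here, so this sketch assumes the standard definition in which everything that differs from the clean reference counts as noise.

```python
import math


def snr_db(clean, estimate):
    """SNR in dB of `estimate` relative to the reference `clean` signal.

    Assumed definition: SNR = 10 * log10(sum(clean^2) / sum((clean - estimate)^2)).
    """
    if len(clean) != len(estimate):
        raise ValueError("signals must have the same length")
    signal_energy = sum(c * c for c in clean)
    noise_energy = sum((c - e) ** 2 for c, e in zip(clean, estimate))
    if noise_energy == 0:
        return math.inf  # estimate matches the reference exactly
    return 10 * math.log10(signal_energy / noise_energy)
```

Under this definition, moving from 15 dB to 30 dB means the residual noise energy fell by a factor of 10^(15/10) ≈ 31.6.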
Pitch Correction:
- We found PSOLA to be the most effective algorithm for pitch correction.
- Simplistic algorithms were sometimes effective at pitch correction, but only at the cost of severely degrading the sound quality.
- More advanced techniques may improve sound quality in the future. For example, precisely manipulating peaks with a phase-vocoder algorithm could help to reduce unpleasant side effects of modifying an entire signal.
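Time-domain PSOLA shifts pitch by cutting the signal into two-period, windowed grains centered on pitch marks and overlap-adding them at a new spacing, so the pitch changes while the duration stays the same. The sketch below is a simplified illustration, not the project's implementation: it assumes a constant pitch period already known in samples (real voices need per-period pitch marks from a pitch tracker). The helper `nearest_note_ratio`, which picks a correction ratio by snapping to the nearest equal-tempered semitone with A4 = 440 Hz, is a hypothetical example of how a target pitch could be chosen.

```python
import math


def nearest_note_ratio(f0, a4=440.0):
    """Hypothetical helper: ratio that moves f0 (Hz) onto the nearest
    equal-tempered semitone, assuming A4 = a4 Hz."""
    semitones = 12 * math.log2(f0 / a4)
    target = a4 * 2 ** (round(semitones) / 12)
    return target / f0


def hann(n):
    """Symmetric Hann window: 0 at both ends, 1 at the center for odd n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def psola_shift(x, period, ratio):
    """TD-PSOLA pitch shift by `ratio` (> 1 raises pitch), keeping the length.

    Assumes a constant, known pitch period in samples.
    """
    n = len(x)
    if n <= 2 * period:
        raise ValueError("signal must be longer than two periods")
    win = hann(2 * period + 1)  # grain spans one period on each side of a mark
    analysis_marks = list(range(period, n - period, period))
    step = period / ratio  # spacing between synthesis marks
    y = [0.0] * n
    t = float(period)
    while t < n - period:
        # reuse the analysis grain nearest in time to this synthesis mark
        mark = min(analysis_marks, key=lambda m: abs(m - t))
        center = int(round(t))
        for k in range(-period, period + 1):
            j = center + k
            if 0 <= j < n:
                y[j] += win[k + period] * x[mark + k]
        t += step
    return y
```

For a note detected at 430 Hz, `nearest_note_ratio(430.0)` gives 440/430 ≈ 1.023, which would then be passed as `ratio` together with the measured period.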
Smoothing:
- We used a naive smoothing technique, which worked reasonably well. Due to time constraints, we did not test other smoothing algorithms.
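The reflection does not say which quantity was smoothed. As an illustration of Gaussian smoothing, listed among the in-class techniques, this sketch assumes a per-frame sequence such as hypothetical pitch-shift ratios, smoothed so that corrections do not jump abruptly between frames. `sigma` is measured in frames and is an illustrative parameter.

```python
import math


def gaussian_smooth(values, sigma):
    """Smooth a 1-D sequence with a Gaussian kernel truncated at 3 sigma.

    Near the ends only the in-range part of the kernel is used and the
    weights are renormalized, so a constant input stays constant.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        wsum = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:
                acc += w * values[j]
                wsum += w
        out.append(acc / wsum)
    return out
```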
In-Class and Out-of-Class Techniques
Some of the in-class techniques we used were:
- Windowing a signal to separate it into small segments
- Converting to the frequency domain with the STFT (essentially a series of FFTs over windowed segments)
- Processing signals in other bases (e.g. Wavelet Transform)
- Analyzing the trade-off between computation time and the sampling rate used for windowing and the STFT, where a higher rate improves signal quality
- Phase vocoding for pitch identification and shifting
- PSOLA for pitch shifting
- Gaussian smoothing
- Block thresholding for de-noising
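Block thresholding attenuates groups of time-frequency coefficients together rather than one at a time, which reduces the isolated "musical noise" artifacts of per-coefficient thresholding. This sketch applies a fixed-size version to a grid of coefficients (for example STFT values, real or complex). It assumes the noise variance `noise_var` is known; the block sizes and `lam` defaults are illustrative, and the adaptive choice of block sizes and threshold in the published method is omitted.

```python
def block_threshold(coeffs, noise_var, block_t=8, block_f=2, lam=2.5):
    """Fixed-size block attenuation of a time-frequency coefficient grid.

    coeffs: list of rows (frequency bins), each a list of time frames.
    Every block_f x block_t block is scaled by
        gain = max(0, 1 - lam * noise_var / mean(|c|^2 over the block)).
    Defaults are illustrative assumptions, not values from the project.
    """
    n_f = len(coeffs)
    n_t = len(coeffs[0]) if n_f else 0
    out = [list(row) for row in coeffs]
    for f0 in range(0, n_f, block_f):
        for t0 in range(0, n_t, block_t):
            cells = [(f, t)
                     for f in range(f0, min(f0 + block_f, n_f))
                     for t in range(t0, min(t0 + block_t, n_t))]
            energy = sum(abs(coeffs[f][t]) ** 2 for f, t in cells) / len(cells)
            gain = 0.0 if energy == 0 else max(0.0, 1.0 - lam * noise_var / energy)
            for f, t in cells:
                out[f][t] = coeffs[f][t] * gain  # same gain for the whole block
    return out
```

Blocks whose average energy is close to the noise level are zeroed, while high-energy blocks pass with only slight attenuation.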
What We've Learned
Here is a quick summary of the techniques and skills we have learned:
- Using Short-Time Fourier Transforms and Wavelet Transforms to transform and analyze signals.
- Applying noise identification and thresholding techniques for denoising.
- Using phase-vocoder and threshold-based algorithms for pitch shifting without a ground truth.
- Using column-based, segment-based, and pitch-based algorithms for matching a ground truth.
- Applying PSOLA to match an audio signal to a ground truth.
- Balancing audio quality and desired effects with compute time.
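To illustrate wavelet-domain thresholding for denoising, here is a minimal sketch using the orthonormal Haar wavelet and soft thresholding of the detail coefficients. The wavelet, the number of levels, and the threshold `thr` are assumptions chosen for brevity; a common data-driven choice is the universal threshold σ·√(2 ln N), with σ the noise standard deviation.

```python
import math


def haar_forward(x):
    """One level of the orthonormal Haar DWT: (approximation, detail)."""
    if len(x) % 2:
        raise ValueError("length must be even")
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert one Haar level."""
    s = 1 / math.sqrt(2)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) * s, (a - d) * s])
    return x


def soft_threshold(c, thr):
    """Shrink c toward zero by thr; values with |c| <= thr become 0."""
    return math.copysign(max(abs(c) - thr, 0.0), c)


def wavelet_denoise(x, thr, levels=3):
    """Multi-level Haar DWT, soft-threshold every detail band, then invert."""
    if len(x) % (2 ** levels):
        raise ValueError("length must be divisible by 2**levels")
    approx = list(x)
    details = []
    for _ in range(levels):
        approx, d = haar_forward(approx)
        details.append([soft_threshold(c, thr) for c in d])
    for d in reversed(details):
        approx = haar_inverse(approx, d)
    return approx
```

Because the transform is orthonormal, `thr = 0` reconstructs the input up to rounding. As `thr` grows, small detail coefficients are zeroed; genuine high-frequency, low-intensity content also produces small detail coefficients, which is consistent with the loss described in the Denoising reflection.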
Code and data
eecs351_project_group2.zip (ZIP archive, 12608 KB)
Team members: Yufan Yue, Mingshuo Shao, Yuhan Chen, Eric Winsor
{funkyyue, mingshuo, chenyh, rcwnsr}@umich.edu