Module 3 - Fourier analysis

Speech 2023. 8. 24. 15:21

Speech Processing > Module 3 > Digital Speech Signals > Videos > Fourier analysis

https://speech.zone/courses/speech-processing/module-3-digital-speech-signals/videos-2/fourier-analysis/


We're now going to use a series expansion approach to get from a digital signal in the time domain to its frequency domain representation.
All the signals are digital.
We understand how series expansion works, in a rather abstract way.
We're now going to make that concrete, as Fourier analysis.
Let's just recap series expansion.
We saw how it's possible to express a complex wave, for example this one, as a sum of simple basis functions.
We wrote the complex wave as equal to a weighted sum of sine waves.
Let's write that out a little more correctly.
We have some coefficient - or a weight - times a basis function.
These basis functions have a unit amplitude, so they're scaled by their coefficient and then added together.
We add some number of basis functions in a series, to exactly reconstruct the signal we're analysing.
So this is the analysis: the summation of basis functions weighted by coefficients.
Notice how those basis functions - those sine waves - are a series.
Each one has one more cycle in the analysis frame than the previous one.
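One way to write that weighted sum out is sketched below. The symbols are assumptions made here for illustration, not notation from the video: x[n] is the signal being analysed, N is the number of samples in the analysis frame, a_k is the coefficient of the k-th basis function, and K is the number of basis functions.

```latex
x[n] = \sum_{k=1}^{K} a_k \sin\!\left(\frac{2\pi k n}{N}\right), \qquad n = 0, 1, \dots, N-1
```

The index k counts the cycles each sine wave completes in the frame, so each term has one more cycle than the one before it.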
These are the coefficients, and we now need to think about how to find them, given only the signal being analysed and some pre-specified set of basis functions.
Here is a series of basis functions: just the first four to start with.
I want you to write down their frequencies and work out the relationship between them.
Pause the video.
The duration of the analysis window is 0.01 s.
The lowest frequency basis function makes one cycle in that time, meaning it has a frequency of 100 Hz.
I'm going to start writing units correctly: we put a space between the number and the units.
The second one makes two cycles in the same amount of time, so it must have a frequency of 200 Hz.
The next one makes three cycles, that's 300 Hz.
And 400 Hz.
I hope you got those values.
It's just an equally-spaced series of sine waves, starting with the lowest frequency and then all the multiples of that, evenly spaced.
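The arithmetic behind those four values can be sketched in a few lines of Python. The function name `basis_frequency` is hypothetical and chosen only for this illustration; the 0.01 s window duration is the one given above.

```python
# Frequencies of the first four basis functions for a 0.01 s analysis window.
window_duration = 0.01  # seconds, as stated in the transcript


def basis_frequency(k, duration):
    """Frequency in Hz of a sine wave that completes k cycles in `duration` seconds."""
    return k / duration


frequencies = [basis_frequency(k, window_duration) for k in range(1, 5)]
print(frequencies)
```

Each frequency is a whole-number multiple of the lowest one, which is why the series is evenly spaced.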
If I tell you now that the sampling rate is 16 kHz, there are lots more basis functions to go.
What's the highest frequency basis function that you can have?
Pause the video.
Well, we know from digital signals that the highest possible frequency we can represent is at the Nyquist frequency, which is simply half the sampling frequency.
The Nyquist frequency here would be 8 kHz.
We'd better zoom in so we can actually see that.
There we go: this waveform here has a frequency of 8 kHz.
We can't go any higher than that.
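As a rough check on those numbers, the sketch below works out the Nyquist frequency and counts how many multiples of 100 Hz fit up to it. The variable names are assumptions for this illustration, and the count of basis functions is derived here rather than stated in the video.

```python
sampling_rate = 16000   # Hz, as stated in the transcript
window_duration = 0.01  # seconds, as stated in the transcript

# Highest frequency a digital signal can represent: half the sampling rate.
nyquist = sampling_rate / 2

# Number of samples in one analysis window.
samples_per_window = round(sampling_rate * window_duration)

# Lowest basis frequency: one cycle per window.
lowest = sampling_rate / samples_per_window

# Multiples of the lowest frequency from 100 Hz up to the Nyquist frequency.
num_basis = int(nyquist // lowest)

print(nyquist, samples_per_window, lowest, num_basis)
```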
Fourier analysis simply means finding the coefficients of the basis functions.
We need somewhere to record the results of our analysis, so I've made some axes on the right.
This horizontal axis is going to be the frequency of the basis function.
Because we're going to go up to a basis function at 8000 Hz (that's 8 kHz), I'll give that units of kHz.
On the vertical axis, I'm going to write the value of the coefficient.
I'm going to call that magnitude.
Here's the lowest frequency basis function.
It's the one at 100 Hz.
So I'm going to plot on the right at 100 Hz (that's 0.1 kHz, of course) how much of this basis function we need to use to reconstruct our signal.
How do we actually work that amount out?
We're going to look at the similarity between the basis function and the signal being analysed.
That's a quantity known as correlation.
That's achieved simply by multiplying the two signals sample by sample and adding up the products.
So we multiply this sample by this sample, add that to this sample times this sample, and this sample times this sample, and so on, and add all of that up.
That will give us a large value when the two signals are very similar.
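A minimal sketch of that multiply-and-add operation follows. The helper names `sine_basis` and `correlate`, the 160-sample window (0.01 s at 16 kHz), and the two test signals are all assumptions made for this illustration.

```python
import math


def sine_basis(k, n_samples):
    """Unit-amplitude sine wave completing k cycles in n_samples samples."""
    return [math.sin(2 * math.pi * k * n / n_samples) for n in range(n_samples)]


def correlate(x, y):
    """Multiply two equal-length signals sample by sample and add up the products."""
    return sum(a * b for a, b in zip(x, y))


n_samples = 160  # assumed: 0.01 s at 16 kHz
basis = sine_basis(1, n_samples)

# Two hypothetical signals containing a lot, or a little, of this basis function.
strong = [0.5 * b for b in basis]
weak = [0.1 * b for b in basis]

corr_strong = correlate(strong, basis)
corr_weak = correlate(weak, basis)
print(corr_strong, corr_weak)
```

The signal that contains more of the basis function gives the larger sum, which is what makes correlation usable as a measure of "how much".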
In this example, if I do that for this lowest frequency basis function, I'm going to get a value of 0.1.
Let's put some scale on this.
Then I'll do that for the next basis function.
That's going to be at 0.2 kHz and I do the correlation and I find out that I need 0.15 of this one.
Then I do the next one, 0.3 kHz, and I find that I need 0.25 of this one.
Then the next one, 0.4 kHz, and I find that I need 0.2 of that one; and so on.
I've plotted a function on the right.
Let's just join the dots to make it easier to see.
This is called the spectrum.
It's the amount of energy in our original signal at each of the frequencies of the basis functions.
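The sketch below builds a hypothetical signal from the four example values above (0.1, 0.15, 0.25, 0.2) and then recovers them by correlation. The scaling factor 2/N is an assumption of this sketch; the video only says that some scale is applied. The names `sine_basis` and `coefficient` are also hypothetical.

```python
import math

SAMPLING_RATE = 16000  # Hz
N = 160                # samples in a 0.01 s window


def sine_basis(k, n_samples):
    """Unit-amplitude sine wave completing k cycles in n_samples samples."""
    return [math.sin(2 * math.pi * k * n / n_samples) for n in range(n_samples)]


def coefficient(signal, k):
    """Correlate the signal with the k-th basis function, scaled by 2/N (assumed)."""
    basis = sine_basis(k, len(signal))
    raw = sum(s * b for s, b in zip(signal, basis))
    return 2 * raw / len(signal)


# Hypothetical signal: a weighted sum of the first four basis functions.
weights = {1: 0.1, 2: 0.15, 3: 0.25, 4: 0.2}
signal = [0.0] * N
for k, w in weights.items():
    for n, b in enumerate(sine_basis(k, N)):
        signal[n] += w * b

# Spectrum: (frequency in kHz, magnitude) for the first six basis functions.
spectrum = [(k * SAMPLING_RATE / N / 1000, coefficient(signal, k)) for k in range(1, 7)]
for freq_khz, magnitude in spectrum:
    print(f"{freq_khz:.1f} kHz  {magnitude:.3f}")
```

The fifth and sixth basis functions were not used to build the signal, so their magnitudes come out at zero to within rounding error.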
We now need to talk about a technical but essential property of Fourier analysis, where the basis functions are sine waves (in other words pure tones).
They contain energy at one and only one frequency.
That means that any pair of sine waves in our series is orthogonal.
Let's see what that means.
Take a pair of basis functions: any pair.
I'll take the first two, and work out the correlation between these two signals.
So multiply them sample by sample: this one by this one, this one by this one, and so on, and work out that sum.
For this pair, it will always be zero.
We can see that simply by symmetry.
These two signals are orthogonal.
There is no energy at this frequency contained in this signal, and vice versa.
The same thing will happen for any pair.
The correlation between them is zero.
There is no energy at this frequency in this waveform, and vice versa.
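This can be checked numerically. The sketch below correlates every pair among the first eight basis functions, again assuming a 160-sample window; the helper names are hypothetical.

```python
import math
from itertools import combinations


def sine_basis(k, n_samples):
    """Unit-amplitude sine wave completing k cycles in n_samples samples."""
    return [math.sin(2 * math.pi * k * n / n_samples) for n in range(n_samples)]


def correlate(x, y):
    """Multiply two equal-length signals sample by sample and add up the products."""
    return sum(a * b for a, b in zip(x, y))


N = 160  # assumed: 0.01 s at 16 kHz
basis_set = {k: sine_basis(k, N) for k in range(1, 9)}

# Correlation between every pair of different basis functions.
cross = [abs(correlate(basis_set[j], basis_set[k]))
         for j, k in combinations(basis_set, 2)]
largest_cross = max(cross)

# Correlation of a basis function with itself, for comparison.
self_corr = correlate(basis_set[1], basis_set[1])

print(largest_cross, self_corr)
```

Every cross-correlation is zero apart from floating-point rounding, while a basis function correlated with itself gives a clearly non-zero value.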
This property of orthogonality between the basis functions means that when we decompose a signal into a weighted sum of these basis functions, there is a unique solution to that.
In other words, there is only one set of coefficients that works.
That uniqueness is very important.
It means that there's the same information in the set of coefficients as there is in the original signal.
It's also easy to invert this transform.
We could just take the weighted sum of basis functions and get back the original signal perfectly.
So Fourier analysis, then, is perfectly invertible, and gives us a unique solution.
We could go from the time domain to the frequency domain, and back to the time domain as many times as we like, and we lose no information.
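A small round trip illustrates this, under the assumption that the signal is itself a weighted sum of the sine basis functions used here. The function names `analyse` and `synthesise`, the choice of ten basis functions, and the 2/N scaling are assumptions of this sketch.

```python
import math

N = 160  # samples in the analysis window (assumed: 0.01 s at 16 kHz)
K = 10   # number of basis functions used in this sketch (assumed)


def sine_basis(k, n_samples):
    """Unit-amplitude sine wave completing k cycles in n_samples samples."""
    return [math.sin(2 * math.pi * k * n / n_samples) for n in range(n_samples)]


def analyse(signal, num_basis):
    """Time domain to frequency domain: one coefficient per basis function."""
    n = len(signal)
    return [2 / n * sum(s * b for s, b in zip(signal, sine_basis(k, n)))
            for k in range(1, num_basis + 1)]


def synthesise(coefficients, n_samples):
    """Frequency domain to time domain: weighted sum of basis functions."""
    out = [0.0] * n_samples
    for k, a in enumerate(coefficients, start=1):
        for i, b in enumerate(sine_basis(k, n_samples)):
            out[i] += a * b
    return out


original = synthesise([0.1, 0.15, 0.25, 0.2], N)  # hypothetical signal
coefficients = analyse(original, K)
rebuilt = synthesise(coefficients, N)

max_error = max(abs(a - b) for a, b in zip(original, rebuilt))
coefficients_again = analyse(rebuilt, K)
print(max_error)
```

Going back and forth again gives the same coefficients, up to rounding error, so nothing is lost in either direction for a signal of this kind.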
We've covered, then, the essential properties of Fourier analysis.
It uses sine waves as the basis functions.
There is a series of those from the lowest frequency one (and that frequency will be determined by the duration of the analysis window) up to the highest frequency one (and that will be determined by the Nyquist frequency).
We said Fourier 'analysis', but this conversion from time domain to frequency domain is often called a 'transform'.
So from now on we'll more likely say the 'Fourier transform'.
The Fourier transform is what's going to now get us into the frequency domain.
That's one of the most powerful and widely used transformations in speech processing.
We do a lot more processing in the frequency domain than we ever do in the time domain.
