采样定理的演示实验

采样定理

在实际测量中，很多信号本来是连续变化的，例如声音、机械振动、电压、电流等。但计算机只能记录离散数据，所以我们需要每隔一小段时间测一次。

采样定理回答的问题是：

采样频率要多高，才能从离散数据中正确看出原信号的频率信息？

如果信号中最高的有效频率是 $f_{\max}$，那么采样频率 $f_s$ 应该满足 $f_s / 2 > f_{\max}$，其中 $f_s / 2$ 叫做 Nyquist 频率，表示在当前采样频率下能可靠分辨的最高频率。

采样频率为什么有下限？

如果采样频率太低，高频信号不会简单地消失，而是会“折叠”成低频信号，这叫混叠。

例如真实信号里有一个 $70\ \mathrm{Hz}$ 的频率成分。如果采样频率是 $100\ \mathrm{Hz}$，那么 Nyquist 频率是 $50\ \mathrm{Hz}$。由于 $70$ Hz 超过了 $50$ Hz，它会在采样后表现成 $100-70=30\ \mathrm{Hz}$，也就是说：$70\ \mathrm{Hz} \to 30\ \mathrm{Hz}$。这时我们在频谱里看到的 $30$ Hz 峰，可能并不是原信号真的有 $30$ Hz，而是 $70$ Hz 混叠之后的假象。（如果原始信号有 $30$ Hz，那么两者的频谱贡献叠加）

临界值并不可靠

如果最高频率是 $70$ Hz，那么理论边界是 $2\times 70=140\ \mathrm{Hz}$，但实际中通常不会刚好取 $140$ Hz。因为 $70$ Hz 正好位于 Nyquist 边界，是一个很特殊的位置，容易受到相位、噪声和数值处理方式的影响。
采样定理通常写成 $f_s>2f_{\max}$，而不是 $f_s\ge 2f_{\max}$，实践中一般会留出余量，比如最高频率是 $70$ Hz 时，可以取 $160$ Hz、$200$ Hz 或更高。

低通滤波器

采样定理还有一个前提：信号本身要限制在 Nyquist 频率以内。但真实信号往往不是严格带限的，可能含有更高频率的噪声或扰动。为了避免这些高频成分混叠到低频区域，实际系统通常会在采样前加一个低通滤波器，也叫抗混叠滤波器。

实际流程可以理解为：

$$
\text{连续信号}
\rightarrow
\text{低通滤波}
\rightarrow
\text{采样}
\rightarrow
\text{频谱分析}.
$$

低通滤波器的作用不是简单地让信号变平滑，而是防止超过 Nyquist 频率的高频成分进入采样过程。

代码演示

下面的代码构造了一个包含多个频率的信号：

$$
20\ \mathrm{Hz},\quad 45\ \mathrm{Hz},\quad 70\ \mathrm{Hz},
$$

对应的振幅依次为 $1$，$0.8$，$0.6$，并加入小幅度的随机噪声以模拟实际测量。

然后我们用不同采样频率采样：

$$
200,\ 141,\ 140,\ 139,\ 100,\ 90,\ 80\ \mathrm{Hz}.
$$

可以观察到：

采样频率足够高时，频谱峰出现在真实频率；
采样频率不足时，高频峰会折叠到错误的低频位置；
$140$ Hz 是临界情况，$70$ Hz 正好落在 Nyquist 边界；（实践中会避免临界情况，预留一些冗余）
噪声只会让频谱变脏，而混叠会系统性地产生错误频率。

完整代码如下

import numpy as np
import matplotlib.pyplot as plt

# ============================================================
# 实际测量信号：多个频率叠加 + 随机相位 + 小随机噪声
# ============================================================

true_freqs = np.array([20, 45, 70])  # 真实频率，Hz
amps = np.array([1.0, 0.8, 0.6])  # 各频率振幅

T = 2.0  # 观测 2 秒
sampling_rates = [200, 141, 140, 139, 100, 90, 80]

sigma = 0.08  # 小噪声强度
rng = np.random.default_rng(1)

# 给每个频率分量加随机相位，使信号更像实际测量
phases = rng.uniform(0, 2 * np.pi, size=len(true_freqs))


def clean_signal(t):
    x = np.zeros_like(t)
    for a, f, phi in zip(amps, true_freqs, phases):
        x += a * np.sin(2 * np.pi * f * t + phi)
    return x


# ============================================================
# 构造一个“连续版本”的带噪声输入信号
# ============================================================

fs_fine = 10000
t_fine = np.arange(0, T, 1 / fs_fine)

# 在细网格上生成小随机噪声
noise_fine = sigma * rng.standard_normal(len(t_fine))

# 实际输入信号的连续版本：干净信号 + 细网格噪声
y_fine = clean_signal(t_fine) + noise_fine


def noisy_measurement(t):
    """
    从连续带噪声输入信号上采样。
    """
    return np.interp(t, t_fine, y_fine)


def alias_frequency(f, fs):
    """
    把真实频率 f 映射到采样后可见的 [0, fs/2] 区间。
    """
    r = f % fs

    if r > fs / 2:
        r = fs - r

    return r


def freq_label(f, vf, fs=None):
    """
    未混叠：显示 a Hz
    混叠：显示 a Hz → b Hz
    Nyquist 点：显示 a Hz (Nyquist)
    """
    if fs is not None and np.isclose(vf, fs / 2):
        return f"{f:.0f} Hz (Nyquist)"
    elif np.isclose(f, vf):
        return f"{f:.0f} Hz"
    else:
        return f"{f:.0f} Hz → {vf:.0f} Hz"


def grouped_visible_components(true_freqs, amps, fs):
    """
    按采样后的可见频率分组。

    返回格式：
    {
        visible_frequency: [(true_frequency, amplitude), ...]
    }
    """
    groups = {}

    for f, a in zip(true_freqs, amps):
        vf = alias_frequency(f, fs)

        # 避免浮点数作为字典 key 时出现 69.999999 之类的问题
        key = round(vf, 10)

        if key not in groups:
            groups[key] = []

        groups[key].append((f, a))

    return groups


def grouped_freq_label(components, vf, fs):
    """
    给频谱图使用的合并标签。

    如果只有一个频率：
        20 Hz
        70 Hz → 30 Hz
        70 Hz (Nyquist)

    如果多个频率混到同一个可见频率：
        20 Hz + 70 Hz → 20 Hz
    """
    labels = []

    for f, a in components:
        if np.isclose(vf, fs / 2):
            labels.append(f"{f:.0f} Hz (Nyquist)")
        elif np.isclose(f, vf):
            labels.append(f"{f:.0f} Hz")
        else:
            labels.append(f"{f:.0f} Hz")

    if len(components) == 1:
        f, a = components[0]
        return freq_label(f, vf, fs)

    left = " + ".join(labels)
    return f"{left} → {vf:.0f} Hz"


def grouped_amplitude_label(components):
    """
    控制台输出用：显示混到同一可见频率的原始频率及其强度。
    """
    parts = [f"{f:.0f} Hz, amp={a:.2f}" for f, a in components]
    return " + ".join(parts)


def sinc_reconstruction(t_eval, t_sample, y_sample, fs):
    """
    用理想 sinc 插值从采样点重建信号。
    """
    Ts = 1 / fs
    S = np.sinc((t_eval[:, None] - t_sample[None, :]) / Ts)
    return S @ y_sample


def one_sided_amplitude_spectrum(y, fs):
    """
    计算实值信号的一边振幅谱。

    对普通正频率，振幅需要乘 2；
    但 0 Hz 和 Nyquist 频率点不能乘 2。
    """
    Y = np.fft.rfft(y)
    freq = np.fft.rfftfreq(len(y), d=1 / fs)

    amplitude = np.abs(Y) / len(y)

    if len(y) % 2 == 0:
        # 偶数长度：0 Hz 和 Nyquist 点不乘 2
        amplitude[1:-1] *= 2
    else:
        # 奇数长度：只有 0 Hz 不乘 2
        amplitude[1:] *= 2

    return freq, amplitude


# ============================================================
# 多组采样频率下的时域采样数据、重建信号和频谱
# ============================================================

for fs in sampling_rates:
    # 采样点
    t = np.arange(0, T, 1 / fs)
    y = noisy_measurement(t)

    # 用更密的时间点画重建曲线
    t_recon = np.linspace(0, T, 4000, endpoint=False)
    y_recon = sinc_reconstruction(t_recon, t, y, fs)

    # FFT 频谱
    freq, amplitude = one_sided_amplitude_spectrum(y, fs)

    visible_groups = grouped_visible_components(true_freqs, amps, fs)

    print(f"sampling rate fs = {fs} Hz")
    print(f"Nyquist frequency = {fs / 2} Hz")
    print("visible frequencies after sampling:")

    for vf, components in sorted(visible_groups.items()):
        total_amp = sum(a for _, a in components)

        print(
            f"  {grouped_freq_label(components, vf, fs)}"
            f"    total input amplitude ≈ {total_amp:.2f}"
            f"    ({grouped_amplitude_label(components)})"
        )

    fig, axes = plt.subplots(1, 2, figsize=(12, 4))

    # --------------------------------------------------------
    # 图 1：实际输入信号连续版本 + 采样点 + 重建信号
    # --------------------------------------------------------

    axes[0].plot(
        t_fine,
        y_fine,
        linewidth=1.0,
        alpha=0.7,
        label="actual noisy input signal",
    )

    axes[0].plot(
        t_recon,
        y_recon,
        linewidth=1.5,
        label="reconstructed signal from samples",
    )

    axes[0].plot(
        t,
        y,
        marker="o",
        markersize=5,
        linewidth=0,
        label="samples",
    )

    axes[0].set_xlim(0, 0.25)
    axes[0].set_xlabel("time (s)")
    axes[0].set_ylabel("amplitude")
    axes[0].set_title(f"Input, samples, and reconstruction, fs = {fs} Hz")
    axes[0].legend()
    axes[0].grid(True)

    # --------------------------------------------------------
    # 图 2：采样数据的频谱
    # --------------------------------------------------------

    axes[1].plot(freq, amplitude, linewidth=1.5)

    axes[1].axvline(
        fs / 2,
        linestyle="--",
        label=f"Nyquist = {fs / 2:.1f} Hz",
    )

    for vf, components in sorted(visible_groups.items()):
        axes[1].axvline(vf, linestyle=":", alpha=0.8)

        axes[1].text(
            vf,
            max(amplitude) * 0.85,
            grouped_freq_label(components, vf, fs),
            rotation=90,
            va="top",
            ha="right",
        )

    axes[1].set_xlim(0, fs / 2)
    axes[1].set_xlabel("frequency (Hz)")
    axes[1].set_ylabel("amplitude")
    axes[1].set_title(f"FFT spectrum of samples, fs = {fs} Hz")
    axes[1].legend()
    axes[1].grid(True)

    plt.show()

注：完整观测时长是 2s，FFT 和重建都使用了完整 2s 数据；为了看清局部细节，时域图只显示前 0.25s。

采样频率 200 Hz 的输出如下

sampling rate fs = 200 Hz
Nyquist frequency = 100.0 Hz
visible frequencies after sampling:
  20 Hz    total input amplitude ≈ 1.00    (20 Hz, amp=1.00)
  45 Hz    total input amplitude ≈ 0.80    (45 Hz, amp=0.80)
  70 Hz    total input amplitude ≈ 0.60    (70 Hz, amp=0.60)

采样频率 141 Hz 的输出如下

sampling rate fs = 141 Hz
Nyquist frequency = 70.5 Hz
visible frequencies after sampling:
  20 Hz    total input amplitude ≈ 1.00    (20 Hz, amp=1.00)
  45 Hz    total input amplitude ≈ 0.80    (45 Hz, amp=0.80)
  70 Hz    total input amplitude ≈ 0.60    (70 Hz, amp=0.60)

采样频率 140 Hz 的输出如下

sampling rate fs = 140 Hz
Nyquist frequency = 70.0 Hz
visible frequencies after sampling:
  20 Hz    total input amplitude ≈ 1.00    (20 Hz, amp=1.00)
  45 Hz    total input amplitude ≈ 0.80    (45 Hz, amp=0.80)
  70 Hz (Nyquist)    total input amplitude ≈ 0.60    (70 Hz, amp=0.60)

采样频率 139 Hz 的输出如下

sampling rate fs = 139 Hz
Nyquist frequency = 69.5 Hz
visible frequencies after sampling:
  20 Hz    total input amplitude ≈ 1.00    (20 Hz, amp=1.00)
  45 Hz    total input amplitude ≈ 0.80    (45 Hz, amp=0.80)
  70 Hz → 69 Hz    total input amplitude ≈ 0.60    (70 Hz, amp=0.60)

采样频率 100 Hz 的输出如下

sampling rate fs = 100 Hz
Nyquist frequency = 50.0 Hz
visible frequencies after sampling:
  20 Hz    total input amplitude ≈ 1.00    (20 Hz, amp=1.00)
  70 Hz → 30 Hz    total input amplitude ≈ 0.60    (70 Hz, amp=0.60)
  45 Hz    total input amplitude ≈ 0.80    (45 Hz, amp=0.80)

采样频率 90 Hz 的输出如下

sampling rate fs = 90 Hz
Nyquist frequency = 45.0 Hz
visible frequencies after sampling:
  20 Hz + 70 Hz → 20 Hz    total input amplitude ≈ 1.60    (20 Hz, amp=1.00 + 70 Hz, amp=0.60)
  45 Hz (Nyquist)    total input amplitude ≈ 0.80    (45 Hz, amp=0.80)

采样频率 80 Hz 的输出如下

sampling rate fs = 80 Hz
Nyquist frequency = 40.0 Hz
visible frequencies after sampling:
  70 Hz → 10 Hz    total input amplitude ≈ 0.60    (70 Hz, amp=0.60)
  20 Hz    total input amplitude ≈ 1.00    (20 Hz, amp=1.00)
  45 Hz → 35 Hz    total input amplitude ≈ 0.80    (45 Hz, amp=0.80)