
Speech Emotion Recognition Based on CNN and MFCC

Reader submission 1255 2025-03-31

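The main task of speech emotion recognition is to extract the emotional information contained in speech and identify its category. There are currently two main ways to describe emotion. The first is a discrete categorization, which divides the basic emotions widely used in everyday life into anger, happiness, excitement, sadness, disgust, and so on. The other is a continuous dimensional categorization, which distinguishes emotions mainly by their valence and their degree of activation (arousal).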

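Since this is a classification task, feature selection is the most critical step. The speech features used in this article are Mel-frequency cepstral coefficients (MFCC). For what they are and how they are extracted, see the article 《Python語音信號處理》 (Python Speech Signal Processing).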

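This article draws to some extent on the MITESHPUTHRANNEU/Speech-Emotion-Analyzer project. The following sections describe how to perform speech emotion analysis with a convolutional neural network.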

Neural network structure

The architecture used is actually quite simple, as follows:

Dataset

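I use the CASIA speech emotion database. The CASIA Chinese emotional corpus was recorded by the Institute of Automation, Chinese Academy of Sciences. It covers four professional speakers and six emotions: angry, happy, fear, sad, surprise, and neutral, for a total of 9600 utterances. Of these, 300 sentences share the same text, that is, the same text is read with different emotions, so this material can be used to compare the acoustic and prosodic characteristics of different emotional states. The other 100 sentences have different texts whose emotional category is evident from their literal meaning, which makes it easier for the speakers to express the emotion accurately.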
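The totals in this description can be checked with a few lines of arithmetic. This is only a sketch: it assumes the 300 shared-text and 100 distinct-text sentences are counted per speaker and per emotion, an interpretation the numbers support but the text does not state outright.

```python
# Assumption: 300 shared-text + 100 distinct-text sentences are recorded
# for every (speaker, emotion) pair.
speakers = 4
emotions = ["angry", "happy", "fear", "sad", "surprise", "neutral"]
sentences_per_speaker_emotion = 300 + 100

total_utterances = speakers * len(emotions) * sentences_per_speaker_emotion
print(total_utterances)  # 9600
```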

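However, the complete CASIA dataset is not free, so I could only find an incomplete subset of 1200 utterances. I have put the dataset I found on my netdisk: https://pan.baidu.com/s/1EsRoKaF17Q_3s2t7OMNibQ .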

Feature extraction

I use the librosa module to extract the MFCCs. The extraction code is as follows.

%matplotlib inline
import librosa
import matplotlib.pyplot as plt
import numpy as np

path = r'D:\NLP\dataset\語音情感\test.wav'
y, sr = librosa.load(path, sr=None)  # sr=None keeps the file's native sampling rate

def normalizeVoiceLen(y, normalizedLen):
    # Normalize the audio length to 2 s, i.e. 32000 samples:
    # shorter clips are zero-padded, longer clips are truncated
    nframes = len(y)
    y = np.reshape(y, [nframes, 1]).T
    if nframes < normalizedLen:
        res = normalizedLen - nframes
        res_data = np.zeros([1, res], dtype=np.float32)
        y = np.c_[y, res_data]
    else:
        y = y[:, 0:normalizedLen]
    return y[0]

def getNearestLen(framelength, sr):
    framesize = framelength * sr
    # Find the power of two closest to the current framesize
    nfftdict = {}
    lists = [32, 64, 128, 256, 512, 1024]
    for i in lists:
        nfftdict[i] = abs(framesize - i)
    sortlist = sorted(nfftdict.items(), key=lambda x: x[1])  # ascending by distance to framesize
    framesize = int(sortlist[0][0])  # the closest power of two becomes the new framesize
    return framesize

VOICE_LEN = 32000
# Get the N_FFT length
N_FFT = getNearestLen(0.25, sr)
# Keep only the first two seconds of every clip
y = normalizeVoiceLen(y, VOICE_LEN)
print(y.shape)
# Extract the MFCC features
mfcc_data = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=N_FFT, hop_length=int(N_FFT/4))
# Plot the MFCC matrix. Rows are coefficients and columns are frames,
# so time already runs horizontally and no transpose is needed
plt.matshow(mfcc_data)
plt.title('MFCC')

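The code above loads the audio and keeps its first two seconds for emotion analysis. The getNearestLen() function uses the sampling rate to choose a suitable frame length for the Fourier transform. The MFCC features are then extracted with librosa.feature.mfcc() and visualized.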
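To make the frame-length choice concrete, here is a minimal sketch in plain Python. It assumes a 16 kHz sampling rate (implied by 32000 samples covering 2 s, but not stated outright) and librosa's default centered framing, under which the frame count is 1 + n_samples // hop_length. The helper names nearest_power_of_two and frame_count are illustrative and do not come from the original code.

```python
def nearest_power_of_two(target, candidates=(32, 64, 128, 256, 512, 1024)):
    """Return the candidate closest to target (same rule as getNearestLen)."""
    return min(candidates, key=lambda c: abs(target - c))

def frame_count(n_samples, hop_length):
    """Number of frames produced by centered framing (librosa's default)."""
    return 1 + n_samples // hop_length

sr = 16000                                # assumed sampling rate
n_fft = nearest_power_of_two(0.25 * sr)   # 0.25 s would be 4000 samples
hop_length = n_fft // 4
n_frames = frame_count(32000, hop_length)
print(n_fft, hop_length, n_frames)  # 1024 256 126
```

Because the candidate list stops at 1024, a 0.25 s target at 16 kHz is capped at 1024 samples, and a 2 s clip then yields 126 frames.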

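The code below extracts the MFCC features for every file in the dataset, averages the 13 coefficients within each frame, and saves the result to a file.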

# Extract features (reuses librosa, np, normalizeVoiceLen, VOICE_LEN and N_FFT from the previous block)
import os
import pickle

counter = 0
fileDirCASIA = r'D:\NLP\dataset\語音情感\CASIA database'

mfccs = {}
mfccs['angry'] = []
mfccs['fear'] = []
mfccs['happy'] = []
mfccs['neutral'] = []
mfccs['sad'] = []
mfccs['surprise'] = []
mfccs['disgust'] = []

# Expected layout: <fileDirCASIA>/<speaker>/<emotion>/*.wav
listdir = os.listdir(fileDirCASIA)
for persondir in listdir:
    if '.' not in persondir:  # skip entries whose name contains a dot (plain files)
        emotionDirName = os.path.join(fileDirCASIA, persondir)
        emotiondir = os.listdir(emotionDirName)
        for ed in emotiondir:
            if '.' not in ed:
                filesDirName = os.path.join(emotionDirName, ed)
                files = os.listdir(filesDirName)
                for fileName in files:
                    if fileName[-3:] == 'wav':
                        counter += 1
                        fn = os.path.join(filesDirName, fileName)
                        print(str(counter) + fn)
                        y, sr = librosa.load(fn, sr=None)
                        y = normalizeVoiceLen(y, VOICE_LEN)  # normalize the length
                        mfcc_data = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=N_FFT, hop_length=int(N_FFT/4))
                        feature = np.mean(mfcc_data, axis=0)  # mean over the 13 coefficients: one value per frame
                        mfccs[ed].append(feature.tolist())  # the emotion directory name is the dictionary key

with open('mfcc_feature_dict.pkl', 'wb') as f:
    pickle.dump(mfccs, f)
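The direction of the averaging is easy to misread, so here is a small sketch of what np.mean(mfcc_data, axis=0) does to an array shaped like the MFCC output. The array is filled with random numbers as a stand-in for real MFCCs, and the (13, 126) shape assumes a 2 s clip at 16 kHz with the frame settings used above. The axis=1 variant is shown only for contrast and is not what the code above uses.

```python
import numpy as np

# Stand-in for the librosa output: 13 coefficients x 126 frames (random values).
rng = np.random.default_rng(0)
mfcc_data = rng.normal(size=(13, 126))

per_frame_mean = np.mean(mfcc_data, axis=0)  # collapse the coefficient axis
per_coeff_mean = np.mean(mfcc_data, axis=1)  # collapse the time axis (contrast only)

print(per_frame_mean.shape)  # (126,)
print(per_coeff_mean.shape)  # (13,)
```

Averaging over axis 0 keeps the time resolution, so each utterance becomes a sequence of 126 values.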

Data preprocessing

The code is as follows:

%matplotlib inline
import pickle
import os
import librosa
import matplotlib.pyplot as plt
import numpy as np
from keras import layers
from keras import models
from keras import optimizers
from keras.utils import to_categorical

# Read the features
mfccs = {}
with open('mfcc_feature_dict.pkl', 'rb') as f:
    mfccs = pickle.load(f)

# Set the labels
emotionDict = {}
emotionDict['angry'] = 0
emotionDict['fear'] = 1
emotionDict['happy'] = 2
emotionDict['neutral'] = 3
emotionDict['sad'] = 4
emotionDict['surprise'] = 5

# Concatenate the features of every emotion and build the matching label list
data = []
labels = []
for emotion, label in emotionDict.items():
    print(len(mfccs[emotion]))
    data = data + mfccs[emotion]
    labels = labels + [label] * len(mfccs[emotion])
print(len(data))
print(len(labels))

# Set the data dimensions: (samples, frames, 1)
data = np.array(data)
data = data.reshape((data.shape[0], data.shape[1], 1))
labels = np.array(labels)
labels = to_categorical(labels)

# Standardize the data
DATA_MEAN = np.mean(data, axis=0)
DATA_STD = np.std(data, axis=0)
data -= DATA_MEAN
data /= DATA_STD
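As an illustration of the label encoding step, the sketch below builds the same kind of one-hot matrix with NumPy alone. np.eye is used here as a stand-in so the example runs without Keras; it is not the call the article uses, and the three sample emotions are arbitrary.

```python
import numpy as np

emotion_to_label = {"angry": 0, "fear": 1, "happy": 2,
                    "neutral": 3, "sad": 4, "surprise": 5}

# Three example utterances, chosen arbitrarily for the illustration.
labels = np.array([emotion_to_label[e] for e in ["happy", "angry", "surprise"]])

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(len(emotion_to_label), dtype=np.float32)[labels]
print(one_hot)
```

Each row has a single 1 in the column of its class, which is the format a softmax output with categorical cross-entropy expects.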

Next, save the parameters. They are needed when the model makes predictions.

paraDict = {}
paraDict['mean'] = DATA_MEAN
paraDict['std'] = DATA_STD
paraDict['emotion'] = emotionDict
with open('mfcc_model_para_dict.pkl', 'wb') as f:
    pickle.dump(paraDict, f)
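A sketch of how the saved parameters might be used at prediction time follows. The standardize helper and the toy values are assumptions made for the illustration, and the pickle round trip goes through an in-memory buffer rather than the mfcc_model_para_dict.pkl file so that the example is self-contained.

```python
import io
import pickle
import numpy as np

def standardize(feature, para):
    """Apply the training-set mean/std to one per-frame feature vector."""
    x = np.asarray(feature, dtype=np.float64).reshape(1, -1, 1)
    return (x - para["mean"]) / para["std"]

# Toy parameters with the same layout as paraDict (the values are made up).
para = {"mean": np.full((126, 1), 2.0),
        "std": np.full((126, 1), 4.0),
        "emotion": {"angry": 0, "fear": 1}}

# Round trip through pickle, in memory instead of a file on disk.
buffer = io.BytesIO()
pickle.dump(para, buffer)
buffer.seek(0)
loaded = pickle.load(buffer)

x = standardize(np.full(126, 10.0), loaded)
label_to_emotion = {v: k for k, v in loaded["emotion"].items()}
print(x.shape, float(x[0, 0, 0]), label_to_emotion[1])
```

Reusing the training mean and standard deviation keeps new inputs on the same scale the network saw during training, and the inverted dictionary maps a predicted class index back to its emotion name.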

Finally, shuffle the dataset and split it into training data and test data.

ratioTrain = 0.8
numTrain = int(data.shape[0] * ratioTrain)
permutation = np.random.permutation(data.shape[0])
data = data[permutation, :]
labels = labels[permutation, :]
x_train = data[:numTrain]
x_val = data[numTrain:]
y_train = labels[:numTrain]
y_val = labels[numTrain:]
print(x_train.shape)
print(y_train.shape)
print(x_val.shape)
print(y_val.shape)
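The important property of this shuffle is that the same permutation indexes both arrays, so every sample keeps its own label. The sketch below checks that on toy data in which sample i holds the value i and belongs to class i. The fixed seed is an addition for reproducibility and is not part of the original code.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed: an addition, not in the original

# Toy data: sample i holds the value i and has class i.
data = np.arange(10).reshape(10, 1, 1).astype(np.float32)
labels = np.eye(10, dtype=np.float32)

perm = rng.permutation(data.shape[0])
data, labels = data[perm], labels[perm]

num_train = int(data.shape[0] * 0.8)
x_train, x_val = data[:num_train], data[num_train:]
y_train, y_val = labels[:num_train], labels[num_train:]

# After shuffling, each sample's value must still equal its class index.
aligned = np.array_equal(data[:, 0, 0], labels.argmax(axis=1))
print(x_train.shape, x_val.shape, aligned)
```

Shuffling the two arrays with separate permutations would break this pairing without raising any error, which is why a single shared index array is used.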

Defining the model

The model is defined with Keras. The code is as follows:

from keras.utils import plot_model
from keras import regularizers

model = models.Sequential()
model.add(layers.Conv1D(256, 5, activation='relu', input_shape=(126, 1)))
model.add(layers.Conv1D(128, 5, padding='same', activation='relu', kernel_regularizer=regularizers.l2(0.001)))