1411-李同学


# Classification Algorithms: Naive Bayes

## 3. Naive Bayes: assumes the features are mutually (conditionally) independent

### Document classification

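• P(tech | document): the probability that Document 1 (words: word 1, word 2, word 3) is a tech article

• P(entertainment | document): the probability that Document 2 (words: word a, word b, word c) is an entertainment article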
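The two bullets are posterior probabilities: the classifier computes one per category and picks the largest. A sketch of how naive Bayes evaluates them, combining Bayes' theorem with the independence assumption from the heading. The notation is introduced here, not in the original: $W = (w_1, \dots, w_n)$ is the list of words in a document and $C$ is a category.

$$
P(C \mid W) = \frac{P(W \mid C)\,P(C)}{P(W)}, \qquad P(W \mid C) = \prod_{i=1}^{n} P(w_i \mid C)
$$

Because $P(W)$ is the same for every category, comparing $P(C)\prod_{i} P(w_i \mid C)$ across categories is enough. To keep a single unseen word from making the whole product zero, multinomial naive Bayes uses Laplace (additive) smoothing, where $N_{i,C}$ is the count of word $w_i$ in category $C$, $N_C$ is the total word count in $C$, and $m$ is the vocabulary size:

$$
P(w_i \mid C) = \frac{N_{i,C} + \alpha}{N_C + \alpha m}
$$

With $\alpha = 1$ this corresponds to the `alpha=1.0` setting passed to `MultinomialNB`.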

## Worked example

### Code: classifying articles from the 20 Newsgroups dataset

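```python
# Naive Bayes text classification on the 20 Newsgroups dataset
from sklearn.datasets import fetch_20newsgroups
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB


def naivebayes():
    """Classify news articles with multinomial naive Bayes."""
    # Load all articles (train and test subsets combined); downloaded on first run
    news = fetch_20newsgroups(subset="all")

    # Split the data into training and test sets (25% held out for testing)
    x_train, x_test, y_train, y_test = train_test_split(
        news.data, news.target, test_size=0.25
    )

    # Feature extraction: TF-IDF weights
    tf = TfidfVectorizer()

    # Build the vocabulary from the training set only, e.g. ["a", "b", "c", "d"],
    # and weight each word's importance in every article
    x_train = tf.fit_transform(x_train)

    # get_feature_names() was removed in scikit-learn 1.2; print a sample of the vocabulary
    print(tf.get_feature_names_out()[:20])

    # Reuse the training vocabulary for the test set (transform, not fit_transform)
    x_test = tf.transform(x_test)

    # Multinomial naive Bayes; alpha=1.0 is Laplace smoothing
    mlt = MultinomialNB(alpha=1.0)

    # The TF-IDF matrix is sparse; toarray() on the full dataset would allocate
    # gigabytes of memory, so print only its shape (documents, vocabulary size)
    print(x_train.shape)

    mlt.fit(x_train, y_train)

    y_predict = mlt.predict(x_test)
    print("Predicted article categories:", y_predict)

    # Accuracy on the held-out test set
    print("Accuracy:", mlt.score(x_test, y_test))


if __name__ == "__main__":
    naivebayes()
```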
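To make what `MultinomialNB(alpha=1.0)` computes more concrete, here is a minimal from-scratch sketch of multinomial naive Bayes with Laplace smoothing on a tiny, made-up four-document corpus. Assumptions: the documents are already tokenized, the features are raw word counts rather than TF-IDF weights, and `train`/`predict` are hypothetical helper names for illustration, not scikit-learn API.

```python
"""From-scratch multinomial naive Bayes with Laplace smoothing (toy example)."""
import math
from collections import Counter


def train(docs, labels, alpha=1.0):
    """Return class log-priors, per-class word log-likelihoods, and the vocabulary."""
    vocab = {w for doc in docs for w in doc}
    log_prior = {}
    log_likelihood = {}
    for c in sorted(set(labels)):
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        # P(C): fraction of training documents in category c
        log_prior[c] = math.log(len(class_docs) / len(docs))
        counts = Counter(w for d in class_docs for w in d)
        total = sum(counts.values())
        # Laplace smoothing: (N_ic + alpha) / (N_c + alpha * m), m = vocabulary size
        log_likelihood[c] = {
            w: math.log((counts[w] + alpha) / (total + alpha * len(vocab)))
            for w in vocab
        }
    return log_prior, log_likelihood, vocab


def predict(doc, log_prior, log_likelihood, vocab):
    """Pick the category maximizing log P(C) + sum_i log P(w_i | C)."""
    scores = {}
    for c in log_prior:
        # Independence assumption: the product over words becomes a sum of logs.
        # Words never seen during training are skipped.
        scores[c] = log_prior[c] + sum(
            log_likelihood[c][w] for w in doc if w in vocab
        )
    return max(scores, key=scores.get)


# Hypothetical toy corpus (illustrative data only)
docs = [
    ["chip", "cpu", "software"],
    ["cpu", "software", "network"],
    ["movie", "star", "music"],
    ["music", "star", "concert"],
]
labels = ["tech", "tech", "entertainment", "entertainment"]

model = train(docs, labels)
print(predict(["cpu", "network", "star"], *model))  # → tech
```

Working in log space avoids the floating-point underflow that multiplying many small probabilities would cause; scikit-learn's `MultinomialNB` likewise stores log probabilities in its `class_log_prior_` and `feature_log_prob_` attributes.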

