Machine Learning: Logistic Regression

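In machine learning, logistic regression is the classic model that adapts a linear model to binary classification. A model is fitted on sample data and then used to predict the probability that a new sample belongs to the positive class.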

A first look at logistic regression

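The previous post summarized how linear regression models can be used to solve regression problems. Can a linear model also be used for classification? The answer is yes. How? A weighted sum of a sample's features is a continuous value, whereas binary classification asks a yes-or-no question. This is where the sigmoid function comes in: it maps the weighted sum to a probability, so its output lies between 0 and 1.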

Deriving the sigmoid function

Placeholder for now; to be filled in later.

The sigmoid function

The sigmoid function can be written as:

$s(t)=\frac{1}{1+e^{-t}}$

The derivative of the sigmoid function is:

$s'(t)=\left(\frac{1}{1+e^{-t}}\right)'=\frac{e^{-t}}{(1+e^{-t})^2}=\frac{1}{1+e^{-t}}\cdot\frac{e^{-t}}{1+e^{-t}}=\frac{1}{1+e^{-t}}\left(1-\frac{1}{1+e^{-t}}\right)=s(t)(1-s(t))$
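The identity $s'(t)=s(t)(1-s(t))$ is easy to check numerically. The snippet below is a minimal sketch: the function names `sigmoid` and `sigmoid_grad`, the test points, and the step size `eps` are choices made here for illustration, not part of the original post. It compares the closed form against a central finite difference.

```python
import numpy as np


def sigmoid(t):
    """Logistic sigmoid s(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + np.exp(-t))


def sigmoid_grad(t):
    """Closed-form derivative s'(t) = s(t) * (1 - s(t))."""
    s = sigmoid(t)
    return s * (1.0 - s)


# Compare the closed form with a central finite difference.
t = np.linspace(-5.0, 5.0, 11)
eps = 1e-6  # illustrative step size
numeric = (sigmoid(t + eps) - sigmoid(t - eps)) / (2.0 * eps)
max_abs_error = np.max(np.abs(numeric - sigmoid_grad(t)))
print(max_abs_error)  # should be tiny if the closed form is right
```

The derivative peaks at $t=0$, where $s(0)=0.5$ and $s'(0)=0.25$, and decays toward zero in both tails.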

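The logistic regression model

Logistic regression maps an input sample to a probability, with the aim of making the classification result as close as possible to the true label. The output of the fitted model can be written as: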

$h_{\theta}(x)=s(\theta^Tx)=\frac{1}{1+e^{-\theta^Tx}}$

Objective function

Parameter estimation

Assume:

$p(y=1|x;\theta)=h_{\theta}(x)$ $p(y=0|x;\theta)=1-h_{\theta}(x)$

Combining the two cases into a single expression gives:

$p(y|x;\theta)=(h_{\theta}(x))^y(1-h_{\theta}(x))^{1-y}$

By maximum likelihood estimation, the likelihood function is:

$L(\theta)=p(y|X;\theta)=\prod_{i=1}^m(h_{\theta}(x^{(i)}))^{y^{(i)}}(1-h_{\theta}(x^{(i)}))^{1-y^{(i)}}$

The log-likelihood function is:

$l(\theta)=\log L(\theta)=\sum_{i=1}^m\left[y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]$

Taking the partial derivative of the log-likelihood with respect to $\theta_j$:

$\frac{\partial l(\theta)}{\partial \theta_j}=\sum_{i=1}^m(\frac{y^{(i)}}{h(x^{(i)})}-\frac{1-y^{(i)}}{1-h(x^{(i)})})·\frac{\partial h(x^{(i)})}{\partial \theta_j}$

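Substituting $h_{\theta}(x)=s(\theta^Tx)$ into the expression above and using $s'(t)=s(t)(1-s(t))$ gives:

$\frac{\partial l(\theta)}{\partial\theta_j}=\sum_{i=1}^m(y^{(i)}-s(\theta^Tx^{(i)}))·x_j^{(i)}$

Maximizing the log-likelihood means repeatedly stepping in the direction of the gradient (gradient ascent). The learning rule for the logistic regression parameters, written here for a single sample $i$ with learning rate $\alpha$, is: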

$\theta_j=\theta_j+\alpha(y^{(i)}-h_{\theta}(x^{(i)}))x_j^{(i)}$
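As a rough sketch of how this update rule plays out in code, the snippet below runs gradient ascent on synthetic data. Several things here are assumptions made for illustration and are not from the original post: the data is randomly generated, the update is the batch version (the gradient summed over all samples and divided by the sample count) rather than the per-sample rule above, and `alpha=0.1` and `n_iter=500` are arbitrary defaults.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def log_likelihood(theta, X, y):
    """l(theta) = sum_i [y_i log h_i + (1 - y_i) log(1 - h_i)]."""
    h = sigmoid(X @ theta)
    return np.sum(y * np.log(h) + (1.0 - y) * np.log(1.0 - h))


def fit_gradient_ascent(X, y, alpha=0.1, n_iter=500):
    """Batch gradient ascent; alpha and n_iter are illustrative defaults."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (y - sigmoid(X @ theta))   # sum_i (y_i - h_i) * x_i
        theta = theta + alpha * grad / len(y)   # averaged so the step size does not scale with m
    return theta


# Synthetic, hypothetical data: a bias column plus one feature.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
X = np.column_stack([np.ones_like(x1), x1])
y = (rng.random(200) < sigmoid(0.5 + 2.0 * x1)).astype(float)

theta0 = np.zeros(2)
theta_hat = fit_gradient_ascent(X, y)
print(log_likelihood(theta0, X, y), log_likelihood(theta_hat, X, y))
```

The second printed value should be larger than the first, since every step moves uphill on a concave log-likelihood. Replacing the batch gradient with the gradient of one randomly chosen sample would give the stochastic rule stated above.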

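Loss function

The true label $y_i$ is 0 or 1. Suppose the prediction $\hat y_i$ outputs the following probabilities:

$\hat y_i = \begin{cases} p_i, & y_i=1 \\ 1-p_i, & y_i=0 \end{cases}$

The likelihood function can be written as: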

$L(\theta)=\prod_{i=1}^mp_i^{y_i}(1-p_i)^{1-y_i}$

The log-likelihood is then:

$l(\theta)=\sum_{i=1}^m\ln[p_i^{y_i}(1-p_i)^{1-y_i}]$

Letting $p_i=\frac{1}{1+e^{-f_i}}$, so that $1-p_i=\frac{1}{1+e^{f_i}}$, the log-likelihood becomes:

$l(\theta)=\sum_{i=1}^m\ln[(\frac{1}{1+e^{-f_i}})^{y_i}(\frac{1}{1+e^{f_i}})^{1-y_i}]$

Maximizing the log-likelihood is equivalent to minimizing $loss(y_i,\hat y_i)=-l(\theta)$.
Objective function:

$\bf{loss(y_i,\hat y_i)=\sum_{i=1}^m[y_i\ln(1+e^{-f_i})+(1-y_i)\ln(1+e^{f_i})]}$
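This loss is the familiar binary cross-entropy written in terms of the raw score $f_i$ instead of the probability $p_i$. The sketch below checks that numerically. The function names and the sample scores and labels are made up for illustration. `np.logaddexp(0, z)` is used to evaluate $\ln(1+e^{z})$ without overflow for large $|z|$.

```python
import numpy as np


def loss_from_scores(f, y):
    """sum_i [y_i ln(1 + e^{-f_i}) + (1 - y_i) ln(1 + e^{f_i})], labels in {0, 1}."""
    # np.logaddexp(0, z) computes ln(e^0 + e^z) = ln(1 + e^z) stably.
    return np.sum(y * np.logaddexp(0.0, -f) + (1.0 - y) * np.logaddexp(0.0, f))


def cross_entropy(p, y):
    """-sum_i [y_i ln p_i + (1 - y_i) ln(1 - p_i)], the negative log-likelihood."""
    return -np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))


# Hypothetical scores f_i and labels y_i, for illustration only.
f = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
p = 1.0 / (1.0 + np.exp(-f))

print(loss_from_scores(f, y), cross_entropy(p, y))  # the two values should agree
```

Working from scores rather than probabilities avoids taking the logarithm of a probability that has rounded to 0 or 1.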

If the true label $y_i$ is instead encoded as $-1$ or $1$, we get:

$\hat y_i = \begin{cases} p_i, & y_i=1 \\ 1-p_i, & y_i=-1 \end{cases}$

The likelihood function can be written as:

$L(\theta)=\prod_{i=1}^mp_i^{(y_i+1)/2}(1-p_i)^{(1-y_i)/2}$

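The log-likelihood is then:

$l(\theta)=\sum_{i=1}^m\ln[p_i^{(y_i+1)/2}(1-p_i)^{(1-y_i)/2}]$

Letting $p_i=\frac{1}{1+e^{-f_i}}$, so that $1-p_i=\frac{1}{1+e^{f_i}}$, the log-likelihood becomes:

$l(\theta)=\sum_{i=1}^m\ln[(\frac{1}{1+e^{-f_i}})^{(y_i+1)/2}(\frac{1}{1+e^{f_i}})^{(1-y_i)/2}]$

Maximizing the log-likelihood is equivalent to minimizing $loss(y_i,\hat y_i)=-l(\theta)$. The end goal is still to find the parameter values at which the loss is smallest.
Objective function:

$loss(y_i,\hat y_i)=\sum_{i=1}^m[\frac12(y_i+1)\ln(1+e^{-f_i})-\frac12(y_i-1)\ln(1+e^{f_i})]$

Because $y_i$ is either $1$ or $-1$, exactly one of the two terms is nonzero for each sample, so this simplifies to:

$\bf{loss(y_i,\hat y_i)=\sum_{i=1}^m\ln(1+e^{-y_i·f_i})}$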
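The two label encodings describe the same loss. As a quick sanity check (a sketch with made-up scores and labels; the function names `loss_01` and `loss_pm1` are introduced here for illustration), the snippet below evaluates the $\{0,1\}$ form and the $\{-1,1\}$ form on the same data after remapping the labels.

```python
import numpy as np


def loss_01(f, y01):
    """Loss with labels in {0, 1}."""
    return np.sum(y01 * np.logaddexp(0.0, -f) + (1.0 - y01) * np.logaddexp(0.0, f))


def loss_pm1(f, ypm):
    """Loss with labels in {-1, +1}: sum_i ln(1 + exp(-y_i * f_i))."""
    return np.sum(np.logaddexp(0.0, -ypm * f))


# Hypothetical scores and labels, for illustration only.
f = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
y01 = np.array([0.0, 1.0, 0.0, 1.0, 1.0])
ypm = 2.0 * y01 - 1.0  # map 0 -> -1 and 1 -> +1

print(loss_01(f, y01), loss_pm1(f, ypm))  # the two values should agree
```

The $\{-1,1\}$ form is compact because the product $y_i f_i$ is positive exactly when the sign of the score matches the label.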

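Odds

The odds of an event is the ratio of the probability that the event occurs to the probability that it does not. If the event occurs with probability $p$, it fails to occur with probability $1-p$, so its odds are $\frac{p}{1-p}$. Taking $p=h_{\theta}(x)$, the log-odds of the event, or logit function, is:

$logit(p)=\log\frac{p}{1-p}=\log\frac{h_{\theta}(x)}{1-h_{\theta}(x)}=\theta^Tx$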
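In other words, the logit is the inverse of the sigmoid, and the linear score $\theta^Tx$ is the log-odds. A small numeric sketch follows; the array `z` stands in for $\theta^Tx$ and its values are arbitrary.

```python
import numpy as np


def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))


def logit(p):
    """Log-odds: log(p / (1 - p))."""
    return np.log(p / (1.0 - p))


# z stands in for theta^T x; the values are arbitrary.
z = np.array([-3.0, -1.0, 0.0, 0.5, 2.0])
p = sigmoid(z)
odds = p / (1.0 - p)

print(odds)      # the odds equal exp(z)
print(logit(p))  # the log-odds recover z
```

A consequence is that increasing a feature $x_j$ by one unit multiplies the odds by $e^{\theta_j}$, which is the usual way to read logistic regression coefficients.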

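Softmax regression

Softmax regression applies to multi-class classification. The labels for the $K$ classes are one-hot encoded. If the parameter vector of the $k$-th class is $\vec\theta_k$, the parameters together form a matrix $\theta_{K×n}$ with one row per class.

Objective function

In softmax regression, the probability that a sample belongs to class $k$ can be written as:

$p(c=k|x;\theta)=\frac{\exp(\theta_k^Tx)}{\sum_{j=1}^K\exp(\theta_j^Tx)},\quad k=1,2,\dots,K$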

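With $m$ samples, the likelihood function can be written as:

$L(\theta)=\prod_{i=1}^m\prod_{k=1}^Kp(c=k|x^{(i)};\theta)^{y_k^{(i)}}$

The log-likelihood can then be written as:

$J(\theta)=\ln L(\theta)=\sum_{i=1}^m\sum_{k=1}^Ky_k^{(i)}·\left(\theta_k^Tx^{(i)}-\ln\sum_{j=1}^K\exp(\theta_j^Tx^{(i)})\right)$

The objective is to maximize this log-likelihood.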
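The softmax probabilities and the log-likelihood translate directly into a few lines of NumPy. The sketch below uses a randomly generated toy problem (4 samples, 2 features, 3 classes); the shapes, data, and function names are assumptions for illustration. Subtracting the row-wise maximum score before exponentiating does not change the probabilities but avoids overflow.

```python
import numpy as np


def softmax_probs(theta, X):
    """Row i holds p(c = k | x_i; theta) for every class k.

    theta has shape (K, n_features) and X has shape (m, n_features).
    """
    scores = X @ theta.T                                 # shape (m, K)
    scores = scores - scores.max(axis=1, keepdims=True)  # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum(axis=1, keepdims=True)


def log_likelihood(theta, X, Y_onehot):
    """J(theta) = sum_i sum_k y_k^(i) * ln p(c = k | x_i; theta)."""
    return np.sum(Y_onehot * np.log(softmax_probs(theta, X)))


# Hypothetical toy problem: m = 4 samples, 2 features, K = 3 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 2))
theta = rng.normal(size=(3, 2))
labels = np.array([0, 2, 1, 2])
Y_onehot = np.eye(3)[labels]  # one-hot encode the labels

P = softmax_probs(theta, X)
print(P.sum(axis=1))                       # each row sums to 1
print(log_likelihood(theta, X, Y_onehot))  # always negative
```

With $K=2$ and one parameter row fixed at zero, the first column of `softmax_probs` reduces to the sigmoid of $\theta^Tx$, which is how softmax regression generalizes the binary model.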

Logistic regression example code

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.pipeline import Pipeline

if __name__ == '__main__':
    np.set_printoptions(suppress=True)
    path = './Data/iris.data'
    data = pd.read_csv(path, header=None)
    # Convert the label column (index 4) to integer codes 0, 1, 2
    data[4] = pd.Categorical(data[4]).codes
    # Split each row after the first 4 columns: features on the left, label on the right
    x, y = np.split(data.values, (4,), axis=1)
    x = x[:, :2]  # keep only the first two features (sepal length and width)
    y = y.ravel()
    # Standardize, expand to degree-3 polynomial features, then fit logistic regression
    lr = Pipeline([
        ('SC', StandardScaler()),
        ('poly', PolynomialFeatures(degree=3)),
        ('clf', LogisticRegression())
    ])
    lr.fit(x, y)
    y_hat = lr.predict(x)
    # Predicted class probabilities
    y_hat_prob = lr.predict_proba(x)
    acc = 100 * np.mean(y_hat == y)
    print("Accuracy: %.2f%%" % acc)
    # Plotting
    N, M = 500, 500
    x1_min, x1_max = x[:, 0].min(), x[:, 0].max()
    x2_min, x2_max = x[:, 1].min(), x[:, 1].max()
    t1 = np.linspace(x1_min, x1_max, N)
    t2 = np.linspace(x2_min, x2_max, M)
    # meshgrid repeats t1 along the rows and t2 along the columns
    x1, x2 = np.meshgrid(t1, t2)
    # Grid sample points: flatten both grids and stack them as two columns
    x_test = np.stack((x1.ravel(), x2.ravel()), axis=1)
    print(x_test)  # test points
    cm_light = mpl.colors.ListedColormap(['#77E0A0', '#FF8080', '#A0A0FF'])
    cm_dark = mpl.colors.ListedColormap(['g', 'r', 'b'])
    y_hat = lr.predict(x_test)
    y_hat = y_hat.reshape(x1.shape)
    plt.figure(facecolor='w')
    plt.pcolormesh(x1, x2, y_hat, cmap=cm_light)  # decision regions from the predictions
    plt.scatter(x[:, 0], x[:, 1], c=y, edgecolors='k', s=50, cmap=cm_dark)  # the samples
    plt.xlabel("Sepal length", fontsize=18)
    plt.ylabel("Sepal width", fontsize=18)
    plt.xlim(x1_min, x1_max)
    plt.ylim(x2_min, x2_max)
    plt.grid()
    # Legend patches, useful when the plot shows filled regions
    patchs = [mpatches.Patch(color='#77E0A0', label='Iris-setosa'),
              mpatches.Patch(color='#FF8080', label='Iris-versicolor'),
              mpatches.Patch(color='#A0A0FF', label='Iris-virginica')]
    plt.legend(handles=patchs)
    plt.title('Iris classification with logistic regression', fontsize=17)
    plt.show()