Regression Model Practice

1. Import the libraries

# Libraries used

import numpy as np
import matplotlib.pyplot as plt

2. Fix the random seed for reproducible output

import os, random

# Fix the random seeds
def set_seeds(seed):
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    # tf.random.set_seed(seed)  # uncomment when using TensorFlow

SEED = 555
set_seeds(SEED)
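
A quick way to confirm that seeding works is to reseed and draw twice, then compare the results. The sketch below is only an illustration; `draw_sample` is a made-up helper name, not part of any library. Keep in mind that a generator created with np.random.RandomState(10) carries its own seed, so it is reproducible regardless of the global seed set here.

```python
import os
import random

import numpy as np


def set_seeds(seed):
    """Seed Python's and NumPy's global random number generators."""
    os.environ['PYTHONHASHSEED'] = str(seed)
    random.seed(seed)
    np.random.seed(seed)


def draw_sample(seed, n=5):
    """Hypothetical helper: reseed, then draw n numbers from each generator."""
    set_seeds(seed)
    return [random.random() for _ in range(n)], np.random.rand(n)


py_a, np_a = draw_sample(555)
py_b, np_b = draw_sample(555)

print(py_a == py_b)                # True: same seed, same sequence
print(np.array_equal(np_a, np_b))  # True
```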

3. Create the feature matrix and target vector

r = np.random.RandomState(10)
x = 10 * r.rand(100)
y = 2 * x - 3 * r.rand(100)
plt.scatter(x,y)

print(x.shape)
print(y.shape)

(100,)
(100,)

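Reshape the feature matrix to match the arguments of fit

scikit-learn's fit expects the feature matrix X to be a 2D array of shape (n_samples, n_features). Here x has shape (100,), which is a 1D array, so passing it without reshaping raises a ValueError that contains the message below. The target vector y can stay 1D with shape (100,).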

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
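
The error is easy to reproduce. The following is a minimal sketch that rebuilds the same x and y as above, tries to fit on the 1D array, and then fits again after reshaping. The `raised` flag is just a name chosen for this illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

r = np.random.RandomState(10)
x = 10 * r.rand(100)
y = 2 * x - 3 * r.rand(100)

try:
    LinearRegression().fit(x, y)   # x has shape (100,), not (100, 1)
    raised = False
except ValueError as exc:
    raised = True
    print(type(exc).__name__)      # ValueError

# After reshaping to (n_samples, n_features) the same call succeeds
model = LinearRegression().fit(x.reshape(-1, 1), y)
print(model.coef_.shape)           # (1,): one coefficient per feature
```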

Reshape method 1

Specify the matrix size explicitly.

X = x.reshape(100,1)

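Reshape method 2

Passing -1 for the row or column argument makes NumPy infer that dimension from the total number of elements. Either method works. The second is the more general one, because it does not hard-code the number of samples.

# X_ = x.reshape(-1, 1)
# X_.shape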
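
To see that the two methods give the same result, here is a small sketch on a 6-element array (a stand-in for the 100-element x, chosen only to keep the output short). It also shows the indexing idiom x[:, np.newaxis], which is another common way to add the column axis.

```python
import numpy as np

x = np.arange(6, dtype=float)      # stand-in for the 100-element x

a = x.reshape(6, 1)                # method 1: explicit size
b = x.reshape(-1, 1)               # method 2: row count inferred from x.size
c = x[:, np.newaxis]               # equivalent indexing idiom

print(a.shape, b.shape, c.shape)   # (6, 1) (6, 1) (6, 1)
print(np.array_equal(a, b))        # True
print(x.reshape(2, -1).shape)      # (2, 3): -1 works for either axis
```

Only one dimension may be -1, and the known dimension has to divide the total number of elements; otherwise reshape raises a ValueError.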

4. Train (fit) a linear regression model

from sklearn.linear_model import LinearRegression

model = LinearRegression()
model.fit(X,y)

LinearRegression()
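
After fitting, the learned parameters are available as coef_ and intercept_. Since the data was generated as y = 2x - 3u with u uniform on [0, 1), the slope should come out close to 2 and the intercept close to -1.5 (the mean of -3u). The sketch below checks this and compares against np.polyfit, which is used here only as an independent least-squares cross-check.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

r = np.random.RandomState(10)
x = 10 * r.rand(100)
y = 2 * x - 3 * r.rand(100)

model = LinearRegression().fit(x.reshape(-1, 1), y)
slope = model.coef_[0]
intercept = model.intercept_

# Degree-1 least-squares fit with NumPy; returns [slope, intercept]
np_slope, np_intercept = np.polyfit(x, y, deg=1)

print(slope, intercept)
print(np.allclose([slope, intercept], [np_slope, np_intercept]))
```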

5. Predict on a new input matrix

x_new = np.linspace(-1, 11, 100)  # new inputs
X_new = x_new.reshape(100, 1)     # reshape into a feature matrix
y_new = model.predict(X_new)      # predict
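
For a linear model, predict is nothing more than the learned slope times the input plus the intercept. A minimal sketch that rebuilds the model and verifies this by hand (`manual` is a name chosen for this illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

r = np.random.RandomState(10)
x = 10 * r.rand(100)
y = 2 * x - 3 * r.rand(100)
model = LinearRegression().fit(x.reshape(-1, 1), y)

x_new = np.linspace(-1, 11, 100)
X_new = x_new.reshape(-1, 1)
y_new = model.predict(X_new)

# Same computation written out: X @ coef + intercept
manual = X_new @ model.coef_ + model.intercept_

print(y_new.shape)                 # (100,)
print(np.allclose(y_new, manual))  # True
```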

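6. Evaluate with RMSE

mean_squared_error returns the MSE, so take its square root to get the RMSE. The predictions also have to be made on the same inputs X that the targets y belong to. y_new was predicted on the separate grid x_new, so it cannot be compared with y.

from sklearn.metrics import mean_squared_error

y_pred = model.predict(X)                      # predictions for the training inputs
rmse = np.sqrt(mean_squared_error(y, y_pred))  # y_true, y_pred
rmse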
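
RMSE is the square root of the mean squared difference between targets and predictions made for the same inputs. The sketch below writes it out with NumPy only. The `rmse` function is a hand-rolled helper for illustration, and np.polyfit stands in for the fitted model so the block runs on its own. Because the noise term 3 * r.rand(100) has a standard deviation of 3 / sqrt(12), about 0.87, the RMSE on the training data should land in that neighborhood.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error between two equally shaped arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Tiny hand-checkable case: squared errors are 0, 0, 4, so MSE = 4/3
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # sqrt(4/3), about 1.1547

# Same data as in the post, fitted with a degree-1 least-squares line
r = np.random.RandomState(10)
x = 10 * r.rand(100)
y = 2 * x - 3 * r.rand(100)
slope, intercept = np.polyfit(x, y, deg=1)
train_rmse = rmse(y, slope * x + intercept)
print(train_rmse)
```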

7. Visualize the result (scatter)

plt.scatter(x, y, label='input data')
plt.plot(X_new, y_new, color='red', label='regression line')
plt.legend()  # display the labels defined above
plt.show()

[Scikit-learn] Linear Regression