TaeTae Dev Log - Machine Learning Basics

태태코 2025. 4. 10. 17:25

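Linear Regression

Models the linear relationship between an independent variable and a dependent variable. It fits a straight line of the form y = mx + b to the data and uses that line to make predictions.

e.g. house price prediction, sales forecasting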

import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt

# Prepare the data
data = {'area': [1500, 2000, 2500], 'price': [300000, 400000, 500000]}
df = pd.DataFrame(data)

# Train the model
model = LinearRegression()
model.fit(df[['area']], df['price'])

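# Predict and visualize
# Pass a DataFrame with the same column name used in fit() so scikit-learn does not warn about missing feature names
predicted_price = model.predict(pd.DataFrame({'area': [3300]}))  # predicted price for an area of 3300
print(f"Predicted price: {predicted_price[0]}")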

plt.scatter(df['area'], df['price'], color='red', marker='+')
plt.plot(df['area'], model.predict(df[['area']]), color='blue')
plt.xlabel('Area')
plt.ylabel('Price')
plt.show()
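
To see where m and b come from, here is a minimal sketch that computes them by hand with the closed-form least-squares formulas, using the same toy data. The helper name fit_line is made up for this note and is not part of scikit-learn.

```python
# Closed-form least squares for a single feature (same toy data as above).
areas = [1500, 2000, 2500]
prices = [300000, 400000, 500000]

def fit_line(xs, ys):
    """Return (m, b) that minimize the squared error of y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # the fitted line always passes through (mean_x, mean_y)
    b = mean_y - m * mean_x
    return m, b

m, b = fit_line(areas, prices)
print(f"slope m = {m}, intercept b = {b}")
print(f"Prediction for area 3300: {m * 3300 + b}")
```

On this data the points lie exactly on a line, so the slope comes out to 200 and the intercept to 0, giving 660000 for an area of 3300.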

Multiple Linear Regression

Uses several independent variables to predict a dependent variable. y = b0 + b1x1 + b2x2 + ... + bnxn

e.g. predicting house prices from several factors

import pandas as pd
from sklearn.linear_model import LinearRegression

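# Prepare the data
# Note: in this toy data bedrooms rises in lockstep with area, so the individual coefficients are not uniquely determined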
data = {'area': [1500, 2000, 2500], 'bedrooms': [3, 4, 5], 'price': [300000, 400000, 500000]}
df = pd.DataFrame(data)

# Separate the independent variables (X) from the dependent variable (y)
X = df[['area', 'bedrooms']]
y = df['price']

# Train the model
model = LinearRegression()
model.fit(X, y)

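# Predict
# Use the same column names as in fit() to avoid the feature-name warning
predicted_price = model.predict(pd.DataFrame({'area': [3300], 'bedrooms': [4]}))  # predicted price for area 3300 with 4 bedrooms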
print(f"Predicted price: {predicted_price[0]}")
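
The formula y = b0 + b1x1 + b2x2 can also be solved directly as a least-squares problem. The sketch below is only an illustration: the data is hypothetical (generated from coefficients I picked, 50000, 150 and 10000) so the recovered values can be checked, and it uses NumPy's np.linalg.lstsq rather than scikit-learn.

```python
import numpy as np

# Hypothetical data: prices are generated from known coefficients
# (b0=50000, b1=150, b2=10000) so the fit can be verified.
area = np.array([1500.0, 2000.0, 2500.0, 1800.0])
bedrooms = np.array([3.0, 3.0, 4.0, 2.0])
price = 50000 + 150 * area + 10000 * bedrooms

# Design matrix: a column of ones for the intercept b0,
# then one column per independent variable.
A = np.column_stack([np.ones_like(area), area, bedrooms])

# Least-squares solution of A @ coeffs ~= price
coeffs, *_ = np.linalg.lstsq(A, price, rcond=None)
b0, b1, b2 = coeffs
print(f"b0={b0:.2f}, b1={b1:.2f}, b2={b2:.2f}")

# Predict for area 3300 with 4 bedrooms (leading 1.0 multiplies b0)
new_house = np.array([1.0, 3300.0, 4.0])
predicted = new_house @ coeffs
print(f"Predicted price: {predicted:.2f}")
```

Unlike the toy data above, area and bedrooms here do not move in lockstep, so each coefficient has a single well-defined value.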

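Polynomial Regression

To learn nonlinear data, the independent variable is expanded to higher powers and the relationship is modeled as a polynomial. y = b0 + b1x + b2x^2 + ...

e.g. modeling data that follows a curve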

from sklearn.preprocessing import PolynomialFeatures

# Prepare the data
X = np.array([1, 2, 3, 4]).reshape(-1, 1)
y = np.array([1.5, 3.5, 7.5, 13.5])

# Generate polynomial features (degree 2)
poly_features = PolynomialFeatures(degree=2)
X_poly = poly_features.fit_transform(X)

# Train the model
model = LinearRegression()
model.fit(X_poly, y)

# Predict and visualize
predicted_y = model.predict(X_poly)
print(f"Predicted values: {predicted_y}")

plt.scatter(X, y, color='red')
plt.plot(X, predicted_y, color='blue')
plt.xlabel('X')
plt.ylabel('y')
plt.show()
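
Polynomial regression is still linear regression underneath, just on expanded columns. As a rough sketch, the degree-2 expansion for a single feature can be built by hand as the columns 1, x, x^2 (which is what PolynomialFeatures(degree=2) produces for one input column) and then fitted with plain least squares. NumPy's lstsq is used here instead of LinearRegression to keep the example small.

```python
import numpy as np

# Same data as above
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.5, 3.5, 7.5, 13.5])

# Degree-2 expansion by hand: columns are 1, x, x^2
X_poly = np.column_stack([np.ones_like(x), x, x ** 2])

# Ordinary least squares on the expanded columns
coeffs, *_ = np.linalg.lstsq(X_poly, y, rcond=None)
b0, b1, b2 = coeffs
print(f"y = {b0:.3f} + {b1:.3f}*x + {b2:.3f}*x^2")

# Fitted values at the training points
fitted = X_poly @ coeffs
```

This data happens to lie exactly on y = 1.5 - x + x^2, so the fitted curve passes through all four points.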

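Logistic Regression

A supervised learning algorithm used to classify data into categories. It passes a linear combination of the inputs through the sigmoid function to turn the output into a probability.

e.g. binary classification, multiclass classification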
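
A minimal sketch of the sigmoid step, in plain Python. The weight w and bias b below are made-up values for illustration, not learned parameters, and the 0.5 threshold is the usual default for binary classification.

```python
import math

def sigmoid(z):
    """Map any real number to the range (0, 1)."""
    # Two branches so math.exp never receives a large positive argument
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    """Probability of class 1 for a single feature x."""
    return sigmoid(w * x + b)

# Hypothetical parameters, chosen only for the demo
w, b = 2.0, -1.0

for x in [-2.0, 0.5, 3.0]:
    p = predict_proba(x, w, b)
    label = 1 if p >= 0.5 else 0  # threshold the probability at 0.5
    print(f"x={x}: p={p:.3f} -> class {label}")
```

At x = 0.5 the linear part w*x + b is 0, so the probability is exactly 0.5, which is the decision boundary.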

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

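# Generate the data
# With only 2 features, n_informative and n_redundant must be set explicitly;
# the defaults (2 informative + 2 redundant) exceed n_features and raise a ValueError
X, y = make_classification(n_samples=100, n_features=2, n_informative=2, n_redundant=0, n_classes=2, random_state=42)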

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Train and evaluate the model
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy}")
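
For reference, accuracy here is just the fraction of predictions that match the true labels. A small hand-rolled sketch (the function name accuracy and the demo labels are made up for this note):

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction equals the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Hypothetical labels: 4 of the 5 predictions are correct
y_true_demo = [0, 1, 1, 0, 1]
y_pred_demo = [0, 1, 0, 0, 1]
print(f"Accuracy: {accuracy(y_true_demo, y_pred_demo)}")  # 0.8
```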