

ML&DL_sklearn Study (2). Training and Evaluation with the Iris Data

๋ฐ์ดํ„ฐ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ.

from sklearn.datasets import load_iris
data = load_iris()
  • Here, load_iris returns a dictionary-like object (a Bunch) containing the data and a description of it.
  • This dictionary can be inspected with a few lines of code.
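A minimal sketch of inspecting the returned object (the exact set of keys may vary slightly across scikit-learn versions):

```python
from sklearn.datasets import load_iris

data = load_iris()
# load_iris() returns a Bunch, a dictionary-like container.
print(data.keys())          # e.g. 'data', 'target', 'target_names', 'feature_names', 'DESCR', ...
print(data['data'].shape)   # (150, 4): 150 samples, 4 features
print(data['target'][:5])   # [0 0 0 0 0]
```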

Preprocessing and EDA

import numpy as np

np.unique(data.target, return_counts = True)
# Returns the unique values; with return_counts = True it also returns how many times each value occurs.

print(data.target_names) #['setosa' 'versicolor' 'virginica']

print(data.target_names.shape, data.target_names[data.target].shape) # (3,) (150,)
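The column-renaming snippet below operates on a pandas DataFrame named `iris`, which the post does not show being created; a plausible construction (an assumption, not shown in the original) is:

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
# Wrap the feature matrix in a DataFrame; the columns are renamed afterwards.
iris = pd.DataFrame(data.data, columns=data.feature_names)
print(iris.shape)  # (150, 4)
```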

# ์—ด์ด๋ฆ„, ํƒ€๊ฒŸ๊ฐ’ ์ˆ˜์ •
iris.columns = ['sl', 'sw', 'pl', 'pw']
iris['Species'] = data.target_names[data.target]
# => data.target_names๋ฅผ data.target์˜ ๊ฐœ์ˆ˜ ์ฆ‰ 150๋ฒˆ๋งŒํผ ์ƒ‰์ธํ•ด์ค€๋‹ค๋Š” ๊ฒƒ์„ ์˜๋ฏธ. ํƒ€๊ฒŸ๊ฐ’์„ ์ˆ˜์ •.
iris.head()

Check for missing values.

  • ๊ฒฐ์ธก๊ฐ’์ด ์žˆ๋‹ค๋ฉด ์ฒ˜๋ฆฌ๋ฅผ ํ•ด์ฃผ์–ด์•ผ ํ•จ.
  • ๊ฒฐ์ธก๊ฐ’ ์ œ๊ฑฐ: dropna
  • ๊ฒฐ์ธก๊ฐ’ ๋Œ€์ฒด: fillna, interpolate
iris.isna().sum() # ๊ฒฐ์ธก๊ฐ’ ๋ช‡๊ฐœ๋ƒ?
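A small illustration of the three options above on a toy Series (hypothetical data, not the iris set):

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])
print(s.dropna().tolist())        # [1.0, 3.0]       -> remove missing values
print(s.fillna(0).tolist())       # [1.0, 0.0, 3.0]  -> replace with a constant
print(s.interpolate().tolist())   # [1.0, 2.0, 3.0]  -> linear interpolation
```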

Basic Statistical Analysis

  • iris.describe() : summary statistics per numeric column.
  • iris.info() : basic information about the DataFrame.
  • iris.groupby('Species').size() : sample counts per species.
  • iris.Species.value_counts() : sample counts per species.
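Assuming `iris` has been built with the renamed columns and the `Species` column as above, the class counts come out balanced:

```python
import pandas as pd
from sklearn.datasets import load_iris

data = load_iris()
iris = pd.DataFrame(data.data, columns=['sl', 'sw', 'pl', 'pw'])
iris['Species'] = data.target_names[data.target]

print(iris.groupby('Species').size())   # 50 samples for each of the 3 species
print(iris.Species.value_counts())      # same counts, ordered by frequency
```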

๋ฐ์ดํ„ฐ ์‹œ๊ฐํ™”

Basic statistics and outlier visualization (boxplot)

    def boxplot_iris(feature_names, dataset):
      i = 1
      plt.figure(figsize = (11, 9))
      for col in feature_names:
          plt.subplot(2, 2, i)
          plt.axis('on')
          plt.tick_params(axis = 'both', left = True, top = False,
                         right = False, bottom = True, labelleft = False,
                         labeltop = False, labelright = False, labelbottom = False)
          dataset[col].plot(kind = 'box', subplots = True, sharex = False, sharey = False)
          plt.title(col)
          i += 1
      plt.show()

    boxplot_iris(iris.columns[:-1], iris)

If you want to compare relative positions across plots, set sharex and sharey to True.

  fig, axes = plt.subplots(2, 2, figsize = (11, 9), sharex = True, sharey = True)
  axes = axes.ravel() # np.ravel() flattens into a 1-D array.
  for i, ax in enumerate(axes):
      iris.iloc[:, i].plot(kind = 'box', ax = ax)
      ax.set_title(iris.columns[i])
  plt.show()

๋ฐ์ดํ„ฐ ๋ถ„ํฌ ์‹œ๊ฐํ™”(histogram)

  def histogram_iris(feature_names, dataset):
      i = 1
      plt.figure(figsize = (11, 9))
      for col in feature_names:
          plt.subplot(2, 2, i)
          plt.axis('on')
          plt.tick_params(axis = 'both', left = True, top = False,
                         right = False, bottom = False, labelleft = False,
                         labeltop = False, labelright = False, labelbottom = False)
          dataset[col].plot(kind = 'hist', subplots = True, sharex = False, sharey = False)
          plt.title(col)
          i += 1
      plt.show()
  histogram_iris(iris.columns[:-1], iris)

์ƒ๊ด€๊ด€๊ณ„ ์‹œ๊ฐํ™”(heatmap)

  • Creates the correlation matrix.

        corr = iris.corr(numeric_only = True) # exclude the string-valued Species column
        cmap = sns.diverging_palette(220, 19, as_cmap = True)
        plt.figure(figsize=(11,9))
        sns.heatmap(corr, cmap=cmap, vmax=1.0, vmin=-1.0, center=0,
                   square=True, linewidths=.5, cbar_kws={'shrink':.5})
        plt.show()

Correlation between features and data distribution (pairplot)

  sns.pairplot(iris, hue = 'Species')
  plt.show()

ํƒ€๊ฒŸ์˜ ํด๋ž˜์Šค ๋น„์œจ(piechart)

  • ํƒ€๊ฒŸ์˜ ํด๋ž˜์Šค ๋น„์œจ์„ ์‹œ๊ฐํ™”ํ•ด์„œ ์‚ดํŽด๋ณธ๋‹ค.

        def piechar_iris(feature_names, target, dataset):
            i = 1
            plt.figure(figsize = (11, 9))
            for colName in [target]:
                labels = []; sizes = []
                df = dataset.groupby(colName).size()
    
                for key in df.keys():
                    labels.append(key)
                    sizes.append(df[key])
    
                plt.subplot(2, 2, i)
                plt.axis('on')
                plt.tick_params(axis = 'both', left = False, top = False,
                               right = False, bottom = False, labelleft = True,
                               labeltop = True, labelright = False, labelbottom = False)
                plt.pie(sizes, labels = labels, autopct = '%1.1f%%',
                       shadow = True, startangle = 140)
                plt.axis('equal')
                i += 1
    
                #plt.axis('equal') makes the pie a circle.
            plt.show()
    
        piechar_iris(iris.columns[:-1], iris.Species, iris)

Hold out

```python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(iris.iloc[:, :-1], iris.iloc[:, -1], test_size = 0.33, random_state = 42)
```
  • sklearn's train_test_split function splits the data into a training set and an evaluation (test) set.

Training

  • train : used to learn the model's parameters

  • validation : used to tune and compare hyperparameters

  • test : used to evaluate the model trained with the best parameters
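One way to realize this three-way split is to call train_test_split twice; the split sizes below are illustrative assumptions, not taken from the original:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
# First hold out the test set, then carve a validation set out of the remainder.
X_trainval, X_test, y_trainval, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_trainval, y_trainval, test_size=0.25, random_state=42)
print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```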

๋ชจ๋ธ์ƒ์„ฑ๐Ÿ™Š

  • Algorithm selection: this example uses a **Decision Tree Classifier**.

        from sklearn.tree import DecisionTreeClassifier
        model = DecisionTreeClassifier(random_state = 42)

  • Model training:

        model.fit(X_train, y_train)

  • Score:

        model.score(X_test, y_test) # 0.98; the iris data is so well-behaved that almost any training yields a high score.

๋ชจ๋ธ ์ผ๋ฐ˜ํ™” ์ „๋žต & Learning curve๐Ÿ™Š

In machine learning, model performance is determined by the data, so a sufficient amount of data is needed. When data is scarce, the model can overfit, performing well only on the specific training data, so effort must go into generalizing the model.
  • validation set
    🔎 Hold out part of the training set and use it as a validation set.

  • cross validation
    🔎 Split the dataset into k folds and rotate through them, using each fold as the validation set in turn. Then average the k performance results to estimate the model's performance.

    • KFold

            from sklearn.model_selection import cross_val_score, KFold
      
            cv = KFold(n_splits = 10, shuffle = True, random_state = 42)
            results = cross_val_score(model, X_train, y_train, cv = cv)
            fin_result = results.mean()
      
            for i, r in enumerate(results):
                print(f"Fold {i} cross-validation accuracy: {r}")

            print(f"\nFinal cross-validation accuracy: {fin_result}")
    • Stratified KFold
      : each validation fold keeps the three flower species evenly mixed before performance is evaluated.

          from sklearn.model_selection import cross_val_score, StratifiedKFold
      
          cv = StratifiedKFold(n_splits = 10, shuffle = True, random_state = 42)
          results = cross_val_score(model, X_train, y_train, cv = cv)
          fin_result = results.mean()
      
          for i, r in enumerate(results):
              print(f"Fold {i} cross-validation accuracy: {r}")

          print(f"\nFinal cross-validation accuracy: {fin_result}")

  • Learning curve
    Cross-validation is the technique to use when data is limited; whether the amount of data is actually sufficient can be judged by plotting a learning curve.

      import scikitplot as skplt
      skplt.estimators.plot_learning_curve(model, X_train, y_train, figsize = (6,6))
      plt.show()

๋ชจ๋ธ ์ตœ์ ํ™” ์ „๋žต๐Ÿ™Š

-   Hyperparameters: in scikit-learn, hyperparameters can be set when instantiating an algorithm.
-   Hyperparameter search with **GridSearchCV**  
    🔎 Lays out every possible combination of hyperparameters on a grid and trains and scores each combination one by one.  
    🔎 Pass an algorithm instance as the estimator, and a **dictionary** of the hyperparameters to test as param_grid.
from sklearn.model_selection import GridSearchCV
estimator = DecisionTreeClassifier()
params= {'max_depth':range(4, 13, 2),
        'criterion':['gini', 'entropy'],
        'splitter':['best', 'random'],
        'min_weight_fraction_leaf':[0.0, 0.1, 0.2, 0.3],
        'random_state':[7, 23, 42, 78, 142],
        'min_impurity_decrease':[0., 0.05, 0.1, 0.2]}
model = GridSearchCV(estimator, params, cv=cv, verbose=1,
                    n_jobs = -1, refit=True) # 10 validation folds * 1600 hyperparameter combinations = 16000 fits
                    # refit : refits on the best parameters and stores the result in the best_estimator_ attribute.
model.fit(X_train, y_train)

print(f"Best Estimator: {model.best_estimator_}\n")
print(f"Best Params: {model.best_params_}\n")
print(f"Best Scorer: {model.best_score_}\n")

Evaluation Metrics and Model Evaluation

์ •ํ™•๋„๋งŒ์œผ๋กœ ๋ชจ๋ธ์„ ํ‰๊ฐ€/๊ฒ€์ฆํ•˜๋Š” ๊ฒƒ์€ ๋ถ€์กฑํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋‹ค๋ฅธ ํ‰๊ฐ€์ง€ํ‘œ๋“ค๋„ ์•Œ ํ•„์š”๊ฐ€ ์žˆ๋‹ค.

Confusion Matrix

  • Binary classification: when predictions fall into two classes, there are 4 possible outcomes in total.

  • Multiclass classification: for the iris data, the flower's features are used to predict whether it is setosa, versicolor, or virginica. Building a matrix from these predictions gives the result below.

  • Confusion matrix code for the iris dataset

      from sklearn.metrics import confusion_matrix
    
      pred = model.predict(X_test)
      confMatrix = confusion_matrix(y_test, pred)
      print('Confusion Matrix:\n', confMatrix)

With scikit-plot, the confusion matrix can be visualized as a more intuitive heatmap.

    skplt.metrics.plot_confusion_matrix(y_test, pred, figsize=(8,6))
    plt.show()

Precision, Recall, Fall-out, F-score

  • FP, FN, TP, TN : the first letter marks whether the prediction was right (T) or wrong (F); the second marks whether P or N was predicted.
  • precision
    • Among the samples predicted as a class, the fraction that actually belong to it (an expensive, highly reliable diagnostic kit); e.g., of those predicted positive, the fraction that are actually positive.
    • TP/(TP+FP)
  • recall (TPR)
    • Among the actual members of the target class, the fraction predicted correctly; e.g., of all actual positives, the fraction predicted positive (an efficient, inexpensive diagnostic kit).
    • TP/(TP+FN)
  • fall-out
    • Among the samples that are actually not the target class, the fraction wrongly predicted as the target: FP/(FP+TN).
  • f-score
    • Precision and recall are in a trade-off relationship.
    • The weighted harmonic mean of precision and recall.
      precs = []
      rcls = []
      specs = []
      for i in range(3):
          TP = confMatrix[i, i]
          FN = confMatrix[i].sum() - TP
          FP = confMatrix[:,i].sum() - TP
          TN = confMatrix.sum() - TP - FN - FP
          precs.append(TP/(TP+FP))
          rcls.append(TP/(TP+FN))
          specs.append(TN/(TN+FP))

      precs = np.array(precs)
      rcls = np.array(rcls)
      specs = np.array(specs)
      fall_outs = 1 - specs

      print(precs)
      print(rcls)
      print(specs)
      print(fall_outs)
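The loop above stops at precision, recall, and specificity; the f-score can be computed as the harmonic mean of precision and recall, or taken directly from sklearn.metrics. A sketch, rebuilding the earlier model pipeline for self-containment:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import precision_score, recall_score, f1_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
pred = DecisionTreeClassifier(random_state=42).fit(X_train, y_train).predict(X_test)

prec = precision_score(y_test, pred, average=None)  # per-class precision
rcl = recall_score(y_test, pred, average=None)      # per-class recall
f1 = f1_score(y_test, pred, average=None)           # harmonic mean of the two
# F1 = 2 * precision * recall / (precision + recall)
assert np.allclose(f1, 2 * prec * rcl / (prec + rcl))
print(f1)
```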

ROC curve์™€ AUC

  • ROC curve
    • A plot with TPR (True Positive Rate) on the y-axis and FPR (False Positive Rate) on the x-axis.
    • TPR is the same as recall, and FPR is the same as fall-out, i.e. FP/(FP+TN), where FP+TN is the number of samples that are actually negative.
    • It visualizes how recall and fall-out change as the class decision threshold varies.

    pred_proba = model.predict_proba(X_test) # the ROC plot needs class probabilities
    skplt.metrics.plot_roc(y_test, pred_proba, figsize = (8,6))
    plt.show()
  • AUC
    • Computes the area under the ROC curve; the closer the curve hugs the top-left corner, the better the performance.
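For a multiclass problem like iris, roc_auc_score can compute this area directly from predicted probabilities (a sketch; one-vs-rest averaging is an assumed choice, not from the original):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
model = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
pred_proba = model.predict_proba(X_test)   # class probabilities, shape (n_samples, 3)
auc = roc_auc_score(y_test, pred_proba, multi_class='ovr')
print(auc)
```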

์ตœ์ข… ๋ชจ๋ธ

์ตœ์ข…๋ชจ๋ธํ•™์Šต

    model.fit(iris.iloc[:, :-1], iris.iloc[:, -1])

๋ชจ๋ธ์ €์žฅ

๋ชจ๋ธ์„ ์žฌ์‚ฌ์šฉ ํ•˜๊ธฐ ์œ„ํ•ด์„œ ๋ชจ๋ธ์„ ์ €์žฅํ•ด๋‘์ž. pickle์„ ์ด์šฉํ•˜๋ฉด ๋ชจ๋ธ ๊ฐ์ฒด ์ž์ฒด๋ฅผ ์ €์žฅํ•  ์ˆ˜ ์žˆ๋‹ค.

    import pickle
    with open("final_model.pickle", "wb") as fp:
        pickle.dump(model, fp)

๋ชจ๋ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ ๋ฐ ์˜ˆ์ธก

๋ชจ๋ธ์„ ๋‹ค์‹œ ๋ถˆ๋Ÿฌ์˜ฌ ๋•Œ๋Š” load ๋ฉ”์„œ๋“œ๋ฅผ ์ด์šฉํ•˜๊ณ  ์ €์žฅํ•œ ๋ชจ๋ธ์„ ๋ถˆ๋Ÿฌ์™€์„œ ์˜ˆ์ธก๊ฐ’์„ csv๋กœ ์ €์žฅํ•ด๋ณด์ž.

    f = open('final_model.pickle', 'rb') # the file saved above
    model = pickle.load(f); f.close()

    predicted_species = model.predict(iris.iloc[:, :-1])
    iris['predicted_species'] = predicted_species # add a new column with the predictions
    iris.to_csv('Final.csv', index = False)