Machine Learning
When the actual-value and prediction-value labels differ in a tf confusion matrix
jinmc
2023. 8. 18. 14:27
While testing an image classification model, I needed to write a test script. I was building a confusion matrix and wanted the actual-value labels and the prediction-value labels to be different, so I looked into how to do it.
Of course, there are cases where the predicted and actual label sets are the same, and then everything lines up on its own, but in some cases the actual and predicted label sets will differ. Let's take a look at this.
First, let's look at the code for the case where they are the same.
import numpy as np
import tensorflow as tf
import pandas as pd

def create_confusion_matrix(y_true, y_pred, normalize=True):
    """
    Create a confusion matrix using TensorFlow.

    Parameters:
    - y_true: True labels
    - y_pred: Predicted labels
    - normalize: Whether to normalize the matrix values to [0, 1]

    Returns:
    - Confusion matrix as a pandas DataFrame
    """
    # Convert labels to tensors
    y_true = tf.convert_to_tensor(y_true)
    y_pred = tf.convert_to_tensor(y_pred)

    # Use TensorFlow to compute the confusion matrix
    confusion_matrix = tf.math.confusion_matrix(y_true, y_pred).numpy()

    # Normalize the confusion matrix if required
    if normalize:
        confusion_matrix = confusion_matrix.astype('float') / confusion_matrix.sum(axis=1)[:, np.newaxis]

    # Convert the matrix to a pandas DataFrame for better display
    df_cm = pd.DataFrame(confusion_matrix,
                         index=[f"Actual {i}" for i in range(confusion_matrix.shape[0])],
                         columns=[f"Predicted {i}" for i in range(confusion_matrix.shape[1])])
    return df_cm

# Example usage:
y_true = [1, 0, 1, 2, 2, 0, 1]
y_pred = [1, 0, 1, 2, 1, 0, 1]
df_cm = create_confusion_matrix(y_true, y_pred)
print(df_cm)
          Predicted 0  Predicted 1  Predicted 2
Actual 0          1.0          0.0          0.0
Actual 1          0.0          1.0          0.0
Actual 2          0.0          0.5          0.5
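Before looking at the fix, it helps to see why the simple version above breaks when the label sets differ. tf.math.confusion_matrix always returns a square matrix, sized by the largest label it sees in either argument, so if the predictions use more label indices than the ground truth, you get extra all-zero rows that no longer line up with your actual-label list. A minimal check (just printing shapes, using the same label arrays as the example further below):

import tensorflow as tf

y_true = [0, 1, 2, 2, 0, 2, 2, 1, 2, 0, 2, 1]  # actual side only uses labels 0-2
y_pred = [0, 1, 2, 3, 0, 2, 3, 1, 0, 2, 3, 3]  # predicted side uses labels 0-3

con_mat = tf.math.confusion_matrix(y_true, y_pred).numpy()
print(con_mat.shape)  # (4, 4) -- square, sized by the largest label seen
print(con_mat[3])     # [0 0 0 0] -- all-zero row for actual label 3, which never occurs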
For the case where the actual and predicted label sets differ, you can do it as follows.
import numpy as np
import tensorflow as tf
import pandas as pd

def create_confusion_matrix(y_true, y_pred, actual_labels, predicted_labels):
    # Generate the confusion matrix
    con_mat = tf.math.confusion_matrix(labels=y_true, predictions=y_pred).numpy()

    # Identify the unique labels in the ground truth and the predictions
    unique_gt_labels = np.unique(y_true)
    print(unique_gt_labels)
    unique_pred_labels = np.unique(y_pred)

    # Slice the confusion matrix so its shape matches the label lists
    # Fix: an edge case was found where slicing rows alone works when there are
    # fewer actual-value labels, but fails when there are fewer prediction-value
    # labels, so both directions are handled here
    if len(unique_pred_labels) < len(unique_gt_labels):
        con_mat = con_mat[:, unique_pred_labels]
    elif len(unique_gt_labels) < len(unique_pred_labels):
        con_mat = con_mat[unique_gt_labels, :]
    # con_mat = con_mat[unique_gt_labels]
    print(con_mat)

    # Normalize each row of the confusion matrix
    con_mat_norm = np.around(con_mat.astype('float') / con_mat.sum(axis=1)[:, np.newaxis], decimals=2)
    print(con_mat_norm, "con_mat_norm")

    # Now, use the label lists for the index and columns of the DataFrame
    con_mat_df = pd.DataFrame(con_mat_norm,
                              index=actual_labels,
                              columns=predicted_labels)
    return con_mat_df

# Example usage:
y_pred = [0, 1, 2, 3, 0, 2, 3, 1, 0, 2, 3, 3]
y_true = [0, 1, 2, 2, 0, 2, 2, 1, 2, 0, 2, 1]
predicted_labels = ['poodle', 'tiger', 'man', 'woman']
actual_labels = ['dog', 'cat', 'person']
df_cm = create_confusion_matrix(y_true, y_pred, actual_labels, predicted_labels)
print(df_cm)
Pass the label lists to pd.DataFrame and slice the confusion matrix (con_mat = con_mat[unique_gt_labels, :] or con_mat = con_mat[:, unique_pred_labels]) so its dimensions match the label lists; the sketch after the output below exercises the other branch.
        poodle  tiger   man  woman
dog       0.67   0.00  0.33   0.00
cat       0.00   0.67  0.00   0.33
person    0.17   0.00  0.33   0.50
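For completeness, here is the other direction of the edge case mentioned in the code comment: when the predictions use fewer unique labels than the ground truth, the column-slicing branch runs instead. A minimal sketch using the function above, with made-up label lists of my own (not from the original data):

y_true = [0, 1, 2, 0, 1, 2, 2]
y_pred = [0, 1, 1, 0, 1, 0, 1]  # predictions only ever use labels 0 and 1
predicted_labels = ['dog', 'cat']
actual_labels = ['dog', 'cat', 'person']
df_cm = create_confusion_matrix(y_true, y_pred, actual_labels, predicted_labels)
print(df_cm)

which should print something like:

         dog   cat
dog     1.00  0.00
cat     0.00  1.00
person  0.33  0.67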