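Reposted from: y小川
SettingWithCopyWarning: a solution
Scenario: after reading a CSV file, I needed to add a new feature column and set its values based on existing features. When I modified the new column I ran into SettingWithCopyWarning, and it took me a long time to resolve.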
A simplified example:

import pandas as pd
import numpy as np

# Build a one-column DataFrame from a small array
aa = np.array([1, 0, 1, 0])
bb = pd.DataFrame(aa.T, columns=['one'])
print(bb)
Output:

   one
0    1
1    0
2    1
3    0
Add a new column and print again:

bb['two'] = 0
print(bb)

output[]:
   one  two
0    1    0
1    0    0
2    1    0
3    0    0
Modifying the new column by condition and printing again triggers the warning:

for i in range(bb.shape[0]):
    if bb['one'][i] == 0:
        bb['two'][i] = 1
print(bb)

output[]:
C:/PycharmProjects/NaiveBayesProduct/pandas/try_index.py:22: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  bb['two'][i] = 1
   one  two
0    1    0
1    0    1
2    1    0
3    0    1
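The documentation page linked in the warning describes this pattern as chained indexing: `bb['two'][i] = 1` first evaluates `bb['two']` and then assigns into that intermediate result, so pandas cannot guarantee that the write reaches `bb`. A minimal sketch of the single-call form that page recommends, passing the row label and the column name to `.loc` together. It reuses the variable names from the example above and is only an illustration, not the fix the author ended up using:

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([1, 0, 1, 0]), columns=['one'])
bb['two'] = 0

# One indexing call: row label and column name go to .loc together,
# so the assignment targets bb itself rather than an intermediate object.
for i in range(bb.shape[0]):
    if bb.loc[i, 'one'] == 0:
        bb.loc[i, 'two'] = 1

print(bb)
```

This relies on the default integer index, where the loop counter `i` is also the row label; with a different index, `bb.index[i]` would be needed as the label.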
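How can this be fixed? I read many Stack Overflow posts and tried loc/iloc and similar approaches, but none of them helped. In the end I found that the order of operations was the problem: what worked was to build the complete array first and only then insert it into the DataFrame. Below, the example above is redone that way.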
import pandas as pd
import numpy as np

aa = np.array([1, 0, 1, 0])
bb = pd.DataFrame(aa.T, columns=['one'])

# Create an ndarray to hold the values to insert
two = np.zeros(bb.shape[0])

# Modify two by condition
for i in range(bb.shape[0]):
    if bb['one'][i] == 0:
        two[i] = 1

# Once it is complete, insert two into the DataFrame
bb.insert(1, 'two', two)
print(bb)

output[]:
   one  two
0    1  0.0
1    0  1.0
2    1  0.0
3    0  1.0
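The loop that fills `two` can also be written without explicit iteration. A minimal sketch, assuming the same `bb` as above, that builds the array with `np.where` and then inserts it in the same build-first order. The integer values are a choice made in this sketch, whereas the `np.zeros` version above produces floats:

```python
import numpy as np
import pandas as pd

bb = pd.DataFrame(np.array([1, 0, 1, 0]), columns=['one'])

# Build the whole column in one step: 1 where 'one' is 0, otherwise 0.
two = np.where(bb['one'] == 0, 1, 0)

# Insert the finished array as the second column (position 1).
bb.insert(1, 'two', two)
print(bb)
```

Because the array is complete before it touches the DataFrame, no element-wise assignment into `bb` takes place.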