錯誤信息:
java.io.IOException: java.sql.BatchUpdateException: Incorrect string value: '\xD6\xD0\xB9\xFA\xB9\xA4...' for column 'content' at row 1
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:340)
at org.apache.gora.sql.store.SqlStore.close(SqlStore.java:185)
at org.apache.gora.mapreduce.GoraRecordWriter.close(GoraRecordWriter.java:55)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.close(ReduceTask.java:579)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:650)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:417)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:260)
Caused by: java.sql.BatchUpdateException: Incorrect string value: '\xD6\xD0\xB9\xFA\xB9\xA4...' for column 'content' at row 1
at com.mysql.jdbc.PreparedStatement.executeBatchSerially(PreparedStatement.java:1666)
at com.mysql.jdbc.PreparedStatement.executeBatch(PreparedStatement.java:1082)
at org.apache.gora.sql.store.SqlStore.flush(SqlStore.java:328)
解決方法:
在nutch2.1 中配置 <property> <name>encodingdetector.charset.min.confidence</name> <value>1</value> <description>A integer between 0-100 indicating minimum confidence value for charset auto-detection. Any negative value disables auto-detection. </description> </property> 並確保mysql數據庫編碼爲UTF-8