Salesforce Data Cleansing

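After a new system goes live, historical data has to be imported. The legacy data, however, often differs widely from what the new system expects: different formats, missing values, erroneous values, outliers, and attributes that are categorized differently. We therefore need a set of dynamic data-cleansing rules in Salesforce, so that imported data is cleaned automatically according to those rules. The rules can be configured by the business function itself, without involving IT.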

 

The following example shows in detail how to process data in Salesforce. Data cleansing is divided into five steps:

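1. Create two related data objects and one object that holds the back-end cleansing settings.
2. Build a page for importing data from a CSV file.
3. Define the range and attributes of each field; a value that is wrong is automatically reassigned or adjusted to the nearest valid value.
4. Clean and merge the data.
5. Export the erroneous data to Excel.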
 
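Step 1: create two related objects, Recruit and Recruit Department, and one more object that holds the cleansing rules. After data is imported, we read the configured rules and clean the imported data with them.

Step 2: only one cleansing rule may be active at a time. We therefore add a trigger on Data_Washing_Setting__c that runs whenever a rule is inserted or updated, to guarantee that the active rule is unique.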
 
trigger IsActiveChecking on Data_Washing_Setting__c (before insert, before update) {

    // IDs of the records being saved. On insert this stays empty,
    // because new records do not have an Id yet.
    Set<Id> savedIds = new Set<Id>();
    // Number of records in this save operation that are flagged as active.
    Integer itemNum = 0;

    for (Data_Washing_Setting__c dws : Trigger.new) {
        if (dws.Id != null) {
            savedIds.add(dws.Id);
        }
        if (dws.Active_this_Rule__c == true) {
            itemNum++;
        }
    }

    // Add the active rules already stored, leaving out the records that are
    // being updated right now (their new values were counted above).
    itemNum += [SELECT COUNT()
                FROM Data_Washing_Setting__c
                WHERE Active_this_Rule__c = true
                AND Id NOT IN :savedIds];

    if (itemNum > 1) {
        // Reject only the records that ask to be active.
        for (Data_Washing_Setting__c dws : Trigger.new) {
            if (dws.Active_this_Rule__c == true) {
                dws.addError('Only one record can be active. Please check your existing data and try again.');
            }
        }
    }
}
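
The uniqueness rule can be exercised with a small Apex test. This is a minimal sketch and not part of the original setup: the class name `IsActiveCheckingTest` is hypothetical, and it assumes `Data_Washing_Setting__c` has no required fields other than the checkbox used here.

```apex
@isTest
private class IsActiveCheckingTest {
    // Hypothetical test class. Assumption: Data_Washing_Setting__c can be
    // created with only Active_this_Rule__c populated.

    @isTest
    static void secondActiveRuleIsRejected() {
        insert new Data_Washing_Setting__c(Active_this_Rule__c = true);

        Boolean rejected = false;
        try {
            insert new Data_Washing_Setting__c(Active_this_Rule__c = true);
        } catch (DmlException e) {
            // addError in the trigger surfaces as a DmlException
            rejected = true;
        }
        System.assert(rejected, 'A second active rule should be blocked');
    }

    @isTest
    static void inactiveRuleIsAccepted() {
        insert new Data_Washing_Setting__c(Active_this_Rule__c = true);
        insert new Data_Washing_Setting__c(Active_this_Rule__c = false);
        System.assertEquals(2, [SELECT COUNT() FROM Data_Washing_Setting__c]);
    }

    @isTest
    static void updatingTheActiveRuleIsAccepted() {
        Data_Washing_Setting__c rule = new Data_Washing_Setting__c(Active_this_Rule__c = true);
        insert rule;
        // The record being updated must not be counted twice.
        update rule;
        System.assertEquals(1, [SELECT COUNT() FROM Data_Washing_Setting__c
                                WHERE Active_this_Rule__c = true]);
    }
}
```

The three cases cover the paths the trigger distinguishes: a conflicting insert, a harmless insert, and an update of the rule that is already active.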

Step 3: build the import page and add the corresponding validation buttons.

Visualforce page code:

<apex:page controller="BatchInsertByCsvController">
    <apex:form>
        <apex:sectionHeader title="Upload Recruit Data"/>
        <apex:pageMessages/>
        <apex:pageBlock>
            <center>
                <apex:inputFile value="{!contentFile}" fileName="{!fileName}"/>
                <apex:commandButton action="{!LoadData}" value="Batch Insert"/>
                <apex:commandButton action="{!LoadBlankList}" value="Filter Blank Data"/>
                <apex:commandButton action="{!ExportBlankToCSV}" value="Export CSV"/>
            </center>
        </apex:pageBlock>
        <apex:pageBlock title="Import Data">
            <apex:pageBlockTable value="{!RecruitList}" var="ReList">
                <apex:column value="{!ReList.Name}"/>
                <apex:column value="{!ReList.Position_Name__c}"/>
                <apex:column value="{!ReList.Recruit_Department__c}"/>
                <apex:column value="{!ReList.Recruit_Type__c}"/>
                <apex:column value="{!ReList.Recruit_Number__c}"/>
            </apex:pageBlockTable>
        </apex:pageBlock>
        <apex:pageBlock title="Blank Data">
            <apex:pageBlockTable value="{!BlankList}" var="BList">
                <apex:column value="{!BList.Name}"/>
                <apex:column value="{!BList.Position_Name__c}"/>
                <apex:column value="{!BList.Recruit_Department__c}"/>
                <apex:column value="{!BList.Recruit_Type__c}"/>
                <apex:column value="{!BList.Recruit_Number__c}"/>
            </apex:pageBlockTable>
        </apex:pageBlock>
    </apex:form>
</apex:page>

Apex controller code for the import:

public class BatchInsertByCsvController {

    public String fileName { get; set; }
    // The file chosen through <apex:inputFile> reaches the controller as a Blob (binary data).
    public Blob contentFile { get; set; }
    public String[] filelines = new String[]{};
    public List<Recruit__c> RecruitList { get; set; }
    public List<Recruit__c> BlankList { get; set; }
    public List<Recruit__c> invaildList { get; set; }

    // Parse the uploaded CSV into RecruitList. The first line is treated as the header.
    public PageReference LoadData() {
        try {
            // Keep the decoded text in its own variable. Apex identifiers are
            // case-insensitive, so assigning to "filename" would overwrite fileName.
            String fileBody = bitToString(contentFile, 'ISO-8859-1');
            contentFile = null; // release the Blob so it is not carried in the view state
            filelines = fileBody.split('\n');

            Map<String, Id> deptIds = loadDepartmentIds();
            RecruitList = new List<Recruit__c>();
            for (Integer i = 1; i < filelines.size(); i++) {
                if (String.isBlank(filelines[i])) {
                    continue; // skip empty lines such as a trailing newline
                }
                RecruitList.add(toRecruit(filelines[i], deptIds));
            }
            // Uncomment to save the parsed records once the data has been reviewed:
            // insert RecruitList;
        } catch (Exception e) {
            ApexPages.addMessage(new ApexPages.Message(ApexPages.Severity.ERROR,
                'An error has occurred reading the CSV file: ' + e.getMessage()));
        }
        return null;
    }

    // Query all departments once and index them by lower-cased name,
    // instead of running one SOQL query per CSV row.
    private Map<String, Id> loadDepartmentIds() {
        Map<String, Id> deptIds = new Map<String, Id>();
        for (Recruit_Department__c dept : [SELECT Id, Name FROM Recruit_Department__c]) {
            deptIds.put(dept.Name.toLowerCase(), dept.Id);
        }
        return deptIds;
    }

    // Build a Recruit__c from one CSV line. Blank or unknown values stay null.
    private Recruit__c toRecruit(String line, Map<String, Id> deptIds) {
        // A limit of -1 keeps trailing empty columns, which a plain split(',') drops.
        String[] inputvalues = line.split(',', -1);
        while (inputvalues.size() < 5) {
            inputvalues.add('');
        }
        Recruit__c recruits = new Recruit__c();
        recruits.Name = inputvalues[0].trim();
        recruits.Position_Name__c = inputvalues[1].trim();
        recruits.Recruit_Department__c = deptIds.get(inputvalues[2].trim().toLowerCase());
        recruits.Recruit_Type__c = inputvalues[3].trim();
        String SwpNumber = inputvalues[4].trim();
        if (String.isNotBlank(SwpNumber)) {
            recruits.Recruit_Number__c = Decimal.valueOf(SwpNumber);
        }
        return recruits;
    }

    // Decode a Blob into a String using the given charset. Pass the charset the
    // CSV was actually saved in (for example UTF-8 for files containing Chinese text).
    private String bitToString(Blob input, String inCharset) {
        // Hexadecimal representation of the Blob: two hex characters per byte.
        String hex = EncodingUtil.convertToHex(input);
        // Number of bytes = number of hex characters / 2 (a right shift by one bit).
        final Integer bytesCount = hex.length() >> 1;
        String[] bytes = new String[bytesCount];
        for (Integer i = 0; i < bytesCount; ++i) {
            // Take the two hex characters that make up byte i.
            bytes[i] = hex.mid(i << 1, 2);
        }
        // Turn the bytes into a percent-encoded string and URL-decode it with the charset.
        return EncodingUtil.urlDecode('%' + String.join(bytes, '%'), inCharset);
    }

    // Collect the rows that contain empty values.
    public PageReference LoadBlankList() {
        try {
            BlankList = new List<Recruit__c>();
            Map<String, Id> deptIds = loadDepartmentIds();
            // AddQuestionsData skips the header and returns only lines with an empty column.
            for (String line : new DataWashingSetting().AddQuestionsData(filelines)) {
                BlankList.add(toRecruit(line, deptIds));
            }
            ApexPages.addMessage(new ApexPages.Message(ApexPages.Severity.INFO,
                'blank num:' + BlankList.size()));
        } catch (Exception e) {
            ApexPages.addMessage(new ApexPages.Message(ApexPages.Severity.ERROR,
                'An error has occurred reading the CSV file: ' + e.getMessage()));
        }
        return null;
    }

    public PageReference ExportBlankToCSV() {
        return new PageReference('/apex/ExportCSV');
    }
}
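
The controller splits each line on commas, which breaks as soon as a value itself contains a comma (for example a department exported as `"Sales, APAC"`). The following is a minimal sketch of a quote-aware line splitter, assuming the CSV follows the usual convention of double-quoted fields with doubled quotes inside. The class name `CsvLineSketch` and the method `parseLine` are hypothetical, and line breaks inside quoted fields are not handled because the file is already split into lines.

```apex
public class CsvLineSketch {

    // Splits one CSV line into fields, honouring double-quoted fields.
    // Trailing empty fields are kept.
    public static List<String> parseLine(String line) {
        List<String> fields = new List<String>();
        String current = '';
        Boolean inQuotes = false;
        Integer i = 0;
        while (i < line.length()) {
            String ch = line.substring(i, i + 1);
            if (inQuotes) {
                if (ch == '"') {
                    // A doubled quote inside a quoted field is a literal quote.
                    if (i + 1 < line.length() && line.substring(i + 1, i + 2) == '"') {
                        current += '"';
                        i++;
                    } else {
                        inQuotes = false;
                    }
                } else {
                    current += ch;
                }
            } else if (ch == '"') {
                inQuotes = true;
            } else if (ch == ',') {
                fields.add(current);
                current = '';
            } else {
                current += ch;
            }
            i++;
        }
        fields.add(current);
        return fields;
    }
}
```

A call such as `CsvLineSketch.parseLine(filelines[i])` could stand in for the comma split in the controller if the source files are known to contain quoted values.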

The validation and cleansing class called by the controller; further checks can be added as needed:

public class DataWashingSetting {

    // Remove duplicate records. A Set of sObjects treats records with identical
    // field values as equal; the original order is not guaranteed.
    public List<Recruit__c> DelDuplicateData(List<Recruit__c> OriginalList) {
        Set<Recruit__c> myset = new Set<Recruit__c>(OriginalList);
        return new List<Recruit__c>(myset);
    }

    // Return the CSV lines (header excluded) in which at least one of the
    // five expected columns is empty.
    public String[] AddQuestionsData(String[] filelines) {
        String[] result = new String[]{};
        for (Integer i = 1; i < filelines.size(); i++) {
            if (String.isBlank(filelines[i])) {
                continue; // ignore completely empty lines
            }
            // A limit of -1 keeps trailing empty columns, so a missing last value
            // is detected instead of causing an index-out-of-bounds error.
            String[] inputvalues = filelines[i].split(',', -1);
            Boolean hasBlank = inputvalues.size() < 5;
            for (Integer c = 0; c < 5 && !hasBlank; c++) {
                hasBlank = String.isBlank(inputvalues[c]);
            }
            if (hasBlank) {
                result.add(filelines[i]);
            }
        }
        return result;
    }

    // Check each field against the active cleansing rule.
    public String[] CheckFiled(String[] filelines) {
        // Read the active rule. The trigger guarantees there is at most one;
        // this single-row assignment throws a QueryException if none is active.
        Data_Washing_Setting__c dws = [SELECT Position_Name_Rule__c, Recruit_Start_Number__c,
                                              Recruit_End_Number__c, Recruit_Department_Rule__c
                                       FROM Data_Washing_Setting__c
                                       WHERE Active_this_Rule__c = true];
        String PositionNameRule = dws.Position_Name_Rule__c; // position-name rule (whether duplicates are allowed)
        Decimal startNumber = dws.Recruit_Start_Number__c;   // lower bound of the recruit number
        Decimal endNumber = dws.Recruit_End_Number__c;       // upper bound of the recruit number
        String department = dws.Recruit_Department_Rule__c;  // department restriction

        String[] result = new String[]{};
        String[] inputvalues;
        for (Integer i = 1; i < filelines.size(); i++) {
            inputvalues = filelines[i].split(',', -1);
            // Add the validation code here.
        }
        return result; // the lines that failed validation
    }
}
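
The validation placeholder in `CheckFiled` could be filled in along the following lines for the recruit-number range. This is only a sketch of the "adjust to the nearest valid value" idea from the step list, not the author's implementation: the class `RangeRuleSketch` and both method names are hypothetical, and the bounds are assumed to come from `Recruit_Start_Number__c` and `Recruit_End_Number__c` of the active rule.

```apex
public class RangeRuleSketch {

    // Returns the value unchanged when it lies within [lower, upper];
    // otherwise returns the nearest bound.
    public static Decimal clampToRange(Decimal value, Decimal lower, Decimal upper) {
        if (value < lower) {
            return lower;
        }
        if (value > upper) {
            return upper;
        }
        return value;
    }

    // Parses the number column; returns null when the text is blank
    // or cannot be read as a number.
    public static Decimal parseNumber(String raw) {
        if (String.isBlank(raw)) {
            return null;
        }
        try {
            return Decimal.valueOf(raw.trim());
        } catch (Exception e) {
            return null;
        }
    }
}
```

Inside the loop, a line whose fifth column yields `null` from `parseNumber` would be added to `result` as unfixable, while any other value could be passed through `clampToRange(value, startNumber, endNumber)` before the record is built.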

When problem data is found, it is exported directly to Excel, corrected by hand, and then imported again.

<apex:page controller="BatchInsertByCsvController" cache="true" contentType="application/x-excel#BlankList.xls" showHeader="false">
    <head>
        <meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
    </head>
    <apex:pageBlock>
        <apex:pageBlockTable value="{!BlankList}" var="BList">
            <apex:column value="{!BList.Name}"/>
            <apex:column value="{!BList.Position_Name__c}"/>
            <apex:column value="{!BList.Recruit_Department__c}"/>
            <apex:column value="{!BList.Recruit_Type__c}"/>
            <apex:column value="{!BList.Recruit_Number__c}"/>
        </apex:pageBlockTable>
    </apex:pageBlock>
</apex:page>
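
The page above serves an HTML table with an Excel content type. If a plain CSV file is preferred for the re-import, the rows have to be escaped so that commas and quotes inside values survive the round trip. The following sketch shows one way to do that under the usual CSV quoting convention; the class `CsvExportSketch` and its methods are hypothetical and are not part of the pages shown here.

```apex
public class CsvExportSketch {

    // Wraps a value in double quotes when it contains a comma, a quote or a
    // line break, and doubles any embedded quotes. Null becomes an empty field.
    public static String escapeField(String value) {
        if (value == null) {
            return '';
        }
        if (value.contains(',') || value.contains('"')
                || value.contains('\n') || value.contains('\r')) {
            return '"' + value.replace('"', '""') + '"';
        }
        return value;
    }

    // Joins already-extracted column values into one CSV row.
    public static String toRow(List<String> values) {
        List<String> escaped = new List<String>();
        for (String v : values) {
            escaped.add(escapeField(v));
        }
        return String.join(escaped, ',');
    }
}
```

For example, `CsvExportSketch.toRow(new List<String>{ r.Name, r.Position_Name__c, r.Recruit_Type__c })` would produce one line per record `r` in `BlankList`.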

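The final result:

1. Data is imported, records with missing values are filtered automatically, and they can be exported to Excel.

2. The cleansing rules are configured in the back end.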
