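Watson is a powerful suite of artificial intelligence services from IBM. This series has two parts: the first is a translation of the documentation for the Watson services, and the second is a demo built on several of those services. The language I use here is Python.
AlchemyLanguage is a collection of text analysis functions that extract semantic information from content. You can pass in text, HTML, or the URL of a public web page, and its natural language processing quickly returns both a high-level understanding of the content and finer details, such as sentiment for the detected entities and keywords.
Before analyzing HTML or a web page, AlchemyLanguage automatically removes ads, headings, and other unwanted content, keeping only the most important source text. The maximum HTML document size is 600 KB, and the cleaned text can be at most 50 KB.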
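The size limits above can be checked on the client before a request is sent. The following is only a minimal sketch: the helper names are hypothetical, and it assumes that 1 KB means 1024 bytes and that sizes are measured on the UTF-8 encoding, neither of which the documentation quoted here confirms.

```python
# Hypothetical client-side helpers; not part of the Watson SDK.
MAX_HTML_BYTES = 600 * 1024  # assumption: 1 KB = 1024 bytes
MAX_TEXT_BYTES = 50 * 1024


def fits_html_limit(html: str) -> bool:
    """Return True if the UTF-8 encoded HTML is within the 600 KB limit."""
    return len(html.encode("utf-8")) <= MAX_HTML_BYTES


def truncate_text(text: str, limit: int = MAX_TEXT_BYTES) -> str:
    """Cut plain text so that its UTF-8 encoding is at most `limit` bytes."""
    encoded = text.encode("utf-8")[:limit]
    # errors="ignore" drops a multi-byte character that the cut split in half.
    return encoded.decode("utf-8", errors="ignore")


print(fits_html_limit("<p>hello</p>"))
print(len(truncate_text("a" * 100_000)))
```

Whether the service rejects or silently truncates oversized input is not stated here, so a check like this is mainly useful for logging or for splitting long documents yourself.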
pip install --upgrade watson-developer-cloud
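(SDKs for Node, Java, and iOS are also available for download.)
Before you can use AlchemyLanguage, you need an API key:
Register for an IBM Bluemix account at https://console.ng.bluemix.ne...
Log in to Bluemix and go to the AlchemyLanguage page.
Click the Create button.
On the AlchemyLanguage page, click "Service Credentials" to view your API key.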
import json
from watson_developer_cloud import AlchemyLanguageV1

# Replace API_KEY with the key shown under "Service Credentials".
alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
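Combined Call
combined(parameter=value,...)
Analyzes text, HTML, or web page content with several text analysis operations in one request.
If you want both entities and keywords from the same source, you can make a single Combined Call. List the operations you need in the "extract" parameter. The usage charges that apply to each individual method are all reflected in the combined request.
Any parameter of the extracted methods can be used; see the corresponding sections for the available options. For example, setting sentiment=1 enables targeted sentiment for entities and keywords. Because this parameter carries an additional charge and applies to both entities and keywords, it is charged twice in the request.
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.combined(
        url='https://www.ibm.com/us-en/',
        extract='entities,keywords',
        sentiment=1,
        max_items=1),
    indent=2))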
The response to this call:
{
  "status": "OK",  // indicates whether the request succeeded
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.ibm.com/us-en/",
  "totalTransactions": "4",
  "language": "english",
  "keywords": [
    {
      "text": "NoSQL cloud database",
      "relevance": "0.940807",
      "sentiment": {
        "type": "positive",
        "score": "0.46058"
      }
    }
  ],  // analysis results
  "entities": [
    {
      "type": "Company",
      "relevance": "0.805754",
      "sentiment": {
        "type": "positive",
        "score": "0.526551"
      },
      "count": "4",
      "text": "IBM",
      "disambiguated": {
        "subType": [
          "SoftwareLicense",
          "OperatingSystemDeveloper",
          "ProcessorManufacturer",
          "SoftwareDeveloper",
          "CompanyFounder",
          "ProgrammingLanguageDesigner",
          "ProgrammingLanguageDeveloper"
        ],
        "name": "IBM",
        "website": "http://www.ibm.com/",
        "dbpedia": "http://dbpedia.org/resource/IBM",
        "freebase": "http://rdf.freebase.com/ns/m.03sc8",
        "opencyc": "http://sw.opencyc.org/concept/Mx4rvViMoJwpEbGdrcN5Y29ycA",
        "yago": "http://yago-knowledge.org/resource/IBM",
        "crunchbase": "http://www.crunchbase.com/company/ibm"
      }
    }
  ]
}
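Note that the numeric fields in the response (relevance, score, count, totalTransactions) arrive as strings. The sketch below shows one possible way to check the status and convert them; the `summarize` helper is hypothetical, and the dictionary is a trimmed copy of the response above rather than a live API result.

```python
# Trimmed copy of the Combined Call response shown above.
response = {
    "status": "OK",
    "totalTransactions": "4",
    "keywords": [
        {"text": "NoSQL cloud database", "relevance": "0.940807",
         "sentiment": {"type": "positive", "score": "0.46058"}},
    ],
    "entities": [
        {"type": "Company", "text": "IBM", "relevance": "0.805754",
         "count": "4",
         "sentiment": {"type": "positive", "score": "0.526551"}},
    ],
}


def summarize(items):
    """Return (text, relevance, sentiment score) tuples with numeric types."""
    return [
        (item["text"], float(item["relevance"]),
         float(item["sentiment"]["score"]))
        for item in items
    ]


# Stop early if the service did not report success.
if response["status"] != "OK":
    raise RuntimeError("request failed: " + response["status"])

keywords = summarize(response["keywords"])
entities = summarize(response["entities"])
transactions = int(response["totalTransactions"])
print(keywords, entities, transactions)
```

The `sentiment` key is only present because the request set sentiment=1; without it, `summarize` would need to treat that field as optional.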
Authors
Extracts author names from a web page or HTML.
authors(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.authors(
        url='https://www.ibm.com/us-en/'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://techcrunch.com/2016/01/29/ibm-watson-weather-company-sale/",
  "authors": {
    "names": [
      "Author Name 1",
      "Author Name 2"
    ]
  }
}
Concepts
Extracts concepts from a web page, HTML, or plain text. English and Spanish are currently supported.
concepts(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
# knowledgeGraph returns related knowledge graph information;
# it is a parameter that carries an additional charge.
print(json.dumps(
    alchemy_language.concepts(
        url='https://www.ibm.com/watson/',
        knowledgeGraph=1),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.ibm.com/watson/",
  "totalTransactions": "2",
  "language": "english",
  "concepts": [
    {
      "text": "Thomas J. Watson",
      "relevance": "0.926128",
      "knowledgeGraph": {
        "typeHierarchy": "/people/thomas j. watson"
      },
      "dbpedia": "http://dbpedia.org/resource/Thomas_J._Watson",
      "freebase": "http://rdf.freebase.com/ns/m.07qkt",
      "yago": "http://yago-knowledge.org/resource/Thomas_J._Watson"
    },
    {
      "text": "Science",
      "relevance": "0.902652",
      "knowledgeGraph": {
        "typeHierarchy": "/fields/subjects/science"
      },
      "dbpedia": "http://dbpedia.org/resource/Science",
      "freebase": "http://rdf.freebase.com/ns/m.06mq7",
      "opencyc": "http://sw.opencyc.org/concept/Mx4rwKQK2JwpEbGdrcN5Y29ycA"
    },
    ...
  ]
}
Date Extraction
Extracts dates from a web page, HTML, or plain text. Only English is currently supported.
dates(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.dates(
        text='Set a reminder for my appointment next Tuesday',
        anchor_date='2016-03-22 00:00:00'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "totalTransactions": "1",
  "language": "english",
  "dates": [
    {
      "date": "20160329T000000",
      "text": "next tuesday"
    }
  ]
}
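The date in the response is a compact timestamp string, resolved relative to the anchor date passed in the request. As a rough illustration, both values can be parsed with the standard library; the two format strings below are inferred from this single example and are an assumption, not a documented contract.

```python
from datetime import datetime

ANCHOR_FORMAT = "%Y-%m-%d %H:%M:%S"  # shape of anchor_date in the request
RESULT_FORMAT = "%Y%m%dT%H%M%S"      # shape of "date" in the response

anchor = datetime.strptime("2016-03-22 00:00:00", ANCHOR_FORMAT)
resolved = datetime.strptime("20160329T000000", RESULT_FORMAT)

# Distance between the anchor date and the date the service resolved.
days_ahead = (resolved - anchor).days
print(resolved.isoformat(), days_ahead)
```

Here "next Tuesday" relative to Tuesday, 22 March 2016 resolves to the Tuesday one week later.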
Emotion Analysis
Detects emotions in a web page, HTML, or plain text. Only English is currently supported.
emotion(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.emotion(
        url='charliechaplin.com/en/synopsis/articles/29-The-Great-Dictator-s-Speech'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.charliechaplin.com/en/synopsis/articles/29-The-Great-Dictator-s-Speech",
  "totalTransactions": "0",
  "language": "english",
  "docEmotions": {
    "anger": "0.639028",
    "disgust": "0.009711",
    "fear": "0.037295",
    "joy": "4e-05",
    "sadness": "0.002552"
  }
}
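The emotion scores are strings too, and one of them uses scientific notation ("4e-05"), so comparing them as text would give wrong results. A small sketch of ranking the emotions, using the docEmotions values copied from the response above:

```python
# docEmotions copied from the response above.
doc_emotions = {
    "anger": "0.639028",
    "disgust": "0.009711",
    "fear": "0.037295",
    "joy": "4e-05",
    "sadness": "0.002552",
}

# float() handles both plain decimals and scientific notation.
scores = {name: float(value) for name, value in doc_emotions.items()}

dominant = max(scores, key=scores.get)
ranked = sorted(scores, key=scores.get, reverse=True)
print(dominant, ranked)
```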
Entities
Extracts entities from a web page, HTML, or plain text. The list of standard entity types is at https://github.com/watson-dev.... You can also create a custom model to get the entities you want.
Supported languages: English, German, French, Italian, Portuguese, Russian, Spanish, and Swedish.
entities(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.entities(
        url='http://www-03.ibm.com/press/us/en/pressrelease/49384.wss'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www-03.ibm.com/press/us/en/pressrelease/49384.wss",
  "language": "english",
  "entities": [
    {
      "type": "Company",
      "relevance": "0.89792",
      "count": "12",
      "text": "IBM Cloud"
    },
    {
      "type": "Company",
      "relevance": "0.590382",
      "count": "13",
      "text": "IBM",
      "disambiguated": {
        "subType": [
          "SoftwareLicense",
          "OperatingSystemDeveloper",
          "ProcessorManufacturer",
          "SoftwareDeveloper",
          "CompanyFounder",
          "ProgrammingLanguageDesigner",
          "ProgrammingLanguageDeveloper"
        ],
        "name": "IBM",
        "website": "http://www.ibm.com/",
        "dbpedia": "http://dbpedia.org/resource/IBM",
        "freebase": "http://rdf.freebase.com/ns/m.03sc8",
        "opencyc": "http://sw.opencyc.org/concept/Mx4rvViMoJwpEbGdrcN5Y29ycA",
        "yago": "http://yago-knowledge.org/resource/IBM",
        "crunchbase": "http://www.crunchbase.com/company/ibm"
      }
    },
    {
      "type": "Facility",
      "relevance": "0.252495",
      "count": "1",
      "text": "London Bluemix Garage"
    }
  ]
}
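In this response only some entities carry a "disambiguated" block, so code that reads it has to treat it as optional. The sketch below works on a trimmed copy of the entities above; it sorts them by mention count and collects the DBpedia links that are present.

```python
# Trimmed copy of the entities from the response above.
entities = [
    {"type": "Company", "relevance": "0.89792", "count": "12",
     "text": "IBM Cloud"},
    {"type": "Company", "relevance": "0.590382", "count": "13",
     "text": "IBM",
     "disambiguated": {"name": "IBM",
                       "dbpedia": "http://dbpedia.org/resource/IBM"}},
    {"type": "Facility", "relevance": "0.252495", "count": "1",
     "text": "London Bluemix Garage"},
]

# Convert count to int before sorting; comparing the raw strings
# would order them alphabetically rather than numerically.
by_count = sorted(entities, key=lambda e: int(e["count"]), reverse=True)
names_by_count = [e["text"] for e in by_count]

# Only disambiguated entities have linked-data URLs.
linked = {
    e["text"]: e["disambiguated"]["dbpedia"]
    for e in entities
    if "dbpedia" in e.get("disambiguated", {})
}
print(names_by_count, linked)
```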
Feed Detection
Extracts RSS/ATOM feed links from a web page or HTML.
feeds(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.feeds(
        url='news.ycombinator.com'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "https://news.ycombinator.com/",
  "feeds": [
    {
      "feed": "https://news.ycombinator.com/rss"
    }
  ]
}
Keywords
Extracts keywords from text, a web page, or HTML.
keywords(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.keywords(
        url='twitter.com/ibmwatson'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "https://mobile.twitter.com/ibmwatson",
  "totalTransactions": "1",
  "language": "english",
  "keywords": [
    {
      "relevance": "0.936546",
      "text": "Watson"
    },
    {
      "relevance": "0.823589",
      "text": "Watson Developer Cloud"
    },
    ...
  ]
}
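Language Detection
Detects the language of a web page, HTML, or plain text. The API detects the language automatically for every call, but this method returns more information about the detected language. For it to work well, the text should preferably be longer than 100 words. For the full list of detectable languages, see http://www.ibm.com/watson/dev...
language(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.language(
        url='ibm.com/us-en'),
    indent=2))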
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.ibm.com/us-en/",
  "language": "english",
  "iso-639-1": "en",
  "iso-639-2": "eng",
  "iso-639-3": "eng",
  "ethnologue": "http://www.ethnologue.com/show_language.asp?code=eng",
  "native-speakers": "309-400 million",
  "wikipedia": "http://en.wikipedia.org/wiki/English_language"
}
Microformats
Extracts microformats from a web page or HTML. For more about microformats, see http://www.ibm.com/watson/dev...
microformats(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.microformats(
        url='http://microformats.org/wiki/hcard'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://microformats.org/wiki/hcard",
  "microformats": [
    { "field": "RelTagLink", "data": "/wiki/Category:Specifications" },
    { "field": "RelTag", "data": "Category:Specifications" },
    { "field": "NameGivenName", "data": "Tantek" },
    { "field": "NameFamilyName", "data": "Çelik" },
    { "field": "FormattedName", "data": "Tantek Çelik" },
    { "field": "Role", "data": "Editor" },
    { "field": "Role", "data": "Author" }
  ]
}
Relations
Extracts subject-action-object relations from a web page, text, or HTML.
relations(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.relations(
        url='https://www.whitehouse.gov/the-press-office/2016/03/19/weekly-address-president-obamas-supreme-court-nomination',
        max_items=1),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "https://www.whitehouse.gov/the-press-office/2016/03/19/weekly-address-president-obamas-supreme-court-nomination",
  "language": "english",
  "relations": [
    {
      "sentence": " WASHINGTON, DC — In this week's address, the President discussed his decision to nominate Chief Judge Merrick Garland to the Supreme Court of the United States.",
      "subject": {
        "text": "the President"
      },
      "action": {
        "text": "to nominate",
        "lemmatized": "to nominate",
        "verb": {
          "text": "nominate",
          "tense": "future"
        }
      },
      "object": {
        "text": "Merrick Garland"
      }
    }
  ]
}
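Each relation can be reduced to a simple (subject, verb, object) triple for further processing. This is a sketch on a copy of the relation above; `to_triple` is a hypothetical helper, and treating "object" as optional is an assumption, since a sentence may have no object.

```python
# Copy of the single relation from the response above (sentence omitted).
relation = {
    "subject": {"text": "the President"},
    "action": {
        "text": "to nominate",
        "lemmatized": "to nominate",
        "verb": {"text": "nominate", "tense": "future"},
    },
    "object": {"text": "Merrick Garland"},
}


def to_triple(rel):
    """Reduce one relation to a (subject, verb, object) tuple."""
    # Assumption: "object" may be missing, in which case None is returned.
    obj = rel.get("object", {}).get("text")
    return (rel["subject"]["text"], rel["action"]["verb"]["text"], obj)


triple = to_triple(relation)
print(triple)
```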
Sentiment Analysis
This differs from the earlier "Emotion Analysis": it analyzes the overall sentiment of an entire text or page.
sentiment(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.sentiment(
        url='http://www.huffingtonpost.com/2010/06/22/iphone-4-review-the-worst_n_620714.html'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.huffingtonpost.com/2010/06/22/iphone-4-review-the-worst_n_620714.html",
  "totalTransactions": "1",
  "language": "english",
  "docSentiment": {
    "mixed": "1",
    "score": "-0.24582",
    "type": "negative"
  }
}
Targeted Sentiment
Analyzes the sentiment toward specific terms in a web page, HTML, or plain text.
targeted_sentiment(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.targeted_sentiment(
        url='http://www.zacks.com/stock/news/207968/stock-market-news-for-february-19-2016',
        targets=['NASDAQ', 'Dow']),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.zacks.com/stock/news/207968/stock-market-news-for-february-19-2016",
  "totalTransactions": "1",
  "language": "english",
  "results": [
    {
      "sentiment": {
        "score": "-0.387744",
        "type": "negative"
      },
      "text": "NASDAQ"
    },
    {
      "sentiment": {
        "score": "-0.416076",
        "type": "negative"
      },
      "text": "Dow"
    }
  ]
}
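The response returns one result per target, in a list. For lookups it may be more convenient as a mapping from target to score, as in this small sketch built on the values copied from the response above:

```python
# "results" copied from the response above.
results = [
    {"sentiment": {"score": "-0.387744", "type": "negative"},
     "text": "NASDAQ"},
    {"sentiment": {"score": "-0.416076", "type": "negative"},
     "text": "Dow"},
]

# Map each target to its numeric sentiment score.
scores = {r["text"]: float(r["sentiment"]["score"]) for r in results}

# The target with the lowest score is the most negatively discussed one.
most_negative = min(scores, key=scores.get)
print(scores, most_negative)
```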
Taxonomy
Classifies a web page, HTML, or plain text into a hierarchical taxonomy up to five levels deep.
taxonomy(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.taxonomy(
        url='cnn.com'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://www.cnn.com/",
  "totalTransactions": "1",
  "language": "english",
  "taxonomy": [
    {
      "label": "/news",
      "score": "0.994385"  // the detected category
    },
    {
      "label": "/art and entertainment/movies and tv/television",
      "score": "0.706355"
    },
    {
      "confident": "no",
      "label": "/sports/football",
      "score": "0.471388"
    }
  ]
}
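Each label in the response is a slash-separated path, and low-confidence results are marked with "confident": "no". The sketch below is one way to drop the unconfident results and split each label into its levels; `confident_categories` is a hypothetical helper, and it assumes the "confident" key is simply absent on confident results, as in this example.

```python
# "taxonomy" copied from the response above.
taxonomy = [
    {"label": "/news", "score": "0.994385"},
    {"label": "/art and entertainment/movies and tv/television",
     "score": "0.706355"},
    {"confident": "no", "label": "/sports/football", "score": "0.471388"},
]


def confident_categories(items):
    """Return (levels, score) pairs, skipping low-confidence results."""
    result = []
    for item in items:
        if item.get("confident") == "no":
            continue
        # "/a/b/c" -> ["a", "b", "c"]
        levels = item["label"].strip("/").split("/")
        result.append((levels, float(item["score"])))
    return result


categories = confident_categories(taxonomy)
print(categories)
```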
Text (cleaned)
Extracts the main text from a web page.
text(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.text(
        url='techcrunch.com/2016/01/29/ibm-watson-weather-company-sale'),
    indent=2))
Response:
{
  "status": "OK",
  "usage": "By accessing AlchemyAPI or using information generated by AlchemyAPI, you are agreeing to be bound by the AlchemyAPI Terms of Use: http://www.alchemyapi.com/company/terms.html",
  "url": "http://techcrunch.com/2016/01/29/ibm-watson-weather-company-sale/",
  "language": "english",
  "text": " IBM is taking another step to expand its Watson AI business and build its presence in areas like IoT: today the company announced ... "
}
Text (raw)
Extracts the raw text from a web page.
raw_text(parameter=value,...)
import json
from watson_developer_cloud import AlchemyLanguageV1

alchemy_language = AlchemyLanguageV1(api_key='API_KEY')
print(json.dumps(
    alchemy_language.raw_text(
        url='techcrunch.com/2016/01/29/ibm-watson-weather-company-sale'),
    indent=2))