Watson Natural Language Understanding

 

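A development tutorial for the Watson Natural Language Understanding service: learn how to use the IBM Bluemix Watson NLU service to analyze the content of natural-language text.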

Import the Watson Python SDK

You may need to install the Python SDK manually. Install or upgrade it with: pip install --upgrade watson-developer-cloud
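
Before importing, it can help to check whether the SDK is installed at all. The sketch below uses the standard library's `importlib.util.find_spec`; `sdk_available` is a hypothetical helper name, not part of the SDK. Note that later releases of the SDK are published as `ibm-watson` with a different import path, while this tutorial uses the older `watson_developer_cloud` package.

```python
import importlib.util


def sdk_available(module_name: str = "watson_developer_cloud") -> bool:
    """Return True if the top-level module `module_name` can be imported."""
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


if not sdk_available():
    print("SDK not found; run: pip install --upgrade watson-developer-cloud")
```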

In [1]:

from watson_developer_cloud import NaturalLanguageUnderstandingV1
from watson_developer_cloud.natural_language_understanding_v1 import (
    Features, EntitiesOptions, KeywordsOptions)

Connect to the service with a username and password

Both the username and the password can be found in your service's credentials.

In [2]:

username = 'YOUR_USERNAME'  # copy from your service credentials
password = 'YOUR_PASSWORD'  # copy from your service credentials
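
Typing credentials directly into a notebook makes them easy to leak when the notebook is shared. A minimal sketch of reading them from environment variables instead; the variable names `NLU_USERNAME` and `NLU_PASSWORD` are assumptions chosen for this example, not names the SDK itself looks for.

```python
import os
from typing import Mapping, Tuple


def load_credentials(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (username, password) from the hypothetical NLU_USERNAME / NLU_PASSWORD variables."""
    try:
        return env["NLU_USERNAME"], env["NLU_PASSWORD"]
    except KeyError as missing:
        # Report which variable is absent without echoing any secret values.
        raise RuntimeError(f"missing credential variable: {missing}") from None
```

Usage: `username, password = load_credentials()` after exporting both variables in the shell that starts Jupyter.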

In [3]:

nlu = NaturalLanguageUnderstandingV1(version='2017-12-01', username=username, password=password)

In [4]:

features = Features(entities=EntitiesOptions(), keywords=KeywordsOptions())

Prepare some sample text for testing

Note that the Chinese-language version of the service is still under development, so testing with English text is recommended.

In [5]:

texts = [
    '''Welcome to the official documentation of Godot Engine,
    the free and open source community-driven 2D and 3D game engine!
    If you are new to this documentation,
    we recommend that you read the introduction page to get
    an overview of what this documentation has to offer.''',
    
    '''Godot Engine is an open source project developed by a community of volunteers.
    It means that the documentation team can always use your feedback and help to
    improve the tutorials and class reference. If you do not manage to understand something,
    or cannot find what you are looking for in the docs,
    help us make the documentation better by letting us know!''' 
]

Analyze the content of texts[0]

In [6]:

nlu.analyze(features=features, text=texts[0])

Out[6]:

{'entities': [{'count': 1,
   'relevance': 0.992572,
   'text': 'Godot Engine',
   'type': 'Company'},
  {'count': 1, 'relevance': 0.338173, 'text': 'official', 'type': 'JobTitle'}],
 'keywords': [{'relevance': 0.908999, 'text': 'Godot Engine'},
  {'relevance': 0.741794, 'text': 'open source'},
  {'relevance': 0.702832, 'text': 'introduction page'},
  {'relevance': 0.674454, 'text': 'official documentation'},
  {'relevance': 0.660084, 'text': 'game engine'},
  {'relevance': 0.230546, 'text': 'overview'}],
 'language': 'en',
 'usage': {'features': 2, 'text_characters': 282, 'text_units': 1}}

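We can see that the service identified two entities: "Godot Engine" of type "Company" and "official" of type "JobTitle" (the second is a misclassification, since "official" is an adjective in "official documentation"). It also returned a list of keywords with relevance scores, such as Godot Engine, open source, introduction page, and official documentation, and reported usage statistics: the number of characters analyzed and the number of text units.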
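
The result is a plain dict, so the entity types and the most relevant keywords can be pulled out with ordinary Python. A short sketch: `entity_pairs`, `top_keywords`, and the 0.5 relevance cut-off are illustrative choices, not part of the SDK, and `out6` is the Out[6] result above copied by hand.

```python
from typing import Any, Dict, List, Tuple


def entity_pairs(response: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (text, type) for every entity in an analyze() result dict."""
    return [(e["text"], e["type"]) for e in response.get("entities", [])]


def top_keywords(response: Dict[str, Any], min_relevance: float = 0.5) -> List[str]:
    """Keyword texts with relevance >= min_relevance, most relevant first."""
    ranked = sorted(response.get("keywords", []), key=lambda k: k["relevance"], reverse=True)
    return [k["text"] for k in ranked if k["relevance"] >= min_relevance]


# Out[6] from above, copied by hand (usage/language fields omitted).
out6 = {
    "entities": [
        {"count": 1, "relevance": 0.992572, "text": "Godot Engine", "type": "Company"},
        {"count": 1, "relevance": 0.338173, "text": "official", "type": "JobTitle"},
    ],
    "keywords": [
        {"relevance": 0.908999, "text": "Godot Engine"},
        {"relevance": 0.741794, "text": "open source"},
        {"relevance": 0.702832, "text": "introduction page"},
        {"relevance": 0.674454, "text": "official documentation"},
        {"relevance": 0.660084, "text": "game engine"},
        {"relevance": 0.230546, "text": "overview"},
    ],
}

print(entity_pairs(out6))
print(top_keywords(out6))
```

With a live connection, the same helpers can be applied directly to `nlu.analyze(features=features, text=texts[0])`.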

Wrap the call in a function and analyze texts[1]

In [7]:

def NLUA(text):
    return nlu.analyze(features=features, text=text)
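
The same wrapper can also be produced by a small factory that binds `features` once and rejects blank input before any request is sent. This is a hedged sketch: `make_analyzer` and `fake_analyze` are hypothetical names, and the only assumption about the SDK is that the analyzer accepts `features=` and `text=` keyword arguments, as `nlu.analyze` does in the cells above. That lets the sketch run offline with a stand-in function.

```python
from typing import Any, Callable, Dict


def make_analyzer(analyze: Callable[..., Dict[str, Any]], features: Any) -> Callable[[str], Dict[str, Any]]:
    """Return a one-argument function that calls analyze(features=..., text=...)."""
    def run(text: str) -> Dict[str, Any]:
        # Fail fast on empty or whitespace-only input instead of sending a request.
        if not text.strip():
            raise ValueError("text must not be empty")
        return analyze(features=features, text=text)
    return run


# Stand-in for nlu.analyze so the sketch runs without network access; it echoes its arguments.
def fake_analyze(features, text):
    return {"features": features, "text": text}


analyzer = make_analyzer(fake_analyze, features="entities+keywords")
print(analyzer("Godot Engine is open source."))
```

With the real client this would be `NLUA = make_analyzer(nlu.analyze, features)`.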

In [8]:

NLUA(texts[1])

Out[8]:

{'entities': [],
 'keywords': [{'relevance': 0.973304, 'text': 'open source project'},
  {'relevance': 0.838422, 'text': 'Godot Engine'},
  {'relevance': 0.624781, 'text': 'class reference'},
  {'relevance': 0.60158, 'text': 'documentation team'},
  {'relevance': 0.369586, 'text': 'docs'},
  {'relevance': 0.32839, 'text': 'feedback'},
  {'relevance': 0.28266, 'text': 'community'},
  {'relevance': 0.282402, 'text': 'volunteers'},
  {'relevance': 0.274328, 'text': 'tutorials'}],
 'language': 'en',
 'usage': {'features': 2, 'text_characters': 372, 'text_units': 1}}

Analyze a longer text and save the keywords to a CSV file

Use Python's pandas data-analysis library to save the keyword information to a CSV file, which can then be browsed with tools such as Excel.

In [9]:

import pandas as pd

In [10]:

text = '''
Introduction

The Blender Game Engine (BGE) is Blender’s tool for real time projects,
from architectural visualizations and simulations to games.

A word of warning, before you start any big or serious project with the Blender Game Engine,
you should note that it is currently not very supported and that there are plans for its retargeting and
refactoring that, in the very least, will break compatibility. For further information,
you should get in touch with the developers via mailing list or IRC and read the development roadmap.

Use Cases and Sample Games

Blender has its own built in Game Engine that allows you to create interactive 3D applications or simulations.
The major difference between Game Engine and the conventional Blender system is in the rendering process.
In the normal Blender engine, images and animations are built off-line – once rendered they cannot be modified.
Conversely, the Blender Game Engine renders scenes continuously in real-time,
and incorporates facilities for user interaction during the rendering process.'''

In [11]:

response = NLUA(text)
response

Out[11]:

{'entities': [{'count': 1,
   'relevance': 0.325141,
   'text': 'Blender',
   'type': 'Company'},
  {'count': 1, 'relevance': 0.109868, 'text': 'BGE', 'type': 'Organization'}],
 'keywords': [{'relevance': 0.968481, 'text': 'Blender Game Engine'},
  {'relevance': 0.672281, 'text': 'normal Blender engine'},
  {'relevance': 0.642034, 'text': 'Blender’s tool'},
  {'relevance': 0.585455, 'text': 'conventional Blender'},
  {'relevance': 0.544693, 'text': 'real time projects'},
  {'relevance': 0.516305, 'text': 'Engine renders scenes'},
  {'relevance': 0.50586, 'text': 'interactive 3D applications'},
  {'relevance': 0.503355, 'text': 'rendering process'},
  {'relevance': 0.440065, 'text': 'architectural visualizations'},
  {'relevance': 0.42788, 'text': 'development roadmap'},
  {'relevance': 0.415892, 'text': 'mailing list'},
  {'relevance': 0.415685, 'text': 'major difference'},
  {'relevance': 0.411704, 'text': 'Sample Games'},
  {'relevance': 0.411643, 'text': 'user interaction'},
  {'relevance': 0.358522, 'text': 'simulations'},
  {'relevance': 0.348923, 'text': 'retargeting'},
  {'relevance': 0.326577, 'text': 'compatibility'},
  {'relevance': 0.325289, 'text': 'warning'},
  {'relevance': 0.324025, 'text': 'Introduction'},
  {'relevance': 0.322093, 'text': 'BGE'},
  {'relevance': 0.321985, 'text': 'IRC'},
  {'relevance': 0.320995, 'text': 'plans'},
  {'relevance': 0.320874, 'text': 'touch'},
  {'relevance': 0.320867, 'text': 'information'},
  {'relevance': 0.320457, 'text': 'word'},
  {'relevance': 0.319156, 'text': 'developers'},
  {'relevance': 0.319064, 'text': 'Cases'}],
 'language': 'en',
 'usage': {'features': 2, 'text_characters': 1050, 'text_units': 1}}
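
The `usage` field reports `text_characters` and `text_units`. Assuming, as IBM's documentation described it, that one text unit covers up to 10,000 characters, the unit count can be estimated locally; the constant below encodes that assumption and should be checked against the current service documentation. Under it, all three texts in this tutorial, including the 1,050-character one, count as a single text unit, which matches the outputs. `estimate_text_units` is a hypothetical helper.

```python
import math

# Assumption: one NLU text unit covers up to 10,000 characters.
CHARS_PER_TEXT_UNIT = 10_000


def estimate_text_units(text_characters: int) -> int:
    """Estimate 'text_units' from the 'text_characters' value reported in usage."""
    if text_characters < 0:
        raise ValueError("character count cannot be negative")
    return math.ceil(text_characters / CHARS_PER_TEXT_UNIT)


# text_characters values taken from Out[6], Out[8] and Out[11].
for n in (282, 372, 1050):
    print(n, estimate_text_units(n))
```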

In [12]:

keywords = pd.DataFrame(response['keywords'])
keywords

Out[12]:

  relevance text
0 0.968481 Blender Game Engine
1 0.672281 normal Blender engine
2 0.642034 Blender’s tool
3 0.585455 conventional Blender
4 0.544693 real time projects
5 0.516305 Engine renders scenes
6 0.505860 interactive 3D applications
7 0.503355 rendering process
8 0.440065 architectural visualizations
9 0.427880 development roadmap
10 0.415892 mailing list
11 0.415685 major difference
12 0.411704 Sample Games
13 0.411643 user interaction
14 0.358522 simulations
15 0.348923 retargeting
16 0.326577 compatibility
17 0.325289 warning
18 0.324025 Introduction
19 0.322093 BGE
20 0.321985 IRC
21 0.320995 plans
22 0.320874 touch
23 0.320867 information
24 0.320457 word
25 0.319156 developers
26 0.319064 Cases

In [13]:

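# index=False drops the row numbers; utf-8-sig adds a BOM so Excel reads characters like ’ correctly
keywords.to_csv('keywords.csv', index=False, encoding='utf-8-sig')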
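
If pandas is not installed, roughly the same file can be written with the standard-library csv module. A minimal sketch; `write_keywords_csv` is a hypothetical helper name, and the two sample rows are copied from the keywords output above.

```python
import csv
import io
from typing import Any, Dict, Iterable, TextIO


def write_keywords_csv(keywords: Iterable[Dict[str, Any]], out: TextIO) -> None:
    """Write keyword dicts (as in response['keywords']) as CSV rows to `out`."""
    # extrasaction="ignore" skips any extra fields a keyword dict might carry.
    writer = csv.DictWriter(out, fieldnames=["relevance", "text"], extrasaction="ignore")
    writer.writeheader()
    writer.writerows(keywords)


sample = [
    {"relevance": 0.968481, "text": "Blender Game Engine"},
    {"relevance": 0.672281, "text": "normal Blender engine"},
]
buf = io.StringIO()
write_keywords_csv(sample, buf)
print(buf.getvalue())
```

To write a real file, open it with `open('keywords.csv', 'w', newline='', encoding='utf-8-sig')` and pass `response['keywords']`; `newline=''` is what the csv module expects for files.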