This chapter provides a brief overview of scripting language extension programming and the mechanisms by which scripting language interpreters access C and C++ code.
When a scripting language is used to control a C program, the resulting system tends to look as follows:
In this programming model, the scripting language interpreter is used for high-level control, whereas the underlying functionality of the C/C++ program is accessed through special scripting language "commands." If you have ever tried to write your own simple command interpreter, you might view the scripting language approach as a highly advanced implementation of that. Likewise, if you have ever used a package such as MATLAB or IDL, it is a very similar model: the interpreter executes user commands and scripts, but most of the underlying functionality is written in a low-level language like C or Fortran.
The two-language model of computing is extremely powerful because it exploits the strengths of each language. C/C++ can be used for maximal performance and complicated systems programming tasks. Scripting languages can be used for rapid prototyping, interactive debugging, scripting, and access to high-level data structures such as associative arrays.
Scripting languages are built around a parser that knows how to execute commands and scripts. Within this parser, there is a mechanism for executing commands and accessing variables. Normally, this is used to implement the built-in features of the language. However, by extending the interpreter, it is usually possible to add new commands and variables. To do this, most languages define a special API for adding new commands. Furthermore, a special foreign function interface defines how these new commands are supposed to hook into the interpreter.
Typically, when you add a new command to a scripting interpreter, you need to do two things. First, you need to write a special "wrapper" function that serves as the glue between the interpreter and the underlying C function. Then you need to give the interpreter information about the wrapper by providing details about the name of the function, its arguments, and so forth. The next few sections illustrate the process.
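The two steps described above (write a wrapper, then register it with the interpreter) can be sketched with a toy command table in C. This is not the API of Tcl, Perl, or Python: the names `toy_register`, `toy_eval`, `square`, and `wrap_square` are hypothetical and exist only to illustrate the idea.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define TOY_MAX_COMMANDS 16

/* Signature every wrapper must follow (hypothetical, for illustration only). */
typedef int (*toy_cmd_fn)(int argc, const char *argv[],
                          char *result, size_t result_size);

struct toy_command {
    const char *name;
    toy_cmd_fn fn;
};

static struct toy_command toy_commands[TOY_MAX_COMMANDS];
static int toy_command_count = 0;

/* Registration: tell the toy interpreter about a wrapper. Returns 0 on success. */
int toy_register(const char *name, toy_cmd_fn fn) {
    if (toy_command_count >= TOY_MAX_COMMANDS) return -1;
    toy_commands[toy_command_count].name = name;
    toy_commands[toy_command_count].fn = fn;
    toy_command_count++;
    return 0;
}

/* Dispatch: look the command up by argv[0] and run its wrapper. */
int toy_eval(int argc, const char *argv[], char *result, size_t result_size) {
    if (argc < 1) {
        snprintf(result, result_size, "empty command");
        return -1;
    }
    for (int i = 0; i < toy_command_count; i++) {
        if (strcmp(toy_commands[i].name, argv[0]) == 0)
            return toy_commands[i].fn(argc, argv, result, result_size);
    }
    snprintf(result, result_size, "unknown command \"%s\"", argv[0]);
    return -1;
}

/* The underlying C function being exposed. */
int square(int n) { return n * n; }

/* The wrapper: glue between the string-based interpreter and square(). */
int wrap_square(int argc, const char *argv[], char *result, size_t result_size) {
    if (argc != 2) {
        snprintf(result, result_size, "wrong # args");
        return -1;
    }
    snprintf(result, result_size, "%d", square(atoi(argv[1])));
    return 0;
}
```

After `toy_register("square", wrap_square)`, evaluating the argument vector `{"square", "7"}` with `toy_eval` leaves the string `49` in the result buffer. Real interpreters follow the same shape but with richer argument and result types.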
Suppose you have an ordinary C function like this:
    int fact(int n) {
        if (n <= 1)
            return 1;
        else
            return n * fact(n - 1);
    }
In order to access this function from a scripting language, it is necessary to write a special "wrapper" function that serves as the glue between the scripting language and the underlying C function. A wrapper function must do three things:
- Gather function arguments and make sure they are valid.
- Call the C function.
- Convert the return value into a form recognized by the scripting language.
As an example, the Tcl wrapper function for the fact() function above might look like the following:
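    int wrap_fact(ClientData clientData, Tcl_Interp *interp, int argc, char *argv[]) {
        int result;
        int arg0;

        /* 1. Gather the arguments and check their count */
        if (argc != 2) {
            interp->result = "wrong # args";
            return TCL_ERROR;
        }
        arg0 = atoi(argv[1]);

        /* 2. Call the C function */
        result = fact(arg0);

        /* 3. Convert the return value into a string result for Tcl */
        sprintf(interp->result, "%d", result);
        return TCL_OK;
    }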
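The wrapper above converts its argument with atoi(), which cannot report malformed input such as "abc". As a minimal sketch of the "make sure they are valid" step, the standard strtol() function can be used instead. The helper name `parse_int_arg` is hypothetical and is not part of Tcl.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse s as a base-10 int.
 * Returns 1 and stores the value in *out on success, 0 on any error. */
int parse_int_arg(const char *s, int *out) {
    char *end;
    long value;

    if (s == NULL || *s == '\0') return 0;            /* missing argument */
    errno = 0;
    value = strtol(s, &end, 10);
    if (end == s) return 0;                           /* no digits at all */
    if (*end != '\0') return 0;                       /* trailing characters */
    if (errno == ERANGE) return 0;                    /* out of range for long */
    if (value < INT_MIN || value > INT_MAX) return 0; /* does not fit in int */
    *out = (int) value;
    return 1;
}
```

A wrapper could call `parse_int_arg(argv[1], &arg0)` and return an error to the interpreter when it yields 0, rather than silently treating bad input as zero.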
Once you have created a wrapper function, the final step is to tell the scripting language about the new function. This is usually done in an initialization function called by the language when the module is loaded. For example, adding the above function to the Tcl interpreter requires code like the following:
    int Wrap_Init(Tcl_Interp *interp) {
        Tcl_CreateCommand(interp, "fact", wrap_fact, (ClientData) NULL,
                          (Tcl_CmdDeleteProc *) NULL);
        return TCL_OK;
    }
When executed, Tcl will now have a new command called "fact" that you can use like any other Tcl command.
Although the process of adding a new function to Tcl has been illustrated, the procedure is almost identical for Perl and Python. Both require special wrappers to be written and both need additional initialization code. Only the specific details are different.
Variable linking refers to the problem of mapping a C/C++ global variable to a variable in the scripting language interpreter. For example, suppose you had the following variable:
    double Foo = 3.5;
It might be nice to access it from a script as follows (shown for Perl):
    $a = $Foo * 2.3;    # Evaluation
    $Foo = $a + 2.0;    # Assignment
To provide such access, variables are commonly manipulated using a pair of get/set functions. For example, whenever the value of a variable is read, a "get" function is invoked. Similarly, whenever the value of a variable is changed, a "set" function is called.
In many languages, calls to the get/set functions can be attached to evaluation and assignment operators. Therefore, evaluating a variable such as $Foo might implicitly call the get function. Similarly, typing $Foo = 4 would call the underlying set function to change the value.
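A minimal sketch of the get/set approach, assuming a toy interpreter that looks linked variables up by name: the table `linked_vars` and the functions `script_read` and `script_write` are hypothetical stand-ins for whatever hooks a real interpreter attaches to evaluation and assignment.

```c
#include <stddef.h>
#include <string.h>

/* The C global to expose, as in the text above. */
double Foo = 3.5;

/* The get/set pair for Foo. */
static double Foo_get(void) { return Foo; }
static void Foo_set(double value) { Foo = value; }

/* Hypothetical link-table entry: a script-visible name plus its accessors. */
struct linked_var {
    const char *name;
    double (*get)(void);
    void (*set)(double);
};

static struct linked_var linked_vars[] = {
    { "Foo", Foo_get, Foo_set },
};

static struct linked_var *find_var(const char *name) {
    size_t n = sizeof linked_vars / sizeof linked_vars[0];
    for (size_t i = 0; i < n; i++) {
        if (strcmp(linked_vars[i].name, name) == 0) return &linked_vars[i];
    }
    return NULL;
}

/* What the toy interpreter does when a script evaluates $Foo. */
int script_read(const char *name, double *out) {
    struct linked_var *v = find_var(name);
    if (v == NULL) return -1;
    *out = v->get();
    return 0;
}

/* What the toy interpreter does when a script assigns to $Foo. */
int script_write(const char *name, double value) {
    struct linked_var *v = find_var(name);
    if (v == NULL) return -1;
    v->set(value);
    return 0;
}
```

Because every access goes through the function pointers, the script never touches the C variable's storage directly, and a set function is free to add range checks or type conversion.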
In many cases, a C program or library may define a large collection of constants. For example:
    #define RED   0xff0000
    #define BLUE  0x0000ff
    #define GREEN 0x00ff00
To make constants available, their values can be stored in scripting language variables such as $RED, $BLUE, and $GREEN. Virtually all scripting languages provide C functions for creating variables, so installing constants is usually a trivial exercise.
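As a rough sketch of installing constants at module initialization, the code below copies each macro value into a toy variable table. The table and the functions `script_set_int`, `script_get_int`, and `install_constants` are hypothetical; a real interpreter supplies its own variable-creation functions.

```c
#include <stdio.h>
#include <string.h>

#define RED   0xff0000
#define BLUE  0x0000ff
#define GREEN 0x00ff00

#define MAX_VARS 32

/* Hypothetical stand-in for an interpreter's variable table. */
struct script_var {
    char name[32];
    long value;
};

static struct script_var script_vars[MAX_VARS];
static int script_var_count = 0;

/* Create (or overwrite) an integer variable. Returns 0 on success. */
int script_set_int(const char *name, long value) {
    for (int i = 0; i < script_var_count; i++) {
        if (strcmp(script_vars[i].name, name) == 0) {
            script_vars[i].value = value;
            return 0;
        }
    }
    if (script_var_count >= MAX_VARS) return -1;
    snprintf(script_vars[script_var_count].name,
             sizeof script_vars[script_var_count].name, "%s", name);
    script_vars[script_var_count].value = value;
    script_var_count++;
    return 0;
}

/* Look a variable up. Returns 0 and stores the value in *out on success. */
int script_get_int(const char *name, long *out) {
    for (int i = 0; i < script_var_count; i++) {
        if (strcmp(script_vars[i].name, name) == 0) {
            *out = script_vars[i].value;
            return 0;
        }
    }
    return -1;
}

/* #x turns the macro's name into a string; (x) expands to its value. */
#define INSTALL_CONSTANT(x) script_set_int(#x, (long) (x))

/* Called once when the module is loaded. */
int install_constants(void) {
    if (INSTALL_CONSTANT(RED) != 0) return -1;
    if (INSTALL_CONSTANT(BLUE) != 0) return -1;
    if (INSTALL_CONSTANT(GREEN) != 0) return -1;
    return 0;
}
```

The stringizing macro keeps the script-visible name and the C value in sync, so a constant cannot be registered under a mistyped name.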
Although scripting languages have no trouble accessing simple functions and variables, accessing C/C++ structures and classes presents a different problem. This is because the implementation of structures is largely related to the problem of data representation and layout. Furthermore, certain language features are difficult to map to an interpreter. For instance, what does C++ inheritance mean in a Perl interface?
The most straightforward technique for handling structures is to implement a collection of accessor functions that hide the underlying representation of a structure. For example,
    struct Vector {
        Vector();
        ~Vector();
        double x, y, z;
    };
can be transformed into the following set of functions:
    Vector *new_Vector();
    void delete_Vector(Vector *v);
    double Vector_x_get(Vector *v);
    double Vector_y_get(Vector *v);
    double Vector_z_get(Vector *v);
    void Vector_x_set(Vector *v, double x);
    void Vector_y_set(Vector *v, double y);
    void Vector_z_set(Vector *v, double z);
Now, from an interpreter, these functions might be used as follows:
    % set v [new_Vector]
    % Vector_x_set $v 3.5
    % Vector_y_get $v
    % delete_Vector $v
    % ...
Since accessor functions provide a mechanism for accessing the internals of an object, the interpreter does not need to know anything about the actual representation of a Vector.
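For illustration, the accessor functions might be implemented as below. This sketch assumes a plain C version of Vector (the original is a C++ struct with a constructor and destructor), so allocation with calloc() and free() stands in for construction and destruction.

```c
#include <stdlib.h>

/* Plain C stand-in for the Vector structure (assumption: no constructor). */
typedef struct Vector {
    double x, y, z;
} Vector;

/* Allocate a zero-filled Vector; returns NULL if allocation fails. */
Vector *new_Vector(void) { return (Vector *) calloc(1, sizeof(Vector)); }

/* Release a Vector obtained from new_Vector(). */
void delete_Vector(Vector *v) { free(v); }

/* Getters: the only way the interpreter reads a field. */
double Vector_x_get(Vector *v) { return v->x; }
double Vector_y_get(Vector *v) { return v->y; }
double Vector_z_get(Vector *v) { return v->z; }

/* Setters: the only way the interpreter writes a field. */
void Vector_x_set(Vector *v, double x) { v->x = x; }
void Vector_y_set(Vector *v, double y) { v->y = y; }
void Vector_z_set(Vector *v, double z) { v->z = z; }
```

The interpreter only ever holds the pointer as an opaque handle, so the layout of Vector can change without affecting any script.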
In certain cases, it is possible to use the low-level accessor functions to create a proxy class, also known as a shadow class. A proxy class is a special kind of object that gets created in a scripting language to access a C/C++ class (or struct) in a way that looks like the original structure (that is, it proxies the real C++ class). For example, if you have the following C++ definition:
    class Vector {
    public:
        Vector();
        ~Vector();
        double x, y, z;
    };
A proxy class mechanism would allow you to access the structure in a more natural manner from the interpreter. For example, in Python, you might want to do this:
    >>> v = Vector()
    >>> v.x = 3
    >>> v.y = 4
    >>> v.z = -13
    >>> ...
    >>> del v
Similarly, in Perl5 you may want the interface to work like this:
    $v = new Vector;
    $v->{x} = 3;
    $v->{y} = 4;
    $v->{z} = -13;
Finally, in Tcl:
    Vector v
    v configure -x 3 -y 4 -z -13
When proxy classes are used, two objects are really at work: one in the scripting language, and an underlying C/C++ object. Operations affect both objects equally and, for all practical purposes, it appears as if you are simply manipulating a C/C++ object.
The final step in using a scripting language with your C/C++ application is adding your extensions to the scripting language itself. There are two primary approaches for doing this. The preferred technique is to build a dynamically loadable extension in the form of a shared library. Alternatively, you can recompile the scripting language interpreter with your extensions added to it.
To create a shared library or DLL, you often need to look at the manual pages for your compiler and linker. However, the procedure for a few common platforms is shown below:
    # Build a shared library for Solaris
    gcc -fpic -c example.c example_wrap.c -I/usr/local/include
    ld -G example.o example_wrap.o -o example.so

    # Build a shared library for Linux
    gcc -fpic -c example.c example_wrap.c -I/usr/local/include
    gcc -shared example.o example_wrap.o -o example.so
To use your shared library, you simply use the corresponding command in the scripting language (load, import, use, and so on). This will import your module and allow you to start using it. For example:
    % load ./example.so
    % fact 4
    24
    %
When working with C++ code, the process of building shared libraries may be more complicated, primarily because C++ modules may need additional code in order to operate correctly. On many machines, you can build a shared C++ module by following the above procedures, but changing the link line to the following:
    c++ -shared example.o example_wrap.o -o example.so
When building extensions as shared libraries, it is not uncommon for your extension to rely upon other shared libraries on your machine. In order for the extension to work, it needs to be able to find all of these libraries at run-time. Otherwise, you may get an error such as the following:
    >>> import graph
    Traceback (innermost last):
      File "<stdin>", line 1, in ?
      File "/home/sci/data1/beazley/graph/graph.py", line 2, in ?
        import graphc
    ImportError: 1101:/home/sci/data1/beazley/bin/python: rld: Fatal Error: cannot
    successfully map soname 'libgraph.so' under any of the filenames /usr/lib/libgraph.so:/
    lib/libgraph.so:/lib/cmplrs/cc/libgraph.so:/usr/lib/cmplrs/cc/libgraph.so:
    >>>
What this error means is that the extension module created by SWIG depends upon a shared library called "libgraph.so" that the system was unable to locate. To fix this problem, there are a few approaches you can take.
- Link your extension and explicitly tell the linker where the required libraries are located. Often, this can be done with a special linker flag such as -R, -rpath, etc. This is not implemented in a standard manner, so read the man pages for your linker to find out more about how to set the search path for shared libraries.
- Put shared libraries in the same directory as the executable. This technique is sometimes required for correct operation on non-Unix platforms.
- Set the UNIX environment variable LD_LIBRARY_PATH to the directory where shared libraries are located before running Python. Although this is an easy solution, it is not recommended. Consider setting the path using linker options instead.
With static linking, you rebuild the scripting language interpreter with extensions. The process usually involves compiling a short main program that adds your customized commands to the language and starts the interpreter. You then link your program with a library to produce a new scripting language executable.
Although static linking is supported on all platforms, this is not the preferred technique for building scripting language extensions. In fact, there are very few practical reasons for doing this, so consider using shared libraries instead.