
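    This article is a set of study notes based on the English-language "Python Tutorial". The tutorial can be downloaded from:

    Baidu Pan: http://pan.baidu.com/s/1c0eXSQG

    DropBox: click here for the DropBox link

    Google Drive: click here for the Google Drive link

    These are study notes, not a translation, so the content differs from the original in places. The notes below are based mainly on chapter 8 of the tutorial. (Some links in this article may need a proxy to access.)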

Overview:

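    The earlier article "Python variable types" introduced the numeric types only briefly. This article looks at the numeric types and their conversion functions in more depth, mainly by reading Python's C source code.

    Before getting started, one more point: Python scripts can be incompatible not only across major versions (2.x.x versus 3.x.x) but also across minor versions, as shown below: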

$ python2.6
Python 2.6.6 (r266:84292, Nov 27 2010, 19:47:39) 
[GCC 4.5.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {1,2,3,4,5}
  File "<stdin>", line 1
    a = {1,2,3,4,5}
          ^
SyntaxError: invalid syntax
>>> a = set([1,2,3,4,5])
>>> quit()
$ python2.7
Python 2.7.8 (default, Feb  8 2015, 20:16:27) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = {1,2,3,4,5}
>>> quit()
$


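    As the output shows, Python 2.6.6 raises "SyntaxError: invalid syntax" when a set is written with braces, while Python 2.7.8 accepts the same line. Watch for this kind of incompatibility between versions when writing Python code.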
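
    One way to stay compatible is to build sets with the set() call, which every version parses, and to probe for the literal syntax only when it matters. The sketch below is written for illustration; make_set and supports_set_literal are hypothetical helper names, not part of Python.

```python
import sys

def make_set(items):
    """Build a set in a way that parses on Python 2.6, 2.7, and 3.x."""
    # set([...]) is an ordinary function call, so no version rejects it at parse time.
    return set(items)

def supports_set_literal():
    """Return True if this interpreter can compile the {1, 2} literal."""
    try:
        compile("{1, 2}", "<probe>", "eval")
    except SyntaxError:
        return False
    return True

numbers = make_set([1, 2, 3, 4, 5])
print(sys.version_info[:2], supports_set_literal(), sorted(numbers))
```

    Probing with compile() keeps the literal inside a string, so the probing script itself still loads on an interpreter that lacks the syntax.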

Integer type:

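    This section analyzes the integer type from the Python source code. The earlier article "Installing and using Python" covered building and installing Python from source. To debug the Python source with gdb, rebuild and reinstall Python with configure --with-pydebug, as described in the earlier article "Python's basic operators".

    With a debug build of Python installed, gdb can be used to step through Python's C source: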

$ gdb python -q
...................................................
>>> a = 11345
...................................................
Breakpoint 1, PyInt_FromLong (ival=11345) at Objects/intobject.c:91
91	    if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) {
(gdb) until 111
PyInt_FromLong (ival=11345) at Objects/intobject.c:111
111	    v->ob_ival = ival;
(gdb) n
112	    return (PyObject *) v;
(gdb) ptype v
type = struct {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    long int ob_ival;
} *
(gdb) p v->ob_ival
$1 = 11345
...................................................
$


    The integer-related code lives in Objects/intobject.c. When the script a = 11345 above runs, PyInt_FromLong is called to create an object for the integer 11345. The object's C struct is:

struct {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    long int ob_ival;
} 


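    The first four fields form the header that every object carries in this debug build. _ob_next and _ob_prev link all objects into a doubly linked list; these two fields exist only in builds that define Py_TRACE_REFS, which --with-pydebug turns on in 2.7. ob_refcnt is the object's reference count; when it drops to 0, the object is reclaimed. ob_type identifies the object's type. The fifth field, ob_ival, is specific to integer objects and stores the integer's value, such as 11345 in the example above.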
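
    Two of these header fields can be observed from Python itself without gdb. The sketch below assumes CPython and is written for Python 3; sys.getrefcount reports ob_refcnt (plus temporary references held by the call itself) and type() reports what ob_type points to. The integer is built at run time with int("11345") so that it is a fresh object rather than a shared constant.

```python
import sys

value = int("11345")            # created at run time, so it starts with few references
before = sys.getrefcount(value)

alias = value                   # binding a second name adds one reference
after = sys.getrefcount(value)

print(type(value))              # Python-level view of ob_type
print(after - before)           # change in ob_refcnt caused by the new binding
```

    Only the difference between the two readings is meaningful here; the absolute numbers vary between interpreter versions.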

Variables and values:

    Everything in a Python script is an object, including variable names (more precisely, they are string objects). Python/ceval.c contains the following C code (starting at line 1948 in version 2.7.8):

case STORE_NAME:
	w = GETITEM(names, oparg);
	v = POP();
	if ((x = f->f_locals) != NULL) {
		if (PyDict_CheckExact(x))
			err = PyDict_SetItem(x, w, v);
		else
			err = PyObject_SetItem(x, w, v);
		Py_DECREF(v);
		if (err == 0) continue;
		break;
	}
	PyErr_Format(PyExc_SystemError,
				 "no locals found when storing %s",
				 PyObject_REPR(w));
	break;


    When a variable is assigned at the interactive prompt, as below, this C code runs:

$ gdb python -q
...................................................
>>> a = 11345
Program received signal SIGINT, Interrupt. (Ctrl+C was pressed to interrupt python and enter gdb)
0xb7ee09f8 in ___newselect_nocancel () from /lib/libc.so.6
(gdb) b ceval.c:1949
Breakpoint 1 at 0x80f30ee: file Python/ceval.c, line 1949.
(gdb) c
Continuing.
(Press Enter here so that the a = 11345 statement above runs and hits the breakpoint set in ceval.c.)

Breakpoint 1, PyEval_EvalFrameEx (f=0xb7d8ce44, throwflag=0)
    at Python/ceval.c:1949
1949	            w = GETITEM(names, oparg);
(gdb) n
1950	            v = POP();
(gdb) p * (PyStringObject *)w
$1 = {_ob_next = 0xb7dc1e34, _ob_prev = 0xb7dadcb8, ob_refcnt = 10, 
  ob_type = 0x81c9ca0, ob_size = 1, ob_shash = -468864544, ob_sstate = 1, 
  ob_sval = "a"}
(gdb) n
1951	            if ((x = f->f_locals) != NULL) {
(gdb) p * (PyIntObject *)v
$2 = {_ob_next = 0xb7ca1114, _ob_prev = 0xb7ca1aa8, ob_refcnt = 3, 
  ob_type = 0x81c2d80, ob_ival = 11345}
(gdb) c
Continuing.
...................................................
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 
 '__doc__': None, 'a': 11345, '__package__': None}
...................................................
>>> quit()
...................................................
$


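    Here w is a PyStringObject; the string "a" stored in it is the name being assigned. v is the value bound to that name, a PyIntObject holding 11345. The name and value form a key-value pair that is added to the f_locals dict inside Python. At the Python prompt, the locals function shown above lists the variables and values that are currently defined.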
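
    The same opcode is visible from Python through the dis module. The sketch below is written for Python 3.4 or later (dis.get_instructions does not exist in 2.7) and uses an explicit dict, named namespace here for illustration, to stand in for f_locals.

```python
import dis

source = "a = 11345"
code = compile(source, "<example>", "exec")

# Collect the opcode names the compiler emitted for the assignment.
opnames = [instr.opname for instr in dis.get_instructions(code)]
print(opnames)

# Run the code with an explicit namespace dict standing in for f_locals.
namespace = {}
exec(code, namespace)
print(namespace["a"])
```

    The opcode list contains STORE_NAME, and after exec the dict holds the key "a" with the value 11345, which mirrors the PyDict_SetItem(x, w, v) call in the C listing.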

    Once a variable has been set, the del keyword can remove it:

$ python
Python 2.7.8 (default, Feb 20 2015, 12:54:46) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 11345
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 
 '__doc__': None, 'a': 11345, '__package__': None}
>>> del a
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 
 '__doc__': None, '__package__': None}
>>> print a
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'a' is not defined
>>> quit()
$


    The output shows that after del a, the variable is removed from the f_locals dict (its contents can be inspected with the locals function). Accessing it again with print a reports the error name 'a' is not defined.

    The syntax of del is:

del var1[, var2[, var3[...., varN]]]


    As the syntax shows, a single del statement can delete several variables at once:

$ python
Python 2.7.8 (default, Feb 20 2015, 12:54:46) 
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 11345
>>> b = 45678
>>> c = 33456
>>> locals()
{'a': 11345, 'c': 33456, 'b': 45678, '__builtins__': <module '__builtin__' (built-in)>, '__package__': None, '__name__': '__main__', '__doc__': None}
>>> del a, b, c
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__package__': None, '__name__': '__main__', '__doc__': None}
>>> quit()
$


    In the Python source, del is carried out by the following C code in Python/ceval.c (starting at line 1965 in version 2.7.8):

case DELETE_NAME:
	w = GETITEM(names, oparg);
	if ((x = f->f_locals) != NULL) {
		if ((err = PyObject_DelItem(x, w)) != 0)
			format_exc_check_arg(PyExc_NameError,
					     NAME_ERROR_MSG,
					     w);
		break;
	}
	PyErr_Format(PyExc_SystemError,
		     "no locals when deleting %s",
		     PyObject_REPR(w));
	break;


    Here w is a string object holding the variable name. PyObject_DelItem removes the key-value pair for that name from the f_locals dict.
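
    Both branches of this C code can be exercised from Python. The sketch below is written for Python 3.4 or later and again uses a plain dict, named namespace for illustration, in place of f_locals. The second delete fails because the key is already gone, which corresponds to the NameError branch.

```python
import dis

code = compile("del a", "<example>", "exec")
opnames = [instr.opname for instr in dis.get_instructions(code)]

namespace = {"a": 11345}
exec(code, namespace)           # removes the "a" entry from the dict
removed = "a" not in namespace

try:
    exec(code, namespace)       # second delete: the key no longer exists
    second_error = None
except NameError as exc:
    second_error = exc

print(opnames, removed, second_error)
```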

Long integer type:

    The C source for Python's long integer type is in Objects/longobject.c. The following example shows how it works:

$ gdb python -q
..................................................
>>> a = 1232323232344566777886555L
Program received signal SIGINT, Interrupt.
0xb7ee09f8 in ___newselect_nocancel () from /lib/libc.so.6
(gdb) b PyLong_FromString 
Breakpoint 1 at 0x8088445: file Objects/longobject.c, line 1717.
(gdb) c
Continuing.
(Press Enter here so that the a = 123232.... statement above runs
and hits the breakpoint set in longobject.c.)

Breakpoint 1, PyLong_FromString (str=0xb7ca0118 "1232323232344566777886555L", 
    pend=0x0, base=0) at Objects/longobject.c:1717
1717	    int sign = 1;
(gdb) until 1977
PyLong_FromString (str=0xb7ca0132 "", pend=0x0, base=10)
    at Objects/longobject.c:1977
1977	    return (PyObject *) z;
(gdb) ptype z
type = struct _longobject {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    Py_ssize_t ob_size;
    digit ob_digit[1];
} *
(gdb) p* (PyLongObject *)z
$1 = {_ob_next = 0xb7ca1114, _ob_prev = 0x81c5960, ob_refcnt = 1, 
  ob_type = 0x81c4440, ob_size = 6, ob_digit = {14171}}
(gdb) p z->ob_digit[0]
$2 = 14171
(gdb) p z->ob_digit[1]
$3 = 15083
(gdb) p z->ob_digit[2]
$4 = 1098
(gdb) p z->ob_digit[3]
$5 = 674
(gdb) p z->ob_digit[4]
$6 = 20294
(gdb) p z->ob_digit[5]
$7 = 32
(gdb) c
Continuing.
[40761 refs]
>>> 14171 * (32768**0) + 15083 * (32768**1) + 1098 * (32768**2) + 674 * (32768**3) + 20294 * (32768**4) + 32 * (32768**5)
1232323232344566777886555L
..................................................
$


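    When a = 1232323232344566777886555L runs, Python calls PyLong_FromString to convert the decimal number 1232323232344566777886555L to base 32768 and stores each base-32768 digit in the array behind the ob_digit field of PyLongObject. In the example above, the first element, 14171, is the least significant digit and the sixth element, 32, is the most significant. The number of elements is recorded in ob_size. In the C code for this build, 32768 is defined as the PyLong_BASE macro in the header Include/longintrepr.h: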

..................................................
#define PyLong_SHIFT	15
..................................................
#define PyLong_BASE	((digit)1 << PyLong_SHIFT)
..................................................


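    PyLong_BASE is 1 shifted left by 15 bits, that is 2 to the 15th power, which is the 32768 mentioned above.

    When writing a long integer in Python, end it with an uppercase "L"; a lowercase "l" is easily confused with the digit 1.

    Many forum posts claim that Python's long integers have no size limit. The C source shows, however, that the number of base-32768 digits is bounded, as can be seen in the _PyLong_New function in Objects/longobject.c: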
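
    The digit array printed by gdb can be reproduced with ordinary arithmetic. The sketch below assumes the 15-bit digit size shown in the listing above; to_digits and from_digits are hypothetical helper names, and the code only mirrors the storage layout, not the actual parsing algorithm in PyLong_FromString. The literal is written without the trailing L so that it also runs on Python 3.

```python
SHIFT = 15                  # mirrors PyLong_SHIFT in the listing above
BASE = 1 << SHIFT           # 32768, mirrors PyLong_BASE

def to_digits(n):
    """Split a non-negative integer into base-32768 digits, lowest first."""
    digits = []
    while True:
        n, low = divmod(n, BASE)
        digits.append(low)
        if n == 0:
            return digits

def from_digits(digits):
    """Rebuild the integer from its base-32768 digits."""
    return sum(d << (SHIFT * i) for i, d in enumerate(digits))

digits = to_digits(1232323232344566777886555)
print(digits)               # [14171, 15083, 1098, 674, 20294, 32]
```

    The six values match ob_digit[0] through ob_digit[5] in the gdb session, and the length of the list matches ob_size = 6.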

/* Allocate a new long int object with size digits.
   Return NULL and set exception if we run out of memory. */

#define MAX_LONG_DIGITS \
    ((PY_SSIZE_T_MAX - offsetof(PyLongObject, ob_digit))/sizeof(digit))

PyLongObject *
_PyLong_New(Py_ssize_t size)
{
    if (size > (Py_ssize_t)MAX_LONG_DIGITS) {
        PyErr_SetString(PyExc_OverflowError,
                        "too many digits in integer");
        return NULL;
    }
    /* coverity[ampersand_in_size] */
    /* XXX(nnorwitz): PyObject_NEW_VAR / _PyObject_VAR_SIZE need to detect
       overflow */
    return PyObject_NEW_VAR(PyLongObject, &PyLong_Type, size);
}


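    When size (the number of base-32768 digits) exceeds the MAX_LONG_DIGITS limit, an OverflowError with the message "too many digits in integer" is raised. The bound is close to the size of the address space, so in practice memory normally runs out first.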
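
    A rough feel for the size of this bound can be had from Python. The sketch below is an approximation only: it assumes sys.maxsize equals PY_SSIZE_T_MAX and it ignores the offsetof(PyLongObject, ob_digit) term, which cannot be read from pure Python, so the result is slightly larger than the real macro. The variable names are made up for illustration.

```python
import sys

# sys.int_info describes how this interpreter stores integers
# (Python 2.7 exposes the same data as sys.long_info).
info = getattr(sys, "int_info", None) or sys.long_info

# Upper bound only: the real macro also subtracts offsetof(PyLongObject, ob_digit).
approx_max_digits = sys.maxsize // info.sizeof_digit
approx_max_bits = approx_max_digits * info.bits_per_digit

print(info.bits_per_digit, info.sizeof_digit)
print(approx_max_digits, approx_max_bits)
```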

Floating-point type:

    The C source for Python's float objects is in Objects/floatobject.c, as the following example shows:

$ gdb -q python
...................................................
>>> a = 123.456789
Program received signal SIGINT, Interrupt.
0xb7ee09f8 in ___newselect_nocancel () from /lib/libc.so.6
(gdb) b PyFloat_FromDouble 
Breakpoint 1 at 0x807730b: file Objects/floatobject.c, line 145.
(gdb) c
Continuing.
(Press Enter again here so that the a = 123.456789 statement above runs
and hits the breakpoint set in floatobject.c.)

Breakpoint 1, PyFloat_FromDouble (fval=123.456789) at Objects/floatobject.c:145
145	    if (free_list == NULL) {
(gdb) until 153
PyFloat_FromDouble (fval=123.456789) at Objects/floatobject.c:153
153	    op->ob_fval = fval;
(gdb) n
154	    return (PyObject *) op;
(gdb) ptype op
type = struct {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    double ob_fval;
} *
(gdb) p * (PyFloatObject *)op
$1 = {_ob_next = 0xb7ca1114, _ob_prev = 0x81c5960, ob_refcnt = 1, 
  ob_type = 0x81c2800, ob_fval = 123.456789}
...................................................
$


    When a = 123.456789 runs, the C function PyFloat_FromDouble is called to create a PyFloatObject; its ob_fval field stores the floating-point value.
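
    Since ob_fval is a C double, the value can be inspected as raw bytes with the struct module. The sketch below assumes the usual 8-byte IEEE 754 double, which is what CPython uses on common platforms.

```python
import struct
import sys

value = 123.456789

# Pack the value as an 8-byte little-endian double and read it back.
raw = struct.pack("<d", value)
restored = struct.unpack("<d", raw)[0]

print(len(raw), restored == value)
print(sys.float_info.mant_dig)   # bits of precision in the underlying double
```

    The round trip is exact because packing only copies the bits of the double; no decimal conversion is involved.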

Complex type:

    The C source for Python's complex numbers is in Objects/complexobject.c:

$ gdb -q python
....................................................
>>> a = 12 + 34j
Program received signal SIGINT, Interrupt.
0xb7ee09f8 in ___newselect_nocancel () from /lib/libc.so.6
(gdb) b PyComplex_FromCComplex 
Breakpoint 1 at 0x8161543: file Objects/complexobject.c, line 235.
(gdb) c
Continuing.
(Press Enter here so that the a = 12 + 34j statement above runs
and hits the breakpoint set in complexobject.c.)

Breakpoint 1, PyComplex_FromCComplex (cval=...) at Objects/complexobject.c:235
235	    op = (PyComplexObject *) PyObject_MALLOC(sizeof(PyComplexObject));
(gdb) p cval (on the first call, only imag, the imaginary part, is passed in)
$1 = {real = 0, imag = 34}
(gdb) c
Continuing.

Breakpoint 1, PyComplex_FromCComplex (cval=...) at Objects/complexobject.c:235
235	    op = (PyComplexObject *) PyObject_MALLOC(sizeof(PyComplexObject));
(gdb) p cval (on the second call, both real and imag are passed in)
$2 = {real = 12, imag = 34}
(gdb) until 239
PyComplex_FromCComplex (cval=...) at Objects/complexobject.c:239
239	    op->cval = cval;
(gdb) n
240	    return (PyObject *) op;
(gdb) ptype op
type = struct {
    struct _object *_ob_next;
    struct _object *_ob_prev;
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
    Py_complex cval;
} *
(gdb) p * (PyComplexObject *)op
$1 = {_ob_next = 0xb7ca1b18, _ob_prev = 0x81c5960, ob_refcnt = 1, 
  ob_type = 0x81ebd40, cval = {real = 12, imag = 34}}
....................................................
$


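    The output shows that when a complex number is created, Python calls PyComplex_FromCComplex to allocate a PyComplexObject, whose cval field stores the real part and the imaginary part. Objects/complexobject.c also defines the arithmetic operations on complex numbers, such as addition, subtraction, multiplication, and division: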

static PyObject *
complex_add(PyObject *v, PyObject *w)
{
    Py_complex result;
    Py_complex a, b;
    TO_COMPLEX(v, a);
    TO_COMPLEX(w, b);
    PyFPE_START_PROTECT("complex_add", return 0)
    result = c_sum(a, b);
    PyFPE_END_PROTECT(result)
    return PyComplex_FromCComplex(result);
}

static PyObject *
complex_sub(PyObject *v, PyObject *w)
{
    ................................................
}

static PyObject *
complex_mul(PyObject *v, PyObject *w)
{
    ................................................
}

static PyObject *
complex_div(PyObject *v, PyObject *w)
{
    ................................................
}
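
    The c_sum call inside complex_add adds the two numbers component by component. A minimal Python sketch of that idea follows; it models a Py_complex as a (real, imag) tuple, which is an assumption made for illustration and not how CPython stores the value.

```python
def c_sum(a, b):
    """Add two (real, imag) pairs component by component."""
    return (a[0] + b[0], a[1] + b[1])

left = (12.0, 34.0)      # stands in for the Py_complex of 12 + 34j
right = (1.5, -4.0)      # stands in for the Py_complex of 1.5 - 4j

total = c_sum(left, right)
print(total)
print(complex(*total) == complex(*left) + complex(*right))
```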


    For background on complex numbers, see the Wikipedia article on complex numbers.

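Conversion functions for numeric types:

    The numeric types described above can be converted to one another with Python functions:
  • int(x): converts x to an integer.
  • long(x): converts x to a long integer.
  • float(x): converts x to a floating-point number.
  • complex(x): converts x to a complex number with real part x and imaginary part 0.
  • complex(x, y): converts x and y to a complex number with real part x and imaginary part y.
    Here is a simple example: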

$ gdb -q python
....................................................
>>> int(123.456)
123
>>> float(123)
123.0
>>> long(123.5)
123L
>>> complex(12, 34)
(12+34j)
....................................................
$
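
    The same conversions can be checked in a script. The sketch below is written for Python 3, where long no longer exists and int covers both ranges, so long(x) is left out. It also shows that int truncates toward zero for negative values.

```python
as_int = int(123.456)          # fractional part is discarded
as_negative = int(-123.456)    # truncation is toward zero, not toward -infinity
as_float = float(123)
as_complex_one = complex(12)   # imaginary part defaults to 0
as_complex_two = complex(12, 34)

print(as_int, as_negative, as_float, as_complex_one, as_complex_two)
```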


    To see which C functions these calls reach inside Python, look at the following gdb session:

$ gdb -q python
....................................................
>>> int(123.456)

Breakpoint 1, type_call (type=0x81c2d80, args=0xb7d45424, kwds=0x0)
    at Objects/typeobject.c:729
729	    obj = type->tp_new(type, args, kwds);
(gdb) s
int_new (type=0x81c2d80, args=0xb7d45424, kwds=0x0) at Objects/intobject.c:1067
1067	    PyObject *x = NULL;
(gdb) c
Continuing.
123
[40762 refs]
>>> float(123)

Breakpoint 1, type_call (type=0x81c2800, args=0xb7d45424, kwds=0x0)
    at Objects/typeobject.c:729
729	    obj = type->tp_new(type, args, kwds);
(gdb) s
float_new (type=0x81c2800, args=0xb7d45424, kwds=0x0)
    at Objects/floatobject.c:1804
1804	    PyObject *x = Py_False; /* Integer zero */
(gdb) c
Continuing.
123.0
[40762 refs]
>>> long(123.5)

Breakpoint 1, type_call (type=0x81c4440, args=0xb7d45424, kwds=0x0)
    at Objects/typeobject.c:729
729	    obj = type->tp_new(type, args, kwds);
(gdb) s
long_new (type=0x81c4440, args=0xb7d45424, kwds=0x0)
    at Objects/longobject.c:4005
4005	    PyObject *x = NULL;
(gdb) c
Continuing.
123L
[40763 refs]
>>> complex(12, 34)

Breakpoint 1, type_call (type=0x81ebd40, args=0xb7d3a84c, kwds=0x0)
    at Objects/typeobject.c:729
729	    obj = type->tp_new(type, args, kwds);
(gdb) s
complex_new (type=0x81ebd40, args=0xb7d3a84c, kwds=0x0)
    at Objects/complexobject.c:1134
1134	    PyNumberMethods *nbr, *nbi = NULL;
(gdb) c
Continuing.
(12+34j)
....................................................
$


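    Each of them goes through type_call to its own xxxx_new function (a C function such as complex_new), which does the actual conversion.

    That is all for this article; the next one covers Python's math-related functions.

    OK, time for a break o(∩_∩)o~~

    The most essential value of life is a person's independence.

——  布迪曼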
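
    The dispatch from the type call to the type's own new function can also be observed at the Python level. The sketch below uses a hypothetical int subclass, TracedInt, that records each trip through __new__; type.__call__ is the Python-visible counterpart of type_call, and __new__ plays the role of tp_new.

```python
calls = []

class TracedInt(int):
    """int subclass that records each trip through __new__."""
    def __new__(cls, value):
        calls.append(("__new__", cls.__name__, value))
        return super(TracedInt, cls).__new__(cls, value)

direct = TracedInt(123.456)                    # normal call syntax
explicit = type.__call__(TracedInt, 123.456)   # the same dispatch, spelled out

print(direct, explicit, len(calls))
```

    Both spellings reach __new__ once each, which is why calls ends up with two entries.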
