
Python Multithreading


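Python's multithreading can speed up a program by running several tasks concurrently, particularly I/O-bound ones, and the multiprocessing module lets a program take advantage of multiple cores. The sections below introduce two common ways to write multithreaded code, along with thread synchronization.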

The threading module

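The Python documentation usually illustrates multithreading with the threading module. There are two main ways to write a multithreaded program with it, described in turn below.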

1. Subclassing threading.Thread

In this approach, you subclass threading.Thread and override its run() method. The following example uses a summation to illustrate:

import threading


class SummingThread(threading.Thread):
    """Thread that sums the integers in the half-open range [low, high)."""

    def __init__(self, low, high):
        super(SummingThread, self).__init__()
        self.low = low
        self.high = high
        self.total = 0

    def run(self):
        # Executed in the new thread once start() is called
        for i in range(self.low, self.high):
            self.total += i


thread1 = SummingThread(0, 500000)
thread2 = SummingThread(500000, 1000000)
thread1.start()  # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print(result)

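In this program, the two threads each sum a different half of the range. The main thread waits for both to finish and then adds the two partial sums to get the final result.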

2. Creating threads directly

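In this approach, you create threading.Thread objects directly, passing the function to run as target and its arguments as args.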

import threading
from random import randint
from time import sleep


def print_number(number):
    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print("Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds")


thread_list = []

for i in range(1, 10):
    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main thread waited for the other threads to complete
print("Done")

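Thread synchronization in this scheme requires locking shared resources, so that concurrent reads and writes from multiple threads do not corrupt them. For details, see "Python的多线程" (Multithreading in Python).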
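As a minimal sketch of the locking mentioned above, the following uses threading.Lock to protect a shared counter. The names counter, counter_lock, and increment, as well as the thread and iteration counts, are illustrative and do not come from the original article.

```python
import threading

counter = 0                      # shared state (hypothetical example variable)
counter_lock = threading.Lock()  # guards every read-modify-write of counter


def increment(times):
    """Add 1 to the shared counter `times` times, holding the lock for each update."""
    global counter
    for _ in range(times):
        with counter_lock:       # acquired here, released automatically on exit
            counter += 1


threads = [threading.Thread(target=increment, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000
```

Without the lock, `counter += 1` is a read-modify-write sequence that is not guaranteed to be atomic, so updates from different threads could be lost.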

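Note: because of the GIL (global interpreter lock) in CPython, neither of the two approaches above is truly parallel. Only one thread executes Python bytecode at a time and the interpreter switches between threads, so CPU-bound code still runs serially overall. Threads do help with I/O-bound work, because the GIL is released while a thread waits on I/O. For true parallelism, use the second option introduced below, the multiprocessing module.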

The multiprocessing module

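This approach maps a serial task onto a pool of workers. With multiprocessing.Pool the workers are separate processes that run on multiple cores, which gives true parallelism. multiprocessing.dummy.Pool, used in the examples here, offers the same interface but is backed by threads, so it is still subject to the GIL and is best suited to I/O-bound work such as downloading web pages.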
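For CPU-bound work, a process-backed pool might look like the following sketch, which reuses the summation from the threading example. The helper names partial_sum and parallel_sum are hypothetical, and the chunking scheme is just one simple choice.

```python
from multiprocessing import Pool


def partial_sum(bounds):
    """Sum the integers in the half-open range [low, high)."""
    low, high = bounds
    return sum(range(low, high))


def parallel_sum(n, workers=2):
    """Split range(n) into `workers` chunks and sum each chunk in its own process."""
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]
    chunks[-1] = (chunks[-1][0], n)  # last chunk absorbs any remainder
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))


if __name__ == "__main__":  # needed where worker processes start by re-importing this module
    print(parallel_sum(1000000))  # 499999500000
```

The function passed to pool.map must be defined at module level so that it can be pickled and sent to the worker processes.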

The usual pattern is very simple:

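from multiprocessing.dummy import Pool as ThreadPool

# my_function and my_array are placeholders for your own callable and input list
pool = ThreadPool(4)  # a pool of 4 worker threads
results = pool.map(my_function, my_array)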

These three lines are the concurrent version of the following serial program:

results = []
for item in my_array:
    results.append(my_function(item))
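To make the equivalence concrete, here is a small sketch that runs both versions on the same input and compares the results. The function square and the list numbers are hypothetical stand-ins for my_function and my_array.

```python
from multiprocessing.dummy import Pool as ThreadPool


def square(x):
    """Stand-in for my_function (hypothetical)."""
    return x * x


numbers = [1, 2, 3, 4, 5]  # stand-in for my_array

# Serial version
serial_results = []
for item in numbers:
    serial_results.append(square(item))

# Thread-pool version
pool = ThreadPool(4)
pooled_results = pool.map(square, numbers)
pool.close()
pool.join()

print(serial_results)                    # [1, 4, 9, 16, 25]
print(pooled_results == serial_results)  # True
```

pool.map returns its results in the same order as the input, regardless of which worker finishes first.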

The following is a more concrete example, which opens several web pages for reading.

from urllib.request import urlopen  # Python 3; the Python 2 equivalent is urllib2.urlopen
from multiprocessing.dummy import Pool as ThreadPool

urls = [
    'http://www.python.org',
    'http://www.python.org/about/',
    'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
    'http://www.python.org/doc/',
    'http://www.python.org/download/',
    'http://www.python.org/getit/',
    'http://www.python.org/community/',
    'https://wiki.python.org/moin/',
]

# make the Pool of workers
pool = ThreadPool(4)

# open the urls in their own threads
# and return the results
results = pool.map(urlopen, urls)

# close the pool and wait for the work to finish
pool.close()
pool.join()

The program uses the map operation to distribute the tasks across the workers in the pool:

[Figure: schematic diagram]

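The number of workers in the pool should be set according to the actual CPU. For I/O-bound work such as this, it can usefully exceed the number of cores. In practice, benchmark a few values and pick the one that performs best.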
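One way to run such a test is sketched below. It times the same batch of tasks with several pool sizes, using a short sleep as a hypothetical stand-in for a real download. The names fake_download and time_pool, the task count, and the candidate sizes are all assumptions made for illustration.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_download(task_id):
    """Simulate an I/O-bound task by sleeping briefly (hypothetical stand-in)."""
    time.sleep(0.05)
    return task_id


def time_pool(size, tasks):
    """Run the tasks on a thread pool of the given size; return (elapsed seconds, results)."""
    pool = ThreadPool(size)
    start = time.perf_counter()
    results = pool.map(fake_download, tasks)
    elapsed = time.perf_counter() - start
    pool.close()
    pool.join()
    return elapsed, results


tasks = list(range(16))
timings = {}
for size in (1, 2, 4, 8):
    elapsed, results = time_pool(size, tasks)
    timings[size] = elapsed
    print("pool size %d: %.2f s" % (size, elapsed))

best_size = min(timings, key=timings.get)
print("fastest pool size:", best_size)
```

With a real workload, replace fake_download with the actual task and repeat the measurement a few times, since network and system load make single timings noisy.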

References: