Python Multithreading

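Running work in multiple threads can make a Python program faster, most reliably when the program spends much of its time waiting on I/O. This post covers two common ways to write multithreaded code, thread synchronization, and how to get real multi-core parallelism.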

The threading module

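The multithreading examples in the Python documentation generally use the threading module. There are two main ways to write a multithreaded program with it, described in turn below.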

1. Subclassing threading.Thread

In this approach you subclass threading.Thread and override its run() method. The example below sums a range of integers:

import threading

class SummingThread(threading.Thread):
    def __init__(self, low, high):
        super(SummingThread, self).__init__()
        self.low = low
        self.high = high
        self.total = 0

    def run(self):
        # Runs in the new thread once start() is called
        for i in range(self.low, self.high):
            self.total += i


thread1 = SummingThread(0, 500000)
thread2 = SummingThread(500000, 1000000)
thread1.start()  # This actually causes the thread to run
thread2.start()
thread1.join()  # This waits until the thread has completed
thread2.join()
# At this point, both threads have completed
result = thread1.total + thread2.total
print(result)

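Each of the two threads sums a different half of the range. The main thread waits for both to finish and then adds the two partial sums to get the final result.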
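The same idea extends to any number of threads. The following is a minimal sketch, not part of the original example: the names RangeSumThread and threaded_sum are hypothetical, and the chunking scheme (equal-sized slices with ceiling division) is just one reasonable choice.

```python
import threading


class RangeSumThread(threading.Thread):
    """Sums the integers in [low, high) and stores the result in self.total."""

    def __init__(self, low, high):
        super().__init__()
        self.low = low
        self.high = high
        self.total = 0

    def run(self):
        self.total = sum(range(self.low, self.high))


def threaded_sum(low, high, num_threads=4):
    """Split [low, high) into at most num_threads chunks and sum them in threads."""
    # Ceiling division so the chunks cover the whole range
    step = max(1, (high - low + num_threads - 1) // num_threads)
    threads = [
        RangeSumThread(start, min(start + step, high))
        for start in range(low, high, step)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # a thread's total is only safe to read after join()
    return sum(t.total for t in threads)


if __name__ == "__main__":
    print(threaded_sum(0, 1000000))  # 499999500000
```

Each thread writes only to its own total attribute, so no locking is needed here; the partial results are combined after every thread has been joined.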

2. Creating threads directly

In this approach no subclass is needed: you create a threading.Thread directly and pass it the function to run.

import threading
from random import randint
from time import sleep


def print_number(number):
    # Sleeps a random 1 to 10 seconds
    rand_int_var = randint(1, 10)
    sleep(rand_int_var)
    print("Thread " + str(number) + " slept for " + str(rand_int_var) + " seconds")

thread_list = []

for i in range(1, 10):
    # Instantiates the thread
    # (i) does not make a sequence, so (i,)
    t = threading.Thread(target=print_number, args=(i,))
    # Sticks the thread in a list so that it remains accessible
    thread_list.append(t)

# Starts threads
for thread in thread_list:
    thread.start()

# This blocks the calling thread until the thread whose join() method is called is terminated.
# From http://docs.python.org/2/library/threading.html#thread-objects
for thread in thread_list:
    thread.join()

# Demonstrates that the main thread waited for the worker threads to complete
print("Done")

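With this approach, threads that share resources must synchronize their access, typically with a lock, so that concurrent reads and writes do not corrupt the data. For details, see the article "Python的多线程" (Multithreading in Python).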
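As an illustration of locking a shared resource, here is a minimal sketch using threading.Lock. The SafeCounter class and run_counters function are hypothetical names introduced for this example; they do not come from the article referenced above.

```python
import threading


class SafeCounter:
    """A counter whose increments are protected by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # "with" acquires the lock and releases it even if an exception occurs
        with self._lock:
            self.value += 1


def run_counters(num_threads=4, times=10000):
    """Have num_threads threads each increment one shared counter `times` times."""
    counter = SafeCounter()

    def worker():
        for _ in range(times):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run_counters())  # 40000
```

The statement self.value += 1 is a read, an add, and a write; without the lock, two threads could interleave those steps and lose an update.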

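Note: because of the GIL (global interpreter lock), neither of the two approaches above gives true parallelism. Only one thread executes Python code at a time, and the interpreter switches between threads, so CPU-bound work still runs serially overall. Threads do help with I/O-bound work, because a thread that is blocked waiting on I/O lets other threads run. For true parallelism, use the second option described next: the multiprocessing module.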

The multiprocessing module

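This approach maps tasks that would otherwise run serially onto separate processes, each with its own interpreter and its own GIL, so they can run on different cores in true parallel.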
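A minimal sketch of a process-backed pool for CPU-bound work follows. The prime-counting task and the names count_primes_below and count_primes_parallel are assumptions chosen for illustration; any function defined at module level that takes one argument would work in the same way.

```python
from multiprocessing import Pool


def count_primes_below(n):
    """CPU-bound work: count the primes below n by trial division."""
    count = 0
    for candidate in range(2, n):
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return count


def count_primes_parallel(limits, workers=2):
    """Run count_primes_below on each limit in a pool of worker processes."""
    with Pool(processes=workers) as pool:
        return pool.map(count_primes_below, limits)


if __name__ == "__main__":
    # The guard stops child processes from re-running this block
    # when they import the main module.
    print(count_primes_parallel([10, 100, 1000]))  # [4, 25, 168]
```

Arguments and return values are pickled to travel between processes, so this pays off only when each task does enough computation to outweigh that overhead.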

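The usual pattern is short. Note that multiprocessing.dummy, used here, exposes the same Pool API but backs it with threads; import Pool from multiprocessing itself to get worker processes.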

from multiprocessing.dummy import Pool as ThreadPool 
pool = ThreadPool(4) 
results = pool.map(my_function, my_array)

These three lines are the pooled version of the following serial loop:

results = []
for item in my_array:
    results.append(my_function(item))
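To make the equivalence concrete, here is a small runnable sketch. The body of my_function (squaring) and the contents of my_array are placeholders invented for this example; the original leaves both undefined.

```python
from multiprocessing.dummy import Pool as ThreadPool


def my_function(item):
    """Stand-in for real work; here it just squares its input."""
    return item * item


my_array = [1, 2, 3, 4, 5]

# Serial version: one item at a time
serial_results = []
for item in my_array:
    serial_results.append(my_function(item))

# Pooled version: same function, same inputs, results returned in input order
pool = ThreadPool(4)
pooled_results = pool.map(my_function, my_array)
pool.close()
pool.join()

print(serial_results == pooled_results)  # True
```

pool.map preserves the order of its input, so the two result lists match even though the pooled calls may finish in a different order.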

A more concrete example follows. This program opens several web pages concurrently.

from urllib.request import urlopen  # Python 3; on Python 2 use urllib2.urlopen
from multiprocessing.dummy import Pool as ThreadPool

urls = [
  'http://www.python.org',
  'http://www.python.org/about/',
  'http://www.onlamp.com/pub/a/python/2003/04/17/metaclasses.html',
  'http://www.python.org/doc/',
  'http://www.python.org/download/',
  'http://www.python.org/getit/',
  'http://www.python.org/community/',
  'https://wiki.python.org/moin/',
  ]

# make the Pool of workers (threads, since this is multiprocessing.dummy)
pool = ThreadPool(4)

# open the urls in their own threads
# and return the results
results = pool.map(urlopen, urls)

# close the pool and wait for the work to finish
pool.close()
pool.join()
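If any single request raises an exception, pool.map re-raises it and the other results are lost. One way to handle that is to wrap each fetch so that failures are returned rather than raised. This is a sketch under assumptions: fetch and fetch_all are hypothetical helpers, the 10-second timeout is arbitrary, and the code targets Python 3's urllib.request.

```python
from multiprocessing.dummy import Pool as ThreadPool
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Return (url, body_bytes, None) on success or (url, None, error) on failure."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return url, response.read(), None
    except Exception as exc:  # broad on purpose: report the failure, keep the batch
        return url, None, exc


def fetch_all(urls, workers=4, fetch_one=fetch):
    """Fetch every URL in a thread pool, preserving input order."""
    pool = ThreadPool(workers)
    try:
        return pool.map(fetch_one, urls)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    for url, body, error in fetch_all(['http://www.python.org']):
        if error is None:
            print(url, len(body), "bytes")
        else:
            print(url, "failed:", error)
```

The fetch_one parameter exists only so that a stub can be substituted when trying the pool logic without network access.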

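The map call hands the URLs out to the pool's workers, which run them concurrently. Because this example imports Pool from multiprocessing.dummy, the workers are threads, which suits I/O-bound work such as fetching pages. To spread CPU-bound tasks across cores, import Pool from multiprocessing instead.

Choose the pool size (4 here) with the machine's CPU count in mind. In practice, benchmark a few sizes and pick the one that performs best.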
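One way to run such a test is to time the same batch of tasks at several pool sizes. The sketch below uses time.sleep as a stand-in for a network call; simulated_io, time_pool, and the chosen delays are all assumptions for illustration, and real workloads should be measured with their real tasks.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def simulated_io(delay):
    """Pretend to wait on a network call by sleeping for `delay` seconds."""
    time.sleep(delay)
    return delay


def time_pool(size, delays):
    """Return the wall-clock seconds needed to map simulated_io over delays."""
    pool = ThreadPool(size)
    start = time.perf_counter()
    pool.map(simulated_io, delays)
    elapsed = time.perf_counter() - start
    pool.close()
    pool.join()
    return elapsed


if __name__ == "__main__":
    delays = [0.1] * 8
    for size in (1, 2, 4, 8):
        print(size, round(time_pool(size, delays), 2))
```

With sleep-based tasks the elapsed time should drop as the pool grows, up to the number of tasks. CPU-bound tasks in a process pool usually stop improving near the number of cores.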

Reference:

How to use threading in Python?