multiprocessing.dummy to implement multi-process and multi-threaded code, and adds coverage of ThreadPoolExecutor and related tools.
Just Add One Line
I won't say much here about the pros and cons of multiprocessing versus threading, because the key point is that we want our code to run in parallel :-D
Read the article [parallelism-in-one-line](http://chriskiehl.com/article/parallelism-in-one-line/) to get a rough impression.
Here is an example:
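A minimal sketch of the "one line" idea, assuming a CPU-bound stand-in workload: the digit sum of `n!` replaces the file processing of the original benchmark, and `work`, `run_serial`, `run_processes` and `run_threads` are hypothetical names. `multiprocessing.dummy.Pool` exposes the same API as `multiprocessing.Pool` but is backed by threads, so switching between processes and threads is a one-line change.

```python
import math
import time
from multiprocessing import Pool                       # worker processes
from multiprocessing.dummy import Pool as ThreadPool   # same API, worker threads


def work(n):
    # Hypothetical CPU-bound task: digit sum of n!
    return sum(int(d) for d in str(math.factorial(n)))


def run_serial(items):
    return [work(x) for x in items]


def run_processes(items, workers=4):
    # The "one line": replace the loop with pool.map
    with Pool(workers) as pool:
        return pool.map(work, items)


def run_threads(items, workers=4):
    with ThreadPool(workers) as pool:
        return pool.map(work, items)


if __name__ == "__main__":
    items = list(range(1, 301))
    for label, runner in [("normal case", run_serial),
                          ("multi-process", run_processes),
                          ("multi-thread", run_threads)]:
        start = time.perf_counter()
        runner(items)
        print(f"{label}: {time.perf_counter() - start:.6f}")
```

The `if __name__ == "__main__":` guard matters: with the `spawn` start method (the default on Windows and macOS), child processes re-import the main module.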
A few things to note:
- If the generated file list is shuffled (`shuffle`) so that the files are processed in random order, every timing gets longer. This suggests that the threads are I/O-bound when they manipulate the files.
- Line 22 computes the factorials of 1 through 300, a typical CPU-bound workload. In this case, with 403345 files, the timings are:
  - normal case: 16.230872
  - multi-process: 4.874608
  - multi-thread: 21.812273

  Multi-processing has a clear advantage here: since 4 < 16 < 21, the choice is easy. If we comment out line 22, with 403365 files the timings become:
  - normal case: 0.182251
  - multi-process: 0.39652
  - multi-thread: 0.234787

  Without the CPU-heavy step, multi-processing becomes the slowest of the three, likely because the overhead of starting and coordinating worker processes now dominates.
- To be safe, you can call `pool.close()` and `pool.join()` after `map()`. `close()` prevents any more tasks from being submitted to the pool; once all the tasks have been completed, the worker processes will exit. `join()` waits for the worker processes to exit (it must be called after `close()` or `terminate()`).
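A minimal sketch of calling `close()` and `join()` explicitly; `square` and `squares` are hypothetical helpers. The `try`/`finally` makes sure the pool is shut down even if `map()` raises.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def squares(items, workers=2):
    pool = Pool(workers)
    try:
        results = pool.map(square, items)  # blocks until every task has finished
    finally:
        pool.close()  # no more tasks can be submitted; workers exit when done
        pool.join()   # wait for the worker processes to exit
    return results


if __name__ == "__main__":
    print(squares(range(10)))
```

A `with Pool() as pool:` block is more concise, but its `__exit__` calls `terminate()` rather than `close()`/`join()`, which is fine once `map()` has already returned its results.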
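The role of `Lock` is to guard sensitive functions or variables, ensuring that only one process/thread runs them at a time. Because a `Lock` cannot be pickled and sent to pool workers, it cannot be passed as an argument to `map()`; the workaround is to use a global variable directly (this relies on the `fork` start method, where child processes inherit the parent's globals).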
An example is as follows:
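A minimal sketch, assuming the goal is to let several processes append to one file without interleaving their writes. Rather than relying on a module-level lock inherited through `fork`, it swaps in a common alternative: the pool's `initializer` stores the lock in a global inside each worker, which also works with the `spawn` start method. `init_worker`, `append_line` and `write_squares` are hypothetical names. Only the file write sits inside `with lock:`, so the rest of the work stays parallel.

```python
import os
import tempfile
from multiprocessing import Lock, Pool

lock = None  # replaced in every worker by init_worker


def init_worker(shared_lock):
    # Runs once per worker process: keep the lock in a module-level global
    global lock
    lock = shared_lock


def append_line(job):
    path, value = job
    result = value * value      # computed outside the lock, in parallel
    with lock:                  # only the shared file write is serialized
        with open(path, "a") as f:
            f.write(f"{value} {result}\n")
    return result


def write_squares(path, values, workers=4):
    shared_lock = Lock()
    with Pool(workers, initializer=init_worker, initargs=(shared_lock,)) as pool:
        return pool.map(append_line, [(path, v) for v in values])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = os.path.join(tmp, "squares.txt")
        write_squares(out, range(10))
        with open(out) as f:
            print(f.read(), end="")
```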
`Lock` implements the context-manager protocol, so `with lock:` can be compared to Java's `synchronized` keyword. During testing, I found that if the whole function being executed was placed under the lock, multi-process execution was even slower than sequential execution: the lock serializes the work, while the processes still add their own overhead.
Besides `map`/`map_async`, the `multiprocessing` pool also provides `apply`/`apply_async`. The main differences are:
- `map`/`map_async` accept an iterable with any number of elements and send each element to the target function as a separate task. `apply`/`apply_async` make a single call, with its arguments packed into one tuple.
- `map`/`apply` are blocking: they return only once the work has finished (for `map`, that means after the slowest task). `map_async`/`apply_async` return an `AsyncResult` object as soon as the work is submitted, and you call its `.get()` method to retrieve the result.
- `map_async`/`apply_async` take one more parameter than `map`/`apply`: `callback`. The callback is a function that accepts a single argument (the result) and is used for follow-up work such as writing to a file or saving to a database.
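A small sketch of these call styles side by side; `power` and `demo` are hypothetical, and the callback simply appends each result to a list.

```python
from multiprocessing import Pool


def power(base, exp):
    return base ** exp


def demo():
    collected = []  # the callback runs in the parent, so a plain list works
    with Pool(2) as pool:
        # apply: a single call, arguments packed in a tuple; blocks until done
        blocking = pool.apply(power, (2, 10))
        # apply_async: returns an AsyncResult at once; .get() waits for the value
        pending = pool.apply_async(power, (3, 4), callback=collected.append)
        async_value = pending.get(timeout=10)
        # map_async: one task per element, results kept in input order
        absolutes = pool.map_async(abs, [-1, -2, -3]).get(timeout=10)
    return blocking, async_value, absolutes, collected


if __name__ == "__main__":
    print(demo())
```

The callback runs in a result-handling thread of the parent process, not in the worker, so it should return quickly.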
My two favorites are `map` and `apply_async`: `map` returns results in the same order as the input, while `apply_async` results (and their callbacks) arrive in completion order, independent of the input order.
Here is a simple comparison.

With `pool.apply_async`, each task reports its result the moment it finishes:

    1 occurs
    2 occurs
    3 occurs
    4 occurs
    calculate 1 result is 2
    5 occurs
    calculate 2 result is 4
    6 occurs
    calculate 3 result is 8
    7 occurs
    calculate 4 result is 16
    8 occurs
    calculate 5 result is 32
    9 occurs
    calculate 6 result is 64
    calculate 7 result is 128
    calculate 8 result is 256
    calculate 9 result is 512
    Total cost time is: 5.700147

With `pool.map`, results appear only after all tasks have finished:

    1 occurs
    2 occurs
    3 occurs
    4 occurs
    5 occurs
    6 occurs
    7 occurs
    8 occurs
    9 occurs
    // pauses for a few seconds here
    calculate 1 result is 2
    calculate 2 result is 4
    calculate 3 result is 8
    calculate 4 result is 16
    calculate 5 result is 32
    calculate 6 result is 64
    calculate 7 result is 128
    calculate 8 result is 256
    calculate 9 result is 512
    Total cost time is: 5.706731
As for performance, the total time stayed essentially constant across many test runs (5.700147 vs. 5.706731 above), so the choice between the two is not a performance bottleneck. `apply_async` is better suited to cases where you want each result as soon as it is ready.
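A sketch that produces output of the same shape as the comparison above, assuming a 4-process pool and a short `time.sleep` in place of the real work; `task`, `report`, `with_apply_async` and `with_map` are hypothetical names, and the exact interleaving of lines will vary between runs.

```python
import time
from multiprocessing import Pool


def task(n):
    print(f"{n} occurs", flush=True)
    time.sleep(0.2)  # stand-in for real work
    return n, 2 ** n


def report(pair):
    n, value = pair
    print(f"calculate {n} result is {value}", flush=True)


def with_apply_async(items, workers=4):
    with Pool(workers) as pool:
        # report() fires as soon as each task finishes
        handles = [pool.apply_async(task, (n,), callback=report) for n in items]
        return [h.get() for h in handles]  # the returned list is in input order


def with_map(items, workers=4):
    with Pool(workers) as pool:
        results = pool.map(task, items)  # returns only when every task is done
    for pair in results:
        report(pair)
    return results


if __name__ == "__main__":
    for runner in (with_apply_async, with_map):
        start = time.perf_counter()
        runner(range(1, 10))
        print(f"Total cost time is: {time.perf_counter() - start:.6f}")
```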
A Complicated Example
The approach above is sufficient in most cases. But if we want finer control over multi-process/multi-thread operations, such as sharing state between workers, we need to dig deeper.
I once wrote an example that runs workers with `mp.Process` and uses `mp.Queue` as the task queue.
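A minimal sketch of the `mp.Process` plus `mp.Queue` pattern, assuming a simple squaring job; `worker`, `run` and the `None` sentinel are illustrative choices, not the original code. Results are drained from the queue before `join()`, because a process that has put items on a queue may not terminate until those buffered items have been consumed.

```python
import multiprocessing as mp

SENTINEL = None  # hypothetical "no more work" marker


def worker(tasks, results):
    # Keep taking numbers until the sentinel shows up
    for n in iter(tasks.get, SENTINEL):
        results.put((n, n * n))


def run(numbers, workers=3):
    tasks, results = mp.Queue(), mp.Queue()
    procs = [mp.Process(target=worker, args=(tasks, results)) for _ in range(workers)]
    for p in procs:
        p.start()
    for n in numbers:
        tasks.put(n)
    for _ in procs:
        tasks.put(SENTINEL)  # one sentinel per worker
    # Drain the results before join(), otherwise join() can hang
    collected = dict(results.get() for _ in range(len(numbers)))
    for p in procs:
        p.join()
    return collected


if __name__ == "__main__":
    print(run(range(10)))
```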
Thread/ProcessPoolExecutor in Python 3
Python 3 provides a high-level abstraction in `concurrent.futures`, which can be used in two ways:
A standard example:
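A minimal sketch of typical `concurrent.futures` usage, assuming the same kind of CPU-bound stand-in as before (`digits_of_factorial` and `run` are hypothetical). Both executors share the `Executor` interface, so the executor class can simply be a parameter.

```python
import math
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def digits_of_factorial(n):
    # Hypothetical CPU-bound task: number of decimal digits in n!
    return len(str(math.factorial(n)))


def run(executor_cls, items, workers=4):
    # Leaving the with-block calls executor.shutdown(wait=True)
    with executor_cls(max_workers=workers) as executor:
        return list(executor.map(digits_of_factorial, items))


if __name__ == "__main__":
    items = range(1, 301)
    print(run(ThreadPoolExecutor, items)[-1])
    print(run(ProcessPoolExecutor, items)[-1])
```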
Adding it to the earlier example gives:
- normal case: 12.71012
- multi-process case: 5.799454
- multi-thread: 18.085609
- ThreadPoolExecutor case: 18.403806
- ProcessPoolExecutor case: 5.764657
In most cases, `ProcessPoolExecutor` is less efficient than `multiprocessing`; for the reason, see the Stack Overflow question processpoolexecutor-from-concurrent-futures-way-slower-than-multiprocessing-pool. In other words, the best use of a PoolExecutor is `submit()`, which lets you pick up each result as soon as it is ready, rather than using it as a drop-in replacement for `mp`.
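One likely factor: `multiprocessing.Pool.map` splits its input into chunks by default, whereas `ProcessPoolExecutor.map` sends one item per task unless you pass `chunksize` (available since Python 3.5). The sketch below, with hypothetical `tiny`, `timed` and `compare` helpers and a deliberately trivial task, lets you compare the three variants on your own machine, since the timings depend heavily on the hardware.

```python
import time
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Pool


def tiny(x):
    return x + 1  # deliberately cheap, so per-task overhead dominates


def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.3f}")
    return result


def compare(n=5000, workers=4):
    items = range(n)
    with Pool(workers) as pool:
        a = timed("Pool.map", lambda: pool.map(tiny, items))
    with ProcessPoolExecutor(max_workers=workers) as ex:
        b = timed("Executor.map, chunksize=1", lambda: list(ex.map(tiny, items)))
        c = timed("Executor.map, chunksize=500",
                  lambda: list(ex.map(tiny, items, chunksize=500)))
    return a, b, c


if __name__ == "__main__":
    compare()
```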
As mentioned above, using `map` with a PoolExecutor adds little, so the rest of this section focuses on `submit`.
The influence of Java is easy to see here. The best thing about a future is that you can act on it as soon as it is done, without blocking until every task has returned. It is a higher-level abstraction than `apply_async`.
Here is the example from the official documentation, slightly modified:
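A sketch in the spirit of the `ThreadPoolExecutor` + `as_completed` example in the Python documentation, with a local `slow_square` function (hypothetical) in place of downloading URLs. The delay shrinks as `n` grows, so later submissions tend to finish first and `as_completed` yields them out of input order. `ProcessPoolExecutor` can be swapped in for picklable, module-level functions.

```python
import concurrent.futures
import time


def slow_square(n):
    # Hypothetical task: larger n sleeps less, so it tends to finish first
    time.sleep(max(0.0, 0.3 - 0.05 * n))
    return n * n


def squares_as_completed(numbers, workers=5):
    finished = []  # (n, result) pairs in completion order
    with concurrent.futures.ThreadPoolExecutor(max_workers=workers) as executor:
        # Map each future back to its input, as the documentation example does
        future_to_n = {executor.submit(slow_square, n): n for n in numbers}
        for future in concurrent.futures.as_completed(future_to_n):
            n = future_to_n[future]
            try:
                result = future.result()
            except Exception as exc:
                print(f"{n} generated an exception: {exc!r}")
            else:
                print(f"{n} -> {result}")
                finished.append((n, result))
    return finished


if __name__ == "__main__":
    squares_as_completed(range(6))
```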