forked from jdb/jdb.github.com
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathfunctional.html
563 lines (531 loc) · 47 KB
/
functional.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>A journey with Pi, Python, a functional algorithm and multicore — bits v0.7 documentation</title>
<link rel="stylesheet" href="static/sphinxdoc.css" type="text/css" />
<link rel="stylesheet" href="static/pygments.css" type="text/css" />
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT: '',
VERSION: '0.7',
COLLAPSE_INDEX: false,
FILE_SUFFIX: '.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="static/jquery.js"></script>
<script type="text/javascript" src="static/underscore.js"></script>
<script type="text/javascript" src="static/doctools.js"></script>
<link rel="top" title="bits v0.7 documentation" href="index.html" />
<link rel="next" title="One application, multiple languages" href="i18n.html" />
<link rel="prev" title="Every bit counts" href="index.html" />
</head>
<body>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
accesskey="I">index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="i18n.html" title="One application, multiple languages"
accesskey="N">next</a> |</li>
<li class="right" >
<a href="index.html" title="Every bit counts"
accesskey="P">previous</a> |</li>
<li><a href="index.html">bits v0.7 documentation</a> »</li>
</ul>
</div>
<div class="sphinxsidebar">
<div class="sphinxsidebarwrapper">
<h3><a href="index.html">Table Of Contents</a></h3>
<ul>
<li><a class="reference internal" href="#">A journey with Pi, Python, a functional algorithm and multicore</a><ul>
<li><a class="reference internal" href="#the-math-of-the-problem">The math of the problem</a></li>
<li><a class="reference internal" href="#a-procedural-and-a-functional-algorithm">A procedural and a functional algorithm</a></li>
<li><a class="reference internal" href="#better-performance-through-lazyness">Better performance through <em>lazyness</em></a></li>
<li><a class="reference internal" href="#optimization-with-processes-and-a-compiled-c-function">Optimization with processes and a compiled C function</a></li>
<li><a class="reference internal" href="#a-fast-algorithm-to-compute-pi">A fast algorithm to compute Pi</a></li>
</ul>
</li>
</ul>
<h4>Previous topic</h4>
<p class="topless"><a href="index.html"
title="previous chapter">Every bit counts</a></p>
<h4>Next topic</h4>
<p class="topless"><a href="i18n.html"
title="next chapter">One application, multiple languages</a></p>
<h3>This Page</h3>
<ul class="this-page-menu">
<li><a href="sources/functional.txt"
rel="nofollow">Show Source</a></li>
</ul>
<div id="searchbox" style="display: none">
<h3>Quick search</h3>
<form class="search" action="search.html" method="get">
<input type="text" name="q" size="18" />
<input type="submit" value="Go" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
<p class="searchtip" style="font-size: 90%">
Enter search terms or a module, class or function name.
</p>
</div>
<script type="text/javascript">$('#searchbox').show(0);</script>
</div>
</div>
<div class="document">
<div class="documentwrapper">
<div class="bodywrapper">
<div class="body">
<div class="section" id="a-journey-with-pi-python-a-functional-algorithm-and-multicore">
<h1>A journey with Pi, Python, a functional algorithm and multicore<a class="headerlink" href="#a-journey-with-pi-python-a-functional-algorithm-and-multicore" title="Permalink to this headline">¶</a></h1>
<p>This article describes a journey with the Python programming language,
and <span class="math">\pi</span>. It begins by the comparison of the <em>procedural</em> and
<em>functional</em> styles of computer programming, illustrated with an
algorithm approximating <span class="math">\pi</span>. The math on which the
approximation relies is interesting because it only requires random
numbers and simple knowledge about circles and squares.</p>
<p>The first part presents the math of the problem, then the second part
compares the differences between the functional and procedural
styles. A quick performance and complexity wall requires us to
introduce the generator, which is a powerful Python object. The
implementation is then adapted to use efficiently, in C, the many
processors and cores available on a host. Finally, a better
mathematical method is used which is faster by several orders of
magnitude.</p>
<div class="section" id="the-math-of-the-problem">
<h2>The math of the problem<a class="headerlink" href="#the-math-of-the-problem" title="Permalink to this headline">¶</a></h2>
<ol class="arabic">
<li><p class="first">Take a square and the circle which fits into the square. A random
point of the square can be either in the circle or outside of the
circle. Now, the frequency for a random point to be part of the
circle can be calculated as <em>the ratio between the number of points
of the circle and the total number of points</em>. In math, this is
summarised as:</p>
<div class="math">
<p><span class="math">frequency = \frac{number\ of\ points\ in\ the\ circle}{total\ number\ of\ points}</span></p>
</div></li>
<li><p class="first">The <em>number of points in a shape</em> is another name for the <em>surface</em>
of the shape, that is:</p>
<div class="math">
<p><span class="math">frequency = \frac{ Circle\ surface}{Square\ surface} = \frac{\pi . radius^2 }{side^2}</span></p>
</div></li>
<li><p class="first">To make things simple, take a circle with a radius of <em>1</em>, and its
containing square with a side length of <em>2</em>, the frequency is
simply</p>
<div class="math">
<p><span class="math">frequency = \frac{\pi.1^2}{2^2} = \frac{\pi}{4}</span></p>
</div><p>This means that if you can build an experiment which gives you an
approximation of the frequency then an approximation of
<span class="math">\pi</span> is <strong>four times the frequency for a random point of the
square to be in the circle</strong>.</p>
</li>
</ol>
<p>To build such an experiment, let’s picture the square, the circle, and
two random points. The square and circle are centered on zero, the
radius for the circle is 1 and it fits into the square with a side
of 2. A random point is made of two coordinates, one for the
horizontal position and one for the vertical position.</p>
<div class="figure">
<img alt="images/square-cercle.png" src="images/square-cercle.png" style="width: 453pt; height: 226pt;" />
</div>
<p>This figure makes it clear is that the point is in the circle if,
<em>and only if,</em> the distance between the point and the center is
smaller than the radius, which means here: smaller than one. The
method to compute the distance to the center has not changed for
thousands years, it is still: <span class="math">distance = \sqrt{x^2 + y^2 }</span>,
where x represents the horizontal position and y represents the
vertical position.</p>
<div class="figure">
<img alt="images/pythagoras.png" src="images/pythagoras.png" style="width: 453pt; height: 226pt;" />
</div>
<p>To sum up the recipe for <span class="math">\pi</span>, take a million random points in
the square, count the points in the circle, divide by a million and
multiply by four. Serve with a slice of lemon and a small quantity of
salt.</p>
<div class="figure">
<img alt="images/frequency.png" src="images/frequency.png" style="width: 453pt; height: 226pt;" />
</div>
</div>
<div class="section" id="a-procedural-and-a-functional-algorithm">
<h2>A procedural and a functional algorithm<a class="headerlink" href="#a-procedural-and-a-functional-algorithm" title="Permalink to this headline">¶</a></h2>
<p>That was for the theory, let’s <em>implement</em> the recipe which means
let’s make a working example. Actually, we will make two working
examples in the <a class="reference external" href="http://www.python.org/doc/2.6.4/howto/advocacy.html">Python</a> programming language and compare the styles.</p>
<p>In either styles, we will use the functions <a class="reference external" href="http://docs.python.org/dev/library/math.html#math.sqrt" title="(in Python v2.7)"><tt class="xref py py-func docutils literal"><span class="pre">math.sqrt()</span></tt></a> and
<a class="reference external" href="http://docs.python.org/dev/library/random.html#random.uniform" title="(in Python v2.7)"><tt class="xref py py-func docutils literal"><span class="pre">random.uniform()</span></tt></a>: the former returns the <em>square root</em> of the
argument given as input, the latter returns a random decimal number
<em>uniformly</em> distributed between the values of the first and the second
arguments. Also, both scripts will take the number of points (the
sample size) as the first argument, so we will need <tt class="xref py py-attr docutils literal"><span class="pre">sys.argv</span></tt>:
it holds the command line parameters of the script</p>
<div class="highlight-python"><div class="highlight"><pre><span class="c">#!/usr/bin/env python</span>
<span class="kn">from</span> <span class="nn">random</span> <span class="kn">import</span> <span class="n">uniform</span>
<span class="kn">from</span> <span class="nn">math</span> <span class="kn">import</span> <span class="n">sqrt</span>
<span class="kn">import</span> <span class="nn">sys</span>
<span class="n">n</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="p">)</span>
</pre></div>
</div>
<p>The <strong>procedural algorithm</strong> consists of: as many times as there are
points in the sample, to take a random point, then to test whether the
point is within the circle or not and when it is inside: increment a
counter by one. When the loop is finished, print the counter divided
by the sample size and multiplied by 4. Here it is, written in
<em>Python</em>:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">somme</span><span class="o">=</span><span class="mi">0</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="n">sqrt</span><span class="p">(</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="p">)</span> <span class="o"><</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">somme</span><span class="o">+=</span><span class="mi">1</span>
<span class="k">print</span><span class="p">(</span><span class="s">"An approximation of Pi is : </span><span class="si">%s</span><span class="s"> "</span> <span class="o">%</span> <span class="p">(</span><span class="n">somme</span> <span class="o">*</span> <span class="mf">4.0</span> <span class="o">/</span> <span class="n">n</span> <span class="p">))</span>
</pre></div>
</div>
<p>The equivalent <strong>functional algorithm</strong> is: make a function which returns
a list of random points as big a the requested sample size. Then make
another function which tests if the input point is in the
circle. Finally, print the length of the list of points filtered by
the test function, and, as before, divide by the sample size and
multiply by four:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="n">points</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">n</span> <span class="p">:</span> <span class="p">[</span> <span class="p">(</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">]</span>
<span class="n">in_circle</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">p</span> <span class="p">:</span> <span class="n">sqrt</span><span class="p">(</span> <span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">**</span><span class="mi">2</span> <span class="p">)</span> <span class="o"><</span> <span class="mi">1</span>
<span class="k">print</span><span class="p">(</span><span class="s">"Another approximation of Pi is: </span><span class="si">%s</span><span class="s"> "</span> <span class="o">%</span>
<span class="p">(</span> <span class="nb">len</span><span class="p">(</span> <span class="nb">filter</span><span class="p">(</span> <span class="n">in_circle</span><span class="p">,</span> <span class="n">points</span><span class="p">(</span> <span class="n">n</span> <span class="p">)</span> <span class="p">)</span> <span class="p">)</span> <span class="o">*</span> <span class="mf">4.0</span> <span class="o">/</span> <span class="n">n</span> <span class="p">))</span>
</pre></div>
</div>
<p>Now if we test it in a command line, it does approximate <span class="math">\pi</span>,
it gets more precise with more points but it is rather slow:</p>
<div class="highlight-sh"><div class="highlight"><pre>~<span class="nv">$ </span>./procedural.py 1000
Pi ~ 3.112
~<span class="nv">$ </span>./procedural.py 100000
Pi ~ 3.14192
~<span class="nv">$ </span>./functional.py 500000
Pi ~ 3.140128
</pre></div>
</div>
<p>In your opinion, which style fits the job best? I would say the
procedural style is a sequence of small operations, without much
structure. The functional style better splits the problem into
simpler bits whose integration solve the problem .</p>
</div>
<div class="section" id="better-performance-through-lazyness">
<h2>Better performance through <em>lazyness</em><a class="headerlink" href="#better-performance-through-lazyness" title="Permalink to this headline">¶</a></h2>
<p>When comparing the performance of the two solutions, we hit a problem
which is an opportunity to present the Python magic called
<em>generator</em>. Let’s execute the script with 200 000, one million and
five million points in the sample:</p>
<div class="highlight-sh"><div class="highlight"><pre>~<span class="nv">$ </span><span class="nb">alias time</span><span class="o">=</span><span class="s1">'/usr/bin/time --format " duration: %e seconds"'</span>
~<span class="nv">$ </span>test_it <span class="o">()</span> <span class="o">{</span> <span class="k">for </span>i in 200000 1000000 5000000; <span class="k">do </span><span class="nb">time</span> <span class="nv">$1</span> <span class="nv">$i</span> ; <span class="k">done</span> ; <span class="o">}</span>
~<span class="nv">$ </span>test_it ./procedural.py
Pi ~ 3.13974
duration: 0.56 seconds
Pi ~ 3.141572
duration: 2.19 seconds
Pi ~ 3.1412144
duration: 10.97 seconds
~<span class="nv">$ </span>test_it ./functional.py
Pi ~ 3.13992
duration: 0.61 seconds
Pi ~ 3.141356
duration: 3.39 seconds
Pi ~ 3.1416272
duration: 32.71 seconds
~<span class="nv">$ </span><span class="c"># Do not hesitate to send the stop signal if it takes too long</span>
~<span class="nv">$ </span><span class="c"># on your computer: Ctrl-C or Ctrl-Z</span>
</pre></div>
</div>
<p>Mmh, the functional version takes longer and it does not scale. The
problem stems from the fact that <tt class="xref py py-func docutils literal"><span class="pre">points()</span></tt> and <a class="reference external" href="http://docs.python.org/dev/library/functions.html#filter" title="(in Python v2.7)"><tt class="xref py py-func docutils literal"><span class="pre">filter()</span></tt></a>
make up lists of several million elements, all stored in the laptop
memory where the script was tested, which is too small to handle them
all efficiently. It is no use to store them all, in this problem, we
only need one at a time, as the procedural algorithm does.</p>
<p>A solution to avoid the waste of memory is to use is a <a class="reference external" href="http://docs.python.org/reference/expressions.html#yieldexpr">generator</a> , it
is a kind of Python magic which behaves like a list, but which
<em>generates</em> the element of the list on the fly when they are requested
by the function which manipulates the generator. They are not stored,
it is <em>on demand</em>. This technique is also called <em>lazy evaluation</em>.
This is the goal of the the <tt class="xref py py-meth docutils literal"><span class="pre">yield()</span></tt> Python statement, if there
is a need to create its own custom generator.</p>
<p>The <tt class="xref py py-func docutils literal"><span class="pre">points()</span></tt> function is slightly modified: this expression,
which returns a list</p>
<div class="highlight-python"><div class="highlight"><pre><span class="p">[</span> <span class="p">(</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span> <span class="n">size</span> <span class="p">)</span> <span class="p">]</span>
</pre></div>
</div>
<p>is substituted by this expression, which returns a generator:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="p">(</span> <span class="p">(</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span> <span class="n">size</span> <span class="p">)</span> <span class="p">)</span>
</pre></div>
</div>
<p>The <a class="reference external" href="http://docs.python.org/dev/library/functions.html#filter" title="(in Python v2.7)"><tt class="xref py py-func docutils literal"><span class="pre">filter()</span></tt></a> function is substituted by its
<em>generator-returning</em> counterpart <a class="reference external" href="http://docs.python.org/dev/library/itertools.html#itertools.ifilter" title="(in Python v2.7)"><tt class="xref py py-func docutils literal"><span class="pre">ifilter()</span></tt></a> in the
<a class="reference external" href="http://docs.python.org/dev/library/itertools.html#module-itertools" title="(in Python v2.7)"><tt class="xref py py-mod docutils literal"><span class="pre">itertools</span></tt></a> module. One last change: a generator has no length,
so <a class="reference external" href="http://docs.python.org/dev/library/functions.html#len" title="(in Python v2.7)"><tt class="xref py py-func docutils literal"><span class="pre">len()</span></tt></a> is substituted by a trick: sum a list of <em>ones</em> for
each point in the circle:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">itertools</span> <span class="kn">import</span> <span class="n">ifilter</span>
<span class="n">n</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="p">)</span>
<span class="n">points</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">n</span> <span class="p">:</span> <span class="p">(</span> <span class="p">(</span><span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">),</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">))</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">)</span>
<span class="n">in_circle</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">p</span> <span class="p">:</span> <span class="n">sqrt</span><span class="p">(</span> <span class="n">p</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">p</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">**</span><span class="mi">2</span> <span class="p">)</span> <span class="o"><</span> <span class="mi">1</span>
<span class="k">print</span><span class="p">(</span><span class="s">"A slightly faster implementation o Pi: </span><span class="si">%s</span><span class="s"> "</span> <span class="o">%</span>
<span class="p">(</span> <span class="nb">sum</span><span class="p">(</span> <span class="mi">1</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="n">ifilter</span><span class="p">(</span> <span class="n">in_circle</span><span class="p">,</span> <span class="n">points</span><span class="p">(</span> <span class="n">n</span> <span class="p">)</span> <span class="p">)</span> <span class="p">)</span> <span class="o">*</span> <span class="mf">4.0</span> <span class="o">/</span> <span class="n">n</span> <span class="p">))</span>
</pre></div>
</div>
<p>The <tt class="xref py py-func docutils literal"><span class="pre">test_it()</span></tt> function shows that <em>lazy</em> functional
implementation operates with a performance boost of 14%, 25% and 55%
over the previous functional implementation:</p>
<div class="highlight-sh"><div class="highlight"><pre>~<span class="nv">$ </span>test_it ./harder_better_stronger_faster.py
Pi ~ 3.13988
duration: 0.54 seconds
Pi ~ 3.143804
duration: 2.62 seconds
Pi ~ 3.141496
duration: 13.10 seconds
</pre></div>
</div>
<p>At this point, the two styles are technically roughly equivalent, the
functional style is 10% slower than the procedural counterpart. Maybe,
those 10% are the small efficiencies that Knuth was telling us about:
“we should forget about small inefficiencies, say about 97% of the
time: premature optimization is the root of all evil”.</p>
<p>Let’s try to do exactly that: let’s optimize the implementation of the
same algorithm.</p>
</div>
<div class="section" id="optimization-with-processes-and-a-compiled-c-function">
<h2>Optimization with processes and a compiled C function<a class="headerlink" href="#optimization-with-processes-and-a-compiled-c-function" title="Permalink to this headline">¶</a></h2>
<p>It is interesting to note that all the previous scripts run on only
one core even if the host features many processors and cores. Python
makes it easy with the <a class="reference external" href="http://docs.python.org/dev/library/multiprocessing.html#module-multiprocessing" title="(in Python v2.7)"><tt class="xref py py-mod docutils literal"><span class="pre">multiprocessing</span></tt></a> module to run functions
into their own separate system process which are dispatched by the
kernel on the available processors and cores.</p>
<p>In the following version of the script, four processes will be run,
each handling a fourth of the requested iterations. The
<a class="reference external" href="http://docs.python.org/dev/library/multiprocessing.html#module-multiprocessing" title="(in Python v2.7)"><tt class="xref py py-mod docutils literal"><span class="pre">multiprocessing</span></tt></a> make the <tt class="xref py py-class docutils literal"><span class="pre">Queue</span></tt> available which is
reachable by each processes and safe for concurrent read and write
access.</p>
<p><em>processpool(function, args)</em> takes a function and its parameters as
an input and returns the list of results obtained by running the
function, in four separate subprocesses.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">multiprocessing</span> <span class="kn">import</span> <span class="n">Process</span><span class="p">,</span> <span class="n">Queue</span>
<span class="n">n</span> <span class="o">=</span> <span class="nb">int</span><span class="p">(</span> <span class="n">sys</span><span class="o">.</span><span class="n">argv</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="p">)</span>
<span class="n">numproc</span> <span class="o">=</span> <span class="mi">4</span>
<span class="k">def</span> <span class="nf">pi</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="n">somme</span><span class="o">=</span><span class="mi">0</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">n</span><span class="p">):</span>
<span class="k">if</span> <span class="n">sqrt</span><span class="p">(</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="o">+</span> <span class="n">uniform</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span><span class="mi">1</span><span class="p">)</span><span class="o">**</span><span class="mi">2</span> <span class="p">)</span> <span class="o"><</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">somme</span><span class="o">+=</span><span class="mi">1</span>
<span class="k">return</span> <span class="mi">4</span><span class="o">*</span><span class="nb">float</span><span class="p">(</span><span class="n">somme</span><span class="p">)</span><span class="o">/</span><span class="n">n</span>
<span class="k">def</span> <span class="nf">processpool</span><span class="p">(</span><span class="n">func</span><span class="p">,</span> <span class="o">*</span><span class="n">args</span><span class="p">):</span>
<span class="n">q</span> <span class="o">=</span> <span class="n">Queue</span><span class="p">()</span>
<span class="n">processes</span> <span class="o">=</span> <span class="p">[</span> <span class="n">Process</span><span class="p">(</span><span class="n">target</span><span class="o">=</span><span class="k">lambda</span> <span class="n">_</span><span class="p">:</span><span class="n">q</span><span class="o">.</span><span class="n">put</span><span class="p">(</span><span class="n">func</span><span class="p">(</span><span class="n">_</span><span class="p">)),</span><span class="n">args</span><span class="o">=</span><span class="n">args</span><span class="p">)</span>
<span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">numproc</span><span class="p">)</span> <span class="p">]</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">processes</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">start</span><span class="p">()</span>
<span class="k">for</span> <span class="n">p</span> <span class="ow">in</span> <span class="n">processes</span><span class="p">:</span> <span class="n">p</span><span class="o">.</span><span class="n">join</span><span class="p">()</span>
<span class="k">return</span> <span class="p">[</span> <span class="n">q</span><span class="o">.</span><span class="n">get</span><span class="p">()</span> <span class="k">for</span> <span class="n">_</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="n">q</span><span class="o">.</span><span class="n">qsize</span><span class="p">())]</span>
<span class="n">subprocess_results</span> <span class="o">=</span> <span class="n">processpool</span><span class="p">(</span><span class="n">pi</span><span class="p">,</span><span class="n">n</span><span class="o">/</span><span class="n">numproc</span><span class="p">)</span>
<span class="k">print</span> <span class="s">"An approximation of Pi with 4 processes: </span><span class="si">%s</span><span class="s">"</span> <span class="o">%</span><span class="p">(</span>
<span class="nb">sum</span><span class="p">(</span><span class="n">subprocess_results</span><span class="p">)</span><span class="o">/</span><span class="nb">len</span><span class="p">(</span><span class="n">subprocess_results</span><span class="p">))</span>
</pre></div>
</div>
<p>Let’s time this version:</p>
<div class="highlight-sh"><div class="highlight"><pre>~/github/functional<span class="nv">$ </span>test_it ./procedural_with_processes.py
An approximation of Pi with 4 processes: 3.14208
duration: 0.28 seconds
An approximation of Pi with 4 processes: 3.142696
duration: 1.14 seconds
An approximation of Pi with 4 processes: 3.1417456
duration: 5.43 seconds
</pre></div>
</div>
<p>The durations are exactly halved when compared to <em>procedural.py</em>, the
two cores of the Intel Core 2 Duo, on which this article is edited,
were effectively used.</p>
<p>Python code is transformed on execution into byte compiled code, which
is composed of commands interpreted by the Python virtual machine.
The Python virtual machine is a compiled software written in C which
directly talks to the processor. For some demanding computing uses,
such as this approximation of <span class="math">\pi</span>, this interpretation is
suboptimal .</p>
<p>The identification of the command line argument, the result printing
and the split into processes are only done once so their impact on
performance are negligible, it is very practical to do it in
Python. In our example, the hard work in this script is the <em>pi</em>
function, which could not be simpler in terms of signature: it
requires an int, returns a float, raises no errors, and makes no side
effects. We can write the pi function in C so that, when compiled, it
is directly understood by the processor, sidestepping the Python
virtual machine. Here are the steps involved:</p>
<ol class="arabic">
<li><p class="first">a C function called pi is written in a file called pimodule.c:</p>
<div class="highlight-c"><div class="highlight"><pre><span class="kt">float</span> <span class="n">pi</span><span class="p">(</span><span class="kt">int</span> <span class="n">n</span><span class="p">){</span>
<span class="kt">double</span> <span class="n">i</span><span class="p">,</span><span class="n">x</span><span class="p">,</span><span class="n">y</span><span class="p">,</span><span class="n">sum</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span>
<span class="n">srand</span><span class="p">(</span><span class="n">rdtsc</span><span class="p">());</span>
<span class="k">for</span><span class="p">(</span><span class="n">i</span><span class="o">=</span><span class="mi">0</span><span class="p">;</span><span class="n">i</span><span class="o"><</span><span class="n">n</span><span class="p">;</span><span class="n">i</span><span class="o">++</span><span class="p">){</span>
<span class="n">x</span><span class="o">=</span><span class="n">rand</span><span class="p">();</span> <span class="n">y</span><span class="o">=</span><span class="n">rand</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">x</span><span class="o">*</span><span class="n">x</span><span class="o">+</span><span class="n">y</span><span class="o">*</span><span class="n">y</span><span class="o"><</span><span class="p">(</span><span class="kt">double</span><span class="p">)</span><span class="n">RAND_MAX</span><span class="o">*</span><span class="n">RAND_MAX</span><span class="p">)</span>
<span class="n">sum</span><span class="o">++</span><span class="p">;</span> <span class="p">}</span>
<span class="k">return</span> <span class="mi">4</span><span class="o">*</span><span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">sum</span><span class="o">/</span><span class="p">(</span><span class="kt">float</span><span class="p">)</span><span class="n">n</span><span class="p">;</span> <span class="p">}</span>
<span class="kt">int</span> <span class="n">rdtsc</span><span class="p">(){</span> <span class="n">__asm__</span> <span class="n">__volatile__</span><span class="p">(</span><span class="s">"rdtsc"</span><span class="p">);</span> <span class="p">}</span>
</pre></div>
</div>
</li>
<li><p class="first">A wrapper function for the pi function, which matches Python
interfaces, is defined. This wrapper receives the arguments in the
form of Python objects that it transforms to input argument for the
C function in the correct format: here, a simple int. The wrapper
also builds an Python float object from the C float approximation
of Pi returned by the <em>pi</em> C function:</p>
<div class="highlight-c"><div class="highlight"><pre><span class="k">static</span> <span class="n">PyObject</span> <span class="o">*</span> <span class="nf">pi_pi</span><span class="p">(</span><span class="n">PyObject</span> <span class="o">*</span><span class="n">self</span><span class="p">,</span> <span class="n">PyObject</span> <span class="o">*</span><span class="n">args</span><span class="p">)</span> <span class="p">{</span>
<span class="k">const</span> <span class="kt">int</span> <span class="n">n</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">PyArg_ParseTuple</span><span class="p">(</span><span class="n">args</span><span class="p">,</span> <span class="s">"i"</span><span class="p">,</span> <span class="o">&</span><span class="n">n</span><span class="p">))</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="k">return</span> <span class="n">Py_BuildValue</span><span class="p">(</span><span class="s">"f"</span><span class="p">,</span> <span class="n">pi</span><span class="p">(</span><span class="n">n</span><span class="p">));</span> <span class="p">}</span>
</pre></div>
</div>
<p>The non standard type such as <em>PyObject</em> are available through the
inclusion of the <tt class="docutils literal"><span class="pre"><python2.6/Python.h></span></tt> headers.</p>
</li>
<li><p class="first">The methods exported to Python are declared in an array of <em>PyMethodDef</em>:</p>
<div class="highlight-c"><div class="highlight"><pre><span class="k">static</span> <span class="n">PyMethodDef</span> <span class="n">PiMethods</span><span class="p">[]</span> <span class="o">=</span> <span class="p">{</span>
<span class="p">{</span><span class="s">"pi"</span><span class="p">,</span> <span class="n">pi_pi</span><span class="p">,</span> <span class="n">METH_VARARGS</span><span class="p">,</span> <span class="s">"Simple Pi Approximation"</span><span class="p">},</span>
<span class="p">{</span><span class="nb">NULL</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">}</span> <span class="cm">/* Sentinel */</span> <span class="p">};</span>
</pre></div>
</div>
<p>For each method described in the array, the first element is the
method name in Python, the second element is the pointer to the C
function, the third argument specifies if the Python method should
accept variable arguments and keyword arguments. Here no keywords
arguments are possible, only variable arguments. The last element
is the docstring for the method.</p>
</li>
<li><p class="first">An initialization function is written for the package. This
function will be executed when the module is imported in the Python
interpreter:</p>
<div class="highlight-c"><div class="highlight"><pre><span class="n">PyMODINIT_FUNC</span> <span class="nf">initpi</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">Py_InitModule</span><span class="p">(</span><span class="s">"pi"</span><span class="p">,</span> <span class="n">PiMethods</span><span class="p">);</span> <span class="p">}</span>
</pre></div>
</div>
<p>That’s it for the pimodule.c. It is ready for compilation.</p>
</li>
<li><p class="first">A separate <tt class="docutils literal"><span class="pre">setup.py</span></tt> file specifies the compilation and
deployment of this C file into a package available in the Python
path:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="kn">from</span> <span class="nn">distutils.core</span> <span class="kn">import</span> <span class="n">setup</span><span class="p">,</span> <span class="n">Extension</span>
<span class="n">mod</span> <span class="o">=</span> <span class="n">Extension</span><span class="p">(</span><span class="s">'pi'</span><span class="p">,</span> <span class="n">sources</span> <span class="o">=</span> <span class="p">[</span><span class="s">'pimodule.c'</span><span class="p">])</span>
<span class="n">setup</span> <span class="p">(</span><span class="n">name</span> <span class="o">=</span> <span class="s">'pi'</span><span class="p">,</span>
<span class="n">version</span> <span class="o">=</span> <span class="s">'0.1'</span><span class="p">,</span>
<span class="n">ext_modules</span> <span class="o">=</span> <span class="p">[</span><span class="n">mod</span><span class="p">],</span>
<span class="n">description</span> <span class="o">=</span> <span class="s">'This is simple method for approximating Pi'</span><span class="p">)</span>
</pre></div>
</div>
</li>
<li><p class="first">Building and installing the module is straightforward:</p>
<div class="highlight-sh"><div class="highlight"><pre>~<span class="nv">$ </span>python setup.py build
~<span class="nv">$ </span>sudo python setup.py install
~<span class="nv">$ </span>python
>>> import pi
>>> <span class="nb">help</span><span class="o">(</span>pi<span class="o">)</span>
<span class="o">[</span> ... <span class="o">]</span>
pi<span class="o">(</span>...<span class="o">)</span>
Simple Pi Approximation
>>> pi.pi<span class="o">(</span>1000<span class="o">)</span>
3.2040000
</pre></div>
</div>
</li>
</ol>
<p>The previous script can be copied and pasted with only a few
modifications: instead of declaring the <em>pi</em> function, it now needs to
be be imported from the <em>pi</em> module:</p>
<div class="highlight-python"><div class="highlight"><pre><span class="o">...</span>
<span class="kn">from</span> <span class="nn">pi</span> <span class="kn">import</span> <span class="n">pi</span>
<span class="n">subprocess_results</span> <span class="o">=</span> <span class="n">processpool</span><span class="p">(</span><span class="n">pi</span><span class="p">,</span><span class="n">n</span><span class="o">/</span><span class="n">numproc</span><span class="p">)</span>
<span class="o">...</span>
</pre></div>
</div>
<p>This multiprocess C version shows the following performance:</p>
<div class="highlight-sh"><div class="highlight"><pre>~<span class="nv">$ </span>test_it ./procedural_in_c.py
An approximation of Pi with 4 processes: 3.14168
duration: 0.09 seconds
An approximation of Pi with 4 processes: 3.143028
duration: 0.07 seconds
An approximation of Pi with 4 processes: 3.1415872
duration: 0.18 seconds
</pre></div>
</div>
<p>The computation is accelerated by a factor up to 30 over the previous
version is pure Python. This is not fantastic but it is indeed much
more efficient. When compared to Python, C does shine in the
processing of numbers.</p>
</div>
<div class="section" id="a-fast-algorithm-to-compute-pi">
<h2>A fast algorithm to compute Pi<a class="headerlink" href="#a-fast-algorithm-to-compute-pi" title="Permalink to this headline">¶</a></h2>
<p>If the problem really was computing <span class="math">\pi</span>, its <a class="reference external" href="http://en.wikipedia.org/wiki/Pi#Numerical_approximations">Wikipedia page</a>
has several better methods. Here is a seriously fast approximation
known as the Bailey-Borwein-Plouffe formula :</p>
<div class="math">
<p><span class="math">\pi \approx \sum_{k=0}^{\infty} \frac{1}{16^k} \left(\frac{4}{8k+1}-\frac{2}{8k+4}-\frac{1}{8k+5}-\frac{1}{8k+6}\right)</span></p>
</div><p>This translates into the <em>bbp</em> function, below, in Python, which is
correct for the first 13 digits in only ten iterations while previous
methods needed millions of iterations for the same results.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="n">bbp</span> <span class="o">=</span> <span class="k">lambda</span> <span class="n">n</span><span class="p">:</span> <span class="nb">sum</span><span class="p">(</span> <span class="mf">1.</span><span class="o">/</span><span class="p">(</span><span class="mi">16</span><span class="o">**</span><span class="n">k</span><span class="p">)</span><span class="o">*</span><span class="p">(</span><span class="mf">4.</span><span class="o">/</span><span class="p">(</span><span class="mi">8</span><span class="o">*</span><span class="n">k</span><span class="o">+</span><span class="mi">1</span><span class="p">)</span><span class="o">-</span><span class="mf">2.</span><span class="o">/</span><span class="p">(</span><span class="mi">8</span><span class="o">*</span><span class="n">k</span><span class="o">+</span><span class="mi">4</span><span class="p">)</span><span class="o">-</span><span class="mf">1.</span><span class="o">/</span><span class="p">(</span><span class="mi">8</span><span class="o">*</span><span class="n">k</span><span class="o">+</span><span class="mi">5</span><span class="p">)</span><span class="o">-</span><span class="mf">1.</span><span class="o">/</span><span class="p">(</span><span class="mi">8</span><span class="o">*</span><span class="n">k</span><span class="o">+</span><span class="mi">6</span><span class="p">))</span>
<span class="gp">... </span> <span class="k">for</span> <span class="n">k</span> <span class="ow">in</span> <span class="nb">xrange</span><span class="p">(</span><span class="n">n</span><span class="p">)</span> <span class="p">)</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="n">bbp</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="go">3.14142246642</span>
<span class="gp">>>> </span><span class="k">print</span> <span class="n">bbp</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="go">3.14159265359</span>
<span class="gp">>>> </span><span class="kn">import</span> <span class="nn">math</span>
<span class="gp">>>> </span><span class="n">bbp</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span> <span class="o">-</span> <span class="n">math</span><span class="o">.</span><span class="n">pi</span> <span class="o"><</span> <span class="mi">10</span><span class="o">**</span><span class="p">(</span><span class="o">-</span><span class="mi">14</span><span class="p">)</span>
<span class="go">True</span>
</pre></div>
</div>
<p>Timer is a class which take a callable as the argument. The method
<em>timeit</em> will return the duration in seconds for <em>one million</em>
executions of the callable.</p>
<div class="highlight-python"><div class="highlight"><pre><span class="gp">>>> </span><span class="kn">from</span> <span class="nn">timeit</span> <span class="kn">import</span> <span class="n">Timer</span>
<span class="gp">>>> </span><span class="n">Timer</span><span class="p">(</span><span class="k">lambda</span><span class="p">:</span><span class="n">bbp</span><span class="p">(</span><span class="mi">10</span><span class="p">))</span><span class="o">.</span><span class="n">timeit</span><span class="p">()</span><span class="o"><</span><span class="mi">20</span>
<span class="go">True</span>
</pre></div>
</div>
<p>Obtaining the 13 correct digits of <span class="math">\pi</span> can be done in less
than 20 microseconds in pure Python with a good algorithm. And now, on
to something fantastic: <em>On December 31st, 2009, about 2700 billion
decimal digits of Pi were computed using a single desktop
computer. This is presently the World Record for the computation of
Pi.</em> Kudos to <a class="reference external" href="http://bellard.org/pi/pi2700e9/">Fabrice Bellard</a>. He combined a fast math method, with
hardware optimization at the processor level.</p>
<p><em>19 Apr 2010</em></p>
</div>
</div>
</div>
</div>
</div>
<div class="clearer"></div>
</div>
<div class="related">
<h3>Navigation</h3>
<ul>
<li class="right" style="margin-right: 10px">
<a href="genindex.html" title="General Index"
>index</a></li>
<li class="right" >
<a href="py-modindex.html" title="Python Module Index"
>modules</a> |</li>
<li class="right" >
<a href="i18n.html" title="One application, multiple languages"
>next</a> |</li>
<li class="right" >
<a href="index.html" title="Every bit counts"
>previous</a> |</li>
<li><a href="index.html">bits v0.7 documentation</a> »</li>
</ul>
</div>
<div class="footer">
© Copyright 2009, Jean Daniel Browne.
Created using <a href="http://sphinx.pocoo.org/">Sphinx</a> 1.0.
</div>
</body>
</html>