Life About Programming
2020-04-20T00:52:17+00:00
http://airekans.github.io
Yaolong Huang
airekans@gmail.com
A Brief Introduction to Distributed Consensus
2016-10-26T00:00:00+00:00
http://airekans.github.io/cloud-computing/2016/10/26/intro-to-distributed-consensus
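<p>I recently spent some time studying distributed consensus algorithms and tried implementing one, so in this post I want to briefly cover the problem these algorithms solve, when to use them (and when not to), the basic principles behind them, and the algorithms most widely used today.</p>
<h1 id="为什么要使用分布式一致性算法">Why use a distributed consensus algorithm</h1>
<p>Before looking at distributed consensus algorithms, we need to understand why we need them, and in which situations we do not.</p>
<p>To make a distributed system highly available (High Availability, HA for short), the usual first idea is to add machines, so that losing one of them does not affect the service.</p>
<p>Services in a distributed system generally fall into two categories: logic services and data services. A logic service only handles business logic and stores no data itself; it typically fetches business data from some data service, processes it, and writes the result back. A data service is one whose main job is storing and retrieving data; business logic is not implemented there.</p>
<p>Because logic services store no data, we can treat them as stateless: as long as the data is the same, it does not much matter which machine the logic runs on (with some exceptions, of course). Scaling a logic service horizontally is therefore very simple: just add machines, as shown below:</p>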
<p><img src="https://cloud.githubusercontent.com/assets/1321283/19673122/851d117e-9aaf-11e6-897e-5e0ceb198ba9.png" alt="logic server" /></p>
<p>If one of them goes down, traffic is switched to another machine:</p>
<p><img src="https://cloud.githubusercontent.com/assets/1321283/19673159/e6b0f478-9aaf-11e6-86eb-e0d3aa6e1b99.png" alt="logic server switch" /></p>
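<p>For data services, simply adding machines does not work. There are two ways to add machines here: sharding and replication. Sharding splits the data into ranges by some scheme and stores each range on a different machine. In effect, sharding sacrifices the availability of part of the data (the shard whose machine is down) so that the rest stays available, so sharding alone cannot deliver true HA.</p>
<p>Replication means copying all of the data onto several other machines, so that if one machine goes down another can take over. But real replication for a data service is not as simple as it sounds. Replication requires every replica to hold the same data; otherwise the data becomes inconsistent. So how do we keep the data consistent in essentially all situations? This is where distributed consensus algorithms come in.</p>
<p>As an aside, some systems have a master that coordinates the system's internal logic. Such a system is not a pure data service, because the slaves store no data, although the master does keep some state about the slaves. Such a system should not run several active masters at once for HA, because that adds complexity. To get HA in this scenario, consider keeping several standby masters while only one master is active at any time. Since no data is being stored, the masters do not need a consensus protocol among themselves; it is enough to guarantee that there is a single master at any given moment. That sounds like a job for some kind of distributed lock, and a reliable distributed lock can be built on top of an HA, consistent data service, for example <a href="http://zookeeper.apache.org/doc/trunk/recipes.html#sc_leaderElection">the lock recipe implemented on Zookeeper</a> (shown below).</p>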
<p><img src="https://cloud.githubusercontent.com/assets/1321283/19713751/17df977c-9b7b-11e6-81d5-f290778739e1.png" alt="distributed-lock" /></p>
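<p>Below we start from a simple single machine and discuss how to make data highly available.</p>
<h1 id="单机系统到replicate">From a single machine to replication</h1>
<p>A single-machine system is the simplest case, and consistency is easy: with only one machine, the system only has to keep its own data consistent. Its problems are just as obvious: as traffic grows the machine easily becomes a bottleneck, and if it goes down the whole system is unavailable.</p>
<p>The simplest response is replication, that is, storing the same data on several machines as described above. When one machine goes down, another one serves the requests.</p>
<p>This works for read-only workloads, but writes cause trouble. To keep the data consistent, every write must reach all replicas. (There are schemes that do not write to every machine, but they allow conflicts and therefore need a conflict-resolution strategy.) As shown below:</p>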
<p><img src="https://cloud.githubusercontent.com/assets/1321283/19626203/bc58f98e-995e-11e6-95bb-40c395f48de8.png" alt="write-multiple-replicas" /></p>
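<p>To keep the data on machines A and B consistent, a client write must succeed on both A and B. But suppose one of them goes down: the write can no longer succeed everywhere, so can the system keep serving? Clearly, if consistency must be preserved, it cannot. And if it cannot serve, there is no HA at all, and the extra machine was pointless.</p>
<p>So naive replication does not make a data service highly available.</p>
<h1 id="二段提交">Two-phase commit</h1>
<p>To deal with the inconsistency that a write can cause when one of the replicas fails, we can use <a href="https://en.wikipedia.org/wiki/Two-phase_commit_protocol">two-phase commit</a>.</p>
<p>The idea is this: a write has to be applied on every machine, and if some machine goes down after the write has been sent, the write fails there and the replicas diverge. So we split the write into two steps:</p>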
<ol>
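<li>Send a Ready request to every machine. Each machine replies with success if it can accept the write, and with failure otherwise.</li>
<li>If every machine replied with success in the Ready phase, send the actual write to all machines; otherwise, abort the write.</li>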
</ol>
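<p>The two steps above can be sketched as a small in-memory coordinator. This is only an illustration under simplifying assumptions: <code class="language-plaintext highlighter-rouge">struct replica</code>, its <code class="language-plaintext highlighter-rouge">up</code> flag, and all function names are hypothetical, there is no network, and the coordinator itself never fails.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical in-memory replica, used only for illustration. */
struct replica {
    bool up;          /* false simulates a machine that is down */
    int  staged;      /* value accepted during the Ready phase */
    bool has_staged;  /* true while a staged value is pending */
    int  value;       /* last committed value */
};

/* Phase 1: ask one replica whether it can accept the write. */
static bool replica_ready(struct replica *r, int v)
{
    if (!r->up)
        return false;
    r->staged = v;
    r->has_staged = true;
    return true;
}

/* Phase 2 (commit): apply the staged write if the replica is still up. */
static void replica_commit(struct replica *r)
{
    if (r->up && r->has_staged)
        r->value = r->staged;
    r->has_staged = false;
}

/* Phase 2 (abort): drop the staged write. */
static void replica_abort(struct replica *r)
{
    r->has_staged = false;
}

/* Returns true if the write was committed, false if it was aborted. */
static bool two_phase_write(struct replica *rs, size_t n, int v)
{
    assert(rs != NULL || n == 0);

    bool all_ready = true;
    for (size_t i = 0; i < n; ++i) {
        if (!replica_ready(&rs[i], v)) {
            all_ready = false;
            break;
        }
    }
    for (size_t i = 0; i < n; ++i) {
        if (all_ready)
            replica_commit(&rs[i]);
        else
            replica_abort(&rs[i]);
    }
    return all_ready;
}
```

<p>In this sketch a replica whose <code class="language-plaintext highlighter-rouge">up</code> flag flips to false between the two loops silently misses the commit, which is the window in which the replicas can still diverge.</p>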
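<p>For a detailed walkthrough of the two-phase commit protocol, see <a href="https://exploredatabase.blogspot.jp/2014/07/two-phase-commit-protocol-in-pictures.html">these diagrams</a>.</p>
<p>At first glance two-phase commit seems to solve the inconsistency caused by a machine going down. But what if the machine goes down after it has replied with success to Ready? Then we are back in the dark ages before two-phase commit.</p>
<p>So two-phase commit only mitigates the inconsistency problem; it does not fully solve it. What about its enhanced version, <a href="https://en.wikipedia.org/wiki/Three-phase_commit_protocol">three-phase commit</a>? The situation is much the same: three-phase commit also only mitigates the problem.</p>
<p>Is distributed data consistency unsolvable, then? Is there no silver bullet? Fortunately, computer scientists have come up with solutions: the distributed consensus algorithms Paxos and RAFT.</p>
<h1 id="分布式一致性算法">Distributed consensus algorithms</h1>
<p>A distributed consensus algorithm lets multiple machines provide a data service that stays consistent while still offering a degree of HA. The two most widely used in industry today are <a href="https://en.wikipedia.org/wiki/Paxos_%28computer_science%29">Paxos</a> and <a href="https://raft.github.io/">RAFT</a>. This post does not describe either algorithm in detail; it only outlines their core idea.</p>
<p>Paxos and RAFT share a core concept called the quorum (also called the majority). In a system of N machines, any N/2 + 1 machines (with N/2 rounded down) form a quorum. In practice N is usually odd, for example 5, in which case the quorum is 3. Both Paxos and RAFT guarantee that as long as at least a quorum of machines is alive and not partitioned from each other (that is, they can reach each other over the network), the system can keep serving. So with N = 5, the system tolerates up to 2 machines going down.</p>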
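<p>The quorum arithmetic above is small enough to write down directly. The helper names below are hypothetical and exist only to make the numbers concrete.</p>

```c
#include <assert.h>

/* Smallest majority of an n-node cluster: floor(n / 2) + 1. */
static unsigned quorum_size(unsigned n)
{
    assert(n > 0);
    return n / 2 + 1;   /* unsigned division rounds down */
}

/* Number of nodes that may fail while a quorum is still reachable. */
static unsigned tolerated_failures(unsigned n)
{
    return n - quorum_size(n);
}

/* Non-zero if `alive` mutually connected nodes can still form a quorum. */
static int can_serve(unsigned n, unsigned alive)
{
    return alive >= quorum_size(n);
}
```

<p>This also shows why N is usually odd: a 4-node cluster needs a quorum of 3 and tolerates only 1 failure, the same as a 3-node cluster.</p>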
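<p>For an intuitive feel for how a quorum works, see the animation <a href="https://raft.github.io/">here</a>, which walks through the main flow of the RAFT algorithm.</p>
<p>Paxos was proposed by the computer scientist Leslie Lamport and is one of the earliest distributed consensus algorithms. Paxos itself only describes how a system agrees on a single value; to apply it in a real system you need Multi-Paxos. Mature Multi-Paxos implementations are still relatively rare; one is <a href="https://github.com/tencent-wechat/phxpaxos">PhxPaxos</a>, open-sourced by WeChat and reportedly running in WeChat's production environment.</p>
<p>RAFT was proposed by researchers at Stanford in 2013. Its goal is to provide distributed consensus while making the algorithm easier to understand, which lowers the difficulty of implementation. The number of systems using RAFT is gradually growing relative to Paxos, mainly because RAFT is easier to implement than Multi-Paxos.</p>
<p>Widely used systems that provide distributed consensus include:</p>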
<ol>
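<li><a href="https://zookeeper.apache.org/">Zookeeper</a>: written in Java, a consistent KV system built at Yahoo along the lines of Google's Chubby. It does not actually use Paxos but a similar protocol called Zab, whose consistency has not been proven at the algorithm level; it is battle-tested, though, and can be used directly without major problems.</li>
<li><a href="https://coreos.com/etcd/">etcd</a>: written in Go, a consistent KV system open-sourced by CoreOS and built on the RAFT protocol. It is used by many distributed systems written in Go, the best known being Kubernetes.</li>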
</ol>
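<h1 id="总结">Summary</h1>
<p>When designing a system, first check whether it needs distributed consensus at all. If it is not a data system in the first place, using Zookeeper for leader election is enough. If it is a data system, decide whether it needs the kind of consistency described above. Distributed consensus is not free: it adds one RTT of latency to each write, so it may not suit workloads that are sensitive to write latency. The best choice is the one that fits the specific scenario.</p>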
liburcu: A Userspace RCU Implementation
2016-05-10T00:00:00+00:00
http://airekans.github.io/c/2016/05/10/dive-into-liburcu
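<p>In the previous post, <a href="http://airekans.github.io/c/2016/04/23/rcu-intro">an introduction to RCU</a>, we saw the basics of how RCU makes readers lock-free.
RCU was first implemented in the Linux kernel, and the kernel implementation relies heavily on how the kernel as a whole runs (the scheduler, softirqs and so on), so porting it to userspace is not easy.
Fortunately there is already an open-source userspace RCU implementation, <a href="http://liburcu.org/">liburcu</a>. It not only implements the RCU algorithm but offers several flavors, from intrusive to non-intrusive, and it is already used in quite a few projects, such as the well-known <a href="http://lttng.org/">LTTng</a>.</p>
<p>liburcu provides the following RCU implementations:</p>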
<ol>
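<li>rcu-qsbr: the fastest RCU implementation, with zero overhead for readers, but it is intrusive and requires changes to your code.</li>
<li>rcu-signal: second only to qsbr in performance and requires no code changes, at the cost of giving up one signal to urcu.</li>
<li>rcu-generic: the slowest RCU implementation (though still much faster than a mutex); it requires no code changes and is a reasonable first choice.</li>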
</ol>
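<p>This post dissects the qsbr (quiescent-state-based RCU) implementation in detail. The code shown here comes from liburcu, so it carries the same license as liburcu, the LGPL.</p>
<h1 id="一个例子">An example</h1>
<p>Suppose we have the following requirement:</p>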
<blockquote>
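<p>A global pointer gs_foo points to a struct. Several reader threads keep reading the fields of this struct and summing them.
A writer thread may update the struct from time to time.</p>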
</blockquote>
<p>With liburcu's qsbr flavor, the code looks like this:</p>
<div class="language-cpp highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">Foo</span> <span class="p">{</span> <span class="kt">int</span> <span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">,</span> <span class="n">c</span><span class="p">,</span> <span class="n">d</span><span class="p">;</span> <span class="p">};</span>
<span class="kt">void</span> <span class="nf">ReadThreadFunc</span><span class="p">()</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">Foo</span><span class="o">*</span> <span class="n">foo</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">rcu_register_thread</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">100000000</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="mi">1000</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rcu_read_lock</span><span class="p">();</span>
<span class="n">foo</span> <span class="o">=</span> <span class="n">rcu_dereference</span><span class="p">(</span><span class="n">gs_foo</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="p">{</span>
<span class="n">sum</span> <span class="o">+=</span> <span class="n">foo</span><span class="o">-></span><span class="n">a</span> <span class="o">+</span> <span class="n">foo</span><span class="o">-></span><span class="n">b</span> <span class="o">+</span> <span class="n">foo</span><span class="o">-></span><span class="n">c</span> <span class="o">+</span> <span class="n">foo</span><span class="o">-></span><span class="n">d</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">rcu_read_unlock</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">rcu_quiescent_state</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">rcu_unregister_thread</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">WriteThreadFunc</span><span class="p">()</span> <span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">gs_is_end</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1000</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">Foo</span><span class="o">*</span> <span class="n">foo</span> <span class="o">=</span>
<span class="p">(</span><span class="k">struct</span> <span class="n">Foo</span><span class="o">*</span><span class="p">)</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">Foo</span><span class="p">));</span>
<span class="n">foo</span><span class="o">-></span><span class="n">a</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span> <span class="n">foo</span><span class="o">-></span><span class="n">b</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="n">foo</span><span class="o">-></span><span class="n">c</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span> <span class="n">foo</span><span class="o">-></span><span class="n">d</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
<span class="n">foo</span> <span class="o">=</span> <span class="n">rcu_xchg_pointer</span><span class="p">(</span><span class="o">&</span><span class="n">gs_foo</span><span class="p">,</span> <span class="n">foo</span><span class="p">);</span>
<span class="n">synchronize_rcu</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="p">{</span>
<span class="n">free</span><span class="p">(</span><span class="n">foo</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
<p>A few key points stand out:</p>
<ul>
<li>For readers
<ol>
<li>Each thread must call <code class="language-plaintext highlighter-rouge">rcu_register_thread()</code> to register when it starts and <code class="language-plaintext highlighter-rouge">rcu_unregister_thread()</code> to unregister when it exits.</li>
<li>Accesses to the shared data must be bracketed by <code class="language-plaintext highlighter-rouge">rcu_read_lock()</code> and <code class="language-plaintext highlighter-rouge">rcu_read_unlock()</code> to mark the critical section.</li>
<li>Pointers to shared data must be obtained through <code class="language-plaintext highlighter-rouge">rcu_dereference()</code>.</li>
<li>Each thread must call <code class="language-plaintext highlighter-rouge">rcu_quiescent_state()</code> from time to time to declare that it is in a quiescent state.</li>
</ol>
</li>
<li>For writers
<ol>
<li>The new data must be fully initialized before the pointer is replaced.</li>
<li>The pointer must be replaced with <code class="language-plaintext highlighter-rouge">rcu_xchg_pointer()</code>, which returns the old pointer.</li>
<li>After replacing the data, call <code class="language-plaintext highlighter-rouge">synchronize_rcu()</code> to wait for the <a href="http://airekans.github.io/c/2016/04/23/rcu-intro#grace-period">Grace Period</a> to end.</li>
<li>Once <code class="language-plaintext highlighter-rouge">synchronize_rcu()</code> returns, the old data can safely be freed.</li>
</ol>
</li>
</ul>
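<p>The writer-side steps can also be sketched without liburcu, which makes the ordering easy to see in isolation. The version below is an assumption-laden stand-in: it uses C11 <code class="language-plaintext highlighter-rouge">atomic_exchange</code> in place of <code class="language-plaintext highlighter-rouge">rcu_xchg_pointer()</code>, and the hypothetical <code class="language-plaintext highlighter-rouge">fake_synchronize()</code> only counts calls where real code would call <code class="language-plaintext highlighter-rouge">synchronize_rcu()</code>. The names <code class="language-plaintext highlighter-rouge">shared_foo</code> and <code class="language-plaintext highlighter-rouge">update_foo</code> are likewise made up for this sketch.</p>

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

struct Foo { int a, b, c, d; };

/* Shared slot, playing the role of gs_foo. */
static _Atomic(struct Foo *) shared_foo = NULL;

/* Hypothetical stand-in for synchronize_rcu(): it only counts calls. */
static int grace_periods_waited;
static void fake_synchronize(void)
{
    ++grace_periods_waited;
}

/* Publishes a new Foo and frees the previous one. Returns 0 on success. */
static int update_foo(int a, int b, int c, int d,
                      void (*wait_grace_period)(void))
{
    assert(wait_grace_period != NULL);

    struct Foo *fresh = malloc(sizeof *fresh);
    if (fresh == NULL)
        return -1;

    /* 1. Initialize the new data before it becomes visible. */
    fresh->a = a; fresh->b = b; fresh->c = c; fresh->d = d;

    /* 2. Swap the pointer; the exchange hands back the OLD pointer. */
    struct Foo *old = atomic_exchange(&shared_foo, fresh);

    /* 3. Wait until no reader can still be using the old data. */
    wait_grace_period();

    /* 4. Only now is it safe to free the old data (free(NULL) is a no-op). */
    free(old);
    return 0;
}
```

<p>The pointer that gets freed is the one returned by the exchange, never the one that was just published; freeing the new pointer would leave readers dereferencing freed memory.</p>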
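<p>Next, let's look at how these functions are implemented.</p>
<h1 id="qsbr关键数据结构">Key QSBR data structures</h1>
<p>The heart of RCU is the Grace Period. In qsbr, the Grace Period is represented by a global <code class="language-plaintext highlighter-rouge">unsigned long</code> (64 bits on typical 64-bit Linux) counter, <code class="language-plaintext highlighter-rouge">rcu_gp</code>.
Each time a new Grace Period begins, the counter is incremented, so we can call its value the gp number.</p>
<p>Each reader thread has an <code class="language-plaintext highlighter-rouge">rcu_reader</code> structure, which holds a cached copy of the most recent gp number it has seen, plus some extra data.</p>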
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">rcu_gp</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ctr</span><span class="p">;</span>
<span class="kt">int32_t</span> <span class="n">futex</span><span class="p">;</span>
<span class="p">}</span> <span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="n">CAA_CACHE_LINE_SIZE</span><span class="p">)));</span>
<span class="k">extern</span> <span class="k">struct</span> <span class="n">rcu_gp</span> <span class="n">rcu_gp</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">rcu_reader</span> <span class="p">{</span>
<span class="cm">/* Data used by both reader and synchronize_rcu() */</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">ctr</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">cds_list_head</span> <span class="n">node</span>
<span class="n">__attribute__</span><span class="p">((</span><span class="n">aligned</span><span class="p">(</span><span class="n">CAA_CACHE_LINE_SIZE</span><span class="p">)));</span>
<span class="kt">int</span> <span class="n">waiting</span><span class="p">;</span>
<span class="n">pthread_t</span> <span class="n">tid</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">registered</span><span class="o">:</span><span class="mi">1</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">extern</span> <span class="nf">DECLARE_URCU_TLS</span><span class="p">(</span><span class="k">struct</span> <span class="n">rcu_reader</span><span class="p">,</span> <span class="n">rcu_reader</span><span class="p">);</span>
</code></pre></div></div>
<p>In qsbr, neither <code class="language-plaintext highlighter-rouge">read_lock</code> nor <code class="language-plaintext highlighter-rouge">read_unlock</code> changes the thread's cached gp number. Only a call to <code class="language-plaintext highlighter-rouge">rcu_quiescent_state()</code> fetches the latest gp number from the global <code class="language-plaintext highlighter-rouge">rcu_gp</code> and stores it in the thread's cache.</p>
<p>When the writer thread reaches <code class="language-plaintext highlighter-rouge">synchronize_rcu()</code>, it first increments <code class="language-plaintext highlighter-rouge">rcu_gp</code> and then waits until every reader thread's cached gp number equals the latest one before returning. This is how qsbr implements the Grace Period.</p>
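<p>The counter protocol described above can be modelled in a few lines. The sketch below is a single-threaded toy, not liburcu code: <code class="language-plaintext highlighter-rouge">struct gp_model</code> and the <code class="language-plaintext highlighter-rouge">model_*</code> functions are hypothetical names, and it deliberately omits the atomics, memory barriers, and per-thread storage that a real implementation needs.</p>

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_READERS 8

/* Toy model: one global gp number plus each reader's cached copy. */
struct gp_model {
    unsigned long gp;                      /* stands in for rcu_gp.ctr */
    unsigned long reader_gp[MAX_READERS];  /* stands in for rcu_reader.ctr */
    size_t nreaders;
};

/* Start with every reader already caught up with the global gp number. */
static void model_init(struct gp_model *m, size_t nreaders)
{
    assert(nreaders <= MAX_READERS);
    m->gp = 1;
    m->nreaders = nreaders;
    for (size_t i = 0; i < nreaders; ++i)
        m->reader_gp[i] = m->gp;
}

/* Reader side: report a quiescent state by copying the global number. */
static void model_quiescent_state(struct gp_model *m, size_t reader)
{
    assert(reader < m->nreaders);
    m->reader_gp[reader] = m->gp;
}

/* Writer side, first half: open a new grace period. */
static void model_start_gp(struct gp_model *m)
{
    m->gp += 1;
}

/* Writer side, second half: done once every reader has caught up. */
static bool model_gp_done(const struct gp_model *m)
{
    for (size_t i = 0; i < m->nreaders; ++i) {
        if (m->reader_gp[i] != m->gp)
            return false;
    }
    return true;
}
```

<p>In this model a writer would call <code class="language-plaintext highlighter-rouge">model_start_gp()</code> and then poll <code class="language-plaintext highlighter-rouge">model_gp_done()</code>; a single reader that never reports a quiescent state keeps the grace period open indefinitely.</p>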
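<h1 id="读线程函数">Reader-side functions</h1>
<p>Next, let's see how the key reader-side functions are implemented.</p>
<h2 id="线程注册注销">Thread registration and unregistration</h2>
<p>In qsbr, every reader thread must call <code class="language-plaintext highlighter-rouge">rcu_register_thread()</code> to register; otherwise the writer does not know the reader exists. Before exiting, the thread must also call <code class="language-plaintext highlighter-rouge">rcu_unregister_thread()</code>; otherwise the writer deadlocks, waiting forever for a reader that will never report a quiescent state again.</p>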
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">DEFINE_URCU_TLS</span><span class="p">(</span><span class="k">struct</span> <span class="n">rcu_reader</span><span class="p">,</span> <span class="n">rcu_reader</span><span class="p">);</span>
<span class="k">static</span> <span class="nf">CDS_LIST_HEAD</span><span class="p">(</span><span class="n">registry</span><span class="p">);</span>
<span class="k">static</span> <span class="n">pthread_mutex_t</span> <span class="n">rcu_registry_lock</span> <span class="o">=</span> <span class="n">PTHREAD_MUTEX_INITIALIZER</span><span class="p">;</span>
<span class="kt">void</span> <span class="nf">rcu_register_thread</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">tid</span> <span class="o">=</span> <span class="n">pthread_self</span><span class="p">();</span>
<span class="n">assert</span><span class="p">(</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">ctr</span> <span class="o">==</span> <span class="mi">0</span><span class="p">);</span>
<span class="n">mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_registry_lock</span><span class="p">);</span>
<span class="n">assert</span><span class="p">(</span><span class="o">!</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">registered</span><span class="p">);</span>
<span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">registered</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">cds_list_add</span><span class="p">(</span><span class="o">&</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">node</span><span class="p">,</span> <span class="o">&</span><span class="n">registry</span><span class="p">);</span>
<span class="n">mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_registry_lock</span><span class="p">);</span>
<span class="n">_rcu_thread_online</span><span class="p">();</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">rcu_unregister_thread</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">_rcu_thread_offline</span><span class="p">();</span>
<span class="n">assert</span><span class="p">(</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">registered</span><span class="p">);</span>
<span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">registered</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_registry_lock</span><span class="p">);</span>
<span class="n">cds_list_del</span><span class="p">(</span><span class="o">&</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">node</span><span class="p">);</span>
<span class="n">mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_registry_lock</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The code above first defines the TLS variable <code class="language-plaintext highlighter-rouge">rcu_reader</code>, so that every reader thread has its own <code class="language-plaintext highlighter-rouge">rcu_reader</code>. It then defines a doubly linked list, <code class="language-plaintext highlighter-rouge">registry</code>, that holds the <code class="language-plaintext highlighter-rouge">rcu_reader</code> of every reader thread; the writer uses it in <code class="language-plaintext highlighter-rouge">synchronize_rcu()</code>. A <code class="language-plaintext highlighter-rouge">mutex</code> protects this list.</p>
<p><code class="language-plaintext highlighter-rouge">rcu_register_thread</code> mainly adds the current thread's <code class="language-plaintext highlighter-rouge">rcu_reader</code> to this list, then calls <code class="language-plaintext highlighter-rouge">_rcu_thread_online</code> to cache the latest <code class="language-plaintext highlighter-rouge">rcu_gp</code>.
<code class="language-plaintext highlighter-rouge">rcu_unregister_thread</code> does the opposite: it clears some flags and then removes the thread's <code class="language-plaintext highlighter-rouge">rcu_reader</code> from the list.</p>
<p>Both functions take a lock, but since they are only called when a thread starts and exits, the impact on performance is negligible.</p>
<h2 id="读临界区">Read-side critical section</h2>
<p>When a reader thread operates on shared data, it calls <code class="language-plaintext highlighter-rouge">rcu_read_lock</code> and <code class="language-plaintext highlighter-rouge">rcu_read_unlock</code> to mark the critical section:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kr">inline</span> <span class="kt">void</span> <span class="nf">rcu_read_lock</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">urcu_assert</span><span class="p">(</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">ctr</span><span class="p">);</span>
<span class="p">}</span>
<span class="kr">inline</span> <span class="kt">void</span> <span class="nf">rcu_read_unlock</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="n">urcu_assert</span><span class="p">(</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">ctr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
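<p>As you can see, these two functions do nothing except assert, so in an optimized build (for example -O2 with the assertions compiled out) they are empty functions. This is why urcu-qsbr has zero overhead: its read-side critical section does no work at all!</p>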
<h2 id="quiescent-state">Quiescent State</h2>
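<p>The function that really matters for readers in qsbr is <code class="language-plaintext highlighter-rouge">rcu_quiescent_state()</code>, which tells the writer that this reader has finished a batch of read-side critical sections:</p>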
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">rcu_quiescent_state</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">gp_ctr</span><span class="p">;</span>
<span class="n">urcu_assert</span><span class="p">(</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">registered</span><span class="p">);</span>
<span class="k">if</span> <span class="p">((</span><span class="n">gp_ctr</span> <span class="o">=</span> <span class="n">CMM_LOAD_SHARED</span><span class="p">(</span><span class="n">rcu_gp</span><span class="p">.</span><span class="n">ctr</span><span class="p">))</span> <span class="o">==</span> <span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">ctr</span><span class="p">)</span>
<span class="k">return</span><span class="p">;</span>
<span class="n">_rcu_quiescent_state_update_and_wakeup</span><span class="p">(</span><span class="n">gp_ctr</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
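<p>The function first checks whether the current thread's gp number is already up to date. If so, it returns immediately; otherwise it calls <code class="language-plaintext highlighter-rouge">_rcu_quiescent_state_update_and_wakeup</code>:</p>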
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">_rcu_quiescent_state_update_and_wakeup</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">long</span> <span class="n">gp_ctr</span><span class="p">)</span> <span class="p">{</span>
<span class="n">cmm_smp_mb</span><span class="p">();</span>
<span class="n">_CMM_STORE_SHARED</span><span class="p">(</span><span class="n">URCU_TLS</span><span class="p">(</span><span class="n">rcu_reader</span><span class="p">).</span><span class="n">ctr</span><span class="p">,</span> <span class="n">gp_ctr</span><span class="p">);</span>
<span class="n">cmm_smp_mb</span><span class="p">();</span> <span class="cm">/* write URCU_TLS(rcu_reader).ctr before read futex */</span>
<span class="n">wake_up_gp</span><span class="p">();</span> <span class="cm">/* similar to pthread_cond_broadcast */</span>
<span class="n">cmm_smp_mb</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
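<p>The <code class="language-plaintext highlighter-rouge">wakeup</code> function simply stores the gp number that was just read into the current thread's gp cache and then wakes up any writer that may be waiting. The three <code class="language-plaintext highlighter-rouge">cmm_smp_mb</code> calls are memory barriers: they prevent reordering with operations before and after this function, and between the two steps inside it.</p>
<p>As shown above, the core functions are not complicated: they are mostly loads and stores of a few variables, so the overhead is very small.</p>
<h1 id="写线程函数synchronize_rcu">Writer-side function: synchronize_rcu</h1>
<p>For the writer, the key function is <code class="language-plaintext highlighter-rouge">synchronize_rcu</code>, which waits for the Grace Period to end:</p>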
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">synchronize_rcu</span><span class="p">(</span><span class="kt">void</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">CDS_LIST_HEAD</span><span class="p">(</span><span class="n">qsreaders</span><span class="p">);</span>
<span class="n">cmm_smp_mb</span><span class="p">();</span>
<span class="n">mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_gp_lock</span><span class="p">);</span>
<span class="n">mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_registry_lock</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">cds_list_empty</span><span class="p">(</span><span class="o">&</span><span class="n">registry</span><span class="p">))</span>
<span class="k">goto</span> <span class="n">out</span><span class="p">;</span>
<span class="n">CMM_STORE_SHARED</span><span class="p">(</span><span class="n">rcu_gp</span><span class="p">.</span><span class="n">ctr</span><span class="p">,</span> <span class="n">rcu_gp</span><span class="p">.</span><span class="n">ctr</span> <span class="o">+</span> <span class="n">RCU_GP_CTR</span><span class="p">);</span>
<span class="n">cmm_barrier</span><span class="p">();</span>
<span class="n">cmm_smp_mb</span><span class="p">();</span>
<span class="n">wait_for_readers</span><span class="p">(</span><span class="o">&</span><span class="n">registry</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="o">&</span><span class="n">qsreaders</span><span class="p">);</span>
<span class="n">cds_list_splice</span><span class="p">(</span><span class="o">&</span><span class="n">qsreaders</span><span class="p">,</span> <span class="o">&</span><span class="n">registry</span><span class="p">);</span>
<span class="nl">out:</span>
<span class="n">mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_registry_lock</span><span class="p">);</span>
<span class="n">mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">rcu_gp_lock</span><span class="p">);</span>
<span class="n">cmm_smp_mb</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">cmm_smp_mb</code> calls in this function ensure that reads and writes before and after <code class="language-plaintext highlighter-rouge">synchronize_rcu</code> are not reordered across it.
The function then locks the global <code class="language-plaintext highlighter-rouge">rcu_gp</code> and the <code class="language-plaintext highlighter-rouge">registry</code> in turn, and checks whether <code class="language-plaintext highlighter-rouge">registry</code> is empty; if it is, there are no reader threads and it can return immediately.</p>
<p>If it is not empty, <code class="language-plaintext highlighter-rouge">rcu_gp</code> is incremented, which marks the start of a new Grace Period.
It then calls <code class="language-plaintext highlighter-rouge">wait_for_readers</code> to wait for the Grace Period to end.</p>
<p>Here is a simplified version of <code class="language-plaintext highlighter-rouge">wait_for_readers</code> (the real implementation is much more complex; we skip the details and only look at the general idea):</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">static</span> <span class="kt">void</span> <span class="nf">wait_for_readers</span><span class="p">(</span>
<span class="k">struct</span> <span class="n">cds_list_head</span> <span class="o">*</span><span class="n">input_readers</span><span class="p">,</span>
<span class="k">struct</span> <span class="n">cds_list_head</span> <span class="o">*</span><span class="n">cur_snap_readers</span><span class="p">,</span>
<span class="k">struct</span> <span class="n">cds_list_head</span> <span class="o">*</span><span class="n">qsreaders</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">rcu_reader</span> <span class="o">*</span><span class="n">index</span><span class="p">,</span> <span class="o">*</span><span class="n">tmp</span><span class="p">;</span>
<span class="n">cds_list_for_each_entry_safe</span><span class="p">(</span>
<span class="n">index</span><span class="p">,</span> <span class="n">tmp</span><span class="p">,</span> <span class="n">input_readers</span><span class="p">,</span> <span class="n">node</span><span class="p">)</span> <span class="p">{</span>
<span class="k">while</span> <span class="p">(</span><span class="n">index</span><span class="o">-></span><span class="n">ctr</span> <span class="o"><</span> <span class="n">CMM_LOAD_SHARED</span><span class="p">(</span><span class="n">rcu_gp</span><span class="p">.</span><span class="n">ctr</span><span class="p">))</span> <span class="p">{</span>
<span class="n">usleep</span><span class="p">(</span><span class="mi">100</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
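<p>The purpose of this function is to wait until every reader thread has updated its gp number to the latest one.</p>
<p>Once <code class="language-plaintext highlighter-rouge">synchronize_rcu</code> returns, we know that no reader thread can still be accessing the old shared data, so the old data can be freed.</p>
<p>That is the core code of a qsbr RCU implementation.</p>
<h1 id="性能">Performance</h1>
<p>Below we benchmark urcu against a mutex with a very simple example (the full code is in this <a href="https://github.com/airekans/urcu-benchmark">repo</a>).</p>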
<blockquote>
<p>Several reader threads keep reading a piece of shared data while a separate writer thread keeps updating it.</p>
</blockquote>
<p>Implemented with urcu, the logic looks roughly like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">ReadThreadFunc</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">struct</span> <span class="n">Foo</span><span class="o">*</span> <span class="n">foo</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">sum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">i</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">j</span><span class="p">;</span>
<span class="n">rcu_register_thread</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">LOOP_TIMES</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">j</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">j</span> <span class="o"><</span> <span class="mi">1000</span><span class="p">;</span> <span class="o">++</span><span class="n">j</span><span class="p">)</span> <span class="p">{</span>
<span class="n">rcu_read_lock</span><span class="p">();</span>
<span class="n">foo</span> <span class="o">=</span> <span class="n">rcu_dereference</span><span class="p">(</span><span class="n">gs_foo</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="p">{</span>
<span class="n">sum</span> <span class="o">+=</span> <span class="n">foo</span><span class="o">-></span><span class="n">a</span> <span class="o">+</span> <span class="n">foo</span><span class="o">-></span><span class="n">b</span> <span class="o">+</span> <span class="n">foo</span><span class="o">-></span><span class="n">c</span> <span class="o">+</span> <span class="n">foo</span><span class="o">-></span><span class="n">d</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">rcu_read_unlock</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">rcu_quiescent_state</span><span class="p">();</span>
<span class="p">}</span>
<span class="n">rcu_unregister_thread</span><span class="p">();</span>
<span class="n">pthread_mutex_lock</span><span class="p">(</span><span class="o">&</span><span class="n">gs_sum_guard</span><span class="p">);</span>
<span class="n">gs_sum</span> <span class="o">+=</span> <span class="n">sum</span><span class="p">;</span>
<span class="n">pthread_mutex_unlock</span><span class="p">(</span><span class="o">&</span><span class="n">gs_sum_guard</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">WriteThreadFunc</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">i</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="o">!</span><span class="n">gs_is_end</span><span class="p">)</span> <span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1000</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="k">struct</span> <span class="n">Foo</span><span class="o">*</span> <span class="n">foo</span> <span class="o">=</span> <span class="p">(</span><span class="k">struct</span> <span class="n">Foo</span><span class="o">*</span><span class="p">)</span> <span class="n">malloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="k">struct</span> <span class="n">Foo</span><span class="p">));</span>
<span class="n">foo</span><span class="o">-></span><span class="n">a</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">foo</span><span class="o">-></span><span class="n">b</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="n">foo</span><span class="o">-></span><span class="n">c</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="n">foo</span><span class="o">-></span><span class="n">d</span> <span class="o">=</span> <span class="mi">5</span><span class="p">;</span>
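<span class="n">foo</span> <span class="o">=</span> <span class="n">rcu_xchg_pointer</span><span class="p">(</span><span class="o">&</span><span class="n">gs_foo</span><span class="p">,</span> <span class="n">foo</span><span class="p">);</span> <span class="cm">/* foo now holds the old pointer */</span>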
<span class="n">synchronize_rcu</span><span class="p">();</span>
<span class="k">if</span> <span class="p">(</span><span class="n">foo</span><span class="p">)</span> <span class="p">{</span>
<span class="n">free</span><span class="p">(</span><span class="n">foo</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</code></pre></div></div>
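<p>Below are the benchmark results on a 16-core machine (32 logical cores with hyper-threading). The x-axis is the number of reader threads and the y-axis is elapsed time in seconds:</p>
<p><img src="https://cloud.githubusercontent.com/assets/1321283/15135954/07bc6ef2-16af-11e6-9857-c48639224e94.png" alt="URCU performance" /></p>
<p>Where:</p>
<ol>
<li>urcu_read_only runs reader threads only, with no writer thread. It is effectively the lock-free baseline and has the best performance.</li>
<li>urcu_qsbr_test uses the urcu-qsbr implementation.</li>
<li>urcu_signal_test uses the urcu-signal implementation.</li>
<li>urcu_generic_test uses the urcu-mb implementation.</li>
<li>single_mutex_test protects the shared data with a single mutex, the implementation most of us know best.</li>
<li>mutex_per_thread_test gives every reader thread its own mutex; the writer thread must acquire all reader mutexes to enter the critical section.</li>
</ol>
<p>As the chart shows, qsbr comes closest to read_only, followed by signal. Both are at least 4x faster than the mutex versions, and their running time does not grow as reader threads are added. In other words, urcu scales as the number of cores increases.</p>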
Read-Copy Update: Toward Lock-Free Programming!
2016-04-23T00:00:00+00:00
http://airekans.github.io/c/2016/04/23/rcu-intro
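<p>In the world of lock-free programming, the ABA problem is an implementation issue you cannot avoid. Just look at how many pitfalls there are in implementing even the simplest <a href="https://en.wikipedia.org/wiki/ABA_problem#Examples">stack based on a singly linked list</a>, and you will see how hard lock-free programming is.
Does this obstacle block our pursuit of high performance?
No. We have Read-Copy Update (RCU), a tool that makes it much easier to implement many lock-free algorithms and data structures.</p>
<p>This article first gives a brief introduction to the basic concepts of RCU, then walks through an example to explain RCU's read and write sides in detail, and finally takes a short look at how RCU is implemented today.</p>
<h1 id="什么是rcu">What Is RCU?</h1>
<p>To quote the opening of this <a href="http://lwn.net/Articles/262464/">well-known introduction to RCU</a>:</p>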
<blockquote>
<p>Read-copy update (RCU) is a synchronization mechanism that was added to the Linux kernel in October of 2002. RCU achieves scalability improvements by allowing reads to occur concurrently with updates.</p>
</blockquote>
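<p>First, RCU is a synchronization mechanism; second, RCU lets reads run concurrently with updates; third, it has been part of the Linux kernel since 2002.
RCU uses a publish-subscribe mechanism that puts some extra burden on the writer so that the reader side can be almost <strong>zero-overhead</strong>.</p>
<p>RCU is well suited to synchronizing pointer-based data structures (linked lists, hash tables, and so on). Because readers pay almost no overhead, it fits workloads where reads far outnumber writes. For example, the network routing code in the Linux kernel uses RCU to achieve high performance.</p>
<h1 id="一个指针的例子">A Pointer Example</h1>
<p>Suppose we have the following structure definition:</p>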
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">foo</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">a</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">b</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">struct</span> <span class="n">foo</span> <span class="o">*</span><span class="n">gp</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
</code></pre></div></div>
<p>If we ignore the possibility that the <code class="language-plaintext highlighter-rouge">gp</code> pointer changes, a read can be done like this:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">foo</span><span class="o">*</span> <span class="n">p</span> <span class="o">=</span> <span class="n">gp</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">p</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">do_something_with</span><span class="p">(</span><span class="n">p</span><span class="o">-></span><span class="n">a</span><span class="p">,</span> <span class="n">p</span><span class="o">-></span><span class="n">b</span><span class="p">,</span> <span class="n">p</span><span class="o">-></span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Everything looks simple. But what if a writer changes <code class="language-plaintext highlighter-rouge">gp</code> like this?</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">foo</span><span class="o">*</span> <span class="n">p</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">p</span><span class="p">),</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
<span class="k">struct</span> <span class="n">foo</span><span class="o">*</span> <span class="n">tmp_gp</span> <span class="o">=</span> <span class="n">gp</span><span class="p">;</span>
<span class="n">p</span><span class="o">-></span><span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">p</span><span class="o">-></span><span class="n">b</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">p</span><span class="o">-></span><span class="n">c</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="n">gp</span> <span class="o">=</span> <span class="n">p</span><span class="p">;</span>
<span class="n">kfree</span><span class="p">(</span><span class="n">tmp_gp</span><span class="p">);</span>
</code></pre></div></div>
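<p>Can you see what will happen?</p>
<p>If several readers fetch the old gp and are then context-switched out, the writer may free that old <code class="language-plaintext highlighter-rouge">gp</code> (called <code class="language-plaintext highlighter-rouge">tmp_gp</code> on the writer side) in the meantime. When those readers are scheduled again, they dereference freed memory, which can end in a segfault. This is the most basic race condition we learn about in multithreaded programming. The usual fix is to protect the <code class="language-plaintext highlighter-rouge">gp</code> pointer with a mutex or a rwlock to get mutual exclusion.</p>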
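<p>So how does RCU solve this?</p>
<h1 id="rcu的读写锁">RCU's Reader-Writer Locking</h1>
<p>RCU achieves mutual exclusion in a way that resembles a reader-writer lock. On the reader side, the critical section is protected with <code class="language-plaintext highlighter-rouge">rcu_read_lock/unlock</code>:</p>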
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">rcu_read_lock</span><span class="p">();</span>
<span class="n">p</span> <span class="o">=</span> <span class="n">rcu_dereference</span><span class="p">(</span><span class="n">gp</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">p</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">)</span> <span class="p">{</span>
<span class="n">do_something_with</span><span class="p">(</span><span class="n">p</span><span class="o">-></span><span class="n">a</span><span class="p">,</span> <span class="n">p</span><span class="o">-></span><span class="n">b</span><span class="p">,</span> <span class="n">p</span><span class="o">-></span><span class="n">c</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">rcu_read_unlock</span><span class="p">();</span>
</code></pre></div></div>
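<p>Here the code to be protected sits between <code class="language-plaintext highlighter-rouge">read_lock/unlock</code>, exactly as with the read side of a reader-writer lock.</p>
<p>On the writer side, we call <code class="language-plaintext highlighter-rouge">synchronize_rcu</code> to wait until every reader still using the old <code class="language-plaintext highlighter-rouge">gp</code> has finished, and only then free it.</p>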
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">q</span> <span class="o">=</span> <span class="n">kmalloc</span><span class="p">(</span><span class="k">sizeof</span><span class="p">(</span><span class="o">*</span><span class="n">q</span><span class="p">),</span> <span class="n">GFP_KERNEL</span><span class="p">);</span>
<span class="n">q</span><span class="o">-></span><span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">q</span><span class="o">-></span><span class="n">b</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="n">q</span><span class="o">-></span><span class="n">c</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
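<span class="n">p</span> <span class="o">=</span> <span class="n">rcu_xchg_pointer</span><span class="p">(</span><span class="o">&</span><span class="n">gp</span><span class="p">,</span> <span class="n">q</span><span class="p">);</span> <span class="cm">/* p now holds the old gp */</span>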
<span class="n">synchronize_rcu</span><span class="p">();</span>
<span class="n">kfree</span><span class="p">(</span><span class="n">p</span><span class="p">);</span>
</code></pre></div></div>
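<p>Once <code class="language-plaintext highlighter-rouge">synchronize_rcu</code> returns, RCU guarantees that every read-side critical section that could hold the old gp has ended, so the old <code class="language-plaintext highlighter-rouge">gp</code> can be freed safely.</p>
<p>Note that the writer example above allows only one writer at a time. Supporting multiple writers requires some changes; for simplicity, the multi-writer case is not covered here.</p>
<p>The key piece here is the <code class="language-plaintext highlighter-rouge">synchronize_rcu</code> function; it is what makes RCU work correctly.</p>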
<h1 id="grace-period">Grace Period</h1>
<p><img src="http://static.lwn.net/images/ns/kernel/rcu/GracePeriodGood.png" alt="RCU Grace Period" /></p>
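<p>RCU has two key time intervals: the reader lock time and the grace period. The figure above gives a rough idea of how they relate.</p>
<p>The reader lock time is, as the name suggests, the time a reader spends between <code class="language-plaintext highlighter-rouge">rcu_read_lock/unlock</code>. The grace period is the interval that starts when the writer begins modifying the protected data structure and lasts until every reader lock section has ended at least once.</p>
<p>Call the moment the grace period starts T:</p>
<ul>
<li>If a reader lock section spans T, the grace period must end after that reader lock section ends.</li>
<li>If a reader lock section starts after T, the grace period may end at any point relative to it: before the section starts, in the middle of it, or after it ends.</li>
</ul>
<p>With these two concepts in hand, a simple argument shows that once the grace period is over, no reader can still hold the old data from before T. So after the grace period the writer can safely free the old data.</p>
<p>So the <code class="language-plaintext highlighter-rouge">synchronize_rcu</code> call in the writer example above simply waits for the grace period to end.</p>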
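<p>The two rules about T can be captured in a small model. The sketch below is purely illustrative and is not how any real RCU implementation works: it assumes an abstract integer clock and a known list of reader lock sections, and the names <code class="language-plaintext highlighter-rouge">reader_section</code> and <code class="language-plaintext highlighter-rouge">grace_period_end</code> are hypothetical.</p>

```c
#include <stddef.h>

/* One reader lock section on an abstract clock: [lock, unlock). */
struct reader_section {
    unsigned int lock;    /* time of rcu_read_lock() */
    unsigned int unlock;  /* time of rcu_read_unlock() */
};

/*
 * Earliest time at which a grace period that starts at t may end.
 * Only sections that span t (lock <= t < unlock) can hold it open;
 * sections that start after t never delay it.
 */
unsigned int grace_period_end(const struct reader_section *s, size_t n,
                              unsigned int t)
{
    unsigned int end = t;
    size_t i;

    for (i = 0; i < n; ++i) {
        if (s[i].lock <= t && t < s[i].unlock && s[i].unlock > end)
            end = s[i].unlock;
    }
    return end;
}
```

<p>For example, with sections [1, 5), [3, 9) and [6, 20) and T = 4, the first two sections span T, so the grace period cannot end before time 9; the section starting at 6 has no influence on it.</p>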
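<h1 id="rcu的实现">RCU Implementations</h1>
<p>Today RCU is implemented mainly in the Linux kernel, but the kernel's RCU is tightly coupled to kernel internals and makes many assumptions about process scheduling.
Using RCU in user space therefore requires some extensions.
liburcu brings RCU to user space so that user-space programs can benefit from it too.</p>
<p>I will cover the URCU implementation in detail in the next article.</p>
The Linux Kernel Queue: kfifo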
2015-10-12T00:00:00+00:00
http://airekans.github.io/c/2015/10/12/linux-kernel-data-structure-kfifo
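<p>Kernel code often needs a queue to pass data around, and the Linux kernel ships a lightweight and very cleverly implemented one: kfifo.
In short, kfifo is a fixed-size ring buffer. A picture borrowed from the web makes this clearest:</p>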
<p><img src="http://blog.chinaunix.net/attachment/201404/10/18770639_1397093507W9w9.bmp" alt="kfifo-diagram" /></p>
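<p><code class="language-plaintext highlighter-rouge">kfifo</code> itself has no notion of queue elements; internally it is just a byte buffer. Users must know how to interpret what is stored in it, so it is best used for fixed-length objects.</p>
<p>An important property of <code class="language-plaintext highlighter-rouge">kfifo</code> is that in the single-producer, single-consumer case (1 Producer 1 Consumer, 1P1C below) no locking is needed, so performance is high in that scenario.</p>
<p>All code in this article comes from Linux kernel 2.6.32, so it is licensed under GPLv2.</p>
<h1 id="定义及api">Definition and API</h1>
<p>kfifo is defined mainly in <code class="language-plaintext highlighter-rouge">include/linux/kfifo.h</code>:</p>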
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">struct</span> <span class="n">kfifo</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">;</span> <span class="cm">/* the buffer holding the data */</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">size</span><span class="p">;</span> <span class="cm">/* the size of the allocated buffer */</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">in</span><span class="p">;</span> <span class="cm">/* data is added at offset (in % size) */</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">out</span><span class="p">;</span> <span class="cm">/* data is extracted from off. (out % size) */</span>
<span class="n">spinlock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">;</span> <span class="cm">/* protects concurrent modifications */</span>
<span class="p">};</span>
<span class="k">extern</span> <span class="k">struct</span> <span class="n">kfifo</span> <span class="o">*</span><span class="nf">kfifo_init</span><span class="p">(</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">size</span><span class="p">,</span>
<span class="n">gfp_t</span> <span class="n">gfp_mask</span><span class="p">,</span> <span class="n">spinlock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">);</span>
<span class="k">extern</span> <span class="k">struct</span> <span class="n">kfifo</span> <span class="o">*</span><span class="nf">kfifo_alloc</span><span class="p">(</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">size</span><span class="p">,</span> <span class="n">gfp_t</span> <span class="n">gfp_mask</span><span class="p">,</span>
<span class="n">spinlock_t</span> <span class="o">*</span><span class="n">lock</span><span class="p">);</span>
<span class="k">extern</span> <span class="kt">void</span> <span class="nf">kfifo_free</span><span class="p">(</span><span class="k">struct</span> <span class="n">kfifo</span> <span class="o">*</span><span class="n">fifo</span><span class="p">);</span>
<span class="k">extern</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="nf">__kfifo_put</span><span class="p">(</span><span class="k">struct</span> <span class="n">kfifo</span> <span class="o">*</span><span class="n">fifo</span><span class="p">,</span>
<span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span><span class="p">);</span>
<span class="k">extern</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="nf">__kfifo_get</span><span class="p">(</span><span class="k">struct</span> <span class="n">kfifo</span> <span class="o">*</span><span class="n">fifo</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span><span class="p">);</span>
</code></pre></div></div>
<p>The kfifo definition contains a <code class="language-plaintext highlighter-rouge">spinlock_t</code>, which is used for locking when multiple threads modify the queue concurrently. The remaining members describe the current state of the queue, and the queue contents are stored in <code class="language-plaintext highlighter-rouge">buffer</code>.</p>
<p>Note that kfifo requires the queue size to be a power of two (2^n), so that the modulo operations later on can be done with a bitwise AND, which is more efficient.</p>
<p>Initialization is done through <code class="language-plaintext highlighter-rouge">kfifo_init</code> and <code class="language-plaintext highlighter-rouge">kfifo_alloc</code>. The main queue operations are <code class="language-plaintext highlighter-rouge">kfifo_put</code> and <code class="language-plaintext highlighter-rouge">kfifo_get</code>. These two functions take the lock and then call <code class="language-plaintext highlighter-rouge">__kfifo_put</code> or <code class="language-plaintext highlighter-rouge">__kfifo_get</code>, so the real logic lives in the double-underscore functions.
As mentioned, <code class="language-plaintext highlighter-rouge">kfifo</code> needs no locking in the 1P1C case, so we focus on those two functions here.</p>
<h1 id="入队">Enqueue</h1>
<p>The definition of <code class="language-plaintext highlighter-rouge">__kfifo_put</code> is short:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">unsigned</span> <span class="kt">int</span> <span class="nf">__kfifo_put</span><span class="p">(</span><span class="k">struct</span> <span class="n">kfifo</span> <span class="o">*</span><span class="n">fifo</span><span class="p">,</span>
<span class="k">const</span> <span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">l</span><span class="p">;</span>
<span class="n">len</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">len</span><span class="p">,</span> <span class="n">fifo</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="n">fifo</span><span class="o">-></span><span class="n">in</span> <span class="o">+</span> <span class="n">fifo</span><span class="o">-></span><span class="n">out</span><span class="p">);</span>
<span class="cm">/*
* Ensure that we sample the fifo->out index -before- we
* start putting bytes into the kfifo.
*/</span>
<span class="n">smp_mb</span><span class="p">();</span>
<span class="cm">/* first put the data starting from fifo->in to buffer end */</span>
<span class="n">l</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">len</span><span class="p">,</span> <span class="n">fifo</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">in</span> <span class="o">&</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">buffer</span> <span class="o">+</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">in</span> <span class="o">&</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)),</span> <span class="n">buffer</span><span class="p">,</span> <span class="n">l</span><span class="p">);</span>
<span class="cm">/* then put the rest (if any) at the beginning of the buffer */</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">buffer</span><span class="p">,</span> <span class="n">buffer</span> <span class="o">+</span> <span class="n">l</span><span class="p">,</span> <span class="n">len</span> <span class="o">-</span> <span class="n">l</span><span class="p">);</span>
<span class="cm">/*
* Ensure that we add the bytes to the kfifo -before-
* we update the fifo->in index.
*/</span>
<span class="n">smp_wmb</span><span class="p">();</span>
<span class="n">fifo</span><span class="o">-></span><span class="n">in</span> <span class="o">+=</span> <span class="n">len</span><span class="p">;</span>
<span class="k">return</span> <span class="n">len</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
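<p>The function contains a few memory barriers that keep the 1P1C case correct; we can ignore them for now.</p>
<p>The main steps are:</p>
<ol>
<li>Take the smaller of len and the remaining capacity of the queue; if there is not enough room, only as many bytes as fit are copied.</li>
<li>Copy the first part of the data into the region between the write position and the end of the buffer.</li>
<li>If that region cannot hold all the data, continue copying into the free space at the beginning of the buffer.</li>
<li>Increase the queue length by len (that is, <code class="language-plaintext highlighter-rouge">in += len</code>).</li>
<li>Return len, the number of bytes added.</li>
</ol>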
<p>Note that in is modified only in <code class="language-plaintext highlighter-rouge">__kfifo_put</code>, and that function only ever adds to it, so in never decreases. Since in is an <code class="language-plaintext highlighter-rouge">unsigned int</code>, once it passes the largest representable value (2^32 - 1 for a 32-bit <code class="language-plaintext highlighter-rouge">unsigned int</code>) it wraps around and continues from 0.</p>
<p>As mentioned earlier, the size of a <code class="language-plaintext highlighter-rouge">kfifo</code> is 2^n. For any value of in, <code class="language-plaintext highlighter-rouge">(in & (2^n - 1)) == (in % 2^n)</code>, so an AND can replace the modulo to obtain the actual position of in within the buffer.</p>
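<p>The index arithmetic can be checked in isolation. The helpers below are a minimal sketch with hypothetical names (<code class="language-plaintext highlighter-rouge">ring_offset</code>, <code class="language-plaintext highlighter-rouge">ring_used</code>, <code class="language-plaintext highlighter-rouge">ring_first_chunk</code>); they are not part of the kernel API and only mirror the expressions used in <code class="language-plaintext highlighter-rouge">__kfifo_put</code>.</p>

```c
#include <assert.h>

/* Offset of a free-running index inside a buffer whose size is a power of two. */
unsigned int ring_offset(unsigned int index, unsigned int size)
{
    assert(size != 0 && (size & (size - 1)) == 0);  /* power of two only */
    return index & (size - 1);                      /* same as index % size */
}

/* Bytes currently stored. Unsigned subtraction wraps, so the result stays
 * correct even after in has wrapped around past the maximum value. */
unsigned int ring_used(unsigned int in, unsigned int out)
{
    return in - out;
}

/* How many of len bytes fit before the end of the buffer (the first memcpy);
 * the remaining len minus that many bytes go to the start of the buffer. */
unsigned int ring_first_chunk(unsigned int index, unsigned int size,
                              unsigned int len)
{
    unsigned int to_end = size - ring_offset(index, size);

    return len < to_end ? len : to_end;
}
```

<p>With a buffer of size 8 and <code class="language-plaintext highlighter-rouge">in == 6</code>, writing 4 bytes puts 2 bytes at offsets 6 and 7 and the other 2 at offsets 0 and 1.</p>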
<h1 id="出队">Dequeue</h1>
<p>The definition of <code class="language-plaintext highlighter-rouge">__kfifo_get</code> is about as long as that of <code class="language-plaintext highlighter-rouge">__kfifo_put</code>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">unsigned</span> <span class="kt">int</span> <span class="nf">__kfifo_get</span><span class="p">(</span><span class="k">struct</span> <span class="n">kfifo</span> <span class="o">*</span><span class="n">fifo</span><span class="p">,</span>
<span class="kt">unsigned</span> <span class="kt">char</span> <span class="o">*</span><span class="n">buffer</span><span class="p">,</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">len</span><span class="p">)</span>
<span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">l</span><span class="p">;</span>
<span class="n">len</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">len</span><span class="p">,</span> <span class="n">fifo</span><span class="o">-></span><span class="n">in</span> <span class="o">-</span> <span class="n">fifo</span><span class="o">-></span><span class="n">out</span><span class="p">);</span>
<span class="cm">/*
* Ensure that we sample the fifo->in index -before- we
* start removing bytes from the kfifo.
*/</span>
<span class="n">smp_rmb</span><span class="p">();</span>
<span class="cm">/* first get the data from fifo->out until the end of the buffer */</span>
<span class="n">l</span> <span class="o">=</span> <span class="n">min</span><span class="p">(</span><span class="n">len</span><span class="p">,</span> <span class="n">fifo</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">out</span> <span class="o">&</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)));</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">buffer</span><span class="p">,</span> <span class="n">fifo</span><span class="o">-></span><span class="n">buffer</span> <span class="o">+</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">out</span> <span class="o">&</span> <span class="p">(</span><span class="n">fifo</span><span class="o">-></span><span class="n">size</span> <span class="o">-</span> <span class="mi">1</span><span class="p">)),</span> <span class="n">l</span><span class="p">);</span>
<span class="cm">/* then get the rest (if any) from the beginning of the buffer */</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">buffer</span> <span class="o">+</span> <span class="n">l</span><span class="p">,</span> <span class="n">fifo</span><span class="o">-></span><span class="n">buffer</span><span class="p">,</span> <span class="n">len</span> <span class="o">-</span> <span class="n">l</span><span class="p">);</span>
<span class="cm">/*
* Ensure that we remove the bytes from the kfifo -before-
* we update the fifo->out index.
*/</span>
<span class="n">smp_mb</span><span class="p">();</span>
<span class="n">fifo</span><span class="o">-></span><span class="n">out</span> <span class="o">+=</span> <span class="n">len</span><span class="p">;</span>
<span class="k">return</span> <span class="n">len</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Ignoring the memory barriers, the main steps are:</p>
<ol>
<li>Take the smaller of len and the current queue length; if the queue holds less data than requested, only that much is copied.</li>
<li>Copy the data between the read position and the end of the buffer into the output buffer.</li>
<li>If some of the data has still not been copied, copy the remainder from the beginning of the buffer.</li>
<li>Reduce the queue length by len (that is, <code class="language-plaintext highlighter-rouge">out += len</code>).</li>
<li>Return the number of bytes copied.</li>
</ol>
<p>It is essentially the reverse of <code class="language-plaintext highlighter-rouge">__kfifo_put</code>.</p>
<p>This raises a question: the queue length does not have to be tracked with the two variables <code class="language-plaintext highlighter-rouge">in</code> and <code class="language-plaintext highlighter-rouge">out</code>; a single <code class="language-plaintext highlighter-rouge">len</code> variable would do. The reason for using two has to do with mutual exclusion between threads.</p>
<h1 id="多线程互斥">Multithreaded Mutual Exclusion</h1>
<p>We only consider the simplest multithreaded scenario here, 1P1C. If the queue length were kept in a single <code class="language-plaintext highlighter-rouge">len</code>, both <code class="language-plaintext highlighter-rouge">__kfifo_put</code> and <code class="language-plaintext highlighter-rouge">__kfifo_get</code> would have to modify it, one with <code class="language-plaintext highlighter-rouge">+=</code> and the other with <code class="language-plaintext highlighter-rouge">-=</code>. Without a lock these two operations are not atomic, so with a single <code class="language-plaintext highlighter-rouge">len</code> we would need a lock no matter how simple the multithreaded scenario is.</p>
<p>If instead <code class="language-plaintext highlighter-rouge">in</code> and <code class="language-plaintext highlighter-rouge">out</code> mark the write boundary and the read boundary of the queue, the queue length is <code class="language-plaintext highlighter-rouge">in - out</code>. As we have seen, <code class="language-plaintext highlighter-rouge">in</code> is modified only in <code class="language-plaintext highlighter-rouge">__kfifo_put</code> and <code class="language-plaintext highlighter-rouge">out</code> only in <code class="language-plaintext highlighter-rouge">__kfifo_get</code>, so each of them is written by exactly one thread and there is no mutual-exclusion problem.</p>
<p>Does that make the code thread-safe? Not quite.</p>
<p>Remember the memory barriers we skipped earlier? Without them the code would still be unsafe. In multithreaded code, atomicity alone is not enough; we also need to rule out reordering (visibility). Without a lock or a memory barrier, nothing guarantees that reordering cannot occur on some CPU. The memory barriers in the code above are there to prevent it.</p>
<p>Briefly, these memory barriers do the following:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">smp_rmb</code> prevents reads from being reordered with respect to each other.</li>
<li><code class="language-plaintext highlighter-rouge">smp_wmb</code> prevents writes from being reordered with respect to each other.</li>
<li><code class="language-plaintext highlighter-rouge">smp_mb</code> prevents both reads and writes from being reordered.</li>
</ol>
<p>Next, let's classify the reads and writes that kfifo performs on <code class="language-plaintext highlighter-rouge">in</code>, <code class="language-plaintext highlighter-rouge">out</code> and <code class="language-plaintext highlighter-rouge">buffer</code>. For <code class="language-plaintext highlighter-rouge">__kfifo_put</code> they are:</p>
<ol>
<li>R(in), R(out)</li>
<li>R(in), W(buffer)</li>
<li>W(in)</li>
</ol>
<p>And for <code class="language-plaintext highlighter-rouge">__kfifo_get</code>:</p>
<ol>
<li>R(in), R(out)</li>
<li>R(out), R(buffer)</li>
<li>W(out)</li>
</ol>
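<p>Let's look at <code class="language-plaintext highlighter-rouge">__kfifo_put</code> first. It has a few memory operations that must not be reordered:</p>
<ol>
<li>R(out) and W(buffer): we need the latest value of <code class="language-plaintext highlighter-rouge">out</code>; otherwise we might be unable to write even though the queue has room. The buffer writes also must not move ahead of this read, or they could overwrite bytes the consumer has not finished reading. Because the order between a read and a write has to be preserved, <code class="language-plaintext highlighter-rouge">smp_mb</code> is needed. On x86/x86-64 this barrier could actually be omitted, since a read followed by a write is guaranteed not to be reordered there; but the Linux kernel has to work on every platform, so the barrier is still required.</li>
<li>W(buffer) and W(in): this order must be preserved. Otherwise <code class="language-plaintext highlighter-rouge">in</code> could be updated before the data has actually been copied into the buffer, and a <code class="language-plaintext highlighter-rouge">__kfifo_get</code> arriving at that moment would copy out bytes that were never written, which must not happen. So <code class="language-plaintext highlighter-rouge">smp_wmb</code> is needed here.</li>
</ol>
<p>The following figure shows the state of <code class="language-plaintext highlighter-rouge">kfifo</code> during a put:</p>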
<p><img src="https://cloud.githubusercontent.com/assets/1321283/10421549/b85f24bc-70dc-11e5-9afd-2ec2f659422f.png" alt="kfifo_put states" /></p>
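<p>Similarly, <code class="language-plaintext highlighter-rouge">__kfifo_get</code> has a few memory operations that must not be reordered:</p>
<ol>
<li>R(in) and R(buffer): we need the latest value of <code class="language-plaintext highlighter-rouge">in</code>; otherwise the queue may hold data that we fail to read. The buffer reads also must not be performed ahead of this read, or they could return bytes that the producer had not yet written. <code class="language-plaintext highlighter-rouge">smp_rmb</code> is needed here.</li>
<li>R(buffer) and W(out): this order must also be preserved. If <code class="language-plaintext highlighter-rouge">out</code> were updated before the buffer is read, <code class="language-plaintext highlighter-rouge">__kfifo_put</code> could overwrite the data just before we read it, and we would read something other than what we wanted. <code class="language-plaintext highlighter-rouge">smp_mb</code> is needed here.</li>
</ol>
<p>The state of <code class="language-plaintext highlighter-rouge">kfifo</code> during a get is shown below:</p>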
<p><img src="https://cloud.githubusercontent.com/assets/1321283/10421609/6059015a-70de-11e5-8dac-b5805e194da9.png" alt="kfifo_get states" /></p>
<p>With this implementation, kfifo gives us a very efficient 1P1C queue. In any other multithreaded scenario, we still need a spinlock to protect <code class="language-plaintext highlighter-rouge">kfifo</code>.</p>
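<p>For readers who want to try the same idea outside the kernel, here is a minimal user-space sketch of a 1P1C ring buffer. It is an assumption-laden illustration, not the kernel code: it uses C11 <code class="language-plaintext highlighter-rouge">&lt;stdatomic.h&gt;</code> acquire/release operations in place of <code class="language-plaintext highlighter-rouge">smp_mb</code>/<code class="language-plaintext highlighter-rouge">smp_rmb</code>/<code class="language-plaintext highlighter-rouge">smp_wmb</code> (a close but not identical mapping), and <code class="language-plaintext highlighter-rouge">spsc_ring</code>, <code class="language-plaintext highlighter-rouge">ring_put</code>, <code class="language-plaintext highlighter-rouge">ring_get</code> and <code class="language-plaintext highlighter-rouge">RING_SIZE</code> are hypothetical names.</p>

```c
#include <stdatomic.h>
#include <string.h>

#define RING_SIZE 8u   /* must be a power of two */

struct spsc_ring {
    unsigned char buffer[RING_SIZE];
    atomic_uint in;    /* written only by the producer */
    atomic_uint out;   /* written only by the consumer */
};

static unsigned int min_u(unsigned int a, unsigned int b)
{
    return a < b ? a : b;
}

/* Producer side: returns the number of bytes actually stored. */
unsigned int ring_put(struct spsc_ring *r, const unsigned char *src,
                      unsigned int len)
{
    unsigned int in = atomic_load_explicit(&r->in, memory_order_relaxed);
    /* acquire: the consumer finished reading the slots it released
     * before we start overwriting them */
    unsigned int out = atomic_load_explicit(&r->out, memory_order_acquire);
    unsigned int off = in & (RING_SIZE - 1);
    unsigned int l;

    len = min_u(len, RING_SIZE - (in - out));  /* clamp to free space */
    l = min_u(len, RING_SIZE - off);           /* part up to buffer end */
    memcpy(r->buffer + off, src, l);
    memcpy(r->buffer, src + l, len - l);       /* rest wraps to the start */
    /* release: the copied bytes become visible before the new in */
    atomic_store_explicit(&r->in, in + len, memory_order_release);
    return len;
}

/* Consumer side: returns the number of bytes actually copied out. */
unsigned int ring_get(struct spsc_ring *r, unsigned char *dst,
                      unsigned int len)
{
    unsigned int out = atomic_load_explicit(&r->out, memory_order_relaxed);
    /* acquire: see the bytes the producer wrote before publishing in */
    unsigned int in = atomic_load_explicit(&r->in, memory_order_acquire);
    unsigned int off = out & (RING_SIZE - 1);
    unsigned int l;

    len = min_u(len, in - out);                /* clamp to stored data */
    l = min_u(len, RING_SIZE - off);
    memcpy(dst, r->buffer + off, l);
    memcpy(dst + l, r->buffer, len - l);
    /* release: our reads of the buffer complete before the slots are freed */
    atomic_store_explicit(&r->out, out + len, memory_order_release);
    return len;
}
```

<p>A zero-initialized <code class="language-plaintext highlighter-rouge">struct spsc_ring</code> (for example one with static storage duration) is an empty queue. As in kfifo, this is only safe with exactly one producer thread and one consumer thread.</p>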
<h1 id="性能比较">性能比较</h1>
<p>我建了一个repo(<a href="https://github.com/airekans/kfifo-benchmark">kfifo-benchmark</a>)来简单地比较了一下kfifo的性能。
我把kfifo port到了user space,同时简单地把<code class="language-plaintext highlighter-rouge">spinlock_t</code>替换成了<code class="language-plaintext highlighter-rouge">pthread_mutex_t</code>(<code class="language-plaintext highlighter-rouge">pthread_spinlock_t</code>默认并不在pthread,需要另外配置)。</p>
<p>比较里面的三个case(可以自行到<a href="https://github.com/airekans/kfifo-benchmark/blob/master/main.cc">main.cc</a>里面去看)及性能如下(我用的是real time/wall time,所以时间越短表示越快):</p>
<ol>
<li>使用<code class="language-plaintext highlighter-rouge">__kfifo_put</code>和<code class="language-plaintext highlighter-rouge">__kfifo_get</code>的1P1C(无锁):0m3.496s</li>
<li>使用<code class="language-plaintext highlighter-rouge">kfifo_put</code>和<code class="language-plaintext highlighter-rouge">kfifo_get</code>的1P1C场景(mutex):0m13.291s</li>
<li>使用tpool里面的<code class="language-plaintext highlighter-rouge">BoundedBlockingQueue</code>默认特化的1P1C场景(mutex+condition variable):0m17.791s</li>
</ol>
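<p>As the numbers show, in the 1P1C scenario the lock-free kfifo is about 3.8x as fast as the locked version (13.291s vs 3.496s). Even the locked kfifo is roughly a third faster than tpool's <code class="language-plaintext highlighter-rouge">BoundedBlockingQueue</code> (17.791s vs 13.291s).</p>
Google benchmark: a simple, easy-to-use C++ benchmark library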
2015-04-18T00:00:00+00:00
http://airekans.github.io/cpp/2015/04/18/google-benchmark
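<p>When writing C++ programs, we often need to benchmark a function or a class method. Usually we can write a small test program,
run the code a certain number of times (say, 100,000 times), and see how long it takes.</p>
<p>For example, suppose I wrote the following function that converts an <code class="language-plaintext highlighter-rouge">int</code> to a <code class="language-plaintext highlighter-rouge">string</code>:</p>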
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="n">string</span> <span class="nf">uint2str</span><span class="p">(</span><span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">num</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">ostringstream</span> <span class="n">oss</span><span class="p">;</span>
<span class="n">oss</span> <span class="o"><<</span> <span class="n">num</span><span class="p">;</span>
<span class="k">return</span> <span class="n">oss</span><span class="p">.</span><span class="n">str</span><span class="p">();</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>Then we can write the following program:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">1000000</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">uint2str</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
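<p>Then we run it under <code class="language-plaintext highlighter-rouge">time</code> on the command line to see how long it takes. This approach has a problem, though: if we want to compare against another function,
main needs a branch to run that function, or we have to write a whole new program. And if we want to compare how fast a function runs at different data sizes,
the benchmark program becomes tedious to write.</p>
<p>I recently came across Google's open-source <a href="https://github.com/google/benchmark">benchmark C++ library</a>, and since I was also writing a <code class="language-plaintext highlighter-rouge">HashMap</code>, I tried using the library for benchmarking.
I found that it has several nice features:</p>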
<ol>
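<li>Simple and easy to use. Anyone who has used gtest will find it very familiar.</li>
<li>Good support for benchmarking across different data sizes: the same piece of code can easily be run with different data sizes.</li>
<li>The output reports real time and CPU time directly, and it is easy to import into Excel for analysis.</li>
<li>Support for multithreaded benchmarks (I have not used this yet).</li>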
</ol>
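<p>This article briefly introduces how to use benchmark to write our own benchmark programs.</p>
<h1 id="简单使用">Basic usage</h1>
<p>The library's README already gives a fairly detailed introduction; here I will benchmark the example above.
First download benchmark and build it with cmake. Then write the following C++ code:</p>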
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="cp">#include <benchmark/benchmark.h>
</span>
<span class="k">static</span> <span class="kt">void</span> <span class="nf">BM_uint2str</span><span class="p">(</span><span class="n">benchmark</span><span class="o">::</span><span class="n">State</span><span class="o">&</span> <span class="n">state</span><span class="p">)</span> <span class="p">{</span>
<span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">num</span> <span class="o">=</span> <span class="mi">1234</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">KeepRunning</span><span class="p">())</span>
<span class="p">(</span><span class="kt">void</span><span class="p">)</span> <span class="n">uint2str</span><span class="p">(</span><span class="n">num</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Register the function as a benchmark</span>
<span class="n">BENCHMARK</span><span class="p">(</span><span class="n">BM_uint2str</span><span class="p">);</span>
<span class="n">BENCHMARK_MAIN</span><span class="p">();</span>
</pre></td></tr></tbody></table></code></pre></figure>
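<p>With the program above, compile and link it, and it is ready to run. Note that you need to add <code class="language-plaintext highlighter-rouge">-lpthread</code> when linking; otherwise you may get a runtime exception.
Running the program produces the following output:</p>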
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Run on (4 X 2504.66 MHz CPU s)
2015-04-18 19:55:26
Benchmark Time(ns) CPU(ns) Iterations
--------------------------------------------
BM_uint2str 428 425 1617472
</code></pre></div></div>
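<p>Pretty intuitive, right?</p>
<p>One small thing to note: the benchmarked code has to run inside a loop, because a function's running time generally fluctuates somewhat.
benchmark therefore uses a state object to indicate whether to keep running. Generally, fast functions run for more iterations
and slow functions for fewer, so that each benchmark runs for roughly the same amount of time overall.</p>
<h1 id="使用不同的参数跑benchmark">Running benchmarks with different arguments</h1>
<p>Suppose we wrote the following function:</p>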
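<p>To make the idea of "keep running until enough time has passed" concrete, here is a minimal sketch of such a loop. It is not the library's implementation: <code class="language-plaintext highlighter-rouge">run_adaptive</code>, <code class="language-plaintext highlighter-rouge">RunResult</code>, the doubling strategy, and the iteration cap are all assumptions made for illustration.</p>

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical result type: how many iterations the final batch ran and the
// average wall-clock time per iteration in that batch.
struct RunResult {
    std::int64_t iterations;
    double ns_per_iteration;
};

// Runs fn in batches, doubling the batch size until one batch takes at least
// min_time (or the iteration cap is reached). Fast functions therefore end
// up with many iterations and slow functions with few.
template <typename Fn>
RunResult run_adaptive(Fn fn, std::chrono::nanoseconds min_time,
                       std::int64_t max_iterations = 1000000000) {
    typedef std::chrono::steady_clock clock;
    std::int64_t iterations = 1;
    for (;;) {
        const clock::time_point start = clock::now();
        for (std::int64_t i = 0; i < iterations; ++i) {
            fn();
        }
        const std::chrono::nanoseconds elapsed =
            std::chrono::duration_cast<std::chrono::nanoseconds>(clock::now() - start);
        if (elapsed >= min_time || iterations >= max_iterations) {
            return RunResult{iterations,
                             static_cast<double>(elapsed.count()) /
                                 static_cast<double>(iterations)};
        }
        iterations *= 2;  // batch was too short to measure reliably; try a bigger one
    }
}
```

<p>For example, <code class="language-plaintext highlighter-rouge">run_adaptive([] { (void) uint2str(1234); }, std::chrono::milliseconds(100))</code> would time <code class="language-plaintext highlighter-rouge">uint2str</code> for at least 100 ms in its final batch.</p>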
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="kt">void</span> <span class="nf">vuint2vstr</span><span class="p">(</span><span class="k">const</span> <span class="n">vector</span><span class="o"><</span><span class="kt">unsigned</span> <span class="kt">int</span><span class="o">>&</span> <span class="n">vint</span><span class="p">,</span> <span class="n">vector</span><span class="o"><</span><span class="n">string</span><span class="o">>&</span> <span class="n">vstr</span><span class="p">)</span> <span class="p">{</span>
<span class="n">vstr</span><span class="p">.</span><span class="n">clear</span><span class="p">();</span>
<span class="k">for</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">vint</span><span class="p">.</span><span class="n">size</span><span class="p">();</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">vstr</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">uint2str</span><span class="p">(</span><span class="n">vint</span><span class="p">[</span><span class="n">i</span><span class="p">]));</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>We could write a benchmark in the same way as before, but what if I want to measure this function with vectors of different sizes?
Just use the Range function:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">static</span> <span class="kt">void</span> <span class="nf">BM_vuint2vstr</span><span class="p">(</span><span class="n">benchmark</span><span class="o">::</span><span class="n">State</span><span class="o">&</span> <span class="n">state</span><span class="p">)</span> <span class="p">{</span>
<span class="n">vector</span><span class="o"><</span><span class="kt">unsigned</span> <span class="kt">int</span><span class="o">></span> <span class="n">vuint</span><span class="p">;</span>
<span class="k">for</span> <span class="p">(</span><span class="n">std</span><span class="o">::</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="n">state</span><span class="p">.</span><span class="n">range_x</span><span class="p">();</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">vuint</span><span class="p">.</span><span class="n">push_back</span><span class="p">(</span><span class="n">i</span><span class="p">);</span>
<span class="p">}</span>
<span class="n">vector</span><span class="o"><</span><span class="n">string</span><span class="o">></span> <span class="n">vstr</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="n">state</span><span class="p">.</span><span class="n">KeepRunning</span><span class="p">())</span>
<span class="n">vuint2vstr</span><span class="p">(</span><span class="n">vuint</span><span class="p">,</span> <span class="n">vstr</span><span class="p">);</span>
<span class="p">}</span>
<span class="c1">// Register the function as a benchmark</span>
<span class="n">BENCHMARK</span><span class="p">(</span><span class="n">BM_vuint2vstr</span><span class="p">)</span><span class="o">-></span><span class="n">Range</span><span class="p">(</span><span class="mi">8</span><span class="p">,</span> <span class="mi">8</span> <span class="o"><<</span> <span class="mi">10</span><span class="p">);</span>
<span class="n">BENCHMARK_MAIN</span><span class="p">();</span>
</pre></td></tr></tbody></table></code></pre></figure>
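<p>Right! Just append Range after the <code class="language-plaintext highlighter-rouge">BENCHMARK</code> macro. The first argument is the start value and the second is the end value.
Inside the benchmark, <code class="language-plaintext highlighter-rouge">state.range_x()</code> returns the actual value for the current run.</p>
<p>It is very simple to use and saves the programmer a lot of work.</p>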
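<p>To see which sizes a call such as <code class="language-plaintext highlighter-rouge">Range(8, 8 &lt;&lt; 10)</code> could expand to, the sketch below generates a geometric sequence of arguments. It rests on two assumptions worth checking against the library's documentation: that the default multiplier is 8, and that the intermediate values are powers of the multiplier lying between the two bounds. <code class="language-plaintext highlighter-rouge">sketch_range_args</code> is a hypothetical helper, not part of benchmark. Also note that, as far as I know, newer releases of the library expose the value as <code class="language-plaintext highlighter-rouge">state.range(0)</code> rather than <code class="language-plaintext highlighter-rouge">state.range_x()</code>.</p>

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns lo, then every power of mult strictly between
// lo and hi, then hi. This mirrors the assumed behaviour of Range(lo, hi).
std::vector<long> sketch_range_args(long lo, long hi, long mult) {
    std::vector<long> args;
    if (lo < 0 || hi < lo || mult < 2) return args;  // reject nonsensical input
    args.push_back(lo);
    long power = 1;
    while (power < hi) {
        if (power > lo) args.push_back(power);
        if (power > hi / mult) break;  // next power would pass hi (or overflow)
        power *= mult;
    }
    if (hi != lo) args.push_back(hi);
    return args;
}
```

<p>Under these assumptions, <code class="language-plaintext highlighter-rouge">sketch_range_args(8, 8 &lt;&lt; 10, 8)</code> yields 8, 64, 512, 4096, 8192, so the benchmark body would run once per size.</p>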
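<h1 id="一个小tips">A small tip</h1>
<p>All of the examples above can be found in benchmark's README, along with many more, such as template support and thread support.
In actual use, though, I did discover one tip of my own.</p>
<p>If each iteration of a benchmark needs some extra setup, we may have to do it inside the loop.
But generally we want to exclude that part from the measured time.
benchmark happens to have two functions for exactly this: <code class="language-plaintext highlighter-rouge">PauseTiming()</code> and <code class="language-plaintext highlighter-rouge">ResumeTiming()</code>.
At first glance this looks great: built-in support.
But if you actually use them inside the loop, you may see a surprising result in the output: the reported time goes up considerably.</p>
<p>If you look through benchmark's source code, you will find that the comments on these two functions say they are very heavyweight
and are best not used inside the benchmark loop.
That is because they take a lock and read the <code class="language-plaintext highlighter-rouge">/proc</code> file system, which carries considerable overhead compared with pure CPU work.
So it is best not to call these two functions inside the loop.</p>
Empty Base Optimization in C++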
2014-08-08T00:00:00+00:00
http://airekans.github.io/cpp/2014/08/08/cpp-empty-base-optimization
<h1 id="什么是empty-base-optimization">What is Empty Base Optimization?</h1>
<p>Empty Base Optimization (ebo for short) may still be unfamiliar to many people, but <code class="language-plaintext highlighter-rouge">std::string</code>, which we use every day in C++, relies on it.</p>
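<p>So what exactly is ebo?
It is an optimization for classes whose objects could ideally occupy zero bytes: when such a class becomes part of another class (as a base class, as we will see), its memory footprint inside that class is reduced to zero.
That may sound convoluted, so let's use an example. Look at the following code:</p>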
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="cp">#include <iostream>
</span><span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="p">;</span>
<span class="k">class</span> <span class="nc">Base</span>
<span class="p">{};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"sizeof(Base) "</span> <span class="o"><<</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Base</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="n">Base</span> <span class="n">obj1</span><span class="p">;</span>
<span class="n">Base</span> <span class="n">obj2</span><span class="p">;</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"addr obj1 "</span> <span class="o"><<</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="o">&</span><span class="n">obj1</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"addr obj2 "</span> <span class="o"><<</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="o">&</span><span class="n">obj2</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
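<p>Can you guess the output of the code above? Will <code class="language-plaintext highlighter-rouge">sizeof(Base)</code> be 0? Will <code class="language-plaintext highlighter-rouge">obj1</code> have the same address as <code class="language-plaintext highlighter-rouge">obj2</code>?</p>
<p>Compile and run the code yourself, and you will get output similar to the following (the addresses on the second and third lines will differ):</p>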
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sizeof(Base) 1
addr obj1 0xbfdc9033
addr obj2 0xbfdc9032
</code></pre></div></div>
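<p>See? Even though <code class="language-plaintext highlighter-rouge">Base</code> contains no members, the compiler makes <code class="language-plaintext highlighter-rouge">Base</code> occupy 1 byte.
If a class occupied zero bytes, consecutively allocated objects could end up at the same memory address, which is not acceptable.
To avoid this, the compiler gives an empty class a size of 1 byte as well.</p>
<p>What if I use <code class="language-plaintext highlighter-rouge">Base</code> as a member variable of another class, like this:</p>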
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">class</span> <span class="nc">TestCls</span>
<span class="p">{</span>
<span class="n">Base</span> <span class="n">m_obj</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">m_num</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"sizeof(TestCls) "</span> <span class="o"><<</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">TestCls</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
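<p>Do you know what the output will be? 5?
On a 32-bit machine it is 8, because for ease of access the compiler adds 3 bytes of padding after <code class="language-plaintext highlighter-rouge">m_obj</code> so that <code class="language-plaintext highlighter-rouge">m_num</code> is aligned to the machine word.
In any case, the answer will not be 4.</p>
<p>But when memory is really tight, we might actually want the size of <code class="language-plaintext highlighter-rouge">TestCls</code> to be 4. Is there a way?
This is where <code class="language-plaintext highlighter-rouge">ebo</code> comes in. Look at the following code:</p>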
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">class</span> <span class="nc">TestCls</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">m_num</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"sizeof(TestCls) "</span> <span class="o"><<</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">TestCls</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
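<p>Can you guess the output this time? That's right, it is the 4 we wanted!
When an empty class is used as a base class, the compiler optimizes away the size of that base,
so the whole object takes only the size it really needs.</p>
<p>What if the derived class has no members other than the base class? For example:</p>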
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">class</span> <span class="nc">TestCls</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span>
<span class="p">{};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"sizeof(TestCls) "</span> <span class="o"><<</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">TestCls</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
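<p>The output of the code above is still 1. If the class has no members other than the empty base class,
then the class itself is also an empty class, so the situation described at the beginning applies here:
the compiler gives the empty class a size of 1.</p>
<p>That is Empty Base Optimization. So where is this technique used in practice?
Besides <code class="language-plaintext highlighter-rouge">std::string</code> mentioned at the start, Google's <a href="https://code.google.com/p/cpp-btree/">cpp-btree</a> uses it too.
Let's look at these two real-world examples.</p>
<h1 id="stl中的string">string in the STL</h1>
<p>The string we use every day in C++ uses ebo. Let's see how string defines its members (function definitions are omitted; the code below comes from the C++ library of gcc 4.1.2):</p>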
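<p>The three cases discussed so far can be checked side by side in one small sketch. The type names (<code class="language-plaintext highlighter-rouge">Empty</code>, <code class="language-plaintext highlighter-rouge">HoldsMember</code>, <code class="language-plaintext highlighter-rouge">DerivesFromEmpty</code>, <code class="language-plaintext highlighter-rouge">StillEmpty</code>) and the helper <code class="language-plaintext highlighter-rouge">bytes_saved_by_ebo</code> are made up for illustration; exact sizes depend on the platform, so the checks avoid hard-coding 8.</p>

```cpp
#include <cassert>
#include <cstddef>

struct Empty {};

// Empty class stored as a member: it needs its own address, plus padding
// so that the int stays aligned.
struct HoldsMember {
    Empty e;
    int n;
};

// Empty class used as a base: the base subobject takes no extra space.
struct DerivesFromEmpty : Empty {
    int n;
};

// A class with nothing but an empty base is itself an empty class.
struct StillEmpty : Empty {};

static_assert(sizeof(Empty) >= 1, "even an empty class has a non-zero size");
static_assert(sizeof(DerivesFromEmpty) == sizeof(int),
              "the empty base adds nothing to the derived class");
static_assert(sizeof(StillEmpty) >= 1, "an empty derived class is still sized");

// How many bytes per object the base-class form saves over the member form.
inline std::size_t bytes_saved_by_ebo() {
    return sizeof(HoldsMember) - sizeof(DerivesFromEmpty);
}
```

<p>On a typical platform with a 4-byte <code class="language-plaintext highlighter-rouge">int</code>, the member form is 8 bytes and the base-class form is 4, which matches the outputs described above.</p>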
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">_CharT</span><span class="p">,</span> <span class="k">typename</span> <span class="n">_Traits</span><span class="p">,</span> <span class="k">typename</span> <span class="n">_Alloc</span><span class="o">></span>
<span class="k">class</span> <span class="nc">basic_string</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="k">mutable</span> <span class="n">_Alloc_hider</span> <span class="n">_M_dataplus</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>Note that <code class="language-plaintext highlighter-rouge">string</code> is actually an instantiation of the class template <code class="language-plaintext highlighter-rouge">basic_string</code>. <code class="language-plaintext highlighter-rouge">basic_string</code> contains only one member, <code class="language-plaintext highlighter-rouge">_M_dataplus</code>,
whose type is <code class="language-plaintext highlighter-rouge">_Alloc_hider</code>.</p>
<p>Let's see how <code class="language-plaintext highlighter-rouge">_Alloc_hider</code> is defined:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">_CharT</span><span class="p">,</span> <span class="k">typename</span> <span class="n">_Traits</span><span class="p">,</span> <span class="k">typename</span> <span class="n">_Alloc</span><span class="o">></span>
<span class="k">class</span> <span class="nc">basic_string</span>
<span class="p">{</span>
<span class="nl">private:</span>
<span class="k">struct</span> <span class="n">_Alloc_hider</span> <span class="o">:</span> <span class="n">_Alloc</span> <span class="c1">// Use ebo</span>
<span class="p">{</span>
<span class="n">_CharT</span><span class="o">*</span> <span class="n">_M_p</span><span class="p">;</span> <span class="c1">// The actual data.</span>
<span class="p">};</span>
<span class="p">};</span></code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">_Alloc_hider</code> inherits from the template parameter class <code class="language-plaintext highlighter-rouge">_Alloc</code> (since it is declared as a struct, the inheritance is public by default) and has one member of its own, <code class="language-plaintext highlighter-rouge">_M_p</code>.
<code class="language-plaintext highlighter-rouge">_M_p</code> stores the actual data. And <code class="language-plaintext highlighter-rouge">_Alloc</code>? Those familiar with the STL may remember that it has an allocator.
Typical implementations of this allocator have no data members at all, only functions,
so the class is an empty class.
By default, string passes this allocator as the template argument for <code class="language-plaintext highlighter-rouge">_Alloc</code>.
So <code class="language-plaintext highlighter-rouge">_Alloc</code> is an empty class in most cases, and string is used all over our programs,
often in large numbers, for example inside containers, where memory footprint starts to matter.
That is why ebo is used here.</p>
<p>Some may ask: <code class="language-plaintext highlighter-rouge">string</code> really holds only a <code class="language-plaintext highlighter-rouge">char*</code>, but isn't <code class="language-plaintext highlighter-rouge">string</code> supposed to record its size
and use <em>copy on write</em> as well? How can there be just one <code class="language-plaintext highlighter-rouge">char*</code>?
This has to do with the memory layout of the <code class="language-plaintext highlighter-rouge">string</code> implementation; copy on write is the strategy used by g++'s STL.
To learn about the memory layout of g++'s string, see <a href="http://blog.csdn.net/solstice/article/details/7364406">this article by Chen Shuo</a>.</p>
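<h1 id="cpp-btree中的ebo">ebo in cpp-btree</h1>
<p><a href="https://code.google.com/p/cpp-btree/">cpp-btree</a> is a B-tree-based template container library from Google. If you are not familiar with B-trees, head over <a href="https://www.cs.usfca.edu/~galles/visualization/BTree.html">here</a>
to see an animated demo of the data structure.</p>
<p>A B-tree is a balanced tree structure commonly used as the on-disk data structure of databases (though usually its variant, the B+ tree, is used). cpp-btree, by contrast, is entirely in-memory and is a container implementation similar to <code class="language-plaintext highlighter-rouge">std::map</code>. For large numbers of elements (more than one million), its access efficiency is higher than that of the red-black tree behind <code class="language-plaintext highlighter-rouge">std::map</code>, and it also saves memory.</p>
<p>Enough advertising for cpp-btree; let's see where it uses ebo.
cpp-btree provides two container classes, <code class="language-plaintext highlighter-rouge">btree_set</code> and <code class="language-plaintext highlighter-rouge">btree_map</code>,
and their common implementation lives in the <code class="language-plaintext highlighter-rouge">btree</code> class.
The <code class="language-plaintext highlighter-rouge">btree</code> class implements the main B-tree functionality, and its members are defined as follows:</p>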
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">Params</span><span class="o">></span>
<span class="k">class</span> <span class="nc">btree</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Params</span><span class="o">::</span><span class="n">key_compare</span> <span class="p">{</span>
<span class="nl">private:</span>
<span class="k">typedef</span> <span class="k">typename</span> <span class="n">Params</span><span class="o">::</span><span class="n">allocator_type</span> <span class="n">allocator_type</span><span class="p">;</span>
<span class="k">typedef</span> <span class="k">typename</span> <span class="n">allocator_type</span><span class="o">::</span><span class="k">template</span> <span class="n">rebind</span><span class="o"><</span><span class="kt">char</span><span class="o">>::</span><span class="n">other</span>
<span class="n">internal_allocator_type</span><span class="p">;</span>
<span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">Base</span><span class="p">,</span> <span class="k">typename</span> <span class="n">Data</span><span class="o">></span>
<span class="k">struct</span> <span class="n">empty_base_handle</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span> <span class="p">{</span>
<span class="n">empty_base_handle</span><span class="p">(</span><span class="k">const</span> <span class="n">Base</span> <span class="o">&</span><span class="n">b</span><span class="p">,</span> <span class="k">const</span> <span class="n">Data</span> <span class="o">&</span><span class="n">d</span><span class="p">)</span>
<span class="o">:</span> <span class="n">Base</span><span class="p">(</span><span class="n">b</span><span class="p">),</span>
<span class="n">data</span><span class="p">(</span><span class="n">d</span><span class="p">)</span> <span class="p">{</span>
<span class="p">}</span>
<span class="n">Data</span> <span class="n">data</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">empty_base_handle</span><span class="o"><</span><span class="n">internal_allocator_type</span><span class="p">,</span> <span class="n">node_type</span><span class="o">*></span> <span class="n">root_</span><span class="p">;</span>
<span class="p">};</span></code></pre></figure>
<p>As you can see, the <code class="language-plaintext highlighter-rouge">btree</code> class contains only one member, <code class="language-plaintext highlighter-rouge">root_</code>, whose type is <code class="language-plaintext highlighter-rouge">empty_base_handle</code>.
<code class="language-plaintext highlighter-rouge">empty_base_handle</code> is a class that inherits from Base, and here
<code class="language-plaintext highlighter-rouge">Base</code> is instantiated as <code class="language-plaintext highlighter-rouge">internal_allocator_type</code>.
As the name suggests, <code class="language-plaintext highlighter-rouge">internal_allocator_type</code> is an allocator,
and in the default <code class="language-plaintext highlighter-rouge">btree_map</code> implementation this allocator is <code class="language-plaintext highlighter-rouge">std::allocator</code>.
So in general, <code class="language-plaintext highlighter-rouge">Base</code> is also an empty class.</p>
<p>Here <code class="language-plaintext highlighter-rouge">btree</code> also uses ebo to reduce its memory usage.</p>
<h1 id="一个例外">An exception</h1>
<p>When the compiler decides whether to apply ebo, there is one exception: the class inherits from an empty class,
but the type of its first non-static member is that same empty class or a class derived from it.
In that case the compiler does not apply ebo.</p>
<p>That is a bit convoluted; the following code makes it clear:</p>
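<p>The same pattern can be tried out in isolation. The sketch below is not cpp-btree's code: <code class="language-plaintext highlighter-rouge">EmptyBaseHandle</code> follows the shape of <code class="language-plaintext highlighter-rouge">empty_base_handle</code> above, while <code class="language-plaintext highlighter-rouge">MemberHandle</code>, <code class="language-plaintext highlighter-rouge">CompactRoot</code>, and <code class="language-plaintext highlighter-rouge">PaddedRoot</code> are hypothetical names added for comparison. It assumes <code class="language-plaintext highlighter-rouge">std::allocator&lt;char&gt;</code> is an empty class, which holds for the mainstream standard libraries.</p>

```cpp
#include <cassert>
#include <memory>

// Stores Data and inherits from Base, so an empty Base costs no space.
template <typename Base, typename Data>
struct EmptyBaseHandle : Base {
    EmptyBaseHandle(const Base& b, const Data& d) : Base(b), data(d) {}
    Data data;
};

// For comparison: the same two pieces stored as plain members.
template <typename Alloc, typename Data>
struct MemberHandle {
    MemberHandle(const Alloc& a, const Data& d) : alloc(a), data(d) {}
    Alloc alloc;
    Data data;
};

typedef EmptyBaseHandle<std::allocator<char>, int*> CompactRoot;
typedef MemberHandle<std::allocator<char>, int*> PaddedRoot;

// With an empty allocator, the handle is exactly as big as the pointer it holds.
static_assert(sizeof(CompactRoot) == sizeof(int*),
              "allocator base should not add to the size");
static_assert(sizeof(PaddedRoot) > sizeof(int*),
              "an allocator member needs its own storage");
```

<p>The handle can still be used as an allocator through its base class, for example <code class="language-plaintext highlighter-rouge">root.allocate(16)</code> on a <code class="language-plaintext highlighter-rouge">CompactRoot</code>, while <code class="language-plaintext highlighter-rouge">root.data</code> gives access to the stored pointer.</p>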
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#include <iostream>
</span><span class="k">using</span> <span class="k">namespace</span> <span class="n">std</span><span class="p">;</span>
<span class="k">class</span> <span class="nc">Base</span>
<span class="p">{};</span>
<span class="k">class</span> <span class="nc">TestCls</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">Base</span> <span class="n">m_obj</span><span class="p">;</span> <span class="c1">// <<<<</span>
<span class="kt">int</span> <span class="n">m_num</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"sizeof(Base) "</span> <span class="o"><<</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Base</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"sizeof(TestCls) "</span> <span class="o"><<</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">TestCls</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="n">TestCls</span> <span class="n">obj</span><span class="p">;</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"addr obj "</span> <span class="o"><<</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="o">&</span><span class="n">obj</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"addr obj.m_obj "</span> <span class="o"><<</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="o">&</span><span class="p">(</span><span class="n">obj</span><span class="p">.</span><span class="n">m_obj</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"addr obj.m_num "</span> <span class="o"><<</span> <span class="p">(</span><span class="kt">void</span><span class="o">*</span><span class="p">)</span> <span class="o">&</span><span class="p">(</span><span class="n">obj</span><span class="p">.</span><span class="n">m_num</span><span class="p">)</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Run the code above and you will see that the size of <code class="language-plaintext highlighter-rouge">TestCls</code> is 8 and that the address of <code class="language-plaintext highlighter-rouge">obj</code> differs from the address of <code class="language-plaintext highlighter-rouge">obj.m_obj</code>.
This shows that ebo was not applied.</p>
Profiling C/C++ programs with gperftools
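<p>The reason behind this exception is that two distinct objects of the same type must have distinct addresses, and here both the base subobject and <code class="language-plaintext highlighter-rouge">m_obj</code> are of type <code class="language-plaintext highlighter-rouge">Base</code>. A minimal check of that property, with <code class="language-plaintext highlighter-rouge">base_and_member_distinct</code> as a hypothetical helper name:</p>

```cpp
#include <cassert>

struct Base {};

struct TestCls : Base {
    Base m_obj;  // same type as the base class
    int m_num;
};

// Returns true when the Base subobject and the Base member live at
// different addresses, i.e. the member could not be placed at offset 0.
inline bool base_and_member_distinct(const TestCls& obj) {
    const Base* as_base = &obj;          // implicit derived-to-base conversion
    const Base* as_member = &obj.m_obj;  // address of the first member
    return as_base != as_member;
}
```

<p>If the member were placed at offset 0 as well, both pointers would compare equal even though they refer to different <code class="language-plaintext highlighter-rouge">Base</code> objects, which the language does not allow.</p>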
2014-07-04T00:00:00+00:00
http://airekans.github.io/cpp/2014/07/04/gperftools-profile
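<h1 id="什么是perftools">What is perftools</h1>
<p>In the world of C/C++ programming on Linux, performance tuning has always been a headache. The best-known tool, <code class="language-plaintext highlighter-rouge">gprof</code>, is familiar to everyone,
but its usage is rather limited (it only supports profiling from program start to exit), and it has a fairly large effect on the program's running time,
so its profiles are not necessarily accurate.</p>
<p><code class="language-plaintext highlighter-rouge">valgrind</code> is very powerful, but it also generally profiles the entire run of a program, and it is hard to profile only a certain period during execution.
It also affects the program's execution to some degree and is harder to use, so I have not tried it yet.</p>
<p>Apart from these two common tools, I had seen Google's <a href="https://code.google.com/p/gperftools/">gperftools</a> used for profiling
in a project at my company, and I was drawn to how simple it is to use. Recently a server I maintain ran into performance problems and needed tuning.
After trying several primitive ways of profiling, I chose <code class="language-plaintext highlighter-rouge">gperftools</code>.</p>
<h1 id="如何profile">How to profile</h1>
<p>The gperftools documentation briefly describes the following way to profile:</p>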
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc [...] -o myprogram -lprofiler
CPUPROFILE=/tmp/profile ./myprogram
</code></pre></div></div>
<p>Yes, after building and installing <code class="language-plaintext highlighter-rouge">gperftools</code>, those steps are all you need to profile. Very simple.
The profile result is saved in <code class="language-plaintext highlighter-rouge">/tmp/profile</code>. To view it, just use the <code class="language-plaintext highlighter-rouge">pprof</code> script that ships with <code class="language-plaintext highlighter-rouge">gperftools</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pprof --text ./myprogram /tmp/profile
14 2.1% 17.2% 58 8.7% std::_Rb_tree::find
</code></pre></div></div>
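<p>The output of <code class="language-plaintext highlighter-rouge">pprof</code> is fairly intuitive, but still not good enough: it is hard to see call relationships from it, including callers and callees.
pprof can also produce graphical output, as well as a callgrind-compatible format, so the profile result can be viewed with <code class="language-plaintext highlighter-rouge">kcachegrind</code>.</p>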
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ pprof --callgrind ./myprogram /tmp/profile > callgrind.res
</code></pre></div></div>
<p>Then open the callgrind.res file with <code class="language-plaintext highlighter-rouge">kcachegrind</code> and you will see something like the following (image from the kcachegrind website):</p>
<p><img src="http://kcachegrind.sourceforge.net/html/pics/KcgShot1.png" alt="kcachegrind demo" /></p>
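<p>This makes tuning very intuitive. The biggest advantage of this approach is that it is non-intrusive: you can profile without changing a single line of code.</p>
<h2 id="动态profile">Dynamic profiling</h2>
<p>The approach above triggers profiling through an environment variable, and it spans the whole lifetime of the program.
What if we want to profile only a certain period of the run? What if I want to get the profile result without the program exiting?</p>
<p>That calls for dynamic profiling. This approach does require changing the program's code, but it is fairly simple:</p>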
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="cp">#include <gperftools/profiler.h>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">ProfilerStart</span><span class="p">(</span><span class="s">"/tmp/profile"</span><span class="p">);</span>
<span class="n">some_func_to_profile</span><span class="p">();</span>
<span class="n">ProfilerStop</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
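<p>That's right: just add calls to <code class="language-plaintext highlighter-rouge">ProfilerStart</code> and <code class="language-plaintext highlighter-rouge">ProfilerStop</code> at the beginning and end of the function you want to profile.
After <code class="language-plaintext highlighter-rouge">ProfilerStop</code> returns, the profile result is saved in <code class="language-plaintext highlighter-rouge">/tmp/profile</code>.
This way you can profile the program at chosen points in time.</p>
<p>One last point: gperftools profiles by sampling, <strong>and it has very little impact on the performance of the program being profiled</strong>.
This is an extremely important property for both online and offline profiling.</p>
<h1 id="对服务器进行profile">Profiling a server</h1>
<p>Backend programmers deal with backend servers every day. Servers run for a long time without stopping,
which makes profiling them rather troublesome.</p>
<p>Here I offer an approach that makes profiling a server convenient and lets you profile on demand.</p>
<p>First, note that gperftools offers two ways of linking: as a shared library and as a static library.
With the shared library, profiling can be triggered either through the environment variable or through code changes; with the static library, only code changes work.
At first glance the shared library looks more convenient, but Chen Shuo's <a href="http://book.douban.com/subject/20471211/">《Linux多线程服务端编程》</a> (Linux Multithreaded Server Programming)
says that for servers, static linking has advantages over dynamic linking and is also easier to deploy.
I use gperftools with static linking myself, so everything below assumes static linking.</p>
<p>Servers are usually built around an event loop, and we need to profile over a certain period of time.
An intuitive idea is to start profiling upon receiving one kind of request and to stop upon receiving another kind.
We can implement this with code like the following:</p>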
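<p>When the region to profile is a scope rather than a whole function, a small RAII guard makes sure the stop call is never forgotten, even on an early return. The sketch below is deliberately independent of gperftools: <code class="language-plaintext highlighter-rouge">ScopedRegion</code> is a hypothetical helper that takes the start and stop actions as callables.</p>

```cpp
#include <cassert>
#include <functional>
#include <utility>

// Hypothetical RAII helper: runs `start` when constructed and `stop` when
// the enclosing scope ends.
class ScopedRegion {
public:
    ScopedRegion(const std::function<void()>& start, std::function<void()> stop)
        : stop_(std::move(stop)) {
        if (start) start();
    }

    ~ScopedRegion() {
        if (stop_) stop_();
    }

    // Non-copyable, so the stop action runs exactly once.
    ScopedRegion(const ScopedRegion&) = delete;
    ScopedRegion& operator=(const ScopedRegion&) = delete;

private:
    std::function<void()> stop_;
};
```

<p>With gperftools linked in, one could write <code class="language-plaintext highlighter-rouge">ScopedRegion region([] { ProfilerStart("/tmp/profile"); }, [] { ProfilerStop(); });</code> at the top of the scope to be profiled.</p>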
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#include <gperftools/profiler.h>
</span>
<span class="kt">void</span> <span class="nf">on_request</span><span class="p">(</span><span class="n">Request</span><span class="o">*</span> <span class="n">req</span><span class="p">)</span>
<span class="p">{</span>
    <span class="k">static</span> <span class="kt">bool</span> <span class="n">is_profile_started</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="k">if</span> <span class="p">(</span><span class="n">req</span><span class="o">-></span><span class="n">type</span> <span class="o">==</span> <span class="n">START_PROFILE</span> <span class="o">&&</span> <span class="o">!</span><span class="n">is_profile_started</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">ProfilerStart</span><span class="p">(</span><span class="s">"/tmp/profile"</span><span class="p">);</span>
        <span class="n">is_profile_started</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">req</span><span class="o">-></span><span class="n">type</span> <span class="o">==</span> <span class="n">STOP_PROFILE</span> <span class="o">&&</span> <span class="n">is_profile_started</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">ProfilerStop</span><span class="p">();</span>
        <span class="n">is_profile_started</span> <span class="o">=</span> <span class="nb">false</span><span class="p">;</span>
    <span class="p">}</span>
    <span class="k">else</span>
    <span class="p">{</span>
        <span class="c1">// normal request processing here</span>
    <span class="p">}</span>
<span class="p">}</span>
</code></pre></figure>
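<p>With the code above, we can send these special requests to the server at the start and end of the period we want to profile,
so the server can be profiled without being stopped.</p>
<p>Of course, this approach is a security risk and must not be used on servers that accept requests from the public internet.
The gperftools documentation also says that it is better not to enable profiling on production servers and to use it on test servers instead.</p>
<h1 id="总结">Summary</h1>
<p>gperftools greatly simplifies profiling servers on Linux. You can profile a running server with no code changes, or very few,
and without adding many dependencies. With gperftools, life as a Linux programmer gets a little easier!</p>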
A Painful Journey Debugging a C++ Program
2014-03-11T00:00:00+00:00
http://airekans.github.io/cpp/2014/03/11/cpp-debug-01
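<h1 id="项目背景">Project background</h1>
<p>One day, while building a feature in C++, I found that a member variable of an object never ended up with the right value, yet when I inspected it in gdb the printed value was correct...
To reduce the bug to its essentials, I created a <a href="https://github.com/airekans/cpp-debug-01">simplified repo</a> on GitHub that you can look at.</p>
<p>In short, the project is laid out roughly like this:</p>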
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>cpp-debug-01/
├── App.h
├── Base1.h
├── Base.h
├── Child.cpp
├── Child.h
├── main.cpp
└── Makefile
</code></pre></div></div>
<p>After building, running the resulting <code class="language-plaintext highlighter-rouge">./test_app</code> prints the following crash message:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>test_app: Child.cpp:11: SuperChild::SuperChild(unsigned int*): Assertion `data != __null' failed.
Aborted (core dumped)
</code></pre></div></div>
<p>OK, it crashed, so let's look at the offending code (Child.cpp):</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="cp">#include "Child.h"
#include "App.h"
#include <cstdio>
</span>
<span class="cp">#include <cassert>
</span>
<span class="n">SuperChild</span><span class="o">::</span><span class="n">SuperChild</span><span class="p">(</span><span class="kt">unsigned</span><span class="o">*</span> <span class="n">d</span><span class="p">)</span>
    <span class="o">:</span> <span class="n">Child</span><span class="p">(</span><span class="mi">0</span><span class="p">,</span> <span class="n">d</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">assert</span><span class="p">(</span><span class="n">data</span> <span class="o">!=</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">data</code> is a <code class="language-plaintext highlighter-rouge">protected</code> member of <code class="language-plaintext highlighter-rouge">Child</code>, the base class of <code class="language-plaintext highlighter-rouge">SuperChild</code>. So does this mean <code class="language-plaintext highlighter-rouge">data</code> was never initialized?</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="c1">// Definition of the Child constructor</span>
<span class="n">Child</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">s</span><span class="p">,</span> <span class="kt">unsigned</span><span class="o">*</span> <span class="n">d</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">seq</span> <span class="o">=</span> <span class="n">s</span><span class="p">;</span>
    <span class="n">data</span> <span class="o">=</span> <span class="n">d</span><span class="p">;</span> <span class="c1">// <<<<</span>
    <span class="n">i_data</span> <span class="o">=</span> <span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></figure>
<p>Huh? The code clearly initializes it, so how can it be <code class="language-plaintext highlighter-rouge">NULL</code>? Is the <code class="language-plaintext highlighter-rouge">d</code> pointer I passed in wrong?
Time to bring out the big gun, <code class="language-plaintext highlighter-rouge">gdb</code>.</p>
<h1 id="gdb调试">Debugging with gdb</h1>
<p>Let's run the program under <code class="language-plaintext highlighter-rouge">gdb</code> and set a breakpoint where <code class="language-plaintext highlighter-rouge">main</code> constructs the <code class="language-plaintext highlighter-rouge">SuperChild</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>airekans@test-host:~/programming/test/cpp-debug-01$ gdb -q ./test_app
Reading symbols from ~/programming/test/cpp-debug-01/test_app...done.
(gdb) b main.cpp:8
Breakpoint 1 at 0x40052d: file main.cpp, line 8.
(gdb) run
Starting program: ~/programming/test/cpp-debug-01/test_app
Breakpoint 1, main () at main.cpp:8
8 SuperChild c1(&i);
(gdb)
</code></pre></div></div>
<p>Great. Let's look at the value of the pointer being passed in:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) p i
$1 = 345
(gdb) p &i
$2 = (unsigned int *) 0x7fffffffdc6c
</code></pre></div></div>
<p>Everything looks normal. Let's step into the <code class="language-plaintext highlighter-rouge">SuperChild</code> constructor:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) s
SuperChild::SuperChild (this=0x7fffffffdc40, d=0x7fffffffdc6c) at Child.cpp:9
9 : Child(0, d, 1)
(gdb) p d
$3 = (unsigned int *) 0x7fffffffdc6c
</code></pre></div></div>
<p>I printed the pointer <code class="language-plaintext highlighter-rouge">d</code> that was passed in, and its value is correct. Now let's step into the constructor of the base class <code class="language-plaintext highlighter-rouge">Child</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) s
Child::Child (this=0x7fffffffdc40, s=0, d=0x7fffffffdc6c, i=1) at Child.h:10
10 {
(gdb) n
11 seq = s;
(gdb) n
12 data = d;
(gdb) n
13 i_data = i;
(gdb) p d
$4 = (unsigned int *) 0x7fffffffdc6c
(gdb) p data
$5 = (unsigned int *) 0x7fffffffdc6c
(gdb) p this->data
$6 = (unsigned int *) 0x7fffffffdc6c
</code></pre></div></div>
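<p>The constructor has assigned the pointer to the member <code class="language-plaintext highlighter-rouge">data</code>, and I confirmed that both the pointer and the member hold the right value.
The program looks perfectly healthy. Was the earlier crash just a beautiful misunderstanding? Ha, fine, let's continue and see:</p>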
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) c
Continuing.
test_app: Child.cpp:11: SuperChild::SuperChild(unsigned int*): Assertion `data != __null' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff7a51425 in raise () from /lib/x86_64-linux-gnu/libc.so.6
</code></pre></div></div>
<p>What? The <code class="language-plaintext highlighter-rouge">assert</code> still failed? Let's look at the value of data first...</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#4 0x00000000004005e8 in SuperChild::SuperChild (this=0x7fffffffdc40,
d=0x7fffffffdc6c) at Child.cpp:11
11 assert(data != NULL);
(gdb) p data
$7 = (unsigned int *) 0x0
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">data</code> is actually 0!!! Have I run into <em>Schrödinger's bug</em>?! It was fine a moment ago, so why is it 0 here...
Who changed my <code class="language-plaintext highlighter-rouge">data</code>?</p>
<p>To see how the data changes, it looks like we need a watchpoint this time.</p>
<h2 id="数据断点">Watchpoints</h2>
<p>Restart the program in gdb. This time, once we are inside <code class="language-plaintext highlighter-rouge">Child</code>'s constructor, set a watchpoint on the <code class="language-plaintext highlighter-rouge">data</code> member:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) p &data
$14 = (unsigned int **) 0x7fffffffdc50
(gdb) watch *(unsigned*) 0x7fffffffdc50
Hardware watchpoint 4: *(unsigned*) 0x7fffffffdc50
</code></pre></div></div>
<p>We can see that the address of the <code class="language-plaintext highlighter-rouge">data</code> member is <code class="language-plaintext highlighter-rouge">0x7fffffffdc50</code>. Now let's see whether execution stops when <code class="language-plaintext highlighter-rouge">Child</code> assigns to it:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) c
Continuing.
Hardware watchpoint 4: *(unsigned*) 0x7fffffffdc50
Old value = 0
New value = 4294958188
Child::Child (this=0x7fffffffdc40, s=0, d=0x7fffffffdc6c, i=1) at Child.h:13
13 i_data = i;
</code></pre></div></div>
<p>It stops, which shows the value really did change. Let's check to confirm:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) p d
$15 = (unsigned int *) 0x7fffffffdc6c
(gdb) p data
$16 = (unsigned int *) 0x7fffffffdc6c
</code></pre></div></div>
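<p>OK, now let's return from <code class="language-plaintext highlighter-rouge">Child</code>'s constructor and look again. The watchpoint does not fire, which means the watched memory was not modified. Now let's look at the value of <code class="language-plaintext highlighter-rouge">data</code>:</p>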
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>(gdb) n
SuperChild::SuperChild (this=0x7fffffffdc40, d=0x7fffffffdc6c) at Child.cpp:11
11 assert(data != NULL);
(gdb) p data
$17 = (unsigned int *) 0x0
</code></pre></div></div>
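<p>!!! T_T Am I seeing ghosts? The watchpoint never fired, so why did the value change...</p>
<p>Even the watchpoint is no help, so this time I have no choice but to sit down and read the assembly...</p>
<h2 id="反汇编">Disassembly</h2>
<p>The <code class="language-plaintext highlighter-rouge">disassemble</code> command in gdb shows the disassembly of the function in the current stack frame.
Here is the assembly of <code class="language-plaintext highlighter-rouge">SuperChild</code>'s constructor (we only need to read as far as the assert call):</p>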
<figure class="highlight"><pre><code class="language-bash" data-lang="bash"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
</pre></td><td class="code"><pre><span class="o">(</span>gdb<span class="o">)</span> disassemble
Dump of assembler code <span class="k">for function </span>SuperChild::SuperChild<span class="o">(</span>unsigned int<span class="k">*</span><span class="o">)</span>:
0x0000000000400598 <+0>: push %rbp
0x0000000000400599 <+1>: mov %rsp,%rbp
0x000000000040059c <+4>: sub <span class="nv">$0x10</span>,%rsp
0x00000000004005a0 <+8>: mov %rdi,-0x8<span class="o">(</span>%rbp<span class="o">)</span>
0x00000000004005a4 <+12>: mov %rsi,-0x10<span class="o">(</span>%rbp<span class="o">)</span>
0x00000000004005a8 <+16>: mov <span class="nt">-0x8</span><span class="o">(</span>%rbp<span class="o">)</span>,%rax
0x00000000004005ac <+20>: mov <span class="nt">-0x10</span><span class="o">(</span>%rbp<span class="o">)</span>,%rdx
0x00000000004005b0 <+24>: mov <span class="nv">$0x1</span>,%ecx
0x00000000004005b5 <+29>: mov <span class="nv">$0x0</span>,%esi
0x00000000004005ba <+34>: mov %rax,%rdi
0x00000000004005bd <+37>: callq 0x400552 <Child::Child<span class="o">(</span>unsigned int, unsigned int<span class="k">*</span>, int<span class="o">)></span>
<span class="o">=></span> 0x00000000004005c2 <+42>: mov <span class="nt">-0x8</span><span class="o">(</span>%rbp<span class="o">)</span>,%rax
0x00000000004005c6 <+46>: mov 0x8<span class="o">(</span>%rax<span class="o">)</span>,%rax
0x00000000004005ca <+50>: <span class="nb">test</span> %rax,%rax
0x00000000004005cd <+53>: jne 0x4005e8 <SuperChild::SuperChild<span class="o">(</span>unsigned int<span class="k">*</span><span class="o">)</span>+80>
0x00000000004005cf <+55>: mov <span class="nv">$0x400720</span>,%ecx
0x00000000004005d4 <+60>: mov <span class="nv">$0xb</span>,%edx
0x00000000004005d9 <+65>: mov <span class="nv">$0x400700</span>,%esi
0x00000000004005de <+70>: mov <span class="nv">$0x40070a</span>,%edi
0x00000000004005e3 <+75>: callq 0x400400 <__assert_fail@plt>
0x00000000004005e8 <+80>: leaveq
0x00000000004005e9 <+81>: retq
</pre></td></tr></tbody></table></code></pre></figure>
<p>In the listing, line 13 (offset +37) is a <code class="language-plaintext highlighter-rouge">callq</code>, the call to <code class="language-plaintext highlighter-rouge">Child</code>'s constructor.
Line 17 (offset +53) is a <code class="language-plaintext highlighter-rouge">jne</code> whose target is the <code class="language-plaintext highlighter-rouge">SuperChild</code> constructor plus 80, which is the end of the constructor.
That is a little odd: <code class="language-plaintext highlighter-rouge">SuperChild</code>'s constructor contains only an <code class="language-plaintext highlighter-rouge">assert</code>, so why is there a <code class="language-plaintext highlighter-rouge">jne</code>, an instruction we usually see for an <code class="language-plaintext highlighter-rouge">if</code>?</p>
<p>In fact, assert is implemented with a conditional. In <code class="language-plaintext highlighter-rouge">/usr/include/assert.h</code> you can see that <code class="language-plaintext highlighter-rouge">assert</code> is normally a
macro (the details may vary slightly between libc versions):</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="cp"># define assert(expr) \
((expr) \
? __ASSERT_VOID_CAST (0) \
: __assert_fail (__STRING(expr), __FILE__, __LINE__, __ASSERT_FUNCTION))</span></code></pre></figure>
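<p>That explains the <code class="language-plaintext highlighter-rouge">jne</code> instruction. The instructions just before the jne must therefore load the value of the <code class="language-plaintext highlighter-rouge">data</code> member,
so the key part is these few lines:</p>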
<figure class="highlight"><pre><code class="language-gas" data-lang="gas">mov -0x8(%rbp),%rax
mov 0x8(%rax),%rax
test %rax,%rax</code></pre></figure>
<p>These instructions correspond to the following C++ expression:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>this->data
</code></pre></div></div>
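<p>First, recall that in C++ the <code class="language-plaintext highlighter-rouge">this</code> pointer is effectively passed to a member function as its first argument, and constructors are no exception.
So what is stored at <code class="language-plaintext highlighter-rouge">-0x8(%rbp)</code>? Line 6 of the listing above has this instruction:</p>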
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mov %rdi,-0x8(%rbp)
</code></pre></div></div>
<p>So it was copied from the <code class="language-plaintext highlighter-rouge">rdi</code> register. In the x86-64 calling convention used on Linux, <code class="language-plaintext highlighter-rouge">rdi</code> holds
the first integer (or pointer) argument of a function call. (As an aside, my machine is 64-bit, so the assembly differs from what a 32-bit build would produce.)
That is exactly the <code class="language-plaintext highlighter-rouge">this</code> pointer! So rax now holds <code class="language-plaintext highlighter-rouge">this</code>.
The next instruction, <code class="language-plaintext highlighter-rouge">mov 0x8(%rax),%rax</code>, loads the value at offset 8 from <code class="language-plaintext highlighter-rouge">this</code>.
That looks like the offset of <code class="language-plaintext highlighter-rouge">data</code>.
Nothing seems wrong here. Since the read looks fine, let's look at where <code class="language-plaintext highlighter-rouge">data</code> is written, namely <code class="language-plaintext highlighter-rouge">Child</code>'s constructor.</p>
<h3 id="child构造函数">The Child constructor</h3>
<p>Here is the assembly of Child's constructor:</p>
<figure class="highlight"><pre><code class="language-gas" data-lang="gas"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code"><pre> 0x0000000000400552 <+0>: push %rbp
0x0000000000400553 <+1>: mov %rsp,%rbp
0x0000000000400556 <+4>: sub $0x20,%rsp
0x000000000040055a <+8>: mov %rdi,-0x8(%rbp)
0x000000000040055e <+12>: mov %esi,-0xc(%rbp)
0x0000000000400561 <+15>: mov %rdx,-0x18(%rbp)
0x0000000000400565 <+19>: mov %ecx,-0x10(%rbp)
0x0000000000400568 <+22>: mov -0x8(%rbp),%rax
0x000000000040056c <+26>: mov %rax,%rdi
0x000000000040056f <+29>: callq 0x400548 <Base::Base()>
0x0000000000400574 <+34>: mov -0x8(%rbp),%rax
0x0000000000400578 <+38>: mov -0xc(%rbp),%edx
0x000000000040057b <+41>: mov %edx,0x8(%rax)
=> 0x000000000040057e <+44>: mov -0x8(%rbp),%rax
0x0000000000400582 <+48>: mov -0x18(%rbp),%rdx
0x0000000000400586 <+52>: mov %rdx,0x10(%rax)
0x000000000040058a <+56>: mov -0x8(%rbp),%rax
0x000000000040058e <+60>: mov -0x10(%rbp),%edx
0x0000000000400591 <+63>: mov %edx,0x18(%rax)
0x0000000000400594 <+66>: leaveq
0x0000000000400595 <+67>: retq
</pre></td></tr></tbody></table></code></pre></figure>
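<p>Following the same reasoning as above, line 10 calls the base-class constructor.
The nine lines after it come in groups of three, each group corresponding to one C++ assignment. Let's focus on the assignment to the <code class="language-plaintext highlighter-rouge">data</code> member:</p>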
<figure class="highlight"><pre><code class="language-gas" data-lang="gas">mov -0x8(%rbp),%rax
mov -0x18(%rbp),%rdx
mov %rdx,0x10(%rax)</code></pre></figure>
<p>The first line loads the <code class="language-plaintext highlighter-rouge">this</code> pointer into <code class="language-plaintext highlighter-rouge">rax</code>, and the second loads the value of <code class="language-plaintext highlighter-rouge">d</code> into <code class="language-plaintext highlighter-rouge">rdx</code>.
So the third line stores <code class="language-plaintext highlighter-rouge">d</code> into the <code class="language-plaintext highlighter-rouge">data</code> member. Wait, why is <code class="language-plaintext highlighter-rouge">data</code> at <code class="language-plaintext highlighter-rouge">0x10(%rax)</code>?
In <code class="language-plaintext highlighter-rouge">SuperChild</code> above, <code class="language-plaintext highlighter-rouge">data</code> was clearly at <code class="language-plaintext highlighter-rouge">0x8(%rax)</code>! Why do the two offsets differ by 8?</p>
<p>Why would the offsets differ? Is the size of the base class wrong? How can we check the size of the base class?</p>
<h1 id="模板静态断言">Static assertion with templates</h1>
<p>To find the size of a class we normally use <code class="language-plaintext highlighter-rouge">sizeof</code>, for example:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="k">sizeof</span><span class="p">(</span><span class="n">Base</span><span class="p">)</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span></code></pre></figure>
<p>But that only shows the value at run time, and I wanted to see it at compile time. Is there a way?
There is: in C++ a template trick gives the effect of a static assertion.
Let's see how. In <code class="language-plaintext highlighter-rouge">Child</code>'s constructor I statically assert that the size of <code class="language-plaintext highlighter-rouge">Base</code> is 8,
because <code class="language-plaintext highlighter-rouge">Base.h</code> declares two <code class="language-plaintext highlighter-rouge">unsigned</code> members and each <code class="language-plaintext highlighter-rouge">unsigned</code> is 4 bytes.
That gives the following code:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">template</span><span class="o"><</span><span class="kt">unsigned</span> <span class="n">size</span><span class="o">></span>
<span class="k">class</span> <span class="nc">TestSize</span><span class="p">;</span>

<span class="k">template</span><span class="o"><></span>
<span class="k">class</span> <span class="nc">TestSize</span><span class="o"><</span><span class="mi">8</span><span class="o">></span> <span class="p">{};</span>

<span class="k">class</span> <span class="nc">Child</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Base</span>
<span class="p">{</span>
<span class="nl">public:</span>
    <span class="n">Child</span><span class="p">(</span><span class="kt">unsigned</span> <span class="n">s</span><span class="p">,</span> <span class="kt">unsigned</span><span class="o">*</span> <span class="n">d</span><span class="p">,</span> <span class="kt">int</span> <span class="n">i</span><span class="p">)</span>
    <span class="p">{</span>
        <span class="n">TestSize</span><span class="o"><</span><span class="k">sizeof</span><span class="p">(</span><span class="n">Base</span><span class="p">)</span><span class="o">></span> <span class="n">test_size</span><span class="p">;</span>
        <span class="c1">// ...</span>
    <span class="p">}</span>
    <span class="c1">// ...</span>
<span class="p">};</span>
</code></pre></figure>
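<p>The key to the code above is the template <code class="language-plaintext highlighter-rouge">TestSize</code>. It relies on explicit (full) template specialization: only the specialization for 8 is defined.
No object can be created for any other value, because the primary template is declared but never defined.
In <code class="language-plaintext highlighter-rouge">Child</code>'s constructor we pass <code class="language-plaintext highlighter-rouge">sizeof(Base)</code> as the template argument, so compilation succeeds only when <code class="language-plaintext highlighter-rouge">sizeof(Base)</code> is 8.
This is one way to write a static assertion.
(The technique is described in detail in <a href="http://www.amazon.com/Modern-Design-Generic-Programming-Patterns/dp/0201704315">“Modern C++ Design”</a>.)</p>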
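<p>Since C++11 the same check can also be written with the built-in <code class="language-plaintext highlighter-rouge">static_assert</code>. The following is a minimal sketch, not code from the repo: <code class="language-plaintext highlighter-rouge">DemoBase</code>, <code class="language-plaintext highlighter-rouge">StaticCheck</code>, <code class="language-plaintext highlighter-rouge">kExpectedBaseSize</code> and <code class="language-plaintext highlighter-rouge">check_base_size</code> are illustrative names, and <code class="language-plaintext highlighter-rouge">DemoBase</code> is only assumed to mirror the two <code class="language-plaintext highlighter-rouge">unsigned</code> members of <code class="language-plaintext highlighter-rouge">Base.h</code>.</p>

```cpp
#include <cstddef>

// Stand-in for the Base class from Base.h: two unsigned members.
// (Illustrative copy; the real class lives in the article's repo.)
struct DemoBase {
    unsigned u_data;
    unsigned m_data;
};

// The template trick: only the <true> specialization is defined,
// so instantiating StaticCheck<false> is a compile error.
template <bool Ok>
struct StaticCheck;
template <>
struct StaticCheck<true> {};

// Expected size, computed instead of hard-coding 8.
constexpr std::size_t kExpectedBaseSize = 2 * sizeof(unsigned);

// C++11 equivalent: compilation fails with this message if the size differs.
static_assert(sizeof(DemoBase) == kExpectedBaseSize,
              "DemoBase must contain exactly two unsigned members");

// The template variant, usable inside any function body.
inline void check_base_size() {
    StaticCheck<sizeof(DemoBase) == kExpectedBaseSize> size_ok;
    (void)size_ok;  // silence the unused-variable warning
}
```

<p>Compared with the template trick, <code class="language-plaintext highlighter-rouge">static_assert</code> reports a message of your choosing rather than an "incomplete type" error, but unlike the <code class="language-plaintext highlighter-rouge">TestSize<4u></code> error it does not print the actual size.</p>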
<p>With the assertion in place, let's compile:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>g++ -g -c -o main.o main.cpp -I.
g++ -g -c -o Child.o Child.cpp -I.
In file included from Child.cpp:1:0:
Child.h: In constructor ‘Child::Child(unsigned int, unsigned int*, int)’:
Child.h:19:32: error: aggregate ‘TestSize<4u> test_size’ has incomplete type and cannot be defined
make: *** [Child.o] Error 1
</code></pre></div></div>
<p>Sure enough, compiling Child.cpp fails. For <code class="language-plaintext highlighter-rouge">Child.cpp</code>, the size of <code class="language-plaintext highlighter-rouge">Base</code> turns out to be 4, not 8!</p>
<p>Why? <code class="language-plaintext highlighter-rouge">main.cpp</code> clearly compiles fine, so why does <code class="language-plaintext highlighter-rouge">Child.cpp</code> fail? Could the two be including different definitions of <code class="language-plaintext highlighter-rouge">Base</code>?</p>
<p>Let's look closely at the includes in <code class="language-plaintext highlighter-rouge">main.cpp</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "App.h"
#include "Child.h"
</code></pre></div></div>
<p>And now the includes in <code class="language-plaintext highlighter-rouge">Child.cpp</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "Child.h"
#include "App.h"
</code></pre></div></div>
<p>The order of <code class="language-plaintext highlighter-rouge">App.h</code> and <code class="language-plaintext highlighter-rouge">Child.h</code> is reversed. So what is going on inside these two headers?
Here is <code class="language-plaintext highlighter-rouge">App.h</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "Base.h"
#include "Child.h"
</code></pre></div></div>
<p>In its include section, it includes <code class="language-plaintext highlighter-rouge">Base.h</code> first and then <code class="language-plaintext highlighter-rouge">Child.h</code>. And Child.h?</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include "Base1.h"
</code></pre></div></div>
<p>Huh? <code class="language-plaintext highlighter-rouge">Child.h</code> includes a <code class="language-plaintext highlighter-rouge">Base1.h</code>? What is that, and how does it relate to <code class="language-plaintext highlighter-rouge">Base.h</code>?
Let's diff <code class="language-plaintext highlighter-rouge">Base.h</code> and <code class="language-plaintext highlighter-rouge">Base1.h</code>:</p>
<figure class="highlight"><pre><code class="language-diff" data-lang="diff"><span class="gd">--- Base.h 2014-03-17 23:19:05.980027339 +0800
</span><span class="gi">+++ Base1.h 2014-03-17 23:19:05.980027339 +0800
</span><span class="p">@@ -1,11 +1,10 @@</span>
#ifndef _BASE_H_
#define _BASE_H_
class Base
{
protected:
unsigned u_data;
<span class="gd">- unsigned m_data;
</span> };
#endif</code></pre></figure>
<p>So <code class="language-plaintext highlighter-rouge">Base1.h</code> is identical to <code class="language-plaintext highlighter-rouge">Base.h</code> except that it lacks <code class="language-plaintext highlighter-rouge">m_data</code>!
Now the truth is out.</p>
<h1 id="bug分析">Bug analysis</h1>
<p>Why does <code class="language-plaintext highlighter-rouge">Base1.h</code> cause the bug? First, <code class="language-plaintext highlighter-rouge">Base1.h</code> and <code class="language-plaintext highlighter-rouge">Base.h</code> are almost identical, right down to the <a href="http://en.wikipedia.org/wiki/Include_guard">include guard</a>.
Now look at the include chain of the problematic <code class="language-plaintext highlighter-rouge">Child.cpp</code> (order is left to right):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Child.cpp
| \
Child.h App.h
| | \
Base1.h Base.h Child.h
</code></pre></div></div>
<p>Because <code class="language-plaintext highlighter-rouge">Base1.h</code> and <code class="language-plaintext highlighter-rouge">Base.h</code> share the same include guard and <code class="language-plaintext highlighter-rouge">Base1.h</code> is included before <code class="language-plaintext highlighter-rouge">Base.h</code>,
only the contents of <code class="language-plaintext highlighter-rouge">Base1.h</code> are used, and the contents of <code class="language-plaintext highlighter-rouge">Base.h</code> are skipped entirely.
So from the point of view of <code class="language-plaintext highlighter-rouge">Child.cpp</code>, the inheritance hierarchy of <code class="language-plaintext highlighter-rouge">SuperChild</code> looks like this (members are listed after each class name):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SuperChild []
|
Child [seq, data, i_data]
|
Base (in Base1.h) [u_data]
</code></pre></div></div>
<p>And what about <code class="language-plaintext highlighter-rouge">main.cpp</code>? Its include chain looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code> main.cpp
| \
App.h Child.h
| \ \
Base.h Child.h Base1.h
</code></pre></div></div>
<p>Here <code class="language-plaintext highlighter-rouge">Base.h</code> comes before <code class="language-plaintext highlighter-rouge">Base1.h</code>, so <code class="language-plaintext highlighter-rouge">Base1.h</code> is skipped.
From the point of view of <code class="language-plaintext highlighter-rouge">main.cpp</code>, the inheritance hierarchy of <code class="language-plaintext highlighter-rouge">SuperChild</code> looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SuperChild []
|
Child [seq, data, i_data]
|
Base (in Base.h) [u_data, m_data]
</code></pre></div></div>
<p>See? From the two translation units, the size of <code class="language-plaintext highlighter-rouge">SuperChild</code> is simply different!
The direct cause is that <code class="language-plaintext highlighter-rouge">Base</code> is defined in two different headers with different sizes (a violation of C++'s one-definition rule),
so the offset of <code class="language-plaintext highlighter-rouge">data</code> used by <code class="language-plaintext highlighter-rouge">Child</code>'s constructor differs from the one used in <code class="language-plaintext highlighter-rouge">SuperChild</code>.
That is what causes the bug.</p>
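<p>The include-guard collision can be reproduced in a single file. The sketch below is only an illustration, not code from the repo: it inlines two hypothetical header bodies that share one guard macro, in the order <code class="language-plaintext highlighter-rouge">Child.cpp</code> sees them. The guard is named <code class="language-plaintext highlighter-rouge">DEMO_BASE_H</code> here because names such as <code class="language-plaintext highlighter-rouge">_BASE_H_</code>, which start with an underscore followed by an uppercase letter, are reserved for the implementation.</p>

```cpp
#include <cstddef>

// --- body of a hypothetical "Base1.h" (included first, as in Child.cpp) ---
#ifndef DEMO_BASE_H
#define DEMO_BASE_H
struct DemoBase {
    unsigned u_data;
};
#endif

// --- body of a hypothetical "Base.h" (same guard, so this body is skipped) ---
#ifndef DEMO_BASE_H
#define DEMO_BASE_H
struct DemoBase {
    unsigned u_data;
    unsigned m_data;
};
#endif

// Only the first definition survived preprocessing: one unsigned member.
constexpr std::size_t kBaseSizeSeenHere = sizeof(DemoBase);
static_assert(kBaseSizeSeenHere == sizeof(unsigned),
              "the second header body was silently dropped by the shared guard");
```

<p>Swapping the two header bodies makes the two-member definition win instead, which is the situation in <code class="language-plaintext highlighter-rouge">main.cpp</code>. The compiler reports nothing in either case, because the preprocessor removes the second body before the compiler ever sees it.</p>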
<p>First, here is the memory layout of <code class="language-plaintext highlighter-rouge">Child</code> as seen from <code class="language-plaintext highlighter-rouge">main.cpp</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Offset Member Type
0 | u_data | unsigned
4 | m_data | unsigned
8 | seq | unsigned
16 | data | unsigned*
24 | i_data | int
</code></pre></div></div>
<p>And here is the memory layout of <code class="language-plaintext highlighter-rouge">Child</code> as seen from <code class="language-plaintext highlighter-rouge">Child.cpp</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Offset Member Type
0 | u_data | unsigned
4 | seq | unsigned
8 | data | unsigned*
16 | i_data | int
</code></pre></div></div>
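<p>So the assignment to <code class="language-plaintext highlighter-rouge">data</code> in <code class="language-plaintext highlighter-rouge">Child</code>'s constructor writes to offset 16,
while <code class="language-plaintext highlighter-rouge">SuperChild</code> reads <code class="language-plaintext highlighter-rouge">data</code> from offset 8, which is where <code class="language-plaintext highlighter-rouge">seq</code> lives in the other layout.
And the initial value that <code class="language-plaintext highlighter-rouge">SuperChild</code> passes for <code class="language-plaintext highlighter-rouge">seq</code> happens to be 0.
This finally explains why <code class="language-plaintext highlighter-rouge">data</code> is always 0 in <code class="language-plaintext highlighter-rouge">SuperChild</code>.
It is also consistent with the gdb session: the watchpoint sat at <code class="language-plaintext highlighter-rouge">this</code> + 16, which nothing overwrote, while <code class="language-plaintext highlighter-rouge">p data</code> in the <code class="language-plaintext highlighter-rouge">SuperChild</code> frame read offset 8.</p>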
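<p>The two layouts can be compared at compile time with <code class="language-plaintext highlighter-rouge">offsetof</code>. This is a rough sketch under one assumption: the inherited members are flattened into plain structs (the names <code class="language-plaintext highlighter-rouge">ChildSeenByMain</code> and <code class="language-plaintext highlighter-rouge">ChildSeenByChildCpp</code> are made up for illustration), which gives the same offsets as the tables above on a typical x86-64 Linux build but is not guaranteed to match the real class hierarchy on every ABI.</p>

```cpp
#include <cstddef>

// Child as main.cpp sees it: Base (from Base.h) contributes u_data and m_data.
// Flattened into one plain struct so that offsetof is well defined.
struct ChildSeenByMain {
    unsigned  u_data;
    unsigned  m_data;
    unsigned  seq;
    unsigned* data;
    int       i_data;
};

// Child as Child.cpp sees it: Base (from Base1.h) has only u_data.
struct ChildSeenByChildCpp {
    unsigned  u_data;
    unsigned  seq;
    unsigned* data;
    int       i_data;
};

// Where each translation unit believes `data` lives.
constexpr std::size_t kDataOffsetInMain     = offsetof(ChildSeenByMain, data);
constexpr std::size_t kDataOffsetInChildCpp = offsetof(ChildSeenByChildCpp, data);

// The offset Child.cpp uses for `data` is where main.cpp's layout keeps `seq`.
constexpr bool kReadsSeqInsteadOfData =
    kDataOffsetInChildCpp == offsetof(ChildSeenByMain, seq);
```

<p>On the 64-bit machine used in this post the two constants should come out as 16 and 8, matching the <code class="language-plaintext highlighter-rouge">0x10(%rax)</code> and <code class="language-plaintext highlighter-rouge">0x8(%rax)</code> operands in the disassembly.</p>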
<p>This really is a silly bug... but silly bugs often take an extremely painful debugging session to track down.</p>
<h2 id="如何fix这个bug">How to fix this bug</h2>
<p>Fixing it is very simple: change <code class="language-plaintext highlighter-rouge">#include "Base1.h"</code> in <code class="language-plaintext highlighter-rouge">Child.h</code> to <code class="language-plaintext highlighter-rouge">#include "Base.h"</code>,
so that <code class="language-plaintext highlighter-rouge">Base.h</code> is always used.</p>
<p>Of course, the root cause is that <code class="language-plaintext highlighter-rouge">Base1.h</code> exists at all; it is most likely a second version of the same source file.
When you use an SCM, modify the source file in place when updating it instead of making a copy, which very easily leads to problems like this.</p>
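<p>Also note that Google's C++ style guide has <a href="http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Names_and_Order_of_Includes">a rule about header files</a>.
It carries an <em>implicit but unstated</em> rule: avoid relative paths when including headers as far as possible,
and write the path from the project root instead, for example <code class="language-plaintext highlighter-rouge">common/base.h</code> rather than <code class="language-plaintext highlighter-rouge">base.h</code> plus a <code class="language-plaintext highlighter-rouge">-I</code> compiler option.
This matters a lot in practice, because writing full paths instead of relying on <code class="language-plaintext highlighter-rouge">-I</code> can speed up compilation
and goes a long way towards avoiding the bug I hit.
Why? If two different files have the same name but live in different directories,
then with <code class="language-plaintext highlighter-rouge">-I</code> it is quite possible for two translation units to end up including different headers.
With full paths this particular mix-up does not happen. Many of Google's open-source projects follow this rule,
protobuf for example.</p>
<h1 id="一个插曲">An aside</h1>
<p>You may have noticed a detail in the memory layout below, namely the starting offset of <code class="language-plaintext highlighter-rouge">data</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Offset Member Type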
0 | u_data | unsigned
4 | m_data | unsigned
8 | seq | unsigned
16 | data | unsigned*
24 | i_data | int
</code></pre></div></div>
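<p><code class="language-plaintext highlighter-rouge">seq</code> is an <code class="language-plaintext highlighter-rouge">unsigned</code>, a member of size 4, so why is the offset of <code class="language-plaintext highlighter-rouge">data</code> 8 greater than that of <code class="language-plaintext highlighter-rouge">seq</code>?
Where did the extra 4 bytes go? Experienced readers will quickly guess that this is padding.
But isn't padding normally in units of 4? <code class="language-plaintext highlighter-rouge">seq</code> is exactly 4 bytes, so no padding should be needed.</p>
<p>The catch is that padding is driven by the alignment requirement of each member, not by a fixed unit of 4. On a typical 32-bit system a pointer is 4 bytes and 4-byte aligned, so no padding would be needed here.
On a 64-bit system such as mine, however, a pointer is 8 bytes,
and its address must be a multiple of 8. <code class="language-plaintext highlighter-rouge">seq</code> ends at offset 12, so the compiler inserts 4 bytes of padding and <code class="language-plaintext highlighter-rouge">data</code> starts at 16.
The exact layout is shown below:</p>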
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>0 4 8 12 16 24
| u_data | m_data | seq | padding | data | i_data
</code></pre></div></div>
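<p>The padding can be measured directly with <code class="language-plaintext highlighter-rouge">offsetof</code> and <code class="language-plaintext highlighter-rouge">alignof</code>. This is a small sketch assuming a flattened stand-in struct (<code class="language-plaintext highlighter-rouge">Layout</code>, <code class="language-plaintext highlighter-rouge">kSeqEnd</code> and <code class="language-plaintext highlighter-rouge">kPaddingBeforeData</code> are illustrative names) with the same member order as the diagram above.</p>

```cpp
#include <cstddef>

// Same member order as Child when Base has two unsigned members.
struct Layout {
    unsigned  u_data;
    unsigned  m_data;
    unsigned  seq;
    unsigned* data;
    int       i_data;
};

// First byte after seq, and the gap the compiler left before data.
constexpr std::size_t kSeqEnd = offsetof(Layout, seq) + sizeof(unsigned);
constexpr std::size_t kPaddingBeforeData = offsetof(Layout, data) - kSeqEnd;

// data always starts at a multiple of the pointer type's alignment.
static_assert(offsetof(Layout, data) % alignof(unsigned*) == 0,
              "data must be suitably aligned for unsigned*");
```

<p>On a typical x86-64 Linux build <code class="language-plaintext highlighter-rouge">kPaddingBeforeData</code> should be 4, and on a typical 32-bit build 0; the exact values depend on the platform ABI.</p>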
<p>And with that, the bug is finally fixed and fully analyzed.</p>
Scheme Interpreter In Scheme(3)
2014-02-01T00:00:00+00:00
http://airekans.github.io/scheme/2014/02/01/scheme-interpreter-in-scheme3
<p><a href="scheme/2012/11/26/scheme-in-scheme-2/">The previous post</a> showed how to evaluate atoms in Scheme.
So far we can evaluate values of type <code class="language-plaintext highlighter-rouge">Number</code>, <code class="language-plaintext highlighter-rouge">String</code> and <code class="language-plaintext highlighter-rouge">bool</code>. In this post,
I will describe how to implement variable definitions.</p>
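<h1 id="变量定义">Variable Definitions</h1>
<p>First we need to be clear about how a variable is defined. As mentioned in the earlier posts, a variable definition in Scheme looks like this:</p>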
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="nv">a</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">b</span> <span class="s">"abc"</span><span class="p">)</span></code></pre></figure>
<p>The code above defines the variables <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code> as the number <code class="language-plaintext highlighter-rouge">1</code> and the string <code class="language-plaintext highlighter-rouge">"abc"</code>, respectively.</p>
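<p>Viewed as data, a definition is simply a <code class="language-plaintext highlighter-rouge">list</code> with three elements:
the first element is the symbol <code class="language-plaintext highlighter-rouge">define</code>; the second is also a symbol, this time naming the variable;
and the third is the variable's initial value, which is an atom.</p>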
<p>We already handled atoms in the previous posts, but we have not dealt with symbols yet.
So let's take care of symbols first!</p>
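<h2 id="解析symbol">Evaluating Symbols</h2>
<p>In the earlier explanations I never said precisely what a Symbol is. In fact, outside of Scheme/Lisp,
few languages have a dedicated type for symbols (Ruby is one of the mainstream languages that does).</p>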
<p>In Scheme, a Symbol is essentially a "string without quotation marks". In <code class="language-plaintext highlighter-rouge">(define a 1)</code>,
both <code class="language-plaintext highlighter-rouge">define</code> and <code class="language-plaintext highlighter-rouge">a</code> are symbols; they are the first and second elements of a list.</p>
<p>The <strong>printed representation</strong> of a Symbol is just itself, but its <strong>input form</strong> looks like this:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">quote</span> <span class="nv">a</span><span class="p">)</span> <span class="c1">; This is symbol a.</span>
<span class="nv">a</span> <span class="c1">; This is variable a reference.</span></code></pre></figure>
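<p>In other words, the first expression above denotes the symbol <code class="language-plaintext highlighter-rouge">a</code>. Why can't we write <code class="language-plaintext highlighter-rouge">a</code> directly to denote the symbol?
Because Scheme by default treats a bare symbol as a variable reference, as in the second line above.</p>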
<p>To test whether two symbols are equal, use the <code class="language-plaintext highlighter-rouge">eq?</code> function (yes, <code class="language-plaintext highlighter-rouge">?</code> is a legal identifier character in Scheme),
like this:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="nv">a</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">b</span><span class="p">))</span>
<span class="p">(</span><span class="nb">eq?</span> <span class="nv">a</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">b</span><span class="p">))</span> <span class="c1">; returns #t, which means true.</span></code></pre></figure>
<p>With <code class="language-plaintext highlighter-rouge">eq?</code> we can check whether a symbol is the one we are looking for.</p>
<h2 id="define表达式的判断">Recognizing a define Expression</h2>
<p>To recap, the simplest form of a variable definition is:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="nv">a</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">b</span> <span class="s">"abc"</span><span class="p">)</span></code></pre></figure>
<p>So we can check whether a list is a <code class="language-plaintext highlighter-rouge">define</code> expression as follows:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="k">cond</span> <span class="c1">; eval other expression types mentioned before</span>
        <span class="p">((</span><span class="k">and</span> <span class="p">(</span><span class="nb">pair?</span> <span class="nv">exp</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq?</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">exp</span><span class="p">)</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">define</span><span class="p">)))</span>
         <span class="p">(</span><span class="nf">eval-definition</span> <span class="nv">exp</span><span class="p">))</span>
        <span class="p">(</span><span class="k">else</span> <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">))))</span></code></pre></figure>
<p>To recognize a <code class="language-plaintext highlighter-rouge">define</code>, we first check that the expression is a non-empty <code class="language-plaintext highlighter-rouge">list</code>,
and then check that its first element is the symbol <code class="language-plaintext highlighter-rouge">define</code>.
When the expression satisfies both conditions, <code class="language-plaintext highlighter-rouge">eval-definition</code> evaluates the whole <code class="language-plaintext highlighter-rouge">define</code> expression.</p>
<p>Since <code class="language-plaintext highlighter-rouge">define</code> is used to define variables, what does defining a variable actually involve?</p>
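<h2 id="程序的运行environment">The Program's Runtime Environment</h2>
<p>While a program runs, a runtime environment changes along with it; we call it the environment.
The environment is essentially where all variable definitions live.
How to represent the environment is one of the core problems every interpreter has to solve.</p>
<p>For now, we can assume that all variable definitions are global.
We can then represent the environment as a pair of two lists: the first list holds the variables' symbols,
and the second holds their current values, in the same order.</p>
<p>So we can define the environment with the following code:</p>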
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="nv">the-global-environment</span> <span class="p">(</span><span class="nb">cons</span> <span class="p">(</span><span class="k">quote</span> <span class="p">())</span> <span class="p">(</span><span class="k">quote</span> <span class="p">())))</span>

<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">define-variable!</span> <span class="nv">var</span> <span class="nv">val</span> <span class="nv">env</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">set-car!</span> <span class="nv">env</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">var</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">env</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">set-cdr!</span> <span class="nv">env</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">val</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">env</span><span class="p">))))</span></code></pre></figure>
<p>We use <code class="language-plaintext highlighter-rouge">the-global-environment</code> to represent the global environment; its initial value is a
<code class="language-plaintext highlighter-rouge">cons cell</code> containing two empty lists.
We also define <code class="language-plaintext highlighter-rouge">define-variable!</code> to add a new variable to the interpreter.
Two new functions appear here, <code class="language-plaintext highlighter-rouge">set-car!</code> and <code class="language-plaintext highlighter-rouge">set-cdr!</code>, which set the first and the second element
of a <code class="language-plaintext highlighter-rouge">cons cell</code>, respectively.</p>
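<p>For readers less familiar with cons cells, the same two-parallel-lists idea can be sketched in Python. This is only an illustration of the data structure, not part of the Scheme interpreter; the names <code class="language-plaintext highlighter-rouge">make_environment</code> and <code class="language-plaintext highlighter-rouge">define_variable</code> are made up for this sketch.</p>

```python
def make_environment():
    """An environment is a two-slot list: [names, values], kept in the same order."""
    return [[], []]


def define_variable(name, value, env):
    """Prepend a binding, mirroring the cons onto the car and cdr in the Scheme code."""
    env[0] = [name] + env[0]
    env[1] = [value] + env[1]


the_global_environment = make_environment()
define_variable("a", 1, the_global_environment)
define_variable("b", "abc", the_global_environment)
print(the_global_environment)  # [['b', 'a'], ['abc', 1]]
```

<p>The newest binding always sits at the front of both lists, so a name and its value share the same index.</p>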
<h2 id="define表达式的解析">Evaluating a define Expression</h2>
<p>Because defining a variable has to modify the environment, <code class="language-plaintext highlighter-rouge">eval-definition</code> necessarily uses it.
Let's see how <code class="language-plaintext highlighter-rouge">eval-definition</code> can be defined.</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">eval-definition</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">define-variable!</span> <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">exp</span><span class="p">))</span> <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">exp</span><span class="p">)))</span>
                    <span class="nv">the-global-environment</span><span class="p">)</span>
  <span class="p">(</span><span class="k">quote</span> <span class="nv">ok</span><span class="p">))</span></code></pre></figure>
<p>Inside <code class="language-plaintext highlighter-rouge">eval-definition</code> we simply call <code class="language-plaintext highlighter-rouge">define-variable!</code>
and return <code class="language-plaintext highlighter-rouge">ok</code>. When calling <code class="language-plaintext highlighter-rouge">define-variable!</code>,
we take the second element of the expression as the variable name and the third element as its value,
and pass <code class="language-plaintext highlighter-rouge">the-global-environment</code> as the environment.</p>
<p>Returning <code class="language-plaintext highlighter-rouge">ok</code> merely signals that the definition succeeded; it carries no deeper meaning.
Any return value would do, because the value of a definition expression is not supposed to be used.</p>
<p>Finally, let's put all the code together and see what it looks like.</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="nv">the-global-environment</span> <span class="p">(</span><span class="nb">cons</span> <span class="p">(</span><span class="k">quote</span> <span class="p">())</span> <span class="p">(</span><span class="k">quote</span> <span class="p">())))</span>

<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">define-variable!</span> <span class="nv">var</span> <span class="nv">val</span> <span class="nv">env</span><span class="p">)</span>
  <span class="p">(</span><span class="nb">set-car!</span> <span class="nv">env</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">var</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">env</span><span class="p">)))</span>
  <span class="p">(</span><span class="nb">set-cdr!</span> <span class="nv">env</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">val</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">env</span><span class="p">))))</span>

<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="k">cond</span> <span class="p">((</span><span class="nb">number?</span> <span class="nv">exp</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">((</span><span class="nb">string?</span> <span class="nv">exp</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">((</span><span class="k">or</span> <span class="p">(</span><span class="nb">eq?</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">true</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq?</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">false</span><span class="p">)</span> <span class="nv">exp</span><span class="p">))</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">((</span><span class="k">and</span> <span class="p">(</span><span class="nb">pair?</span> <span class="nv">exp</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq?</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">exp</span><span class="p">)</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">define</span><span class="p">)))</span>
         <span class="p">(</span><span class="nf">eval-definition</span> <span class="nv">exp</span><span class="p">))</span>
        <span class="p">(</span><span class="k">else</span> <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">))))</span>

<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">eval-definition</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="nf">define-variable!</span> <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">exp</span><span class="p">))</span> <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">exp</span><span class="p">)))</span>
                    <span class="nv">the-global-environment</span><span class="p">)</span>
  <span class="p">(</span><span class="k">quote</span> <span class="nv">ok</span><span class="p">))</span></code></pre></figure>
<p>Wow, that looks pretty fancy! Let's try evaluating a few variable definitions with this interpreter:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nf">open-input-string</span> <span class="s">"(define a 1)"</span><span class="p">)))</span> <span class="c1">; returns "ok"</span>
<span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nf">open-input-string</span> <span class="s">"(define b \"abc\")"</span><span class="p">)))</span> <span class="c1">; returns "ok"</span></code></pre></figure>
<p>That looks good and runs fine, but we cannot yet reference the variables we have defined.
In part 4 I will describe how to evaluate variable references.</p>
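<p>To make the dispatch logic easier to compare with other languages, here is a rough Python sketch of the same evaluator. It is an illustration under assumptions, not a port of the series' code: parsed expressions are modelled as nested Python lists, and the <code class="language-plaintext highlighter-rouge">Symbol</code> class, <code class="language-plaintext highlighter-rouge">evaluate</code> and <code class="language-plaintext highlighter-rouge">eval_definition</code> are names invented for this sketch.</p>

```python
class Symbol(str):
    """A symbol is modelled as a str subclass so it can be told apart from string data."""


def eval_definition(exp, env):
    """Bind the second element (name) to the third element (value) in env."""
    _, name, value = exp
    env[0].insert(0, name)
    env[1].insert(0, value)
    return Symbol("ok")


def evaluate(exp, env):
    # Symbols must be checked before plain strings, because Symbol subclasses str.
    if isinstance(exp, Symbol):
        if exp in ("true", "false"):
            return exp
        raise ValueError("Unknown type")  # variable references are not handled yet
    if isinstance(exp, (int, float, str)):
        return exp  # numbers and strings evaluate to themselves
    if (isinstance(exp, list) and exp
            and isinstance(exp[0], Symbol) and exp[0] == "define"):
        return eval_definition(exp, env)
    raise ValueError("Unknown type")


env = [[], []]  # [names, values], as in the environment described above
status = evaluate([Symbol("define"), Symbol("a"), 1], env)
evaluate([Symbol("define"), Symbol("b"), "abc"], env)
print(status, env)
```

<p>As in the Scheme version, the value is stored as written and is not evaluated first, which is fine as long as it is an atom.</p>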
MapReduce简介
2014-01-25T00:00:00+00:00
http://airekans.github.io/cloud-computing/2014/01/25/mapreduce-intro
<h1 id="什么是mapreduce">What Is MapReduce?</h1>
<p>Ever since Google published its <code class="language-plaintext highlighter-rouge">MapReduce</code> framework, the word <code class="language-plaintext highlighter-rouge">MapReduce</code> has been showing up everywhere.
But what exactly is <code class="language-plaintext highlighter-rouge">MapReduce</code>?</p>
<p>Strictly speaking, <code class="language-plaintext highlighter-rouge">MapReduce</code> is a programming paradigm that evolved from the <code class="language-plaintext highlighter-rouge">map</code> and <code class="language-plaintext highlighter-rouge">reduce</code> functions of functional programming.
Different languages and different companies have their own implementations of <code class="language-plaintext highlighter-rouge">MapReduce</code>,
such as <a href="http://research.google.com/archive/mapreduce.html">Google's MapReduce</a> and
<a href="http://hadoop.apache.org/">Apache Hadoop</a>.
So from this angle, <code class="language-plaintext highlighter-rouge">MapReduce</code> is also a framework.</p>
<h2 id="一个简单例子">A Simple Example</h2>
<p>Let's first look at how <code class="language-plaintext highlighter-rouge">MapReduce</code> is used. Suppose we have one billion URLs and want to find out how many distinct domains there are
and how many times each domain appears. Below I write out the computation with Python's <code class="language-plaintext highlighter-rouge">map</code> and <code class="language-plaintext highlighter-rouge">reduce</code>.
To keep things simple, we assume that no URL starts with <code class="language-plaintext highlighter-rouge">http://</code> and that every URL has the form <code class="language-plaintext highlighter-rouge">weibo.com/airekans</code>.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="kn">from</span> <span class="nn">functools</span> <span class="kn">import</span> <span class="nb">reduce</span>  <span class="c1"># needed on Python 3, where reduce is not a builtin</span>

<span class="n">urls</span> <span class="o">=</span> <span class="p">[</span><span class="n">url1</span><span class="p">,</span> <span class="n">url2</span><span class="p">,</span> <span class="o">...</span> <span class="p">]</span>  <span class="c1"># placeholder for the input URLs</span>

<span class="c1"># We get all domains here</span>
<span class="n">domains</span> <span class="o">=</span> <span class="nb">map</span><span class="p">(</span><span class="k">lambda</span> <span class="n">u</span><span class="p">:</span> <span class="n">u</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'/'</span><span class="p">)[</span><span class="mi">0</span><span class="p">],</span> <span class="n">urls</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">get_domain_stat</span><span class="p">(</span><span class="n">stat</span><span class="p">,</span> <span class="n">domain</span><span class="p">):</span>
    <span class="c1"># Fold one domain into the running {domain: count} dictionary</span>
    <span class="k">if</span> <span class="n">domain</span> <span class="ow">not</span> <span class="ow">in</span> <span class="n">stat</span><span class="p">:</span>
        <span class="n">stat</span><span class="p">[</span><span class="n">domain</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">stat</span><span class="p">[</span><span class="n">domain</span><span class="p">]</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="k">return</span> <span class="n">stat</span>

<span class="c1"># We get the stat of domains here</span>
<span class="n">domain_stat</span> <span class="o">=</span> <span class="nb">reduce</span><span class="p">(</span><span class="n">get_domain_stat</span><span class="p">,</span> <span class="n">domains</span><span class="p">,</span> <span class="p">{})</span></code></pre></figure>
<p>As the example shows, <code class="language-plaintext highlighter-rouge">map</code> extracts the domain from every URL,
and <code class="language-plaintext highlighter-rouge">reduce</code> produces the count for every domain.
The key point is that map is stateless and the state transition in reduce is very simple,
which means <code class="language-plaintext highlighter-rouge">map</code> and <code class="language-plaintext highlighter-rouge">reduce</code> are easy to parallelize (in fact, reduce can also be made stateless by partitioning keys with a hash).
Depending on our needs, we could run <code class="language-plaintext highlighter-rouge">map</code> on 10 threads, or on 10 workers of a distributed system.
<code class="language-plaintext highlighter-rouge">MapReduce</code> exploits exactly this property to bring <code class="language-plaintext highlighter-rouge">map</code> and <code class="language-plaintext highlighter-rouge">reduce</code> into a distributed system.</p>
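<p>To make the parallelization argument concrete, here is a small runnable Python 3 sketch. It splits the input into chunks, counts each chunk independently, and merges the partial counts. The sample URLs and the helper names (<code class="language-plaintext highlighter-rouge">domain_of</code>, <code class="language-plaintext highlighter-rouge">count_chunk</code>, <code class="language-plaintext highlighter-rouge">count_domains</code>) are made up for illustration, and a thread pool only stands in for real distributed workers.</p>

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def domain_of(url):
    """Stateless map step: extract the domain from 'domain/path'."""
    return url.split('/')[0]


def count_chunk(chunk):
    """Count domains in one chunk; chunks do not depend on each other."""
    return Counter(map(domain_of, chunk))


def count_domains(urls, workers=2):
    # Deal the URLs out to the workers round-robin.
    chunks = [urls[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_chunk, chunks))
    # Reduce step: merging partial counts is associative, so order does not matter.
    return reduce(lambda left, right: left + right, partials, Counter())


# Hypothetical sample input in the 'domain/path' format assumed above.
sample_urls = [
    "weibo.com/airekans",
    "example.com/a",
    "weibo.com/b",
    "example.org/c",
    "weibo.com/d",
]
domain_stat = count_domains(sample_urls)
print(dict(domain_stat))
```

<p>Because each chunk is counted without touching shared state, the same structure carries over when the chunks live on different machines.</p>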
<h2 id="利用mapreduce重写">Rewriting It with MapReduce</h2>
<p><code class="language-plaintext highlighter-rouge">MapReduce</code> essentially defines two interfaces: <code class="language-plaintext highlighter-rouge">Map</code> and <code class="language-plaintext highlighter-rouge">Reduce</code>. The user only provides a Map function that turns the input into intermediate results,
and a <code class="language-plaintext highlighter-rouge">Reduce</code> function that turns the intermediate results into the final output. Once the input is specified, the degree of parallelism of <code class="language-plaintext highlighter-rouge">Map</code> and <code class="language-plaintext highlighter-rouge">Reduce</code>
can be set with simple parameters, while <code class="language-plaintext highlighter-rouge">MapReduce</code> takes care of distributed task scheduling and dispatch and provides high reliability.</p>
<p>Here I use an imaginary Python <code class="language-plaintext highlighter-rouge">MapReduce</code> framework to show how to write <code class="language-plaintext highlighter-rouge">Map</code> and <code class="language-plaintext highlighter-rouge">Reduce</code> (I may actually write one later; consider this a placeholder).
Suppose the one billion input URLs are stored in the file <code class="language-plaintext highlighter-rouge">urls.txt</code>, one URL per line. Below are the <code class="language-plaintext highlighter-rouge">MyMap</code> and <code class="language-plaintext highlighter-rouge">MyReduce</code> functions.</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">MyMap</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">output</span><span class="p">):</span>
    <span class="c1"># Take the part before the first '/' as the domain</span>
    <span class="n">domain</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">Value</span><span class="p">()</span><span class="o">.</span><span class="n">split</span><span class="p">(</span><span class="s">'/'</span><span class="p">)[</span><span class="mi">0</span><span class="p">]</span>
    <span class="n">output</span><span class="o">.</span><span class="n">OutputWithKey</span><span class="p">(</span><span class="n">domain</span><span class="p">,</span> <span class="s">''</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">MyReduce</span><span class="p">(</span><span class="nb">input</span><span class="p">,</span> <span class="n">output</span><span class="p">):</span>
    <span class="n">domain_stat</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="n">domain</span> <span class="o">=</span> <span class="nb">input</span><span class="o">.</span><span class="n">Key</span><span class="p">()</span>
    <span class="c1"># Count how many values were emitted for this domain</span>
    <span class="k">for</span> <span class="n">v</span> <span class="ow">in</span> <span class="nb">input</span><span class="o">.</span><span class="n">Value</span><span class="p">():</span>
        <span class="n">domain_stat</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">output</span><span class="o">.</span><span class="n">Output</span><span class="p">(</span><span class="s">'</span><span class="si">%</span><span class="s">s </span><span class="si">%</span><span class="s">d'</span> <span class="o">%</span> <span class="p">(</span><span class="n">domain</span><span class="p">,</span> <span class="n">domain_stat</span><span class="p">))</span></code></pre></figure>
<p>As shown above, each function receives its input as <code class="language-plaintext highlighter-rouge">input</code> and writes its output through <code class="language-plaintext highlighter-rouge">output</code>.
In <code class="language-plaintext highlighter-rouge">MyMap</code>, <code class="language-plaintext highlighter-rouge">input.Value()</code> returns one line of the input file, and <code class="language-plaintext highlighter-rouge">output.OutputWithKey</code> emits a record
with the first argument as the key and the second as the value.
The <code class="language-plaintext highlighter-rouge">input</code> of <code class="language-plaintext highlighter-rouge">MyReduce</code> mirrors this: it carries one key together with the values emitted for it, and the result is written as a single line with <code class="language-plaintext highlighter-rouge">output.Output</code>.</p>
<p>With the code above, we can start the <code class="language-plaintext highlighter-rouge">MapReduce</code> program with the following command,
which sets the number of <code class="language-plaintext highlighter-rouge">Map</code> workers to 100 and the number of <code class="language-plaintext highlighter-rouge">Reduce</code> workers to 50.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ mapreduce --input=/path/to/urls.txt --mapper=MyMap --reducer=MyReduce \
    --mapper-num=100 --reducer-num=50 --output=/path/to/output.txt
</code></pre></div></div>
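<p>Since the framework above is imaginary, the following single-process sketch may help show what such a framework does between the two user functions: run the mapper on every line, group the emitted values by key, then run the reducer once per key. The function names and the simplified signatures (plain arguments instead of <code class="language-plaintext highlighter-rouge">input</code>/<code class="language-plaintext highlighter-rouge">output</code> objects) are assumptions made for this sketch.</p>

```python
from collections import defaultdict


def my_map(line):
    """Emit (domain, '') for one input line, like MyMap above."""
    yield line.split('/')[0], ''


def my_reduce(key, values):
    """Count the values grouped under one domain, like MyReduce above."""
    return '%s %d' % (key, sum(1 for _ in values))


def run_mapreduce(lines, mapper, reducer):
    """Toy driver: map phase, group-by-key phase, then reduce phase."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    # Keys are visited in sorted order so the output is deterministic.
    return [reducer(key, groups[key]) for key in sorted(groups)]


# Hypothetical sample lines standing in for urls.txt.
result = run_mapreduce(
    ["weibo.com/airekans", "example.com/a", "weibo.com/b"],
    my_map,
    my_reduce,
)
print(result)  # ['example.com 1', 'weibo.com 2']
```

<p>A real framework distributes each of the three phases across machines, but the contract seen by the user functions stays the same.</p>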
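<h1 id="mapreduce需要解决什么问题">What Problems Does MapReduce Need to Solve?</h1>
<p>After seeing the example above, you might ask whether something this simple really needs <code class="language-plaintext highlighter-rouge">MapReduce</code>.
If you have ever tried to process large volumes of data, say gigabytes or even terabytes,
you know that a single machine becomes very slow at that scale, with running times measured in days.
Parallelizing the work with <code class="language-plaintext highlighter-rouge">MapReduce</code> cuts the processing time dramatically,
down to hours or even minutes.</p>
<p>So <code class="language-plaintext highlighter-rouge">MapReduce</code> is mainly used for processing large volumes of data where the processing can be described fairly simply
in the <code class="language-plaintext highlighter-rouge">MapReduce</code> paradigm, such as building web page indexes for search, or computing statistics over stored data.</p>
<p>Given that <code class="language-plaintext highlighter-rouge">MapReduce</code> offers such an easy-to-use distributed framework, what challenges does the framework itself face?
In short, there are the following (they are also described in Google's <code class="language-plaintext highlighter-rouge">MapReduce</code> paper; here I simply restate them according to my own understanding):</p>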
<ol>
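<li>Overall architecture: how are <code class="language-plaintext highlighter-rouge">Map</code> and <code class="language-plaintext highlighter-rouge">Reduce</code> processed in a distributed way, and how are tasks dispatched? A common answer is the classic
single-master, multiple-slave structure: one Master schedules and dispatches tasks and also maintains some state.
The advantage of this design is that state is easy to maintain, since a single Master avoids the state inconsistencies of a multi-master setup.</li>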
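<li>Data flow: in the simplest model, data goes from local storage to the <code class="language-plaintext highlighter-rouge">Mapper</code> and then on to the <code class="language-plaintext highlighter-rouge">Reducer</code>.
How should the intermediate data move to be efficient? Is there a better way, such as NFS or a similar solution,
for example Google's GFS or Hadoop's HDFS?</li>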
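<li>High reliability: reliability is something every distributed system has to consider. In a master/slave system like this one,
it has two sides: reliability of the Master and reliability of the Slaves. In Google's <code class="language-plaintext highlighter-rouge">MapReduce</code> implementation,
Master reliability relies on the cluster management system restarting it automatically, together with a checkpoint mechanism.
For Slaves, the Master monitors each Slave's health
and adjusts task scheduling accordingly. Google's <code class="language-plaintext highlighter-rouge">MapReduce</code> defines handling for the different kinds of Master and Slave failures.</li>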
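<li>Task scheduling: the Slaves perform the <code class="language-plaintext highlighter-rouge">Map</code> and <code class="language-plaintext highlighter-rouge">Reduce</code> operations, and these tasks are all dispatched by the Master,
so how the Master schedules tasks is another important question. The key concerns in scheduling are load balancing,
input locality, and handling Slave failures. Locality is the most important of these. It means that
when assigning a task, the scheduler gives priority to placing the task on the machine that already holds its input. When both are on the same machine,
latency is much lower than when they are on different machines. The same idea applies when writing local programs.</li>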
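<li>Straggler handling: near the end of a job there are often a few tasks still running on slaves that take a very long time,
which stretches the running time of the whole <code class="language-plaintext highlighter-rouge">MapReduce</code> job. Google's paper calls these tasks stragglers.
They can be handled by dispatching each straggler to several workers; whichever copy finishes first marks that task as done, so the whole
<code class="language-plaintext highlighter-rouge">MapReduce</code> job can complete. The reasoning is that a straggler is usually slow because the slave running it has a problem with its network, disk, memory and so on.
Running the task on several slaves works around such problems.</li>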
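<li>Ordering of values: in the URL example above, ordering does not seem to matter much to us. But what if we
want to do a distributed sort? It might seem that the <code class="language-plaintext highlighter-rouge">MapReduce</code> model cannot handle that. Google's paper
gives the answer: the input of each <code class="language-plaintext highlighter-rouge">Reducer</code> is sorted by default. This makes things much more convenient when we want the results sorted in some way.
In Hadoop this process is called <em>Shuffle</em>. Shuffle is the process of fetching, from the different <code class="language-plaintext highlighter-rouge">Mapper</code>s,
the results destined for a given <code class="language-plaintext highlighter-rouge">Reducer</code> and sorting them along the way. I will cover the Shuffle process in a later post.</li>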
<li>Handling bad records: when the code is not written carefully, a <code class="language-plaintext highlighter-rouge">Mapper</code> or <code class="language-plaintext highlighter-rouge">Reducer</code> may crash on particular inputs.
Letting such records stop the whole <code class="language-plaintext highlighter-rouge">MapReduce</code> job is usually unreasonable, so skipping these bad records
is often the better choice.</li>
<li>Real-time status monitoring: a <code class="language-plaintext highlighter-rouge">MapReduce</code> job usually runs for tens of minutes or several hours, so being able to query
the state of the whole job through some interface is very convenient. Providing an HTTP server or a similar Web API
for users to query is usually enough.</li>
</ol>
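<p>The locality point from the scheduling item can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the scheduler from Google's paper or from Hadoop: the function <code class="language-plaintext highlighter-rouge">assign_tasks</code>, the task ids and the host names are all hypothetical, and it ignores load balancing and failures.</p>

```python
def assign_tasks(tasks, idle_workers):
    """Assign tasks to idle workers, preferring a worker that holds the input.

    tasks: {task_id: set of hosts that store the task's input}
    idle_workers: list of host names that can accept one task each
    Returns {task_id: host}; tasks stay unassigned if no worker is left.
    """
    free = list(idle_workers)
    assignment = {}
    pending = []
    # First pass: place every task that can run next to its input.
    for task_id, input_hosts in tasks.items():
        local = next((host for host in free if host in input_hosts), None)
        if local is not None:
            assignment[task_id] = local
            free.remove(local)
        else:
            pending.append(task_id)
    # Second pass: remaining tasks read their input over the network.
    for task_id in pending:
        if not free:
            break
        assignment[task_id] = free.pop(0)
    return assignment


tasks = {"t1": {"h1", "h2"}, "t2": {"h3"}, "t3": {"h9"}}
assignment = assign_tasks(tasks, ["h3", "h1", "h4"])
print(assignment)  # {'t1': 'h1', 't2': 'h3', 't3': 'h4'}
```

<p>Here <code class="language-plaintext highlighter-rouge">t1</code> and <code class="language-plaintext highlighter-rouge">t2</code> run where their input is stored, and only <code class="language-plaintext highlighter-rouge">t3</code> has to fetch its input remotely.</p>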
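<p>All of the problems above are ones that a highly reliable <code class="language-plaintext highlighter-rouge">MapReduce</code> system has to face and solve. In the open-source Hadoop
we can see corresponding solutions. In big data processing, besides the computation problem that <code class="language-plaintext highlighter-rouge">MapReduce</code> addresses,
there is also the question of how to store the data, which is what Google's other two key systems, <code class="language-plaintext highlighter-rouge">GFS</code> and <code class="language-plaintext highlighter-rouge">Bigtable</code>, are meant to solve.
Beyond that, how to manage the whole cluster and how to allocate machine resources also need answers. Here Google has <code class="language-plaintext highlighter-rouge">Borg</code> (not open source),
Hadoop has <code class="language-plaintext highlighter-rouge">Yarn</code>, and Twitter uses <code class="language-plaintext highlighter-rouge">Mesos</code>. I will go deeper into these areas in later posts.</p>
<p>Finally, here is the <code class="language-plaintext highlighter-rouge">MapReduce</code> architecture diagram from Google's paper, to give an overall picture of the structure of <code class="language-plaintext highlighter-rouge">MapReduce</code>.
(The image itself is taken from CSDN.)</p>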
<p><img src="http://img.my.csdn.net/uploads/201204/26/1335443612_8438.jpg" alt="MapReduce Architecture" /></p>
OpenStack Tempest整体剖析
2013-08-12T00:00:00+00:00
http://airekans.github.io/cloud-computing/2013/08/12/openstack-tempest
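<h1 id="tempest是什么项目">What Is the Tempest Project?</h1>
<p>Tempest is a test suite for OpenStack. It is mainly used for smoke tests and stress tests against the OpenStack API, and it also includes tests of the CLI clients and scenario tests.</p>
<p>Tempest is driven by nose. Its tests are mostly written in pyunit style, and it also uses several testing libraries such as testtools and testresources.</p>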
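<h1 id="如何使用tempest">How to Use Tempest</h1>
<p>To test a deployed OpenStack environment with tempest, you first need a configuration file that sets the various OpenStack parameters for tempest; the etc/ directory contains a tempest.conf.sample for reference.
With the configuration file in place, you can run all tests directly with the nosetests tempest command, or run a single test by specifying a test class inside the tempest package.</p>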
<h2 id="tempest的结构">The Structure of Tempest</h2>
<p>The directory structure of Tempest looks roughly like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>tempest
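├── api # API test suites
├── cli # tests for the OpenStack command-line tools
├── common # shared utility classes and functions
├── scenario # tests of common OpenStack scenarios, including booting a VM, attaching a volume, and network configuration
├── services # tempest's own OpenStack API clients, written independently so that bugs are not hidden inside the official clients
├── stress # stress tests, which use multiprocessing to launch several processes that send requests to OpenStack concurrently
├── thirdparty # tests for the EC2-compatible API
├── whitebox # white-box tests, which manipulate the DB, then issue requests, then compare the results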
</code></pre></div></div>
<p>Here tempest is the top-level directory, and the files in each subdirectory mainly serve the purposes described above.</p>
<p>The test classes in <code class="language-plaintext highlighter-rouge">tempest.api</code>, <code class="language-plaintext highlighter-rouge">tempest.scenario</code>, <code class="language-plaintext highlighter-rouge">tempest.thirdparty</code> and <code class="language-plaintext highlighter-rouge">tempest.whitebox</code> are all based on <code class="language-plaintext highlighter-rouge">tempest.test.BaseTestCase</code>.
<code class="language-plaintext highlighter-rouge">BaseTestCase</code> declares the <code class="language-plaintext highlighter-rouge">config</code> attribute, which is the class that reads the configuration file, and a <code class="language-plaintext highlighter-rouge">setUpClass</code> method that is called when the class is initialized.
Its subclass <code class="language-plaintext highlighter-rouge">tempest.test.TestCase</code> declares many utility functions for its own subclasses, including <code class="language-plaintext highlighter-rouge">setUpClass</code> (which initializes the clients of the various OpenStack services and sets them as class attributes), the resource management functions (<code class="language-plaintext highlighter-rouge">get/set/remove_resource</code>) and <code class="language-plaintext highlighter-rouge">status_timeout</code> (which waits for a resource to reach an expected state).</p>
<p>With these tools, tests are fairly convenient to write.</p>
<p>The following sections introduce the main packages in tempest.</p>
<h3 id="tempestapi">tempest.api</h3>
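<p>This package contains tests for almost all of OpenStack's native APIs. Each service has its own separate package, for example <code class="language-plaintext highlighter-rouge">tempest.api.compute</code>.
The rest of this section uses <code class="language-plaintext highlighter-rouge">tempest.api.compute</code> as the example.</p>
<p>Every test has two implementations, one testing the JSON format and one testing the XML format. The format is selected through the class's <code class="language-plaintext highlighter-rouge">_interface</code> attribute, and the base class <code class="language-plaintext highlighter-rouge">BaseComputeTest</code> uses this attribute to construct the matching API implementation. At present, however, most of the XML tests are empty implementations, so the main tests are the JSON ones.</p>
<p>Take <code class="language-plaintext highlighter-rouge">tempest.api.compute.flavors.test_flavors</code> as an example. <code class="language-plaintext highlighter-rouge">FlavorTestJson</code> inherits from <code class="language-plaintext highlighter-rouge">BaseComputeTest</code>, so when the class is initialized, tempest's own API client is assigned to a class attribute. In the individual test functions, <code class="language-plaintext highlighter-rouge">FlavorTestJson</code> then uses this client's functions to query OpenStack and verifies the results of the queries.
For example:</p>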
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="o">@</span><span class="n">attr</span><span class="p">(</span><span class="nb">type</span><span class="o">=</span><span class="s">'smoke'</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">test_list_flavors_with_detail</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="c1"># Detailed list of all flavors should contain the expected flavor</span>
    <span class="n">resp</span><span class="p">,</span> <span class="n">flavors</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">list_flavors_with_detail</span><span class="p">()</span>
    <span class="n">resp</span><span class="p">,</span> <span class="n">flavor</span> <span class="o">=</span> <span class="bp">self</span><span class="o">.</span><span class="n">client</span><span class="o">.</span><span class="n">get_flavor_details</span><span class="p">(</span><span class="bp">self</span><span class="o">.</span><span class="n">flavor_ref</span><span class="p">)</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">assertTrue</span><span class="p">(</span><span class="n">flavor</span> <span class="ow">in</span> <span class="n">flavors</span><span class="p">)</span></code></pre></figure>
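<p>The function above uses the flavor client to fetch the list of all flavors, then fetches one specific flavor, and verifies that this flavor really is in the full list.
Note that the function is decorated with <code class="language-plaintext highlighter-rouge">attr</code>. This decorator is <code class="language-plaintext highlighter-rouge">tempest.test.attr</code>; it builds on similar features in nose and testtools to attach tags to tests, so that a test run can be filtered by tag, for example to run only the gate tests or only the smoke tests.</p>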
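<p>The tagging idea can be illustrated with a minimal, self-contained sketch. This is not the actual <code class="language-plaintext highlighter-rouge">tempest.test.attr</code> implementation; the decorator, the test class and the <code class="language-plaintext highlighter-rouge">select_tests</code> helper below are hypothetical and use only the standard <code class="language-plaintext highlighter-rouge">unittest</code> module.</p>

```python
import unittest


def attr(**tags):
    """Hypothetical tagging decorator: store each tag as a function attribute."""
    def decorate(func):
        for name, value in tags.items():
            setattr(func, name, value)
        return func
    return decorate


class FlavorsTest(unittest.TestCase):
    @attr(type='smoke')
    def test_list_flavors(self):
        self.assertTrue(True)

    @attr(type='gate')
    def test_get_flavor(self):
        self.assertTrue(True)


def select_tests(case, **wanted):
    """Return the names of test methods whose attributes match every wanted tag."""
    names = unittest.defaultTestLoader.getTestCaseNames(case)
    return [name for name in names
            if all(getattr(getattr(case, name), key, None) == value
                   for key, value in wanted.items())]


smoke_tests = select_tests(FlavorsTest, type='smoke')
print(smoke_tests)  # ['test_list_flavors']
```

<p>A test runner can then build its suite from the selected names only, which is the effect of filtering by tag.</p>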
<p>The client used above is tempest's own RESTful API client. These clients live in <code class="language-plaintext highlighter-rouge">tempest.services</code> and are simple RESTful clients built on <code class="language-plaintext highlighter-rouge">httplib2</code>.</p>
<h3 id="tempestscenario">tempest.scenario</h3>
<p><code class="language-plaintext highlighter-rouge">scenario</code> contains several simple but complete OpenStack usage scenarios, used for integration testing of OpenStack. It is also a good entry point for beginners who want a first picture of how OpenStack as a whole is used.</p>
<p>Every scenario test class inherits from <code class="language-plaintext highlighter-rouge">tempest.scenario.manager.OfficialClientTest</code>, and <code class="language-plaintext highlighter-rouge">OfficialClientTest</code> in turn inherits from <code class="language-plaintext highlighter-rouge">tempest.test.TestCase</code>. What makes <code class="language-plaintext highlighter-rouge">OfficialClientTest</code> special is that all of its API clients are the official clients rather than tempest's own. It also declares <code class="language-plaintext highlighter-rouge">tearDownClass</code>, which deletes every resource that was allocated once the class is torn down, so that each test suite is independent.
After allocating a resource, each test suite registers it with the class through the <code class="language-plaintext highlighter-rouge">TestCase</code> interface, which lets <code class="language-plaintext highlighter-rouge">OfficialClientTest</code> release the registered resources automatically.</p>
<p>A typical scenario test creates a VM, attaches a volume, and then SSHes into the VM to check whether the volume was attached successfully:</p>
<figure class="highlight"><pre><code class="language-python" data-lang="python"><span class="k">def</span> <span class="nf">test_minimum_basic_scenario</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">glance_image_create</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_keypair_add</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_boot</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_list</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_show</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">cinder_create</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">cinder_list</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">cinder_show</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_volume_attach</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">cinder_show</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_reboot</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_floating_ip_create</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_floating_ip_add</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_security_group_rule_create</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">ssh_to_server</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">check_partitions</span><span class="p">()</span>
    <span class="bp">self</span><span class="o">.</span><span class="n">nova_volume_detach</span><span class="p">()</span></code></pre></figure>
<h3 id="tempeststress">tempest.stress</h3>
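<p>stress is in fact a simple stress-testing framework for OpenStack. The package declares the driver function, the <code class="language-plaintext highlighter-rouge">fixture</code> class, and a set of sample configurations.
A stress test is run with this command: <code class="language-plaintext highlighter-rouge">run_stress.py etc/sample-test.json -d 30</code></p>
<p>The main driver function is <code class="language-plaintext highlighter-rouge">tempest.stress.driver.stress_openstack</code>.
<code class="language-plaintext highlighter-rouge">stress_openstack</code> reads from the configuration file which actions to run and how many processes to use. It then starts that many processes and calls the corresponding <code class="language-plaintext highlighter-rouge">execute</code> function in each of them.</p>
<p>Finally, after the specified duration, the function collects the results and presents them to the user.
The concrete actions are defined in <code class="language-plaintext highlighter-rouge">tempest.stress.actions</code>, which contains the different tests. Each test inherits from <code class="language-plaintext highlighter-rouge">tempest.stress.stressaction.StressAction</code>, the class that declares the <code class="language-plaintext highlighter-rouge">execute</code> function. So to add a new stress-test action, declare a new class that inherits from <code class="language-plaintext highlighter-rouge">StressAction</code> and override its run function.</p>
<p>Apart from the packages described above, tempest's remaining packages, such as cli and whitebox, are relatively simple, so I will not go into them here.</p>
<p>Overall, tempest is a package containing all kinds of tests. Its structure is not very complicated, and it is a good way to get to know the various OpenStack components.</p>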
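<p>The split between <code class="language-plaintext highlighter-rouge">execute</code> (the driver loop) and <code class="language-plaintext highlighter-rouge">run</code> (one iteration of the action) can be sketched as below. This is a simplified, hypothetical version: the real tempest class runs in a separate process and its signature differs; <code class="language-plaintext highlighter-rouge">max_runs</code>, the <code class="language-plaintext highlighter-rouge">stats</code> dictionary and <code class="language-plaintext highlighter-rouge">FlakyAction</code> are assumptions made for the example.</p>

```python
class StressAction:
    """Hypothetical, simplified base class: execute() drives, run() does the work."""

    def __init__(self, max_runs):
        self.max_runs = max_runs

    def run(self):
        # Subclasses override this with one iteration of the stress action.
        raise NotImplementedError

    def execute(self, stats):
        # Call run() repeatedly and count how many iterations failed.
        for _ in range(self.max_runs):
            stats["runs"] += 1
            try:
                self.run()
            except Exception:
                stats["fails"] += 1


class FlakyAction(StressAction):
    """Example action that fails on every third call."""

    def __init__(self, max_runs):
        super().__init__(max_runs)
        self.calls = 0

    def run(self):
        self.calls += 1
        if self.calls % 3 == 0:
            raise RuntimeError("simulated failure")


stats = {"runs": 0, "fails": 0}
FlakyAction(max_runs=7).execute(stats)
print(stats)
```

<p>Keeping the loop and the bookkeeping in the base class means a new action only has to describe a single iteration.</p>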
Installing Devstack On Ubuntu
2013-07-25T00:00:00+00:00
http://airekans.github.io/cloud-computing/2013/07/25/devstack-installation
<h1 id="what-is-devstack">What Is Devstack?</h1>
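<p><a href="http://www.openstack.org/">Devstack</a> is a developer edition of the open-source cloud computing IaaS project <a href="http://www.openstack.org/">OpenStack</a>.
Devstack can be deployed on a single PC, which makes it easy to deploy and test during development.
That is very convenient both for developers and for people who want to get started. This article describes how to install Devstack on Ubuntu x86-64 server.
There are already some articles like this online, but I ran into a few problems during installation
that other people have hit as well, and I have not seen anyone explain the fix in detail, so I am recording it here.</p>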
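<h1 id="安装环境">Installation Environment</h1>
<p>My environment is Ubuntu desktop 12.04 with VirtualBox installed, in which I created a virtual machine.
Devstack is installed entirely inside that virtual machine, which is configured as follows:</p>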
<ul>
<li>64-bit CPU</li>
<li>2 GB of RAM</li>
<li>40 GB disk</li>
<li>Ubuntu 12.04.2 x64 Server</li>
<li>Networking through a Bridged Adapter</li>
</ul>
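<p>The OS installation used the defaults throughout; the disk uses plain primary partitions, without LVM. Devstack is installed as a non-root user with sudo privileges, and the network connection should be reasonably fast.
Once the system is installed, Devstack comes next.</p>
<h1 id="安装devstack">Installing Devstack</h1>
<p>According to Devstack's <a href="http://devstack.org/guides/single-vm.html">official documentation</a>, it comes down to the following commands:</p>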
<figure class="highlight"><pre><code class="language-bash" data-lang="bash">sudo apt-get <span class="nb">install</span> <span class="nt">-qqy</span> git
git clone https://github.com/openstack-dev/devstack.git
<span class="nb">cd </span>devstack
./stack.sh</code></pre></figure>
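<p>As shown above, once <code class="language-plaintext highlighter-rouge">stack.sh</code> starts running, it prompts for the passwords of several components.
I entered the same password for all of them; assume it is <code class="language-plaintext highlighter-rouge">123456</code>.</p>
<p>If the network is good, after installing some default packages the script reaches the keystone setup.
At that point, <code class="language-plaintext highlighter-rouge">stack.sh</code> reported the following error:</p>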
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>++ keystone service-create --name keystone --type identity --description 'Keystone Identity Service'
Unable to communicate with identity service: {"error": {"message": "An unexpected error prevented the server from fulfilling your request. (OperationalError) (1045, \"Access denied for user 'root'@'localhost' (using password: YES)\") None None", "code": 500, "title": "Internal Server Error"}}. (HTTP 500)
</code></pre></div></div>
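<p>The error message suggests a permission problem. <code class="language-plaintext highlighter-rouge">keystone</code> is the identity service in OpenStack, and it stores its data in a database.
In my environment MySQL is the DB backend.
So I tried logging in to MySQL as <code class="language-plaintext highlighter-rouge">root</code>, and indeed I could not log in; the error was the same as the one above. Perhaps my password was never initialized properly? In any case, <code class="language-plaintext highlighter-rouge">keystone</code> presumably could not log in to the database, which caused the error.
So the only option was to reset the MySQL root password.
I found the procedure for resetting it <a href="http://dev.mysql.com/doc/refman/5.0/en/resetting-permissions.html">here</a> and followed it to reset the root password, again to the <code class="language-plaintext highlighter-rouge">123456</code> chosen earlier.</p>
<p>After the reset, run <code class="language-plaintext highlighter-rouge">stack.sh</code> again and wait ten minutes or so; you will see the following output:</p>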
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Horizon is now available at http://192.168.1.108/
Keystone is serving at http://192.168.1.108:5000/v2.0/
Examples on using novaclient command line is in exercise.sh
The default users are: admin and demo
The password: 123456
This is your host ip: 192.168.1.108
stack.sh completed in 760 seconds.
</code></pre></div></div>
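<p>If you see this output, Devstack is fully configured and already running. Open <code class="language-plaintext highlighter-rouge">http://192.168.1.108</code> in a browser and you will see the OpenStack management interface. Now you can start tinkering!</p>
<h2 id="重新启动openstack">Restarting OpenStack</h2>
<p>Once Devstack has started successfully one time, later starts do not need to download the required packages again.
Just add the following line to <code class="language-plaintext highlighter-rouge">localrc</code>:</p>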
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>OFFLINE=True
</code></pre></div></div>
<p>Then run <code class="language-plaintext highlighter-rouge">stack.sh</code> again, and Devstack can start with no network connection at all.</p>
<p>Finally, I hope you have fun with it.</p>
Python OOP
2013-05-13T00:00:00+00:00
http://airekans.github.io/python/2013/05/13/python-oop
<iframe src="http://slid.es/airekans/python-oop/embed" width="100%" height="500" scrolling="no" frameborder="0" webkitallowfullscreen="1" mozallowfullscreen="1" allowfullscreen="1"> </iframe>
Git Clone From Github Failed
2013-02-25T00:00:00+00:00
http://airekans.github.io/git/2013/02/25/git-clone-from-github-failed
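<h1 id="缘由">Background</h1>
<p>Over the past few days, when I tried to pull code from github on my computer over the company network, I hit the error below. I was using an https URL.</p>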
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ git pull origin
error: SSL certificate problem, verify that the CA cert is OK. Details:
error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed while accessing https://github.com/airekans/Reshaper.git/info/refs
fatal: HTTP request failed
</code></pre></div></div>
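<p>This left me unable to sync the code on github, so I searched online for the cause and a solution, and I am recording them here.</p>
<h1 id="原因">Cause</h1>
<p>I was using HTTPS, and HTTPS involves a certificate that has to be verified. The protocol requires the server to send back a certificate,
and the client is responsible for checking that the certificate is valid, which guards against impersonated sites. This verification usually happens inside a browser,
and a browser generally checks the certificate against trusted third-party certificate authorities.</p>
<p>git, however, uses <code class="language-plaintext highlighter-rouge">curl</code>, a command-line HTTP client. When <code class="language-plaintext highlighter-rouge">curl</code> accesses an https site,
it verifies the certificate against the certificates in <code class="language-plaintext highlighter-rouge">/usr/ssl/certs</code> (some command-line clients keep their certificates in <code class="language-plaintext highlighter-rouge">/etc/ssl/certs</code>).
So the reason a browser such as chrome or firefox can reach github normally while git on the command line fails
is that the browser verifies with its own set of trusted authorities, while git verifies with the local certificate directory. My local directory had no certificate that could vouch for github,
so verification failed, which produced the error above.</p>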
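<p>As an aside, the idea of a "local certificate store" can be inspected from Python. The sketch below asks Python's own <code class="language-plaintext highlighter-rouge">ssl</code> module where it looks for trusted CA certificates. This is only an analogy: git and curl use their own configuration, so the paths printed here are not necessarily the ones git consults.</p>

```python
import ssl

# Where does Python's ssl module (not git or curl) look for trusted CA certificates?
paths = ssl.get_default_verify_paths()
print("CA file:", paths.cafile)
print("CA directory:", paths.capath)

# A default client context verifies the server certificate and the host name,
# which is the check that GIT_SSL_NO_VERIFY switches off on the git side.
context = ssl.create_default_context()
print("certificate required:", context.verify_mode == ssl.CERT_REQUIRED)
print("host name checked:", context.check_hostname)
```

<p>If the printed locations are empty or missing on a machine, HTTPS clients built on that store fail in much the same way as the git error above.</p>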
<h1 id="解决方案">Solution</h1>
<p>Following an answer on <a href="http://stackoverflow.com/a/4454754">SO</a>, I worked around the problem for now with the command line below:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ env GIT_SSL_NO_VERIFY=true git clone https://github...
</code></pre></div></div>
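<p>The command above assumes that you can vouch for the site you are reaching (in effect, you are doing the verification by hand).
It is not the best solution, though, because it makes it easy to be fooled by an impersonated site (for example through DNS poisoning).</p>
<p>The best approach is for the command line to rely on trusted third-party authorities just as a browser does.
According to this <a href="http://stackoverflow.com/a/13325898">SO answer</a>, if you are on ubuntu or debian,
install the <code class="language-plaintext highlighter-rouge">ca-certificates</code> package.</p>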
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ sudo apt-get install ca-certificates
</code></pre></div></div>
Scheme Interpreter In Scheme(2)
2012-11-26T00:00:00+00:00
http://airekans.github.io/scheme/2012/11/26/scheme-in-scheme-2
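<p>In the earlier <a href="scheme/2012/11/18/scheme-in-scheme-1/">introduction</a>,
I defined the Scheme language we want to implement and wrote some example programs in it.
In this article I will describe roughly what the interpreter looks like.</p>
<h1 id="前提lexer">Prerequisite: the Lexer</h1>
<p>Introductions to compilers usually present the Lexer as an important part of the front end.
A Lexer is a lexical analyzer: it turns the input character stream into the stream of tokens defined by the grammar.
It is usually implemented as a state machine, but to keep our interpreter simple we use Scheme's
built-in <code class="language-plaintext highlighter-rouge">read</code> function, which acts as Scheme's Lexer. Each call reads one S-expression from the input stream.</p>
<p>For example, look at the code below:</p>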
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nf">open-input-string</span> <span class="s">"(define a 1)"</span><span class="p">))</span> <span class="c1">; read from a string port</span>
<span class="c1">;;; The expression above returns (define a 1),</span>
<span class="c1">;;; which can also be built with the expression below</span>
<span class="p">(</span><span class="nb">list</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">define</span><span class="p">)</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">a</span><span class="p">)</span> <span class="mi">1</span><span class="p">)</span></code></pre></figure>
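<p>The code above also shows an important property of Lisp: code is data. In Lisp,
Lisp code can easily be treated as Lisp data, with almost no special handling.
This gives Lisp a big advantage in extensibility over other languages.</p>
<p>From here on, our interpreter uses <code class="language-plaintext highlighter-rouge">read</code> to convert its input. With <code class="language-plaintext highlighter-rouge">read</code>,
we can assume that incoming Lisp code can be processed with operations on atoms and lists,
rather than with character operations.</p>
<p>What does that mean? The code below probably explains it best:</p>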
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="nv">l</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nf">open-input-string</span> <span class="s">"(define a 1)"</span><span class="p">)))</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">eq?</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">define</span><span class="p">)</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">l</span><span class="p">))</span>
    <span class="p">(</span><span class="nb">display</span> <span class="s">"It's definition!"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">display</span> <span class="s">"It's not definition!"</span><span class="p">))</span>
<span class="c1">; the third element of l is the value being defined</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">number?</span> <span class="p">(</span><span class="nb">car</span> <span class="p">(</span><span class="nb">cdr</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">l</span><span class="p">))))</span>
    <span class="p">(</span><span class="nb">display</span> <span class="s">"It's number!"</span><span class="p">)</span>
    <span class="p">(</span><span class="nb">display</span> <span class="s">"It's not number!"</span><span class="p">))</span></code></pre></figure>
<p>In the code above, I use <code class="language-plaintext highlighter-rouge">car</code> to take the first symbol of the expression read in by <code class="language-plaintext highlighter-rouge">read</code>,
and then compare it using <code class="language-plaintext highlighter-rouge">eq?</code>. <code class="language-plaintext highlighter-rouge">eq?</code> is a function that tests whether two symbols are the same,
and <code class="language-plaintext highlighter-rouge">number?</code> is a function that tests whether its argument is of type Number.
Besides these two functions there is also <code class="language-plaintext highlighter-rouge">string?</code>,
which tests whether its argument is of type String.</p>
<p>After reading the code above, you probably have a rough idea already.</p>
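<p>For readers who do not have a Scheme at hand, the job that <code class="language-plaintext highlighter-rouge">read</code> does can be approximated in a few lines of Python. This is a rough sketch, not how Scheme implements <code class="language-plaintext highlighter-rouge">read</code>: symbols are represented as Python strings, lists as Python lists, and string literals and error recovery are deliberately left out. The names <code class="language-plaintext highlighter-rouge">tokenize</code> and <code class="language-plaintext highlighter-rouge">parse_atom</code> are made up for the example.</p>

```python
def tokenize(text):
    """Split source text into parentheses and atoms (no string literals)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_atom(token):
    """Turn a token into a number if possible, otherwise keep it as a symbol."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            pass
    return token


def read(tokens):
    """Consume one S-expression from the front of the token list."""
    token = tokens.pop(0)
    if token == "(":
        items = []
        while tokens[0] != ")":
            items.append(read(tokens))
        tokens.pop(0)  # drop the closing parenthesis
        return items
    if token == ")":
        raise SyntaxError("unexpected )")
    return parse_atom(token)


expr = read(tokenize("(define a 1)"))
print(expr)
# The same checks as the Scheme example: first element and third element.
print(expr[0] == "define")
print(isinstance(expr[2], int))
```

<p>After this step the rest of an interpreter works on nested lists and atoms and never touches individual characters again.</p>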
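<h1 id="解析器的基本结构">Basic Structure of the Interpreter</h1>
<p>With that background, we now need to think about how to write an interpreter that implements the language described earlier.</p>
<p>Since we are writing an interpreter, and an interpreter is really a process that evaluates expressions,
I name the interpreter function eval.</p>
<p>Suppose for now we only need to interpret the most basic atoms, such as <code class="language-plaintext highlighter-rouge">1</code>, <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">define</code>.
How should <code class="language-plaintext highlighter-rouge">eval</code> be written? First, Scheme has a function <code class="language-plaintext highlighter-rouge">pair?</code>
that tests whether an expression is a pair, that is, a non-empty list.</p>
<p>For example:</p>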
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="nb">pair?</span> <span class="mi">1</span><span class="p">)</span> <span class="c1">; false</span>
<span class="p">(</span><span class="nb">pair?</span> <span class="p">(</span><span class="nb">cons</span> <span class="mi">1</span> <span class="mi">2</span><span class="p">))</span> <span class="c1">; true</span></code></pre></figure>
<p>With <code class="language-plaintext highlighter-rouge">pair?</code>, we can easily tell whether an S-expression is an atom.
Below is an interpreter that handles only atoms (for now, only numbers):</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">not</span> <span class="p">(</span><span class="nb">pair?</span> <span class="nv">exp</span><span class="p">))</span>
      <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">number?</span> <span class="nv">exp</span><span class="p">)</span>
          <span class="nv">exp</span>
          <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">))</span>
      <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">)))</span>

<span class="p">(</span><span class="nb">eval</span> <span class="mi">1</span><span class="p">)</span> <span class="c1">; returns 1</span>
<span class="p">(</span><span class="nb">eval</span> <span class="mi">10</span><span class="p">)</span> <span class="c1">; returns 10</span>
<span class="p">(</span><span class="nb">eval</span> <span class="s">"hello"</span><span class="p">)</span> <span class="c1">; displays "Unknown type"</span></code></pre></figure>
<p>Looking at the code above, the definition of <code class="language-plaintext highlighter-rouge">eval</code> can actually be simplified to a single <code class="language-plaintext highlighter-rouge">number?</code> test,
because <code class="language-plaintext highlighter-rouge">number?</code> is itself a type check. As follows:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">number?</span> <span class="nv">exp</span><span class="p">)</span>
      <span class="nv">exp</span>
      <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">)))</span></code></pre></figure>
<p>What if we now want to handle string atoms as well? Remember <code class="language-plaintext highlighter-rouge">string?</code>,
which tests whether its argument is a String? Right, <code class="language-plaintext highlighter-rouge">string?</code> is all we need, as follows:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">number?</span> <span class="nv">exp</span><span class="p">)</span>
      <span class="nv">exp</span>
      <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">string?</span> <span class="nv">exp</span><span class="p">)</span>
          <span class="nv">exp</span>
          <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">))))</span>

<span class="p">(</span><span class="nb">eval</span> <span class="mi">11</span><span class="p">)</span> <span class="c1">; returns 11</span>
<span class="p">(</span><span class="nb">eval</span> <span class="s">"hello"</span><span class="p">)</span> <span class="c1">; returns "hello"</span>
<span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">a</span><span class="p">))</span> <span class="c1">; displays "Unknown type"</span></code></pre></figure>
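<h1 id="cond表达式">The cond Expression</h1>
<p>The eval above uses two ifs, and the more ifs there are, the deeper the nesting.
Imagine how deep the nesting gets as we handle more and more expression types...
In C, you can use <code class="language-plaintext highlighter-rouge">switch</code> or chained <code class="language-plaintext highlighter-rouge">if</code>s to avoid deep nesting, for example:</p>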
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">i</span><span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">i</span><span class="o">--</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
    <span class="n">i</span> <span class="o">+=</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Scheme actually has a <code class="language-plaintext highlighter-rouge">cond</code> expression, which plays a role similar to the chained <code class="language-plaintext highlighter-rouge">if</code> in C above.</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">cond</span> <span class="p">((</span><span class="nb">=</span> <span class="nv">a</span> <span class="mi">1</span><span class="p">)</span> <span class="nv">a</span><span class="p">)</span>
      <span class="p">((</span><span class="nb">></span> <span class="nv">a</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="nb">+</span> <span class="nv">a</span> <span class="mi">1</span><span class="p">))</span>
      <span class="p">(</span><span class="k">else</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">a</span> <span class="mi">1</span><span class="p">)))</span></code></pre></figure>
<p>The expression above should not be hard to follow. Written out in C, it becomes clear:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><span class="k">if</span> <span class="p">(</span><span class="n">a</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span> <span class="nf">if</span> <span class="p">(</span><span class="n">a</span> <span class="o">></span> <span class="mi">1</span><span class="p">)</span>
<span class="p">{</span>
    <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">else</span>
<span class="p">{</span>
    <span class="n">a</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="p">}</span></code></pre></figure>
<p>Now that we have <code class="language-plaintext highlighter-rouge">cond</code>, let us use it to "refactor" our interpreter.</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="k">cond</span> <span class="p">((</span><span class="nb">number?</span> <span class="nv">exp</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">((</span><span class="nb">string?</span> <span class="nv">exp</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">(</span><span class="k">else</span> <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">))))</span></code></pre></figure>
<p>Our interpreter can now handle Number and String, so which atoms are left?
We have not handled Boolean yet, so let us work out how to do that. The first thing to note is that
in mit-scheme the boolean values are <code class="language-plaintext highlighter-rouge">#t</code> and <code class="language-plaintext highlighter-rouge">#f</code>, whereas in the interpreter we are implementing
the boolean values are <code class="language-plaintext highlighter-rouge">true</code> and <code class="language-plaintext highlighter-rouge">false</code>. The relationship is like implementing Scheme in C:
the implementing language C uses 1 and 0 for booleans, while the implemented language uses <code class="language-plaintext highlighter-rouge">true</code> and <code class="language-plaintext highlighter-rouge">false</code>.
From the interpreter's point of view, <code class="language-plaintext highlighter-rouge">true</code> and <code class="language-plaintext highlighter-rouge">false</code> here are symbols, so we can test for them with <code class="language-plaintext highlighter-rouge">eq?</code>.</p>
<p>With that explained, let us add Boolean handling.</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nb">eval</span> <span class="nv">exp</span><span class="p">)</span>
  <span class="p">(</span><span class="k">cond</span> <span class="p">((</span><span class="nb">number?</span> <span class="nv">exp</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">((</span><span class="nb">string?</span> <span class="nv">exp</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">((</span><span class="k">or</span> <span class="p">(</span><span class="nb">eq?</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">true</span><span class="p">)</span> <span class="nv">exp</span><span class="p">)</span> <span class="p">(</span><span class="nb">eq?</span> <span class="p">(</span><span class="k">quote</span> <span class="nv">false</span><span class="p">)</span> <span class="nv">exp</span><span class="p">))</span> <span class="nv">exp</span><span class="p">)</span>
        <span class="p">(</span><span class="k">else</span> <span class="p">(</span><span class="nb">display</span> <span class="s">"Unknown type"</span><span class="p">))))</span></code></pre></figure>
<p>The <code class="language-plaintext highlighter-rouge">or</code> above works like <code class="language-plaintext highlighter-rouge">||</code> in C or <code class="language-plaintext highlighter-rouge">or</code> in Python.
OK, with this interpreter in hand, let us play with it!</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nf">open-input-string</span> <span class="s">"1"</span><span class="p">)))</span> <span class="c1">; same as (eval 1)</span>
<span class="p">(</span><span class="nb">eval</span> <span class="s">"hello"</span><span class="p">)</span> <span class="c1">; returns "hello"</span>
<span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nf">open-input-string</span> <span class="s">"true"</span><span class="p">)))</span> <span class="c1">; returns true</span>
<span class="p">(</span><span class="nb">eval</span> <span class="p">(</span><span class="nb">read</span> <span class="p">(</span><span class="nf">open-input-string</span> <span class="s">"false"</span><span class="p">)))</span> <span class="c1">; returns false</span></code></pre></figure>
<p>Good, we now have a working interpreter. It can only interpret atoms for now, but in the next few installments
we will keep enriching it so that it gradually handles more.</p>
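<p>The atom-only evaluator developed in this post can be mirrored in Python as a sanity check of the logic. This is a sketch under assumptions: the <code class="language-plaintext highlighter-rouge">Symbol</code> class is a hypothetical way to tell symbols apart from strings, and unknown expressions raise an exception here instead of displaying a message.</p>

```python
class Symbol(str):
    """Hypothetical symbol type, kept distinct from ordinary Python strings."""


def evaluate(exp):
    """Evaluate an atom: numbers, strings and the symbols true/false."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(exp, (int, float)) and not isinstance(exp, bool):
        return exp
    # Symbol must be tested before str because Symbol subclasses str.
    if isinstance(exp, Symbol):
        if exp in ("true", "false"):
            return exp
        raise ValueError("Unknown type")
    if isinstance(exp, str):
        return exp
    raise ValueError("Unknown type")


print(evaluate(11))
print(evaluate("hello"))
print(evaluate(Symbol("true")))
try:
    evaluate(Symbol("a"))
except ValueError as error:
    print(error)
```

<p>The chain of type tests plays the same role as the <code class="language-plaintext highlighter-rouge">cond</code> clauses: the first test that matches decides the result.</p>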
Scheme Interpreter In Scheme(1)
2012-11-18T00:00:00+00:00
http://airekans.github.io/scheme/2012/11/18/scheme-in-scheme-1
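<p>In this series I will implement an interpreter for the Scheme language in Scheme.
Along the way we will learn many programming-language concepts and how they are implemented,
which also helps a lot in understanding the languages we use every day.</p>
<h1 id="scheme-a-little-bit-history">Scheme: A Little Bit of History</h1>
<p>Scheme is one of the dialects of Lisp. Lisp is arguably the second-oldest language still in use in the history of computing;
the oldest is Fortran. In its early days Lisp was mainly used in artificial intelligence.
From the 1970s to the 1980s the AI boom drove a great deal of Lisp development, but when the AI winter came,
Lisp usage went into a winter as well. It was during this winter that Scheme was born at MIT.</p>
<p>Scheme, one of the two largest Lisp dialects (the other is Common Lisp), has attracted a lot of attention recently,
partly because <a href="http://clojure.org">Clojure</a>, a related Lisp dialect on the JVM, has seen fairly wide use
in industry. Scheme was innovative from the start, and its most distinctive trait is that it is designed around
minimalism: the core of Scheme is very small, yet it contains many powerful language ideas.</p>
<p>In short, Scheme has the following features:</p>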
<ol>
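<li>It encourages functional programming. Unlike traditional Imperative Programming,
functional programming encourages a side-effect-free style in which the whole computation can be described with mathematical functions,
so that high-level program logic can be expressed concisely. (I am still learning FP myself.)</li>
<li>It uses Lexical scoping. Because of Lexical scoping, implementing closures is very simple.</li>
<li>Tail recursion optimization for functions. In functional programming,
loops are a somewhat discouraged style,
and recursive calls are used instead. In ordinary languages a recursive call costs more than a loop, but with tail recursion optimization
loops and recursion are to some extent equivalent.</li>
<li>Functions are first class objects. Many current languages now have this as well.</li>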
</ol>
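<p>Besides the features above, Scheme has other advanced features such as continuations, which I will not go into here.
If you are interested, see <a href="http://en.wikipedia.org/wiki/Scheme_programming_language">Wikipedia</a> for a detailed introduction.</p>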
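<p>The tail recursion item in the feature list can be illustrated with a small sketch. Python is used only for illustration and does <em>not</em> perform tail call optimization, so the recursive version still consumes stack there; the point is to show why a tail call can be turned into a loop mechanically. Both function names are made up for the example.</p>

```python
def sum_to_tail(n, acc=0):
    """Tail-recursive sum of 1..n: the recursive call is the very last action."""
    if n == 0:
        return acc
    return sum_to_tail(n - 1, acc + n)


def sum_to_loop(n):
    """What a tail call optimizer effectively produces: rebind the arguments and jump."""
    acc = 0
    while n != 0:
        n, acc = n - 1, acc + n
    return acc


print(sum_to_tail(100))
print(sum_to_loop(100))
```

<p>Nothing is left to do after the recursive call returns, so its stack frame can be reused, which is what makes the two forms equivalent in a language like Scheme.</p>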
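<h1 id="我们要实现的语言scheme的定义">The Language We Will Implement: A Definition of Scheme</h1>
<p>After all that, what exactly is the language we are going to implement?</p>
<p>Next I will describe the features included in our Scheme. The language used to write the interpreter can be described by the same definition.</p>
<h2 id="语法s表达式">Syntax: S-expressions</h2>
<p>An expression with one of the following properties is called an S-expression:</p>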
<ol>
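<li>An atomic expression containing no parentheses, such as 1, "hello", true, false.</li>
<li>An expression enclosed in parentheses "()", where the parentheses contain zero or more S-expressions.</li>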
</ol>
<p>As you can see, the definition of an S-expression is recursive, so all of the following are S-expressions:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>1 "hello" () (1 2) (("hello") 2) (+ 1 2)
</code></pre></div></div>
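<p>In Scheme, every expression is an S-expression. An S-expression of the first form is called an atom,
and a parenthesized S-expression is called a list. When an expression has list form,
the list denotes a function call: the first element is the name of the function and the rest are the arguments of the call.
In other words, <code class="language-plaintext highlighter-rouge">(+ 1 2)</code> means <code class="language-plaintext highlighter-rouge">1 + 2</code>. This notation is called prefix notation.</p>
<p>As another example of a function call, suppose mod is a modulo function that returns the remainder of the first argument divided by the second.
In Scheme the call is written <code class="language-plaintext highlighter-rouge">(mod 4 3)</code>, which corresponds to <code class="language-plaintext highlighter-rouge">mod(4, 3)</code> in C.</p>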
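<p>To show how "first element is the function, the rest are arguments" turns into evaluation, here is a small Python sketch. It assumes the expression has already been read into nested Python lists, and the <code class="language-plaintext highlighter-rouge">FUNCTIONS</code> table and <code class="language-plaintext highlighter-rouge">apply_prefix</code> are hypothetical names; only two-argument functions are covered.</p>

```python
import operator

# Hypothetical table mapping function names to two-argument Python functions.
FUNCTIONS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "mod": operator.mod,
}


def apply_prefix(expr):
    """Evaluate a prefix expression given as nested lists, e.g. ["+", 1, 2]."""
    if not isinstance(expr, list):
        return expr  # an atom evaluates to itself in this sketch
    name, *args = expr
    values = [apply_prefix(arg) for arg in args]
    return FUNCTIONS[name](*values)


print(apply_prefix(["+", 1, 2]))
print(apply_prefix(["mod", 4, 3]))
print(apply_prefix(["+", 1, ["mod", 4, 3]]))
```

<p>Because every call has the same shape, nesting needs no precedence rules: the arguments are evaluated first, then the function is applied.</p>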
<h2 id="基本类型">Basic Types</h2>
<p>The language we implement has the following basic types:</p>
<ol>
<li>Number, including integers and floating point numbers, for example 2 and 2.1.</li>
<li>String, the same as a string in C, for example "hello".</li>
<li>Symbol, a type common in Lisp, for example abc. In Scheme,
the symbol abc is obtained with (quote abc).</li>
<li>Boolean, with two values, true and false.</li>
<li>List, similar to a list in Python, but written as <code class="language-plaintext highlighter-rouge">(1 2 a)</code>.
A Scheme list is a singly linked list, not an array. Lists are built and taken apart as follows:
<ol>
<li><code class="language-plaintext highlighter-rouge">(quote (1 2 a))</code>. Note that the empty list is written <code class="language-plaintext highlighter-rouge">(quote ())</code>.</li>
<li><code class="language-plaintext highlighter-rouge">(cons 1 (cons 2 (cons (quote a) (quote ()))))</code>. Note that
elements are added with <code class="language-plaintext highlighter-rouge">cons</code>. <code class="language-plaintext highlighter-rouge">(cons a b)</code> builds a pair
whose first element is a and whose remainder is b.</li>
<li>Elements are taken out with two operations, car and cdr. If the value of a is <code class="language-plaintext highlighter-rouge">(cons 1 2)</code>,
then <code class="language-plaintext highlighter-rouge">(car a)</code> is 1 and <code class="language-plaintext highlighter-rouge">(cdr a)</code> is 2. So, taking the example above,
<code class="language-plaintext highlighter-rouge">(car (quote (1 2 a)))</code> is 1 and <code class="language-plaintext highlighter-rouge">(cdr (quote (1 2 a)))</code> is
<code class="language-plaintext highlighter-rouge">(quote (2 a))</code>.</li>
</ol>
</li>
</ol>
<p>Of these, Number, String and Boolean are called <em>Atom</em>.</p>
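<p>The relationship between <code class="language-plaintext highlighter-rouge">cons</code>, <code class="language-plaintext highlighter-rouge">car</code>, <code class="language-plaintext highlighter-rouge">cdr</code> and a singly linked list can be modelled in Python with two-element tuples. This is only a model under assumptions: <code class="language-plaintext highlighter-rouge">NIL</code> stands in for the empty list, the symbol a is represented by the string <code class="language-plaintext highlighter-rouge">"a"</code>, and <code class="language-plaintext highlighter-rouge">to_python_list</code> is a helper invented for the example.</p>

```python
NIL = None  # stands in for the empty list, (quote ())


def cons(head, tail):
    """Build a pair; a list is a chain of pairs ending in NIL."""
    return (head, tail)


def car(pair):
    return pair[0]


def cdr(pair):
    return pair[1]


def to_python_list(lst):
    """Walk the chain of pairs and collect the elements."""
    items = []
    while lst is not NIL:
        items.append(car(lst))
        lst = cdr(lst)
    return items


lst = cons(1, cons(2, cons("a", NIL)))  # models (1 2 a)
print(car(lst))
print(to_python_list(cdr(lst)))
print(car(cons(1, 2)), cdr(cons(1, 2)))
```

<p>Here <code class="language-plaintext highlighter-rouge">cdr</code> of a list is itself a list, which is why list processing in Scheme is naturally recursive.</p>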
<h2 id="lambda">lambda</h2>
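<p>Anyone who has used Python knows it has a keyword called lambda. Python's lambda is quite limited, though:
its body can only be a single expression. The lambda in Scheme is far more powerful; it is a fully featured function.</p>
<p>The syntax of a lambda in Scheme is as follows:</p>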
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">args</span><span class="p">)</span>
<span class="p">(</span><span class="nf">body</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>For example, each of the following is a lambda:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">x</span> <span class="nv">y</span><span class="p">)</span>
<span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">))</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">x</span><span class="p">)</span>
<span class="nv">x</span><span class="p">)</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">p</span><span class="p">)</span>
<span class="p">(</span><span class="nf">p</span> <span class="nv">p</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></figure>
<h2 id="定义">Definitions</h2>
<p>Definitions cover both variables and functions. The syntax of a variable definition is as follows:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">define</span> <span class="nv">a</span> <span class="mi">1</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>The expression above defines a variable named a whose value is 1.</p>
<p>The syntax of a function definition is as follows:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">add</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)</span>
<span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></figure>
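<p>Here add is the function name, x and y are its parameters, and the function body is <code class="language-plaintext highlighter-rouge">(+ x y)</code>,
which computes the sum of the two arguments.</p>
<p>In fact, a function definition is just a variable definition; the function definition above is equivalent to the following:</p>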
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">define</span> <span class="nv">add</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">x</span> <span class="nv">y</span><span class="p">)</span>
<span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span>
</pre></td></tr></tbody></table></code></pre></figure>
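<p>In other words, a function is really a variable whose value is a lambda.</p>
<p>It is also worth noting that a function definition may contain nested definitions, for example:</p>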
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">fact-iter</span> <span class="nv">n</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">iter</span> <span class="nv">x</span> <span class="nv">result</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">=</span> <span class="nv">x</span> <span class="mi">1</span><span class="p">)</span>
<span class="nv">result</span>
<span class="p">(</span><span class="nf">iter</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">x</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">x</span> <span class="nv">result</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">iter</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">))</span>
</pre></td></tr></tbody></table></code></pre></figure>
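<p>The iter function in the example above is defined inside fact-iter, and its scope is the body of fact-iter.
If an iter is also defined outside fact-iter, the outer iter is shadowed by the inner one.
Note that the program in the example computes the factorial "iteratively".</p>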
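<h2 id="闭包与lexical-scoping">Closures and lexical scoping</h2>
<p>The precise definition of a closure is a function together with its environment, but that sentence alone hardly makes clear what a closure is.
An example is probably the simplest explanation.</p>
<p>Consider the following Scheme code:</p>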
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">x</span><span class="p">)</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">y</span><span class="p">)</span>
<span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span>
</pre></td></tr></tbody></table></code></pre></figure>
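<p>In the example above, the body of the outer lambda is another lambda, and this inner lambda uses x.
x is not defined in the inner lambda but in the outer one, so the outer lambda is the lexical environment
of the inner lambda. We therefore call the inner lambda a closure.</p>
<p>This also involves the concept of lexical scope, the counterpart of dynamic scope.
Lexical scope means that a variable inside a closure takes its value from the environment where the closure was defined.</p>
<p>In the example, x takes the value of the outer lambda's parameter x.</p>
<p>Dynamic scope, by contrast, means that a variable inside a closure takes its value from the environment in which the closure is called.
Consider the following example:</p>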
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">inc</span> <span class="nv">x</span><span class="p">)</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">(</span><span class="nf">y</span><span class="p">)</span>
<span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="nv">y</span><span class="p">)))</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">test</span> <span class="nv">x</span><span class="p">)</span>
<span class="p">((</span><span class="nf">inc</span> <span class="mi">3</span><span class="p">)</span> <span class="mi">4</span><span class="p">))</span>
<span class="p">(</span><span class="nf">test</span> <span class="mi">2</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
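<p>In a dynamically scoped language, <code class="language-plaintext highlighter-rouge">(test 2)</code> evaluates to 6, because x is looked up in the caller test, where it is 2.
In a lexically scoped language it evaluates to 7, because x is the 3 captured when the closure was created.</p>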
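<p>C++ lambdas are lexically scoped as well, so the same experiment can be sketched in C++. This is only an analogy, not part of the Scheme subset described here; the function names simply mirror the Scheme example.</p>

```cpp
#include <cassert>
#include <functional>

// C++ analogue of (define (inc x) (lambda (y) (+ x y))).
// [x] copies the value of x into the closure at the point of definition.
std::function<int(int)> inc(int x) {
    return [x](int y) { return x + y; };
}

// C++ analogue of (define (test x) ((inc 3) 4)).
// The parameter x here is unrelated to the x captured inside inc.
int test(int x) {
    (void)x;  // unused, as in the Scheme version
    int result = inc(3)(4);
    assert(result == 3 + 4);  // the captured x is 3, whatever test was called with
    return result;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">test(2)</code> yields 7, the lexical-scope answer: the closure carries the x it saw when it was created, and the x of the caller plays no part.</p>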
<h2 id="if条件语句">The if conditional</h2>
<p>The most basic if expression looks like this:</p>
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">=</span> <span class="nv">x</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="nb">+</span> <span class="nv">x</span> <span class="mi">1</span><span class="p">)</span>
<span class="nv">x</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>The expression above is equivalent to the following C expression:</p>
<figure class="highlight"><pre><code class="language-c" data-lang="c"><span class="n">x</span> <span class="o">==</span> <span class="mi">1</span> <span class="o">?</span> <span class="n">x</span> <span class="o">+</span> <span class="mi">1</span> <span class="o">:</span> <span class="n">x</span></code></pre></figure>
<h1 id="example">Example</h1>
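<p>OK, with all of the above covered, here is an example program written in that syntax, along with its expected output.
In the next few chapters we will use this program to verify that our interpreter is correct.</p>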
<figure class="highlight"><pre><code class="language-scheme" data-lang="scheme"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="c1">;;;; Test recursion</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">fact</span> <span class="nv">n</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb"><</span> <span class="nv">n</span> <span class="mi">2</span><span class="p">)</span>
<span class="nv">n</span>
<span class="p">(</span><span class="nb">*</span> <span class="nv">n</span> <span class="p">(</span><span class="nf">fact</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">)))))</span>
<span class="p">(</span><span class="nf">fact</span> <span class="mi">3</span><span class="p">)</span> <span class="c1">; 6</span>
<span class="p">(</span><span class="nf">fact</span> <span class="mi">4</span><span class="p">)</span> <span class="c1">; 24</span>
<span class="p">(</span><span class="nf">fact</span> <span class="mi">5</span><span class="p">)</span> <span class="c1">; 120</span>
<span class="c1">;;;; Test nested definitions</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">fact-iter</span> <span class="nv">n</span><span class="p">)</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">iter</span> <span class="nv">x</span> <span class="nv">result</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">=</span> <span class="nv">x</span> <span class="mi">1</span><span class="p">)</span>
<span class="nv">result</span>
<span class="p">(</span><span class="nf">iter</span> <span class="p">(</span><span class="nb">-</span> <span class="nv">x</span> <span class="mi">1</span><span class="p">)</span> <span class="p">(</span><span class="nb">*</span> <span class="nv">x</span> <span class="nv">result</span><span class="p">))))</span>
<span class="p">(</span><span class="nf">iter</span> <span class="nv">n</span> <span class="mi">1</span><span class="p">))</span>
<span class="p">(</span><span class="nf">fact-iter</span> <span class="mi">3</span><span class="p">)</span> <span class="c1">; 6</span>
<span class="p">(</span><span class="nf">fact-iter</span> <span class="mi">4</span><span class="p">)</span> <span class="c1">; 24</span>
<span class="c1">;;;; Test closures</span>
<span class="p">(</span><span class="k">define</span> <span class="p">(</span><span class="nf">get-number-closure</span> <span class="nv">n</span><span class="p">)</span>
<span class="p">(</span><span class="k">lambda</span> <span class="p">()</span> <span class="nv">n</span><span class="p">))</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">get-1</span> <span class="p">(</span><span class="nf">get-number-closure</span> <span class="mi">1</span><span class="p">))</span>
<span class="p">(</span><span class="nf">get-1</span><span class="p">)</span> <span class="c1">; 1</span>
<span class="p">(</span><span class="k">define</span> <span class="nv">get-100</span> <span class="p">(</span><span class="nf">get-number-closure</span> <span class="mi">100</span><span class="p">))</span>
<span class="p">(</span><span class="nf">get-100</span><span class="p">)</span> <span class="c1">; 100</span>
</pre></td></tr></tbody></table></code></pre></figure>
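<h1 id="编程环境">Programming environment</h1>
<p>OK, with the groundwork in place, all that remains is the programming environment.
I will simply describe the one I use.
My Scheme implementation is <a href="http://www.gnu.org/s/mit-scheme">mit-scheme</a>,
because it is the teaching implementation used in <a href="http://mitpress.mit.edu/sicp/full-text/book/book.html">SICP</a>.
mit-scheme also works well with Emacs: replacing Emacs's built-in scheme-mode with xscheme.el
from the mit-scheme source package makes Scheme programming quite efficient. So my environment is
mit-scheme + Emacs + xscheme.el.</p>
<p>In the next section I will cover the basic structure of the interpreter.</p>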
Pimpl Idiom in C++
2012-10-20T00:00:00+00:00
http://airekans.github.io/cpp/2012/10/20/pimpl-idiom-in-c
<h1 id="introduction">Introduction</h1>
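<p>In C++ it often happens that a class definition in a header grows too large, and its member variables involve many
classes from other files, so every file that uses this class ends up depending on the definitions of those member types as well.
This situation gave rise to an idiom specific to C++, the Pimpl idiom.</p>
<p>Consider the following case: class A has member variables b and c of types B and C. If class D
uses A, it indirectly depends on B and C as well:</p>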
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="cp">#include "B.h"
#include "C.h"
</span>
<span class="k">class</span> <span class="nc">A</span>
<span class="p">{</span>
<span class="nl">private:</span>
<span class="n">B</span> <span class="n">b</span><span class="p">;</span>
<span class="n">C</span> <span class="n">c</span><span class="p">;</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>If D now wants to use class A, it has to be written like this:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="cp">#include "A.h"
</span>
<span class="k">class</span> <span class="nc">D</span>
<span class="p">{</span>
<span class="nl">private:</span>
<span class="n">A</span> <span class="n">a</span><span class="p">;</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
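<p>Although formally only A.h has to be included, A.h itself includes B.h and C.h, so D depends on them at compile time, and the B and C modules have to be linked into the program as well.</p>
<p>A first step is to turn b and c in A into pointers; because a pointer may be declared to an incomplete type,
A.h then no longer needs to include B.h and C.h. This solves only part of the problem, though.
If A needs a dozen or more member variables, the header becomes large, which is a problem in itself,
and switching to pointers is not always feasible. A simple idea at this point is to move the declarations of all private
member variables into the cpp file, so that users of A need to know nothing about A's member variables.</p>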
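<p>The pointer-based first step can be sketched as follows. The header/source split is simulated in a single file so the sketch compiles on its own; the bodies of B and C and the <code class="language-plaintext highlighter-rouge">sum</code> accessor are hypothetical stand-ins added only for the demonstration.</p>

```cpp
#include <cassert>

// --- A.h (sketch): no #include "B.h" or "C.h" needed ---
class B;  // forward declarations: B and C are incomplete types here
class C;

class A {
public:
    A();
    ~A();
    int sum() const;  // hypothetical accessor, only for the demonstration
private:
    A(const A&);             // copying disabled, since A owns raw pointers
    A& operator=(const A&);
    B* b;  // a pointer to an incomplete type is itself a complete type
    C* c;
};

// --- A.cpp (sketch): only here are the full definitions required ---
class B { public: int value; B() : value(1) {} };
class C { public: int value; C() : value(2) {} };

A::A() : b(new B), c(new C) {}
A::~A() { delete b; delete c; }  // B and C are complete at this point
int A::sum() const {
    assert(b != 0 && c != 0);
    return b->value + c->value;
}
```

<p>Code that includes only the header part can create and use an A without ever seeing the definitions of B and C, at the cost of one heap allocation and one hand-written delete per member.</p>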
<h1 id="pimpl-idiom">Pimpl Idiom</h1>
<p>The Pimpl idiom is exactly this solution. It declares a nested class
and then a member variable that is a pointer to this nested class. The earlier example makes this clearer:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">A</span>
<span class="p">{</span>
<span class="nl">private:</span>
<span class="k">struct</span> <span class="n">Pimpl</span><span class="p">;</span>
<span class="n">Pimpl</span><span class="o">*</span> <span class="n">m_pimpl</span><span class="p">;</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>With this definition, class D needs to know nothing about the details of A, and D no longer has to care about B and C.
Then, in A.cpp, we define things like this:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">struct</span> <span class="n">A</span><span class="o">::</span><span class="n">Pimpl</span>
<span class="p">{</span>
<span class="n">B</span> <span class="n">b</span><span class="p">;</span>
<span class="n">C</span> <span class="n">c</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">A</span><span class="o">::</span><span class="n">A</span><span class="p">()</span>
<span class="o">:</span> <span class="n">m_pimpl</span><span class="p">(</span><span class="k">new</span> <span class="n">Pimpl</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">m_pimpl</span><span class="o">-></span><span class="n">b</span><span class="p">;</span> <span class="c1">// use b</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>Now that the standard library has auto_ptr and boost has shared_ptr, managing the memory by hand
seems rather unnecessary. Herb Sutter's <a href="http://www.gotw.ca/publications/using_auto_ptr_effectively.htm">Using auto_ptr Effectively</a>
also describes writing the "classic" Pimpl with auto_ptr.</p>
<p>That looks like this:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="cp">#include <memory>
</span>
<span class="k">class</span> <span class="nc">A</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">A</span><span class="p">();</span>
<span class="nl">private:</span>
<span class="k">struct</span> <span class="n">Pimpl</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">auto_ptr</span><span class="o"><</span><span class="n">Pimpl</span><span class="o">></span> <span class="n">m_pimpl</span><span class="p">;</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
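<p>But once you write the code above and compile it, bang! The compiler complains that Pimpl is an incomplete
type. Confusing, isn't it? (In newer versions of the GCC standard library the header carries <code class="language-plaintext highlighter-rouge">#pragma GCC system_header</code>,
so nothing is reported. If you copy <code class="language-plaintext highlighter-rouge">auto_ptr</code> out into your own code, the complaint comes back.
See <a href="http://gcc.gnu.org/onlinedocs/cpp/System-Headers.html#System-Headers">here</a>.)</p>
<p>To fix the compile problem, you only need to declare A's destructor and implement an
empty destructor in the cpp file.</p>
<p>But why is that?</p>
<h2 id="auto_ptr的模板特化">Template instantiation in auto_ptr</h2>
<p>The cause of the problem above lies in how C++ instantiates templates, one of the language's more perverse features.</p>
<p>Let's first look at a simplified definition of auto_ptr:</p>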
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">auto_ptr</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">auto_ptr</span><span class="p">()</span>
<span class="o">:</span> <span class="n">m_ptr</span><span class="p">(</span><span class="nb">NULL</span><span class="p">)</span>
<span class="p">{}</span>
<span class="n">auto_ptr</span><span class="p">(</span><span class="n">T</span><span class="o">*</span> <span class="n">p</span><span class="p">)</span>
<span class="o">:</span> <span class="n">m_ptr</span><span class="p">(</span><span class="n">p</span><span class="p">)</span>
<span class="p">{}</span>
<span class="o">~</span><span class="n">auto_ptr</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">m_ptr</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">delete</span> <span class="n">m_ptr</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="n">T</span><span class="o">*</span> <span class="n">m_ptr</span><span class="p">;</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
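<p>We can see that auto_ptr automatically deletes its m_ptr in its destructor; this is the classic
RAII-based smart pointer.</p>
<p>We also need to know that auto_ptr is a class template, and one property of class templates is that
<strong>a member function is only actually instantiated when it is used</strong>.</p>
<p>That is, suppose we have a class template like this:</p>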
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">TemplateClass</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">Foo</span><span class="p">()</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">return</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">Bar</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">this</span><span class="o">-></span><span class="n">m_ptr</span> <span class="o">=</span> <span class="s">"syntax correct, but semantic incorrect."</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="n">TemplateClass</span><span class="o"><</span><span class="kt">int</span><span class="o">></span> <span class="n">a</span><span class="p">;</span>
<span class="n">a</span><span class="p">.</span><span class="n">Foo</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
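<p>The code above compiles and runs correctly. Foo is correct, while Bar is
syntactically correct but semantically wrong, since the class has no member m_ptr. But because we only call Foo and never Bar,
only Foo is actually instantiated and fully compiled; Bar is only checked for syntax,
not for semantics. So the code above is 100% valid C++.</p>
<p>Likewise, the member functions of auto_ptr, including its constructors and destructor, are only instantiated when they are used.</p>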
<h2 id="default-destructor">Default Destructor</h2>
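<p>Recall what the textbooks say when you first learn C++: if you define no constructor or destructor,
the compiler generates a default one for you. This default constructor or destructor only performs
default initialization or destruction of the member variables and base classes, and nothing else.</p>
<p>Now look back at the definition of A that uses Pimpl. Because I did not declare a destructor,
the compiler defined one for me. A has an auto_ptr member variable, so this default
destructor destroys that member; destroying it simply means calling its destructor.
So the default destructor calls the destructor of auto_ptr, and at that point
the compiler instantiates the auto_ptr destructor.</p>
<p>The auto_ptr destructor deletes its member variable, a pointer to the template argument type.
In the case of A the template argument is Pimpl, and at the moment of instantiation Pimpl has been declared
but not yet defined.</p>
<p>So after compilation, the A in the example is equivalent to the following code:</p>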
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">A</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">A</span><span class="p">();</span>
<span class="o">~</span><span class="n">A</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">m_pimpl</span><span class="p">.</span><span class="o">~</span><span class="n">auto_ptr</span><span class="o"><</span><span class="n">Pimpl</span><span class="o">></span><span class="p">();</span> <span class="c1">// done implicitly by the compiler</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="k">struct</span> <span class="n">Pimpl</span><span class="p">;</span>
<span class="n">std</span><span class="o">::</span><span class="n">auto_ptr</span><span class="o"><</span><span class="n">Pimpl</span><span class="o">></span> <span class="n">m_pimpl</span><span class="p">;</span>
<span class="p">};</span>
<span class="n">auto_ptr</span><span class="o"><</span><span class="n">Pimpl</span><span class="o">>::~</span><span class="n">auto_ptr</span><span class="p">()</span>
<span class="p">{</span>
<span class="k">delete</span> <span class="n">m_ptr</span><span class="p">;</span> <span class="c1">// m_ptr has type Pimpl*</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
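<p>Then why does compilation succeed once I declare A's destructor? Because once we declare the destructor,
the compiler no longer generates its implementation. We write the implementation in the cpp file,
and before that point, at the top of the cpp file, we have already defined Pimpl. So when the compiler sees
our own destructor for A, Pimpl is already a fully defined type and there is no problem.</p>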
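<p>The fix can be sketched end to end as below. Two things in this sketch are assumptions rather than the article's own code: it uses <code class="language-plaintext highlighter-rouge">std::unique_ptr</code> in place of <code class="language-plaintext highlighter-rouge">std::auto_ptr</code>, because auto_ptr was deprecated in C++11 and removed in C++17 while the same complete-type requirement applies to unique_ptr's destructor; and the int members and the <code class="language-plaintext highlighter-rouge">sum</code> accessor are hypothetical stand-ins for B and C. The header/source split is again simulated in one file.</p>

```cpp
#include <cassert>
#include <memory>

// --- A.h (sketch) ---
class A {
public:
    A();
    ~A();             // declared only; the definition must see the complete Pimpl
    int sum() const;  // hypothetical accessor for the demonstration
private:
    struct Pimpl;                    // incomplete in the header
    std::unique_ptr<Pimpl> m_pimpl;  // destroying it requires a complete Pimpl
};

// --- A.cpp (sketch) ---
struct A::Pimpl {
    int b;  // stand-ins for the B and C members in the text
    int c;
    Pimpl() : b(1), c(2) {}
};

A::A() : m_pimpl(new Pimpl) {}

// Empty body, but the member's destructor is instantiated here,
// after Pimpl has been defined, so the delete is well-formed.
A::~A() {}

int A::sum() const {
    assert(m_pimpl.get() != nullptr);
    return m_pimpl->b + m_pimpl->c;
}
```

<p>Removing the <code class="language-plaintext highlighter-rouge">~A();</code> declaration would move the implicit destructor, and with it the instantiation of the smart pointer's destructor, back into the header part, where Pimpl is still incomplete.</p>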
<h1 id="pimpl-by-boostshared_ptr">Pimpl by boost::shared_ptr</h1>
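<p>auto_ptr is not the only way to implement the Pimpl idiom; it can also be implemented with
boost::scoped_ptr and boost::shared_ptr. scoped_ptr behaves just like auto_ptr here:
the user still has to declare a destructor by hand, so I won't discuss it further.</p>
<p>With shared_ptr, however, we can omit even the destructor! That is,
the following code is entirely correct:</p>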
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">A</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">A</span><span class="p">();</span>
<span class="nl">private:</span>
<span class="k">struct</span> <span class="n">Pimpl</span><span class="p">;</span>
<span class="n">boost</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o"><</span><span class="n">Pimpl</span><span class="o">></span> <span class="n">m_pimpl</span><span class="p">;</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
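<p>Note that although the destructor can be omitted, the constructor must still be declared explicitly.
Why is that? Why does it work with shared_ptr but not with auto_ptr?</p>
<p>The answer lies in the implementation of shared_ptr.</p>
<p>shared_ptr is probably a class whose principle everyone who has studied C++ in some depth understands. Implementations of shared_ptr
can be intrusive or non-intrusive, and boost::shared_ptr is non-intrusive,
meaning that a class can be used with shared_ptr without any modification.</p>
<p>Let's look at a simplified implementation of shared_ptr (in this sketch the class named myautoptr plays the role of shared_ptr):</p>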
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">sp_counted_base</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="k">virtual</span> <span class="o">~</span><span class="n">sp_counted_base</span><span class="p">(){}</span>
<span class="p">};</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">sp_counted_base_impl</span> <span class="o">:</span> <span class="k">public</span> <span class="n">sp_counted_base</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="n">sp_counted_base_impl</span><span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="n">t</span><span class="p">)</span><span class="o">:</span><span class="n">t_</span><span class="p">(</span><span class="n">t</span><span class="p">){}</span>
<span class="o">~</span><span class="n">sp_counted_base_impl</span><span class="p">(){</span><span class="k">delete</span> <span class="n">t_</span><span class="p">;}</span>
<span class="nl">private:</span>
<span class="n">T</span> <span class="o">*</span><span class="n">t_</span><span class="p">;</span>
<span class="p">};</span>
<span class="k">class</span> <span class="nc">shared_count</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="k">static</span> <span class="kt">int</span> <span class="n">count_</span><span class="p">;</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="n">shared_count</span><span class="p">(</span><span class="n">T</span> <span class="o">*</span><span class="n">t</span><span class="p">)</span><span class="o">:</span>
<span class="n">t_</span><span class="p">(</span><span class="k">new</span> <span class="n">sp_counted_base_impl</span><span class="o"><</span><span class="n">T</span><span class="o">></span><span class="p">(</span><span class="n">t</span><span class="p">))</span>
<span class="p">{</span>
<span class="n">count_</span> <span class="o">++</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">release</span><span class="p">()</span>
<span class="p">{</span>
<span class="o">--</span><span class="n">count_</span><span class="p">;</span>
<span class="k">if</span><span class="p">(</span><span class="mi">0</span> <span class="o">==</span> <span class="n">count_</span><span class="p">)</span> <span class="k">delete</span> <span class="n">t_</span><span class="p">;</span>
<span class="p">}</span>
<span class="o">~</span><span class="n">shared_count</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">release</span><span class="p">();</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="n">sp_counted_base</span> <span class="o">*</span><span class="n">t_</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="n">shared_count</span><span class="o">::</span><span class="n">count_</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="k">class</span> <span class="nc">myautoptr</span>
<span class="p">{</span>
<span class="nl">public:</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">Y</span><span class="o">></span>
<span class="n">myautoptr</span><span class="p">(</span><span class="n">Y</span><span class="o">*</span> <span class="n">y</span><span class="p">)</span><span class="o">:</span><span class="n">sc_</span><span class="p">(</span><span class="n">y</span><span class="p">),</span><span class="n">t_</span><span class="p">(</span><span class="n">y</span><span class="p">){}</span>
<span class="o">~</span><span class="n">myautoptr</span><span class="p">(){</span> <span class="n">sc_</span><span class="p">.</span><span class="n">release</span><span class="p">();}</span>
<span class="nl">private:</span>
<span class="n">shared_count</span> <span class="n">sc_</span><span class="p">;</span>
<span class="n">T</span> <span class="o">*</span><span class="n">t_</span><span class="p">;</span>
<span class="p">};</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">myautoptr</span><span class="o"><</span><span class="n">A</span><span class="o">></span> <span class="n">a</span><span class="p">(</span><span class="k">new</span> <span class="n">B</span><span class="p">);</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
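<p>As the code above shows, shared_ptr stores not only a pointer of the template type
but also a shared_count.
The shared_count serves as the reference count and manages the pointer automatically.
shared_count in turn stores a pointer to sp_counted_base, and sp_counted_base_impl
is a class template derived from sp_counted_base. This is a template technique: declare a
common base class, define a class template that derives from it, and have other classes use the template through a pointer to the base.
That way some type information is fixed at compile time, while some common implementation details are deferred to run time. What does that mean?
The explanation that follows should make it clear.</p>
<p>Next, note that the constructors of shared_ptr and shared_count are both member function templates
whose template parameter is deduced from the argument. This technique, combined with the template inheritance technique above,
is the reason the example at the start of this section needs no destructor.</p>
<p>First, when we declare a <code class="language-plaintext highlighter-rouge">shared_ptr<int></code>, only its t_ member gets a concrete type;
what type of pointer the shared_count holds is still undetermined.</p>
<p>When we write <code class="language-plaintext highlighter-rouge">shared_ptr<int>(new int(3))</code>, the shared_ptr constructor is called.
This instantiates the constructor template, which in turn calls the shared_count constructor,
so that constructor is instantiated too, and with it sp_counted_base_impl.
At this point the type of the pointer inside is fully determined.</p>
<p>When shared_ptr is destroyed, it calls shared_count's release function,
which deletes its pointer of type sp_counted_base,
so what gets called is the (virtual) destructor of sp_counted_base. Because it is virtual,
the call is dispatched at run time to the destructor of the concrete class, but at compile time the concrete type does not need to be known.</p>
<p>In short, calling shared_ptr's destructor does not require knowing the concrete pointee type,
so it does not matter if that type is incomplete. Calling shared_ptr's constructor, on the other hand,
gives shared_ptr full knowledge of the type, so that the later delete reaches the right destructor.</p>
<p>So for shared_ptr, the constructor needs complete type information and the destructor needs none.
Back to the example: when we do not declare a destructor, the compiler defines a default one for us;
at that point shared_ptr's destructor is instantiated and defined, and the call to sp_counted_base's
destructor is compiled. None of this needs the concrete type,
so an incomplete type is fine. When we define A's constructor, shared_ptr's
constructor is instantiated, hence shared_count's constructor, and hence sp_counted_base_impl.
shared_ptr then has all the type information it needs,
and its destructor can work properly.</p>
<p>This is why a Pimpl implemented with shared_ptr needs no destructor.
To support this, shared_ptr pays a small price in space: compared with a smart pointer that stores only the raw pointer,
it is larger by <code class="language-plaintext highlighter-rouge">sizeof(sp_counted_base*)</code>.</p>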
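<p>The mechanism described above, a non-template base with a virtual destructor plus a derived class template created in the constructor, can be condensed into a small self-contained sketch. This is not boost's implementation: the names <code class="language-plaintext highlighter-rouge">ControlBase</code>, <code class="language-plaintext highlighter-rouge">ControlImpl</code>, <code class="language-plaintext highlighter-rouge">SharedPtr</code>, <code class="language-plaintext highlighter-rouge">Tracked</code> and <code class="language-plaintext highlighter-rouge">demo</code> are hypothetical, the count is per object and not thread-safe, and assignment and exception safety are left out.</p>

```cpp
#include <cassert>

// Non-template base: deleting through it needs no knowledge of T.
class ControlBase {
public:
    ControlBase() : count(1) {}
    virtual ~ControlBase() {}
    long count;  // reference count shared by all copies of one SharedPtr
};

// Derived class template: instantiated where the pointer is constructed,
// so the pointee type must be complete there, and only there.
template <typename T>
class ControlImpl : public ControlBase {
public:
    explicit ControlImpl(T* p) : ptr(p) {}
    ~ControlImpl() { delete ptr; }
private:
    T* ptr;
};

template <typename T>
class SharedPtr {
public:
    template <typename Y>
    explicit SharedPtr(Y* p) : ctrl(new ControlImpl<Y>(p)), ptr(p) {}
    SharedPtr(const SharedPtr& o) : ctrl(o.ctrl), ptr(o.ptr) { ++ctrl->count; }
    // Only ControlBase is named here; the virtual call finds ~ControlImpl<Y>.
    ~SharedPtr() { if (--ctrl->count == 0) delete ctrl; }
    T* operator->() const { return ptr; }
    long use_count() const { return ctrl->count; }
private:
    SharedPtr& operator=(const SharedPtr&);  // assignment omitted in this sketch
    ControlBase* ctrl;
    T* ptr;
};

// Demonstration type that records whether its destructor ran.
struct Tracked {
    explicit Tracked(bool* flag) : destroyed(flag) {}
    ~Tracked() { *destroyed = true; }
    bool* destroyed;
};

// Returns true if the pointee was destroyed once the last copy went away.
bool demo() {
    bool destroyed = false;
    {
        SharedPtr<Tracked> a(new Tracked(&destroyed));
        SharedPtr<Tracked> b(a);
        assert(a.use_count() == 2);
        assert(!destroyed);
    }
    return destroyed;
}
```

<p>The destructor of <code class="language-plaintext highlighter-rouge">SharedPtr</code> never mentions T's destructor, which is why a class holding such a pointer to an incomplete Pimpl can rely on its implicitly generated destructor.</p>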
Building a Chinese blog with jekyll
2012-09-16T00:00:00+00:00
http://airekans.github.io/jekyll/2012/09/16/jekyll-chinese-blog
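<p>Well, today I finally started blogging on <a href="https://github.com">github</a>.</p>
<p>I used to write technical articles on CSDN, which was tolerable. But CSDN looks rather ugly,
displays code poorly, and leaves many things unconfigurable, so I later moved to
a wordpress instance hosted on SAE.</p>
<p>What first attracted me to SAE was that it is a PaaS and that many PHP applications,
Wordpress included, had already been ported to it. SAE was also free for a while, so I started migrating my blog there.
I say migrating, but I never actually moved the CSDN articles over... Wordpress was nice, but after half a year
the 500 cloud beans (云豆) granted at SAE sign-up ran out, and one day my blog became unreachable,
so I spent 10 RMB on a few more beans as a stopgap. So, not wanting to spend money but still wanting to tinker,
I started building a blog on Github.</p>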
<p>Blogs on Github are usually built either with Github Pages or with jekyll directly.
Since the engine behind Github Pages is jekyll anyway, I decided to use jekyll myself.
Now to the main topic: how this blog was set up.</p>
<h1 id="jekyll是个啥东西"><a href="https://github.com/mojombo/jekyll">jekyll</a>是个啥东西?</h1>
<p>jekyll实际上是由github开发出来的用于在github上面放置静态页面的一个页面生成工具。
它不是像wordpress那样的一个博客web程序。它是一个从markdown文件生成静态的HTML的工具。</p>
<p>实际上,用jekyll不单只可以做博客,也可以做一些其他的动态内容不太多的网站。
不过一般来说,动态内容不多的也就博客了。所以下面先来讲讲用jekyll写博客是个怎么样的流程。</p>
<p>假设已经装好了jekyll,如果我现在要写一篇新的博客,那么就会有下面的流程:</p>
<ol>
<li><code class="language-plaintext highlighter-rouge">rake post title="test"</code> creates a file named
“datetime-test.md” in the _posts directory.</li>
<li><code class="language-plaintext highlighter-rouge">jekyll --server</code> starts the jekyll server, which listens on port 4000,
so you can preview the site in a browser at “localhost:4000”.</li>
<li>Repeat from step 1.</li>
</ol>
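<p>With this workflow, writing a post is easy: keep the jekyll server running, open your favorite editor,
and write markdown. It is that simple.</p>
<h1 id="搭建jekyll博客环境">Setting up a jekyll blog environment</h1>
<p>Setting up jekyll is simple: all you need is a working ruby 1.9.3 installation, then run the following command:</p>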
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gem install jekyll
</code></pre></div></div>
<p>Next, create a repo named <code class="language-plaintext highlighter-rouge">USERNAME.github.com</code> on github.</p>
<p>Then download a jekyll blog template with the following commands:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>git clone https://github.com/plusjade/jekyll-bootstrap.git USERNAME.github.com
cd USERNAME.github.com
git remote set-url origin git@github.com:USERNAME/USERNAME.github.com.git
git push origin master
</code></pre></div></div>
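<p>That is it: your jekyll blog is created. You can now open <code class="language-plaintext highlighter-rouge">USERNAME.github.com</code> to take a look.</p>
<p>After these steps, edit index.md in the repo,
and create posts by following the workflow described above.</p>
<h1 id="代码语法高亮">Code syntax highlighting</h1>
<p>According to the jekyll documentation, syntax highlighting is done with <a href="http://pygments.org/">Pygments</a>.
The documentation says the following markup enables highlighting:</p>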
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>{% highlight ruby linenos %}
def foo
  puts 'foo'
end
{% endhighlight %}
</code></pre></div></div>
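<p>If you are not careful, though, jekyll will only set the code block apart without actually coloring it.
Reading the documentation more closely, I found this:</p>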
<blockquote>
<p>In order for the highlighting to show up, you’ll need to include a highlighting stylesheet. For an example stylesheet you can look at <a href="http://github.com/mojombo/tpw/tree/master/css/syntax.css">syntax.css</a>. These are the same styles as used by GitHub and you are free to use them for your own site.</p>
</blockquote>
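<p>So the key is to add the syntax.css file mentioned above to the stylesheets loaded by default.
Since I use the default twitter theme, I made the following changes:</p>
<ol>
<li>Put syntax.css into assets/themes/twitter/css/.</li>
<li>Load that syntax.css in the head element of _includes/themes/twitter/default.html.</li>
</ol>
<p>With this in place you get the same syntax highlighting as Github.
<a href="http://www.stehem.net/2012/02/14/how-to-get-pygments-to-work-with-jekyll.html">This article</a> also covers how to configure the default highlighting, in case you run into problems.</p>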
<h1 id="building-github-pages">Building Github Pages</h1>
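<p>This part is simple: just push your local commits to github,
and github will build the site for you automatically. When the build succeeds, you receive a
"page build successful" email.</p>
<p>In some cases, however, a commit may fail to build. When that happens your pages are not updated
and stay at the last good version. Here are a few ways to resolve it:</p>
<ol>
<li>Read github's <a href="https://help.github.com/articles/using-jekyll-with-pages">documentation</a> on building pages with jekyll carefully.</li>
<li>Consult github's jekyll <a href="https://help.github.com/articles/pages-don-t-build-unable-to-run-jekyll">troubleshooting guide</a>.</li>
<li>Last but not least, email github support.</li>
</ol>
<p>Why bring this up? Because it happened to my own pages.
The cause was that I used the raw tag from liquid, which github's jekyll does not support.
So after I pushed the commit, the page never updated. Only after asking support did I learn that the page
build had failed. I then read the troubleshooting guide and realized it was the raw tag...
Anyway, the problem got solved, and that is what matters.</p>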
<h1 id="主题">主题</h1>
<p>至于主题这个事,我现在还在慢慢的研究,暂时还是用回默认的twitter主题。</p>
<p>以后有什么补充的话,我会继续在这个文章里面进行补充。</p>
inlineCallbacks: A New Way towards Asynchronous Programming
2012-07-17T00:00:00+00:00
http://airekans.github.io/python/2012/07/17/inlinecallbacks
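<p>Asynchronous programming is a major direction for solving performance problems today, and there are many ways to implement it. Going asynchronous gives better resource utilization and responsiveness. In network and GUI programming, a very common approach is event-driven: the program runs a main event loop that keeps processing triggered events. Handlers are registered with the loop as callbacks, and when an event fires, the main loop invokes the corresponding callback.</p>
<p>Although this event-and-callback model has been around for many years, writing business logic with callbacks is unpleasant: constantly firing events and writing the matching callback functions scatters a simple piece of logic across different places and can easily introduce extra complexity. When writing GUI code I often ended up with tightly related logic split across two classes, which made it very hard to find the relevant context.</p>
<p>For this situation, the defer module of <a href="http://twistedmatrix.com/">twisted</a> (twisted.internet.defer) offers an elegant solution in Python. With the inlineCallbacks decorator from defer, we can write asynchronous code as if it were synchronous, which lowers the difficulty of asynchronous programming. (C# 5 and <a href="https://github.com/JeffreyZhao/jscex">Jscex</a> for Javascript already have similar implementations.)</p>
<p>twisted is an event-loop-based networking library for Python that implements the basic event loop and assorted networking tools. Its Deferred abstraction is the main subject of this article. For an introduction to twisted, see the tutorials on the official site or the <a href="http://krondo.com/?page_id=1327">well-known poetry twisted tutorial</a>.</p>
<h1 id="例子">Example</h1>
<p>This article uses a fairly typical example. Imagine we need to write the following server:</p>
<blockquote>
<p>A video download server which, on receiving a client request, downloads the requested video and saves it locally on the server. Specifically, the client sends the server a short URL. On receiving it, the server first asks the URL-shortening service to expand the short URL. Once the server has the original URL, it sends the actual download request to that address and, when the download completes, stores the video.</p>
</blockquote>
<p>First of all, the server certainly should not handle such requests synchronously, because that would greatly reduce its throughput. So we handle the request with asynchronous calls, which in twisted means registering event callbacks.</p>
<h1 id="同步实现">Synchronous implementation</h1>
<p>Suppose we implemented the above synchronously; the code would look like this:</p>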
<figure class="highlight"><pre><code class="language-py" data-lang="py"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">stringReceived</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">shortUrl</span><span class="p">):</span>
<span class="bp">self</span><span class="o">.</span><span class="n">transport</span><span class="o">.</span><span class="n">loseConnection</span><span class="p">()</span>
<span class="bp">self</span><span class="o">.</span><span class="n">downloadVideoFromShortUrl</span><span class="p">(</span><span class="n">shortUrl</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">downloadVideoFromShortUrl</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">shortUrl</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">url</span> <span class="o">=</span> <span class="n">transformShortUrl</span><span class="p">(</span><span class="n">shortUrl</span><span class="p">)</span>
<span class="n">video</span> <span class="o">=</span> <span class="n">downloadVideoFromUrl</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">storeVideo</span><span class="p">(</span><span class="n">video</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">BaseException</span><span class="p">,</span> <span class="n">e</span><span class="p">:</span>
<span class="k">print</span> <span class="s">"exception:"</span><span class="p">,</span> <span class="n">e</span>
</pre></td></tr></tbody></table></code></pre></figure>
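<p>Here <code class="language-plaintext highlighter-rouge">stringReceived</code> is called after the short URL sent by the client has been received, with the corresponding <code class="language-plaintext highlighter-rouge">shortUrl</code> as its argument. <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrl</code> holds the main logic: it calls, in order, the short-URL expansion, the video download from the url, and the local storage of the video file. Assuming every function is a synchronous call, the logic is very clear and the code reads straight from top to bottom. Error handling is included too, as one big try...except; <code class="language-plaintext highlighter-rouge">transformShortUrl</code> and <code class="language-plaintext highlighter-rouge">downloadVideoFromUrl</code> raise <code class="language-plaintext highlighter-rouge">BaseException</code> on error.</p>
<p>The problem with synchronous code is that while your process is blocked on any synchronous call, it can do nothing else. This is where asynchronous calls come in. Suppose <code class="language-plaintext highlighter-rouge">transformShortUrl</code> and <code class="language-plaintext highlighter-rouge">downloadVideoFromUrl</code> both become asynchronous. The result of an asynchronous call is generally handled through a callback. Let's see what the code looks like now.</p>
<h1 id="基于回调的异步实现">Callback-based asynchronous implementation</h1>
<p>The basic code is as follows:</p>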
<figure class="highlight"><pre><code class="language-py" data-lang="py"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
</pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">downloadVideoFromShortUrlAsync</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">shortUrl</span><span class="p">):</span>
<span class="n">d</span> <span class="o">=</span> <span class="n">transformShortUrlAsync</span><span class="p">(</span><span class="n">shortUrl</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">downloadVideoFromUrl</span><span class="p">(</span><span class="n">url</span><span class="p">):</span>
<span class="k">print</span> <span class="s">"long url:"</span><span class="p">,</span> <span class="n">url</span>
<span class="n">d</span> <span class="o">=</span> <span class="n">downloadVideoFromUrlAsync</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">errDownloadVideoFromUrl</span><span class="p">(</span><span class="n">err</span><span class="p">):</span>
<span class="k">print</span> <span class="s">"exception:"</span><span class="p">,</span> <span class="n">err</span>
<span class="n">d</span><span class="o">.</span><span class="n">addCallbacks</span><span class="p">(</span><span class="n">storeVideo</span><span class="p">,</span> <span class="n">errDownloadVideoFromUrl</span><span class="p">)</span>
<span class="k">def</span> <span class="nf">errTransformShortUrl</span><span class="p">(</span><span class="n">err</span><span class="p">):</span>
<span class="k">print</span> <span class="s">"exception:"</span><span class="p">,</span> <span class="n">err</span>
<span class="n">d</span><span class="o">.</span><span class="n">addCallbacks</span><span class="p">(</span><span class="n">downloadVideoFromUrl</span><span class="p">,</span> <span class="n">errTransformShortUrl</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
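<p>To make them easy to tell apart, I append Async to the name of every asynchronous function. Each asynchronous call returns a Deferred; for now, think of it as a marker that the call is asynchronous. To handle the result of the call, you add a function to the Deferred. When the asynchronous call completes, the functions added to the Deferred are invoked.</p>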
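<p>To make the idea concrete, here is a minimal sketch of an object that collects callbacks and invokes them once a result arrives. <code class="language-plaintext highlighter-rouge">ToyDeferred</code> and the example URL are made up for illustration; the real Twisted Deferred is considerably richer (callback chaining, failures, late registration).</p>

```python
class ToyDeferred:
    """Toy stand-in for a Deferred; a simplified assumption, not Twisted's class."""

    def __init__(self):
        self._handlers = []  # (on_success, on_error) pairs, in registration order

    def addCallbacks(self, on_success, on_error):
        # Remember what to run later; nothing is executed yet.
        self._handlers.append((on_success, on_error))

    def callback(self, result):
        # Called by whoever produced the result (normally the event loop).
        for on_success, _ in self._handlers:
            on_success(result)

    def errback(self, error):
        # Called when the asynchronous operation failed.
        for _, on_error in self._handlers:
            on_error(error)


log = []

ok = ToyDeferred()
ok.addCallbacks(lambda url: log.append("got " + url),
                lambda err: log.append("failed: " + str(err)))
log.append("call returned")            # the caller continues immediately
ok.callback("http://example.com/v")    # later: the result is delivered

bad = ToyDeferred()
bad.addCallbacks(lambda url: log.append("got " + url),
                 lambda err: log.append("failed: " + str(err)))
bad.errback(IOError("timeout"))        # later: the operation failed
```

<p>The point to notice is the ordering in <code class="language-plaintext highlighter-rouge">log</code>: the caller moves on first, and the handler runs only when the result is delivered.</p>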
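<p>Since we now handle results with callbacks, the result-handling logic has to live in another function. For example, after the short URL has been expanded we download the video from the resulting address, and that download logic is defined in a separate function, <code class="language-plaintext highlighter-rouge">downloadVideoFromUrl</code> in the code. As you can see, the logic has become more complex and more nested, and it no longer reads naturally from top to bottom. With callbacks, the code that handles a result must be written separately from the code that makes the call; there is no other way to do it. The drawback becomes obvious once the code involves loops and complex logic.</p>
<p>You can also see that the error-handling logic is separated from the success path, which makes the actual flow hard to follow. Unless you are used to callback-based code, it is hard to grasp the logic above at first glance.</p>
<p>Since callback-based programming is so unfriendly to humans, what is the alternative? Enter <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> from twisted.</p>
<h1 id="基于inlinecallbacks的异步实现">inlineCallbacks-based asynchronous implementation</h1>
<p>Our basic calls are still asynchronous; with <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> the code becomes:</p>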
<figure class="highlight"><pre><code class="language-py" data-lang="py"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="code"><pre><span class="o">@</span><span class="n">inlineCallbacks</span>
<span class="k">def</span> <span class="nf">downloadVideoFromShortUrlAsync</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">shortUrl</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">url</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">transformShortUrlAsync</span><span class="p">(</span><span class="n">shortUrl</span><span class="p">)</span>
<span class="n">video</span> <span class="o">=</span> <span class="k">yield</span> <span class="n">downloadVideoFromUrlAsync</span><span class="p">(</span><span class="n">url</span><span class="p">)</span>
<span class="n">storeVideo</span><span class="p">(</span><span class="n">video</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">BaseException</span><span class="p">,</span> <span class="n">e</span><span class="p">:</span>
<span class="k">print</span> <span class="s">"exception:"</span><span class="p">,</span> <span class="n">e</span>
</pre></td></tr></tbody></table></code></pre></figure>
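<p>Apart from the extra yield, this code is identical to the synchronous version! The only difference is the yield in front of each asynchronous call.</p>
<p>Pleasant to write, isn't it?</p>
<p>But think about it: <code class="language-plaintext highlighter-rouge">transformShortUrlAsync</code> is an asynchronous call and cannot produce its result immediately, so isn't <code class="language-plaintext highlighter-rouge">url = transformShortUrlAsync</code> simply wrong?</p>
<p>The secret lies in the <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> <code class="language-plaintext highlighter-rouge">decorator</code> and the <code class="language-plaintext highlighter-rouge">yield</code> we added. First, note that <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code> is itself an asynchronous call. When execution reaches the first asynchronous call, it “waits” at the <code class="language-plaintext highlighter-rouge">yield</code> for that call to finish and return its result. The same happens at the second asynchronous call: it also “waits” for the call to finish and return its result.</p>
<p>In other words, from the point of view of <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code>, the execution order is no different from the synchronous version: first <code class="language-plaintext highlighter-rouge">transformShortUrl</code>, then <code class="language-plaintext highlighter-rouge">downloadVideo</code>, and finally store. The structure of the code reflects this clearly.</p>
<p>But doesn't something feel odd here? If <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code> waits at the <code class="language-plaintext highlighter-rouge">yield</code> for the asynchronous call to run, hasn't the whole call become synchronous again? Then why use asynchronous calls at all...</p>
<p>The magic is that if the call after yield is asynchronous, <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code> is still asynchronous too! But it has to wait for the result, so how can it be asynchronous? The function actually executes like this:</p>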
<ol>
<li>Enter <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code> and call <code class="language-plaintext highlighter-rouge">transformShortUrlAsync</code>.</li>
<li>Because <code class="language-plaintext highlighter-rouge">transformShortUrlAsync</code> is an asynchronous call, its result is not yet available when it returns. At this point, <strong>downloadVideoFromShortUrlAsync returns.</strong></li>
<li>Once the result of <code class="language-plaintext highlighter-rouge">transformShortUrlAsync</code> is available, execution resumes from the unexecuted part of <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code>, and url now holds the result of the asynchronous call.</li>
<li>Next <code class="language-plaintext highlighter-rouge">downloadVideoFromUrlAsync</code> is called; as in step 2, when this asynchronous call returns, <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code> returns again.</li>
<li>Once the result of <code class="language-plaintext highlighter-rouge">downloadVideoFromUrlAsync</code> is available, execution again resumes from the unexecuted part of <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code>, and video is now assigned the downloaded video file.</li>
<li>The rest of the function then runs.</li>
</ol>
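<p>The steps above can be reproduced by hand with a plain generator, without twisted. In this sketch the yielded strings merely stand in for Deferreds, and the function name, URL and payload are invented for illustration; the code plays the role of the event loop by calling <code class="language-plaintext highlighter-rouge">send</code> itself.</p>

```python
def download(log):
    # Each yield hands control back to the caller until a result is sent in.
    url = yield "expand short url"
    log.append("url = " + url)
    video = yield "download " + url
    log.append("video = " + video)


log = []
gen = download(log)

gen.send(None)                              # steps 1-2: run up to the first yield
log.append("free to do other work")
gen.send("http://example.com/v.mp4")        # steps 3-4: resume, run to the second yield
log.append("free again")
try:
    gen.send("<bytes>")                     # steps 5-6: resume and run to the end
except StopIteration:
    log.append("done")
```

<p>Between the <code class="language-plaintext highlighter-rouge">send</code> calls the generator is suspended and the caller is free to do anything else, which is exactly why the decorated function stays asynchronous.</p>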
<p>The whole execution sequence is shown in the figure below:</p>
<p><img src="/assets/img/inlineCallbacks.jpg" alt="sequence diagram of downloadVideoFromShortUrl" /></p>
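<p>As the figure shows, <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code> continues with the next part after the result of each asynchronous call comes back.</p>
<p>Note that <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> does not turn a synchronous function into an asynchronous one. It only makes a function that calls asynchronous functions convenient to write, and makes that function asynchronous as well. If the functions you call are not asynchronous, the function decorated with <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> will not be asynchronous either.</p>
<h1 id="inlinecallbacks的实现">How inlineCallbacks is implemented</h1>
<p>So what we care about most is: how does the magic happen? Let's look straight at the implementation. I assume here that you know Python decorators and Python generators.</p>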
<figure class="highlight"><pre><code class="language-py" data-lang="py"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
</pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">inlineCallbacks</span><span class="p">(</span><span class="n">f</span><span class="p">):</span>
<span class="k">def</span> <span class="nf">unwindGenerator</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">):</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">gen</span> <span class="o">=</span> <span class="n">f</span><span class="p">(</span><span class="o">*</span><span class="n">args</span><span class="p">,</span> <span class="o">**</span><span class="n">kwargs</span><span class="p">)</span>
<span class="k">except</span> <span class="n">_DefGen_Return</span><span class="p">:</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span>
<span class="s">"inlineCallbacks requires </span><span class="si">%</span><span class="s">r to produce a generator; instead"</span>
<span class="s">"caught returnValue being used in a non-generator"</span> <span class="o">%</span> <span class="p">(</span><span class="n">f</span><span class="p">,))</span>
<span class="k">if</span> <span class="ow">not</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">gen</span><span class="p">,</span> <span class="n">types</span><span class="o">.</span><span class="n">GeneratorType</span><span class="p">):</span>
<span class="k">raise</span> <span class="nb">TypeError</span><span class="p">(</span>
<span class="s">"inlineCallbacks requires </span><span class="si">%</span><span class="s">r to produce a generator; "</span>
<span class="s">"instead got </span><span class="si">%</span><span class="s">r"</span> <span class="o">%</span> <span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">gen</span><span class="p">))</span>
<span class="k">return</span> <span class="n">_inlineCallbacks</span><span class="p">(</span><span class="bp">None</span><span class="p">,</span> <span class="n">gen</span><span class="p">,</span> <span class="n">Deferred</span><span class="p">())</span>
<span class="k">return</span> <span class="n">mergeFunctionMetadata</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">unwindGenerator</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>Here mergeFunctionMetadata essentially assigns the __name__ and __doc__ of f to <code class="language-plaintext highlighter-rouge">unwindGenerator</code>. Looking at <code class="language-plaintext highlighter-rouge">unwindGenerator</code>, it first calls f, the decorated function; since functions used with <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> are normally generator functions, this call returns a generator object. The most important function is therefore <code class="language-plaintext highlighter-rouge">_inlineCallbacks</code>. Let's look at its implementation.</p>
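<p>Copying metadata from the decorated function onto the wrapper is a common decorator idiom. As a rough analogue (not the twisted helper itself), the standard library's <code class="language-plaintext highlighter-rouge">functools.wraps</code> does the same job; <code class="language-plaintext highlighter-rouge">traced</code> and <code class="language-plaintext highlighter-rouge">fetch</code> below are hypothetical names.</p>

```python
import functools


def traced(f):
    @functools.wraps(f)  # copies __name__, __doc__ and other metadata onto wrapper
    def wrapper(*args, **kwargs):
        return f(*args, **kwargs)
    return wrapper


@traced
def fetch():
    """Fetch something."""
    return 42


name = fetch.__name__   # still the original name, not "wrapper"
doc = fetch.__doc__
```

<p>Without this step, tracebacks and documentation tools would show the wrapper's name for every decorated function.</p>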
<figure class="highlight"><pre><code class="language-py" data-lang="py"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
</pre></td><td class="code"><pre><span class="k">def</span> <span class="nf">_inlineCallbacks</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">deferred</span><span class="p">):</span>
<span class="n">waiting</span> <span class="o">=</span> <span class="p">[</span><span class="bp">True</span><span class="p">,</span> <span class="c1"># waiting for result?
</span> <span class="bp">None</span><span class="p">]</span> <span class="c1"># result
</span>
<span class="k">while</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">try</span><span class="p">:</span>
<span class="n">isFailure</span> <span class="o">=</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">failure</span><span class="o">.</span><span class="n">Failure</span><span class="p">)</span>
<span class="k">if</span> <span class="n">isFailure</span><span class="p">:</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">result</span><span class="o">.</span><span class="n">throwExceptionIntoGenerator</span><span class="p">(</span><span class="n">g</span><span class="p">)</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">g</span><span class="o">.</span><span class="n">send</span><span class="p">(</span><span class="n">result</span><span class="p">)</span>
<span class="k">except</span> <span class="nb">StopIteration</span><span class="p">:</span>
<span class="c1"># fell off the end, or "return" statement
</span> <span class="n">deferred</span><span class="o">.</span><span class="n">callback</span><span class="p">(</span><span class="bp">None</span><span class="p">)</span>
<span class="k">return</span> <span class="n">deferred</span>
<span class="k">except</span> <span class="n">_DefGen_Return</span><span class="p">,</span> <span class="n">e</span><span class="p">:</span>
<span class="n">appCodeTrace</span> <span class="o">=</span> <span class="n">exc_info</span><span class="p">()[</span><span class="mi">2</span><span class="p">]</span><span class="o">.</span><span class="n">tb_next</span>
<span class="k">if</span> <span class="n">isFailure</span><span class="p">:</span>
<span class="n">appCodeTrace</span> <span class="o">=</span> <span class="n">appCodeTrace</span><span class="o">.</span><span class="n">tb_next</span>
<span class="k">if</span> <span class="n">appCodeTrace</span><span class="o">.</span><span class="n">tb_next</span><span class="o">.</span><span class="n">tb_next</span><span class="p">:</span>
<span class="n">ultimateTrace</span> <span class="o">=</span> <span class="n">appCodeTrace</span>
<span class="k">while</span> <span class="n">ultimateTrace</span><span class="o">.</span><span class="n">tb_next</span><span class="o">.</span><span class="n">tb_next</span><span class="p">:</span>
<span class="n">ultimateTrace</span> <span class="o">=</span> <span class="n">ultimateTrace</span><span class="o">.</span><span class="n">tb_next</span>
<span class="n">filename</span> <span class="o">=</span> <span class="n">ultimateTrace</span><span class="o">.</span><span class="n">tb_frame</span><span class="o">.</span><span class="n">f_code</span><span class="o">.</span><span class="n">co_filename</span>
<span class="n">lineno</span> <span class="o">=</span> <span class="n">ultimateTrace</span><span class="o">.</span><span class="n">tb_lineno</span>
<span class="n">warnings</span><span class="o">.</span><span class="n">warn_explicit</span><span class="p">(</span>
<span class="s">"returnValue() in </span><span class="si">%</span><span class="s">r causing </span><span class="si">%</span><span class="s">r to exit: "</span>
<span class="s">"returnValue should only be invoked by functions decorated "</span>
<span class="s">"with inlineCallbacks"</span> <span class="o">%</span> <span class="p">(</span>
<span class="n">ultimateTrace</span><span class="o">.</span><span class="n">tb_frame</span><span class="o">.</span><span class="n">f_code</span><span class="o">.</span><span class="n">co_name</span><span class="p">,</span>
<span class="n">appCodeTrace</span><span class="o">.</span><span class="n">tb_frame</span><span class="o">.</span><span class="n">f_code</span><span class="o">.</span><span class="n">co_name</span><span class="p">),</span>
<span class="nb">DeprecationWarning</span><span class="p">,</span> <span class="n">filename</span><span class="p">,</span> <span class="n">lineno</span><span class="p">)</span>
<span class="n">deferred</span><span class="o">.</span><span class="n">callback</span><span class="p">(</span><span class="n">e</span><span class="o">.</span><span class="n">value</span><span class="p">)</span>
<span class="k">return</span> <span class="n">deferred</span>
<span class="k">except</span><span class="p">:</span>
<span class="n">deferred</span><span class="o">.</span><span class="n">errback</span><span class="p">()</span>
<span class="k">return</span> <span class="n">deferred</span>
<span class="k">if</span> <span class="nb">isinstance</span><span class="p">(</span><span class="n">result</span><span class="p">,</span> <span class="n">Deferred</span><span class="p">):</span>
<span class="c1"># a deferred was yielded, get the result.
</span> <span class="k">def</span> <span class="nf">gotResult</span><span class="p">(</span><span class="n">r</span><span class="p">):</span>
<span class="k">if</span> <span class="n">waiting</span><span class="p">[]:</span>
<span class="n">waiting</span><span class="p">[]</span> <span class="o">=</span> <span class="bp">False</span>
<span class="n">waiting</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">r</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">_inlineCallbacks</span><span class="p">(</span><span class="n">r</span><span class="p">,</span> <span class="n">g</span><span class="p">,</span> <span class="n">deferred</span><span class="p">)</span>
<span class="n">result</span><span class="o">.</span><span class="n">addBoth</span><span class="p">(</span><span class="n">gotResult</span><span class="p">)</span>
<span class="k">if</span> <span class="n">waiting</span><span class="p">[]:</span>
<span class="n">waiting</span><span class="p">[]</span> <span class="o">=</span> <span class="bp">False</span>
<span class="k">return</span> <span class="n">deferred</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">waiting</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span>
<span class="n">waiting</span><span class="p">[]</span> <span class="o">=</span> <span class="bp">True</span>
<span class="n">waiting</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="bp">None</span>
<span class="k">return</span> <span class="n">deferred</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>First, the three parameters of <code class="language-plaintext highlighter-rouge">_inlineCallbacks</code> are: the result to send into the generator, that is, the outcome of the Deferred it last yielded (result); the generator itself (<code class="language-plaintext highlighter-rouge">g</code>); and the Deferred associated with this generator (deferred).</p>
<p>This function is first called from <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> (mind the leading underscore). So on the first call, result is <code class="language-plaintext highlighter-rouge">None</code> and g is a generator that has not started running yet.</p>
<p>The most important code is the try block at the top of the while loop (lines 6-11 of the listing).</p>
<ol>
<li>The <code class="language-plaintext highlighter-rouge">isinstance</code> check on line 7 inspects the type of result. Note that if the asynchronous call produced an error, the type is <code class="language-plaintext highlighter-rouge">failure.Failure</code>; if it succeeded, it is not <code class="language-plaintext highlighter-rouge">failure.Failure</code>.</li>
<li>Lines 8-11: result is then handled according to its type. If result is a failure, <code class="language-plaintext highlighter-rouge">result.throwExceptionIntoGenerator(g)</code> is called, which throws the exception carried by result into g.<br />
If result is not a failure, it is a normal result, so <code class="language-plaintext highlighter-rouge">g.send(result)</code> passes it straight into the generator. Note that on the first call to <code class="language-plaintext highlighter-rouge">_inlineCallbacks</code> result is <code class="language-plaintext highlighter-rouge">None</code>, so the first call amounts to <code class="language-plaintext highlighter-rouge">g.send(None)</code>. This is valid, because a generator that has not started yet accepts only <code class="language-plaintext highlighter-rouge">None</code> as the argument to <code class="language-plaintext highlighter-rouge">g.send()</code>.</li>
</ol>
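<p>The two generator operations used here, <code class="language-plaintext highlighter-rouge">send</code> and <code class="language-plaintext highlighter-rouge">throw</code>, can be tried in isolation. The sketch below uses a hypothetical <code class="language-plaintext highlighter-rouge">worker</code> generator and the built-in <code class="language-plaintext highlighter-rouge">g.throw</code>, which is the plain-Python counterpart of what <code class="language-plaintext highlighter-rouge">throwExceptionIntoGenerator</code> is described as doing above.</p>

```python
def worker(log):
    try:
        value = yield "first request"
        log.append(("value", value))
        yield "second request"
    except ValueError as exc:
        # An exception thrown in from outside surfaces at the paused yield.
        log.append(("error", str(exc)))


log = []
g = worker(log)
first = g.send(None)       # a fresh generator must be started with None
second = g.send(10)        # 10 becomes the value of the first yield expression
try:
    g.throw(ValueError("boom"))   # raised inside worker at the second yield
except StopIteration:
    pass                          # worker handled the error and then finished

# Starting a fresh generator with a non-None value is rejected.
fresh = worker([])
try:
    fresh.send(1)
    started_with_value = True
except TypeError:
    started_with_value = False
```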
<p>Next in importance is the <code class="language-plaintext highlighter-rouge">if isinstance(result, Deferred)</code> block (lines 39 to 46 of the listing). Note that the operation on the generator above returns the value that was yielded. If a Deferred was yielded, the yield was followed by an asynchronous call, so <code class="language-plaintext highlighter-rouge">_inlineCallbacks</code> adds a <code class="language-plaintext highlighter-rouge">gotResult</code> function to that Deferred; when the asynchronous call completes, <code class="language-plaintext highlighter-rouge">gotResult</code> is called to handle its result.</p>
<p>Inside gotResult, ignoring the if waiting part, what it ultimately does is call <code class="language-plaintext highlighter-rouge">_inlineCallbacks</code> again. So we can now sketch the following execution order:</p>
<p>When we call <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code>, the wrapper created by <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> first calls the original function once; a generator function, when called, simply returns a generator object. <code class="language-plaintext highlighter-rouge">inlineCallbacks</code> then calls <code class="language-plaintext highlighter-rouge">_inlineCallbacks(None, gen, Deferred())</code>.</p>
<p>Inside <code class="language-plaintext highlighter-rouge">_inlineCallbacks</code>, execution reaches line 11, which is <code class="language-plaintext highlighter-rouge">result = g.send(None)</code>. This statement is valid. <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code> now starts running until it calls <code class="language-plaintext highlighter-rouge">transformShortUrlAsync</code> and yields the Deferred that call returns. Execution then continues to the <code class="language-plaintext highlighter-rouge">result.addBoth(gotResult)</code> call (line 48), which adds gotResult to that Deferred. When the Deferred fires (that is, when the result is available), gotResult receives the result and resumes the rest of <code class="language-plaintext highlighter-rouge">downloadVideoFromShortUrlAsync</code>.</p>
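<p>The whole mechanism fits in a few dozen lines if we drop the details. The following is a simplified sketch, not twisted's code: <code class="language-plaintext highlighter-rouge">ToyDeferred</code>, <code class="language-plaintext highlighter-rouge">inline_callbacks</code>, <code class="language-plaintext highlighter-rouge">_drive</code> and <code class="language-plaintext highlighter-rouge">expand_async</code> are invented names, errors are passed as plain exceptions instead of Failure objects, and returnValue is not supported.</p>

```python
class ToyDeferred:
    """Minimal stand-in for a Deferred (assumption: a single listener, no chaining)."""

    def __init__(self):
        self.fired = False
        self.result = None
        self._listener = None

    def addBoth(self, fn):
        self._listener = fn
        if self.fired:
            fn(self.result)

    def callback(self, result):
        self.fired = True
        self.result = result
        if self._listener is not None:
            self._listener(result)


def inline_callbacks(f):
    def wrapper(*args, **kwargs):
        # Calling a generator function only creates the generator object.
        return _drive(None, f(*args, **kwargs), ToyDeferred())
    return wrapper


def _drive(result, gen, done):
    while True:
        try:
            if isinstance(result, Exception):
                result = gen.throw(result)     # deliver an error at the yield
            else:
                result = gen.send(result)      # deliver a value at the yield
        except StopIteration:
            done.callback(None)                # generator finished
            return done
        if isinstance(result, ToyDeferred):
            if not result.fired:
                # Result pending: arrange to resume later, return to the caller now.
                result.addBoth(lambda r: _drive(r, gen, done))
                return done
            result = result.result             # already available: keep looping


pending = {}
events = []


def expand_async(short):
    d = ToyDeferred()
    pending[short] = d        # a real event loop would fire this later
    return d


@inline_callbacks
def download(short):
    url = yield expand_async(short)
    events.append("expanded to " + url)


finished = download("abc")
events.append("download() returned")
pending["abc"].callback("http://example.com/video")
```

<p>The already-fired branch plays the role of the <code class="language-plaintext highlighter-rouge">waiting</code> flag in the real code: when a result is available immediately, the loop continues instead of recursing.</p>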
<h1 id="分析">分析</h1>
<p>正如前面所讲,有了<code class="language-plaintext highlighter-rouge">inlineCallbacks</code>之后,其实自己定义的函数并没有变成异步,只不过他将函数里面调用异步函数的地方自动的做了回调的处理,从而使得函数本身以一种“奇怪”的方式异步执行。</p>
<p>为什么可以有这种效果呢?我觉得主要有以下几点:</p>
<ol>
<li>AIO,也就是异步IO。这个可以说是实现这种语法结果的必要条件,因为当我们从调用异步函数的地方获得了一个defer之后,这时候并没有获得结果。而结果会在未来的某个时刻获得。而我们需要在获得结果的那个时刻,函数余下的部分可以继续执行,而这一个就是AIO的用法,我们就可以把获得结果的处理部分当做回调那样传递给这个IO操作,让他自动的在操作完成的时候调用这个回调。而在twisted里面,AIO的是使用事件循环来实现的。</li>
<li>generator。这个并不是实现<code class="language-plaintext highlighter-rouge">inlineCallbacks</code>这种语法结构的必要条件,就像Jscex里面就是通过修改语法树的方式来实现,因为Javascript里面是没有generator的。但是有了generator之后,就会发现实现这个结构会异常的简单,就像本身就应该是这么写的一样。可以说generator对于基于回调的一些实现都是很好的实现利器,至少我在inlineCallbacks这部分是真正的感受到了generator带来的方便。</li>
</ol>
<p>所以主要还是AIO的功劳,就像在Node.js里面,实现类似的功能是比较方便的,因为Node.js本身的IO都是AIO,所以只要修改语法树,就是可以达到这种效果。</p>
The SSH Protocol Explained
2012-06-28T00:00:00+00:00
http://airekans.github.io/protocol/2012/06/28/ssh-explained
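<p>As a programmer, you have surely used ssh. When we need to log in to a server remotely, ssh is usually the tool.
ssh is short for secure shell. Unlike the earlier telnet and rsh, which transmit in plaintext, it provides encryption, integrity checking and compression, so we can operate remotely and securely
without worrying about information leaks (not absolutely, of course: any encryption might be broken, but it is far stronger than plaintext).</p>
<p>This article explains in detail how the SSH protocol is defined and how it achieves secure encryption.</p>
<h1 id="几个基本概念">A few basic concepts</h1>
<p>Before introducing the ssh protocol, a few basic concepts need to be covered first; they are key to understanding the protocol itself.</p>
<h2 id="加密">Encryption</h2>
<p>Encryption means processing a piece of data into output that outsiders cannot, or can hardly, decipher, while the designated party can still decrypt it.
Generally, encryption also takes a key as a parameter,
and decryption takes a related (possibly identical) key as input. Roughly, the flow is:</p>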
<figure class="highlight"><pre><code class="language-py" data-lang="py"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="code"><pre><span class="c1"># 加密方
</span><span class="n">encrypted_data</span> <span class="o">=</span> <span class="n">encrypt</span><span class="p">(</span><span class="n">raw_data</span><span class="p">,</span> <span class="n">key</span><span class="p">)</span>
<span class="c1"># 解密方
</span><span class="n">raw_data</span> <span class="o">=</span> <span class="n">decrypt</span><span class="p">(</span><span class="n">encrypted_data</span><span class="p">,</span> <span class="n">key1</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
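<p>Mainstream encryption algorithms fall into two categories:</p>
<ol>
<li><a href="http://en.wikipedia.org/wiki/Symmetric-key_algorithm">Secret-key encryption</a>, also called symmetric encryption</li>
<li><a href="http://en.wikipedia.org/wiki/Public-key_encryption">Public-key encryption</a></li>
</ol>
<h2 id="私钥加密">Secret-key encryption</h2>
<p>In secret-key encryption, the encrypting and decrypting sides use the same key. The key is a secret shared only between them and must not be known to any third party; without it, decrypting the data is very hard. Typically the encrypting side generates the key first and then passes it to the decrypting side over a secure channel.</p>
<h2 id="公钥加密">Public-key encryption</h2>
<p>In public-key encryption, the decrypting side first generates a key pair: a private key and a public key. The private key is never disclosed, while the public key can be published freely. Data encrypted with the public key can only be decrypted with the private key. The encrypting side first obtains the public key from the decrypting side, encrypts with it, and sends the data over; the decrypting side decrypts with the private key. If the encrypted data is intercepted in transit by a third party, there is no need to worry: without the private key the third party cannot decrypt it.</p>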
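<p>As an illustration of the public/private key relationship, here is a toy RSA sketch with deliberately tiny primes. The numbers and function names are chosen for this example only; real keys are thousands of bits long and real implementations add padding, so this must never be used for actual security.</p>

```python
# Toy RSA: for illustration only, never for real use.
p, q = 61, 53                  # two small primes, kept secret
n = p * q                      # modulus, shared by both keys
phi = (p - 1) * (q - 1)
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent (modular inverse, Python 3.8+)

public_key = (n, e)            # can be handed to anyone
private_key = (n, d)           # stays with the decrypting side


def encrypt(message, key):
    modulus, exponent = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    modulus, exponent = key
    return pow(ciphertext, exponent, modulus)


message = 65                                  # must be smaller than n
ciphertext = encrypt(message, public_key)     # done by the encrypting side
recovered = decrypt(ciphertext, private_key)  # only the key owner can do this
```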
<p>公钥加密的问题还包括获取了公钥之后,加密方如何保证公钥来自于确定的一方,而不是某个冒充的机器。假设公钥不是来自我们信任的机器,那么就算我们用公钥加密也没有用,因为加密之后的数据是发送给了冒充的机器,该机器就可以利用它产生的私钥进行解密了。所以公钥加密里面比较重要的一步是身份认证。</p>
<p>需要说明一下,一般的私钥加密都会比公钥加密快,所以大数据量的加密一般都会使用私钥加密,而公钥加密会作为身份验证和交换私钥的一个手段。</p>
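<h2 id="数据一致性完整性">Data Consistency/Integrity</h2>
<p>Data integrity is about ensuring that a piece of data has not been dropped, corrupted or modified in transit. The common approach is to hash the data and send the hash value along with the data. The receiver hashes the received data again and compares the result with the transmitted hash value. If the two differ, the data has been altered; if they match, the data is very likely intact. A plain hash mainly detects accidental corruption, since an attacker who modifies the data can recompute the hash as well; this is why the SSH packet format shown later carries a MAC computed with a key.</p>
<p>Widely used hash algorithms include <a href="http://en.wikipedia.org/wiki/MD5">MD5</a> and <a href="http://en.wikipedia.org/wiki/Sha1">SHA-1</a>, although both are now considered weak against collision attacks and the SHA-2 family is generally preferred.</p>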
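<p>As a minimal sketch of the hash-and-compare scheme described above, the following Python snippet uses the standard <code>hashlib</code> module. SHA-256 is an assumption made here (the text itself names MD5 and SHA-1), and the helper names <code>attach_digest</code> and <code>verify_digest</code> are hypothetical, introduced only for illustration:</p>

```python
import hashlib
import hmac

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def attach_digest(data: bytes) -> bytes:
    """Sender side: return the data followed by its SHA-256 digest."""
    return data + hashlib.sha256(data).digest()


def verify_digest(message: bytes) -> bytes:
    """Receiver side: recompute the digest and return the data if it matches."""
    data, received = message[:-DIGEST_SIZE], message[-DIGEST_SIZE:]
    expected = hashlib.sha256(data).digest()
    # compare_digest compares the two values in constant time
    if not hmac.compare_digest(received, expected):
        raise ValueError("digest mismatch: data was corrupted or modified")
    return data


message = attach_digest(b"hello ssh")
assert verify_digest(message) == b"hello ssh"

# Flip one byte of the data part: verification should now fail.
tampered = b"jello ssh" + message[len(b"hello ssh"):]
try:
    verify_digest(tampered)
except ValueError as exc:
    print("rejected:", exc)
```

<p>This only shows the comparison step. It does not stop an attacker who can replace both the data and the digest, which is the reason a keyed MAC is used in practice.</p>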
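<h2 id="身份验证">Authentication</h2>
<p>Authentication means determining whether a person or machine really is the one you intend to talk to. If A wants to communicate with B, the two usually exchange some data at the start, so how can A tell that the data coming back was really sent by B? In practice there are many ways to impersonate a machine.</p>
<p>In SSH this is done mainly with public keys. The client keeps a list of the public keys of the machines it trusts. After an SSH connection starts, the server sends over a public key and the client looks it up; if the key is in the list, the machine is taken to be the genuine server.</p>
<p>The real situation is somewhat more complicated. The server does not simply send its public key and nothing else, because a public key on its own is easy for a third party to copy. This is described in detail below.</p>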
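<h1 id="ssh2协议概况">Overview of the SSH2 Protocol</h1>
<p>The best way to understand a protocol is to start from the overall flow of its message exchange. This is explained in great detail in <a href="http://docstore.mik.ua/orelly/networking_2ndEd/ssh/index.htm">SSH: The Secure Shell</a>, from which I have taken a few key figures.</p>
<p>First, an overview diagram:</p>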
<p><img src="http://docstore.mik.ua/orelly/networking_2ndEd/ssh/figs/ssh_0301.gif" alt="SSH overview" title="SSH overview" /></p>
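<p>As the figure shows, there are several important keys:</p>
<ol>
<li>session key: the key used for secret key encryption; it also serves as the identifier of each ssh connection.</li>
<li>host key: used to authenticate the server.</li>
<li>known-hosts: a list, stored on the client, of the public keys of trusted servers.</li>
<li>user key: used to authenticate the client.</li>
</ol>
<p>Once the server and client have exchanged the session key, all data is encrypted with this session key using secret key encryption.</p>
<p>The figure above is only a rough description; the next figure describes the SSH2 protocol in more detail:</p>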
<p><img src="http://docstore.mik.ua/orelly/networking_2ndEd/ssh/figs/ssh_0304.gif" alt="SSH2 protocol details" title="SSH2 protocol details" /></p>
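<p>This figure gives a rough panorama of the SSH2 protocol. SSH2 is divided into three sub-protocols: SSH-TRANS, SSH-AUTH and SSH-CONN. SSH-TRANS is the transport protocol; it defines the packets and the encrypted channel, and the other two protocols are built on top of it.</p>
<p>SSH-AUTH is the protocol SSH uses to verify the client's identity. The step where we type a password into the ssh command happens in this phase. Although a username and password are being transmitted, the protocol runs on top of SSH-TRANS, so the content is encrypted and can be sent safely.</p>
<p>SSH-CONN is the actual application protocol. Various protocols can be defined here; scp, sftp and the ordinary remote shell that we use all the time are each an implementation of a protocol defined at this layer. All of these application protocols can only be used after SSH-AUTH authentication has succeeded.</p>
<p>The relationship between the three protocols is shown in the following figure:</p>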
<p><img src="http://docstore.mik.ua/orelly/networking_2ndEd/ssh/figs/ssh_0305.gif" alt="SSH protocol relationship" title="SSH protocol relationship" /></p>
<p>SSH-TRANS is the base protocol, and both SSH-AUTH and SSH-CONN rely on it for secure encryption. Although SSH-CONN happens after SSH-AUTH in time, SSH-CONN does not depend on SSH-AUTH.</p>
<h1 id="ssh-protocol">SSH Protocol</h1>
<h2 id="ssh-trans">SSH-TRANS</h2>
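<p>Let us start with the basic structure of SSH-TRANS. After the client connects to the SSH server, the following exchange takes place:</p>
<ol>
<li>
<p>The client and the server each send the other an ssh version string, in the following format:</p>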
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SSH-protoversion-softwareversion SP comments CR LF
</code></pre></div> </div>
<p>The comments part is optional.
Since almost all ssh servers and clients in use today support SSH2, a typical opening version string looks like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SSH-2.0-OpenSSH CR LF
</code></pre></div> </div>
</li>
<li>
<p>All subsequent communication uses the Binary Packet Protocol that SSH defines itself. This protocol simply prefixes all user data with a length header and then encrypts it. A packet is defined as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>uint32 packet_length
byte padding_length
byte[n1] payload; n1 = packet_length - padding_length - 1
byte[n2] random padding; n2 = padding_length
byte[m] mac (Message Authentication Code - MAC); m = mac_length
</code></pre></div> </div>
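<p>All of the actual data goes in the payload. The trailing mac is the check code computed over the data, which lets the receiver verify its integrity.</p>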
</li>
<li>
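<p>After the ssh version strings have been sent, the client and server begin the key exchange, kex for short. Kex lets the client and server generate the keys and the session ID for this connection.
After kex, both sides hold a key and a hash (the RFC calls them the shared secret K and the exchange hash H), and the secret key used for symmetric encryption is derived from these two values.
The exact algorithm is not covered here; see the SSH-TRANS RFC[2]. In the last step of kex, the server sends the client its own public key,
and the client verifies the server's identity by looking this public key up in its known_hosts.
At this point both the server and the client hold the secret key, so all subsequent data is encrypted and there is no need to worry about information leaking.
After kex, the client can start SSH-AUTH, that is, ask the server to verify the client's identity.</p>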
</li>
</ol>
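<p>The packet layout from step 2 can be sketched in Python as follows. This is only an illustration under simplifying assumptions: no encryption and no MAC (the state before any keys are negotiated), a block size of 8, and the hypothetical helper names <code>build_packet</code> and <code>parse_packet</code>. The padding rules used here (at least 4 bytes of padding, and a total length that is a multiple of the block size) come from the SSH-TRANS RFC:</p>

```python
import os
import struct

BLOCK_SIZE = 8    # pad to a multiple of max(8, cipher block size); 8 without a cipher
MIN_PADDING = 4   # the RFC requires at least four bytes of random padding
HEADER_SIZE = 5   # uint32 packet_length + byte padding_length


def build_packet(payload: bytes) -> bytes:
    """Frame payload as packet_length || padding_length || payload || padding."""
    padding_length = BLOCK_SIZE - (HEADER_SIZE + len(payload)) % BLOCK_SIZE
    if padding_length < MIN_PADDING:
        padding_length += BLOCK_SIZE
    # packet_length counts everything after itself, excluding the MAC
    packet_length = 1 + len(payload) + padding_length
    header = struct.pack(">IB", packet_length, padding_length)
    return header + payload + os.urandom(padding_length)


def parse_packet(packet: bytes) -> bytes:
    """Extract the payload from an unencrypted packet that has no MAC."""
    packet_length, padding_length = struct.unpack(">IB", packet[:HEADER_SIZE])
    n1 = packet_length - padding_length - 1  # payload length, as in the layout above
    return packet[HEADER_SIZE:HEADER_SIZE + n1]


packet = build_packet(b"example payload")
assert len(packet) % BLOCK_SIZE == 0
assert parse_packet(packet) == b"example payload"
```

<p>A real implementation would additionally encrypt the packet and append the MAC once kex has completed.</p>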
<h2 id="ssh-auth">SSH-AUTH</h2>
<p>Three predefined methods are available for authenticating the client.</p>
<ol>
<li>public key</li>
<li>password</li>
<li>hostbased</li>
</ol>
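<p>The first two are the ones we use most often: password is ordinary password authentication, and public key is the usual passwordless authentication.
Once the server has successfully verified the client's identity, it starts the service requested by the client.
Note that the server is not limited to picking one of the three methods; they can be combined. In other words, the server can require the client to pass both password and public key authentication.</p>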
<h2 id="ssh-conn">SSH-CONN</h2>
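<p>This is the protocol that defines the services we ultimately use. The most common ones include shell, port forwarding, X11 forwarding and so on.</p>
<p>The most important mechanism in SSH-CONN is the channel. In SSH-CONN, essentially all communication with the server goes through channels.
Multiple channels share the same ssh session, and the SSH protocol itself defines how messages are dispatched among them.
The user only needs to open as many channels as required.
For example, when port forwarding is enabled in an ordinary ssh client, a shell channel and a forwarding channel are both opened.
This part should be fairly familiar to programmers.</p>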
<h1 id="library">Library</h1>
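<p>The ssh libraries I have looked at so far are mainly <a href="http://www.libssh.org/">libssh</a> and <a href="http://www.libssh2.org/">libssh2</a>. A comparison of the two can be found <a href="http://www.libssh2.org/libssh2-vs-libssh.html">here</a>. In terms of interface,
libssh2 is more cleanly defined, but libssh2 can only be used for client-side development, whereas libssh supports both server and client development.
The documentation of libssh2 is also somewhat weaker than that of libssh, and documentation is a key factor during development.</p>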
<h1 id="references">References</h1>
<ol>
<li><a href="http://docstore.mik.ua/orelly/networking_2ndEd/ssh/index.htm">SSH: The Secure Shell</a></li>
<li><a href="http://tools.ietf.org/html/rfc4253" title="SSH-TRANS">SSH-TRANS</a></li>
<li><a href="http://tools.ietf.org/html/rfc4251" title="SSH-ARCH">SSH-ARCH</a></li>
<li><a href="http://tools.ietf.org/html/rfc4252" title="SSH-AUTH">SSH-AUTH</a></li>
<li><a href="http://tools.ietf.org/html/rfc4254" title="SSH-CONN">SSH-CONN</a></li>
</ol>
Implementing a Non-Blocking Queue
2012-05-28T00:00:00+00:00
http://airekans.github.io/multi-threaded/2012/05/28/implementation-of-non-blocking-queue
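<p>When I implemented Tpool earlier, I wrote a BlockingQueue based on pthread_cond_signal/wait. Queues are used everywhere in multithreaded programs, and the concurrency requirements placed on them vary. For a simple thread pool with modest throughput, a BlockingQueue works fine. Under high throughput, however, a lock-based queue becomes a performance bottleneck because of the cost of locking and unlocking.</p>
<p>Lock-free queue implementations, also known as Non Blocking Queues, appeared to solve this problem. This article explains some details of the algorithm.</p>
<h1 id="并发队列的实现形式">Forms of Concurrent Queue Implementation</h1>
<p>Concurrent queues are generally implemented in one of the following ways:</p>
<ol>
<li>Single Lock queue: one lock guards both the Enqueue and the Dequeue operations.</li>
<li>Double Lock queue: two locks guard the Enqueue and the Dequeue operations separately.</li>
<li>Lock Free queue (Non Blocking Queue): Enqueue and Dequeue are synchronized without any locks.</li>
</ol>
<p>With a Single Lock, once the number of threads grows and the number of Enqueue and Dequeue operations rises, the lock becomes the bottleneck.</p>
<p>Double Lock solves part of the problem by separating the Enqueue lock from the Dequeue lock, so mutual exclusion only occurs among concurrent Enqueues and among concurrent Dequeues. As a result, throughput improves considerably when the Enqueue and Dequeue rates are similar.</p>
<p>However, Double Lock still serializes enqueues against other enqueues and dequeues against other dequeues, so a bottleneck remains among multiple producers or among multiple consumers.</p>
<p>Lock free reduces this mutual exclusion to the minimum.</p>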
<h1 id="non-blocking-queue的实现">Non Blocking Queue的实现</h1>
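<p>In terms of data structure, the Non Blocking Queue is implemented the same way as the Double Lock queue; see <a href="http://www.parallellabs.com/2010/10/25/practical-concurrent-queue-algorithm/" title="多线程队列的算法优化">Guancheng's article</a> for details.</p>
<p>Here is a rough sketch of the Double Lock implementation code:</p>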
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">typedef</span> <span class="k">struct</span> <span class="n">node_t</span> <span class="p">{</span>
<span class="n">TYPE</span> <span class="n">value</span><span class="p">;</span>
<span class="n">node_t</span> <span class="o">*</span><span class="n">next</span><span class="p">;</span>
<span class="p">}</span> <span class="n">NODE</span><span class="p">;</span>
<span class="k">typedef</span> <span class="k">struct</span> <span class="n">queue_t</span> <span class="p">{</span>
<span class="n">NODE</span> <span class="o">*</span><span class="n">head</span><span class="p">;</span>
<span class="n">NODE</span> <span class="o">*</span><span class="n">tail</span><span class="p">;</span>
<span class="n">LOCK</span> <span class="n">q_h_lock</span><span class="p">;</span>
<span class="n">LOCK</span> <span class="n">q_t_lock</span><span class="p">;</span>
<span class="p">}</span> <span class="n">Q</span><span class="p">;</span>
<span class="n">initialize</span><span class="p">(</span><span class="n">Q</span> <span class="o">*</span><span class="n">q</span><span class="p">)</span> <span class="p">{</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">new_node</span><span class="p">()</span> <span class="c1">// Allocate a free node</span>
<span class="n">node</span><span class="o">-></span><span class="n">next</span> <span class="o">=</span> <span class="nb">NULL</span> <span class="c1">// Make it the only node in the linked list</span>
<span class="n">q</span><span class="o">-></span><span class="n">head</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">tail</span> <span class="o">=</span> <span class="n">node</span> <span class="c1">// Both head and tail point to it</span>
<span class="n">q</span><span class="o">-></span><span class="n">q_h_lock</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">q_t_lock</span> <span class="o">=</span> <span class="n">FREE</span> <span class="c1">// Locks are initially free</span>
<span class="p">}</span>
<span class="n">enqueue</span><span class="p">(</span><span class="n">Q</span> <span class="o">*</span><span class="n">q</span><span class="p">,</span> <span class="n">TYPE</span> <span class="n">value</span><span class="p">)</span> <span class="p">{</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">new_node</span><span class="p">()</span> <span class="c1">// Allocate a new node from the free list</span>
<span class="n">node</span><span class="o">-></span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="c1">// Copy enqueued value into node</span>
<span class="n">node</span><span class="o">-></span><span class="n">next</span> <span class="o">=</span> <span class="nb">NULL</span> <span class="c1">// Set next pointer of node to NULL</span>
<span class="n">lock</span><span class="p">(</span><span class="o">&</span><span class="n">q</span><span class="o">-></span><span class="n">q_t_lock</span><span class="p">)</span> <span class="c1">// Acquire t_lock in order to access Tail</span>
<span class="n">q</span><span class="o">-></span><span class="n">tail</span><span class="o">-></span><span class="n">next</span> <span class="o">=</span> <span class="n">node</span> <span class="c1">// Link node at the end of the queue</span>
<span class="n">q</span><span class="o">-></span><span class="n">tail</span> <span class="o">=</span> <span class="n">node</span> <span class="c1">// Swing Tail to node</span>
<span class="n">unlock</span><span class="p">(</span><span class="o">&</span><span class="n">q</span><span class="o">-></span><span class="n">q_t_lock</span><span class="p">)</span> <span class="c1">// Release t_lock</span>
<span class="p">}</span>
<span class="n">dequeue</span><span class="p">(</span><span class="n">Q</span> <span class="o">*</span><span class="n">q</span><span class="p">,</span> <span class="n">TYPE</span> <span class="o">*</span><span class="n">pvalue</span><span class="p">)</span> <span class="p">{</span>
<span class="n">lock</span><span class="p">(</span><span class="o">&</span><span class="n">q</span><span class="o">-></span><span class="n">q_h_lock</span><span class="p">)</span> <span class="c1">// Acquire h_lock in order to access Head</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">q</span><span class="o">-></span><span class="n">head</span> <span class="c1">// Read Head</span>
<span class="n">new_head</span> <span class="o">=</span> <span class="n">node</span><span class="o">-></span><span class="n">next</span> <span class="c1">// Read next pointer</span>
<span class="k">if</span> <span class="n">new_head</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="c1">// Is queue empty?</span>
<span class="n">unlock</span><span class="p">(</span><span class="o">&</span><span class="n">q</span><span class="o">-></span><span class="n">q_h_lock</span><span class="p">)</span> <span class="c1">// Release h_lock before return</span>
<span class="k">return</span> <span class="n">FALSE</span> <span class="c1">// Queue was empty</span>
<span class="n">endif</span>
<span class="o">*</span><span class="n">pvalue</span> <span class="o">=</span> <span class="n">new_head</span><span class="o">-></span><span class="n">value</span> <span class="c1">// Queue not empty, read value</span>
<span class="n">q</span><span class="o">-></span><span class="n">head</span> <span class="o">=</span> <span class="n">new_head</span> <span class="c1">// Swing Head to next node</span>
<span class="n">unlock</span><span class="p">(</span><span class="o">&</span><span class="n">q</span><span class="o">-></span><span class="n">q_h_lock</span><span class="p">)</span> <span class="c1">// Release h_lock</span>
<span class="n">free</span><span class="p">(</span><span class="n">node</span><span class="p">)</span> <span class="c1">// Free node</span>
<span class="k">return</span> <span class="n">TRUE</span> <span class="c1">// Queue was not empty, dequeue succeeded</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
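<p>For readers who want something they can run, here is a minimal Python sketch of the same two-lock idea. It is an illustration, not the code from the referenced article: the class name <code>TwoLockQueue</code> and the <code>(ok, value)</code> return convention are assumptions made here, and <code>threading.Lock</code> stands in for the <code>LOCK</code> type in the pseudocode.</p>

```python
import threading


class Node:
    def __init__(self, value=None):
        self.value = value
        self.next = None


class TwoLockQueue:
    def __init__(self):
        dummy = Node()                       # head always points at a dummy node
        self.head = dummy
        self.tail = dummy
        self.head_lock = threading.Lock()    # guards dequeue
        self.tail_lock = threading.Lock()    # guards enqueue

    def enqueue(self, value):
        node = Node(value)                   # prepare the node outside the lock
        with self.tail_lock:
            self.tail.next = node            # link node at the end of the queue
            self.tail = node                 # swing tail to the new node

    def dequeue(self):
        """Return (True, value), or (False, None) if the queue is empty."""
        with self.head_lock:
            new_head = self.head.next
            if new_head is None:             # only the dummy node is left
                return False, None
            value = new_head.value
            new_head.value = None            # new_head becomes the new dummy
            self.head = new_head
            return True, value


q = TwoLockQueue()
for item in ("a", "b", "c"):
    q.enqueue(item)
print(q.dequeue())  # (True, 'a')
```

<p>Because enqueue only touches the tail and dequeue only touches the head, one producer and one consumer can proceed at the same time; several producers (or several consumers) still wait for each other.</p>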
<p>For a Non Blocking Queue, the core operation is one called Compare And Swap (CAS for short). Expressed in C++, it does roughly the following:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">template</span> <span class="o"><</span><span class="k">typename</span> <span class="n">T</span><span class="o">></span>
<span class="kt">bool</span> <span class="nf">CompareAndSwap</span><span class="p">(</span><span class="n">T</span><span class="o">*</span> <span class="n">dest</span><span class="p">,</span> <span class="n">T</span> <span class="n">oldValue</span><span class="p">,</span> <span class="n">T</span> <span class="n">newValue</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">dest</span> <span class="o">==</span> <span class="n">oldValue</span><span class="p">)</span>
<span class="p">{</span>
<span class="o">*</span><span class="n">dest</span> <span class="o">=</span> <span class="n">newValue</span><span class="p">;</span>
<span class="k">return</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="nb">false</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
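<p>At first glance this looks unremarkable, but note that on some hardware the operation is implemented as a single instruction, which is what guarantees that it is atomic. On x86 CPUs the instruction is CMPXCHG (used with the LOCK prefix to be atomic across multiple processors).</p>
<p>With this instruction we can implement many concurrent algorithms that would otherwise require locks, and the Non Blocking Queue is one of them.</p>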
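<p>To show how CAS is typically used, here is a small Python sketch of the usual read, compute, CAS, retry loop. Python has no hardware CAS primitive, so the class <code>AtomicCell</code> below is a hypothetical stand-in that emulates one with an internal lock; only the calling pattern in <code>atomic_increment</code> is the point of the example.</p>

```python
import threading


class AtomicCell:
    """Emulated CAS cell: the internal lock stands in for the atomic instruction."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        with self._guard:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_increment(cell):
    """Increment without holding a lock across the read-modify-write."""
    while True:
        old = cell.load()
        # Succeeds only if nobody changed the cell since we read it;
        # otherwise read the fresh value and try again.
        if cell.compare_and_swap(old, old + 1):
            return old + 1


counter = AtomicCell(0)
workers = [
    threading.Thread(target=lambda: [atomic_increment(counter) for _ in range(1000)])
    for _ in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
assert counter.load() == 4000
```

<p>No update is lost even though no thread blocks another for the whole increment: a thread whose CAS fails simply retries with the new value.</p>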
<p>The well-known paper "Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms" by Michael and Scott gives the following pseudocode for a Non Blocking Queue:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="n">structure</span> <span class="n">pointer_t</span> <span class="p">{</span><span class="n">ptr</span><span class="o">:</span> <span class="n">pointer</span> <span class="n">to</span> <span class="n">node_t</span><span class="p">,</span> <span class="n">count</span><span class="o">:</span> <span class="kt">unsigned</span> <span class="n">integer</span><span class="p">}</span>
<span class="n">structure</span> <span class="n">node_t</span> <span class="p">{</span><span class="n">value</span><span class="o">:</span> <span class="n">data</span> <span class="n">type</span><span class="p">,</span> <span class="n">next</span><span class="o">:</span> <span class="n">pointer_t</span><span class="p">}</span>
<span class="n">structure</span> <span class="n">queue_t</span> <span class="p">{</span><span class="n">Head</span><span class="o">:</span> <span class="n">pointer_t</span><span class="p">,</span> <span class="n">Tail</span><span class="o">:</span> <span class="n">pointer_t</span><span class="p">}</span>
<span class="n">initialize</span><span class="p">(</span><span class="n">Q</span><span class="o">:</span> <span class="n">pointer</span> <span class="n">to</span> <span class="n">queue_t</span><span class="p">)</span>
<span class="n">node</span> <span class="o">=</span> <span class="n">new_node</span><span class="p">()</span> <span class="c1">// Allocate a free node</span>
<span class="n">node</span><span class="o">-></span><span class="n">next</span><span class="p">.</span><span class="n">ptr</span> <span class="o">=</span> <span class="nb">NULL</span> <span class="c1">// Make it the only node in the linked list</span>
<span class="n">Q</span><span class="o">-></span><span class="n">Head</span><span class="p">.</span><span class="n">ptr</span> <span class="o">=</span> <span class="n">Q</span><span class="o">-></span><span class="n">Tail</span><span class="p">.</span><span class="n">ptr</span> <span class="o">=</span> <span class="n">node</span> <span class="c1">// Both Head and Tail point to it</span>
<span class="n">enqueue</span><span class="p">(</span><span class="n">Q</span><span class="o">:</span> <span class="n">pointer</span> <span class="n">to</span> <span class="n">queue_t</span><span class="p">,</span> <span class="n">value</span><span class="o">:</span> <span class="n">data</span> <span class="n">type</span><span class="p">)</span>
<span class="n">E1</span><span class="o">:</span> <span class="n">node</span> <span class="o">=</span> <span class="n">new_node</span><span class="p">()</span> <span class="c1">// Allocate a new node from the free list</span>
<span class="n">E2</span><span class="o">:</span> <span class="n">node</span><span class="o">-></span><span class="n">value</span> <span class="o">=</span> <span class="n">value</span> <span class="c1">// Copy enqueued value into node</span>
<span class="n">E3</span><span class="o">:</span> <span class="n">node</span><span class="o">-></span><span class="n">next</span><span class="p">.</span><span class="n">ptr</span> <span class="o">=</span> <span class="nb">NULL</span> <span class="c1">// Set next pointer of node to NULL</span>
<span class="n">E4</span><span class="o">:</span> <span class="n">loop</span> <span class="c1">// Keep trying until Enqueue is done</span>
<span class="n">E5</span><span class="o">:</span> <span class="n">tail</span> <span class="o">=</span> <span class="n">Q</span><span class="o">-></span><span class="n">Tail</span> <span class="c1">// Read Tail.ptr and Tail.count together</span>
<span class="n">E6</span><span class="o">:</span> <span class="n">next</span> <span class="o">=</span> <span class="n">tail</span><span class="p">.</span><span class="n">ptr</span><span class="o">-></span><span class="n">next</span> <span class="c1">// Read next ptr and count fields together</span>
<span class="n">E7</span><span class="o">:</span> <span class="k">if</span> <span class="n">tail</span> <span class="o">==</span> <span class="n">Q</span><span class="o">-></span><span class="n">Tail</span> <span class="c1">// Are tail and next consistent?</span>
<span class="c1">// Was Tail pointing to the last node?</span>
<span class="n">E8</span><span class="o">:</span> <span class="k">if</span> <span class="n">next</span><span class="p">.</span><span class="n">ptr</span> <span class="o">==</span> <span class="nb">NULL</span>
<span class="c1">// Try to link node at the end of the linked list</span>
<span class="n">E9</span><span class="o">:</span> <span class="k">if</span> <span class="n">CAS</span><span class="p">(</span><span class="o">&</span><span class="n">tail</span><span class="p">.</span><span class="n">ptr</span><span class="o">-></span><span class="n">next</span><span class="p">,</span> <span class="n">next</span><span class="p">,</span> <span class="o"><</span><span class="n">node</span><span class="p">,</span> <span class="n">next</span><span class="p">.</span><span class="n">count</span><span class="o">+</span><span class="mi">1</span><span class="o">></span><span class="p">)</span>
<span class="n">E10</span><span class="o">:</span> <span class="k">break</span> <span class="c1">// Enqueue is done. Exit loop</span>
<span class="n">E11</span><span class="o">:</span> <span class="n">endif</span>
<span class="n">E12</span><span class="o">:</span> <span class="k">else</span> <span class="c1">// Tail was not pointing to the last node</span>
<span class="c1">// Try to swing Tail to the next node</span>
<span class="n">E13</span><span class="o">:</span> <span class="n">CAS</span><span class="p">(</span><span class="o">&</span><span class="n">Q</span><span class="o">-></span><span class="n">Tail</span><span class="p">,</span> <span class="n">tail</span><span class="p">,</span> <span class="o"><</span><span class="n">next</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span> <span class="n">tail</span><span class="p">.</span><span class="n">count</span><span class="o">+</span><span class="mi">1</span><span class="o">></span><span class="p">)</span>
<span class="n">E14</span><span class="o">:</span> <span class="n">endif</span>
<span class="n">E15</span><span class="o">:</span> <span class="n">endif</span>
<span class="n">E16</span><span class="o">:</span> <span class="n">endloop</span>
<span class="c1">// Enqueue is done. Try to swing Tail to the inserted node</span>
<span class="n">E17</span><span class="o">:</span> <span class="n">CAS</span><span class="p">(</span><span class="o">&</span><span class="n">Q</span><span class="o">-></span><span class="n">Tail</span><span class="p">,</span> <span class="n">tail</span><span class="p">,</span> <span class="o"><</span><span class="n">node</span><span class="p">,</span> <span class="n">tail</span><span class="p">.</span><span class="n">count</span><span class="o">+</span><span class="mi">1</span><span class="o">></span><span class="p">)</span>
<span class="n">dequeue</span><span class="p">(</span><span class="n">Q</span><span class="o">:</span> <span class="n">pointer</span> <span class="n">to</span> <span class="n">queue_t</span><span class="p">,</span> <span class="n">pvalue</span><span class="o">:</span> <span class="n">pointer</span> <span class="n">to</span> <span class="n">data</span> <span class="n">type</span><span class="p">)</span><span class="o">:</span> <span class="n">boolean</span>
<span class="n">D1</span><span class="o">:</span> <span class="n">loop</span> <span class="c1">// Keep trying until Dequeue is done</span>
<span class="n">D2</span><span class="o">:</span> <span class="n">head</span> <span class="o">=</span> <span class="n">Q</span><span class="o">-></span><span class="n">Head</span> <span class="c1">// Read Head</span>
<span class="n">D3</span><span class="o">:</span> <span class="n">tail</span> <span class="o">=</span> <span class="n">Q</span><span class="o">-></span><span class="n">Tail</span> <span class="c1">// Read Tail</span>
<span class="n">D4</span><span class="o">:</span> <span class="n">next</span> <span class="o">=</span> <span class="n">head</span><span class="p">.</span><span class="n">ptr</span><span class="o">-></span><span class="n">next</span> <span class="c1">// Read Head.ptr->next</span>
<span class="n">D5</span><span class="o">:</span> <span class="k">if</span> <span class="n">head</span> <span class="o">==</span> <span class="n">Q</span><span class="o">-></span><span class="n">Head</span> <span class="c1">// Are head, tail, and next consistent?</span>
<span class="n">D6</span><span class="o">:</span> <span class="k">if</span> <span class="n">head</span><span class="p">.</span><span class="n">ptr</span> <span class="o">==</span> <span class="n">tail</span><span class="p">.</span><span class="n">ptr</span> <span class="c1">// Is queue empty or Tail falling behind?</span>
<span class="n">D7</span><span class="o">:</span> <span class="k">if</span> <span class="n">next</span><span class="p">.</span><span class="n">ptr</span> <span class="o">==</span> <span class="nb">NULL</span> <span class="c1">// Is queue empty?</span>
<span class="n">D8</span><span class="o">:</span> <span class="k">return</span> <span class="n">FALSE</span> <span class="c1">// Queue is empty, couldn't dequeue</span>
<span class="n">D9</span><span class="o">:</span> <span class="n">endif</span>
<span class="c1">// Tail is falling behind. Try to advance it</span>
<span class="n">D10</span><span class="o">:</span> <span class="n">CAS</span><span class="p">(</span><span class="o">&</span><span class="n">Q</span><span class="o">-></span><span class="n">Tail</span><span class="p">,</span> <span class="n">tail</span><span class="p">,</span> <span class="o"><</span><span class="n">next</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span> <span class="n">tail</span><span class="p">.</span><span class="n">count</span><span class="o">+</span><span class="mi">1</span><span class="o">></span><span class="p">)</span>
<span class="n">D11</span><span class="o">:</span> <span class="k">else</span> <span class="c1">// No need to deal with Tail</span>
<span class="c1">// Read value before CAS</span>
<span class="c1">// Otherwise, another dequeue might free the next node</span>
<span class="n">D12</span><span class="o">:</span> <span class="o">*</span><span class="n">pvalue</span> <span class="o">=</span> <span class="n">next</span><span class="p">.</span><span class="n">ptr</span><span class="o">-></span><span class="n">value</span>
<span class="c1">// Try to swing Head to the next node</span>
<span class="n">D13</span><span class="o">:</span> <span class="k">if</span> <span class="n">CAS</span><span class="p">(</span><span class="o">&</span><span class="n">Q</span><span class="o">-></span><span class="n">Head</span><span class="p">,</span> <span class="n">head</span><span class="p">,</span> <span class="o"><</span><span class="n">next</span><span class="p">.</span><span class="n">ptr</span><span class="p">,</span> <span class="n">head</span><span class="p">.</span><span class="n">count</span><span class="o">+</span><span class="mi">1</span><span class="o">></span><span class="p">)</span>
<span class="n">D14</span><span class="o">:</span> <span class="k">break</span> <span class="c1">// Dequeue is done. Exit loop</span>
<span class="n">D15</span><span class="o">:</span> <span class="n">endif</span>
<span class="n">D16</span><span class="o">:</span> <span class="n">endif</span>
<span class="n">D17</span><span class="o">:</span> <span class="n">endif</span>
<span class="n">D18</span><span class="o">:</span> <span class="n">endloop</span>
<span class="n">D19</span><span class="o">:</span> <span class="n">free</span><span class="p">(</span><span class="n">head</span><span class="p">.</span><span class="n">ptr</span><span class="p">)</span> <span class="c1">// It is safe now to free the old node</span>
<span class="n">D20</span><span class="o">:</span> <span class="k">return</span> <span class="n">TRUE</span> <span class="c1">// Queue was not empty, dequeue succeeded</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>The most important line of the Enqueue operation is E9, and the most important line of the Dequeue operation is D13.</p>
<h1 id="enqueue">Enqueue</h1>
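<p>(E1-E3) First, in both the Double Lock and the Lock free queue algorithms, enqueue allocates and fully initializes a node before putting it into the queue, so that the node can be linked in completely with as few operations as possible.</p>
<p>(E5-E6) Next, the thread reads the tail pointer from Q together with that node's next pointer. Note that <code class="language-plaintext highlighter-rouge">Q->Tail</code> always points to a node in the queue, but not necessarily to the last node; throughout the algorithm, <code class="language-plaintext highlighter-rouge">Q->Tail</code> is moved as close to the last node as possible.</p>
<p>(E7-E11) These lines establish the preconditions under which the CAS can succeed. The CAS compares the value of <code class="language-plaintext highlighter-rouge">tail.ptr->next</code>, and the if check on the line above it means that <code class="language-plaintext highlighter-rouge">tail.ptr->next</code> must still be NULL at that moment, otherwise the CAS cannot succeed. Once the CAS succeeds, the new node has been appended to the end of the queue. Note that CAS guarantees this compare-and-set step is atomic. After a successful append, the thread leaves the loop and gets ready to finish the enqueue. Note that although the new node has been inserted at this point, the value of <code class="language-plaintext highlighter-rouge">Q->Tail</code> has not been updated yet.</p>
<p>(E12-E14) These lines run when <code class="language-plaintext highlighter-rouge">tail == Q->Tail</code> and <code class="language-plaintext highlighter-rouge">tail.ptr->next != NULL</code>. This condition means that another thread has already appended a new node to the queue, but <code class="language-plaintext highlighter-rouge">Q->Tail</code> may not have been updated yet. In that case the thread tries to update <code class="language-plaintext highlighter-rouge">Q->Tail</code> and move it forward (using CAS to perform the update), so that it points as close to the last node as possible.</p>
<p>(E17) This line is similar to the one above, except that it runs after the new node has been added: the thread tries to update <code class="language-plaintext highlighter-rouge">Q->Tail</code> so that it points to the last node. CAS is needed here as well, because after the node is successfully added at E9, another thread may reach E13 and update <code class="language-plaintext highlighter-rouge">Q->Tail</code> first. By the time the current thread reaches E17, <code class="language-plaintext highlighter-rouge">Q->Tail</code> may therefore already have been updated, so CAS is used to check the value before updating it.</p>
<p>To summarize, a new node is always inserted at the end, and after the insertion <code class="language-plaintext highlighter-rouge">Q->Tail</code> is moved as far toward the end as possible.</p>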
<h1 id="dequeue">Dequeue</h1>
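<p>Dequeue relies on one important assumption: <code class="language-plaintext highlighter-rouge">Q->Head</code> always points to the head node of the queue. The strategy is that the node head points to is a dummy node, and the real first element is head's next node. On dequeue, the value of <code class="language-plaintext highlighter-rouge">head->next</code> is read as the return value, then the head node is unlinked and freed, and the former next node becomes the new head node.</p>
<p>(D2-D4) Before anything else, the thread reads the head pointer <code class="language-plaintext highlighter-rouge">Q->Head</code> and its next node. <code class="language-plaintext highlighter-rouge">Q->Tail</code> is read as well, because it is needed to tell whether the queue is empty and whether <code class="language-plaintext highlighter-rouge">Q->Tail</code> currently points to the last node.</p>
<p>(D5-D9) When <code class="language-plaintext highlighter-rouge">head.ptr == tail.ptr</code> and <code class="language-plaintext highlighter-rouge">next.ptr == NULL</code>, the queue contains only the dummy node, which means the queue is empty, so false is returned.</p>
<p>(D10) Reaching this line means that <code class="language-plaintext highlighter-rouge">head.ptr == tail.ptr</code> but <code class="language-plaintext highlighter-rouge">next.ptr != NULL</code>. In other words, the queue holds more than one node, yet <code class="language-plaintext highlighter-rouge">tail.ptr</code> points to the same node as <code class="language-plaintext highlighter-rouge">head.ptr</code>, so <code class="language-plaintext highlighter-rouge">tail.ptr</code> is lagging behind the last node. The thread therefore tries to move tail forward, bringing it as close to the last node as possible.</p>
<p>(D11-D15) Reaching this branch means the queue holds at least two nodes. The thread first reads the value of the next node, then tries to swing the head pointer to the next node using CAS. If the CAS succeeds, the dequeue has succeeded and the value can be returned safely. If it fails, another thread has modified the head pointer in the meantime, the value just read is no longer valid, and the loop must run again.</p>
<p>(D19) Since the CAS at D13 has ensured that the old head node is no longer in the queue, that node can now be freed.</p>
<p>As the discussion above shows, the Non Blocking Queue resolves race conditions by retrying in a loop. If the state read earlier no longer satisfies the assumptions the queue operation depends on, the operation is simply attempted again. CAS, in turn, guarantees the atomicity of each update made during the operation.</p>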
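<p>The walkthrough above can be condensed into a runnable Python sketch. Several things here are assumptions made for illustration and are not in the paper: <code>AtomicRef</code> emulates CAS with an internal lock (Python has no hardware CAS), the names <code>MSQueue</code> and <code>AtomicRef</code> are hypothetical, the <code>count</code> fields are dropped because Python's garbage collector never reuses a node that is still referenced, and <code>free</code> (D19) has no counterpart. The comments map each line back to the pseudocode labels.</p>

```python
import threading


class AtomicRef:
    """Emulated CAS cell; the lock stands in for the atomic instruction."""

    def __init__(self, value=None):
        self._value = value
        self._guard = threading.Lock()

    def get(self):
        return self._value

    def cas(self, expected, new):
        with self._guard:
            if self._value is expected:   # identity comparison, like a pointer
                self._value = new
                return True
            return False


class Node:
    def __init__(self, value=None):
        self.value = value
        self.next = AtomicRef(None)


class MSQueue:
    def __init__(self):
        dummy = Node()
        self.head = AtomicRef(dummy)
        self.tail = AtomicRef(dummy)

    def enqueue(self, value):
        node = Node(value)                      # E1-E3
        while True:                             # E4
            tail = self.tail.get()              # E5
            nxt = tail.next.get()               # E6
            if tail is not self.tail.get():     # E7: tail changed, retry
                continue
            if nxt is None:                     # E8
                if tail.next.cas(None, node):   # E9: link the new node
                    break                       # E10
            else:
                self.tail.cas(tail, nxt)        # E13: help advance Tail
        self.tail.cas(tail, node)               # E17: may fail harmlessly

    def dequeue(self):
        """Return (True, value), or (False, None) if the queue is empty."""
        while True:                             # D1
            head = self.head.get()              # D2
            tail = self.tail.get()              # D3
            nxt = head.next.get()               # D4
            if head is not self.head.get():     # D5: head changed, retry
                continue
            if head is tail:                    # D6
                if nxt is None:                 # D7
                    return False, None          # D8
                self.tail.cas(tail, nxt)        # D10: Tail is lagging
            else:
                value = nxt.value               # D12: read before the CAS
                if self.head.cas(head, nxt):    # D13
                    return True, value          # D20


q = MSQueue()
q.enqueue("first")
q.enqueue("second")
print(q.dequeue())  # (True, 'first')
```

<p>Since the CAS here is emulated with a lock, this sketch demonstrates the control flow of the algorithm rather than its performance characteristics.</p>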
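<p>CAS is certainly an important building block of Lock free algorithms, but implementing a Lock free algorithm is extremely difficult. Guaranteeing its correctness requires testing and verification from every angle. Guancheng once mentioned that it took Doug Lea about one person-year to implement LinkedBlockingQueue in java.util.concurrent. So when you want to use a Lock free algorithm, prefer an existing one rather than reinventing the wheel.</p>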
<h1 id="references">References:</h1>
<ol>
<li><a href="http://www.parallellabs.com/2010/10/25/practical-concurrent-queue-algorithm/" title="多线程队列的算法优化">多线程队列的算法优化</a></li>
<li><a href="http://www.cs.rochester.edu/research/synchronization/pseudocode/queues.html" title="Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms">Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms</a></li>
<li><a href="https://www.ibm.com/developerworks/java/library/j-jtp11234/">https://www.ibm.com/developerworks/java/library/j-jtp11234/</a></li>
<li><a href="http://www.ibm.com/developerworks/java/library/j-jtp04186/index.html">http://www.ibm.com/developerworks/java/library/j-jtp04186/index.html</a></li>
<li><a href="http://www.codeproject.com/Articles/23317/Lock-Free-Queue-implementation-in-C-and-C">http://www.codeproject.com/Articles/23317/Lock-Free-Queue-implementation-in-C-and-C</a></li>
</ol>
Notes on Implementing the Tpool Thread Pool Library (6)
2012-05-23T00:00:00+00:00
http://airekans.github.io/multi-threaded/2012/05/23/implementation-of-tpool6
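<p>The previous posts covered essentially all of Tpool's implementation details. In this one I will talk about what I learned about testing while implementing Tpool.</p>
<p>First, the hardest thing about testing a multithreaded program is its unpredictability. While the program runs, there is no way to know exactly which thread is currently running or how far it has got, because that depends on the system's scheduling policy and environment at that moment. This is what makes multithreaded programs so hard to test and debug. Today's debuggers do offer commands for constraining thread scheduling, but that still does not make the programming any easier.</p>
<p>So beyond manual testing, the most important safeguard is unit tests. Unit tests are subject to the same unpredictability, though, so as programmers we should do our best to reproduce thread race conditions inside them. Here I use <code class="language-plaintext highlighter-rouge">sleep</code> to do that.</p>
<p>One point I need to make: in a multithreaded program, checking only the computed result to decide whether a function did what was expected falls far short of what testing requires, because a correct result does not necessarily mean the function is free of bugs. Suppose a thread pool function's job is to add tasks to the pool and run them. If my test adds two tasks that execute <code class="language-plaintext highlighter-rouge">++i</code> on a global variable i and then two tasks that execute <code class="language-plaintext highlighter-rouge">--i</code> on it, checking that i equals its initial value after the run does not show that the thread pool worked correctly. The <code class="language-plaintext highlighter-rouge">++i</code> tasks may have run without locking, so both read the same value and i ended up only 1 larger rather than 2. If the same thing then happened to the two <code class="language-plaintext highlighter-rouge">--i</code> tasks, i was decreased by only 1, and the final result still matches the expectation. So when testing multithreaded code, narrow the scope of each test to the finest granularity you can, and try hard to create the conditions under which things go wrong.</p>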
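<p>The masking effect described above can be replayed deterministically. The sketch below is not Tpool code: <code class="language-plaintext highlighter-rouge">UnsafeUpdate</code> and <code class="language-plaintext highlighter-rouge">RunLostUpdateScenario</code> are hypothetical names, and instead of real threads the unlucky interleaving is written out by hand, with each unsynchronized <code class="language-plaintext highlighter-rouge">++i</code> or <code class="language-plaintext highlighter-rouge">--i</code> split into its read half and its write half.</p>

```cpp
#include <cassert>

// Hypothetical model of one unsynchronized read-modify-write on a shared int,
// split into its two halves so that an interleaving can be replayed by hand.
struct UnsafeUpdate
{
    int snapshot; // value read from the shared variable
    int delta;    // +1 models ++i, -1 models --i

    explicit UnsafeUpdate(int d) : snapshot(0), delta(d) {}
    void Read(const int& shared) { snapshot = shared; }
    void Write(int& shared) const { shared = snapshot + delta; }
};

// Replays the bad interleaving: both tasks of each pair read the shared
// variable before either of them writes it back.
inline int RunLostUpdateScenario(int initial, int* afterIncrements)
{
    int i = initial;
    UnsafeUpdate inc1(+1), inc2(+1), dec1(-1), dec2(-1);

    inc1.Read(i);
    inc2.Read(i);  // both increments see `initial`
    inc1.Write(i);
    inc2.Write(i); // i == initial + 1: one increment was lost
    *afterIncrements = i;

    dec1.Read(i);
    dec2.Read(i);  // both decrements see initial + 1
    dec1.Write(i);
    dec2.Write(i); // i == initial: one decrement was lost as well
    return i;
}
```

<p>A test that only compares the final value with the initial one passes here even though two of the four updates were lost, whereas a check on the intermediate value after the increments catches the problem.</p>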
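<p>For example, when testing <code class="language-plaintext highlighter-rouge">WorkerThread</code> I need to test the <code class="language-plaintext highlighter-rouge">Cancel</code> method. <code class="language-plaintext highlighter-rouge">Cancel</code> is defined to block from the moment it is called until the <code class="language-plaintext highlighter-rouge">WorkerThread</code> has finished executing.</p>
<p>Under what circumstances can <code class="language-plaintext highlighter-rouge">Cancel</code> go wrong? Or, put differently, when is my <code class="language-plaintext highlighter-rouge">Cancel</code> function actually useful? The most important scenario is a thread that is running while I call <code class="language-plaintext highlighter-rouge">Cancel</code> from another thread. Tasks that would have run after <code class="language-plaintext highlighter-rouge">Cancel</code> must not be executed.</p>
<p>So how can this condition be reproduced?</p>
<p>First, pin down the two conditions to test:</p>
<ol>
<li>Between the start and the end of the <code class="language-plaintext highlighter-rouge">Cancel</code> call, the thread goes from the running state to the finished state.</li>
<li>After <code class="language-plaintext highlighter-rouge">Cancel</code>, the thread does not execute any more tasks.</li>
</ol>
<p>These conditions can be tested with the following scenario: each task runs for 2 seconds and, at the end of those 2 seconds, adds 1 to a global variable i. Two such tasks are pushed onto the task queue. The main thread sleeps for 1 second and then calls Cancel on the worker thread.</p>
<p>For condition 1, if <code class="language-plaintext highlighter-rouge">Cancel</code> returned without waiting for the thread to finish, the value of i right after the return would be unchanged from before the thread ran. If <code class="language-plaintext highlighter-rouge">Cancel</code> works correctly, the value has changed and is exactly 1 larger than before: only one task was executed, and the thread exited without running the second.</p>
<p>For condition 2, after the thread has finished, i must still have the value it had right after <code class="language-plaintext highlighter-rouge">Cancel</code>, and the task queue must still contain one task.</p>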
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">TEST(WorkerThread, test_Cancel)
{
    int counter = 0;
    TaskQueueBase::Ptr q(new LinearTaskQueue);
    {
        WorkerThread t(q);
        q-&gt;Push(TaskBase::Ptr(new TestTask(counter)));
        q-&gt;Push(TaskBase::Ptr(new TestTask(counter)));
        sleep(1);
        t.Cancel();
        // expect WorkerThread run only one task
        ASSERT_EQ(1, counter);
    }
    ASSERT_EQ(1, counter);
    ASSERT_EQ(1, q-&gt;Size());
}
</code></pre></figure>
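<p>As the unit test above shows, I deliberately construct a timing conflict to check whether my function behaves as I expect, and that is how the test does its job.</p>
<p>As for liveness testing [1], I have not yet tested it explicitly in my unit tests. I only go back to the corresponding unit test when an unexpected hang shows up during testing.</p>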
<h1 id="references">References</h1>
<ol>
<li>《JAVA并发编程实战》</li>
</ol>
Tpool Thread Pool Library Implementation Notes (5)
2012-05-22T00:00:00+00:00
http://airekans.github.io/multi-threaded/2012/05/22/implementation-of-tpool5
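<p>Besides the main classes in Tpool (the thread pool, the task queue, the task, and the worker thread), some auxiliary utility classes are needed. The main ones are described below.</p>
<h1 id="mutex">Mutex</h1>
<p>A mutex (mutual exclusion lock) is one of the main mechanisms for synchronizing threads. The C interface on Linux is somewhat inconvenient for C++ programmers, because C++ code generally uses <a href="http://en.wikipedia.org/wiki/Resource_Acquisition_Is_Initialization" title="Resource Acquisition Is Initialization">RAII</a> to manage resources automatically; without it, the cost of managing them is rather high.</p>
<p>In C++ and other languages that support RAII, you typically write the functions that acquire and release a resource, and the program then releases the resource for you automatically when the enclosing scope ends. Most multithreading libraries today wrap the C interface with RAII, for example boost::thread and wx.</p>
<p>Tpool does the same. Mutex is defined as follows:</p>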
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">class Mutex : private boost::noncopyable {
    friend class MutexLocker;
    friend class MutexWaitLocker;

public:
    Mutex();
    ~Mutex();

private:
    // These two functions can only be called by MutexLocker
    void Lock();
    void Unlock();

    pthread_mutex_t m_mutex;
};
</code></pre></figure>
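<p>Note that I made both Lock and Unlock private so that they are not exposed to outside code. My feeling is that once an interface is exposed, programmers are always tempted to use it, so I designed this library on the principle of making it as hard as possible for users to do the wrong thing. The catch is that once these functions are private, closely related classes cannot call them either. For now I handle this with friend declarations.</p>
<p>The Mutex is then locked and unlocked automatically through a Locker:</p>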
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">MutexLocker::MutexLocker(Mutex&amp; m)
: m_mutex(m)
{
    m_mutex.Lock();
}

MutexLocker::~MutexLocker()
{
    m_mutex.Unlock();
}
</code></pre></figure>
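<p>To see how a caller benefits from this design, here is a minimal self-contained sketch. It is not the Tpool source: the constructor and destructor bodies of <code class="language-plaintext highlighter-rouge">Mutex</code> are assumed to call <code class="language-plaintext highlighter-rouge">pthread_mutex_init</code> and <code class="language-plaintext highlighter-rouge">pthread_mutex_destroy</code>, copying is disabled by hand instead of through <code class="language-plaintext highlighter-rouge">boost::noncopyable</code>, and <code class="language-plaintext highlighter-rouge">Counter</code> is a hypothetical user class.</p>

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

class MutexLocker;

// Stand-in for Tpool's Mutex with the same shape. The bodies are assumptions.
class Mutex {
    friend class MutexLocker;
public:
    Mutex() { pthread_mutex_init(&m_mutex, NULL); }
    ~Mutex() { pthread_mutex_destroy(&m_mutex); }
private:
    Mutex(const Mutex&);            // not copyable
    Mutex& operator=(const Mutex&); // not assignable
    void Lock() { pthread_mutex_lock(&m_mutex); }
    void Unlock() { pthread_mutex_unlock(&m_mutex); }
    pthread_mutex_t m_mutex;
};

// RAII guard: locks in the constructor, unlocks in the destructor.
class MutexLocker {
public:
    explicit MutexLocker(Mutex& m) : m_mutex(m) { m_mutex.Lock(); }
    ~MutexLocker() { m_mutex.Unlock(); }
private:
    MutexLocker(const MutexLocker&);
    MutexLocker& operator=(const MutexLocker&);
    Mutex& m_mutex;
};

// Hypothetical user class: every access to m_value goes through a locker.
class Counter {
public:
    Counter() : m_value(0) {}
    int Increment()
    {
        MutexLocker l(m_mutex);
        return ++m_value; // the lock is released when l is destroyed
    }
    int Get() const
    {
        MutexLocker l(m_mutex);
        return m_value;
    }
private:
    mutable Mutex m_mutex;
    int m_value;
};
```

<p>Because <code class="language-plaintext highlighter-rouge">Lock</code> and <code class="language-plaintext highlighter-rouge">Unlock</code> are private, <code class="language-plaintext highlighter-rouge">Counter</code> cannot forget to unlock on an early return or an exception. With a default (non-recursive) pthread mutex, a second call to <code class="language-plaintext highlighter-rouge">Increment</code> would block forever if the first had not released the lock.</p>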
<h1 id="conditionvariable">ConditionVariable</h1>
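<p>Condition variables are an important means of synchronization. For example, when implementing the task queue, if a consumer thread tries to take a task from an empty queue, one option is to block the thread there and wake it up once the queue is no longer empty.</p>
<p>The classic way to implement this kind of waiting is a condition variable. pthread provides a C interface for condition variables, and Tpool wraps it in a thin layer.</p>
<p>Because a condition variable is always associated with a mutex, my implementation requires a Mutex to be passed in when the condition variable is constructed. The definition is as follows:</p>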
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">class ConditionVariable : private boost::noncopyable {
    friend class ConditionWaitLocker;
    friend class ConditionNotifyLocker;
    friend class ConditionNotifyAllLocker;

public:
    explicit ConditionVariable(Mutex&amp; m);
    ~ConditionVariable();

private:
    void Notify();
    void NotifyAll();
    void Wait();

    void Lock();
    void Unlock();

    Mutex&amp; m_mutex;
    pthread_cond_t m_cond;
};
</code></pre></figure>
<p>As the definition shows, the user must guarantee that the Mutex stays valid throughout the lifetime of the <code class="language-plaintext highlighter-rouge">ConditionVariable</code> (that is, the Mutex must live at least as long as the ConditionVariable).<br />
The most important functions are <code class="language-plaintext highlighter-rouge">Notify</code> and <code class="language-plaintext highlighter-rouge">NotifyAll</code>, which are thin wrappers around <code class="language-plaintext highlighter-rouge">pthread_cond_signal</code> and <code class="language-plaintext highlighter-rouge">pthread_cond_broadcast</code> respectively.</p>
<p>UNPv2 [1] describes several classic patterns for waiting and waking. Here I implement these patterns as classes to reduce the chance of user error.</p>
<p>Waiting is done through <code class="language-plaintext highlighter-rouge">ConditionWaitLocker</code>, used as follows:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">{
    // Assumption: ConditionWaitLocker takes the same kind of arguments as
    // ConditionNotifyLocker (the condition variable and a predicate functor).
    sync::ConditionWaitLocker l(cond, WaitFunc());
    // code that relies on the awaited condition goes here
}
</code></pre></figure>
<p>Waking is done through <code class="language-plaintext highlighter-rouge">ConditionNotifyLocker</code> and <code class="language-plaintext highlighter-rouge">ConditionNotifyAllLocker</code>, used as follows:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">{
    sync::ConditionNotifyLocker l(condition, NotifyFunc());
    WAIT_CONDITION = false; // update the condition that the waiting thread checks
}
</code></pre></figure>
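<p>For readers who want to try the wait and notify pattern without Tpool, here is a small self-contained sketch of the blocking-queue scenario mentioned at the start of this section. It assumes C++11 <code class="language-plaintext highlighter-rouge">std::mutex</code> and <code class="language-plaintext highlighter-rouge">std::condition_variable</code> in place of Tpool's wrappers, and <code class="language-plaintext highlighter-rouge">BlockingIntQueue</code> and <code class="language-plaintext highlighter-rouge">ProduceAndConsumeSum</code> are hypothetical names.</p>

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical blocking queue: Pop() waits while the queue is empty.
class BlockingIntQueue {
public:
    void Push(int value)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_items.push(value); // change the condition while holding the lock
        }
        m_notEmpty.notify_one(); // then wake up one waiting consumer
    }

    int Pop()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        // Re-check the condition in a loop: a wakeup can be spurious, and
        // another consumer may have emptied the queue in the meantime.
        while (m_items.empty())
        {
            m_notEmpty.wait(lock);
        }
        int value = m_items.front();
        m_items.pop();
        return value;
    }

private:
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::queue<int> m_items;
};

// Pushes 1..n from the calling thread while a consumer thread pops n values.
inline int ProduceAndConsumeSum(int n)
{
    BlockingIntQueue q;
    int sum = 0;
    std::thread consumer([&q, &sum, n] {
        for (int k = 0; k < n; ++k)
        {
            sum += q.Pop();
        }
    });
    for (int k = 1; k <= n; ++k)
    {
        q.Push(k);
    }
    consumer.join(); // after the join, reading sum is safe
    return sum;
}
```

<p>The two halves correspond to the two locker classes above: the waiting side locks, loops on the condition and waits, and the notifying side locks, changes the condition and signals.</p>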
<h1 id="references">References</h1>
<ol>
<li>《Unix Network Programming Vol.2》</li>
</ol>
Tpool Thread Pool Library Implementation Notes (4)
2012-05-21T00:00:00+00:00
http://airekans.github.io/multi-threaded/2012/05/21/implementation-of-tpool4
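<p>The previous post implemented the worker thread. This one implements what users care about most: the task.</p>
<p>The thread pool hides the details of the threads and exposes to users only the interfaces for adding tasks and controlling the lifecycle. So all a user has to do is wrap the work to be done in a task and hand it to the thread pool; the pool takes care of the rest.</p>
<p>A task therefore only needs to define an interface, and users define the task types they need. Note that a task defines only the interface and does not provide thread safety by itself: if tasks executed by several of the pool's threads use shared resources, the tasks themselves must ensure thread safety.</p>
<p>In addition, a task needs some basic lifecycle management methods so that it can be stopped when it runs for too long. Stopping tasks already came up earlier in <code class="language-plaintext highlighter-rouge">WorkerThread</code>.</p>
<p>The task in Tpool is defined as follows:</p>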
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">class TaskBase {
public:
    enum State {
        INIT,
        RUNNING,
        FINISHED,
        CANCELLED,
    };

    TaskBase();
    ~TaskBase() {}

    void Run();
    void Cancel();
    void CancelAsync();
    State GetState() const;

protected:
    void CheckCancellation() const;

private:
    virtual void DoRun() = 0;
};
</code></pre></figure>
<p>Of these, <code class="language-plaintext highlighter-rouge">Run</code> is the calling interface users care about most. Users must override <code class="language-plaintext highlighter-rouge">DoRun</code>; once the task is added to the thread pool, it will run normally.</p>
<p>For example, you can create a task like this:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">struct FakeTask : public TaskBase {
    virtual void DoRun()
    {
        sleep(2);
    }
};
</code></pre></figure>
<p>Then add the task to the thread pool with the following statements:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">LFixedThreadPool threadPool;
threadPool.AddTask(TaskBase::Ptr(new FakeTask));
</code></pre></figure>
<p>The task will then be executed.</p>
<p>With this basic interface in place, we still need to think about how to cancel a task. Suppose a task fetches a large amount of data from a URL. If the network is in bad shape, the task may run for a very long time, and if the task queue holds many such tasks, the worker threads get tied up by them, which hurts the thread pool's throughput. To prevent this, a cancellation mechanism similar to the worker thread's can be used.</p>
<p>The worker thread decides whether to exit by checking a flag, and tasks work the same way. A task checks a cancellation flag to decide whether it should be cancelled and, if so, throws an exit exception. Inside a task, however, we cannot decide in advance when the check should happen, so the timing is left to the user who implements the task. The task merely provides a function that performs the check. In Tpool this function is called <code class="language-plaintext highlighter-rouge">CheckCancellation</code> and is defined as follows:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">void TaskBase::CheckCancellation() const
{
    if (m_isRequestCancel)
    {
        throw TaskCancelException("cancel task");
    }
}
</code></pre></figure>
<p>When implementing a task, the user then has to make sure it performs the check every so often, for example:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">struct FakeTask : public TaskBase {
    virtual void DoRun()
    {
        for (int i = 0; i &lt; 1000; ++i)
        {
            CheckCancellation();
            sleep(2); // simulate a time-consuming operation
        }
    }
};
</code></pre></figure>
<p><code class="language-plaintext highlighter-rouge">Run</code> does not simply call <code class="language-plaintext highlighter-rouge">DoRun</code>. It first calls <code class="language-plaintext highlighter-rouge">CheckCancellation</code>, so that a task can also be cancelled before it has started running:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">void TaskBase::Run()
{
    try
    {
        CheckCancellation(); // check before running the task.
        SetState(RUNNING);
        DoRun();
        SetState(FINISHED);
    }
    catch (const TaskCancelException&amp;)
    {
        SetState(CANCELLED);
    }

    // wake up the thread that is waiting for this task to be cancelled.
    ConditionNotifyLocker(m_cancelCondition,
                          boost::bind(&amp;TaskBase::IsStopState, this));
}
</code></pre></figure>
<p>The remaining <code class="language-plaintext highlighter-rouge">Cancel</code> functions only need to set the cancellation flag.</p>
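<p>Putting the pieces of this section together, the following self-contained sketch shows the flag, the check and the exception working as one mechanism. It is a simplified stand-in rather than the Tpool source: <code class="language-plaintext highlighter-rouge">SketchTask</code>, <code class="language-plaintext highlighter-rouge">CancelledError</code> and <code class="language-plaintext highlighter-rouge">CountingTask</code> are hypothetical names, the flag is assumed to be a <code class="language-plaintext highlighter-rouge">std::atomic&lt;bool&gt;</code>, and the blocking <code class="language-plaintext highlighter-rouge">Cancel</code> with its condition variable is left out.</p>

```cpp
#include <atomic>
#include <cassert>
#include <stdexcept>

// Hypothetical exception type, standing in for TaskCancelException.
struct CancelledError : std::runtime_error {
    CancelledError() : std::runtime_error("cancel task") {}
};

// Simplified stand-in for TaskBase: non-blocking cancellation only.
class SketchTask {
public:
    enum State { INIT, RUNNING, FINISHED, CANCELLED };

    SketchTask() : m_cancelRequested(false), m_state(INIT) {}
    virtual ~SketchTask() {}

    void Run()
    {
        try
        {
            CheckCancellation(); // allows cancelling before the task starts
            m_state = RUNNING;
            DoRun();
            m_state = FINISHED;
        }
        catch (const CancelledError&)
        {
            m_state = CANCELLED;
        }
    }

    // Only sets the flag; the task notices it at its next check.
    void CancelAsync() { m_cancelRequested.store(true); }
    State GetState() const { return m_state; }

protected:
    void CheckCancellation() const
    {
        if (m_cancelRequested.load())
        {
            throw CancelledError();
        }
    }

private:
    virtual void DoRun() = 0;

    std::atomic<bool> m_cancelRequested;
    State m_state; // read only after Run() has returned in this sketch
};

// Hypothetical task: performs up to 10 steps and requests its own
// cancellation after `cancelAt` steps, standing in for another thread.
class CountingTask : public SketchTask {
public:
    explicit CountingTask(int cancelAt) : steps(0), m_cancelAt(cancelAt) {}
    int steps;

private:
    virtual void DoRun()
    {
        for (int i = 0; i < 10; ++i)
        {
            CheckCancellation();
            ++steps;
            if (steps == m_cancelAt)
            {
                CancelAsync();
            }
        }
    }
    int m_cancelAt;
};
```

<p>A task cancelled after three steps stops at its next check and ends in the <code class="language-plaintext highlighter-rouge">CANCELLED</code> state, and a task cancelled before <code class="language-plaintext highlighter-rouge">Run</code> never executes <code class="language-plaintext highlighter-rouge">DoRun</code> at all. How quickly a task reacts depends entirely on how often its <code class="language-plaintext highlighter-rouge">DoRun</code> calls the check.</p>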
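<p>At this point the main elements a thread pool needs are essentially all implemented. In the next few posts I will cover some of the utility classes used to implement this library and some experience with testing multithreaded programs.</p>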
Tpool Thread Pool Library Implementation Notes (3)
2012-05-20T00:00:00+00:00
http://airekans.github.io/multi-threaded/2012/05/20/implementation-of-tpool3
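<p>The previous post implemented a basic task queue. This one describes the implementation of the worker thread.</p>
<p>Tpool's worker thread is built on a thread class modeled on <code class="language-plaintext highlighter-rouge">boost::thread</code>.</p>
<p>A worker thread should provide the following:</p>
<ol>
<li>While running, a worker thread keeps fetching tasks from the task queue. As soon as it gets a task, it executes it, and when the task finishes it goes back to fetching.</li>
<li>It supports lifecycle management, i.e. users can start and stop the worker thread.</li>
</ol>
<p>Suppose we have a <a href="https://github.com/airekans/Tpool/blob/master/include/Thread.h" title="Tpool::Thread">basic thread</a> definition like this:</p>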
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">class Thread : private boost::noncopyable {
public:
    template &lt;typename Func&gt;
    explicit Thread(const Func&amp; f);
    ~Thread();

private:
    template &lt;typename Func&gt;
    static void* ThreadFunction(void* arg);

    pthread_t m_threadId;
    bool m_isStart;
};
</code></pre></figure>
<p>The key point is that the constructor takes a <code class="language-plaintext highlighter-rouge">Functor</code>, which is the function the thread will execute. The thread's destructor joins the thread, so the thread is joinable by default.</p>
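<p>One way such a class could be implemented on top of pthread is sketched below. This is an assumption about the general technique, not the actual Tpool source: <code class="language-plaintext highlighter-rouge">SketchThread</code>, <code class="language-plaintext highlighter-rouge">SetValue</code> and <code class="language-plaintext highlighter-rouge">RunAndJoin</code> are hypothetical names, the functor is copied to the heap and handed to a static trampoline, and copying is disabled by hand instead of through <code class="language-plaintext highlighter-rouge">boost::noncopyable</code>.</p>

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for the Thread class above.
class SketchThread {
public:
    template <typename Func>
    explicit SketchThread(const Func& f) : m_isStart(false)
    {
        Func* copy = new Func(f); // the new thread takes ownership of this copy
        if (pthread_create(&m_threadId, NULL,
                           &SketchThread::ThreadFunction<Func>, copy) != 0)
        {
            delete copy;
            throw std::runtime_error("failed to create thread");
        }
        m_isStart = true;
    }

    // Joinable by default: the destructor waits for the thread to finish.
    ~SketchThread()
    {
        if (m_isStart)
        {
            pthread_join(m_threadId, NULL);
        }
    }

private:
    SketchThread(const SketchThread&);            // not copyable
    SketchThread& operator=(const SketchThread&); // not assignable

    // Trampoline with the signature pthread_create expects. It recovers the
    // functor's real type from the template parameter and invokes it.
    template <typename Func>
    static void* ThreadFunction(void* arg)
    {
        Func* f = static_cast<Func*>(arg);
        try
        {
            (*f)();
        }
        catch (...)
        {
            // never let an exception escape from the thread function
        }
        delete f;
        return NULL;
    }

    pthread_t m_threadId;
    bool m_isStart;
};

// Hypothetical functor used to demonstrate the class.
struct SetValue {
    explicit SetValue(int* target) : m_target(target) {}
    void operator()() const { *m_target = 42; }
    int* m_target;
};

inline int RunAndJoin()
{
    int result = 0;
    {
        SetValue job(&result);
        SketchThread t(job);
    } // the destructor joins here, so the write to result is visible below
    return result;
}
```

<p>Throwing from the constructor when <code class="language-plaintext highlighter-rouge">pthread_create</code> fails matches the behavior the <code class="language-plaintext highlighter-rouge">WorkerThread</code> constructor relies on later in this post.</p>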
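<p>With the thread definition above, the obvious approach is to pass the Thread, on creation, a functor that loops forever taking tasks from the task queue.</p>
<p>First, the declaration of <code class="language-plaintext highlighter-rouge">WorkerThread</code>:</p>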
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">class WorkerThread {
private:
    enum State {
        INIT,
        RUNNING,
        FINISHED,
    };

public:
    typedef boost::shared_ptr&lt;WorkerThread&gt; Ptr;

    WorkerThread(TaskQueueBase::Ptr taskQueue);

    template &lt;typename FinishAction&gt;
    WorkerThread(TaskQueueBase::Ptr taskQueue, FinishAction action);

    ~WorkerThread();

    void Cancel();
    void CancelAsync();
    void CancelNow();
};
</code></pre></figure>
<p>The current <code class="language-plaintext highlighter-rouge">WorkerThread</code> is designed to start a new thread in its constructor rather than through a <code class="language-plaintext highlighter-rouge">Start</code> function. <code class="language-plaintext highlighter-rouge">Cancel</code> and its variants exist to manage the thread's lifecycle.</p>
<p>The constructor of <code class="language-plaintext highlighter-rouge">WorkerThread</code> is defined as follows:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">template &lt;typename FinishAction&gt;
WorkerThread::WorkerThread(TaskQueueBase::Ptr taskQueue,
                           FinishAction action)
{
    using boost::bind;

    m_taskQueue = taskQueue;

    // ensure that the thread is created successfully.
    while (true)
    {
        try
        {
            // check for the creation exception
            m_thread.reset(new Thread(bind(&amp;WorkerThread::
                                           ThreadFunction&lt;FinishAction&gt;,
                                           this, action)));
            break;
        }
        catch (const std::exception&amp; e)
        {
            ProcessError(e);
        }
    }
}
</code></pre></figure>
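<p>The while loop is there because Thread throws an exception when creation fails, and I need to be sure that by the time the <code class="language-plaintext highlighter-rouge">WorkerThread</code> constructor finishes, the thread has been constructed.</p>
<p><code class="language-plaintext highlighter-rouge">ThreadFunction</code> is the thread function, defined as follows:</p>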
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">template &lt;typename FinishAction&gt;
void WorkerThread::ThreadFunction(FinishAction action)
{
    WorkFunction();
    action(); // WorkerThread finished.
    NotifyFinished();
}
</code></pre></figure>
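<p>The thread first executes <code class="language-plaintext highlighter-rouge">WorkFunction</code> and then a functor passed in by the user, which is whatever the user wants done once the thread has finished its work. Finally it notifies any thread that is waiting for the <code class="language-plaintext highlighter-rouge">WorkerThread</code> to finish.</p>
<p>The most important part is <code class="language-plaintext highlighter-rouge">WorkFunction</code>, defined as follows:</p>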
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp">void WorkerThread::WorkFunction()
{
    SetState(RUNNING);
    while (true)
    {
        try
        {
            // 1. check cancel request
            CheckCancellation();

            // 2. fetch task from task queue
            GetTaskFromTaskQueue();

            // 2.5. check cancel request again
            CheckCancellation();

            // 3. perform the task
            if (m_runningTask)
            {
                if (dynamic_cast&lt;EndTask*&gt;(m_runningTask.get()) != NULL)
                {
                    break; // stop the worker thread.
                }
                else
                {
                    m_runningTask-&gt;Run();
                }
            }

            // 4. perform any post-task action
        }
        catch (const WorkerThreadExitException&amp;)
        {
            // stop the worker thread.
            break;
        }
        catch (...) // caught other exception
        {
            // continue
        }
    }
}
</code></pre></figure>
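<p>As you can see, <code class="language-plaintext highlighter-rouge">WorkFunction</code> simply uses a while loop to keep taking tasks from the task queue and running them. It also checks whether the task it took is of type <code class="language-plaintext highlighter-rouge">EndTask</code>; if it is, the user has asked the worker thread to finish, and the function can return.</p>
<p>The basic idea behind <code class="language-plaintext highlighter-rouge">WorkFunction</code> is fairly simple. Besides running tasks, however, it also has to support lifecycle management. Suppose a task is halfway through and the user wants to stop the worker thread: how to stop it becomes an important design question. The gentle approach of adding an <code class="language-plaintext highlighter-rouge">EndTask</code> to the task queue cannot guarantee that a worker stops promptly when several workers share one queue, because other tasks may be queued ahead of the <code class="language-plaintext highlighter-rouge">EndTask</code>.</p>
<p>To give users (mainly the thread pool) finer-grained control over a worker thread's lifecycle, I distinguish the following kinds of termination:</p>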
<ol>
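<li>Thread pool stop: the whole pool stops. The pool no longer accepts new tasks and adds <code class="language-plaintext highlighter-rouge">EndTask</code> to the task queue, so that the worker threads finish once the remaining tasks have run. This is the gentlest and safest way to stop.</li>
<li>Non-urgent worker stop: the worker takes no new tasks and finishes as soon as the task it is currently running completes.</li>
<li>Urgent worker stop: like the previous kind, but it also tries to abort the currently running task before stopping the thread.</li>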
</ol>
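<p>The first kind of stop can be sketched roughly as follows. This is only an illustration in C++11, not Tpool's code: <code class="language-plaintext highlighter-rouge">Task</code>, <code class="language-plaintext highlighter-rouge">CountingTask</code>, <code class="language-plaintext highlighter-rouge">SimpleQueue</code> and <code class="language-plaintext highlighter-rouge">DrainUntilEndTask</code> are hypothetical names, and the queue is a plain single-threaded <code class="language-plaintext highlighter-rouge">std::queue</code> so that only the sentinel logic is visible.</p>

```cpp
#include <memory>
#include <queue>

// Hypothetical stand-ins for Tpool's task types (assumptions for this sketch).
struct Task {
    virtual ~Task() {}
    virtual void Run() = 0;
};

// Sentinel task: carries no work, it only tells a worker to stop.
struct EndTask : Task {
    void Run() override {}
};

// A trivial task that counts how many times it has been run.
struct CountingTask : Task {
    explicit CountingTask(int& counter) : counter_(counter) {}
    void Run() override { ++counter_; }
    int& counter_;
};

typedef std::queue<std::shared_ptr<Task> > SimpleQueue;

// Runs tasks in FIFO order until an EndTask is dequeued or the queue is empty.
// Returns the number of tasks that were actually executed.
int DrainUntilEndTask(SimpleQueue& tasks) {
    int executed = 0;
    while (!tasks.empty()) {
        std::shared_ptr<Task> task = tasks.front();
        tasks.pop();
        if (dynamic_cast<EndTask*>(task.get()) != nullptr) {
            break;  // sentinel reached: leave the worker loop
        }
        task->Run();
        ++executed;
    }
    return executed;
}
```

<p>Everything queued before the <code class="language-plaintext highlighter-rouge">EndTask</code> still runs, which is why this stop is gentle but not prompt; anything queued after it is left untouched.</p>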
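<p>Apart from the first kind, the other two are fairly complicated to implement.</p>
<p>At first glance, <code class="language-plaintext highlighter-rouge">pthread_cancel</code> seems to provide exactly this. But don't forget that <code class="language-plaintext highlighter-rouge">pthread_cancel</code> is a very dangerous cancellation mechanism: in either asynchronous or deferred mode, a small slip (for example, a thread cancelled while it holds a lock and has no cleanup handler installed) easily leads to deadlock.</p>
<p>To avoid that kind of fragile implementation, a cancellation mechanism has to be built on top of the thread so that it can exit safely.</p>
<p>From the thread function's point of view, a worker is essentially a while loop that runs tasks, so it can exit by polling a flag and leaving the loop. That raises two questions:</p>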
<ol>
<li>When to check the flag.</li>
<li>How to leave the loop.</li>
</ol>
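<h1 id="什么时候查询flag">When to check the flag?</h1>
<p>Looking back at <code class="language-plaintext highlighter-rouge">WorkFunction</code>, you can see that I check the flag once before fetching a new task and once after fetching it; if the flag is set, the worker exits. Why these two points?</p>
<p>First, while a task is running, stopping it is the task's own responsibility (the Task implementation handles this), not the worker thread's. Everywhere else, the flag should be checked as often as possible to improve responsiveness.</p>
<p>The first check runs when the worker first enters the loop or has just finished a task. The second runs right after the worker has fetched a task: fetching can block, so a cancel request may arrive while the thread is waiting, which makes this check necessary as well.</p>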
<h1 id="怎么退出循环">How to leave the loop?</h1>
<p>Programming languages generally offer the following ways to bail out of deeply nested code:</p>
<ol>
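<li>Return values: a function's return value encodes an exit status, and every call site checks it and decides whether to return or carry on. This is very tedious for the programmer.</li>
<li>goto: when some condition occurs, jump straight to the error-handling logic, wherever the code currently is. (In C and C++ a goto cannot leave the current function.)</li>
<li>C++ exceptions: however many nested function calls deep the program is, throwing an exception transfers control to the matching handler. Under the hood, of course, an exception can be seen as a kind of goto.</li>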
</ol>
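<p>In terms of implementation effort, exceptions are the most convenient way to build the exit mechanism in C++. The C++ exception mechanism has plenty of pitfalls, but used carefully within a small scope it is safe enough.</p>
<p>So <code class="language-plaintext highlighter-rouge">CheckCancellation</code> throws an exception, and the while loop catches it in order to exit.</p>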
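<p>A minimal sketch of this flag-plus-exception idea is shown below. It assumes C++11 and is not Tpool's implementation: the <code class="language-plaintext highlighter-rouge">Worker</code> class, its <code class="language-plaintext highlighter-rouge">Cancel</code> and <code class="language-plaintext highlighter-rouge">Run</code> methods, and the use of a vector instead of a blocking task queue are all assumptions made to keep the example short.</p>

```cpp
#include <atomic>
#include <cstddef>
#include <exception>
#include <functional>
#include <vector>

// Exception used only to unwind out of the worker loop.
struct WorkerThreadExitException : std::exception {};

// Hypothetical worker: the task source is a plain vector for brevity.
class Worker {
public:
    Worker() : cancelRequested_(false), executed_(0) {}

    // May be called from another thread, or from inside a task.
    void Cancel() { cancelRequested_.store(true); }

    // Throws if cancellation was requested; otherwise does nothing.
    void CheckCancellation() const {
        if (cancelRequested_.load()) {
            throw WorkerThreadExitException();
        }
    }

    // Runs tasks in order; returns how many completed before the loop exited.
    int Run(const std::vector<std::function<void()> >& tasks) {
        std::size_t next = 0;
        while (true) {
            try {
                CheckCancellation();              // check before fetching
                if (next == tasks.size()) break;  // no more work in this sketch
                const std::function<void()>& task = tasks[next++];
                CheckCancellation();              // check again after fetching
                task();                           // perform the task
                ++executed_;
            } catch (const WorkerThreadExitException&) {
                break;                            // stop the worker loop
            } catch (...) {
                // A failing task must not kill the worker; keep looping.
            }
        }
        return executed_;
    }

private:
    std::atomic<bool> cancelRequested_;
    int executed_;
};
```

<p>Because the exit signal is an exception, <code class="language-plaintext highlighter-rouge">CheckCancellation</code> could also be called from helper functions several frames below the loop without threading a status code back up through every caller.</p>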
<p>With that, we have a simple, usable worker thread.</p>
Tpool Thread Pool Library Implementation Notes (2)
2012-05-20T00:00:00+00:00
http://airekans.github.io/multi-threaded/2012/05/20/implementation-of-tpool2
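<p>The previous part introduced the main concepts of a thread pool and the main public interface of <a href="https://github.com/airekans/Tpool">Tpool</a>. Earlier I also discussed several ways to wrap a thread. In Tpool I chose an approach similar to boost::thread: the thread class takes a functor that specifies what the thread runs.</p>
<p>In this part I describe the most important data structure in a thread pool, the task queue, and how Tpool implements it.</p>
<h1 id="什么是任务队列taskqueue">What is a task queue (TaskQueue)?</h1>
<p>A task queue is the data structure in which the thread pool stores the tasks that users submit; worker threads later take them out in some order and run them.</p>
<p>Tpool defines an abstract <code class="language-plaintext highlighter-rouge">TaskQueueBase</code> interface as follows:</p>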
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">namespace</span> <span class="n">tpool</span> <span class="p">{</span>
  <span class="k">class</span> <span class="nc">TaskQueueBase</span> <span class="p">{</span>
  <span class="nl">public:</span>
    <span class="k">typedef</span> <span class="n">boost</span><span class="o">::</span><span class="n">shared_ptr</span><span class="o">&lt;</span><span class="n">TaskQueueBase</span><span class="o">&gt;</span> <span class="n">Ptr</span><span class="p">;</span>
    <span class="k">virtual</span> <span class="o">~</span><span class="n">TaskQueueBase</span><span class="p">()</span> <span class="p">{}</span>
    <span class="c1">// add a task to the queue</span>
    <span class="k">virtual</span> <span class="kt">void</span> <span class="n">Push</span><span class="p">(</span><span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">task</span><span class="p">)</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="c1">// take the next task out of the queue</span>
    <span class="k">virtual</span> <span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">Pop</span><span class="p">()</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
    <span class="c1">// number of tasks currently queued</span>
    <span class="k">virtual</span> <span class="kt">size_t</span> <span class="n">Size</span><span class="p">()</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
  <span class="p">};</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
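<p>Every task queue implementation must follow this interface. <code class="language-plaintext highlighter-rouge">Push</code> adds a task to the queue, and <code class="language-plaintext highlighter-rouge">Pop</code> takes one out.</p>
<p>A queue that implements this interface stores and retrieves tasks in some order. Task queues are usually FIFO. Tpool ships a default implementation, <code class="language-plaintext highlighter-rouge">LinearTaskQueue</code>, which is an unbounded FIFO queue. It is also possible to implement a queue with task priorities, in which every task has a priority and Pop always returns the task with the highest one.</p>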
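<p>As an illustration of the priority variant mentioned above, here is a small sketch built on <code class="language-plaintext highlighter-rouge">std::priority_queue</code>. It is not part of Tpool: <code class="language-plaintext highlighter-rouge">PrioritizedTask</code>, <code class="language-plaintext highlighter-rouge">TaskPtr</code> and <code class="language-plaintext highlighter-rouge">PriorityTaskQueue</code> are hypothetical, and it uses C++11 standard-library primitives instead of Tpool's own synchronization classes.</p>

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <vector>

// Hypothetical task record: a larger `priority` means "run sooner".
struct PrioritizedTask {
    int priority;
    int id;
};
typedef std::shared_ptr<PrioritizedTask> TaskPtr;

// Same Push/Pop/Size shape as the interface above, but Pop returns the
// highest-priority task instead of the oldest one.
class PriorityTaskQueue {
public:
    void Push(TaskPtr task) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            tasks_.push(task);
        }
        notEmpty_.notify_one();
    }

    TaskPtr Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Block until there is at least one task.
        notEmpty_.wait(lock, [this] { return !tasks_.empty(); });
        TaskPtr task = tasks_.top();
        tasks_.pop();
        return task;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return tasks_.size();
    }

private:
    // Orders the heap so that the largest priority ends up on top.
    struct LowerPriority {
        bool operator()(const TaskPtr& a, const TaskPtr& b) const {
            return a->priority < b->priority;
        }
    };

    std::priority_queue<TaskPtr, std::vector<TaskPtr>, LowerPriority> tasks_;
    mutable std::mutex mutex_;
    std::condition_variable notEmpty_;
};
```

<p>One design caveat: a strict priority order can starve low-priority tasks if high-priority ones keep arriving, which a FIFO queue never does.</p>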
<h1 id="实现lineartaskqueue">Implementing LinearTaskQueue</h1>
<p><code class="language-plaintext highlighter-rouge">LinearTaskQueue</code> is declared as follows:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">namespace</span> <span class="n">tpool</span> <span class="p">{</span>
  <span class="k">class</span> <span class="nc">LinearTaskQueue</span> <span class="o">:</span> <span class="k">public</span> <span class="n">TaskQueueBase</span> <span class="p">{</span>
  <span class="nl">public:</span>
    <span class="k">virtual</span> <span class="kt">void</span> <span class="n">Push</span><span class="p">(</span><span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">task</span><span class="p">);</span>
    <span class="k">virtual</span> <span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">Pop</span><span class="p">();</span>
    <span class="k">virtual</span> <span class="kt">size_t</span> <span class="n">Size</span><span class="p">()</span> <span class="k">const</span><span class="p">;</span>
  <span class="nl">private:</span>
    <span class="k">typedef</span> <span class="n">std</span><span class="o">::</span><span class="n">queue</span><span class="o">&lt;</span><span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span><span class="o">&gt;</span> <span class="n">TaskQueueImpl</span><span class="p">;</span>
    <span class="n">TaskQueueImpl</span> <span class="n">m_tasks</span><span class="p">;</span>
    <span class="c1">// mutable so that the const Size() can lock it</span>
    <span class="k">mutable</span> <span class="n">sync</span><span class="o">::</span><span class="n">MutexConditionVariable</span> <span class="n">m_mutexCond</span><span class="p">;</span>
  <span class="p">};</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>I use <code class="language-plaintext highlighter-rouge">std::queue</code> as the underlying container. The crucial part is how <code class="language-plaintext highlighter-rouge">Push</code> and <code class="language-plaintext highlighter-rouge">Pop</code> are synchronized.</p>
<p>In a thread pool, several worker threads may well take tasks from the queue at the same time, so taking a task must be correct under concurrency. The pool may also be adding tasks while workers are taking them, so <code class="language-plaintext highlighter-rouge">Push</code> and <code class="language-plaintext highlighter-rouge">Pop</code> must be synchronized with each other as well.</p>
<p>There are several ways to synchronize a queue:</p>
<ol>
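<li>Single Lock: one mutex guards both Push and Pop.</li>
<li>Double Lock: two locks guard Push and Pop separately, so that readers and writers do not exclude each other, which improves throughput.</li>
<li>Non-blocking: an implementation that uses no locks at all. Java's <code class="language-plaintext highlighter-rouge">java.util.concurrent</code> package, for example, provides <code class="language-plaintext highlighter-rouge">ConcurrentLinkedQueue</code>, which takes this approach.</li>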
</ol>
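<p>To make the Double Lock option more concrete, here is a sketch in the style of the classic two-lock queue with a dummy head node. It is not taken from Tpool (which uses Single Lock); <code class="language-plaintext highlighter-rouge">TwoLockQueue</code> and <code class="language-plaintext highlighter-rouge">TryPop</code> are hypothetical names, the pop is non-blocking for brevity, and <code class="language-plaintext highlighter-rouge">T</code> is assumed to be default-constructible and copyable.</p>

```cpp
#include <atomic>
#include <mutex>

template <typename T>
class TwoLockQueue {
public:
    // The list always contains one dummy node; head_ points at it.
    TwoLockQueue() : head_(new Node()), tail_(head_) {}

    ~TwoLockQueue() {
        while (head_ != nullptr) {
            Node* next = head_->next.load();
            delete head_;
            head_ = next;
        }
    }

    TwoLockQueue(const TwoLockQueue&) = delete;
    TwoLockQueue& operator=(const TwoLockQueue&) = delete;

    // Producers only touch tail_, guarded by tailMutex_.
    void Push(const T& value) {
        Node* node = new Node();
        node->value = value;
        std::lock_guard<std::mutex> lock(tailMutex_);
        tail_->next.store(node, std::memory_order_release);
        tail_ = node;
    }

    // Consumers only touch head_, guarded by headMutex_.
    // Returns false if the queue is empty.
    bool TryPop(T& out) {
        std::lock_guard<std::mutex> lock(headMutex_);
        Node* first = head_->next.load(std::memory_order_acquire);
        if (first == nullptr) {
            return false;
        }
        out = first->value;
        delete head_;   // discard the old dummy node
        head_ = first;  // the node just read becomes the new dummy
        return true;
    }

private:
    struct Node {
        Node() : value(), next(nullptr) {}
        T value;
        // Atomic because a producer may write it while a consumer reads it
        // when only the dummy node is left.
        std::atomic<Node*> next;
    };

    Node* head_;
    Node* tail_;
    std::mutex headMutex_;
    std::mutex tailMutex_;
};
```

<p>The dummy node is what lets a producer and a consumer proceed in parallel: they never need the same lock, and the only field they can both touch is the <code class="language-plaintext highlighter-rouge">next</code> pointer of the dummy.</p>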
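<p>Roughly speaking, efficiency increases from 1 to 3, and so does implementation difficulty. <code class="language-plaintext highlighter-rouge">LinearTaskQueue</code> uses the simplest one, Single Lock.</p>
<p>First, note that <code class="language-plaintext highlighter-rouge">LinearTaskQueue</code> declares a <a href="https://github.com/airekans/Tpool/blob/master/include/ConditionVariable.h" title="ConditionVariable的实现">MutexConditionVariable</a>, which is a condition variable bound to a mutex. With a mutex alone, Pop would have to poll: if the queue is empty it has to wait until the queue becomes non-empty, and repeatedly locking and checking is very inefficient. With a condition variable there is no polling; a thread simply blocks while the queue is empty, which is more efficient.</p>
<p>Here is the implementation of Push:</p>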
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="kt">void</span> <span class="n">LinearTaskQueue</span><span class="o">::</span><span class="n">Push</span><span class="p">(</span><span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">task</span><span class="p">)</span>
<span class="p">{</span>
  <span class="n">ConditionNotifyAllLocker</span> <span class="n">l</span><span class="p">(</span><span class="n">m_mutexCond</span><span class="p">,</span>
                             <span class="n">bind</span><span class="p">(</span><span class="o">&</span><span class="n">TaskQueueImpl</span><span class="o">::</span><span class="n">empty</span><span class="p">,</span> <span class="o">&</span><span class="n">m_tasks</span><span class="p">));</span>
  <span class="n">m_tasks</span><span class="p">.</span><span class="n">push</span><span class="p">(</span><span class="n">task</span><span class="p">);</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
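<p>This locks <code class="language-plaintext highlighter-rouge">m_mutexCond</code> first and, if the queue is empty, notifies the waiting threads; then it adds the task to the queue. This lock → notify → set state sequence is a typical pattern, described in detail in UNPv1[1].</p>
<p><code class="language-plaintext highlighter-rouge">Pop</code> is implemented as follows:</p>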
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">LinearTaskQueue</span><span class="o">::</span><span class="n">Pop</span><span class="p">()</span>
<span class="p">{</span>
  <span class="c1">// wait until task queue is not empty</span>
  <span class="n">ConditionWaitLocker</span> <span class="n">l</span><span class="p">(</span><span class="n">m_mutexCond</span><span class="p">,</span>
                        <span class="n">bind</span><span class="p">(</span><span class="o">&</span><span class="n">TaskQueueImpl</span><span class="o">::</span><span class="n">empty</span><span class="p">,</span> <span class="o">&</span><span class="n">m_tasks</span><span class="p">));</span>
  <span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">task</span> <span class="o">=</span> <span class="n">m_tasks</span><span class="p">.</span><span class="n">front</span><span class="p">();</span>
  <span class="n">m_tasks</span><span class="p">.</span><span class="n">pop</span><span class="p">();</span>
  <span class="k">return</span> <span class="n">task</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>Similarly, <code class="language-plaintext highlighter-rouge">Pop</code> performs the following steps:</p>
<ol>
<li>Lock, then check whether the queue is empty; if it is, block.</li>
<li>Take a task from the queue and return it.</li>
</ol>
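<p>For readers who do not want to look up Tpool's locker classes, the same Push and Pop steps can be spelled out with C++11 standard-library primitives. The following is a sketch under that assumption, not the Tpool source: <code class="language-plaintext highlighter-rouge">BlockingQueue</code> is a hypothetical name, and the notify policy (wake all waiters only when the queue was empty) is copied from the description above.</p>

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>

template <typename T>
class BlockingQueue {
public:
    void Push(const T& item) {
        std::lock_guard<std::mutex> lock(mutex_);  // lock
        // Notify only on the empty -> non-empty transition, as in the article.
        if (items_.empty()) {
            notEmpty_.notify_all();                // notify
        }
        items_.push(item);                         // set state
    }  // the lock is released here; woken threads can now re-acquire it

    T Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        // Step 1: lock, then block for as long as the queue is empty.
        // The loop also guards against spurious wake-ups.
        while (items_.empty()) {
            notEmpty_.wait(lock);
        }
        // Step 2: take the oldest item and return it.
        T item = items_.front();
        items_.pop();
        return item;
    }

    std::size_t Size() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return items_.size();
    }

private:
    std::queue<T> items_;
    mutable std::mutex mutex_;
    std::condition_variable notEmpty_;
};
```

<p>Notifying before the push is safe here only because the mutex is still held: a woken thread cannot inspect the queue until Push has returned and released the lock.</p>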
<p>Note that <code class="language-plaintext highlighter-rouge">Push</code> uses <code class="language-plaintext highlighter-rouge">NotifyAll</code> rather than <code class="language-plaintext highlighter-rouge">Notify</code>: adding to the queue notifies all waiting threads instead of just one. One might ask: doesn't notifying everyone hurt performance? Wouldn't <code class="language-plaintext highlighter-rouge">Notify</code> work just as well?</p>
<p>As for the first question, the wake-up currently follows a fixed execution order:</p>
<ol>
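<li>Thread A takes the lock and wakes waiting thread B.</li>
<li>B runs and tries to take the lock after waking up, but A still holds it, so B blocks again.</li>
<li>A continues, sets the state to true, and unlocks.</li>
<li>B wakes up, takes the lock, performs its work, and unlocks.</li>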
</ol>
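<p>So the execution order is always A → B. If A uses <code class="language-plaintext highlighter-rouge">NotifyAll</code> to wake B, several threads try to take the lock at once and all of them block, but this phase is short, so the overhead is small. Besides, the implementation only wakes waiting threads when the queue is empty.</p>
<p>As for the second question, the answer is that <code class="language-plaintext highlighter-rouge">Notify</code> cannot simply replace <code class="language-plaintext highlighter-rouge">NotifyAll</code>. Imagine the following scenario:</p>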
<ol>
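<li>The queue is empty and two threads are waiting.</li>
<li>Another thread calls Push, which wakes one of them.</li>
<li>Suppose the woken thread has not been scheduled yet when another thread calls <code class="language-plaintext highlighter-rouge">Push</code> again. Note that <code class="language-plaintext highlighter-rouge">Notify</code> is not executed this time, because my condition for <code class="language-plaintext highlighter-rouge">Notify</code> is that the queue is empty, and now it is not.</li>
<li>Both waiting threads should have been woken, but only one actually was; the other keeps waiting and nobody will ever wake it.</li>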
</ol>
<p>The fix is simple: change the condition so that every <code class="language-plaintext highlighter-rouge">Push</code> calls <code class="language-plaintext highlighter-rouge">Notify</code> once. Whether that costs more or less than <code class="language-plaintext highlighter-rouge">NotifyAll</code> depends on how the operating system implements them.</p>
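<p>A sketch of the notify-on-every-push variant, again with C++11 primitives rather than Tpool's classes, might look like this. <code class="language-plaintext highlighter-rouge">NotifyEachPushQueue</code> and the demo helper <code class="language-plaintext highlighter-rouge">RunTwoConsumers</code> are hypothetical; the helper replays the two-consumer scenario above and returns the sum of the values the consumers received.</p>

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

class NotifyEachPushQueue {
public:
    void Push(int value) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(value);
        }
        // One wake-up per pushed item, whether or not the queue was empty.
        notEmpty_.notify_one();
    }

    int Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [this] { return !items_.empty(); });
        int value = items_.front();
        items_.pop();
        return value;
    }

private:
    std::queue<int> items_;
    std::mutex mutex_;
    std::condition_variable notEmpty_;
};

// Two consumers (which may already be blocked) and two back-to-back pushes.
// Both consumers must receive an item, so both joins return.
int RunTwoConsumers() {
    NotifyEachPushQueue queue;
    std::atomic<int> sum(0);
    std::thread first([&] { sum += queue.Pop(); });
    std::thread second([&] { sum += queue.Pop(); });
    queue.Push(1);
    queue.Push(2);
    first.join();
    second.join();
    return sum.load();
}
```

<p>With the "notify only when empty" condition and a single-thread notify, the second join in this demo could hang in the interleaving described above; notifying on every push removes that possibility.</p>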
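<p>An arguably better implementation borrows an idea from read-write locks: treat threads calling Pop as readers and threads calling <code class="language-plaintext highlighter-rouge">Push</code> as writers, keep track of how many readers are currently waiting, and call <code class="language-plaintext highlighter-rouge">Notify</code> only when that number is non-zero.</p>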
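<p>One way to read that suggestion is sketched below, assuming C++11 primitives. Nothing here is Tpool code: <code class="language-plaintext highlighter-rouge">WaiterCountingQueue</code>, its <code class="language-plaintext highlighter-rouge">Waiters</code> and <code class="language-plaintext highlighter-rouge">NotifyCalls</code> accessors, and the demo helper exist only to show that a notification is issued exactly when some thread is waiting.</p>

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>

class WaiterCountingQueue {
public:
    WaiterCountingQueue() : waiters_(0), notifyCalls_(0) {}

    void Push(int value) {
        std::lock_guard<std::mutex> lock(mutex_);
        items_.push(value);
        if (waiters_ > 0) {  // somebody is blocked in Pop
            ++notifyCalls_;
            notEmpty_.notify_one();
        }
    }

    int Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        while (items_.empty()) {
            ++waiters_;            // counted only while actually waiting
            notEmpty_.wait(lock);
            --waiters_;
        }
        int value = items_.front();
        items_.pop();
        return value;
    }

    int Waiters() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return waiters_;
    }

    int NotifyCalls() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return notifyCalls_;
    }

private:
    std::queue<int> items_;
    mutable std::mutex mutex_;
    std::condition_variable notEmpty_;
    int waiters_;
    int notifyCalls_;
};

// Demo helper: starts one consumer, waits until it is blocked in Pop,
// pushes one value, and reports how many notifications were issued.
int NotifyCallsWithOneWaiter() {
    WaiterCountingQueue queue;
    int received = 0;
    std::thread consumer([&] { received = queue.Pop(); });
    while (queue.Waiters() == 0) {
        std::this_thread::yield();
    }
    queue.Push(7);
    consumer.join();
    return received == 7 ? queue.NotifyCalls() : -1;
}
```

<p>The counter is read and written only under the mutex, so a push can never miss a waiter: either the consumer has already incremented it, or the consumer has not yet checked the queue and will see the new item without waiting.</p>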
<p>With that, a basic <code class="language-plaintext highlighter-rouge">TaskQueue</code> is complete.</p>
<h1 id="references">References</h1>
<ol>
<li><em>Unix Network Programming, vol. 1</em></li>
</ol>
Tpool Thread Pool Library Implementation Notes (1)
2012-05-18T00:00:00+00:00
http://airekans.github.io/multi-threaded/2012/05/18/implementation-of-tpool1
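<p>I had wanted to implement a thread pool for a while, to see what needs attention along the way. The earlier Thread implementation was preparation for this.</p>
<p>I implemented a pthread-based thread pool in C++.</p>
<p>Project page: <a href="https://github.com/airekans/Tpool">https://github.com/airekans/Tpool</a></p>
<h1 id="什么是线程池">What is a thread pool?</h1>
<p>As the name suggests, a thread pool is an object that keeps a group of threads running. In OO design, an object that holds a large number of pre-allocated resources is often called a pool: thread pool, database connection pool, memory pool.</p>
<h1 id="为什么要用线程池">Why use a thread pool?</h1>
<p>So why use a thread pool? Isn't it enough to create a new thread whenever one is needed?</p>
<p>Consider the following scenario:</p>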
<blockquote>
<p>An HTTP server that handles requests on a single thread is clearly not enough. To improve the server's concurrency, we handle requests with threads.</p>
</blockquote>
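<p>Since we are using threads, suppose we take the simple approach of creating a new thread for every incoming request. If the server receives 1000 requests within one second, it has to create 1000 threads to handle them. 1000 threads! Can you imagine the cost of the OS switching between all of them? Creating and destroying threads is not free either; with a one-to-one mapping between requests and threads, system overhead grows sharply as the number of requests increases. The system also limits the number of threads: a process can create at most <code class="language-plaintext highlighter-rouge">PTHREAD_THREADS_MAX</code> threads.</p>
<p>A thread pool addresses exactly this problem: it reduces the extra overhead of creating a new thread for each request. We create a few threads before any request arrives and hand incoming requests to them; when a thread finishes a request, it waits for the next one instead of exiting. This greatly reduces the cost of creating, switching between, and destroying threads.</p>
<p>In addition, a thread pool abstracts the interactions between threads, so users rarely have to worry about the tedious details of multithreaded programming.</p>
<h1 id="怎么实现线程池">How to implement a thread pool?</h1>
<p>There are many ways to implement a thread pool, but nearly all of them rely on the following concepts:</p>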
<ol>
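<li>Thread pool (ThreadPool): the overall public interface, responsible for accepting requests and so on. Users normally deal only with this interface.</li>
<li>Task: the work to be done. In the HTTP server example above, a task handles an HTTP request and returns the corresponding resource. Users hand these tasks to the thread pool for execution.</li>
<li>Task queue (TaskQueue): the pool has to store the tasks users submit so that threads can run them. A queue is the usual choice, because first in, first out (FIFO) matches how people expect requests to be handled in everyday life.</li>
<li>Worker thread (WorkerThread): a thread that processes requests. Worker threads are managed by the pool and invisible to the user. A worker takes a task from the task queue, runs it, and then takes the next one.</li>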
</ol>
<p>The following figure illustrates these concepts:</p>
<p><img src="/assets/img/ThreadPool.jpg" alt="ThreadPool" /></p>
<p>With these concepts in place, the interface of a thread pool looks roughly like this:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">ThreadPool</span> <span class="p">{</span>
<span class="nl">public:</span>
  <span class="n">ThreadPool</span><span class="p">(</span><span class="k">const</span> <span class="kt">size_t</span> <span class="n">threadNum</span> <span class="o">=</span> <span class="mi">10</span><span class="p">);</span>
  <span class="kt">bool</span> <span class="n">AddTask</span><span class="p">(</span><span class="n">TaskBase</span><span class="o">::</span><span class="n">Ptr</span> <span class="n">task</span><span class="p">);</span>
  <span class="kt">void</span> <span class="n">Stop</span><span class="p">();</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>The constructor argument of <code class="language-plaintext highlighter-rouge">ThreadPool</code> is the number of worker threads. <code class="language-plaintext highlighter-rouge">AddTask</code> adds a task to the pool, and <code class="language-plaintext highlighter-rouge">Stop</code> stops the pool.</p>
<p>With this interface, a user only has to create a Task and pass it to <code class="language-plaintext highlighter-rouge">AddTask</code>; there is no need to care how it gets executed, only to know that it will run asynchronously.</p>
Tag Lookup in Emacs
2012-05-18T00:00:00+00:00
http://airekans.github.io/emacs/2012/05/18/find-tag-in-emacs
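<p>In Emacs, looking up a symbol and jumping to its definition is done with etags, much as in Vim. After using it for a while, though, I found that etags jumps are sometimes not very smart for Python: they often land on an <code class="language-plaintext highlighter-rouge">import</code> statement instead of the <code class="language-plaintext highlighter-rouge">def</code>. This annoyed me enough that I resolved to find time to look at how Emacs implements the lookup and whether it could be improved.</p>
<p>First, a brief introduction to using etags. It usually takes the following steps:</p>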
<ol>
<li>
<p>In the root directory of the source tree, run: <code class="language-plaintext highlighter-rouge">find . -name '*.c' -exec etags -a {} \;</code></p>
<p>This generates a file named TAGS,
which is the default name Emacs looks for when finding tags.</p>
</li>
<li>
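<p>Open Emacs, put point on the symbol (a variable or function) whose definition you want to see, and press <code class="language-plaintext highlighter-rouge">M-.</code> (the period key),
or run <code class="language-plaintext highlighter-rouge">M-x find-tag</code> directly. Emacs then prompts for the location of the TAGS file; just enter it.</p>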
</li>
<li>
<p>Normally, at this point Emacs jumps to the definition of the symbol.</p>
</li>
</ol>
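<p>Before going further, let me clarify one concept: the tag. A tag is a symbol that etags has recognized and recorded as an identifier.</p>
<p>Step 1 above is about generating tags, and step 2 is about looking things up in the generated tags. So my question was mainly about step 2: how does Emacs look up a tag? (After understanding the mechanism, I found that for Python my problem actually lies in step 1, but more on that later.)</p>
<p>In Emacs, everything related to the <code class="language-plaintext highlighter-rouge">find-tag</code> function is defined in <code class="language-plaintext highlighter-rouge">etags.el</code>, the Emacs library that interfaces with etags.</p>
<p>Calling <code class="language-plaintext highlighter-rouge">find-tag</code> executes the following code:</p>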
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="code"><pre><span class="p">(</span><span class="nb">defun</span> <span class="nv">find-tag</span> <span class="p">(</span><span class="nv">tagname</span> <span class="k">&optional</span> <span class="nv">next-p</span> <span class="nv">regexp-p</span><span class="p">)</span>
<span class="p">(</span><span class="nv">interactive</span> <span class="p">(</span><span class="nv">find-tag-interactive</span> <span class="s">"Find tag: "</span><span class="p">))</span>
<span class="p">(</span><span class="k">let*</span> <span class="p">((</span><span class="nv">buf</span> <span class="p">(</span><span class="nv">find-tag-noselect</span> <span class="nv">tagname</span> <span class="nv">next-p</span> <span class="nv">regexp-p</span><span class="p">))</span> <span class="c1">;****</span>
<span class="p">(</span><span class="nv">pos</span> <span class="p">(</span><span class="nv">with-current-buffer</span> <span class="nv">buf</span> <span class="p">(</span><span class="nv">point</span><span class="p">))))</span>
<span class="p">(</span><span class="nv">condition-case</span> <span class="no">nil</span>
<span class="p">(</span><span class="nv">switch-to-buffer</span> <span class="nv">buf</span><span class="p">)</span>
<span class="p">(</span><span class="nb">error</span> <span class="p">(</span><span class="nv">pop-to-buffer</span> <span class="nv">buf</span><span class="p">)))</span>
<span class="p">(</span><span class="nv">goto-char</span> <span class="nv">pos</span><span class="p">)))</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>The real work is done by the call to <code class="language-plaintext highlighter-rouge">find-tag-noselect</code> on line 3, so let's see what <code class="language-plaintext highlighter-rouge">find-tag-noselect</code> does.</p>
<p>Here is the definition of <code class="language-plaintext highlighter-rouge">find-tag-noselect</code>:</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
</pre></td><td class="code"><pre><span class="p">(</span><span class="nb">defun</span> <span class="nv">find-tag-noselect</span> <span class="p">(</span><span class="nv">tagname</span> <span class="k">&optional</span> <span class="nv">next-p</span> <span class="nv">regexp-p</span><span class="p">)</span>
<span class="p">(</span><span class="nv">interactive</span> <span class="p">(</span><span class="nv">find-tag-interactive</span> <span class="s">"Find tag: "</span><span class="p">))</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">find-tag-history</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">tagname</span> <span class="nv">find-tag-history</span><span class="p">))</span>
<span class="c1">;; Save the current buffer's value of `find-tag-hook' before</span>
<span class="c1">;; selecting the tags table buffer. For the same reason, save value</span>
<span class="c1">;; of `tags-file-name' in case it has a buffer-local value.</span>
<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">local-find-tag-hook</span> <span class="nv">find-tag-hook</span><span class="p">))</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">eq</span> <span class="ss">'-</span> <span class="nv">next-p</span><span class="p">)</span>
<span class="c1">;; Pop back to a previous location.</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nv">ring-empty-p</span> <span class="nv">tags-location-ring</span><span class="p">)</span>
<span class="p">(</span><span class="nb">error</span> <span class="s">"No previous tag locations"</span><span class="p">)</span>
<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">marker</span> <span class="p">(</span><span class="nv">ring-remove</span> <span class="nv">tags-location-ring</span> <span class="mi">0</span><span class="p">)))</span>
<span class="p">(</span><span class="nb">prog1</span>
<span class="c1">;; Move to the saved location.</span>
<span class="p">(</span><span class="nv">set-buffer</span> <span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nv">marker-buffer</span> <span class="nv">marker</span><span class="p">)</span>
<span class="p">(</span><span class="nb">error</span> <span class="s">"The marked buffer has been deleted"</span><span class="p">)))</span>
<span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">marker-position</span> <span class="nv">marker</span><span class="p">))</span>
<span class="c1">;; Kill that marker so it doesn't slow down editing.</span>
<span class="p">(</span><span class="nv">set-marker</span> <span class="nv">marker</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">)</span>
<span class="c1">;; Run the user's hook. Do we really want to do this for pop?</span>
<span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'local-find-tag-hook</span><span class="p">))))</span>
<span class="c1">;; Record whence we came.</span>
<span class="p">(</span><span class="nv">ring-insert</span> <span class="nv">find-tag-marker-ring</span> <span class="p">(</span><span class="nv">point-marker</span><span class="p">))</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">and</span> <span class="nv">next-p</span> <span class="nv">last-tag</span><span class="p">)</span>
<span class="c1">;; Find the same table we last used.</span>
<span class="p">(</span><span class="nv">visit-tags-table-buffer</span> <span class="ss">'same</span><span class="p">)</span>
<span class="c1">;; Pick a table to use.</span>
<span class="p">(</span><span class="nv">visit-tags-table-buffer</span><span class="p">)</span>
<span class="c1">;; Record TAGNAME for a future call with NEXT-P non-nil.</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">last-tag</span> <span class="nv">tagname</span><span class="p">))</span>
<span class="c1">;; Record the location so we can pop back to it later.</span>
<span class="p">(</span><span class="k">let</span> <span class="p">((</span><span class="nv">marker</span> <span class="p">(</span><span class="nv">make-marker</span><span class="p">)))</span>
<span class="p">(</span><span class="nv">with-current-buffer</span>
<span class="c1">;; find-tag-in-order does the real work.</span>
<span class="p">(</span><span class="nv">find-tag-in-order</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nb">and</span> <span class="nv">next-p</span> <span class="nv">last-tag</span><span class="p">)</span> <span class="nv">last-tag</span> <span class="nv">tagname</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="nv">regexp-p</span>
<span class="nv">find-tag-regexp-search-function</span>
<span class="nv">find-tag-search-function</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="nv">regexp-p</span>
<span class="nv">find-tag-regexp-tag-order</span>
<span class="nv">find-tag-tag-order</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="nv">regexp-p</span>
<span class="nv">find-tag-regexp-next-line-after-failure-p</span>
<span class="nv">find-tag-next-line-after-failure-p</span><span class="p">)</span>
<span class="p">(</span><span class="k">if</span> <span class="nv">regexp-p</span> <span class="s">"matching"</span> <span class="s">"containing"</span><span class="p">)</span>
<span class="p">(</span><span class="nb">or</span> <span class="p">(</span><span class="nb">not</span> <span class="nv">next-p</span><span class="p">)</span> <span class="p">(</span><span class="nb">not</span> <span class="nv">last-tag</span><span class="p">)))</span>
<span class="p">(</span><span class="nv">set-marker</span> <span class="nv">marker</span> <span class="p">(</span><span class="nv">point</span><span class="p">))</span>
<span class="p">(</span><span class="nv">run-hooks</span> <span class="ss">'local-find-tag-hook</span><span class="p">)</span>
<span class="p">(</span><span class="nv">ring-insert</span> <span class="nv">tags-location-ring</span> <span class="nv">marker</span><span class="p">)</span>
<span class="p">(</span><span class="nv">current-buffer</span><span class="p">))))))</span>
</pre></td></tr></tbody></table></code></pre></figure>
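<p>First, when <code class="language-plaintext highlighter-rouge">next-p</code> is the symbol <code class="language-plaintext highlighter-rouge">-</code> (a negative prefix argument), <code class="language-plaintext highlighter-rouge">find-tag-noselect</code> jumps back to the previous tag location instead of going to the location of the current tag.<br />
Otherwise, <code class="language-plaintext highlighter-rouge">find-tag-noselect</code> records the current position, opens the tags table, and remembers the tag name, so that later calls can jump back or continue the search.</p>
<p>The most important part is the call to <code class="language-plaintext highlighter-rouge">find-tag-in-order</code>. As the name suggests, this function looks through the <code class="language-plaintext highlighter-rouge">tag-table</code> for the tag, one candidate at a time. More precisely, <code class="language-plaintext highlighter-rouge">find-tag-in-order</code> first uses a general search function to match the tag roughly, and then uses the functions in the order argument (a list of functions) to qualify the match against successive criteria.</p>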
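<p>The "rough match first, then qualify in order" idea can be sketched outside of Emacs as well. The C++ below is only an analogy and makes several assumptions: the tags table is modelled as a list of strings, and <code class="language-plaintext highlighter-rouge">Qualifier</code>, <code class="language-plaintext highlighter-rouge">IsDefinition</code>, <code class="language-plaintext highlighter-rouge">ContainsAnywhere</code> and <code class="language-plaintext highlighter-rouge">FindInOrder</code> are hypothetical names, not the predicates that <code class="language-plaintext highlighter-rouge">etags.el</code> actually uses.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// A qualifier decides whether a roughly matching entry is good enough.
typedef std::function<bool(const std::string& entry,
                           const std::string& pattern)> Qualifier;

// Strict qualifier (assumption): the entry starts with "def <pattern>".
bool IsDefinition(const std::string& entry, const std::string& pattern) {
    const std::string prefix = "def " + pattern;
    return entry.compare(0, prefix.size(), prefix) == 0;
}

// Loose qualifier (assumption): any rough match is accepted.
bool ContainsAnywhere(const std::string&, const std::string&) {
    return true;
}

// For each qualifier in turn, rescan the table from the top and return the
// index of the first entry that contains `pattern` and passes the qualifier.
// Returns -1 if no qualifier accepts any entry.
int FindInOrder(const std::vector<std::string>& table,
                const std::string& pattern,
                const std::vector<Qualifier>& order) {
    for (std::size_t q = 0; q < order.size(); ++q) {
        for (std::size_t i = 0; i < table.size(); ++i) {
            const bool roughMatch = table[i].find(pattern) != std::string::npos;
            if (roughMatch && order[q](table[i], pattern)) {
                return static_cast<int>(i);
            }
        }
    }
    return -1;
}
```

<p>With the strict qualifier listed first, an entry such as <code class="language-plaintext highlighter-rouge">def foo():</code> wins over an earlier <code class="language-plaintext highlighter-rouge">import foo</code>; with only the loose qualifier, the first rough match wins, which is the kind of behaviour complained about at the start of this post.</p>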
<p>Here is the definition of <code class="language-plaintext highlighter-rouge">find-tag-in-order</code>:</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
</pre></td><td class="code"><pre><span class="p">(</span><span class="nb">defun</span> <span class="nv">find-tag-in-order</span> <span class="p">(</span><span class="nv">pattern</span>
<span class="nv">search-forward-func</span>
<span class="nv">order</span>
<span class="nv">next-line-after-failure-p</span>
<span class="nv">matching</span>
<span class="nv">first-search</span><span class="p">)</span>
<span class="c1">;; State is saved so that the loop can be continued.</span>
<span class="p">(</span><span class="k">let</span> <span class="p">(</span><span class="nv">file</span> <span class="c1">;name of file containing tag</span>
<span class="nv">tag-info</span> <span class="c1">;where to find the tag in FILE</span>
<span class="p">(</span><span class="nv">first-table</span> <span class="no">t</span><span class="p">)</span>
<span class="p">(</span><span class="nv">tag-order</span> <span class="nv">order</span><span class="p">)</span>
<span class="p">(</span><span class="nv">match-marker</span> <span class="p">(</span><span class="nv">make-marker</span><span class="p">))</span>
<span class="nv">goto-func</span>
<span class="p">(</span><span class="nv">case-fold-search</span> <span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nv">memq</span> <span class="nv">tags-case-fold-search</span> <span class="o">'</span><span class="p">(</span><span class="no">nil</span> <span class="no">t</span><span class="p">))</span>
<span class="nv">tags-case-fold-search</span>
<span class="nv">case-fold-search</span><span class="p">))</span>
<span class="p">)</span>
<span class="p">(</span><span class="nv">save-excursion</span>
<span class="p">(</span><span class="k">if</span> <span class="nv">first-search</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">tag-lines-already-matched</span> <span class="no">nil</span><span class="p">)</span>
<span class="p">(</span><span class="nv">visit-tags-table-buffer</span> <span class="ss">'same</span><span class="p">))</span>
<span class="c1">;; Get a qualified match.</span>
<span class="p">(</span><span class="k">catch</span> <span class="ss">'qualified-match-found</span>
<span class="c1">;; Iterate over the list of tags tables.</span>
<span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nb">or</span> <span class="nv">first-table</span>
<span class="p">(</span><span class="nv">visit-tags-table-buffer</span> <span class="no">t</span><span class="p">))</span>
<span class="p">(</span><span class="nb">and</span> <span class="nv">first-search</span> <span class="nv">first-table</span>
<span class="c1">;; Start at beginning of tags file.</span>
<span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">point-min</span><span class="p">)))</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">first-table</span> <span class="no">nil</span><span class="p">)</span>
<span class="c1">;; Iterate over the list of ordering predicates.</span>
<span class="p">(</span><span class="nv">while</span> <span class="nv">order</span>
<span class="p">(</span><span class="nv">while</span> <span class="p">(</span><span class="nb">funcall</span> <span class="nv">search-forward-func</span> <span class="nv">pattern</span> <span class="no">nil</span> <span class="no">t</span><span class="p">)</span>
<span class="c1">;; Naive match found. Qualify the match.</span>
<span class="p">(</span><span class="nb">and</span> <span class="p">(</span><span class="nb">funcall</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">order</span><span class="p">)</span> <span class="nv">pattern</span><span class="p">)</span>
<span class="c1">;; Make sure it is not a previous qualified match.</span>
<span class="p">(</span><span class="nb">not</span> <span class="p">(</span><span class="nb">member</span> <span class="p">(</span><span class="nv">set-marker</span> <span class="nv">match-marker</span> <span class="p">(</span><span class="nv">save-excursion</span>
<span class="p">(</span><span class="nv">beginning-of-line</span><span class="p">)</span>
<span class="p">(</span><span class="nv">point</span><span class="p">)))</span>
<span class="nv">tag-lines-already-matched</span><span class="p">))</span>
<span class="p">(</span><span class="k">throw</span> <span class="ss">'qualified-match-found</span> <span class="no">nil</span><span class="p">))</span>
<span class="p">(</span><span class="k">if</span> <span class="nv">next-line-after-failure-p</span>
<span class="p">(</span><span class="nv">forward-line</span> <span class="mi">1</span><span class="p">)))</span>
<span class="c1">;; Try the next flavor of match.</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">order</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">order</span><span class="p">))</span>
<span class="p">(</span><span class="nv">goto-char</span> <span class="p">(</span><span class="nv">point-min</span><span class="p">)))</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">order</span> <span class="nv">tag-order</span><span class="p">))</span>
<span class="c1">;; We throw out on match, so only get here if there were no matches.</span>
<span class="c1">;; Clear out the markers we use to avoid duplicate matches so they</span>
<span class="c1">;; don't slow down editting and are immediately available for GC.</span>
<span class="p">(</span><span class="nv">while</span> <span class="nv">tag-lines-already-matched</span>
<span class="p">(</span><span class="nv">set-marker</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">tag-lines-already-matched</span><span class="p">)</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">)</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">tag-lines-already-matched</span> <span class="p">(</span><span class="nb">cdr</span> <span class="nv">tag-lines-already-matched</span><span class="p">)))</span>
<span class="p">(</span><span class="nv">set-marker</span> <span class="nv">match-marker</span> <span class="no">nil</span> <span class="no">nil</span><span class="p">)</span>
<span class="p">(</span><span class="nb">error</span> <span class="s">"No %stags %s %s"</span> <span class="p">(</span><span class="k">if</span> <span class="nv">first-search</span> <span class="s">""</span> <span class="s">"more "</span><span class="p">)</span>
<span class="nv">matching</span> <span class="nv">pattern</span><span class="p">))</span>
<span class="c1">;; Found a tag; extract location info.</span>
<span class="p">(</span><span class="nv">beginning-of-line</span><span class="p">)</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">tag-lines-already-matched</span> <span class="p">(</span><span class="nb">cons</span> <span class="nv">match-marker</span>
<span class="nv">tag-lines-already-matched</span><span class="p">))</span>
<span class="c1">;; Expand the filename, using the tags table buffer's default-directory.</span>
<span class="c1">;; We should be able to search for file-name backwards in file-of-tag:</span>
<span class="c1">;; the beginning-of-line is ok except when positioned on a "file-name" tag.</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">file</span> <span class="p">(</span><span class="nv">expand-file-name</span>
<span class="p">(</span><span class="k">if</span> <span class="p">(</span><span class="nv">memq</span> <span class="p">(</span><span class="nb">car</span> <span class="nv">order</span><span class="p">)</span> <span class="o">'</span><span class="p">(</span><span class="nv">tag-exact-file-name-match-p</span>
<span class="nv">tag-file-name-match-p</span>
<span class="nv">tag-partial-file-name-match-p</span><span class="p">))</span>
<span class="p">(</span><span class="nv">save-excursion</span> <span class="p">(</span><span class="nv">forward-line</span> <span class="mi">1</span><span class="p">)</span>
<span class="p">(</span><span class="nv">file-of-tag</span><span class="p">))</span>
<span class="p">(</span><span class="nv">file-of-tag</span><span class="p">)))</span>
<span class="nv">tag-info</span> <span class="p">(</span><span class="nb">funcall</span> <span class="nv">snarf-tag-function</span><span class="p">))</span>
<span class="c1">;; Get the local value in the tags table buffer before switching buffers.</span>
<span class="p">(</span><span class="k">setq</span> <span class="nv">goto-func</span> <span class="nv">goto-tag-location-function</span><span class="p">)</span>
<span class="p">(</span><span class="nv">tag-find-file-of-tag-noselect</span> <span class="nv">file</span><span class="p">)</span>
<span class="p">(</span><span class="nv">widen</span><span class="p">)</span>
<span class="p">(</span><span class="nv">push-mark</span><span class="p">)</span>
<span class="p">(</span><span class="nb">funcall</span> <span class="nv">goto-func</span> <span class="nv">tag-info</span><span class="p">)</span>
<span class="c1">;; Return the buffer where the tag was found.</span>
<span class="p">(</span><span class="nv">current-buffer</span><span class="p">))))</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>The most important part is the <code class="language-plaintext highlighter-rouge">while</code> loop. As long as <code class="language-plaintext highlighter-rouge">order</code> is not empty, it uses <code class="language-plaintext highlighter-rouge">search-forward-func</code> (<code class="language-plaintext highlighter-rouge">search-forward</code> by default) to search for the tag's pattern. When it finds a naive match, it calls <code class="language-plaintext highlighter-rouge">(car order)</code> to qualify that match more strictly. The functions in <code class="language-plaintext highlighter-rouge">order</code> are therefore the key to matching, so what matters is which match predicates the list contains. <code class="language-plaintext highlighter-rouge">order</code> is passed in from <code class="language-plaintext highlighter-rouge">find-tag-noselect</code>.</p>
<p><code class="language-plaintext highlighter-rouge">order</code> is <code class="language-plaintext highlighter-rouge">find-tag-regexp-tag-order</code> when the tag is a regular expression and <code class="language-plaintext highlighter-rouge">find-tag-tag-order</code> when the tag is a plain string. I focused on <code class="language-plaintext highlighter-rouge">find-tag-tag-order</code>, which defaults to the following list:</p>
<figure class="highlight"><pre><code class="language-cl" data-lang="cl"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">(</span><span class="nv">tag-exact-file-name-match-p</span>
<span class="nv">tag-file-name-match-p</span>
<span class="nv">tag-exact-match-p</span>
<span class="nv">tag-implicit-name-match-p</span>
<span class="nv">tag-symbol-match-p</span>
<span class="nv">tag-word-match-p</span>
<span class="nv">tag-partial-file-name-match-p</span>
<span class="nv">tag-any-match-p</span><span class="p">)</span>
</pre></td></tr></tbody></table></code></pre></figure>
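<p>So it first checks whether the tag exactly matches a file name, then tries to match it as a tag, and only if that still fails does it fall back to partial matching. In general, partial matching is attempted only when a tag cannot be found by an exact match. There is another possibility, though: the tags were generated incorrectly in the first place, so that places that are not definitions also became tags. And for Python, the tags that etags generates really are not all what we want!</p>
<p>When etags generates tags for Python, it also treats import statements as tags and writes them into the TAGS file. That is why, when I use Emacs's <code class="language-plaintext highlighter-rouge">find-tag</code> to jump to the definition of a tag, it often jumps to an import statement instead.</p>
<p>Here is a rough description of the tag-table format that etags generates (I have not read the etags source yet; this is inferred from looking at TAGS files).</p>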
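<p>The search order described above can be sketched outside Emacs. The C++ below is a simplified illustration, not a port of etags.el: <code class="language-plaintext highlighter-rouge">TagEntry</code>, <code class="language-plaintext highlighter-rouge">exact_match</code>, <code class="language-plaintext highlighter-rouge">any_match</code> and <code class="language-plaintext highlighter-rouge">find_tag</code> are hypothetical names, the table is a plain in-memory list rather than a TAGS buffer, and the bookkeeping for already-matched lines is left out.</p>

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One entry of a hypothetical, simplified tags table: the tag name and the
// source line it was generated from. (This is not the real TAGS format.)
struct TagEntry {
    std::string name;
    std::string source_line;
};

using MatchPredicate = std::function<bool(const TagEntry&, const std::string&)>;

// Strict predicate: the tag name equals the pattern.
bool exact_match(const TagEntry& e, const std::string& pattern) {
    return e.name == pattern;
}

// Loose predicate: the pattern occurs anywhere in the tag name.
bool any_match(const TagEntry& e, const std::string& pattern) {
    return e.name.find(pattern) != std::string::npos;
}

// Try the predicates from strictest to loosest. For each one, scan the table
// from the top and return the index of the first entry it accepts.
// Returns -1 when no predicate accepts any entry.
int find_tag(const std::vector<TagEntry>& table,
             const std::vector<MatchPredicate>& order,
             const std::string& pattern) {
    for (const MatchPredicate& qualifies : order) {
        for (std::size_t i = 0; i < table.size(); ++i) {
            if (qualifies(table[i], pattern)) {
                return static_cast<int>(i);
            }
        }
    }
    return -1;
}
```

<p>With a table that lists an import line and a definition under the same tag name, <code class="language-plaintext highlighter-rouge">find_tag</code> returns whichever entry comes first, because both satisfy the same strict predicate. That mirrors how an import recorded ahead of the definition can be reached first.</p>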
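<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>\x0C
<filename>,<No1>
<Matched tag line>\x7F[<tag>\x01]<Line num>,<No2>
.....
</code></pre></div></div>
<p>The section for each scanned file begins with a line containing only a form feed (\x0C). Within a tag line, \x7F separates the matched source text from the optional tag name, and \x01 ends the tag name. <No1> is the size in bytes of that file's tag data, and <No2> is the byte offset of the tag within the source file.</p>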
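<p>As a rough illustration of the tag-line layout above, the following C++ sketch splits one tag line into its fields. It assumes the separators are the bytes 0x7F and 0x01 and that the tag name may be omitted; <code class="language-plaintext highlighter-rouge">TagLine</code> and <code class="language-plaintext highlighter-rouge">parse_tag_line</code> are names made up for this example and are not part of etags.</p>

```cpp
#include <cstddef>
#include <string>

// Fields of one tag line: <text>\x7F[<name>\x01]<line>,<offset>
struct TagLine {
    std::string text;    // the matched source text
    std::string name;    // explicit tag name; empty when omitted
    std::string line;    // line number, kept as text
    std::string offset;  // the number after the comma, kept as text
    bool ok = false;     // true only when the line had the expected shape
};

TagLine parse_tag_line(const std::string& raw) {
    TagLine result;
    const std::size_t del = raw.find('\x7F');
    if (del == std::string::npos) return result;  // no text separator
    result.text = raw.substr(0, del);

    std::string rest = raw.substr(del + 1);
    const std::size_t soh = rest.find('\x01');
    if (soh != std::string::npos) {               // explicit tag name present
        result.name = rest.substr(0, soh);
        rest = rest.substr(soh + 1);
    }

    const std::size_t comma = rest.find(',');
    if (comma == std::string::npos) return result;  // missing "<line>,<offset>"
    result.line = rest.substr(0, comma);
    result.offset = rest.substr(comma + 1);
    result.ok = true;
    return result;
}
```

<p>The numbers are kept as strings on purpose, so the sketch makes no claim about their range or meaning beyond their position in the line.</p>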
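<p>Postscript:</p>
<p>Reading through the implementation of etags.el gave me a rough understanding of how an Emacs library is structured and written. I also picked up some debugging tricks: edebug makes it fairly easy to see what an elisp program is doing at run time. Working through this library showed me that Emacs's flexibility comes mainly from elisp, and that the plugins built on top of it are actually not all that mature. But elisp is quite readable, and Emacs users tend to put up with tinkering (or, put differently, are good at hacking the source and doing things themselves), so Emacs users end up calling Emacs a godlike tool. In fact I am not that proficient with it yet; I would say I am at the intermediate stage. Some Emacs features are quite good, and I think its extensibility is stronger than Vim's, although I still cannot type as fast in it as in Vim. It depends on how each person uses it.</p>
<p>To understand how an Emacs Lisp library is written, look at the <code class="language-plaintext highlighter-rouge">require</code> and <code class="language-plaintext highlighter-rouge">provide</code> functions, as well as <code class="language-plaintext highlighter-rouge">defgroup</code> and <code class="language-plaintext highlighter-rouge">defcustom</code>.</p>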
A Disaster Caused by a pthread C++ Wrapper
2012-04-12T00:00:00+00:00
http://airekans.github.io/cpp/2012/04/12/pthread-c-wrapper
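<p>While implementing a pthread thread pool in C++ recently, I looked into the ways a thread class can be implemented in C++. There are two main approaches:</p>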
<ol>
<li>A <code class="language-plaintext highlighter-rouge">Thread</code> base class: the user's thread class inherits from <code class="language-plaintext highlighter-rouge">Thread</code> and overrides a specific method of the parent class to supply the thread function.</li>
<li>A <code class="language-plaintext highlighter-rouge">Thread</code> class that defines a <code class="language-plaintext highlighter-rouge">Run()</code> function taking a <code class="language-plaintext highlighter-rouge">Functor</code> as its argument; when the thread runs, it executes that <code class="language-plaintext highlighter-rouge">Functor</code>.</li>
</ol>
<p>Approach 1 looks roughly like this:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">Thread</span> <span class="p">{</span>
<span class="k">static</span> <span class="kt">void</span><span class="o">*</span> <span class="n">ThreadFunc</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">Thread</span><span class="o">*</span> <span class="n">t</span> <span class="o">=</span> <span class="k">static_cast</span><span class="o"><</span><span class="n">Thread</span><span class="o">*></span><span class="p">(</span><span class="n">arg</span><span class="p">);</span>
<span class="n">t</span><span class="o">-></span><span class="n">Entry</span><span class="p">();</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="nl">public:</span>
<span class="n">Thread</span><span class="p">()</span> <span class="p">{}</span>
<span class="o">~</span><span class="n">Thread</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">pthread_join</span><span class="p">(</span><span class="n">m_id</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">);</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">Run</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">pthread_create</span><span class="p">(</span><span class="o">&</span><span class="n">m_id</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">ThreadFunc</span><span class="p">,</span> <span class="k">this</span><span class="p">);</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="k">virtual</span> <span class="kt">void</span> <span class="n">Entry</span><span class="p">()</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">pthread_t</span> <span class="n">m_id</span><span class="p">;</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
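<p>Note that by design I want this thread class to be joinable and to join automatically in its destructor. That makes the class convenient to use, because the user does not have to worry about when the thread finishes.</p>
<p>Approach 2 looks roughly like this:</p>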
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">Thread</span> <span class="p">{</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">Func</span><span class="o">></span>
<span class="k">static</span> <span class="kt">void</span><span class="o">*</span> <span class="n">ThreadFunc</span><span class="p">(</span><span class="kt">void</span><span class="o">*</span> <span class="n">arg</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">auto_ptr</span><span class="o"><</span><span class="n">Func</span><span class="o">></span> <span class="n">f</span><span class="p">(</span><span class="k">static_cast</span><span class="o"><</span><span class="n">Func</span><span class="o">*></span><span class="p">(</span><span class="n">arg</span><span class="p">));</span>
<span class="p">(</span><span class="o">*</span><span class="n">f</span><span class="p">)();</span> <span class="c1">// call f</span>
<span class="k">return</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="p">}</span>
<span class="nl">public:</span>
<span class="k">template</span><span class="o"><</span><span class="k">typename</span> <span class="n">Func</span><span class="o">></span>
<span class="kt">void</span> <span class="n">Run</span><span class="p">(</span><span class="n">Func</span> <span class="n">f</span><span class="p">)</span>
<span class="p">{</span>
<span class="n">auto_ptr</span><span class="o"><</span><span class="n">Func</span><span class="o">></span> <span class="n">func</span><span class="p">(</span><span class="k">new</span> <span class="n">Func</span><span class="p">(</span><span class="n">f</span><span class="p">));</span>
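<span class="k">if</span> <span class="p">(</span><span class="n">pthread_create</span><span class="p">(</span><span class="o">&</span><span class="n">m_id</span><span class="p">,</span> <span class="nb">NULL</span><span class="p">,</span> <span class="n">ThreadFunc</span><span class="o"><</span><span class="n">Func</span><span class="o">></span><span class="p">,</span> <span class="n">func</span><span class="p">.</span><span class="n">get</span><span class="p">())</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span> <span class="p">{</span>
<span class="n">func</span><span class="p">.</span><span class="n">release</span><span class="p">();</span> <span class="c1">// the new thread now owns the functor copy</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="nl">private:</span>
<span class="n">pthread_t</span> <span class="n">m_id</span><span class="p">;</span>
<span class="p">};</span>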
</pre></td></tr></tbody></table></code></pre></figure>
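<p>To make approach 2 concrete, here is a minimal self-contained sketch of how it might be used. It is a variant for illustration, not the class above verbatim: it assumes C++11, uses <code class="language-plaintext highlighter-rouge">std::unique_ptr</code> in place of <code class="language-plaintext highlighter-rouge">auto_ptr</code>, adds a join in the destructor, and introduces a hypothetical functor <code class="language-plaintext highlighter-rouge">AppendHello</code> and helper <code class="language-plaintext highlighter-rouge">run_hello</code>.</p>

```cpp
#include <pthread.h>
#include <memory>
#include <string>

// Illustrative variant of approach 2: the thread body is a copied functor,
// and the destructor joins the thread if one was started.
class Thread {
    template <typename Func>
    static void* ThreadFunc(void* arg) {
        // Take ownership of the heap copy made in Run().
        std::unique_ptr<Func> f(static_cast<Func*>(arg));
        (*f)();
        return nullptr;
    }

public:
    Thread() : m_started(false) {}
    ~Thread() {
        if (m_started) pthread_join(m_id, nullptr);  // wait for the thread
    }
    Thread(const Thread&) = delete;
    Thread& operator=(const Thread&) = delete;

    template <typename Func>
    void Run(Func f) {
        std::unique_ptr<Func> func(new Func(f));
        if (pthread_create(&m_id, nullptr, ThreadFunc<Func>, func.get()) == 0) {
            func.release();  // the new thread owns the copy now
            m_started = true;
        }
    }

private:
    pthread_t m_id;
    bool m_started;
};

// Hypothetical functor: appends a greeting to a string owned by the caller.
struct AppendHello {
    std::string* out;
    void operator()() const { out->append("hello, world"); }
};

std::string run_hello() {
    std::string result;
    {
        Thread t;
        t.Run(AppendHello{&result});
    }  // ~Thread joins here, so the write to result has completed
    return result;
}
```

<p>Depending on the toolchain, building this may require the <code class="language-plaintext highlighter-rouge">-pthread</code> flag. The sketch only supports one <code class="language-plaintext highlighter-rouge">Run()</code> call per <code class="language-plaintext highlighter-rouge">Thread</code> object.</p>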
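<p>From a user's point of view, I find it more intuitive to write the thread function by inheriting from a class and overriding one of its virtual methods. For example, a thread class that prints “hello, world” can be written like this:</p>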
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">class</span> <span class="nc">HelloWorldThread</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Thread</span> <span class="p">{</span>
<span class="nl">public:</span>
<span class="kt">void</span> <span class="n">Entry</span><span class="p">()</span>
<span class="p">{</span>
<span class="n">cout</span> <span class="o"><<</span> <span class="s">"hello, world"</span> <span class="o"><<</span> <span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">};</span>
</pre></td></tr></tbody></table></code></pre></figure>
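<p>Approach 1 is comparatively intuitive to use, whereas approach 2 requires writing a separate <code class="language-plaintext highlighter-rouge">Functor</code>, and in a C++ without lambdas, it’s painful...</p>
<p>With approach 1, using the class is as simple as:</p>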
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">{</span>
<span class="n">HelloWorldThread</span> <span class="n">t</span><span class="p">;</span>
<span class="n">t</span><span class="p">.</span><span class="n">Run</span><span class="p">();</span> <span class="c1">// the thread starts running</span>
<span class="p">}</span> <span class="c1">// ~Thread joins the thread here</span>
</pre></td></tr></tbody></table></code></pre></figure>
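<p>Note that I expect the thread to finish automatically when the block exits.</p>
<p>Alas, the dream is rosy and reality is harsh! The implementation in approach 1 has a bug.
Suppose you write a unit test like the following:</p>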
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="p">{</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">10</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="n">HelloWorldThread</span> <span class="n">t</span><span class="p">;</span>
<span class="n">t</span><span class="p">.</span><span class="n">Run</span><span class="p">();</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
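<p>You will find that on most runs the program crashes partway through, and on Linux it reports a “pure virtual method called” error...
OMG, what happened?</p>
<p>The thing to notice is that the implementation in approach 1 creates a joinable thread by default, and that the destructor of <code class="language-plaintext highlighter-rouge">Thread</code> calls <code class="language-plaintext highlighter-rouge">pthread_join</code> on it, which guarantees the thread has finished by the time the object goes out of scope.
Since the error is “pure virtual method called”, the problem must be on the line <code class="language-plaintext highlighter-rouge">t->Entry();</code>, because that is the only line that calls a virtual function.
But we clearly overrode <code class="language-plaintext highlighter-rouge">Entry</code> in the subclass! And <code class="language-plaintext highlighter-rouge">Entry()</code> really is being called on a <code class="language-plaintext highlighter-rouge">HelloWorldThread</code>!!</p>
<p>Think carefully: at the moment <code class="language-plaintext highlighter-rouge">Entry()</code> is called, the object is not necessarily a <code class="language-plaintext highlighter-rouge">HelloWorldThread</code> any more.
<code class="language-plaintext highlighter-rouge">static void* ThreadFunc(void* arg)</code> runs in another thread, while <code class="language-plaintext highlighter-rouge">pthread_join</code> is called in the destructor of <code class="language-plaintext highlighter-rouge">Thread</code>, so the destructor and <code class="language-plaintext highlighter-rouge">ThreadFunc</code> do not run in the same thread.
Let's replay the incident. After we start the thread, suppose it has not been scheduled yet when we reach the closing brace. The destructor of <code class="language-plaintext highlighter-rouge">HelloWorldThread</code> runs first; it is empty, fine. Then the base-class destructor runs, which calls join and waits for the thread to finish. Note that inside the base-class destructor the object is no longer a <code class="language-plaintext highlighter-rouge">HelloWorldThread</code>; it is already a <code class="language-plaintext highlighter-rouge">Thread</code>. The <code class="language-plaintext highlighter-rouge">Entry</code> function of <code class="language-plaintext highlighter-rouge">Thread</code> is pure virtual, so if the thread starts running now, it calls the <code class="language-plaintext highlighter-rouge">Entry</code> of <code class="language-plaintext highlighter-rouge">Thread</code> (because at this point the object is a <code class="language-plaintext highlighter-rouge">Thread</code>). Bang!! That is how tragedies happen...
So the implementation in approach 1 is flawed: at the very least, users cannot rely on RAII to reclaim the thread automatically. Any thread class implemented this way requires the user to Join/Wait by hand before the derived part of the object is destroyed, or it crashes. Among the implementations I have looked at, wx does it this way, and it requires the user to call Wait explicitly on a joinable thread. I do not find this clean, because as soon as you require users to do something by hand, bugs creep in, and RAII, one of the key features of C++, is effectively thrown away. So I think approach 2 is the better implementation. It takes some getting used to, but habits can change. :)</p>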
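<p>The language rule behind the crash can be seen in isolation, without any threads. The sketch below is an assumption-laden illustration rather than the code from this post: <code class="language-plaintext highlighter-rouge">Base</code>, <code class="language-plaintext highlighter-rouge">Derived</code>, <code class="language-plaintext highlighter-rouge">Describe</code> and <code class="language-plaintext highlighter-rouge">destruction_log</code> are hypothetical names, and <code class="language-plaintext highlighter-rouge">Name()</code> is deliberately not pure virtual so that the call made during destruction is well defined and can be recorded.</p>

```cpp
#include <string>

// Records which override a virtual call reaches at different moments.
struct Base {
    explicit Base(std::string* log) : m_log(log) {}
    virtual ~Base() {
        // While ~Base runs, the dynamic type is Base, so this reaches Base::Name.
        m_log->append("in ~Base: " + Describe() + "\n");
    }
    std::string Describe() const { return Name(); }  // virtual call
    virtual std::string Name() const { return "Base"; }

protected:
    std::string* m_log;
};

struct Derived : Base {
    explicit Derived(std::string* log) : Base(log) {}
    ~Derived() override {
        // Still a Derived while ~Derived runs.
        m_log->append("in ~Derived: " + Describe() + "\n");
    }
    std::string Name() const override { return "Derived"; }
};

std::string destruction_log() {
    std::string log;
    {
        Derived d(&log);
        log.append("alive: " + d.Describe() + "\n");
    }  // ~Derived runs first, then ~Base
    return log;
}
```

<p>The log shows the same virtual call resolving to <code class="language-plaintext highlighter-rouge">Derived</code> while the object is alive and to <code class="language-plaintext highlighter-rouge">Base</code> once the base destructor is running. In approach 1 the second thread can make its call during that last window, and since <code class="language-plaintext highlighter-rouge">Entry</code> is pure virtual there, the call has nothing valid to land on.</p>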
<h1 id="references">References</h1>
<ol>
<li><a href="http://stackoverflow.com/questions/3160403/pure-virtual-method-called-when-implementing-a-boostthread-wrapper-interface">http://stackoverflow.com/questions/3160403/pure-virtual-method-called-when-implementing-a-boostthread-wrapper-interface</a></li>
</ol>
A Deep Look at switch in C
2012-04-06T00:00:00+00:00
http://airekans.github.io/c/2012/04/06/switch-case-in-c
<p>It all started when I came across the following rather odd piece of code in the wx source:</p>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">switch</span> <span class="p">(</span> <span class="n">level</span> <span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="n">wxLOG_Info</span><span class="p">:</span>
<span class="k">if</span> <span class="p">(</span> <span class="n">GetVerbose</span><span class="p">()</span> <span class="p">)</span> <span class="c1">// ***** Note here ****</span>
<span class="k">case</span> <span class="n">wxLOG_Message</span><span class="p">:</span>
<span class="p">{</span>
<span class="n">m_aMessages</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">szString</span><span class="p">);</span>
<span class="n">m_aSeverity</span><span class="p">.</span><span class="n">Add</span><span class="p">(</span><span class="n">wxLOG_Message</span><span class="p">);</span>
<span class="n">m_aTimes</span><span class="p">.</span><span class="n">Add</span><span class="p">((</span><span class="kt">long</span><span class="p">)</span><span class="n">t</span><span class="p">);</span>
<span class="n">m_bHasMessages</span> <span class="o">=</span> <span class="nb">true</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">...</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
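<p>The <code class="language-plaintext highlighter-rouge">switch</code> above splits the condition and the body of an if across two cases, and the strangest part is that this is actually legal.<br />
So what is going on?<br />
To answer that, we need to look at how switch is actually implemented.</p>
<p>Below I write a similar program in C and look at its assembly output.<br />
The following command produces the assembly output for the C program below:</p>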
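<p>Before looking at the implementation, it helps to pin down what the pattern does at run time. The following is a minimal sketch modelled on the snippet above, not wx code: <code class="language-plaintext highlighter-rouge">LogLevel</code>, its enumerators and <code class="language-plaintext highlighter-rouge">would_record</code> are hypothetical stand-ins, and the <code class="language-plaintext highlighter-rouge">verbose</code> parameter plays the role of <code class="language-plaintext highlighter-rouge">GetVerbose()</code>.</p>

```cpp
// Hypothetical stand-ins for the wx log levels used in the snippet above.
enum LogLevel { LOG_Info, LOG_Message, LOG_Other };

// Returns true when a message at `level` would be recorded.
bool would_record(LogLevel level, bool verbose) {
    bool recorded = false;
    switch (level) {
    case LOG_Info:
        if (verbose)   // only guards entry through `case LOG_Info`
    case LOG_Message:  // jumping here lands inside the body of the if
        {
            recorded = true;
        }
        break;
    default:
        break;
    }
    return recorded;
}
```

<p>Entering through <code class="language-plaintext highlighter-rouge">LOG_Info</code> evaluates the condition first, so the body runs only when <code class="language-plaintext highlighter-rouge">verbose</code> is true. Entering through <code class="language-plaintext highlighter-rouge">LOG_Message</code> jumps past the condition and always runs the body.</p>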
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>gcc -S -o switch.s switch.c
</code></pre></div></div>
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="k">const</span> <span class="kt">char</span> <span class="o">*</span><span class="n">argv</span><span class="p">[])</span>
<span class="p">{</span>
<span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">switch</span> <span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span>
<span class="p">{</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="nl">default:</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>Here is the assembly output. Note that it is in <a href="http://en.wikipedia.org/wiki/GNU_Assembler">GAS</a> syntax:</p>
<figure class="highlight"><pre><code class="language-gas" data-lang="gas"><table class="rouge-table"><tbody><tr><td class="code"><pre>.file "switch.c"
.text
.globl main
.type main, @function
main:
pushl %ebp
movl %esp, %ebp
subl $8, %esp
andl $-16, %esp
movl $0, %eax
addl $15, %eax
addl $15, %eax
shrl $4, %eax
sall $4, %eax
subl %eax, %esp
movl $1, -4(%ebp)
movl -4(%ebp), %eax
movl %eax, -8(%ebp)
cmpl $1, -8(%ebp)
je .L3
cmpl $2, -8(%ebp)
je .L5
jmp .L6
.L3:
cmpl $1, -4(%ebp)
jne .L4
.L5:
movl $2, -4(%ebp)
.L4:
movl $3, -4(%ebp)
jmp .L2
.L6:
movl $4, -4(%ebp)
.L2:
movl $0, %eax
leave
ret
.size main, .-main
.section .note.GNU-stack,"",@progbits
.ident "GCC: (GNU) 3.4.4 20050721 (Red Hat 3.4.4-2)"
</pre></td></tr></tbody></table></code></pre></figure>
<p>Let's look at the most important part:</p>
<figure class="highlight"><pre><code class="language-gas" data-lang="gas"><table class="rouge-table"><tbody><tr><td class="code"><pre>movl $1, -4(%ebp)
movl -4(%ebp), %eax
</pre></td></tr></tbody></table></code></pre></figure>
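<p>These two instructions correspond to <code class="language-plaintext highlighter-rouge">int i = 1;</code>: the first stores 1 into <code class="language-plaintext highlighter-rouge">i</code> (at <code class="language-plaintext highlighter-rouge">-4(%ebp)</code>), and the second loads <code class="language-plaintext highlighter-rouge">i</code> into <code class="language-plaintext highlighter-rouge">%eax</code> so that the switch can test it.</p>
<p>The part that follows is the switch:</p>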
<figure class="highlight"><pre><code class="language-gas" data-lang="gas"><table class="rouge-table"><tbody><tr><td class="code"><pre> movl %eax, -8(%ebp) # dispatch for switch (i)
cmpl $1, -8(%ebp)
je .L3
cmpl $2, -8(%ebp)
je .L5
jmp .L6
.L3: # case 1:
cmpl $1, -4(%ebp)
jne .L4
.L5: # case 2:
movl $2, -4(%ebp)
.L4:
movl $3, -4(%ebp)
jmp .L2
.L6: # default:
movl $4, -4(%ebp)
.L2:
</pre></td></tr></tbody></table></code></pre></figure>
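<p>As you can see, in this output the switch begins with a series of cmp instructions that check whether the variable i equals each of the case values, and on a match it jumps to the corresponding label. The logic is the same as using goto statements.<br />
The code for the individual cases is laid out contiguously in the assembly, which is why the condition and the body of an if can be split across cases as in the opening example.<br />
In effect, a switch in C behaves exactly like a set of goto statements. For example, the following switch:</p>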
<figure class="highlight"><pre><code class="language-cpp" data-lang="cpp"><table class="rouge-table"><tbody><tr><td class="code"><pre><span class="k">switch</span> <span class="p">(</span><span class="n">i</span><span class="p">)</span>
<span class="p">{</span>
<span class="k">case</span> <span class="mi">1</span><span class="p">:</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">case</span> <span class="mi">2</span><span class="p">:</span>
<span class="p">{</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="k">break</span><span class="p">;</span>
<span class="nl">default:</span>
<span class="n">i</span> <span class="o">=</span> <span class="mi">4</span><span class="p">;</span>
<span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<p>is equivalent to the following implementation with goto:</p>
<figure class="highlight"><pre><code class="language-gas" data-lang="gas"><table class="rouge-table"><tbody><tr><td class="code"><pre>if (i == 1)
goto L1;
else if (i == 2)
goto L2;
else
goto Ldefault;
L1:
if (i == 1)
L2:
{
i = 2;
}
i = 3;
goto Lend;
Ldefault:
i = 4;
Lend:
</pre></td></tr></tbody></table></code></pre></figure>
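<p>Of course, goto is essentially the C-language version of the jmp instruction.</p>
<p>PS: If you are interested, take a look at <a href="http://en.wikipedia.org/wiki/Duff%27s_device">Duff’s device</a> to see just how powerful and how tricky the switch statement can be.</p>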
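<p>The claimed equivalence can be checked by running both forms on the same inputs. The sketch below wraps the switch and the goto version from this post in two functions; the names <code class="language-plaintext highlighter-rouge">with_switch</code> and <code class="language-plaintext highlighter-rouge">with_goto</code> and the choice to return the final value of <code class="language-plaintext highlighter-rouge">i</code> are additions made for this example.</p>

```cpp
// The switch from the post, returning the final value of i.
int with_switch(int i) {
    switch (i) {
    case 1:
        if (i == 1)
    case 2:
        {
            i = 2;
        }
        i = 3;
        break;
    default:
        i = 4;
    }
    return i;
}

// The same control flow spelled out with goto.
int with_goto(int i) {
    if (i == 1)
        goto L1;
    else if (i == 2)
        goto L2;
    else
        goto Ldefault;
L1:
    if (i == 1)
L2:
    {
        i = 2;
    }
    i = 3;
    goto Lend;
Ldefault:
    i = 4;
Lend:
    return i;
}
```

<p>Both functions return 3 for the inputs 1 and 2 and return 4 for any other input.</p>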