
My View of Mutual Exclusion in the Kernel

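/* e4gle: While modifying the Linux source code I was plagued by a large number of mutual-exclusion problems in the kernel, which have to be solved with kernel locks. Most of them were eventually resolved, and I felt I should leave some notes behind, but I never had the time to write them. I happened upon this article and found it was exactly what I had wanted to write up, so I am sharing it here. On bottom halves and interrupts: in the TCP/IP bottom half you must never read or write files, or the kernel will panic. The enhancement I was adding to Linux did exactly that, which frustrated me for a long time. Discussion is welcome. */

by wheelz

Having read the earlier discussion, I have some thoughts of my own to put up for debate.

What needs to be made clear is that the choice of mutual-exclusion mechanism depends not on the size of the critical section but on its nature, and on which pieces of code, that is, which kernel execution paths, contend for it.

Strictly speaking, semaphores and spinlock_XXX are mutual-exclusion mechanisms at different levels: the former is implemented on top of the latter. It is a little like HTTP and TCP, which are both protocols but sit at different layers.

Take the semaphore first. It is a process-level mechanism, used for mutual exclusion over a resource among several processes. Although it lives in the kernel, the kernel execution path involved competes for the resource in the identity of a process and on that process's behalf. If it loses the competition there is a context switch and the process goes to sleep, but the CPU does not stop; it carries on running other execution paths. Conceptually this has no direct connection with whether the machine has one CPU or several. Only in the implementation of the semaphore itself does a multi-CPU system need a spinlock, to make access to the semaphore structure atomic.

Inside the kernel, the more common requirement is mutual exclusion of data access among the kernel's own execution paths. This is the most basic mutual-exclusion problem: keeping data modification atomic. The semaphore implementation depends on it too. On a single CPU the concern is mainly interrupts and bottom halves, so disabling and re-enabling interrupts is enough. On multiple CPUs there is additional interference from the other CPUs, so a spinlock is needed as well. Combining the two gives spinlock_XXX. Its defining property is that once a CPU enters spinlock_XXX it does nothing else: it spins until the lock is acquired. It follows that a critical section protected by spinlock_XXX must not stall, let alone context switch. It should access the data and leave as quickly as possible so that other, spinning execution paths can obtain the lock. This is the basic principle of spinlocks. If the current execution path really has to context switch, it must release the spinlock before calling schedule(); otherwise deadlock is likely. Interrupts and bottom halves have no process context and cannot context switch; they can only spin waiting for the lock, and once you have switched away, who knows when you will be back.

Because the original intent and purpose of a spinlock is to guarantee that data is modified atomically, there is no reason to linger inside a spinlock-protected critical section.

spinlock_XXX comes in many forms:

    spin_lock()/spin_unlock()
    spin_lock_irq()/spin_unlock_irq()
    spin_lock_irqsave()/spin_unlock_irqrestore()
    spin_lock_bh()/spin_unlock_bh()
    local_irq_disable()/local_irq_enable()
    local_bh_disable()/local_bh_enable()

So which one should be used in which situation? That depends on which kernel execution path you are in and which kernel execution paths you need to exclude. The main execution paths in the kernel are:

1 The kernel mode of a user process. There is a process context here; the kernel is mainly executing system calls and the like on behalf of the process.
2 Interrupts, exceptions, or traps. Conceptually there is no process context here, and no context switch is possible.
3 bottom_half. Conceptually there is no process context here either.
4 In addition, the same execution path may be running on another CPU at the same time.

Taking these four factors into account, we decide which form of spinlock to use by working out which of them access the data we need to protect:

    To exclude only other CPUs, use spin_lock/spin_unlock.
    To exclude irqs and other CPUs, use spin_lock_irq/spin_unlock_irq.
    To exclude irqs and other CPUs and also preserve the EFLAGS state, use spin_lock_irqsave/spin_unlock_irqrestore.
    To exclude bottom halves and other CPUs, use spin_lock_bh/spin_unlock_bh.
    To exclude only irqs, with no need to exclude other CPUs, use local_irq_disable/local_irq_enable.
    To exclude only bottom halves, with no need to exclude other CPUs, use local_bh_disable/local_bh_enable.

And so on. It is worth pointing out that, for one and the same piece of data, the form used may differ from one kernel execution path to another (see the example below).

Here is an example. The interrupt code has an array variable irq_desc[] whose elements are of type irq_desc_t. Each element is the descriptor for one irq and holds, among other things, that irq's handler functions. The irq_desc_t structure contains a spinlock that guarantees mutual exclusion when the descriptor is accessed (modified). A given element irq_desc[irq] is accessed from two kernel execution paths: first, when the irq's handler is installed (setup_irq), which usually happens during module initialization or system initialization; second, in the interrupt handling function (do_IRQ). The code is as follows:

    int setup_irq(unsigned int irq, struct irqaction * new)
    {
        int shared = 0;
        unsigned long flags;
        struct irqaction *old, **p;
        irq_desc_t *desc = irq_desc + irq;

        /*
         * Some drivers like serial.c use request_irq() heavily,
         * so we have to be careful not to interfere with a
         * running system.
         */
        if (new->flags & SA_SAMPLE_RANDOM) {
            /*
             * This function might sleep, we want to call it first,
             * outside of the atomic block.
             * Yes, this might clear the entropy pool if the wrong
             * driver is attempted to be loaded, without actually
             * installing a new handler, but is this really a problem,
             * only the sysadmin is able to do this.
             */
            rand_initialize_irq(irq);
        }

        /*
         * The following block of code has to be executed atomically
         */
    [1] spin_lock_irqsave(&desc->lock, flags);
        p = &desc->action;
        if ((old = *p) != NULL) {
            /* Can't share interrupts unless both agree to */
            if (!(old->flags & new->flags & SA_SHIRQ)) {
    [2]         spin_unlock_irqrestore(&desc->lock, flags);
                return -EBUSY;
            }

            /* add new interrupt at end of irq queue */
            do {
                p = &old->next;
                old = *p;
            } while (old);
            shared = 1;
        }

        *p = new;

        if (!shared) {
            desc->depth = 0;
            desc->status &= ~(IRQ_DISABLED | IRQ_AUTODETECT | IRQ_WAITING);
            desc->handler->startup(irq);
        }
    [3] spin_unlock_irqrestore(&desc->lock, flags);

        register_irq_proc(irq);
        return 0;
    }

    asmlinkage unsigned int do_IRQ(struct pt_regs regs)
    {
        /*
         * We ack quickly, we don't want the irq controller
         * thinking we're snobs just because some other CPU has
         * disabled global interrupts (we have already done the
         * INT_ACK cycles, it's too late to try to pretend to the
         * controller that we aren't taking the interrupt).
         *
         * 0 return value means that this irq is already being
         * handled by some other CPU. (or is disabled)
         */
        int irq = regs.orig_eax & 0xff; /* high bits used in ret_from_ code */
        int cpu = smp_processor_id();
        irq_desc_t *desc = irq_desc + irq;
        struct irqaction * action;
        unsigned int status;

        kstat.irqs[cpu][irq]++;
    [4] spin_lock(&desc->lock);
        desc->handler->ack(irq);
        /*
           REPLAY is when Linux resends an IRQ that was dropped earlier
           WAITING is used by probe to mark irqs that are being tested
         */
        status = desc->status & ~(IRQ_REPLAY | IRQ_WAITING);
        status |= IRQ_PENDING; /* we _want_ to handle it */

        /*
         * If the IRQ is disabled for whatever reason, we cannot
         * use the action we have.
         */
        action = NULL;
        if (!(status & (IRQ_DISABLED | IRQ_INPROGRESS))) {
            action = desc->action;
            status &= ~IRQ_PENDING; /* we commit to handling */
            status |= IRQ_INPROGRESS; /* we are handling it */
        }
        desc->status = status;

        /*
         * If there is no IRQ handler or it was disabled, exit early.
           Since we set PENDING, if another processor is handling
           a different instance of this same irq, the other processor
           will take care of it.
         */
        if (!action)
            goto out;

        /*
         * Edge triggered interrupts need to remember
         * pending events.
         * This applies to any hw interrupts that allow a second
         * instance of the same irq to arrive while we are in do_IRQ
         * or in the handler. But the code here only handles the _second_
         * instance of the irq, not the third or fourth. So it is mostly
         * useful for irq hardware that does not mask cleanly in an
         * SMP environment.
         */
        for (;;) {
    [5]     spin_unlock(&desc->lock);
            handle_IRQ_event(irq, &regs, action);
    [6]     spin_lock(&desc->lock);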
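
The selection rules given before the irq_desc example can be condensed into a small decision function. What follows is only a user-space sketch of that reasoning, not kernel code: the struct contenders type, its field names, the choose_lock() helper, and the "none" result for data nobody else touches are all hypothetical names introduced here for illustration. The function returns the name of the locking primitive the article's rules point to.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical description of who else may access the shared data. */
struct contenders {
    bool other_cpus;  /* the data may be accessed from another CPU      */
    bool irq;         /* an interrupt handler accesses the data         */
    bool bh;          /* a bottom half accesses the data                */
    bool save_flags;  /* the caller must preserve the interrupt state   */
};

/* Return the name of the lock form suggested by the article's rules. */
static const char *choose_lock(struct contenders c)
{
    if (c.other_cpus) {
        if (c.irq)
            return c.save_flags ? "spin_lock_irqsave" : "spin_lock_irq";
        if (c.bh)
            return "spin_lock_bh";
        return "spin_lock";
    }
    /* No other CPU involved: masking the local source is enough. */
    if (c.irq)
        return "local_irq_disable";
    if (c.bh)
        return "local_bh_disable";
    return "none"; /* assumption: nothing else contends for the data */
}
```

For instance, choose_lock((struct contenders){ .other_cpus = true, .irq = true, .save_flags = true }) yields "spin_lock_irqsave", the form setup_irq uses at [1]. The irq case is checked before the bh case on the assumption that a form which masks interrupts also covers data shared with bottom halves.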

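
The principle stated earlier, that a spinlock holder must get in, touch the data, and get out while everyone else busy-waits, can be illustrated outside the kernel. The sketch below is a user-space analogy built on C11 atomic_flag and POSIX threads; it is not the kernel's spinlock implementation, and demo_spin_lock(), demo_spin_unlock(), worker(), run_demo(), and ITERATIONS are names made up for this example. Two threads stand in for two CPUs contending for the same counter.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define ITERATIONS 100000

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static long shared_counter;

/* Spin until the flag was previously clear, i.e. until we own the lock. */
static void demo_spin_lock(atomic_flag *l)
{
    while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
        ; /* busy-wait: the caller does nothing else in the meantime */
}

/* Release the lock so that a spinning contender can proceed. */
static void demo_spin_unlock(atomic_flag *l)
{
    atomic_flag_clear_explicit(l, memory_order_release);
}

/* Each worker keeps its critical section to a single increment. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < ITERATIONS; i++) {
        demo_spin_lock(&demo_lock);
        shared_counter++;            /* short, non-sleeping critical section */
        demo_spin_unlock(&demo_lock);
    }
    return NULL;
}

/* Run two contending workers and return the final counter value. */
static long run_demo(void)
{
    pthread_t a, b;

    shared_counter = 0;
    pthread_create(&a, NULL, worker, NULL);
    pthread_create(&b, NULL, worker, NULL);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

With the lock in place run_demo() returns 2 * ITERATIONS; without it, increments from the two threads could be lost. If a worker slept while holding demo_lock, the other thread would burn CPU for the whole duration, which is the user-space counterpart of the article's warning against calling schedule() with a spinlock held. On older toolchains the program may need to be built with -pthread.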