Linux教程網

The concepts of the page cache and the buffer cache in Linux

 

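The page cache is described in detail in section 5.6, "Writing and reading files", of 《linux內核情景分析》 (Linux Kernel Scenario Analysis). The following is excerpted from there.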

The file system layer has three main data structures: the file structure, the dentry structure, and the inode structure.

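file structure: represents one context on the target file. Different processes can establish different contexts on the same file, and a single process can also establish several contexts by opening the same file more than once. The buffer queue therefore cannot be attached to the file structure, because these file structures are not shared with one another.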

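dentry structure: the structure that represents a file name. Links allow several dentry structures to correspond to one file (most directly hard links, which give a single inode several names), so the dentry structure is not in a one-to-one relationship with the file either, and the buffer queue cannot be attached to it.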

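inode structure: that leaves only the inode structure. The inode and the file are in a one-to-one relationship; one can say that the inode is the file. The inode carries an i_mapping pointer to an address_space structure, which in general is the inode's own embedded i_data member (that is, i_mapping points to inode->i_data), and the buffer queue lives in that structure.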

 

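What hangs on the buffer queue is memory pages, not record blocks. So when a process calls mmap() to map a file into its user space, it only has to set up the corresponding memory mapping tables, and the cached pages are mapped naturally into the process's user space. That is why the pointer is named i_mapping.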

 

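The concept of the radix tree is also needed here. Start with the figure (taken from 《深入linux內核架構》, Professional Linux Kernel Architecture):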

[Figure: structure of a radix tree]

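A radix tree is not a balanced tree. The tree itself is built from two kinds of data structures: the root and the non-leaf nodes. The root is a simple data structure that holds the height of the tree and a pointer to the first node of the tree. A node is essentially an array: count is the number of pointers in use in that node, and the remaining entries are pointers to nodes of the next level. At the leaf level the entries are pointers to page structures.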

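The node data structure also carries search tags, such as the dirty tag and the writeback tag, which make it possible to find quickly which parts of the tree contain tagged pages.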

 

 

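The buffer cache

Structurally, a block buffer consists of two parts:

1. The buffer head: holds all management data related to the state of the buffer, such as the block number, the length, and the access counter. The data itself is not stored directly after the buffer head, but in a separate area of physical memory that a pointer in the buffer head refers to.

2. The useful data, which is kept in specially allocated pages. These pages may also reside in the page cache at the same time.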

 

The buffer head:

 

/*
 * Historically, a buffer_head was used to map a single block
 * within a page, and of course as the unit of I/O through the
 * filesystem and block layers.  Nowadays the basic I/O unit
 * is the bio, and buffer_heads are used for extracting block
 * mappings (via a get_block_t call), for tracking state within
 * a page (via a page_mapping) and for wrapping bio submission
 * for backward compatibility reasons (e.g. submit_bh).
 */
struct buffer_head {
    unsigned long b_state;              /* buffer state bitmap (see bh_state_bits below) */
    struct buffer_head *b_this_page;    /* circular list of page's buffers: next buffer head */
    struct page *b_page;                /* the page this bh is mapped to (page descriptor) */

    sector_t b_blocknr;                 /* start block number (logical block on the device) */
    size_t b_size;                      /* size of mapping (block size) */
    char *b_data;                       /* pointer to data within the page */

    struct block_device *b_bdev;        /* block device descriptor */
    bh_end_io_t *b_end_io;              /* I/O completion callback */
    void *b_private;                    /* reserved for b_end_io (its data argument) */
    struct list_head b_assoc_buffers;   /* associated with another mapping */
    struct address_space *b_assoc_map;  /* mapping this buffer is associated with */
    atomic_t b_count;                   /* users using this buffer_head (usage counter) */
};

 

 

Common state flags of the buffer head:

enum bh_state_bits {
    BH_Uptodate,      /* Contains valid data */
    BH_Dirty,         /* Is dirty */
    BH_Lock,          /* Is locked */
    BH_Req,           /* Has been submitted for I/O */
    BH_Uptodate_Lock, /* Used by the first bh in a page, to serialise
                       * IO completion of other buffers in the page
                       */

    BH_Mapped,        /* Has a disk mapping: b_bdev and b_blocknr are valid */
    BH_New,           /* Disk mapping was newly created by get_block; not yet accessed */
    BH_Async_Read,    /* Is under end_buffer_async_read I/O */
    BH_Async_Write,   /* Is under end_buffer_async_write I/O */
    BH_Delay,         /* Buffer is not yet allocated on disk */
    BH_Boundary,      /* Block is followed by a discontiguity */
    BH_Write_EIO,     /* I/O error on write */
    BH_Unwritten,     /* Buffer is allocated on disk but not written */
    BH_Quiet,         /* Buffer error printks to be quiet */
    BH_Meta,          /* Buffer contains metadata */
    BH_Prio,          /* Buffer should be submitted with REQ_PRIO */

    BH_PrivateStart,  /* not a state bit, but the first bit available
                       * for private allocation by other entities
                       */
};


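If a page is used as a buffer page, all the buffer heads that belong to its block buffers are collected in a singly linked circular list. The private field of the buffer page's descriptor points to the buffer head of the first block in the page, and the b_this_page field of each buffer head points to the next buffer head in the list. The b_page field of each buffer head points to the descriptor of the buffer page it belongs to.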

 

[Figure: a buffer page and the circular list of its buffer heads]

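As the figure above shows, one buffer page corresponds to 4 buffers (as would be the case for, say, a 4 KiB page holding 1 KiB blocks). This is what unifies the page cache and the buffer cache: a change made through a buffer and a change made through the buffer page affect each other.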

 

 

The address_space structure:

struct address_space {
    struct inode *host;                 /* owner: inode, block_device (inode of the host file) */
    struct radix_tree_root page_tree;   /* radix tree of all pages (root of the tree) */
    spinlock_t tree_lock;               /* and lock protecting it */
    unsigned int i_mmap_writable;       /* count VM_SHARED mappings */
    struct rb_root i_mmap;              /* tree of private and shared mappings */
    struct list_head i_mmap_nonlinear;  /* list VM_NONLINEAR mappings */
    struct mutex i_mmap_mutex;          /* protect tree, count, list */

    /* Protected by tree_lock together with the radix tree */
    unsigned long nrpages;              /* number of total pages */
    pgoff_t writeback_index;            /* writeback starts here */
    const struct address_space_operations *a_ops;   /* methods (function pointers) */
    unsigned long flags;                /* error bits/gfp mask */
    struct backing_dev_info *backing_dev_info;      /* device readahead, etc */
    spinlock_t private_lock;            /* for use by the address_space */
    struct list_head private_list;      /* ditto */
    void *private_data;                 /* ditto */
} __attribute__((aligned(sizeof(long))));

 

struct inode *host and struct radix_tree_root page_tree are the members that tie the file to its memory pages.

 


 

 

struct address_space_operations {
    /* write a page out to the owner's on-disk image */
    int (*writepage)(struct page *page, struct writeback_control *wbc);
    /* read a page in from the owner's on-disk image */
    int (*readpage)(struct file *, struct page *);

    /* Write back some dirty pages from this mapping. */
    int (*writepages)(struct address_space *, struct writeback_control *);

    /* Set a page dirty.  Return true if this dirtied it */
    int (*set_page_dirty)(struct page *page);

    /* read a list of the owner's pages from disk */
    int (*readpages)(struct file *filp, struct address_space *mapping,
            struct list_head *pages, unsigned nr_pages);

    int (*write_begin)(struct file *, struct address_space *mapping,
                loff_t pos, unsigned len, unsigned flags,
                struct page **pagep, void **fsdata);
    int (*write_end)(struct file *, struct address_space *mapping,
                loff_t pos, unsigned len, unsigned copied,
                struct page *page, void *fsdata);

    /* Unfortunately this kludge is needed for FIBMAP. Don't use it */
    sector_t (*bmap)(struct address_space *, sector_t);
    void (*invalidatepage) (struct page *, unsigned long);
    int (*releasepage) (struct page *, gfp_t);
    void (*freepage)(struct page *);
    ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov,
            loff_t offset, unsigned long nr_segs);
    int (*get_xip_mem)(struct address_space *, pgoff_t, int,
                        void **, unsigned long *);
    /*
     * migrate the contents of a page to the specified target. If sync
     * is false, it must not block.
     */
    int (*migratepage) (struct address_space *,
            struct page *, struct page *, enum migrate_mode);
    int (*launder_page) (struct page *);
    int (*is_partially_uptodate) (struct page *, read_descriptor_t *,
                    unsigned long);
    int (*error_remove_page)(struct address_space *, struct page *);

    /* swapfile support */
    int (*swap_activate)(struct swap_info_struct *sis, struct file *file,
                sector_t *span);
    void (*swap_deactivate)(struct file *file);
};