
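Linux Kernel Practice - How to Add a Network Protocol [2]: Implementation

Kernel version: 2.6.34

Implementation approach:

As packets move through the network stack, receiving is a process of stripping headers layer by layer: the packet is a known input, so the stack only has to parse the protocol number at each layer. Sending is a chain of nested calls to each layer's transmit function: there is no known input, so each layer has to build its header according to a protocol designed in advance. In either direction, the core of the flow is the change of the device the packet belongs to (skb->dev), which acts as the baton passed between layers.

Following this approach, receive handling for the brcm protocol only needs brcm_packet_type to be added to ptype_base. Transmit handling is somewhat more complex: the nested transmit calls are driven entirely by devices, so a new kind of device X has to be created and inserted between the vlan device and the NIC device.

So at minimum we need brcm_packet_type, to be added to ptype_base, and register_brcm_dev(), to register device X with the system. Going further, device X is stored in the global init_net, but we also need to know how device X is related to the vlan device and the NIC device, so brcm_group_hash is designed here to store that relationship. To respond to events on the devices we care about, we add our own notifier to netdev_chain. To give user space some control (such as creation and deletion), brcm-specific ioctl calls are also needed. Finally, for completeness, a new kind of device should have corresponding entries in proc for debugging and inspecting it.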
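To make the idea of "parsing the protocol number" concrete, here is a minimal userspace sketch of a handler table keyed by protocol number. It is only an analogy for ptype_base and dev_add_pack(), not kernel code: struct ptype, add_pack(), deliver(), brcm_demo_recv() and the value of ETH_P_BRCM_DEMO are all names and values assumed for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PTYPES 8

/* Hypothetical stand-in for struct packet_type: a protocol number plus a handler. */
struct ptype {
    uint16_t type;                                /* protocol number, host byte order here */
    int (*func)(const uint8_t *data, size_t len); /* receive handler */
};

static const struct ptype *ptype_table[MAX_PTYPES];
static size_t ptype_count;

/* Analogue of dev_add_pack(): register a handler for one protocol number. */
static int add_pack(const struct ptype *pt)
{
    if (ptype_count == MAX_PTYPES)
        return -1;
    ptype_table[ptype_count++] = pt;
    return 0;
}

/* Analogue of the receive path: hand the payload to the matching handler. */
static int deliver(uint16_t type, const uint8_t *data, size_t len)
{
    for (size_t i = 0; i < ptype_count; i++)
        if (ptype_table[i]->type == type)
            return ptype_table[i]->func(data, len);
    return -1; /* no handler registered for this protocol: drop */
}

#define ETH_P_BRCM_DEMO 0x8874 /* assumed value, for illustration only */

/* Demo handler: pretends to consume the payload and reports its length. */
static int brcm_demo_recv(const uint8_t *data, size_t len)
{
    (void)data;
    return (int)len;
}

static const struct ptype brcm_demo_ptype = {
    .type = ETH_P_BRCM_DEMO,
    .func = brcm_demo_recv,
};
```

Before add_pack(&brcm_demo_ptype) is called, deliver() drops frames of that type; afterwards it reaches brcm_demo_recv(). That is the whole effect registration has on the receive side.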

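Starting with the simplest case

Making the network stack accept a new protocol is simple. Since a packet is already available as input, all we have to do is write brcm_packet_type and then, when the module is registered, do just one thing: call dev_add_pack().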

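/* Defined below; declared here so that brcm_packet_type can refer to it. */
int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
    struct packet_type *ptype, struct net_device *orig_dev);

static struct packet_type brcm_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_BRCM),
    .func = brcm_skb_recv, /* BRCM receive method */
};

static int __init brcm_proto_init(void)
{
    dev_add_pack(&brcm_packet_type);
    return 0;
}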

int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
    struct packet_type *ptype, struct net_device *orig_dev)
{
    struct brcm_hdr *bhdr;

    /* get a private copy if the skb is shared */
    skb = skb_share_check(skb, GFP_ATOMIC);
    if (!skb)
        goto err_free;

    /* make sure the whole brcm header is in the linear data area */
    if (unlikely(!pskb_may_pull(skb, BRCM_HLEN)))
        goto err_free;
    bhdr = (struct brcm_hdr *)skb->data;

    rcu_read_lock();
    skb_pull_rcsum(skb, BRCM_HLEN);
    /* set protocol to the one carried inside the brcm header */
    skb->protocol = bhdr->brcm_encapsulated_proto;
    /* reorder skb */
    skb = brcm_check_reorder_header(skb);
    if (!skb)
        goto err_unlock;

    netif_rx(skb);
    rcu_read_unlock();
    return NET_RX_SUCCESS;

err_unlock:
    rcu_read_unlock();

err_free:
    kfree_skb(skb); /* kfree_skb() accepts NULL */
    return NET_RX_DROP;
}

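Once this module is registered, the stack can receive packets carrying a brcm header. In the code, ETH_P_BRCM is the brcm protocol number and BRCM_HLEN is the length of the brcm header. Receiving is this simple precisely because a packet is available as input.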
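The header-stripping step can be sketched in userspace as follows. The article does not give the layout of brcm_hdr, so this sketch assumes a 4-byte tag followed by a 2-byte encapsulated protocol (6 bytes in total); struct buf, brcm_demo_decap() and BRCM_HLEN_DEMO are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BRCM_HLEN_DEMO 6 /* assumed: 4-byte tag + 2-byte encapsulated protocol */

/* Minimal stand-in for the parts of sk_buff that matter here. */
struct buf {
    const uint8_t *data; /* start of the current layer's header */
    size_t len;          /* bytes remaining from data onward */
    uint16_t protocol;   /* protocol of the payload, host byte order */
};

/*
 * Strip the (assumed) brcm header: record the encapsulated protocol,
 * then advance data past the header, much as skb_pull_rcsum() moves
 * skb->data in the kernel. Returns 0 on success, -1 if the buffer is
 * too short to hold a full header.
 */
static int brcm_demo_decap(struct buf *b)
{
    if (b->len < BRCM_HLEN_DEMO)
        return -1;
    /* bytes 4..5 carry the inner protocol in network (big-endian) order */
    b->protocol = (uint16_t)((b->data[4] << 8) | b->data[5]);
    b->data += BRCM_HLEN_DEMO;
    b->len -= BRCM_HLEN_DEMO;
    return 0;
}
```

After a successful call, data points at the inner packet and protocol tells the next layer which handler should take over, which is all that "stripping a header" means on the receive path.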

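But this only makes receiving possible. Transmitted packets still carry no brcm header, and the receive code above is rough: it does not change the skb's device, does not record traffic, and does not process the brcm header in any meaningful way. The following sections add these one by one.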

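Device-related definitions

A device is an instance of type net_device, and every kind of device has its own private data stored at the end of net_device. It is defined as follows. real_dev points to the lower-layer device and is the most basic attribute; the rest can be defined as needed. brcm_rx_stats holds the device's receive traffic statistics: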

struct brcm_dev_info {
    struct net_device *real_dev;
    u16 brcm_port;
    unsigned char real_dev_addr[ETH_ALEN];
    struct proc_dir_entry *dent;
    struct brcm_rx_stats __percpu *brcm_rx_stats;
};

struct brcm_rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long multicast;
    unsigned long rx_errors;
};

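Relationships between devices

If brcm had only a single device, no data structure would be needed to store this relationship; one global brcm_dev variable would do. The design here allows for the more complex case: there can be multiple lower-layer devices and multiple brcm devices, with no fixed relationship between them. A data structure is therefore needed to store the relationship: brcm_group_hash.

[Figure: simple diagram of the brcm_group_hash organization]

The data structures are defined as follows: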

static struct hlist_head brcm_group_hash[BRCM_GRP_HASH_SIZE];

struct brcm_group {
    struct hlist_node hlist;
    struct net_device *real_dev;
    int nr_ports;
    int killall;
    struct net_device *brcm_devices_array[BRCM_GROUP_ARRAY_LEN];
    struct rcu_head rcu;
};

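brcm_group_hash exists as a global variable organized as a hash table, and each brcm_group is inserted into it. A brcm_group stores its relationship with the lower-layer device (eth and brcm): real_dev points to the lower-layer device, and the brcm devices are stored in the brcm_devices_array array.

Next comes the function that maps a lower-layer device to a brcm device. brcm_port is a value from the header whose meaning you can define yourself; here it indicates which port the packet came from.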

struct net_device *find_brcm_dev(struct net_device *real_dev, u16 brcm_port)
{
    struct brcm_group *grp = brcm_find_group(real_dev);

    /* reject ports outside the array, then look up the brcm device */
    if (grp && brcm_port < BRCM_GROUP_ARRAY_LEN)
        return grp->brcm_devices_array[brcm_port];
    return NULL;
}

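On receive, when a packet reaches the brcm layer, skb->dev still points to the lower-layer device. We use skb->dev to find the hash entry whose brcm_group->real_dev matches, and then use the information in the packet's brcm header to decide which brcm device in brcm_group->brcm_devices_array becomes the skb's new device.

On transmit, when a packet reaches the brcm layer, skb->dev points to the brcm device. To pass the packet further down, it must be changed to the lower-layer device. The private data area of a net_device usually stores a pointer to its lower-layer device, so skb->dev only needs to be changed to brcm_dev_info(dev)->real_dev.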
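The two lookups described above can be sketched in plain userspace C. This is a simplified model under stated assumptions, not the kernel data structures: it uses a singly linked hash chain instead of hlist, hashes on an ifindex field, and all names (struct dev, struct group, find_dev(), lower_dev()) and the sizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define GRP_HASH_SIZE 8    /* assumed sizes, for illustration only */
#define GROUP_ARRAY_LEN 32

/* Simplified device: an index plus a pointer to its lower-layer device. */
struct dev {
    int ifindex;
    struct dev *real_dev; /* NULL for a physical device */
};

struct group {
    struct group *next;                   /* hash-chain link */
    struct dev *real_dev;                 /* lower-layer device of this group */
    struct dev *devices[GROUP_ARRAY_LEN]; /* brcm devices, indexed by port */
};

static struct group *group_hash[GRP_HASH_SIZE];

static unsigned int hashfn(int ifindex)
{
    return (unsigned int)ifindex % GRP_HASH_SIZE;
}

/* Insert a group at the head of its hash chain. */
static void group_add(struct group *g)
{
    unsigned int h = hashfn(g->real_dev->ifindex);

    g->next = group_hash[h];
    group_hash[h] = g;
}

/* Walk the chain until the group for this lower-layer device is found. */
static struct group *find_group(const struct dev *real_dev)
{
    struct group *g = group_hash[hashfn(real_dev->ifindex)];

    while (g && g->real_dev != real_dev)
        g = g->next;
    return g;
}

/* Receive direction: lower-layer device + port from the header -> brcm device. */
static struct dev *find_dev(const struct dev *real_dev, unsigned int port)
{
    struct group *g = find_group(real_dev);

    if (g && port < GROUP_ARRAY_LEN)
        return g->devices[port];
    return NULL;
}

/* Transmit direction: brcm device -> its lower-layer device. */
static struct dev *lower_dev(const struct dev *brcm_dev)
{
    return brcm_dev->real_dev;
}
```

The asymmetry is the point of the design: going down only needs one pointer stored in the device's private data, while going up needs the group table because one lower-layer device can carry many brcm devices.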

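Traffic statistics

In the data structures, brcm_rx_stats in the brcm device's private data brcm_dev_info records receive traffic, while dev->_tx[index] records transmit traffic.

In the receive function brcm_skb_recv(), the statistics are incremented for each successfully received packet: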

rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,     
  smp_processor_id());     
rx_stats->rx_packets++;     
rx_stats->rx_bytes += skb->len;

In the transmit function brcm_dev_hard_start_xmit(), the corresponding statistics are incremented for each transmitted packet:

if (likely(ret == NET_XMIT_SUCCESS)) {
    txq->tx_packets++;
    txq->tx_bytes += len;
} else
    txq->tx_dropped++;

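brcm_netdev_ops->ndo_get_stats(), that is brcm_dev_get_stats(), gathers the transmit and receive traffic recorded in the brcm network device into the generic net_device_stats format. Commands such as ifconfig use the result of this conversion to net_device_stats.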
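The article does not show brcm_dev_get_stats() itself. The core of such a function is presumably a sum over the per-CPU counters, which can be sketched in userspace like this; struct rx_stats mirrors brcm_rx_stats, while sum_rx_stats() and the plain array standing in for per-CPU storage are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Same fields as brcm_rx_stats in the article. */
struct rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long multicast;
    unsigned long rx_errors;
};

/*
 * Fold one counter block per CPU into a single total. In the kernel
 * the blocks would be reached with per_cpu_ptr(); here a plain array
 * stands in for the per-CPU area.
 */
static struct rx_stats sum_rx_stats(const struct rx_stats *percpu, size_t ncpus)
{
    struct rx_stats total = { 0, 0, 0, 0 };

    for (size_t cpu = 0; cpu < ncpus; cpu++) {
        total.rx_packets += percpu[cpu].rx_packets;
        total.rx_bytes   += percpu[cpu].rx_bytes;
        total.multicast  += percpu[cpu].multicast;
        total.rx_errors  += percpu[cpu].rx_errors;
    }
    return total;
}
```

Keeping one block per CPU lets the receive path update counters without locking; the cost is paid only when someone reads the statistics.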

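The complete receive and transmit functions

With these in place, the receive function brcm_skb_recv() can be completed. The handling of the brcm_hdr header can be skipped over; since this is an imaginary protocol, you can define its meaning yourself: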

int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
    struct packet_type *ptype, struct net_device *orig_dev)
{
 struct brcm_hdr *bhdr;
 struct brcm_rx_stats *rx_stats;
 int op, brcm_port;

 skb = skb_share_check(skb, GFP_ATOMIC);
 if(!skb)
  goto err_free;
 bhdr = (struct brcm_hdr *)skb->data;
 op = bhdr->brcm_tag.brcm_53242_op;
 brcm_port = bhdr->brcm_tag.brcm_53242_src_portid- 23;     

 rcu_read_lock();     

 // drop wrong brcm tag packet     
 if (op != BRCM_RCV_OP || brcm_port < 1      
  || brcm_port > 27)      
  goto err_unlock;     

 skb->dev = find_brcm_dev(dev, brcm_port);     
 if (!skb->dev) {     
  goto err_unlock;     
 }     

 rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,     
          smp_processor_id());     
 rx_stats->rx_packets++;     
 rx_stats->rx_bytes += skb->len;     
 skb_pull_rcsum(skb, BRCM_HLEN);     
         
 switch (skb->pkt_type) {     
 case PACKET_BROADCAST: /* Yeah, stats collect these together.. */ 
  /* stats->broadcast ++; // no such counter :-( */ 
  break;     

 case PACKET_MULTICAST:     
  rx_stats->multicast++;     
  break;     

 case PACKET_OTHERHOST:     
  /* Our lower layer thinks this is not local, let's make sure.
   * This allows the brcm device to have a different MAC than the
   * underlying device, and still route correctly.
   */
  if (!compare_ether_addr(eth_hdr(skb)->h_dest,     
     skb->dev->dev_addr))     
   skb->pkt_type = PACKET_HOST;     
  break;     
 default:     
  break;     
 }     

 // set protocol     
 skb->protocol = bhdr->brcm_encapsulated_proto;     
         
 // reorder skb     
 skb = brcm_check_reorder_header(skb);     
 if (!skb) {     
  rx_stats->rx_errors++;     
  goto err_unlock;     
 }     
          
 netif_rx(skb);     
 rcu_read_unlock();     
 return NET_RX_SUCCESS;     
         
err_unlock:     
 rcu_read_unlock();     
         
err_free:     
 kfree_skb(skb);     
 return NET_RX_DROP;     
}

Likewise, the transmit function brcm_dev_hard_start_xmit() can now be completed. Again, the handling of brcm_hdr can be skipped over:

static netdev_tx_t brcm_dev_hard_start_xmit(struct sk_buff *skb,     
         struct net_device *dev)     
{     
 int i = skb_get_queue_mapping(skb);     
 struct netdev_queue *txq = netdev_get_tx_queue(dev, i);     
 struct brcm_ethhdr *beth = (struct brcm_ethhdr *)(skb->data);     
 unsigned int len;     
 u16 brcm_port;     
 int ret;     
         
 /* Handle frames that do not yet carry a brcm tag, for example
  * frames sent to us by DHCP.
  *
  * NOTE: THIS ASSUMES DIX ETHERNET, SPECIFICALLY NOT SUPPORTING
  * OTHER THINGS LIKE FDDI/TokenRing/802.3 SNAPs...
  */
 if (beth->h_brcm_proto != htons(ETH_P_BRCM)){     
  //unsigned int orig_headroom = skb_headroom(skb);     
  brcm_t brcm_tag;     
  brcm_port = brcm_dev_info(dev)->brcm_port;     
  if (brcm_port == BRCM_ANY_PORT) {     
   brcm_tag.brcm_op_53242 = 0;     
   brcm_tag.brcm_tq_53242 = 0;     
   brcm_tag.brcm_te_53242 = 0;     
   brcm_tag.brcm_dst_53242 = 0;     
  }else {     
   brcm_tag.brcm_op_53242 = BRCM_SND_OP;     
   brcm_tag.brcm_tq_53242 = 0;     
   brcm_tag.brcm_te_53242 = 0;     
   brcm_tag.brcm_dst_53242 = brcm_port + 23;     
  }     
         
  skb = brcm_put_tag(skb, *(u32 *)(&brcm_tag));     
  if (!skb) {     
   txq->tx_dropped++;     
   return NETDEV_TX_OK;     
  }     
 }     
         
 skb_set_dev(skb, brcm_dev_info(dev)->real_dev);     
 len = skb->len;     
 ret = dev_queue_xmit(skb);     
         
 if (likely(ret == NET_XMIT_SUCCESS)) {     
  txq->tx_packets++;     
  txq->tx_bytes += len;     
 } else 
  txq->tx_dropped++;     
         
 return ret;     
}
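brcm_put_tag() is called above but not shown in the article. As a rough idea of what such a helper has to do, the userspace sketch below inserts a header right after the two MAC addresses of an Ethernet frame. The layout (2-byte protocol number, then a 4-byte tag, in front of the original EtherType), the value of ETH_P_BRCM_DEMO and the function name brcm_demo_put_tag() are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_DEMO   6
#define ETH_P_BRCM_DEMO 0x8874 /* assumed protocol number */
#define BRCM_HLEN_DEMO  6      /* assumed: 2-byte protocol + 4-byte tag */

/*
 * Insert the (assumed) brcm header after the destination and source
 * MAC addresses. frame holds len bytes in a buffer of cap bytes.
 * Returns the new frame length, or 0 if the frame is too short or the
 * buffer has no room for the extra header.
 */
static size_t brcm_demo_put_tag(uint8_t *frame, size_t len, size_t cap,
                                uint32_t tag)
{
    const size_t off = 2 * ETH_ALEN_DEMO;

    if (len < off + 2 || cap < len + BRCM_HLEN_DEMO)
        return 0;

    /* shift the original EtherType and payload to make room */
    memmove(frame + off + BRCM_HLEN_DEMO, frame + off, len - off);

    /* write protocol number and tag in network (big-endian) order */
    frame[off + 0] = (uint8_t)(ETH_P_BRCM_DEMO >> 8);
    frame[off + 1] = (uint8_t)(ETH_P_BRCM_DEMO & 0xff);
    frame[off + 2] = (uint8_t)(tag >> 24);
    frame[off + 3] = (uint8_t)(tag >> 16);
    frame[off + 4] = (uint8_t)(tag >> 8);
    frame[off + 5] = (uint8_t)tag;

    return len + BRCM_HLEN_DEMO;
}
```

A kernel implementation would more likely grow the skb at the front with skb_push() and move the MAC addresses forward, as the vlan code does, rather than shifting the payload; the resulting byte layout is the same.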

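Registering the device

Receiving joins the stack through dev_add_pack(); earlier articles in this series analyzed how ptype_base is used to strip headers from packets. What remains is to hook in transmission, whose function is already written. Since transmission is a chain of nested calls driven by the dev, the transmit function must be hooked in when the device is registered, as one of the device's transmit methods.

When a kind of device is created, it always has an XXX_setup() initializer. Most devices use ether_setup() for initialization and then adjust as needed. Here is brcm_setup():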

void brcm_setup(struct net_device *dev)     
{     
 ether_setup(dev);     
         
 dev->priv_flags  |= IFF_BRCM_TAG;     
 dev->priv_flags  &= ~IFF_XMIT_DST_RELEASE;     
 dev->tx_queue_len = 0;     
         
 dev->netdev_ops  = &brcm_netdev_ops;     
 dev->destructor  = free_netdev;     
 dev->ethtool_ops = &brcm_ethtool_ops;     
         
 memset(dev->broadcast, 0, ETH_ALEN);     
}

The transmit function lives in brcm_netdev_ops; every device layer calls it like this: dev->netdev_ops->ndo_start_xmit().

static const struct net_device_ops brcm_netdev_ops = {     
    .ndo_change_mtu     = brcm_dev_change_mtu,     
    .ndo_init       = brcm_dev_init,     
    .ndo_uninit     = brcm_dev_uninit,     
    .ndo_open       = brcm_dev_open,     
    .ndo_stop       = brcm_dev_stop,     
    .ndo_start_xmit =  brcm_dev_hard_start_xmit,     
    .ndo_validate_addr  = eth_validate_addr,     
    .ndo_set_mac_address    = brcm_dev_set_mac_address,     
    .ndo_set_rx_mode    = brcm_dev_set_rx_mode,     
    .ndo_set_multicast_list = brcm_dev_set_rx_mode,     
    .ndo_change_rx_flags    = brcm_dev_change_rx_flags,     
    //.ndo_do_ioctl     = brcm_dev_ioctl,     
    .ndo_neigh_setup    = brcm_dev_neigh_setup,     
    .ndo_get_stats      = brcm_dev_get_stats,     
};

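Device initialization should happen when the device is created, that is, when it is registered with the network stack in register_brcm_dev(). Registering a new device requires its lower-layer device real_dev and the brcm_port that uniquely identifies the brcm device. The function first makes sure the device has not already been created, then creates the new device new_dev with alloc_netdev_mq() and sets its attributes, in particular its private data brcm_dev_info(new_dev). It then finds or allocates the device's group in brcm_group_hash, performs the actual registration with register_netdevice(), and finally records the new device in the group.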

static int register_brcm_dev(struct net_device *real_dev, u16 brcm_port)
{
    struct net_device *new_dev;
    struct net *net = dev_net(real_dev);
    struct brcm_group *grp;
    char name[IFNAMSIZ];
    int err;

    if (brcm_port >= BRCM_PORT_MASK)
        return -ERANGE;

    /* the device already exists */
    if (find_brcm_dev(real_dev, brcm_port) != NULL)
        return -EEXIST;

    snprintf(name, IFNAMSIZ, "brcm%i", brcm_port);
    /* allocate as many tx queues as the lower device has, so that
     * real_num_tx_queues below never exceeds the allocated count */
    new_dev = alloc_netdev_mq(sizeof(struct brcm_dev_info), name,
                  brcm_setup, real_dev->num_tx_queues);
    if (new_dev == NULL)
        return -ENOBUFS;
    new_dev->real_num_tx_queues = real_dev->real_num_tx_queues;
    dev_net_set(new_dev, net);
    new_dev->mtu = real_dev->mtu;

    brcm_dev_info(new_dev)->brcm_port = brcm_port;
    brcm_dev_info(new_dev)->real_dev = real_dev;
    brcm_dev_info(new_dev)->dent = NULL;
    //new_dev->rtnl_link_ops = &brcm_link_ops;

    /* find the group of the lower device, creating it on first use */
    grp = brcm_find_group(real_dev);
    if (!grp)
        grp = brcm_group_alloc(real_dev);
    if (!grp) {
        err = -ENOBUFS;
        goto out_free_newdev;
    }

    err = register_netdevice(new_dev);
    if (err < 0)
        goto out_free_newdev;

    /* Account for reference in struct brcm_dev_info */
    dev_hold(real_dev);
    brcm_group_set_device(grp, brcm_port, new_dev);

    return 0;

out_free_newdev:
    free_netdev(new_dev);
    return err;
}

ioctl

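Because multiple brcm devices can exist and their mapping to lower-layer devices is not fixed, their creation should be under manual control, so users create them through ioctl. Only two operations are provided for brcm here: add and delete. A device is always added in relation to a lower-layer device, so the lower-layer device must be named explicitly when adding; it is then looked up in the network namespace with __dev_get_by_name(), and register_brcm_dev() is called to complete the registration. Deletion is direct: it simply calls unregister_brcm_dev().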

static int brcm_ioctl_handler(struct net *net, void __user *arg)     
{     
    int err;     
    struct brcm_ioctl_args args;     
    struct net_device *dev = NULL;     
         
    if (copy_from_user(&args, arg, sizeof(struct brcm_ioctl_args)))     
        return -EFAULT;     
         
    /* Null terminate this sucker, just in case. */ 
    args.device1[23] = 0;     
    args.u.device2[23] = 0;     
         
    rtnl_lock();     
         
    switch (args.cmd) {     
    case ADD_BRCM_CMD:     
    case DEL_BRCM_CMD:     
        err = -ENODEV;     
        dev = __dev_get_by_name(net, args.device1);     
        if (!dev)     
            goto out;     
         
        err = -EINVAL;     
        if (args.cmd != ADD_BRCM_CMD && !is_brcm_dev(dev))     
            goto out;     
    }     
         
    switch (args.cmd) {     
    case ADD_BRCM_CMD:     
        err = -EPERM;     
        if (!capable(CAP_NET_ADMIN))     
            break;     
        err = register_brcm_dev(dev, args.u.port);     
        break;     
         
    case DEL_BRCM_CMD:     
        err = -EPERM;     
        if (!capable(CAP_NET_ADMIN))     
            break;     
        unregister_brcm_dev(dev, NULL);     
        err = 0;     
        break;     
                 
    default:     
        err = -EOPNOTSUPP;     
        break;     
    }     
out:     
    rtnl_unlock();     
    return err;     
}

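This is the main body of the brcm protocol module. It is of course not yet complete; the next article continues adding the brcm protocol and fills in some details: the proc file system, the notifier mechanism and so on, as well as writing the kernel Makefile and, of course, testing the protocol. The related source code is packaged and uploaded with the next article.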

Copyright © Linux教程網 All Rights Reserved