Kernel version: 2.6.34
Implementation approach:
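The flow of a packet through the network protocol stack looks different in each direction. On receive, it is a process of unwrapping the packet: because the packet is a known input, the stack only has to parse the protocol numbers one by one. On transmit, it is a chain of nested calls to each layer's send function: because there is no known input, the packet can only be built up layer by layer according to the protocols designed in advance. Whichever way the packet flows, the key is the change of the device the packet belongs to (skb->dev), which acts like a baton handed from layer to layer.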
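Following this approach, receive handling for the brcm protocol only needs to be added to ptype_base as brcm_packet_type. Transmit handling is a little more complicated: the nested send calls are driven entirely by devices, so a new kind of device X has to be created and inserted between the vlan device and the NIC device.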
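So at a minimum we need brcm_packet_type, to add to ptype_base, and register_brcm_dev(), to register device X with the system. Going further, device X is stored in the global init_net, but we also need to know how device X relates to the vlan device and the NIC device, so brcm_group_hash is designed here to store that relationship. To respond to the device events we care about, we add our own notifier to netdev_chain. To give user space some control (such as creating and deleting devices), brcm-related ioctl calls are also needed. And to make it look more complete, a new kind of device should have a corresponding entry in proc for debugging and inspecting the device.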
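Starting with the simplest case
Making the network stack able to receive a new protocol is simple. Since a packet is already available as input, all we have to do is write brcm_packet_type and then, when the module is registered, do just one thing: dev_add_pack.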
int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
                  struct packet_type *ptype, struct net_device *orig_dev)
{
    struct brcm_hdr *bhdr;

    skb = skb_share_check(skb, GFP_ATOMIC);
    if (!skb)
        goto err_free;

    bhdr = (struct brcm_hdr *)skb->data;

    rcu_read_lock();
    skb_pull_rcsum(skb, BRCM_HLEN);

    /* set protocol to the one carried inside the brcm header */
    skb->protocol = bhdr->brcm_encapsulated_proto;

    /* reorder skb */
    skb = brcm_check_reorder_header(skb);
    if (!skb)
        goto err_unlock;

    netif_rx(skb);
    rcu_read_unlock();
    return NET_RX_SUCCESS;

err_unlock:
    rcu_read_unlock();
err_free:
    kfree_skb(skb);
    return NET_RX_DROP;
}

/* Defined after brcm_skb_recv() so that .func refers to a declared function. */
static struct packet_type brcm_packet_type __read_mostly = {
    .type = cpu_to_be16(ETH_P_BRCM),
    .func = brcm_skb_recv, /* BRCM receive method */
};

static int __init brcm_proto_init(void)
{
    dev_add_pack(&brcm_packet_type);
    return 0; /* an __init function must return a status */
}
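Once this module is registered, the stack can receive packets that carry a brcm header. In the code, ETH_P_BRCM is the brcm protocol number and BRCM_HLEN is the length of the brcm header. It is precisely because a packet is available as input that receiving becomes so simple.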
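To make the dispatch idea concrete, here is a minimal user-space sketch of what registering a receive handler by protocol number amounts to. It is only a model and not the kernel's ptype_base implementation: add_pack(), deliver(), brcm_rx_model() and the value 0x8874 for the protocol number are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space model only: a tiny stand-in for the kernel's ptype_base. */
#define MAX_HANDLERS 8
#define ETH_P_BRCM_ASSUMED 0x8874 /* assumed value, not taken from the article */

typedef int (*rx_handler)(const uint8_t *payload, size_t len);

struct proto_entry {
    uint16_t type;   /* protocol number, host byte order */
    rx_handler func; /* receive method for that protocol */
};

static struct proto_entry handlers[MAX_HANDLERS];
static size_t nr_handlers;

/* Analogue of dev_add_pack(): remember which function handles a protocol. */
static int add_pack(uint16_t type, rx_handler func)
{
    assert(func != NULL);
    if (nr_handlers == MAX_HANDLERS)
        return -1;
    handlers[nr_handlers].type = type;
    handlers[nr_handlers].func = func;
    nr_handlers++;
    return 0;
}

/* Read the 2-byte protocol number that follows the two MAC addresses
 * and hand the rest of the frame to the matching handler. */
static int deliver(const uint8_t *frame, size_t len)
{
    uint16_t type;
    size_t i;

    if (len < 14)
        return -1;
    type = (uint16_t)((frame[12] << 8) | frame[13]);
    for (i = 0; i < nr_handlers; i++)
        if (handlers[i].type == type)
            return handlers[i].func(frame + 14, len - 14);
    return -1; /* no handler registered: drop */
}

/* Example handler: simply report the payload length it was given. */
static int brcm_rx_model(const uint8_t *payload, size_t len)
{
    (void)payload;
    return (int)len;
}
```

Before add_pack() is called for the protocol number, deliver() drops the frame; afterwards the same frame reaches brcm_rx_model(). That is the whole effect the single dev_add_pack() call has on the receive path.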
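But this is only the ability to receive. Transmitted packets still carry no brcm header, and the receive code above is also quite rough: it does not change the skb's device, does not record traffic, and does nothing meaningful with the brcm header. These are added one by one below.
Device-related definitions
A device is an instance of net_device, and each kind of device has its own private data, stored at the end of net_device and defined as follows. real_dev points to the lower-layer device and is the most basic attribute; the rest can be set as needed. brcm_rx_stats holds the device's receive traffic statistics: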
struct brcm_dev_info {
    struct net_device *real_dev;
    u16 brcm_port;
    unsigned char real_dev_addr[ETH_ALEN];
    struct proc_dir_entry *dent;
    struct brcm_rx_stats __percpu *brcm_rx_stats;
};

struct brcm_rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long multicast;
    unsigned long rx_errors;
};
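The phrase "stored at the end of net_device" can be pictured with a small user-space sketch: one allocation holds the generic device followed by an aligned private area, and an accessor returns a pointer into that area. The names model_dev, model_alloc_dev(), model_priv() and the 32-byte alignment are assumptions for illustration; the kernel's own allocator and accessor differ in detail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define MODEL_ALIGN 32 /* assumed alignment of the private area */
#define ALIGN_UP(x, a) (((x) + (a) - 1) & ~((size_t)(a) - 1))

/* Generic part shared by every device in this model. */
struct model_dev {
    char name[16];
    unsigned int mtu;
};

/* Private part of a brcm-like device. */
struct model_brcm_info {
    struct model_dev *real_dev; /* lower-layer device */
    uint16_t brcm_port;
};

/* Allocate the generic device and its private data in one block. */
static struct model_dev *model_alloc_dev(size_t sizeof_priv, const char *name)
{
    size_t total = ALIGN_UP(sizeof(struct model_dev), MODEL_ALIGN) + sizeof_priv;
    struct model_dev *dev = calloc(1, total);

    if (!dev)
        return NULL;
    strncpy(dev->name, name, sizeof(dev->name) - 1);
    return dev;
}

/* The private data starts right after the aligned generic part. */
static void *model_priv(struct model_dev *dev)
{
    assert(dev != NULL);
    return (char *)dev + ALIGN_UP(sizeof(struct model_dev), MODEL_ALIGN);
}

/* Typed accessor, in the spirit of brcm_dev_info(dev). */
static struct model_brcm_info *model_brcm_info(struct model_dev *dev)
{
    return model_priv(dev);
}
```

Because the private area is part of the same allocation, freeing the device frees its private data too, and no separate pointer from the device to its private data is needed.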
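Relationships between devices
If brcm only ever had a single device, no data structure would be needed to store this relationship; one global variable brcm_dev would be enough. The design here considers the more complex case: there can be multiple lower-layer devices and multiple brcm devices, with no fixed relationship between them. So a data structure is needed to store the relationship: brcm_group_hash. Here is a simple diagram: [Figure: brcm_group_hash relating lower-layer devices to brcm devices]
The data structures are defined as follows: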
static struct hlist_head brcm_group_hash[BRCM_GRP_HASH_SIZE];

struct brcm_group {
    struct hlist_node hlist;
    struct net_device *real_dev;
    int nr_ports;
    int killall;
    struct net_device *brcm_devices_array[BRCM_GROUP_ARRAY_LEN];
    struct rcu_head rcu;
};
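brcm_group_hash exists as a global variable organized as a hash table. Each brcm_group is inserted into brcm_group_hash and stores the relationship with its lower-layer device (eth and brcm): real_dev points to the lower-layer device, and the brcm devices are stored in the brcm_devices_array array.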
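The organization just described can be sketched in user space as a chained hash table keyed by the lower-layer device, with a per-group array indexed by port. This is a simplified model under stated assumptions: it uses a plain next pointer instead of hlist and RCU, hashes on a made-up ifindex field, and the sizes GRP_HASH_SIZE and GRP_ARRAY_LEN are arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define GRP_HASH_SIZE 8  /* assumed size for the model */
#define GRP_ARRAY_LEN 32 /* assumed number of ports per group */

struct model_dev {
    int ifindex;
};

struct model_group {
    struct model_group *next;                 /* hash chain */
    struct model_dev *real_dev;               /* lower-layer device */
    struct model_dev *devices[GRP_ARRAY_LEN]; /* brcm devices, indexed by port */
};

static struct model_group *group_hash[GRP_HASH_SIZE];

static unsigned int hashfn(const struct model_dev *real_dev)
{
    return (unsigned int)real_dev->ifindex % GRP_HASH_SIZE;
}

/* Find the group that belongs to a given lower-layer device. */
static struct model_group *find_group(const struct model_dev *real_dev)
{
    struct model_group *grp;

    for (grp = group_hash[hashfn(real_dev)]; grp; grp = grp->next)
        if (grp->real_dev == real_dev)
            return grp;
    return NULL;
}

/* Insert a group at the head of its hash chain. */
static void add_group(struct model_group *grp)
{
    unsigned int h;

    assert(grp != NULL && grp->real_dev != NULL);
    h = hashfn(grp->real_dev);
    grp->next = group_hash[h];
    group_hash[h] = grp;
}

/* Map (lower-layer device, port) to the brcm device, or NULL if none. */
static struct model_dev *find_dev(const struct model_dev *real_dev, uint16_t port)
{
    struct model_group *grp = find_group(real_dev);

    if (!grp || port >= GRP_ARRAY_LEN)
        return NULL;
    return grp->devices[port];
}
```

Two lower-layer devices whose keys collide simply share a chain; find_group() compares real_dev pointers, so each lower-layer device still resolves to its own group.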
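Next comes the function that maps a lower-layer device to a brcm device. brcm_port is a value from the header whose meaning you can define yourself; here it indicates which port the packet came from.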
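struct net_device *find_brcm_dev(struct net_device *real_dev, u16 brcm_port)
{
    struct brcm_group *grp = brcm_find_group(real_dev);

    /* Return the device registered for this port, guarding the array index. */
    if (grp && brcm_port < BRCM_GROUP_ARRAY_LEN)
        return grp->brcm_devices_array[brcm_port];

    return NULL;
}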
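On receive, when a packet reaches the brcm layer, skb->dev still points to the lower-layer device. We use skb->dev to look up the hash entry whose brcm_group->real_dev matches, and then use the information in the packet's brcm header to decide which brcm device in brcm_group->brcm_devices_array becomes the skb's new device.
On transmit, when a packet reaches the brcm layer, skb->dev points to the brcm device. To keep passing the packet down, skb->dev has to change to the lower-layer device. The private data part of net_device usually stores a pointer to the lower-layer device, so skb->dev only needs to be changed to brcm_dev_info(dev)->real_dev.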
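Traffic statistics
In the data structures, brcm_rx_stats in the brcm device's private data brcm_dev_info records receive traffic, while dev->_tx[index] records transmit traffic.
In the receive function brcm_skb_recv(), the statistics are incremented for each successfully received packet: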
rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,
                       smp_processor_id());
rx_stats->rx_packets++;
rx_stats->rx_bytes += skb->len;
In the transmit function brcm_dev_hard_start_xmit(), the corresponding statistics are incremented for each transmitted packet:
if (likely(ret == NET_XMIT_SUCCESS)) {
    txq->tx_packets++;
    txq->tx_bytes += len;
} else
    txq->tx_dropped++;
brcm_netdev_ops->ndo_get_stats(), that is brcm_dev_get_stats(), aggregates the transmit and receive traffic recorded in the brcm device into the generic net_device_stats format. Commands such as ifconfig use the converted net_device_stats result.
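The article does not list brcm_dev_get_stats(), so the following is only a user-space sketch of the aggregation it describes: per-CPU receive counters and per-queue transmit counters are summed into one generic structure. All type and function names here (model_rx_stats, model_txq, model_dev_stats, model_get_stats) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One instance per CPU, updated only by the receive path on that CPU. */
struct model_rx_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
};

/* One instance per transmit queue, updated by the transmit path. */
struct model_txq {
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_dropped;
};

/* Generic, device-wide view, in the spirit of net_device_stats. */
struct model_dev_stats {
    unsigned long rx_packets;
    unsigned long rx_bytes;
    unsigned long tx_packets;
    unsigned long tx_bytes;
    unsigned long tx_dropped;
};

/* Sum the per-CPU and per-queue counters into the generic format. */
static void model_get_stats(const struct model_rx_stats *rx, size_t nr_cpus,
                            const struct model_txq *txq, size_t nr_queues,
                            struct model_dev_stats *out)
{
    size_t i;

    assert(rx != NULL && txq != NULL && out != NULL);
    *out = (struct model_dev_stats){0};

    for (i = 0; i < nr_cpus; i++) {
        out->rx_packets += rx[i].rx_packets;
        out->rx_bytes += rx[i].rx_bytes;
    }
    for (i = 0; i < nr_queues; i++) {
        out->tx_packets += txq[i].tx_packets;
        out->tx_bytes += txq[i].tx_bytes;
        out->tx_dropped += txq[i].tx_dropped;
    }
}
```

Keeping the hot-path counters per CPU or per queue means the fast path never contends on a shared counter; the cost of summing is paid only when someone asks for the statistics.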
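Complete receive and transmit functions
With these in place, the receive function brcm_skb_recv() can be completed. The handling of the brcm_hdr header can be skipped over; since this is a made-up protocol, you can define its meaning yourself: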
int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
                  struct packet_type *ptype, struct net_device *orig_dev)
{
    struct brcm_hdr *bhdr;
    struct brcm_rx_stats *rx_stats;
    int op, brcm_port;

    skb = skb_share_check(skb, GFP_ATOMIC);
    if (!skb)
        goto err_free;

    bhdr = (struct brcm_hdr *)skb->data;
    op = bhdr->brcm_tag.brcm_53242_op;
    brcm_port = bhdr->brcm_tag.brcm_53242_src_portid - 23;

    rcu_read_lock();

    /* drop packets with a wrong brcm tag */
    if (op != BRCM_RCV_OP || brcm_port < 1 || brcm_port > 27)
        goto err_unlock;

    /* hand the skb over from the lower-layer device to the brcm device */
    skb->dev = find_brcm_dev(dev, brcm_port);
    if (!skb->dev)
        goto err_unlock;

    rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,
                           smp_processor_id());
    rx_stats->rx_packets++;
    rx_stats->rx_bytes += skb->len;

    skb_pull_rcsum(skb, BRCM_HLEN);

    switch (skb->pkt_type) {
    case PACKET_BROADCAST: /* Yeah, stats collect these together.. */
        /* stats->broadcast ++; // no such counter :-( */
        break;
    case PACKET_MULTICAST:
        rx_stats->multicast++;
        break;
    case PACKET_OTHERHOST:
        /* Our lower layer thinks this is not local, let's make sure.
         * This allows the brcm device to have a different MAC than the
         * underlying device, and still route correctly.
         */
        if (!compare_ether_addr(eth_hdr(skb)->h_dest, skb->dev->dev_addr))
            skb->pkt_type = PACKET_HOST;
        break;
    default:
        break;
    }

    /* set protocol */
    skb->protocol = bhdr->brcm_encapsulated_proto;

    /* reorder skb */
    skb = brcm_check_reorder_header(skb);
    if (!skb) {
        rx_stats->rx_errors++;
        goto err_unlock;
    }

    netif_rx(skb);
    rcu_read_unlock();
    return NET_RX_SUCCESS;

err_unlock:
    rcu_read_unlock();
err_free:
    kfree_skb(skb);
    return NET_RX_DROP;
}
Likewise, the transmit function brcm_dev_hard_start_xmit() can now be completed; again, the handling of brcm_hdr can be skipped over:
static netdev_tx_t brcm_dev_hard_start_xmit(struct sk_buff *skb,
                                            struct net_device *dev)
{
    int i = skb_get_queue_mapping(skb);
    struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
    struct brcm_ethhdr *beth = (struct brcm_ethhdr *)(skb->data);
    unsigned int len;
    u16 brcm_port;
    int ret;

    /* Handle frames that do not yet carry a brcm tag if they are sent
     * to us, for example by DHCP.
     *
     * NOTE: THIS ASSUMES DIX ETHERNET, SPECIFICALLY NOT SUPPORTING
     * OTHER THINGS LIKE FDDI/TokenRing/802.3 SNAPs...
     */
    if (beth->h_brcm_proto != htons(ETH_P_BRCM)) {
        /* unsigned int orig_headroom = skb_headroom(skb); */
        brcm_t brcm_tag;

        brcm_port = brcm_dev_info(dev)->brcm_port;
        if (brcm_port == BRCM_ANY_PORT) {
            brcm_tag.brcm_op_53242 = 0;
            brcm_tag.brcm_tq_53242 = 0;
            brcm_tag.brcm_te_53242 = 0;
            brcm_tag.brcm_dst_53242 = 0;
        } else {
            brcm_tag.brcm_op_53242 = BRCM_SND_OP;
            brcm_tag.brcm_tq_53242 = 0;
            brcm_tag.brcm_te_53242 = 0;
            brcm_tag.brcm_dst_53242 = brcm_port + 23;
        }

        skb = brcm_put_tag(skb, *(u32 *)(&brcm_tag));
        if (!skb) {
            txq->tx_dropped++;
            return NETDEV_TX_OK;
        }
    }

    /* hand the skb over from the brcm device to the lower-layer device */
    skb_set_dev(skb, brcm_dev_info(dev)->real_dev);
    len = skb->len;
    ret = dev_queue_xmit(skb);

    if (likely(ret == NET_XMIT_SUCCESS)) {
        txq->tx_packets++;
        txq->tx_bytes += len;
    } else
        txq->tx_dropped++;

    return ret;
}
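Since brcm_put_tag() is not listed, here is a user-space sketch of what inserting and stripping such a header could look like on a raw Ethernet frame. The layout is an assumption made for illustration: a 2-byte protocol number (0x8874 is assumed) followed by a 4-byte tag, placed right after the two MAC addresses so that the original protocol number ends up behind the tag. model_put_tag() and model_strip_tag() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ETH_ALEN 6
#define MODEL_TAG_LEN 6         /* assumed: 2-byte protocol number + 4-byte tag */
#define MODEL_ETH_P_BRCM 0x8874 /* assumed value, not taken from the article */

/* Insert the header after the MAC addresses.
 * Returns the new frame length, or 0 if the frame is too short
 * or the buffer (capacity cap) has no room for the extra bytes. */
static size_t model_put_tag(uint8_t *buf, size_t len, size_t cap, uint32_t tag)
{
    const size_t off = 2 * MODEL_ETH_ALEN;

    assert(buf != NULL);
    if (len < off + 2 || len + MODEL_TAG_LEN > cap)
        return 0;

    /* Make room, then write protocol number and tag in network byte order. */
    memmove(buf + off + MODEL_TAG_LEN, buf + off, len - off);
    buf[off] = (uint8_t)(MODEL_ETH_P_BRCM >> 8);
    buf[off + 1] = (uint8_t)(MODEL_ETH_P_BRCM & 0xff);
    buf[off + 2] = (uint8_t)(tag >> 24);
    buf[off + 3] = (uint8_t)(tag >> 16);
    buf[off + 4] = (uint8_t)(tag >> 8);
    buf[off + 5] = (uint8_t)tag;
    return len + MODEL_TAG_LEN;
}

/* Remove the header again and report the tag.
 * Returns the new frame length, or 0 if the frame is not tagged. */
static size_t model_strip_tag(uint8_t *buf, size_t len, uint32_t *tag)
{
    const size_t off = 2 * MODEL_ETH_ALEN;

    assert(buf != NULL && tag != NULL);
    if (len < off + MODEL_TAG_LEN + 2)
        return 0;
    if (buf[off] != (MODEL_ETH_P_BRCM >> 8) ||
        buf[off + 1] != (MODEL_ETH_P_BRCM & 0xff))
        return 0;

    *tag = ((uint32_t)buf[off + 2] << 24) | ((uint32_t)buf[off + 3] << 16) |
           ((uint32_t)buf[off + 4] << 8) | (uint32_t)buf[off + 5];
    memmove(buf + off, buf + off + MODEL_TAG_LEN, len - off - MODEL_TAG_LEN);
    return len - MODEL_TAG_LEN;
}
```

Putting a tag on a frame and stripping it again returns the original bytes, which mirrors the symmetry between the transmit and receive functions above.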
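Registering the device
Receive is integrated into the stack through dev_add_pack(); the earlier articles in this series covered how packets are unwrapped through ptype_base. Now transmit has to be integrated. The function itself is already written. Since sending is a chain of nested calls driven by dev, the transmit function must be hooked in when the device is registered, as one of the device's send methods.
Creating a kind of device always involves an XXX_setup() initializer. Most devices use ether_setup() for initialization and then make suitable changes. Here is brcm_setup():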
void brcm_setup(struct net_device *dev)
{
    ether_setup(dev);

    dev->priv_flags |= IFF_BRCM_TAG;
    dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
    dev->tx_queue_len = 0;

    dev->netdev_ops = &brcm_netdev_ops;
    dev->destructor = free_netdev;
    dev->ethtool_ops = &brcm_ethtool_ops;

    memset(dev->broadcast, 0, ETH_ALEN);
}
The transmit function lives in brcm_netdev_ops, and every layer's device is invoked the same way: dev->netdev_ops->ndo_start_xmit().
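The nested, device-driven nature of transmission can be modelled in a few lines of user-space C: each device carries a table of operations, and the brcm-like device's send method does its own work and then calls the send method of its lower-layer device. Everything here (model_dev, model_dev_ops, model_queue_xmit, the 6-byte tag length) is a simplified assumption, not the kernel's net_device_ops.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_TAG_LEN 6 /* assumed number of bytes the brcm layer adds */

struct model_dev;

/* Per-device-type operations, in the spirit of net_device_ops. */
struct model_dev_ops {
    int (*start_xmit)(struct model_dev *dev, size_t *len);
};

struct model_dev {
    const char *name;
    const struct model_dev_ops *ops;
    struct model_dev *real_dev; /* lower-layer device, NULL for a NIC */
    unsigned long tx_packets;
    unsigned long tx_bytes;
};

/* Every layer is entered the same way: through the device's own ops. */
static int model_queue_xmit(struct model_dev *dev, size_t *len)
{
    assert(dev != NULL && dev->ops != NULL && dev->ops->start_xmit != NULL);
    return dev->ops->start_xmit(dev, len);
}

/* Bottom of the stack: pretend the hardware sent the frame. */
static int nic_xmit(struct model_dev *dev, size_t *len)
{
    dev->tx_packets++;
    dev->tx_bytes += *len;
    return 0;
}

/* brcm-like layer: add the tag, then pass the frame to the lower device. */
static int brcm_xmit(struct model_dev *dev, size_t *len)
{
    int ret;

    *len += MODEL_TAG_LEN;
    ret = model_queue_xmit(dev->real_dev, len);
    if (ret == 0) {
        dev->tx_packets++;
        dev->tx_bytes += *len;
    }
    return ret;
}

static const struct model_dev_ops nic_ops = { nic_xmit };
static const struct model_dev_ops brcm_ops = { brcm_xmit };
```

Sending through the brcm-like device therefore ends up in nic_xmit() without the caller knowing how many layers sit in between; stacking another device type only requires another ops table and a real_dev pointer.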
static const struct net_device_ops brcm_netdev_ops = {
    .ndo_change_mtu         = brcm_dev_change_mtu,
    .ndo_init               = brcm_dev_init,
    .ndo_uninit             = brcm_dev_uninit,
    .ndo_open               = brcm_dev_open,
    .ndo_stop               = brcm_dev_stop,
    .ndo_start_xmit         = brcm_dev_hard_start_xmit,
    .ndo_validate_addr      = eth_validate_addr,
    .ndo_set_mac_address    = brcm_dev_set_mac_address,
    .ndo_set_rx_mode        = brcm_dev_set_rx_mode,
    .ndo_set_multicast_list = brcm_dev_set_rx_mode,
    .ndo_change_rx_flags    = brcm_dev_change_rx_flags,
    //.ndo_do_ioctl         = brcm_dev_ioctl,
    .ndo_neigh_setup        = brcm_dev_neigh_setup,
    .ndo_get_stats          = brcm_dev_get_stats,
};
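The device's initialization should happen when the device is created, that is, when it is registered with the network stack in register_brcm_dev(). Registering a new device requires its lower-layer device real_dev and the brcm_port that uniquely identifies the brcm device. The function first makes sure the device has not already been created, then creates the new device new_dev with alloc_netdev_mq, then sets its attributes, in particular its private data brcm_dev_info(new_dev). It then looks up (or allocates) the device's group in brcm_group_hash, performs the actual registration with register_netdevice(), and finally records the new device in the group.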
static int register_brcm_dev(struct net_device *real_dev, u16 brcm_port)
{
    struct net_device *new_dev;
    struct net *net = dev_net(real_dev);
    struct brcm_group *grp;
    char name[IFNAMSIZ];
    int err;

    if (brcm_port >= BRCM_PORT_MASK)
        return -ERANGE;

    /* already exists */
    if (find_brcm_dev(real_dev, brcm_port) != NULL)
        return -EEXIST;

    snprintf(name, IFNAMSIZ, "brcm%i", brcm_port);
    new_dev = alloc_netdev_mq(sizeof(struct brcm_dev_info), name,
                              brcm_setup, 1);
    if (new_dev == NULL)
        return -ENOBUFS;

    new_dev->real_num_tx_queues = real_dev->real_num_tx_queues;
    dev_net_set(new_dev, net);
    new_dev->mtu = real_dev->mtu;

    brcm_dev_info(new_dev)->brcm_port = brcm_port;
    brcm_dev_info(new_dev)->real_dev = real_dev;
    brcm_dev_info(new_dev)->dent = NULL;
    //new_dev->rtnl_link_ops = &brcm_link_ops;

    grp = brcm_find_group(real_dev);
    if (!grp)
        grp = brcm_group_alloc(real_dev);
    if (!grp) {
        /* group allocation failed: do not register the device */
        err = -ENOBUFS;
        goto out_free_newdev;
    }

    err = register_netdevice(new_dev);
    if (err < 0)
        goto out_free_newdev;

    /* Account for reference in struct brcm_dev_info */
    dev_hold(real_dev);
    brcm_group_set_device(grp, brcm_port, new_dev);

    return 0;

out_free_newdev:
    free_netdev(new_dev);
    return err;
}
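ioctl
Because multiple brcm devices can exist, with no fixed mapping to lower-layer devices, their creation should be under manual control, so users create them through ioctl. Only two operations are provided for brcm here: add and delete. A device is always added in relation to a lower-layer device, so the lower-layer device has to be named explicitly when adding; it is then looked up in the network namespace with __dev_get_by_name(), and register_brcm_dev() can be called to complete the registration. Deletion is direct: it simply calls unregister_brcm_dev().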
static int brcm_ioctl_handler(struct net *net, void __user *arg)
{
    int err;
    struct brcm_ioctl_args args;
    struct net_device *dev = NULL;

    if (copy_from_user(&args, arg, sizeof(struct brcm_ioctl_args)))
        return -EFAULT;

    /* Null terminate this sucker, just in case. */
    args.device1[23] = 0;
    args.u.device2[23] = 0;

    rtnl_lock();

    switch (args.cmd) {
    case ADD_BRCM_CMD:
    case DEL_BRCM_CMD:
        err = -ENODEV;
        dev = __dev_get_by_name(net, args.device1);
        if (!dev)
            goto out;

        err = -EINVAL;
        if (args.cmd != ADD_BRCM_CMD && !is_brcm_dev(dev))
            goto out;
    }

    switch (args.cmd) {
    case ADD_BRCM_CMD:
        err = -EPERM;
        if (!capable(CAP_NET_ADMIN))
            break;
        err = register_brcm_dev(dev, args.u.port);
        break;

    case DEL_BRCM_CMD:
        err = -EPERM;
        if (!capable(CAP_NET_ADMIN))
            break;
        unregister_brcm_dev(dev, NULL);
        err = 0;
        break;

    default:
        err = -EOPNOTSUPP;
        break;
    }

out:
    rtnl_unlock();
    return err;
}
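These are the main parts of the brcm protocol module. It is not complete yet, of course. The next article continues adding the brcm protocol and fills in some details: the proc file system, the notifier mechanism and so on, as well as writing the kernel Makefile and, of course, testing the protocol. The related source code will be packaged and uploaded with the next article.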