Kernel version: 2.6.34
Implementation approach:
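As a packet moves through the network stack, receiving is a process of unwrapping. The packet is a known input, so each layer only has to parse the next protocol number. Sending is a chain of nested calls to each layer's transmit function. There is no known input, so the packet has to be built layer by layer according to a protocol design decided in advance. Either way, the core of the flow is the change of the device the packet belongs to (skb->dev), which works like a baton handed from one layer to the next.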
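Following this idea, receive handling for the brcm protocol only needs to be added to ptype_base as brcm_packet_type. Transmission is a little more involved. The nested send calls are driven entirely by devices, so a newly created device X has to be inserted between the VLAN device and the NIC device.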
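The baton idea can be modelled in user space. The sketch below is a toy, not kernel code. toy_dev, toy_skb, toy_stacked_xmit() and toy_nic_xmit() are hypothetical names, and the device names are made up. Each stacked device records itself, hands skb->dev to its lower device, and calls that device's transmit hook. In the kernel, ndo_start_xmit plays that role.

```c
#include <stdio.h>
#include <string.h>

/* Toy model only: hypothetical types, not the kernel's sk_buff/net_device. */
struct toy_skb;

struct toy_dev {
	const char *name;
	struct toy_dev *lower;	/* NULL for the physical NIC */
	int (*xmit)(struct toy_skb *skb, struct toy_dev *dev);
};

struct toy_skb {
	struct toy_dev *dev;	/* the "baton": device that currently owns the packet */
	char trace[128];	/* names of the devices the packet passed through */
};

/* Append a string to the trace without overflowing it. */
static void toy_trace(struct toy_skb *skb, const char *s)
{
	size_t used = strlen(skb->trace);
	snprintf(skb->trace + used, sizeof(skb->trace) - used, "%s", s);
}

/* Stacked device (VLAN or brcm): note itself, pass the baton down, recurse. */
static int toy_stacked_xmit(struct toy_skb *skb, struct toy_dev *dev)
{
	toy_trace(skb, dev->name);
	toy_trace(skb, "->");
	skb->dev = dev->lower;
	return skb->dev->xmit(skb, skb->dev);
}

/* Physical NIC: end of the nested calls. */
static int toy_nic_xmit(struct toy_skb *skb, struct toy_dev *dev)
{
	toy_trace(skb, dev->name);
	return 0;
}
```

Suppose vlan10 is stacked on brcm1, which is stacked on eth0. Sending from vlan10 leaves the trace "vlan10->brcm1->eth0", and the baton ends at eth0.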
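So at minimum we need two things. brcm_packet_type is added to ptype_base, and register_brcm_dev() registers device X with the system.

Going further, device X is stored in the global init_net. We also need to know how device X relates to the VLAN device and the NIC device, so brcm_group_hash is designed here to store that relationship. To respond to the device events we care about, we add our own notifier to netdev_chain.

In addition, user space needs some control (such as creating and deleting devices), so brcm-specific ioctl calls are required. Finally, to make the module more complete, the new device type should also have an entry in proc for debugging and inspecting the devices.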
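Starting with the simplest part
Making the network stack receive a new protocol is simple. A packet is already available as input, so all we need to do is write brcm_packet_type. When the module is registered, it does just one thing: it calls dev_add_pack().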
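int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
		  struct packet_type *ptype, struct net_device *orig_dev);
static struct packet_type brcm_packet_type __read_mostly = {
	.type = cpu_to_be16(ETH_P_BRCM),
	.func = brcm_skb_recv,	/* BRCM receive method */
};
static int __init
brcm_proto_init(void)
{
	// hook the brcm receive handler into ptype_base
	dev_add_pack(&brcm_packet_type);
	return 0;
}
module_init(brcm_proto_init);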
int brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
		  struct packet_type *ptype, struct net_device *orig_dev)
{
	struct brcm_hdr *bhdr;
	skb = skb_share_check(skb, GFP_ATOMIC);
	if (!skb)
		goto err_free;
	// the whole brcm header must be in the linear area before it is read
	if (!pskb_may_pull(skb, BRCM_HLEN))
		goto err_free;
	bhdr = (struct brcm_hdr *)skb->data;
	rcu_read_lock();
	skb_pull_rcsum(skb, BRCM_HLEN);
	// set protocol from the encapsulated EtherType
	skb->protocol = bhdr->brcm_encapsulated_proto;
	// reorder skb
	skb = brcm_check_reorder_header(skb);
	if (!skb)
		goto err_unlock;
	netif_rx(skb);
	rcu_read_unlock();
	return NET_RX_SUCCESS;
err_unlock:
	rcu_read_unlock();
err_free:
	kfree_skb(skb);
	return NET_RX_DROP;
}
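Once this module is registered, the stack can receive packets that carry a brcm header. In the code, ETH_P_BRCM is the brcm protocol number and BRCM_HLEN is the length of the brcm header. Because the packet itself is the input, reception is very simple.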
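What dev_add_pack() buys us can be shown with a simplified user-space model of protocol dispatch. Every name here is hypothetical, including the value chosen for TOY_ETH_P_BRCM. The kernel's ptype_base is a hash table of lists rather than the flat array used below. The model only shows the idea: a handler is registered per EtherType, and each incoming frame is handed to the handler whose type matches.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy model of ptype_base dispatch; all names are hypothetical. */
#define TOY_MAX_PTYPES	8
#define TOY_ETH_HLEN	14
#define TOY_ETH_P_BRCM	0x8874	/* assumed value; the real ETH_P_BRCM is the author's */

typedef int (*toy_rx_fn)(const uint8_t *payload, size_t len);

struct toy_packet_type {
	uint16_t type;		/* EtherType in host byte order */
	toy_rx_fn func;
};

static struct toy_packet_type toy_ptypes[TOY_MAX_PTYPES];
static size_t toy_nr_ptypes;

/* Counterpart of dev_add_pack(): remember a handler for one EtherType. */
static int toy_add_pack(uint16_t type, toy_rx_fn func)
{
	if (toy_nr_ptypes == TOY_MAX_PTYPES)
		return -1;
	toy_ptypes[toy_nr_ptypes].type = type;
	toy_ptypes[toy_nr_ptypes].func = func;
	toy_nr_ptypes++;
	return 0;
}

/* Read the EtherType at bytes 12..13 and pass the rest of the frame to its handler. */
static int toy_deliver(const uint8_t *frame, size_t len)
{
	uint16_t type;
	size_t i;
	if (len < TOY_ETH_HLEN)
		return -1;
	type = (uint16_t)((frame[12] << 8) | frame[13]);
	for (i = 0; i < toy_nr_ptypes; i++)
		if (toy_ptypes[i].type == type)
			return toy_ptypes[i].func(frame + TOY_ETH_HLEN, len - TOY_ETH_HLEN);
	return -1;	/* no handler registered: the frame would be dropped */
}

/* Example handler standing in for brcm_skb_recv(): reports the payload length. */
static int toy_brcm_rx(const uint8_t *payload, size_t len)
{
	(void)payload;
	return (int)len;
}
```

After toy_add_pack(TOY_ETH_P_BRCM, toy_brcm_rx), a 20-byte frame with EtherType 0x8874 reaches the handler with a 6-byte payload. Any other EtherType finds no handler.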
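But this only makes reception possible. Transmitted packets still carry no brcm header, and the receive code is very rough. It does not change the skb's device, does not record traffic, and does nothing meaningful with the brcm header. The following sections add these one by one.
Device definitions
A device is a net_device, and each kind of device has its own private data, stored at the end of the net_device. The definition is below. real_dev points to the lower device and is the most basic field; the other fields can be chosen as needed. brcm_rx_stats holds the device's receive traffic statistics: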
struct brcm_dev_info {
	struct net_device *real_dev;
	u16 brcm_port;
	unsigned char real_dev_addr[ETH_ALEN];
	struct proc_dir_entry *dent;
	struct brcm_rx_stats __percpu *brcm_rx_stats;
};
struct brcm_rx_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
	unsigned long multicast;
	unsigned long rx_errors;
};
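How "private data stored at the end of net_device" works can be sketched in user space. This is a toy layout model, and all names are hypothetical. The 32-byte alignment mirrors NETDEV_ALIGN, which netdev_priv() uses in kernels of this era to locate the private area right after the aligned net_device. A helper like brcm_dev_info(dev) is then just a typed view of that area.

```c
#include <stdint.h>
#include <stdlib.h>

/* Toy layout model; names are hypothetical, alignment mirrors NETDEV_ALIGN. */
#define TOY_NETDEV_ALIGN 32
#define TOY_ALIGN_UP(x) (((x) + TOY_NETDEV_ALIGN - 1) & ~((size_t)TOY_NETDEV_ALIGN - 1))

struct toy_net_device {
	char name[16];
	unsigned int mtu;
};

struct toy_brcm_dev_info {
	struct toy_net_device *real_dev;	/* lower device */
	uint16_t brcm_port;
};

/* Like alloc_netdev(): one allocation holding the device followed by its private area. */
static struct toy_net_device *toy_alloc_netdev(size_t sizeof_priv)
{
	return calloc(1, TOY_ALIGN_UP(sizeof(struct toy_net_device)) + sizeof_priv);
}

/* Like netdev_priv(): the private area starts after the aligned device struct. */
static void *toy_netdev_priv(struct toy_net_device *dev)
{
	return (char *)dev + TOY_ALIGN_UP(sizeof(struct toy_net_device));
}

/* Like brcm_dev_info(dev): a typed view of the private area. */
static struct toy_brcm_dev_info *toy_brcm_dev_info(struct toy_net_device *dev)
{
	return toy_netdev_priv(dev);
}
```

Here sizeof(struct toy_net_device) is 20 bytes, so the private area begins at offset 32.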
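Relationships between devices
If brcm had only one device, no data structure would be needed to store the relationship; a single global brcm_dev variable would be enough. The design here handles the more complex case: there can be several lower devices and several brcm devices, with no fixed relationship between them. A data structure is therefore needed to store the relationship: brcm_group_hash. A simple diagram follows: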

The data structures are defined as follows:
static struct hlist_head brcm_group_hash[BRCM_GRP_HASH_SIZE];
struct brcm_group {
	struct hlist_node hlist;
	struct net_device *real_dev;
	int nr_ports;
	int killall;
	struct net_device *brcm_devices_array[BRCM_GROUP_ARRAY_LEN];
	struct rcu_head rcu;
};
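brcm_group_hash is a global variable organized as a hash table, and each brcm_group is inserted into it. A brcm_group stores the relationship between a lower device and its brcm devices (eth and brcm). real_dev points to the lower device, and the brcm devices are stored in the brcm_devices_array array.
Next comes the function that maps a lower device to a brcm device. brcm_port is a value from the header, and you can define its meaning yourself. Here it indicates which port the packet came from.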
struct net_device *find_brcm_dev(struct net_device *real_dev, u16 brcm_port)
{
	struct brcm_group *grp = brcm_find_group(real_dev);
	// the group holds one slot per brcm_port
	if (grp && brcm_port < BRCM_GROUP_ARRAY_LEN)
		return grp->brcm_devices_array[brcm_port];
	return NULL;
}
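On receive, when a packet reaches the brcm layer, skb->dev still points to the lower device. We use skb->dev to find the hash entry whose brcm_group->real_dev matches. The information in the packet's brcm header then decides which brcm device in brcm_group->brcm_devices_array becomes the skb's new device.
On transmit, when a packet reaches the brcm layer, skb->dev points to the brcm device. To pass the packet further down, skb->dev has to be changed to the lower device. The private part of a net_device usually stores a pointer to its lower device, so skb->dev simply becomes brcm_dev_info(dev)->real_dev.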
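The two lookup directions can be modelled in user space. This is a simplified toy: the hash sizes, the ifindex-based hash function and every toy_* name are assumptions. The kernel structure uses hlist and RCU, which are left out here. On receive, the (lower device, brcm_port) pair selects a brcm device through the group hash. On transmit, the brcm device simply follows its own real_dev pointer.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy model of brcm_group_hash; names and sizes are hypothetical, no RCU. */
#define TOY_GRP_HASH_SIZE	8
#define TOY_GROUP_ARRAY_LEN	32

struct toy_dev {
	const char *name;
	int ifindex;
	struct toy_dev *real_dev;	/* lower device; NULL for a NIC */
};

struct toy_group {
	struct toy_group *next;		/* bucket chain */
	struct toy_dev *real_dev;
	int nr_ports;
	struct toy_dev *devices[TOY_GROUP_ARRAY_LEN];	/* indexed by brcm_port */
};

static struct toy_group *toy_group_hash[TOY_GRP_HASH_SIZE];

static unsigned int toy_grp_hashfn(const struct toy_dev *real_dev)
{
	return (unsigned int)real_dev->ifindex % TOY_GRP_HASH_SIZE;
}

static struct toy_group *toy_find_group(const struct toy_dev *real_dev)
{
	struct toy_group *g;
	for (g = toy_group_hash[toy_grp_hashfn(real_dev)]; g; g = g->next)
		if (g->real_dev == real_dev)
			return g;
	return NULL;
}

static struct toy_group *toy_group_alloc(struct toy_dev *real_dev)
{
	unsigned int h = toy_grp_hashfn(real_dev);
	struct toy_group *g = calloc(1, sizeof(*g));
	if (!g)
		return NULL;
	g->real_dev = real_dev;
	g->next = toy_group_hash[h];	/* push onto the bucket */
	toy_group_hash[h] = g;
	return g;
}

static int toy_group_set_device(struct toy_group *g, unsigned int port, struct toy_dev *dev)
{
	if (port >= TOY_GROUP_ARRAY_LEN)
		return -1;
	if (!g->devices[port] && dev)
		g->nr_ports++;		/* filling an empty slot */
	else if (g->devices[port] && !dev)
		g->nr_ports--;		/* clearing a slot */
	g->devices[port] = dev;
	return 0;
}

/* Receive direction: (lower device, brcm_port) -> brcm device. */
static struct toy_dev *toy_find_brcm_dev(const struct toy_dev *real_dev, unsigned int port)
{
	struct toy_group *g = toy_find_group(real_dev);
	if (g && port < TOY_GROUP_ARRAY_LEN)
		return g->devices[port];
	return NULL;
}

/* Transmit direction: brcm device -> lower device, no hash lookup needed. */
static struct toy_dev *toy_lower_dev(const struct toy_dev *brcm_dev)
{
	return brcm_dev->real_dev;
}
```

The asymmetry matches the text. Only the receive side needs the hash, because an incoming skb->dev knows nothing about the brcm devices stacked on it.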
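Traffic statistics
In these data structures, brcm_rx_stats in the brcm device's private data brcm_dev_info records receive traffic, while dev->_tx[index] records transmit traffic.
In the receive function brcm_skb_recv(), every successfully received packet is added to the statistics: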
rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,
		       smp_processor_id());
rx_stats->rx_packets++;
rx_stats->rx_bytes += skb->len;
In the transmit function brcm_dev_hard_start_xmit(), every transmitted packet updates the corresponding statistics:
if (likely(ret == NET_XMIT_SUCCESS)) {
	txq->tx_packets++;
	txq->tx_bytes += len;
} else
	txq->tx_dropped++;
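brcm_netdev_ops->ndo_get_stats, that is, brcm_dev_get_stats(), combines the transmit and receive statistics recorded in the brcm device into the generic net_device_stats format. Commands such as ifconfig display the result derived from net_device_stats.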
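brcm_dev_get_stats() itself is not shown in this post. Presumably it adds up the per-CPU receive counters, the way the VLAN driver of the same kernel generation does. A toy user-space model of that pattern follows; TOY_NR_CPUS and the toy_* names are hypothetical. Writers touch only their own CPU's block, so no lock is needed on the hot path. The reader pays for this by summing every block.

```c
#include <stddef.h>

/* Toy model of per-CPU receive counters; names and TOY_NR_CPUS are hypothetical. */
#define TOY_NR_CPUS 4

struct toy_rx_stats {
	unsigned long rx_packets;
	unsigned long rx_bytes;
	unsigned long multicast;
	unsigned long rx_errors;
};

/* One block per CPU: the receive path only touches the current CPU's block. */
static struct toy_rx_stats toy_percpu_rx[TOY_NR_CPUS];

/* Writer side, like the per_cpu_ptr(..., smp_processor_id()) update in brcm_skb_recv(). */
static void toy_count_rx(int cpu, unsigned long len, int is_multicast)
{
	struct toy_rx_stats *s = &toy_percpu_rx[cpu];
	s->rx_packets++;
	s->rx_bytes += len;
	if (is_multicast)
		s->multicast++;
}

/* Reader side, in the spirit of ndo_get_stats: add up every CPU's block. */
static struct toy_rx_stats toy_sum_rx(void)
{
	struct toy_rx_stats total = { 0 };
	size_t cpu;
	for (cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		total.rx_packets += toy_percpu_rx[cpu].rx_packets;
		total.rx_bytes += toy_percpu_rx[cpu].rx_bytes;
		total.multicast += toy_percpu_rx[cpu].multicast;
		total.rx_errors += toy_percpu_rx[cpu].rx_errors;
	}
	return total;
}
```

Counting 100, 60 (multicast) and 40 bytes on CPUs 0, 1 and 3 sums to 3 packets, 200 bytes and 1 multicast packet.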
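Complete send and receive functions
With these pieces in place, the receive function brcm_skb_recv() can be completed. You can skim the handling of the brcm_hdr header: this is an imaginary protocol, so its meaning can be defined however you like: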
int
brcm_skb_recv(struct sk_buff *skb, struct net_device *dev,
	      struct packet_type *ptype, struct net_device *orig_dev)
{
	struct brcm_hdr *bhdr;
	struct brcm_rx_stats *rx_stats;
	int op, brcm_port;
	skb = skb_share_check(skb, GFP_ATOMIC);
	if (!skb)
		goto err_free;
	// the whole brcm header must be in the linear area before it is read
	if (!pskb_may_pull(skb, BRCM_HLEN))
		goto err_free;
	bhdr = (struct brcm_hdr *)skb->data;
	op = bhdr->brcm_tag.brcm_53242_op;
	brcm_port = bhdr->brcm_tag.brcm_53242_src_portid - 23;
	rcu_read_lock();
	// drop packets with a wrong brcm tag
	if (op != BRCM_RCV_OP || brcm_port < 1 || brcm_port > 27)
		goto err_unlock;
	// hand the skb over from the lower device to its brcm device
	skb->dev = find_brcm_dev(dev, brcm_port);
	if (!skb->dev)
		goto err_unlock;
	rx_stats = per_cpu_ptr(brcm_dev_info(skb->dev)->brcm_rx_stats,
			       smp_processor_id());
	rx_stats->rx_packets++;
	rx_stats->rx_bytes += skb->len;
	skb_pull_rcsum(skb, BRCM_HLEN);
	switch (skb->pkt_type) {
	case PACKET_BROADCAST: /* Yeah, stats collect these together.. */
		/* stats->broadcast ++; // no such counter :-( */
		break;
	case PACKET_MULTICAST:
		rx_stats->multicast++;
		break;
	case PACKET_OTHERHOST:
		/* Our lower layer thinks this is not local, let's make sure.
		 * This allows the brcm device to have a different MAC than the
		 * underlying device, and still route correctly.
		 */
		if (!compare_ether_addr(eth_hdr(skb)->h_dest,
					skb->dev->dev_addr))
			skb->pkt_type = PACKET_HOST;
		break;
	default:
		break;
	}
	// set protocol from the encapsulated EtherType
	skb->protocol = bhdr->brcm_encapsulated_proto;
	// reorder skb
	skb = brcm_check_reorder_header(skb);
	if (!skb) {
		rx_stats->rx_errors++;
		goto err_unlock;
	}
	netif_rx(skb);
	rcu_read_unlock();
	return NET_RX_SUCCESS;
err_unlock:
	rcu_read_unlock();
err_free:
	kfree_skb(skb);
	return NET_RX_DROP;
}
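The port arithmetic in the tag handling can be isolated and checked on its own. The sketch below is a hedged user-space model. The constant values are assumptions: the offset 23 and the range 1 to 27 are taken from the code above, while the real BRCM_RCV_OP value and the tag's bit layout live in the author's headers, which are not shown. Receive subtracts 23 from the switch source-port id and rejects anything outside 1 to 27. Transmit adds 23 back. For valid ports the two mappings are inverses.

```c
/* Hypothetical constants mirroring brcm_skb_recv() and brcm_dev_hard_start_xmit();
 * the real BRCM_RCV_OP value and the tag bit layout are the author's. */
#define TOY_BRCM_RCV_OP	1
#define TOY_PORT_OFFSET	23
#define TOY_PORT_MIN	1
#define TOY_PORT_MAX	27

/* Receive: turn the tag's source-port id into a brcm_port, or -1 to drop the frame. */
static int toy_rx_port_from_tag(int op, int src_portid)
{
	int port = src_portid - TOY_PORT_OFFSET;
	if (op != TOY_BRCM_RCV_OP || port < TOY_PORT_MIN || port > TOY_PORT_MAX)
		return -1;
	return port;
}

/* Transmit: the destination port id written into the tag for a given brcm_port. */
static int toy_tx_dst_from_port(int port)
{
	return port + TOY_PORT_OFFSET;
}
```

Source-port ids 24 and 50 map to ports 1 and 27. Id 23, id 51 and any frame with the wrong op are dropped.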
Likewise, the transmit function brcm_dev_hard_start_xmit() can now be completed. Again, you can skim the brcm_hdr handling:
static netdev_tx_t brcm_dev_hard_start_xmit(struct sk_buff *skb,
					    struct net_device *dev)
{
	int i = skb_get_queue_mapping(skb);
	struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
	struct brcm_ethhdr *beth = (struct brcm_ethhdr *)(skb->data);
	unsigned int len;
	u16 brcm_port;
	int ret;
	/* Frames that do not carry a brcm header yet (for example, frames
	 * sent by DHCP) get a brcm tag inserted here.
	 *
	 * NOTE: THIS ASSUMES DIX ETHERNET, SPECIFICALLY NOT SUPPORTING
	 * OTHER THINGS LIKE FDDI/TokenRing/802.3 SNAPs...
	 */
	if (beth->h_brcm_proto != htons(ETH_P_BRCM)) {
		brcm_t brcm_tag;
		// start from an all-zero tag so no field is left uninitialized
		memset(&brcm_tag, 0, sizeof(brcm_tag));
		brcm_port = brcm_dev_info(dev)->brcm_port;
		if (brcm_port != BRCM_ANY_PORT) {
			// steer the frame to a specific switch port
			brcm_tag.brcm_op_53242 = BRCM_SND_OP;
			brcm_tag.brcm_dst_53242 = brcm_port + 23;
		}
		skb = brcm_put_tag(skb, *(u32 *)(&brcm_tag));
		if (!skb) {
			txq->tx_dropped++;
			return NETDEV_TX_OK;
		}
	}
	// pass the baton: the lower device takes over the skb
	skb_set_dev(skb, brcm_dev_info(dev)->real_dev);
	len = skb->len;
	ret = dev_queue_xmit(skb);
	if (likely(ret == NET_XMIT_SUCCESS)) {
		txq->tx_packets++;
		txq->tx_bytes += len;
	} else
		txq->tx_dropped++;
	return ret;
}
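brcm_put_tag() is not shown in this post. The following is only a guess at what such a helper does, modelled on the way vlan_put_tag() inserts an 802.1Q header. The frame layout is an assumption inferred from struct brcm_ethhdr and struct brcm_hdr: destination MAC, source MAC, the brcm EtherType, a 32-bit tag, then the original EtherType. The value of TOY_ETH_P_BRCM and the tag byte order are assumptions too. In the kernel, the helper must also make sure the skb has enough headroom before pushing.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: dst MAC | src MAC | brcm EtherType | 32-bit tag | original EtherType. */
#define TOY_ETH_ALEN	6
#define TOY_ETH_P_BRCM	0x8874	/* assumed value */
#define TOY_BRCM_INSERT	6	/* 2-byte EtherType + 4-byte tag */

/* Insert the brcm EtherType and tag after the MAC addresses, in the style of
 * vlan_put_tag(). The caller must provide TOY_BRCM_INSERT bytes of headroom
 * in front of 'data'. Returns the new start of the frame. */
static uint8_t *toy_brcm_put_tag(uint8_t *data, uint32_t tag)
{
	uint8_t *start = data - TOY_BRCM_INSERT;	/* like skb_push() */
	memmove(start, data, 2 * TOY_ETH_ALEN);		/* slide both MACs to the new front */
	start[12] = (uint8_t)(TOY_ETH_P_BRCM >> 8);
	start[13] = (uint8_t)(TOY_ETH_P_BRCM & 0xff);
	start[14] = (uint8_t)(tag >> 24);		/* tag in network byte order (assumed) */
	start[15] = (uint8_t)(tag >> 16);
	start[16] = (uint8_t)(tag >> 8);
	start[17] = (uint8_t)tag;
	/* bytes 18..19 still hold the original EtherType, now the encapsulated protocol */
	return start;
}
```

Because only the 12 MAC bytes move, the original EtherType stays where it was. It ends up directly after the tag, which is where brcm_encapsulated_proto is read on receive.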
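Registering the device
For reception, dev_add_pack() is all it takes to hook into the stack. Earlier posts in this series already explained how packets are unwrapped through ptype_base. What remains is hooking in transmission. The transmit function is done, and sending is a nested call driven by dev. So the transmit function has to be attached when the device is registered, as one of the device's methods.
Every new kind of device has an XXX_setup() initialization function. Most devices call ether_setup() for the initialization and then adjust as needed. Here is brcm_setup():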
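static const struct net_device_ops brcm_netdev_ops;
void brcm_setup(struct net_device *dev)
{
	ether_setup(dev);
	dev->priv_flags |= IFF_BRCM_TAG;
	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
	// stacked device: queueing is left to the lower device
	dev->tx_queue_len = 0;
	// this is where the transmit method is attached
	dev->netdev_ops = &brcm_netdev_ops;
	dev->destructor = free_netdev;
	dev->ethtool_ops = &brcm_ethtool_ops;
	memset(dev->broadcast, 0, ETH_ALEN);
}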
The transmit function lives in brcm_netdev_ops. Every layer's device is invoked the same way, through dev->netdev_ops->ndo_start_xmit().
static const struct net_device_ops brcm_netdev_ops = {
	.ndo_change_mtu		= brcm_dev_change_mtu,
	.ndo_init		= brcm_dev_init,
	.ndo_uninit		= brcm_dev_uninit,
	.ndo_open		= brcm_dev_open,
	.ndo_stop		= brcm_dev_stop,
	.ndo_start_xmit		= brcm_dev_hard_start_xmit,
	.ndo_validate_addr	= eth_validate_addr,
	.ndo_set_mac_address	= brcm_dev_set_mac_address,
	.ndo_set_rx_mode	= brcm_dev_set_rx_mode,
	.ndo_set_multicast_list	= brcm_dev_set_rx_mode,
	.ndo_change_rx_flags	= brcm_dev_change_rx_flags,
	//.ndo_do_ioctl		= brcm_dev_ioctl,
	.ndo_neigh_setup	= brcm_dev_neigh_setup,
	.ndo_get_stats		= brcm_dev_get_stats,
};
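brcm_setup() follows a common pattern: take generic Ethernet defaults, then override a few fields and install the device's own operations table. Below is a toy user-space sketch of that pattern. toy_netdev, toy_ether_setup(), toy_brcm_setup() and the ops table are hypothetical, and the default values only approximate what ether_setup() sets. After setup, the caller reaches the transmit method through dev->ops, just as the stack goes through dev->netdev_ops->ndo_start_xmit().

```c
#include <string.h>

/* Toy device and ops table; all names are hypothetical. */
struct toy_netdev;

struct toy_netdev_ops {
	int (*start_xmit)(struct toy_netdev *dev);
};

struct toy_netdev {
	unsigned int mtu;
	unsigned long tx_queue_len;
	unsigned char broadcast[6];
	const struct toy_netdev_ops *ops;
};

/* Generic Ethernet defaults, loosely modelled on ether_setup(). */
static void toy_ether_setup(struct toy_netdev *dev)
{
	dev->mtu = 1500;
	dev->tx_queue_len = 1000;
	memset(dev->broadcast, 0xff, sizeof(dev->broadcast));
	dev->ops = NULL;
}

/* Stand-in for brcm_dev_hard_start_xmit(). */
static int toy_brcm_xmit(struct toy_netdev *dev)
{
	(void)dev;
	return 0;
}

static const struct toy_netdev_ops toy_brcm_ops = { .start_xmit = toy_brcm_xmit };

/* Same shape as brcm_setup(): start from Ethernet defaults, then override. */
static void toy_brcm_setup(struct toy_netdev *dev)
{
	toy_ether_setup(dev);
	dev->tx_queue_len = 0;		/* stacked device: the lower device does the queueing */
	dev->ops = &toy_brcm_ops;	/* installs the transmit method */
	memset(dev->broadcast, 0, sizeof(dev->broadcast));
}
```

After toy_brcm_setup(), the MTU keeps its Ethernet default while the queue length, broadcast address and ops table are the brcm-specific values.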
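Device initialization should happen when the device is created, that is, when it is registered with the network stack, in register_brcm_dev(). To register a new device, we need its lower device real_dev and the brcm_port that uniquely identifies the brcm device. The function proceeds in this order:
1. It makes sure the device has not already been created.
2. alloc_netdev_mq() creates the new device new_dev.
3. It sets the device's attributes, in particular its private data brcm_dev_info(new_dev).
4. It looks up or allocates the brcm_group for real_dev.
5. register_netdevice() performs the real registration.
6. The new device is recorded in its group in brcm_group_hash.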
static int register_brcm_dev(struct net_device *real_dev, u16 brcm_port)
{
	struct net_device *new_dev;
	struct net *net = dev_net(real_dev);
	struct brcm_group *grp;
	char name[IFNAMSIZ];
	int err;
	if (brcm_port >= BRCM_PORT_MASK)
		return -ERANGE;
	// already exists
	if (find_brcm_dev(real_dev, brcm_port) != NULL)
		return -EEXIST;
	snprintf(name, IFNAMSIZ, "brcm%i", brcm_port);
	new_dev = alloc_netdev_mq(sizeof(struct brcm_dev_info), name,
				  brcm_setup, 1);
	if (new_dev == NULL)
		return -ENOBUFS;
	new_dev->real_num_tx_queues = real_dev->real_num_tx_queues;
	dev_net_set(new_dev, net);
	new_dev->mtu = real_dev->mtu;
	brcm_dev_info(new_dev)->brcm_port = brcm_port;
	brcm_dev_info(new_dev)->real_dev = real_dev;
	brcm_dev_info(new_dev)->dent = NULL;
	//new_dev->rtnl_link_ops = &brcm_link_ops;
	grp = brcm_find_group(real_dev);
	if (!grp)
		grp = brcm_group_alloc(real_dev);
	// the group allocation can fail too
	err = -ENOBUFS;
	if (!grp)
		goto out_free_newdev;
	err = register_netdevice(new_dev);
	if (err < 0)
		goto out_free_newdev;
	/* Account for reference in struct brcm_dev_info */
	dev_hold(real_dev);
	brcm_group_set_device(grp, brcm_port, new_dev);
	return 0;
out_free_newdev:
	free_netdev(new_dev);
	return err;
}
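The checks at the top of register_brcm_dev() can be modelled in a small user-space registry. This is a toy, not the module's code. TOY_PORT_LIMIT stands in for BRCM_PORT_MASK, an ifindex stands in for the lower device, and every toy_* name is hypothetical. The model also reflects one detail of the naming scheme: the name "brcm%i" uses only the port, while interface names must be unique system-wide.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Toy registry; all names and limits are hypothetical. */
#define TOY_PORT_LIMIT	28	/* stands in for BRCM_PORT_MASK */
#define TOY_MAX_DEVS	16

struct toy_brcm {
	int real_ifindex;	/* lower device */
	unsigned int port;	/* brcm_port */
	char name[16];		/* interface name, unique system-wide */
};

static struct toy_brcm toy_devs[TOY_MAX_DEVS];
static int toy_nr_devs;

/* Mirrors the range and duplicate checks of register_brcm_dev(). */
static int toy_register_brcm_dev(int real_ifindex, unsigned int port)
{
	char name[16];
	int i;
	if (port >= TOY_PORT_LIMIT)
		return -ERANGE;
	for (i = 0; i < toy_nr_devs; i++)	/* like find_brcm_dev() != NULL */
		if (toy_devs[i].real_ifindex == real_ifindex && toy_devs[i].port == port)
			return -EEXIST;
	snprintf(name, sizeof(name), "brcm%u", port);
	for (i = 0; i < toy_nr_devs; i++)	/* names are global, not per lower device */
		if (strcmp(toy_devs[i].name, name) == 0)
			return -EEXIST;
	if (toy_nr_devs == TOY_MAX_DEVS)
		return -ENOBUFS;
	toy_devs[toy_nr_devs].real_ifindex = real_ifindex;
	toy_devs[toy_nr_devs].port = port;
	strcpy(toy_devs[toy_nr_devs].name, name);
	toy_nr_devs++;
	return 0;
}
```

In this toy, registering port 3 on a second lower device fails only because of the name clash. If register_netdevice() rejects an existing name, as it normally does, the real module likely has the same limit: a given port can be used on only one lower device at a time. Building the name from both the lower device and the port would avoid that.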
ioctl
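There can be several brcm devices, and their relationship to the lower devices is not fixed. So creating them should be under manual control, and users create them through ioctl. Only two operations are provided here: add and delete. A new device is always tied to a lower device, so the lower device has to be named explicitly when adding. __dev_get_by_name() then finds that device in the network namespace, and register_brcm_dev() completes the registration. Deletion is direct: unregister_brcm_dev() removes the device.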
static int brcm_ioctl_handler(struct net *net, void __user *arg)
{
	int err;
	struct brcm_ioctl_args args;
	struct net_device *dev = NULL;
	if (copy_from_user(&args, arg, sizeof(struct brcm_ioctl_args)))
		return -EFAULT;
	/* Null terminate this sucker, just in case. */
	args.device1[sizeof(args.device1) - 1] = 0;
	args.u.device2[sizeof(args.u.device2) - 1] = 0;
	rtnl_lock();
	// look up the named device: the lower device for ADD, the brcm device for DEL
	switch (args.cmd) {
	case ADD_BRCM_CMD:
	case DEL_BRCM_CMD:
		err = -ENODEV;
		dev = __dev_get_by_name(net, args.device1);
		if (!dev)
			goto out;
		err = -EINVAL;
		if (args.cmd != ADD_BRCM_CMD && !is_brcm_dev(dev))
			goto out;
	}
	switch (args.cmd) {
	case ADD_BRCM_CMD:
		err = -EPERM;
		if (!capable(CAP_NET_ADMIN))
			break;
		err = register_brcm_dev(dev, args.u.port);
		break;
	case DEL_BRCM_CMD:
		err = -EPERM;
		if (!capable(CAP_NET_ADMIN))
			break;
		unregister_brcm_dev(dev, NULL);
		err = 0;
		break;
	default:
		err = -EOPNOTSUPP;
		break;
	}
out:
	rtnl_unlock();
	return err;
}
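On the user-space side, a tool has to fill in the argument structure before calling ioctl(). The sketch below mirrors struct brcm_ioctl_args only as far as the handler reveals it: cmd, device1, u.port and u.device2. The field sizes, the command values and the helper name are hypothetical. The real definitions, and the ioctl request number the module listens on, come from the author's headers and are not covered in this post. For ADD, device1 names the lower device. For DEL, it names the brcm device itself, because the handler checks is_brcm_dev().

```c
#include <string.h>

/* Hypothetical mirror of struct brcm_ioctl_args; real layout and values are the author's. */
enum { TOY_ADD_BRCM_CMD = 0, TOY_DEL_BRCM_CMD = 1 };

struct toy_brcm_ioctl_args {
	int cmd;
	char device1[24];	/* lower device for ADD, brcm device for DEL */
	union {
		char device2[24];
		unsigned int port;
	} u;
};

/* Fill the arguments; the result would be passed as the third argument of ioctl(). */
static void toy_fill_args(struct toy_brcm_ioctl_args *a, int cmd,
			  const char *dev_name, unsigned int port)
{
	memset(a, 0, sizeof(*a));
	a->cmd = cmd;
	/* copy at most 23 bytes; the memset leaves the last byte as the terminator */
	strncpy(a->device1, dev_name, sizeof(a->device1) - 1);
	a->u.port = port;
}
```

An over-long name is truncated to 23 characters and stays NUL-terminated. The kernel handler forces the same termination on its copy.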
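That is the main body of the brcm protocol module. It is not complete yet. The next post will finish adding the brcm protocol and fill in some details: the proc filesystem, the notifier mechanism and so on, the kernel Makefile, and of course testing the protocol. The related source code will be packaged and uploaded with the next post.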