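1. Introduction
As described in "linux cpufreq framework(3)_cpufreq core", a cpufreq policy only sets the approximate range within which the CPU frequency may be scaled. The actual operating frequency is decided by the corresponding cpufreq governor (except on CPUs that can adjust their own frequency, which are covered in more detail later). So what exactly is a cpufreq governor, and how does it work? That is what this article describes.
2. Implementation of the cpufreq governor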
2.1 struct cpufreq_governor
The kernel abstracts a cpufreq governor with struct cpufreq_governor, shown below:
/* include/linux/cpufreq.h */
struct cpufreq_governor {
    char    name[CPUFREQ_NAME_LEN];
    int     initialized;
    int     (*governor)         (struct cpufreq_policy *policy,
                                 unsigned int event);
    ssize_t (*show_setspeed)    (struct cpufreq_policy *policy,
                                 char *buf);
    int     (*store_setspeed)   (struct cpufreq_policy *policy,
                                 unsigned int freq);
    unsigned int max_transition_latency; /* HW must be able to switch to
            next freq faster than this value in nano secs or we
            will fallback to performance governor */
    struct list_head    governor_list;
    struct module       *owner;
};
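name: the governor's name, which uniquely identifies it.
initialized: records whether the governor has been initialized.
max_transition_latency: the maximum frequency-transition latency the governor can tolerate, in nanoseconds. The hardware must be able to switch frequencies faster than this value; otherwise, as the comment in the structure notes, the kernel falls back to the performance governor.
governor_list: used to link the governor into a global governor list (cpufreq_governor_list).
show_setspeed and store_setspeed: recall the "scaling_setspeed" attribute described in "linux cpufreq framework(3)_cpufreq core". Some governors allow the frequency to be changed from user space; such a governor must provide the show_setspeed and store_setspeed callbacks to handle scaling_setspeed requests from user space.
governor: the main functionality of a cpufreq governor is implemented through this callback. Driven by different events, it works as a state machine that starts, stops, and otherwise controls the governor; see the descriptions below for details.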
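To make the role of show_setspeed and store_setspeed more concrete, here is a minimal user-space sketch of a governor that accepts a frequency from the user. Everything prefixed with mock_ or user_ is hypothetical: struct mock_policy and struct mock_governor are trimmed-down stand-ins for the kernel structures, the extra len parameter of show_setspeed does not exist in the kernel callback, and clamping the requested value to the policy range is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, simplified stand-in for struct cpufreq_policy. */
struct mock_policy {
    unsigned int min, max, cur;   /* kHz */
};

/* Hypothetical, trimmed-down counterpart of struct cpufreq_governor. */
struct mock_governor {
    const char *name;
    int (*show_setspeed)(const struct mock_policy *policy,
                         char *buf, size_t len);
    int (*store_setspeed)(struct mock_policy *policy, unsigned int freq);
};

/* Report the current frequency as text, like reading scaling_setspeed. */
static int user_show(const struct mock_policy *policy, char *buf, size_t len)
{
    assert(policy != NULL && buf != NULL);
    return snprintf(buf, len, "%u\n", policy->cur);
}

/* Accept a frequency from the user; clamp it into [min, max] (assumption). */
static int user_store(struct mock_policy *policy, unsigned int freq)
{
    assert(policy != NULL);
    if (freq < policy->min)
        freq = policy->min;
    if (freq > policy->max)
        freq = policy->max;
    policy->cur = freq;
    return 0;
}

/* A governor instance wired up with designated initializers. */
struct mock_governor mock_userspace = {
    .name           = "userspace",
    .show_setspeed  = user_show,
    .store_setspeed = user_store,
};
```

Calling mock_userspace.store_setspeed(&p, 9000000) on a policy limited to 1500000 kHz leaves p.cur at 1500000 in this sketch.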
2.2 governor event
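The kernel abstracts governor control into the following five events. At the appropriate time (see Chapter 3 below), the cpufreq core sends an event through the .governor callback to make the governor perform the corresponding frequency-scaling action.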
/* include/linux/cpufreq.h */

/* Governor Events */
#define CPUFREQ_GOV_START       1
#define CPUFREQ_GOV_STOP        2
#define CPUFREQ_GOV_LIMITS      3
#define CPUFREQ_GOV_POLICY_INIT 4
#define CPUFREQ_GOV_POLICY_EXIT 5
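CPUFREQ_GOV_POLICY_INIT: sent before a policy starts a new governor (usually when the cpufreq policy has just been created or when the governor changes). On receiving this event, the governor does its preparatory work, such as initializing the data structures it needs (for example, a timer). Not every governor needs this event; if time permits, we will later use the ondemand governor as an example to explain its purpose.
CPUFREQ_GOV_START: starts the governor.
CPUFREQ_GOV_STOP and CPUFREQ_GOV_POLICY_EXIT: the reverse of CPUFREQ_GOV_START and CPUFREQ_GOV_POLICY_INIT, respectively.
CPUFREQ_GOV_LIMITS: usually sent after the governor has started. It asks the governor to check the current frequency and adjust it so that it lies within the valid range specified by the policy.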
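As a rough illustration of the event-driven design, the following user-space sketch mimics how a .governor callback might dispatch on the five events. The event values are the ones listed above; struct mock_policy, enum mock_state, and mock_governor_cb are hypothetical, and the strict ordering checks are an assumption of this sketch rather than something the kernel code shown here enforces.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Event values copied from the listing above. */
#define CPUFREQ_GOV_START       1
#define CPUFREQ_GOV_STOP        2
#define CPUFREQ_GOV_LIMITS      3
#define CPUFREQ_GOV_POLICY_INIT 4
#define CPUFREQ_GOV_POLICY_EXIT 5

/* Hypothetical lifecycle states, used only by this sketch. */
enum mock_state { MOCK_UNINIT, MOCK_READY, MOCK_RUNNING };

/* Hypothetical, simplified stand-in for struct cpufreq_policy. */
struct mock_policy {
    unsigned int min, max, cur;   /* kHz */
    enum mock_state state;
};

/* Keep cur inside [min, max]; this is what LIMITS asks for. */
static void mock_clamp(struct mock_policy *p)
{
    if (p->cur > p->max)
        p->cur = p->max;
    if (p->cur < p->min)
        p->cur = p->min;
}

/* Dispatch on the event, the way a .governor callback would. */
int mock_governor_cb(struct mock_policy *p, unsigned int event)
{
    assert(p != NULL);
    switch (event) {
    case CPUFREQ_GOV_POLICY_INIT:      /* allocate/prepare state */
        if (p->state != MOCK_UNINIT)
            return -EBUSY;
        p->state = MOCK_READY;
        return 0;
    case CPUFREQ_GOV_START:            /* begin governing */
        if (p->state != MOCK_READY)
            return -EINVAL;
        p->state = MOCK_RUNNING;
        mock_clamp(p);
        return 0;
    case CPUFREQ_GOV_LIMITS:           /* limits changed: re-check cur */
        if (p->state != MOCK_RUNNING)
            return -EINVAL;
        mock_clamp(p);
        return 0;
    case CPUFREQ_GOV_STOP:             /* reverse of START */
        if (p->state != MOCK_RUNNING)
            return -EINVAL;
        p->state = MOCK_READY;
        return 0;
    case CPUFREQ_GOV_POLICY_EXIT:      /* reverse of POLICY_INIT */
        if (p->state != MOCK_READY)
            return -EINVAL;
        p->state = MOCK_UNINIT;
        return 0;
    default:
        return -EINVAL;
    }
}
```

In this sketch, sending CPUFREQ_GOV_START before CPUFREQ_GOV_POLICY_INIT is rejected with -EINVAL, which makes the expected INIT, START, LIMITS, STOP, EXIT ordering explicit.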
2.3 governor register
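Every governor is registered with the kernel through cpufreq_register_governor. The interface is simple: it checks whether a governor with the same name is already registered and, if not, adds the governor to the global list; otherwise it returns -EBUSY. The code is as follows: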
int cpufreq_register_governor(struct cpufreq_governor *governor)
{
    int err;

    if (!governor)
        return -EINVAL;

    if (cpufreq_disabled())
        return -ENODEV;

    mutex_lock(&cpufreq_governor_mutex);

    governor->initialized = 0;
    err = -EBUSY;
    if (__find_governor(governor->name) == NULL) {
        err = 0;
        list_add(&governor->governor_list, &cpufreq_governor_list);
    }

    mutex_unlock(&cpufreq_governor_mutex);
    return err;
}
EXPORT_SYMBOL_GPL(cpufreq_register_governor);
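The same "reject duplicates by name, otherwise link into a global list" pattern can be tried out in user space. The sketch below is only an approximation: all mock_ names are hypothetical, a plain singly linked list replaces the kernel's list_head, the name comparison uses strncmp as a simplification, and the mutex and the cpufreq_disabled() check are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MOCK_NAME_LEN 16   /* hypothetical; stands in for CPUFREQ_NAME_LEN */

/* Hypothetical, simplified stand-in for struct cpufreq_governor. */
struct mock_governor {
    char name[MOCK_NAME_LEN];
    int initialized;
    struct mock_governor *next;   /* replaces the kernel's list_head */
};

/* Global list of registered governors (no locking in this sketch). */
static struct mock_governor *mock_governor_list;

/* Look a governor up by name; returns NULL if it is not registered. */
struct mock_governor *mock_find_governor(const char *name)
{
    struct mock_governor *g;

    assert(name != NULL);
    for (g = mock_governor_list; g != NULL; g = g->next)
        if (strncmp(g->name, name, MOCK_NAME_LEN) == 0)
            return g;
    return NULL;
}

/* Register a governor unless one with the same name already exists. */
int mock_register_governor(struct mock_governor *gov)
{
    if (!gov)
        return -EINVAL;

    gov->initialized = 0;
    if (mock_find_governor(gov->name) != NULL)
        return -EBUSY;

    gov->next = mock_governor_list;   /* push onto the head of the list */
    mock_governor_list = gov;
    return 0;
}
```

Registering two different objects that both carry the name "ondemand" succeeds for the first and returns -EBUSY for the second, which is the behaviour the kernel listing above implements with __find_governor.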
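3. Governor-related call flows
3.1 Startup flow
As described in "linux cpufreq framework(3)_cpufreq core", cpufreq_init_policy is called when a cpufreq device is added. Its main job is to assign a cpufreq governor to the current cpufreq policy and start it, as follows: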
 1: static void cpufreq_init_policy(struct cpufreq_policy *policy)
 2: {
 3:     struct cpufreq_governor *gov = NULL;
 4:     struct cpufreq_policy new_policy;
 5:     int ret = 0;
 6:
 7:     memcpy(&new_policy, policy, sizeof(*policy));
 8:
 9:     /* Update governor of new_policy to the governor used before hotplug */
10:     gov = __find_governor(per_cpu(cpufreq_cpu_governor, policy->cpu));
11:     if (gov)
12:         pr_debug("Restoring governor %s for cpu %d\n",
13:                  policy->governor->name, policy->cpu);
14:     else
15:         gov = CPUFREQ_DEFAULT_GOVERNOR;
16:
17:     new_policy.governor = gov;
18:
19:     /* Use the default policy if its valid. */
20:     if (cpufreq_driver->setpolicy)
21:         cpufreq_parse_governor(gov->name, &new_policy.policy, NULL);
22:
23:     /* set default policy */
24:     ret = cpufreq_set_policy(policy, &new_policy);
25:     if (ret) {
26:         pr_debug("setting policy failed\n");
27:         if (cpufreq_driver->exit)
28:             cpufreq_driver->exit(policy);
29:     }
30: }
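Lines 9-13: first check whether a governor was in use before the last hotplug (its name is saved in the per-CPU global variable cpufreq_cpu_governor). If so, use that governor directly.
Lines 14-15: otherwise, use the default governor, CPUFREQ_DEFAULT_GOVERNOR. It is defined in include/linux/cpufreq.h and can be selected through a kernel configuration option; the available choices are performance, powersave, userspace, ondemand and conservative.
Lines 20-21: if the cpufreq driver provides a setpolicy interface, the CPU can determine its own operating frequency within the valid range specified by the policy, so no governor is needed to pick the frequency. However, if the governor at this point is performance or powersave, the cpufreq driver still has to be told, so that its setpolicy interface can set the frequency range appropriately. How is it told? Through the policy field of struct cpufreq_policy (a rather confusing name!), which takes one of two values: CPUFREQ_POLICY_PERFORMANCE or CPUFREQ_POLICY_POWERSAVE.
Line 24: calls cpufreq_set_policy to start the governor. The code is as follows.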
 1: static int cpufreq_set_policy(struct cpufreq_policy *policy,
 2:                               struct cpufreq_policy *new_policy)
 3: {
 4:     ...
 5:     if (cpufreq_driver->setpolicy) {
 6:         policy->policy = new_policy->policy;
 7:         pr_debug("setting range\n");
 8:         return cpufreq_driver->setpolicy(new_policy);
 9:     }
10:
11:     if (new_policy->governor == policy->governor)
12:         goto out;
13:
14:     pr_debug("governor switch\n");
15:
16:     /* save old, working values */
17:     old_gov = policy->governor;
18:     /* end old governor */
19:     if (old_gov) {
20:         __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
21:         up_write(&policy->rwsem);
22:         __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
23:         down_write(&policy->rwsem);
24:     }
25:
26:     /* start new governor */
27:     policy->governor = new_policy->governor;
28:     if (!__cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT)) {
29:         if (!__cpufreq_governor(policy, CPUFREQ_GOV_START))
30:             goto out;
31:
32:         up_write(&policy->rwsem);
33:         __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_EXIT);
34:         down_write(&policy->rwsem);
35:     }
36:
37:     /* new governor failed, so re-start old one */
38:     pr_debug("starting governor %s failed\n", policy->governor->name);
39:     if (old_gov) {
40:         policy->governor = old_gov;
41:         __cpufreq_governor(policy, CPUFREQ_GOV_POLICY_INIT);
42:         __cpufreq_governor(policy, CPUFREQ_GOV_START);
43:     }
44:
45:     return -EINVAL;
46:
47: out:
48:     pr_debug("governor: change or update limits\n");
49:     return __cpufreq_governor(policy, CPUFREQ_GOV_LIMITS);
50: }
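Lines 5-9: these correspond to the logic of lines 20-21 in cpufreq_init_policy above. If a setpolicy interface exists, it is called directly and no further governor operations take place. This is why the two values CPUFREQ_POLICY_PERFORMANCE and CPUFREQ_POLICY_POWERSAVE are used to pass the governor information indirectly.
Lines 11-12: if the new and old governors are the same, the governor switch is skipped and execution jumps straight to out, where only CPUFREQ_GOV_LIMITS is sent.
Lines 19-24: if an old governor exists, stop it. The sequence is: CPUFREQ_GOV_STOP---->CPUFREQ_GOV_POLICY_EXIT
The remaining code starts the new governor. The sequence is: CPUFREQ_GOV_POLICY_INIT---->CPUFREQ_GOV_START---->CPUFREQ_GOV_LIMITS. If the new governor fails to start, the old one (if any) is restarted and -EINVAL is returned.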
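The event sequences above can be checked with a small user-space simulation that records every event sent during a governor switch. This is only a sketch under stated assumptions: all mock_ names are hypothetical, the policy->rwsem handling and the setpolicy branch are omitted, and a NULL new governor is rejected up front, which the kernel listing does not do.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Event values as defined in section 2.2. */
#define CPUFREQ_GOV_START       1
#define CPUFREQ_GOV_STOP        2
#define CPUFREQ_GOV_LIMITS      3
#define CPUFREQ_GOV_POLICY_INIT 4
#define CPUFREQ_GOV_POLICY_EXIT 5

#define MOCK_LOG_MAX 16

struct mock_policy;

/* Hypothetical governor with a single callback, like .governor. */
struct mock_governor {
    const char *name;
    int (*governor)(struct mock_policy *policy, unsigned int event);
};

struct mock_policy {
    struct mock_governor *governor;   /* currently attached governor */
    unsigned int log[MOCK_LOG_MAX];   /* events sent, in order */
    size_t nlog;
};

/* Send one event to the current governor and record it in the log. */
static int mock_send(struct mock_policy *p, unsigned int event)
{
    assert(p->governor != NULL);
    if (p->nlog < MOCK_LOG_MAX)
        p->log[p->nlog++] = event;
    return p->governor->governor(p, event);
}

/* Switch governors following the order used by cpufreq_set_policy. */
int mock_set_governor(struct mock_policy *p, struct mock_governor *new_gov)
{
    struct mock_governor *old_gov = p->governor;

    if (!new_gov)
        return -EINVAL;
    if (new_gov == old_gov)                    /* nothing to switch */
        return mock_send(p, CPUFREQ_GOV_LIMITS);

    if (old_gov) {                             /* end old governor */
        mock_send(p, CPUFREQ_GOV_STOP);
        mock_send(p, CPUFREQ_GOV_POLICY_EXIT);
    }

    p->governor = new_gov;                     /* start new governor */
    if (mock_send(p, CPUFREQ_GOV_POLICY_INIT) == 0) {
        if (mock_send(p, CPUFREQ_GOV_START) == 0)
            return mock_send(p, CPUFREQ_GOV_LIMITS);
        mock_send(p, CPUFREQ_GOV_POLICY_EXIT);
    }

    if (old_gov) {                             /* new one failed: roll back */
        p->governor = old_gov;
        mock_send(p, CPUFREQ_GOV_POLICY_INIT);
        mock_send(p, CPUFREQ_GOV_START);
    }
    return -EINVAL;
}

/* Example callbacks: one always succeeds, one refuses to start. */
int mock_ok_cb(struct mock_policy *p, unsigned int event)
{
    (void)p;
    (void)event;
    return 0;
}

int mock_fail_start_cb(struct mock_policy *p, unsigned int event)
{
    (void)p;
    return event == CPUFREQ_GOV_START ? -EINVAL : 0;
}
```

Attaching a governor built on mock_ok_cb to an empty policy logs POLICY_INIT, START, LIMITS; replacing it with another working governor logs STOP, POLICY_EXIT, POLICY_INIT, START, LIMITS.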
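3.2 Frequency scaling flow
The idea behind governor-based frequency scaling has been mentioned several times already; here is a summary:
1) There are two types of CPU: one only needs to be given a frequency range and then chooses its own operating frequency within it; the other needs software to specify the exact operating frequency.
2) For the first type, the cpufreq policy specifies the frequency range policy->{min, max}, which is then applied through the setpolicy interface.
3) For the second type, the cpufreq policy specifies both the frequency range and the governor to use. Once started, the governor sets the CPU's operating frequency either dynamically (for example, by starting a timer, monitoring system activity, and adjusting the frequency according to load) or statically (by setting it directly to some suitable value).
The kernel documentation explains this process in detail: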
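The two paths summarized above can be sketched as a single dispatch function. This is an illustrative user-space model only: struct mock_driver, struct mock_policy, mock_pick_fn and the example hooks are all hypothetical, and the real cpufreq driver callbacks have different signatures.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified policy: limits plus what was programmed. */
struct mock_policy {
    unsigned int min, max;   /* allowed range, kHz */
    unsigned int cur;        /* frequency last requested, kHz */
    int range_programmed;    /* set once the range was handed to the CPU */
};

/* Hypothetical driver: a type-1 CPU sets setpolicy, a type-2 CPU target. */
struct mock_driver {
    int (*setpolicy)(struct mock_policy *p);
    int (*target)(struct mock_policy *p, unsigned int freq);
};

/* Governor-side decision: returns the frequency it would like to use. */
typedef unsigned int (*mock_pick_fn)(const struct mock_policy *p);

int mock_apply_policy(const struct mock_driver *drv, struct mock_policy *p,
                      mock_pick_fn pick)
{
    unsigned int freq;

    assert(drv != NULL && p != NULL);

    /* Type 1: hand over the limits, the CPU picks the frequency itself. */
    if (drv->setpolicy)
        return drv->setpolicy(p);

    /* Type 2: a governor must choose, and the result must stay in range. */
    if (!drv->target || !pick)
        return -EINVAL;
    freq = pick(p);
    if (freq < p->min)
        freq = p->min;
    if (freq > p->max)
        freq = p->max;
    return drv->target(p, freq);
}

/* Example hooks used to exercise both paths. */
int mock_hw_setpolicy(struct mock_policy *p)
{
    p->range_programmed = 1;
    return 0;
}

int mock_hw_target(struct mock_policy *p, unsigned int freq)
{
    p->cur = freq;
    return 0;
}

/* A deliberately greedy governor decision, to show the clamping. */
unsigned int mock_pick_too_high(const struct mock_policy *p)
{
    return p->max + 500000;
}
```

With a driver that only provides target, a governor decision above policy->max ends up clamped to policy->max before it reaches the hardware hook.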
Documentation/cpu-freq/governors.txt

CPU can be set to switch independently   |         CPU can only be set
      within specific "limits"           |       to specific frequencies

                                 "CPUfreq policy"
                consists of frequency limits (policy->{min,max})
                     and CPUfreq governor to be used
                         /                    \
                        /                      \
                       /                       the cpufreq governor decides
                      /                        (dynamically or statically)
                     /                         what target_freq to set within
                    /                          the limits of policy->{min,max}
                   /                                \
                  /                                  \
        Using the ->setpolicy call,              Using the ->target/target_index call,
            the limits and the                    the frequency closest
             "policy" is set.                     to target_freq is set.
                                                  It is assured that it
                                                  is within policy->{min,max}
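4. Common governors
Finally, here is a brief introduction to the cpufreq governors commonly found in the kernel.
1) Performance
A performance-first governor: it sets the CPU frequency directly to the maximum of policy->{min,max}.
2) Powersave
A power-first governor: it sets the CPU frequency directly to the minimum of policy->{min,max}.
3) Userspace
Lets a user-space program change the frequency through the scaling_setspeed file.
4) Ondemand
Adjusts the CPU frequency dynamically according to the current CPU utilization.
5) Conservative
Similar to Ondemand, but it changes the frequency more gradually instead of jumping abruptly to the maximum and then abruptly back to the minimum.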
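To contrast the static and dynamic governors, the sketch below reduces each one to a pure function that returns a target frequency. The thresholds, the step size, and all mock_ names are hypothetical values chosen for illustration; the real ondemand and conservative governors use their own tunables and sampling logic, which are considerably more involved.

```c
#include <assert.h>

/* Hypothetical tunables for this sketch; not the kernel's defaults. */
#define MOCK_UP_THRESHOLD   80       /* load (%) at or above which we ramp up */
#define MOCK_DOWN_THRESHOLD 20       /* load (%) at or below which we ramp down */
#define MOCK_STEP_KHZ       100000   /* step used by the conservative sketch */

/* The policy->{min,max} range, in kHz. */
struct mock_limits {
    unsigned int min, max;
};

/* Performance: always the top of the range. */
unsigned int mock_performance(struct mock_limits l)
{
    return l.max;
}

/* Powersave: always the bottom of the range. */
unsigned int mock_powersave(struct mock_limits l)
{
    return l.min;
}

/* Ondemand-like: jump to max under heavy load, else scale with load. */
unsigned int mock_ondemand(struct mock_limits l, unsigned int load_pct)
{
    assert(load_pct <= 100 && l.min <= l.max);
    if (load_pct >= MOCK_UP_THRESHOLD)
        return l.max;
    return l.min + (l.max - l.min) * load_pct / 100;
}

/* Conservative-like: move at most one step per call. */
unsigned int mock_conservative(struct mock_limits l, unsigned int cur,
                               unsigned int load_pct)
{
    assert(load_pct <= 100 && l.min <= l.max);
    if (cur < l.min)
        cur = l.min;
    if (cur > l.max)
        cur = l.max;
    if (load_pct >= MOCK_UP_THRESHOLD)
        return (l.max - cur > MOCK_STEP_KHZ) ? cur + MOCK_STEP_KHZ : l.max;
    if (load_pct <= MOCK_DOWN_THRESHOLD)
        return (cur - l.min > MOCK_STEP_KHZ) ? cur - MOCK_STEP_KHZ : l.min;
    return cur;
}
```

With a range of 400000 to 1200000 kHz and 90% load, mock_ondemand returns 1200000 immediately, while mock_conservative starting from 400000 only moves to 500000, which is the "smoother" behaviour described above.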
Original article; please credit the source when reposting. 蝸窩科技, www.wowotech.net.