List, HList, and Hash Table

List

List 可以說是 Linux kernel 中最基礎的資料結構，它的定義如下 (include/linux/types.h):

struct list_head {
    struct list_head *next, *prev;
};

List 的操作定義在: include/linux/list.h

list_empty(head) - tests whether a list is empty
list_add(new_entry, head) - adds a new entry. Insert a new entry after the specified head.
list_del(entry) - deletes entry from list.
list_for_each(pos, head) - iterate over a list

這個 list 結構有幾個特點:

它是一個循環的雙向鏈結，沒有特定的"頭"和"尾"節點，每個節點都是可以是"頭"或是"尾"。這是一個非常優雅的設計，沒有"頭"和"尾"的特例，新增節點時只要將前一個節點的 next 指標和後面的節點的 prev 指標指向新節點，然後將新節點的 prev 和 next 指標各別指向前後節點 (為了方便說明，下面的程式碼和 kernel 有些許不同):
```
void __list_add(struct list_head *new_node,
                struct list_head *prev,
                struct list_head *next)
{
    prev->next = new_node;
    next->prev = new_node;
    new_node->prev = prev;
    new_node->next = next;
}
```
移除節點時，只需把前節點的 next 指向後節點，後節點的 prev 指向前節點:
```
void __list_del(struct list_head * prev, struct list_head * next)
{
    prev->next = next;
    next->prev = prev;
}

// list_del - deletes entry from list
void list_del(struct list_head *entry)
{
    __list_del(entry->prev, entry->next);
}
```
這樣完全不需要特別檢查是不是"頭"或"尾"節點，程式碼看起來非常優雅。要注意的是，從 list 中移除節點不包含釋放節點所佔的記憶體。

只要一直順著 next 指標就可以將整個 list 走訪一遍。 (由 head->next 開始，當 pos 指回到 head 時停止尋訪)

/**
 * list_for_each  - iterate over a list
 * @pos:  the &struct list_head to use as a loop cursor.
 * @head: the head for your list.
 */
#define list_for_each(pos, head) \
  for (pos = (head)->next; pos != (head); pos = pos->next)

這是一個泛化 (generic) 的設計，任何 struct 只要將 list_head 加入它的欄位中，這個 struct 就可以變成鏈結串列的節點。第一眼看到這個設計可能會覺得有點納悶，這跟我們在資料結構教科書上看到的鏈結串列設計不同。我們比較熟悉的鏈結串列結構應該像是這樣子:
```
struct file_system_type {
    const char* name;

    struct file_system_type* next; /* next file_system_type in list */
}

有一個相同資料結構的 next 指標指向下一個節點。
```

這樣的說明有點抽象，讓我們實際來看一個例子:

在下面的例子中，struct object 有一個 list_head 型態的成員: node

struct object
{
    int id;
    char name[16];

    struct list_head node;
};

void list_add_example()
{
    LIST_HEAD(obj_list); // declare a list head

    struct object obj1 = {
        .id = 1,
        .name = "obj1",
    };
    list_add(&obj1.node, &obj_list); // add a new entry after obj_list

    struct object obj2 = {
        .id = 2,
        .name = "obj2",
    };
    list_add(&obj2.node, &obj_list); // add a new entry after obj_list

    struct object obj3 = {
        .id = 3,
        .name = "obj3",
    };
    list_add(&obj3.node, &obj_list); // add a new entry after obj_list
}

在經過上面的操作之後，這個 list 會變成下面這個樣子:

struct object 實際上是由 list_head 串起來。

利用 list_for_each 來走訪整個鏈結串列:

struct list_head* iter;
list_for_each(iter, &obj_list) {
    struct object* obj = list_entry(iter, struct object, node);
    printf("%s\n", obj->name);
}

輸出結果:

obj3
obj2
obj1

上面程式碼的一個重點在於利用 list_entry() 取得實體 struct object 的位址，而 list_entry() 實際上是由 container_of() 這個 macro 實現。

/**
 * list_entry - get the struct for this entry
 * @ptr:  the &struct list_head pointer.
 * @type: the type of the struct this is embedded in.
 * @member: the name of the list_head within the struct.
 */
#define list_entry(ptr, type, member) \
    container_of(ptr, type, member)

container_of()

我們已知 struct object 的成員 node 的記憶體位址，那如果我們也知道成員 node 在 struct object 裡的偏移量(offset)，這樣我們是不是可以推算出這個 struct object 結構體的位址？只要將成員 node 的位址依照偏移量往回推就可以得到 struct object 結構體的位址。

這個偏移量的推算可以由 offsetof() 這個 macro 完成:

http://man7.org/linux/man-pages/man3/offsetof.3.html

#undef offsetof
#ifdef __compiler_offsetof
#define offsetof(TYPE, MEMBER) __compiler_offsetof(TYPE, MEMBER)
#else
#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)
#endif

container_of() 這個神奇的 macro 就是利用 offsetof() 推算外部結構體的起始位址。以上面的例子來說，相對成員變數 node 而言，struct object 是它的容器(container)，container of node 就是 struct object。

/**
 * container_of - cast a member of a structure out to the containing structure
 * @ptr:    the pointer to the member.
 * @type:   the type of the container struct this is embedded in.
 * @member: the name of the member within the struct.
 */
#define container_of(ptr, type, member) ({                 \
    const typeof( ((type *)0)->member ) *__mptr = (ptr);   \
    (type *)( (char *)__mptr - offsetof(type, member) );})
struct object* obj = container_of(ptr, struct object, node);

上面的示意圖假設 struct object 沒有經過任何的 padding 及 alignment。我們知道成員 node 的位址 ptr 的值，偏移量是 20，struct object* obj 的位址可以很簡單的由 ptr - 20 = addr 來求得。推薦閱讀: The Magical container_of() Macro

HList and Hashtable

HList 用於 hash table 的實作，它的資料結構定義在 include/linux/types.h 之中:

struct hlist_head {
    struct hlist_node *first;
};

struct hlist_node {
    struct hlist_node *next, **pprev;
};

HList 的操作與 List 一樣定義在: include/linux/list.h ，以 hlist_ 開頭
hlist_head 和 hlist_node 用於 hash table 中 bucket 的實作，具有相同 hash value 的節點會放在同一條 hlist 中。為了節省空間，hlist_head 只使用一個 first 指標指向 hlist_node，沒有指向串列尾節點的指標。
hlist_node 有兩個指標，但是需要注意的是 pprev 是指標的指標，它指向的是前一個節點的 next 指標。這樣的好處是即使要刪除的節點是"最前頭的節點"時，也可以通過 *pprev = next 直接修改指標的指向。
```
// deletes entry from hlist
void __hlist_del(struct hlist_node* entry)
{
    struct hlist_node *next = entry->next;
    struct hlist_node **pprev = entry->pprev;
    *pprev = next;
    if (next)
      next->pprev = pprev;
}
```
在 Linux kernel 3.7 之後採用由 Sasha Levin 實作的通用型 hash table (LWN: A generic hash table)，使用 DEFINE_HASHTABLE(name, bits) 的 macro 來宣告 hash table:
- name: the name of the hash table
- bits: the number of bits of hash values
第二的參數 bits 比較特別，它代表的是 hash value 的有效位元數。若 bits = 3，hash value 的範團會是 0~7，若 bits = 10，hash value 的範團會是 0 ~ 1023。前者需要準備 8 個 buckets，後者則需要 1024 個 buckets，bits 參數會決定 hash table 的 buckets 的數量 (=2 ^bits)。hashtable 實作上會以 hlist_head array 的方式來配置。

舉例來說， DEFINE_HASHTABLE(htable, 3) 會展開成:
```
struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };
```
這是一個有 8 個 buckets 的 hash table。 (初始化的語法請參照 GCC Designated Initializers)

經由 hash function 將值映射到這 8 個 buckets 中，當衝突發生時，直接加到 hlist_head 後的串列中。
hash table 相關的操作定義在 include/linux/hashtable.h
- hash_init - initialize a hash table
- hash_empty - check whether a hashtable is empty
- hash_add - add an object to a hashtable
- hash_del - remove an object from a hashtable
- hash_for_each - iterate over a hashtable
- hash_for_each_possible - iterate over all possible objects hashing to the same bucket

hashtable 預設的 hash function 定義在 include/linux/hash.h ，有 32 和 64 兩個版本，如下:

/* 2^31 + 2^29 - 2^25 + 2^22 - 2^19 - 2^16 + 1 */
#define GOLDEN_RATIO_PRIME_32 0x9e370001UL

static inline u32 hash_32(u32 val, unsigned int bits)
{
    /* On some cpus multiply is faster, on others gcc will do shifts */
    u32 hash = val * GOLDEN_RATIO_PRIME_32;

    /* High bits are more random, so use them. */
    return hash >> (32 - bits);
}

/*  2^63 + 2^61 - 2^57 + 2^54 - 2^51 - 2^18 + 1 */
#define GOLDEN_RATIO_PRIME_64 0x9e37fffffffc0001UL

static __always_inline u64 hash_64(u64 val, unsigned int bits)
{
    u64 hash = val;

#if defined(CONFIG_ARCH_HAS_FAST_MULTIPLIER) && BITS_PER_LONG == 64
    hash = hash * GOLDEN_RATIO_PRIME_64;
#else
    /*  Sigh, gcc can't optimise this alone like it does for 32 bits. */
    u64 n = hash;
    n <<= 18;
    hash -= n;
    n <<= 33;
    hash -= n;
    n <<= 3;
    hash += n;
    n <<= 3;
    hash -= n;
    n <<= 4;
    hash += n;
    n <<= 2;
    hash += n;
#endif

    /* High bits are more random, so use them. */
    return hash >> (64 - bits);
}

Hashtable example:

下面用例子來說明 hashtable 的使用，將上面 list 例子中的 struct object 改成 hlist 的版本:

struct object
{
    int id;
    char name[16];

    struct hlist_node node;
};

void hashtable_example()
{
    // define a hash table with 2^3(=8) buckets
    DEFINE_HASHTABLE(htable, 3);
    // => struct hlist_head htable[8] = { [0 ... 7] = HLIST_HEAD_INIT };

    struct object obj1 = {
        .id = 1,
        .name = "obj1",
    };
    hash_add(htable, &obj1.node, obj1.id);

    struct object obj2 = {
        .id = 2,
        .name = "obj2",
    };
    hash_add(htable, &obj2.node, obj2.id);

    struct object obj3 = {
        .id = 3,
        .name = "obj3",
    };
    hash_add(htable, &obj3.node, obj3.id);

    struct object obj9 = {
        .id = 9,
        .name = "obj9",
    };
    hash_add(htable, &obj9.node, obj9.id);
}

經過上面的操作之後，hash table 呈現如下:

obj1 和 obj9 的 hash value 衝突，放入同一個 bucket 的串列中

以 hash_for_each_possible() 尋訪 bucket 內所有的節點(hlist_node)，因為 hash value 可能會有衝突的關係，同一個 bucket 內可能會有不同 key value 的節點，所以需要檢查是不是要查找的 key value。

int key = 1;
struct object* obj;
hash_for_each_possible(htable, obj, node, key) {
    if(obj->id == key) {
        printf("key=%d => %s\n", key, obj->name);
    }
}

小結

List, HList and hashtable 是 Linux kernel 中常見的資料結構，它們的設計是經過高手長時間淬煉過的精華，但因為它們的實作及使用方式和我們常見的設計不同，對初次接觸 Linux kernel 的新手肯定會覺得相當的困惑，在了解之後，又會非常佩服 kernel hackers 在設計上的巧思。它們被設計的十分優雅，高效又易於使用。

範例程式

上面的範例程式可以在下面的 github 連結找到:

https://github.com/danielmaker/linux_study/tree/master/list_example

Daniel Jslin

May the source be with you