Matsui 對 5 輪 DES 的線性攻擊
我試圖理解 Mitsuru Matsui 的“ DES Cipher 的線性密碼分析方法”,特別是他在第 5 節末尾描述的關於 5 輪 DES 的攻擊。我跟踪了 3 輪的攻擊,這是它的數學:
對於 5 輪 DES,我提煉了一些東西,所以只有 4 種類型的變數:
- 明文位(PL、PH 表示低、高)。
- 密文位(CL、CH)。
- 密鑰位( $ K_i $ ).
- 來自輪輸出的高位( $ r_i $ 為了 $ i $ -第輪)。
因為每個變數要麼是其中一個,要麼是其中兩個的異或。Matsui 說應該在第 1 輪和第 5 輪使用紅色方程,在第 2 輪和第 4 輪使用綠色方程。設置將是這樣的:
我的問題是:
- 如何使用這 4 個方程來求解 5 輪 DES?我用於 3 輪的方法似乎不起作用。
- 兩個線性近似可以連接需要什麼條件?或者,為什麼松井在 5 輪 DES 中使用兩條不同的軌跡,而不是像他在 3 輪 DES 中使用的那樣只使用綠色的(他找到的第一個),如果綠色的偏差比紅色的高?
- 一般來說,如何將這種攻擊擴展到 $ n $ -圓形DES?如何在算法上對任何 Feistel 密碼執行此操作?
謝謝!
編輯:我設法證明了 5 輪近似值,但老實說,我不明白我做了什麼,我只是遵循方程式,一些變數消失了:
我很想了解這通常是如何工作的。在最後的松井表中,他使用了“AB-BA”之類的近似軌跡。這是什麼 - 正式的意思,為什麼 A 和 B 可以雙向連接?
找到了答案。
首先,我將稍微改變一下符號,使方程變得更加對稱。
使用這種表示法,明文單詞 $ PH $ 在松井的論文中變成 $ x_0 $ , 和 $ PL $ 變成 $ x_1 $ . 密文單詞 $ CH $ 變成 $ x_{n+1} $ , 和 $ CL $ 變成 $ x_n $ .
我們可以找到一個近似值 $ n $ 通過圖形搜尋進行舍入。此圖中的節點將如下所示:
$$ \bigoplus_{i=0}^{n+1} x_i[\delta_i] = \bigoplus_{i=1}^n k_i[\epsilon_i] \tag{1} $$
在哪裡 $ \delta_i $ 和 $ \epsilon_i $ 是位遮罩。
要查看邊緣是什麼,假設我們有一個像這樣的 1 輪近似值:
$$ x[\alpha] \oplus F(x, k)[\beta] = k[\gamma] \tag{2} $$
我們可以在輪次實例化它,比如說, $ i $ , 要得到:
$$ x_i[\alpha] \oplus F(x_i, k_i)[\beta] = k_i[\gamma] \tag{3} $$
現在我們可以使用 Feistel 結構,知道 $ x_{i+1} = x_{i-1} \oplus F(x_i, k_i) $ , 重寫 $ F(x_i, k_i) $ 作為 $ x_{i+1} \oplus x_{i-1} $ , 所以:
$$ x_i[\alpha] \oplus \left(x_{i+1} \oplus x_{i-1}\right)[\beta] = k_i[\gamma] \tag{4} $$
這將是我們圖中的一條邊。也就是說,對於所有線性近似,以及所有實例化輪次 $ i $ ,我們可以創建這樣的邊緣。如果邊的來源是方程 $ (1) $ ,邊緣的目標是以下等式,由下式獲得 $ \operatorname{XOR} $ 在 $ (1) $ 和 $ (4) $ :
$$ \bigoplus_{j=0}^{i-2} x_j[\delta_j] \oplus x_{i-1}[\delta_{i-1} \oplus \beta] \oplus x_i[\delta_i \oplus \alpha] \oplus x_{i+1}[\delta_{i+1} \oplus \beta] \bigoplus_{j=i+2}^{n+1} x_j[\delta_j] = k_i[\epsilon_j \oplus \gamma] \bigoplus_{j=1}^n k_j[\epsilon_j] \tag{5} $$
一個 $ n $ -round 近似是這樣一個節點 $ (1) $ , 面具在哪裡 $ \delta_2 \dots \delta_{n-1} $ 都是 $ 0 $ . 也就是說,等式只涉及明文、密文和密鑰。
我們從一個簡單的近似開始我們的圖搜尋,其中 $ \forall i, \delta_i = 0 $ , 和 $ \forall i, \gamma_i = 0 $ . 看看我們實際上會採用什麼邊緣,一些命名法:
- 一個狀態 $ x_i $ 被稱為“隱藏”時 $ 1 < i < n $ . 也就是說,它既不是明文也不是密文。一張面具 $ \delta_i $ 在相同條件下被稱為“隱藏”。
我們只會佔優勢 $ e $ ,這將我們從一個節點 $ v: \bigoplus_{i=0}^{n+1} x_i[\delta_i] = \bigoplus_{i=1}^n k_i[\epsilon_i] $ 到一個節點 $ w: \bigoplus_{i=0}^k x_i[\delta’i] = \bigoplus{i=1}^k k_i[\epsilon’_i] $ , 當最多 2 個隱藏遮罩在 $ w $ 是非零的。
當所有隱藏遮罩都為零時,我們到達結束狀態,並且我們不在開始節點處。
我們使用堆積引理來計算權重,我們可以在對數空間中這樣做以提高精度。
下面是複製附錄中 Matsui 的結果表的 C++ 原始碼。我已經使用 Dijkstra 的算法進行圖形搜尋,但實際上它是矯枉過正的,動態程式解決方案也可以。這是因為我們關心的唯一路徑是在它們應用一輪近似的位置增加(即,它們從空近似開始,並在輪次應用它,例如,1、2、3、5、6、8、 9、10,並達到最終狀態)。然而,Dijkstra 已經立即執行,因此無需過多考慮。
這裡唯一特定於 DES 的是一輪近似值
one_round_approximations
。修改它可以找到任何 Feistel 網路的線性鏈,給定圓形函式的近似值。對於
NUM_ROUNDS = 10
,此程式碼輸出:Best approximation: round_number: 10 state masks = [1: [7, 18, 24, 29], 10: [15], 11: [7, 18, 24, 29]] key masks = [1: [22], 2: [44], 3: [22], 5: [22], 6: [44], 7: [22], 9: [22]] applied approximations: [-ACD-DCA-A] Probability: 0.500047 Log2(Bias): -14.3904
這與松井的論文完全吻合。
// Finds chains of linear approximations in a Feistel cipher. // // A round of a Feistel cipher can be described as: // x0 x1 // | k | // | | | // v v | // + <- F --- // | | // v v // x2 // // Where + is bitwise xor, and F is a keyed permutation function. Algebraically, // x2 = x0 + F(k, x1) (1) // The wire that carries x1 remains unmodified, and will be swapped with x2 // before the next round in the Feistel network. The whole network then looks // like this, where ===><== means we swap the two wires: // x0 x1 // | k1 | // | | | // v v | // + <- F --- // | | // v v // x2 x1 // | | // x1===><==x2 // | | // | k2 | // | | | // v v | // + <- F --- // | | // v v // x3 x2 // | | // ... // // // A one-round approximation to F is three bitmasks, alpha, beta, gamma, such // that // x[alpha] + F(x, k)[beta] = k[gamma] // holds with some probability p. Here the notation a[m] means that we xor the // bits in the bit string a, indicated by the mask m. So a[0b101] means // ((a & 100) >> 2) ^ (a & 1). // // We note that equation (1) tells us that F(k, x1) = x2 ^ x0. In general, if we // look at the i-th round, equation (1) tells us that // F(x_{i + 1}, k_{i + 1}) = x_{i + 2} + x_i (2) // // Thus, if we have an approximation like: // // x[alpha] + F(x, k)[beta] = k[gamma] // // We can instantiate it at any given round, for example: // // x1[alpha] + F(x1, k1)[beta] = k1[gamma] // // And then use equation (2) to rewrite this as: // // x1[alpha] + (x2 + x0)[beta] = k1[gamma] // // In this way we obtain equations for the values of the wires in the Feistel // network. In general, they are always going to be of the form: // // x_{i + 1}[alpha] + (x_{i + 2} + x_i)[beta] = k_{i + 1}[gamma] (3) // // Our goal is to start with a very simple equation: // // x0[0] + x1[0] = 0 // // which holds with probability 1, and applying these equations we got, reach an // equation that involves only: // * x0, the high word in the plaintext. // * x1, the low word in the plaintext. // * x_{n+1}, the high word in the ciphertext. // * x_n, the low word in the ciphertext. // * Some round keys k_i. // and we want to know the probability with which they hold. We call this sort // of equation a linear approximation of the full cipher. // // This program considers a graph where each node is of the form: // (m_0, m_1, ..., m_{N+1}, km_0, km_1, ..., km_0, p) // where N is the number of rounds, each m_i is a 32-bit bitmask, each km_i is a // 64-bit bitmask, and p is a probability. The meaning of this node is: // // with probability p, // (\sum_{i=0}^{N + 1} x_i[m_i]) + (\sum_{i=0}^{N-1} k_i[km_i]) = 0 // // where x_i are the values of the wires in the Feistel network, k_i are the // round keys, m_i are bitmasks for the x_i, and km_i are masks for the k_i. // // Starting at the node where m_i = 0 forall i, and km_j = 0 forall j, and p = // 1, we want to reach a state that represents a linear approximation of the // full cipher. // // The edges of this graph are going to be using some 1-round approximation, // instantiated at some round in the network. For instance, if we have the node // // x_0[0b101] + x_1[0b11] = k_1[0b110] (4) // // and we know the equation of the form (3): // x_{i+1}[0b1011] + (x_{i + 2} + x_i)[0b11] = k_{i + 1}[0b101] // // we could instantiate this at i = 1, to get // // x_2[b1011] + (x_3 + x_1)[0b11] = k_2[0b101] (5) // If we then xor this equation (5) with (4), we get // // x_0[0b101] + x_2[0b1011] + x_3[0b11] = k_1[0b110] + k_2[0b101] // // which is another node in our graph. In this way we explore the graph until we // reach a linear approximation of the full cipher. #include <array> #include <cstdint> #include <iostream> #include <set> #include <unordered_map> #include <cassert> #include <vector> #include <cmath> #include <optional> constexpr size_t NUM_ROUNDS = 10; // Shows a 64-bit bitmask, showing only the bit indices that are "on". std::ostream& show_mask(std::ostream& o, uint64_t m) { int i = 0; bool first = true; o << "["; while (m) { if (m & 1) { if (!first) { o << ", "; } else { first = false; } o << i; } m >>= 1; ++i; } o << "]"; return o; } // The meaning of this approximation is that with probability p = probability(), // x[alpha] + F(x, k)[beta] = k[gamma] // For any 32-bit string x, and 48-bit string k. // // The bias is defined as |probability() - 0.5|. struct OneRoundApproximation { const char* name; uint32_t alpha; uint32_t beta; uint64_t gamma; // Only need 48 bits. double log2_bias; // log_2(bias) double probability() const { double x = std::pow(2.0, log2_bias) + 0.5; return std::max(x, 1.0 - x); } friend auto operator<=>(const OneRoundApproximation&, const OneRoundApproximation&) = default; friend std::ostream& operator<<(std::ostream& o, const OneRoundApproximation& ra) { o << ra.name; return o; } }; // This is an approximation that relates some plaintext bits, some ciphertext // bits, some key bits, and possibly some hidden state bits, in a linear way, // using fixed bit masks. // // The first 2 states are the 2 plaintext words, the last 2 states are the 2 // ciphertext words, and every other state is a hidden state - it is the value // of a wire in the Feistel network. struct Approximation { std::array<uint32_t, NUM_ROUNDS + 2> state_mask; std::array<uint64_t, NUM_ROUNDS> round_key_mask; std::array<std::optional<OneRoundApproximation>, NUM_ROUNDS> applied_approximations; int round_number; friend auto operator<=>(const Approximation&, const Approximation&) = default; friend std::ostream& operator<<(std::ostream& o, const Approximation& a) { o << "round_number: " << a.round_number << std::endl; o << "state masks = ["; int cnt = 0; for (size_t i = 0; i < NUM_ROUNDS + 2; ++i) { if (!a.state_mask[i]) continue; if (cnt++ > 0) o << ", "; o << i << ": "; show_mask(o, a.state_mask[i]); } o << "]" << std::endl; cnt = 0; o << "key masks = ["; for (size_t i = 0; i < NUM_ROUNDS; ++i) { if (!a.round_key_mask[i]) continue; if (cnt++ > 0) o << ", "; o << i << ": "; show_mask(o, a.round_key_mask[i]); } o << "]" << std::endl; o << "applied approximations: ["; for (size_t i = 0; i < NUM_ROUNDS; ++i) { auto ma = a.applied_approximations[i]; if (ma == std::nullopt) { o << "-"; } else { o << *ma; } } o << "]" << std::endl; return o; } }; // This is an approximation that holds with a given probability. struct WeightedApproximation { Approximation a; double log2_bias; double probability() const { double x = std::pow(2.0, log2_bias) + 0.5; return std::max(x, 1.0 - x); } }; std::array<OneRoundApproximation, 5> one_round_approximations = {{ {"A", 0x8000, 0x21040080, 0x400000, std::log2(std::abs(12.0/64.0 - 0.5))}, {"B", 0xd8000000, 0x8000, 0x6c0000000000ULL, std::log2(std::abs(22.0/64.0 - 0.5))}, {"C", 0x20000000, 0x8000, 0x100000000000ULL, std::log2(std::abs(30.0/64.0 - 0.5))}, {"D", 0x8000, 0x1040080, 0x400000, std::log2(std::abs(42.0/64.0 - 0.5))}, {"E", 0x11000, 0x1040080, 0x880000, std::log2(std::abs(16.0/64.0 - 0.5))} }}; // Given an existing approximation, what happens if we xor a one-round // approximation onto it? Specifically, that one-round approximation will be // instantiated at round `position`. Approximation apply_one_round_approximation(const Approximation& a, const OneRoundApproximation& o, size_t position) { Approximation b = a; b.round_number = position + 1; b.state_mask[position] ^= o.beta; b.state_mask[position + 1] ^= o.alpha; b.state_mask[position + 2] ^= o.beta; b.round_key_mask[position] ^= o.gamma; b.applied_approximations[position] = o; return b; } // Returns how many hidden states have a nonzero mask in the given // approximation. size_t HiddenSize(Approximation& b) { size_t result = 0; for (size_t i = 2; i < NUM_ROUNDS; ++i) { result += b.state_mask[i] != 0; } return result; } std::vector<WeightedApproximation> Transitions( const WeightedApproximation& a, const OneRoundApproximation& o) { // For a given approximation of the first j rounds of the cipher, we can apply // the round approximation either at the current state in the Feistel network, // or the next one. That is, a one-round approximation of the form: // alpha * x_{i + 1} + beta * (x_{i+2} + x_i) = gamma * k_i // Can be xorred onto the current state, which is of the form: // // \sum_{k=0}^N state_mask_k * x_k + \sum_{k=0}^N key_mask * key_i = 0 // // Where the sum is the xor of every bit in the argument. // // This will make some state_masks zero, and some nonzero. We only want to // take the transition when the number of nonzero masks for hidden states is // at most 2, because this is all that's needed to make progress in the // Feistel networks we've seen (DES). A state is hidden when it's neither x_1, // x_2, x_{N-1}, or x_N. That is, when it's neither one of the two plaintext // words, nor one of the two ciphertext words. // This can be made faster and more space efficient, but I don't particularly // care for now. // We search for a round where to instantiate the one-round approximation. std::vector<WeightedApproximation> result; for (size_t i = a.a.round_number; i < NUM_ROUNDS; ++i) { Approximation b = apply_one_round_approximation(a.a, o, i); if (HiddenSize(b) > 2) continue; result.push_back( {.a = std::move(b), .log2_bias = 1 + a.log2_bias + o.log2_bias}); } return result; } std::ostream& operator<<(std::ostream& o, const WeightedApproximation& wa) { o << wa.a << std::endl; o << "Probability: " << wa.probability() << std::endl; o << "Log2(Bias): " << wa.log2_bias << std::endl; return o; } // Just a standard hash function to be able to put Approximations in a hash // table. template <class T> inline void hash_combine(std::size_t& seed, const T& v) { std::hash<T> hasher; seed ^= hasher(v) + 0x9e3779b9 + (seed << 6) + (seed >> 2); } size_t HashApproximation(const Approximation& a) { size_t h = 0; for (size_t i = 0; i < NUM_ROUNDS + 2; ++i) { hash_combine(h, a.state_mask[i]); } hash_combine(h, a.round_number); return h; }; int main() { // We'll use Dijkstra's algorithm to explore this graph. The weight of an edge // is the log2 of the bias of the one-round approximation it represents. auto compare_by_probability = [](const WeightedApproximation& a, const WeightedApproximation& b) { if (a.log2_bias > b.log2_bias) return true; if (a.a.round_key_mask < b.a.round_key_mask) return true; return HashApproximation(a.a) < HashApproximation(b.a); }; // This is the queue of nodes we still have to visit. std::set<WeightedApproximation, decltype(compare_by_probability)> queue; // This tells us which nodes have already been visited. Note in queue we store // the nodes with a weight, whereas this hash stores just the approximation // itself, without a probability. This is to be able to modify the estimated // weights of each approximation as we traverse the graph, as per Dijkstra's // algorithm. std::unordered_map<Approximation, decltype(queue.begin()), decltype(&HashApproximation)> seen(1, &HashApproximation); // Our initial approximation has no masks, and holds with probability 1, and // so its bias is 1 - 0.5 = 0.5.. WeightedApproximation wa = {.a = Approximation{}, .log2_bias = std::log2(0.5)}; auto it = queue.insert(wa).first; seen.emplace(wa.a, it); // We keep track of the best approximation we've seen so far. The only // approximations we'll care about are the ones that relate plaintexts, // ciphertexts, and key bits. For this, they must have a round number of // NUM_ROUNDS. This means `wa` is actually not a valid best approximation to // the full cipher, and will be overwritten the first time we find any valid // approximation. WeightedApproximation best_approximation = wa; while (!queue.empty()) { WeightedApproximation v = queue.extract(queue.begin()).value(); // We signal it's been popped off the queue. seen[v.a] = queue.end(); // If this is an approximation to the full cipher (i.e. all NUM_ROUNDS of // it), and does not involve any hidden states (i.e. only plaintexts, // ciphertexts, and key bits), then it's a good candidate for the best // linear approximation for the whole cipher. We'll keep the most likely // one, that is the one with the highest bias. if (v.a.round_number == NUM_ROUNDS && HiddenSize(v.a) == 0) { if (best_approximation.a.round_number == 0 || best_approximation.log2_bias < v.log2_bias) { best_approximation = v; } continue; } // We now traverse all edges outgoing from approximation`v`, by listing all // possible one-round approximations we could apply at this approximation. for (const auto& o : one_round_approximations) { for (WeightedApproximation w : Transitions(v, o)) { auto it = seen.find(w.a); if (it == seen.end()) { // If we've never seen this approximation before, add it to the queue, // and mark it as seen. This must be a new element in the queue, and // we assert so. auto [jt, inserted] = queue.insert(std::move(w)); assert(inserted); seen.emplace(jt->a, std::move(jt)); } else { // We've seen it before. If we've already popped it out of the queue, // no need to do anything, we already found the shortest weight path // to it, by the invariant of Dijkstra's algorithm.. if (it->second == queue.end()) continue; // Relax the edge if possible. if (it->second->log2_bias < w.log2_bias) { queue.erase(it->second); auto jt = queue.insert(w).first; it->second = jt; } } } } } std::cout << "Best approximation: " << std::endl << best_approximation << std::endl; }