Key-Derivation

基於記憶密碼的密鑰派生函式?

  • August 7, 2021

如何為基於密碼的密鑰派生設計記憶體硬函式?防止來自並行機器的蠻力攻擊。

什麼樣的設計可以與 Skein 配合得很好?Skein論文(pdf) 有一個時間硬 PBKDF(基本上散列密碼重複很多次),但不是一個記憶硬的。

記憶硬函式被設計成內部計算依賴於相對較大的狀態。這些函式不應具有允許對手在不立即使用狀態的情況下計算結果的快捷方式(至少不會導致非常高的成本)。這樣就不可能開發出不需要大量記憶體的快速硬體。

PBKDF 中沒有針對記憶體硬功能的一種特定設計。您可以查看記憶體硬 PBKDF 的各種現有設計(scrypt、Argon2),以獲取記憶體硬功能的範例。


讓我們定義 3 種不同的方法,從低級到高級:

  1. 評論中特別提到了 Catena,因為它只需要一個通用的雜湊方法來執行。如果可以使用任何安全散列函式,那麼 Skein 應該是一個安全的選擇。
  2. KMAC 現在已被 NIST 定義為安全的 HMAC 替代品。因此,您可以將 scrypt 與 PBKDF2 一起使用,其中 HMAC 被 KMAC 替換。
  3. scrypt 使用 PBKDF2 作為底層 CPU 硬函式。只需將 scrypt 中的 PBKDF2 函式替換為另一個 CPU 硬函式,例如 Skein 的 PBKDF。

這些是提示,提供安全證明留給讀者。

目前,單個 GPU 集群可以突破近 10000 億( $ 10^{10} $ ) 每秒的密碼/雜湊值,想像一下狀態攻擊者或黑客團隊可以用硬體 ASIC、超級電腦或殭屍網路做些什麼。基本上,16-17 個字元的密碼( $ 2^{80} $ 熵)可以在幾小時或幾天內用蠻力打破,我們甚至還沒有進入量子時代。

事情變得更糟了,因為散列通常是在單個 CPU 上生成的,而世界 CPU 數量正在增加,因此攻擊者的可用性。

對此有兩種解釋:

  1. 開發人員生成密碼雜湊的速度太快且沒有加鹽,因此當黑客恢復雜湊數據庫時,他們可以通過蠻力恢復密碼。或者他們嘗試生成具有低熵輸入的散列。
  2. GPU 使用許多小核心來執行此任務,如果每個核心都為算法使用大量記憶體,它們很快就會耗盡記憶體。

本地加密過程的弱點通常不是算法,而是由密碼生成的雜湊密鑰,如果一個人能記住 15 個字元,那麼攻擊者寧願嘗試這個低熵輸入的所有變體,而不是破解加密的 256 位隨機密鑰數據。因此**,如果加密密鑰生成緩慢,系統受到暴力攻擊**的機會就會減少。

有幾個已知的此類拉伸函式,例如 scrypt,您可以在下面找到一個簡單範例,說明如何製作此類函式以避免大規模並行計算攻擊。

通過:gist.github.com/anonymous/c0edbaf71e0be3d07db3(創建於 2015 年 10 月 22 日)

// HKF - Memory Hard Key Derivation Function v 1.4
// Adrian Pirvu - October 14, 2015
//
// A memory hard key derivation function without useless overhead
// 3.4s with 1G memory and 1 million rounds on first generation i7 CPU
// INPUT: password - up to 256 bytes; salt - no limitation;  megabytes: memory you intend to use; rounds: memory pickup rounds;
// OUTPUT: stretched key hash (current 256 bytes) - you can either simple hash to less this value or use part of it.
// Arc4 used as a one way random generator, but any stream cipher can be used
// No need to generate memory diffusion like others functions, I only make sure that the final hash that I get is irreversible and not biased
// No side attacks considered, if the attacker has access to your RAM then an activity tracking RAT already took your password or screenshots of your mouse clicks
// Tags: fast memory hard function, password hashing, memory hard key stretching, memory hard key strengthening, scrypt, bcrypt, sequential memory hard function
// Looking for feedback and some possible GPUs cluster tests. If anyone is interested in further development or a possible public project, please let me know.



#include "stdafx.h"
//#include <string.h>
//#include <time.h>
//#include <windows.h>


extern "C"
{

   unsigned int i = 0, j = 0, m = 0, z = 0;
   unsigned char s[256], k[256];

   void RngInit(unsigned char* key, int keyLength, unsigned char* salt, int saltLength);
   unsigned char RngNextByte();
   unsigned int RngNextInt();


   //unsigned int TEST_MEGABYTES = 1024;
   //unsigned int TEST_ROUNDS = 1000000;


   // return a hashable strengthen key of desired length from password and salt
   __declspec(dllexport) void GetHashKey(unsigned char* password, int passwordLength, unsigned char* salt, int saltLength, int megabytes, int rounds, unsigned char* output)
   {
       // dinamically allocates memory 4 * 1024 * megabytes
       int ROWS = 4 * 1024 * megabytes; // each row is 256 bytes (256 * 4 * 1024 = 1k * 1024 = 1M)
       int COLUMNS = 64; // * 4 bytes per int = 256 bytes
       unsigned int* box = new unsigned int[ROWS * COLUMNS];
       unsigned int tmp = 0;

       // init the random engine and get a temporary key
       RngInit(password, passwordLength, salt, saltLength);
       unsigned int key[64];
       for (int col = 0; col < 64; col++)
           key[col] = RngNextInt();

       // reinit the engine for safety and get a start key and the session salt
       RngInit((unsigned char*) key, 256, salt, saltLength);
       unsigned int pepper[64];
       for (int col = 0; col < 64; col++)
       {
           key[col] = RngNextInt();
           pepper[col] = RngNextInt();
           box[col] = key[col]; // first row is the start key
       }

       // fills the memory fast, with a bit of diffusion (we really don't need much)
       unsigned int random = key[34] * key[56] + key[62];
       for (int row = 1; row < ROWS; row++)
       {
           // breaks the row random and switch parts
           int index = box[(row - 1) * COLUMNS + 56] & 0x000000FF;
           memcpy((unsigned char*) &box[row * COLUMNS] + index, (unsigned char*) &box [(row - 1) * COLUMNS], 256 - index);
           memcpy((unsigned char*) &box[row * COLUMNS], (unsigned char*) &box [(row - 1) * COLUMNS] + 256 - index, index);

           // xor each number with the session salt and pseudorandom number
           for (int col = 0; col < 64; col++)
           {
               random = (2147483629 * random + 2147483587); // maybe something with longer period?
               box[row * COLUMNS + col] = (box[row * COLUMNS + col] + pepper[col]) ^ random;
           }
       }

       // start with the last row to avoid precalculations and algo redesign
       for (int col = 0; col < 64; col++)
           key[col] = box[(ROWS - 1) * COLUMNS + col];


       //clock_t start = clock();

       // intensively get random integers from memory and mix them into the pseudorandom key
       for (int rnd = 0; rnd < rounds; rnd++)
           for (int col = 0; col < 64; col++)
           {
               i = (i + 1) % 256;
               j = (j + s[i]) % 256;
               tmp = s[i];
               s[i] = s[j];
               s[j] = tmp;
               unsigned int sum1 = (s[i] + s[j]);
               unsigned int column = sum1 % 64;
               unsigned int row = (key[col] + sum1 + i + j) % ROWS;
               sum1 = sum1 % 256;

               i = (i + 1) % 256;
               j = (j + s[i]) % 256;
               tmp = s[i];
               s[i] = s[j];
               s[j] = tmp;
               unsigned int sum2 = (s[i] + s[j]) % 256;

               unsigned int total = (sum1 << (16 + (sum2 % 8))) + (sum2 << (sum1 % 8));
               key[col] = (key[col] + total) ^ (box [row * COLUMNS + column] + pepper[col]);
               //key[col] += RngNextInt() ^ (box [row * COLUMNS + column] + key[col]);
           }

       //clock_t diff = clock() - start;
       //char message[1000];
       //float seconds = ((float) diff) / CLOCKS_PER_SEC;
       //sprintf(message, "Seconds: %0.3f\r\n\0", seconds);
       //OutputDebugStringA(message);


       // build output
       for (int col = 0; col < 256; col++)
           output[col] = RngNextByte() ^ ((unsigned char*) key)[col];

       // cleanup
       for (int col = 0; col < COLUMNS; col++)
       {
           box[(ROWS - 1) * COLUMNS + col] ^= box[(ROWS - 1)* COLUMNS + col];
           key[col] ^= key[col];
           pepper[col] ^= pepper[col];
       }
       ROWS = 0;
       COLUMNS = 0;

       // should overwrite memory
       delete[] box;
   }



   // init RNG engine 
   void RngInit(unsigned char* key, int keyLength, unsigned char* salt, int saltLength)
   {
       unsigned char tmp = 0;

       i = j = 0;
       for (i = 0; i < 256; i++)
       {
           s[i] = i;
           k[i] = key[i % keyLength];
       }

       j = 0;
       for (i = 0; i < 256; i++)
       {
           j = (j + s[i] + k[i]) % 256;
           tmp = s[i];
           s[i] = s[j];
           s[j] = tmp ;
       }

       // discard start to bypass key scheduler flaws, also mix the salt
       i = j = m = z = 0;
       for (int idx = 0; idx < 3000 + k[251] + saltLength; idx++)
       {
           i = (i + 1) % 256;
           if (saltLength > 0)
               j = (m + s[(j + s[i]) % 256] + salt[idx % saltLength]) % 256;
           else
               j = (m + s[(j + s[i]) % 256]) % 256;
           m = (m + i + s[j]) % 256;
           tmp = s[i];
           s[i] = s[j];
           s[j] = tmp;
           z = s[(j + s[(i + s[(z + m) % 256]) % 256]) % 256];
       }
   }



   // obtain a random byte from Rng engine
   unsigned char RngNextByte()
   {
       unsigned char tmp = 0;

       // use RC4, is faster
       i = (i + 1) % 256;
       j = (j + s[i]) % 256;
       tmp = s[i];
       s[i] = s[j];
       s[j] = tmp;
       return (s[i] + s[j]) % 256;

       //i = (i + 1) % 256;
       //j = (m + s[(j + s[i]) % 256]) % 256;
       //m = (m + i + s[j]) % 256;
       //tmp = s[i];
       //s[i] = s[j];
       //s[j] = tmp;
       //z = s[(j + s[(i + s[(z + m) % 256]) % 256]) % 256];
       //return z;
   }



   // obtain a random int from Rng engine
   unsigned int RngNextInt()
   {
       unsigned char tmp = 0;
       unsigned char output[4];
       for (int idx = 0; idx < sizeof(int); idx++)
           output[idx] = RngNextByte();

       return *((unsigned int*) output);
   }



   //// main function, for test purposes
   //int main(int argc, char * argv[])
   //{
   //  const char* password = "monkey";
   //  const char* salt = "superman";
   //  
   //  unsigned char output[256];
   //  GetHashKey((unsigned char*) password, strlen(password), (unsigned char*) salt, strlen(salt), TEST_MEGABYTES, TEST_ROUNDS, output);

   //  //FILE* file = fopen("d:\\key.dat", "wb");
   //  //fwrite((void*) key, 1, 100000000, file);
   //  //fflush(file);
   //  //fclose(file);
   //}

}

引用自:https://crypto.stackexchange.com/questions/27363