Homomorphic Statistics
This tutorial calculates the mean and variance over an entire dataset of approximately 10 million encrypted records. Computing on the full dataset while it remains encrypted supports more accurate and reliable data analysis. The tutorial also shows that, with Liberate.FHE, homomorphic operations can be implemented with very simple code.
The complete code for the mean and variance is as follows:
Setup
Import necessary packages
Import the packages you want to use. fhe is an essential package from the Liberate Library for homomorphic encryption. fhe.presets is a convenience package for importing pre-configured homomorphic encryption parameters.
Generate CKKS engine
To create the CKKS engine, we use pre-configured parameters. For this tutorial, the engine is created using the gold parameter set. In the case of gold, the value of logN is 16, there are 4 special primes, and a total of 34 levels are available.
Generate encryption keys
We will generate the keys to be used in the computation. The keys used in this tutorial are as follows.
sk: The secret key is confidential information held by the data owner or a trusted entity. It is used for decryption; in homomorphic encryption, it decrypts the results of computations performed on encrypted data. The secrecy of this key is crucial: if it is compromised, an attacker could decrypt the results and potentially gain access to sensitive information.
pk: The public key is the openly shared component of the homomorphic encryption system and is used for encrypting data. It cannot be used to decrypt the data or the results of computations performed on it. Unlike the secret key, it does not need to be protected, and security is not compromised even if it is known to potential adversaries.
evk: The evaluation key is used in homomorphic encryption schemes that support relinearization. It is applied during relinearization to transform ciphertexts, typically after multiplication operations. While not as sensitive as the secret key, the evaluation key may still require protection, and its use is generally limited to trusted entities.
gk: The Galois key is used in certain homomorphic encryption schemes to support specific operations, particularly rotations and linear transformations. It enables these operations to be executed efficiently on encrypted data while preserving the confidentiality of the underlying information. The Galois key is an important component, but it is not as sensitive as the secret key.
Generate data and encrypt data
In this tutorial, we will calculate the mean and variance of approximately 10 million data points. The data points are generated to represent the ages of the population of a specific city, so we assume ages from 0 to 99 years old and generate them using a random function. Rather than using exactly 10 million values, we trim the data slightly so that it fills a whole number of ciphertexts under the parameters we have set. Since logN is 16, each ciphertext has 32,768 available slots. Hence, the number of ciphertexts we will use is round(10,000,000/32,768) = 305, which holds 305 × 32,768 = 9,994,240 values. We then encode and encrypt the data using the encrypt function. With the data prepared, we can begin by calculating the mean.
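The slot arithmetic and data layout described above can be sketched in plaintext with NumPy. This is only an illustration of how the values would be grouped before encryption. It does not call Liberate.FHE, and the names (blocks, NUM_SLOTS, the seed) are assumptions made for this sketch.

```python
import numpy as np

LOG_N = 16                        # ring degree exponent stated for the gold preset
NUM_SLOTS = 2 ** (LOG_N - 1)      # CKKS packs N/2 = 32,768 values per ciphertext
TARGET = 10_000_000               # nominal dataset size

num_blocks = round(TARGET / NUM_SLOTS)    # number of ciphertext-sized blocks
num_values = num_blocks * NUM_SLOTS       # dataset size after trimming

# Hypothetical data: integer ages in the range 0..99 (upper bound is exclusive).
rng = np.random.default_rng(seed=0)       # the seed is an arbitrary choice
ages = rng.integers(0, 100, size=num_values)

# One row per future ciphertext; each row would be encoded and encrypted on its own.
blocks = ages.reshape(num_blocks, NUM_SLOTS).astype(np.float64)

print(num_blocks, num_values, blocks.shape)
```

Each row of blocks corresponds to one message that would be passed to the encryption step.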
Mean
The code for calculating the mean in Liberate.FHE is as follows.
Calculate the total of all the values in the dataset.
To calculate the average, divide the sum by the number of values.
Therefore, we use the add function to calculate the sum of all ciphertexts.
To calculate the average over all the encrypted data, use the mean function provided by Liberate.FHE.
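To make the order of operations concrete, here is a plaintext NumPy sketch of the same computation. It is a stand-in, not the Liberate.FHE API: array addition plays the role of add, and an average over slots plays the role of mean. The final division by the number of blocks is an assumption about how the pieces combine.

```python
import numpy as np

NUM_SLOTS = 32_768    # slots per ciphertext for logN = 16
NUM_BLOCKS = 305      # number of ciphertext-sized blocks

# Hypothetical plaintext data standing in for the encrypted blocks.
rng = np.random.default_rng(seed=0)
blocks = rng.integers(0, 100, size=(NUM_BLOCKS, NUM_SLOTS)).astype(np.float64)

# Step 1: slot-wise sum of all blocks (the role played by repeated add calls).
total = np.zeros(NUM_SLOTS)
for block in blocks:
    total = total + block

# Step 2: average across the slots of the summed block, then divide by the
# number of blocks so the result is the mean over every value.
mean_age = total.mean() / NUM_BLOCKS

print(mean_age)
```

Summing first means the slot-wise average is taken only once, on a single accumulated block, instead of once per block.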
Variance
Next, we use the mean obtained above to calculate the variance.
The variance is calculated in the following steps.
Calculate the average of the data: use the previously computed average.
Subtract the average from each data point.
Square each deviation and calculate the sum of all squared deviations.
Divide the sum of squared deviations by the number of values.
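Written as a formula, the steps above correspond to the population variance. The symbols are notation introduced here for illustration: x_i is the i-th data point, n is the number of values, and \mu is the mean.

```latex
\mu = \frac{1}{n} \sum_{i=1}^{n} x_i,
\qquad
\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \mu \right)^2
```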
To calculate the difference between each data point and the previously calculated average, we can use the sub function from Liberate.FHE. Then, we can calculate the square of the differences using the square function. By summing up the squared differences and using the mean function, we can obtain the variance.
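The same sequence can be sketched in plaintext with NumPy. As before, this is a stand-in rather than the Liberate.FHE API: subtraction plays the role of sub, squaring plays the role of square, and an average over slots plays the role of mean. The data and variable names are assumptions made for this sketch.

```python
import numpy as np

NUM_SLOTS = 32_768    # slots per ciphertext for logN = 16
NUM_BLOCKS = 305      # number of ciphertext-sized blocks

# Hypothetical plaintext data standing in for the encrypted blocks.
rng = np.random.default_rng(seed=0)
blocks = rng.integers(0, 100, size=(NUM_BLOCKS, NUM_SLOTS)).astype(np.float64)

# Stands in for the mean computed in the previous section.
mean_age = blocks.mean()

# Accumulate squared deviations slot by slot, one block at a time.
sq_total = np.zeros(NUM_SLOTS)
for block in blocks:
    deviation = block - mean_age            # role of sub
    sq_total = sq_total + deviation ** 2    # role of square, then add

# Average across slots, then divide by the number of blocks to get the
# population variance over every value.
variance = sq_total.mean() / NUM_BLOCKS

print(variance)
```

In the encrypted setting, squaring is a ciphertext multiplication, which is where the evaluation key described earlier would come into play.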