### Differential privacy
> Can we provide any guarantee without knowing what external/auxiliary information an attacker might have?
![[attachments/Screenshot 2023-07-19 at 11.58.49 PM.png]]
Queries are sent to a trusted curator, who forwards them to the database. The curator adds some noise to the response and returns it to the user.
- Private DB $D$ with Alice’s information
- Private DB $D'$ without Alice’s information
- $D$ and $D'$ differ in a single row
- A DP scheme $K$ (a randomized algorithm) is defined as follows:
- For any set of outputs $S$, let $P[K(D) \in S]$ be the probability that the output of $K$ lies in $S$
- DP says that this probability does not change significantly between $D$ and $D'$:
$\frac{P[K(D) \in S]}{P[K(D') \in S]} \le e^{\varepsilon} \approx 1 \pm \varepsilon \quad \text{for small } \varepsilon$
![[attachments/Screenshot 2023-07-20 at 12.04.52 AM.png]]
> [!tip] The guarantee says: whether or not Alice participates in the study, the probability of seeing any particular output stays (almost) the same, so an observer learns essentially nothing more about her.
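As an illustration, here is a minimal sketch (not from the lecture; `satisfies_dp` is a hypothetical helper) that checks the $\varepsilon$-DP inequality for two finite output distributions:
```python
import math

def satisfies_dp(p_D, p_Dprime, epsilon):
    """Check the epsilon-DP inequality in both directions, for every outcome
    of two finite output distributions (dicts mapping outcome -> probability)."""
    return all(
        p_D[o] <= math.exp(epsilon) * p_Dprime[o]
        and p_Dprime[o] <= math.exp(epsilon) * p_D[o]
        for o in p_D
    )

# Two output distributions whose probabilities differ by at most a factor of 3
print(satisfies_dp({"YES": 0.75, "NO": 0.25}, {"YES": 0.25, "NO": 0.75}, math.log(3)))  # True
```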
## Implementation
### Example: Running red lights
A survey wants to assess how many people run red lights. Participants may worry that law enforcement could learn whether they, individually, do so. We need a way to collect truthful responses from participants while guaranteeing their privacy.
- Assume survey participants answer the question truthfully because of the DP guarantee
- Curator protocol
```python
import random

def flip_coin():
    """Fair coin flip."""
    return random.choice(["head", "tail"])

def curator_response(R):
    """On receiving the true answer R from the DB, flip a coin."""
    if flip_coin() == "tail":
        return R  # report the true answer (probability 1/2)
    # otherwise, answer with a second, independent coin flip
    if flip_coin() == "tail":
        return "NO"
    return "YES"
```
#### DP protocol utility
Let
- $N =$ total number of queries (one per user)
- $n =$ number of queries answered YES
- $p =$ true fraction of people breaking the law

In expectation, law-breakers (fraction $p$) answer YES with probability $3/4$, and the rest with probability $1/4$, so
$n = \frac{3}{4}pN + \frac{1}{4}(1-p)N$
Solving for $p$:
$p = 2\frac{n}{N} - \frac{1}{2}$
With a sufficient number of queries, $n/N$ concentrates around its expectation, so we can estimate $p$ accurately.
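As a quick sanity check, here is a hypothetical simulation (reusing `curator_response` from the protocol above) that recovers $p$ from the noisy answers:
```python
import random

def estimate_p(true_p, N=100_000):
    """Simulate N users answering through the randomized-response curator,
    then invert the expectation formula to estimate the true fraction p."""
    yes = sum(
        curator_response("YES" if random.random() < true_p else "NO") == "YES"
        for _ in range(N)
    )
    return 2 * (yes / N) - 0.5

print(estimate_p(0.3))  # close to 0.3 for large N
```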
#### DP protocol privacy
- If the DB has YES for Alice, the curator answers YES with probability ¾
- If the DB has NO for Alice, the curator answers YES with probability ¼
> This creates plausible deniability
$\frac{P[\text{Curator response is YES} \mid \text{Participant response is YES}]}{P[\text{Curator response is YES} \mid \text{Participant response is NO}]} = \frac{3/4}{1/4} = 3$
$\varepsilon = \ln 3$
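This ratio can also be checked empirically; a minimal sketch (again reusing `curator_response` from above):
```python
import math
from collections import Counter

def empirical_yes_rate(true_answer, trials=100_000):
    """Fraction of trials where the curator reports YES for a fixed true answer."""
    counts = Counter(curator_response(true_answer) for _ in range(trials))
    return counts["YES"] / trials

ratio = empirical_yes_rate("YES") / empirical_yes_rate("NO")
print(ratio, math.log(ratio))  # ratio ≈ 3, so ε = ln(ratio) ≈ 1.0986
```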
Typically, the curator adds some noise to the true response $R$ and returns a noisy $R'$. This must be done carefully so that the noise does not simply average out over repeated queries. The *Gaussian* and *Laplace* mechanisms draw this noise from the corresponding distributions.
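For a numeric query, a minimal sketch of the Laplace mechanism (assuming NumPy; calibrating the noise scale to sensitivity/ε is the standard choice):
```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Add Laplace noise with scale sensitivity/epsilon to a numeric answer."""
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many participants run red lights?") has sensitivity 1:
# adding or removing one person's row changes the count by at most 1.
print(laplace_mechanism(true_answer=42, sensitivity=1, epsilon=0.5))
```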
#### Local differential privacy
Users perturb their own data locally before sharing it, so no trusted curator is needed. (The coin-flipping protocol above becomes locally private if each participant flips the coins themselves before responding.)
> [!example] $ε$ is also called the privacy budget; smaller values mean stronger privacy. $ε = 0$ is perfect privacy, since the output distribution is then identical whether or not Alice participates, but such an output carries no information about the data at all.