### Differential privacy
> Can we provide any guarantee without knowing what external/auxiliary information an attacker might have?
![[attachments/Screenshot 2023-07-19 at 11.58.49 PM.png]]
Queries are sent to a trusted curator, who forwards them to the database. The curator adds some noise to the response and returns it to the user.
- Private DB $D$ with Alice’s information
- Private DB $D'$ without Alice’s information
- $D$ and $D'$ differ in a single row
- A DP scheme $K$ (a randomized algorithm) is defined as follows:
- For any set of outputs $S$, let $P[K(D) \in S]$ be the probability that the output of $K$ lies in $S$
- DP says that this probability does not change significantly between $D$ and $D'$:
$\frac{P[K(D) \in S]}{P[K(D') \in S]} \le e^{\varepsilon} \approx 1 \pm \varepsilon \quad \text{for small } \varepsilon$
![[attachments/Screenshot 2023-07-20 at 12.04.52 AM.png]]
> [!tip] The guarantee says: whether or not Alice participates in the study, the probability of seeing any particular output stays (almost) the same, so an observer learns essentially nothing more about her.
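As an illustration, here is a minimal sketch (not from the lecture; `satisfies_dp` is a hypothetical helper) that checks the $\varepsilon$-DP inequality for two finite output distributions:
```python
import math

def satisfies_dp(p_D, p_Dprime, epsilon):
    """Check the epsilon-DP inequality in both directions, for every outcome
    of two finite output distributions (dicts mapping outcome -> probability)."""
    return all(
        p_D[o] <= math.exp(epsilon) * p_Dprime[o]
        and p_Dprime[o] <= math.exp(epsilon) * p_D[o]
        for o in p_D
    )

# Two output distributions whose probabilities differ by at most a factor of 3
print(satisfies_dp({"YES": 0.75, "NO": 0.25}, {"YES": 0.25, "NO": 0.75}, math.log(3)))  # True
```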
## Implementation
### Example: Running red lights
A survey wants to assess how many people run red lights. Participants may worry that law enforcement could learn whether they, individually, do so. We need a way to collect truthful responses from participants while guaranteeing their privacy.
- Assume survey participants answer the question truthfully because of the DP guarantee
- Curator protocol
```python
import random

def flip_coin():
    """Fair coin flip."""
    return random.choice(["head", "tail"])

def curator_response(R):
    """On receiving the true answer R from the DB, flip a coin."""
    if flip_coin() == "tail":
        return R  # report the true answer (probability 1/2)
    # otherwise, answer with a second, independent coin flip
    if flip_coin() == "tail":
        return "NO"
    return "YES"
```
#### DP protocol utility
Let
- $N =$ total number of queries (one per user)
- $n =$ number of queries answered YES
- $p =$ true fraction of people breaking the law

In expectation, law-breakers (fraction $p$) answer YES with probability $3/4$, and the rest with probability $1/4$, so
$n = \frac{3}{4}pN + \frac{1}{4}(1-p)N$
Solving for $p$:
$p = 2\frac{n}{N} - \frac{1}{2}$
With a sufficient number of queries, $n/N$ concentrates around its expectation, so we can estimate $p$ accurately.
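As a quick sanity check, here is a hypothetical simulation (reusing `curator_response` from the protocol above) that recovers $p$ from the noisy answers:
```python
import random

def estimate_p(true_p, N=100_000):
    """Simulate N users answering through the randomized-response curator,
    then invert the expectation formula to estimate the true fraction p."""
    yes = sum(
        curator_response("YES" if random.random() < true_p else "NO") == "YES"
        for _ in range(N)
    )
    return 2 * (yes / N) - 0.5

print(estimate_p(0.3))  # close to 0.3 for large N
```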
#### DP protocol privacy
- If the DB has YES for Alice, the curator answers YES with probability ¾
- If the DB has NO for Alice, the curator answers YES with probability ¼
> This creates plausible deniability
$\frac{P[\text{Curator response is YES} \mid \text{Participant response is YES}]}{P[\text{Curator response is YES} \mid \text{Participant response is NO}]} = \frac{3/4}{1/4} = 3$
$\varepsilon = \ln 3$
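This ratio can also be checked empirically; a minimal sketch (again reusing `curator_response` from above):
```python
import math
from collections import Counter

def empirical_yes_rate(true_answer, trials=100_000):
    """Fraction of trials where the curator reports YES for a fixed true answer."""
    counts = Counter(curator_response(true_answer) for _ in range(trials))
    return counts["YES"] / trials

ratio = empirical_yes_rate("YES") / empirical_yes_rate("NO")
print(ratio, math.log(ratio))  # ratio ≈ 3, so ε = ln(ratio) ≈ 1.0986
```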
Typically, the curator adds some noise to the true response $R$ and returns a noisy $R'$. This must be done carefully so that the noise does not simply average out over repeated queries. The *Gaussian* and *Laplace* mechanisms draw this noise from the corresponding distributions.
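For a numeric query, a minimal sketch of the Laplace mechanism (assuming NumPy; calibrating the noise scale to sensitivity/ε is the standard choice):
```python
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    """Add Laplace noise with scale sensitivity/epsilon to a numeric answer."""
    return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

# A counting query ("how many participants run red lights?") has sensitivity 1:
# adding or removing one person's row changes the count by at most 1.
print(laplace_mechanism(true_answer=42, sensitivity=1, epsilon=0.5))
```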
#### Local differential privacy
Users perturb their own data locally before sharing it, so no trusted curator is needed. (The coin-flipping protocol above becomes locally private if each participant flips the coins themselves before responding.)
> [!example] $ε$ is also called the privacy budget; smaller values mean stronger privacy. $ε = 0$ is perfect privacy, since the output distribution is then identical whether or not Alice participates, but such an output carries no information about the data at all.