
Automating Privacy Reviews with LLMs: Challenges & Opportunities | #BridgePrivacySummit

By Privado AI

Summary

## Key takeaways

- **LLMs Match Junior Reviewers on False Negatives**: LLMs fall short on false negatives (projects containing privacy risks that the LLM fails to detect), but the number is quite low and comparable to that of a junior human reviewer. [07:45], [08:17]
- **Summarization Cuts Review Workload 85%**: Used as summarization tools, LLMs reduce a human reviewer's workload by around 85% when measured as the number of words to be reviewed. [08:49], [09:00]
- **Poor Docs Cause LLM Hallucinations**: LLMs tend to highlight unlikely risks because they misinterpret insufficiently documented information; when information is ambiguous or incomplete, they make assumptions based on their training and hallucinate. [09:10], [09:43]
- **Code Scanning + LLMs Hits 95% Accuracy**: The code scanning engine creates a graph representation, and LLMs guide the search for collection, storage, and sharing paths with very high accuracy, reaching 95% accuracy on data flow results by filtering false positives. [11:53], [12:58]
- **On-Prem Deployment Limits Model Size**: Customers unwilling to share source code require deploying the engines in their own environment on a single GPU machine, at a maximum cost of around 10,000/year, using small fine-tuned models. [13:09], [13:42]
- **Context-Dependent Risk Shapes LLM Use**: LLM use, like junior analyst performance, is context dependent; it is appropriate in some cases, but not when regulators are investigating a product. [18:06], [18:39]

Topics Covered

  • LLMs Match Junior Reviewers on False Negatives
  • LLMs Cut Review Workload 85% Despite False Positives
  • LLMs Boost Code Scanning Accuracy to 95%
  • Team Up with Existing AI Initiatives for Privacy
  • Automation Inevitable as Privacy Scales with AI

Full Transcript

[Music] So, welcome everyone to our panel on automating privacy compliance with LLMs. My name is Engin Bozdag, I lead privacy and security architecture at Uber. We all know privacy engineering is hitting a tipping point: teams are overstretched, regulations are evolving, and our systems are getting more complex. Large language models such as DeepSeek are sometimes being hailed as the next big thing, maybe the magic bullet that will automate privacy compliance for us. But is that really true, or are we just adding another layer of complexity without actually solving the underlying issues? So we'll dive into successes, pitfalls, and experiences around using LLMs in privacy engineering. I have a great panel here; we're very fortunate to have three experts: someone from a B2C social media company, someone from a B2B enterprise handling a lot of data, and a very innovative vendor experimenting with LLMs. Let's see how they approached LLMs. I'll let them introduce themselves; let's start with Ios. Ios, do you want to introduce yourself?

Hi again, thank you. My name is Ios, I'm an engineering manager at Snap on the Privacy by Design team. I've been with the team for a little bit over five years; prior to that I was at Duke working on differential privacy, so I'm very happy to be here. I have to say that the opinions I will share are my own and do not represent those of my employer.

Thank you for the introduction. Stefano, do you want to go next?

Yes, thank you. I'm Stefano, I'm a privacy architect at HERE Technologies. HERE is a business-to-business company that builds maps and offers services on location data for use cases like logistics, electric vehicles, autonomous driving, and many others. I would also say that today I do not speak on behalf of HERE; I will offer my personal opinions only. At HERE I'm part of the privacy and responsible AI team, working mostly on technical aspects. In particular, I've been experimenting with using LLMs to automate different kinds of compliance tasks, and one of these experiments, which I'm going to talk about today, is using LLMs as assistants to help with technical privacy reviews, where we work with teams and ensure products and architectures embed privacy best practices.

Thank you, Stefano. And finally, we have Ur from Privado.

Hey everyone, so excited to be here. My background is a blend of program analysis, privacy, and AI. I spent about eight years in grad school working on AI-powered source code scanning tools, and after that I jumped into the world of privacy at Amazon, where I spent about four years building automated privacy auditors for Alexa AI models. Now I'm at Privado leading the AI research and development projects, and at Privado I get a unique opportunity to leverage my academic research and industry experience. Putting it simply, at Privado LLMs augment the privacy code scanning engine to guide the search space, to achieve higher accuracy, and to improve the quality of the results, as we aim to answer critical questions like: what personal data is being collected by my software, where is this personal data being stored, with whom is this personal data being shared, and does any of this personal data processing activity violate my company policies or any global regulations?

All right, thank you. So let's start exploring real-world applications. I'm interested in our panelists' experiences using LLMs in privacy engineering: where do they thrive, or in your opinion have the potential, and where do they fall short? Let's start with Ios. Ios, I know you've been looking at LLMs as a first-pass review tool or for summarization, mainly on privacy reviews. Tell us about your experiences, please.

Yeah, I'm thinking of the space of privacy by design, for example. You have this privacy review game where on one hand you have your customers, the PMs and engineers, who are trying to create a product, and on the other a privacy team: lawyers, privacy engineers, and so on. We're all trying to do what's best for the organization: create a safe experience for the users and create products. The way I see LLMs fitting in is in places where the engineers, the privacy customers, don't really understand this privacy game. They could come to you with a spec to review that doesn't have the details, so there's going to be a lot of back and forth figuring out the privacy details before you can proceed with your privacy analysis. And especially with more junior people, they could submit a 25-page design document, and then how do you review that? So that is one space: can you use this technology to assist privacy customers in improving their own experience, as a completion tool that tells them, "these are the things the privacy reviewer would be looking for"?

And second, as you said when introducing the panel: how do you scale this business? If you have hundreds of reviews per month to conduct, how do you deal with the low-complexity ones? Do you have a lawyer and a privacy engineer looking at every little logging request, or do you have a triage tool that could be powered by an LLM? So these are the two high-level ideas where I can see these technologies really helping our space.
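
As an illustration of the triage idea Ios describes, here is a minimal sketch of routing incoming review requests by complexity with an LLM. The prompt wording, the `call_llm` placeholder, and the routing labels are assumptions for illustration, not anything described on the panel.

```python
# Hypothetical sketch of an LLM-backed triage step for privacy review requests.
# `call_llm` is a placeholder for whatever model or API is actually used.
import json

TRIAGE_INSTRUCTIONS = (
    "You are assisting a privacy review team. Classify the following review "
    "request as LOW, MEDIUM, or HIGH complexity and list any privacy-relevant "
    "details that are missing. Respond as JSON with keys 'complexity' and "
    "'missing_details'.\n\nRequest:\n"
)

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your model provider here")

def triage(request_text: str) -> dict:
    verdict = json.loads(call_llm(TRIAGE_INSTRUCTIONS + request_text))
    # Low-complexity items (e.g. a small logging change) go to a lightweight
    # checklist flow; everything else is queued for a human reviewer.
    verdict["route"] = "checklist" if verdict.get("complexity") == "LOW" else "human_review"
    return verdict
```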

And where have you seen limitations? Where do you think summarization, or helping with triaging, isn't there yet?

So here's the thing about limitations. If you're going to use it as a completion tool, is it really helping the customer? Is it improving their experience? This is something you have to test; you have to interview your customers to see whether it's really helping them or whether they're just annoyed and want to speak to a person. That is one. And on the triaging side, the limitation is: what are you overlooking? What are the potential privacy risks that this model could expose you to? And it could be a privacy risk that the organization as a whole is comfortable taking. So for these two applications, these are the two things we have to be a little bit more careful with.

Stefano, I know you also worked on using LLMs in privacy reviews, and you ran experiments with classification and summarization. Can you give us a highlight of what was very promising, what fell short, and what still needs more work?

Yeah, as you said, we were also trying to see how to use these LLMs to scale up our privacy reviews, so we tested a couple of things. We tested LLMs as classifiers to identify which projects include privacy risks and are in need of an in-depth review. We measured different metrics; the most relevant are the false negatives. False negatives are projects that contain a privacy risk that the LLM does not detect. These missed detections are problematic because they can lead to products that contain privacy risks. We also measured false positives, which are privacy risks that the model identifies but that do not exist or are not relevant. This is less of a problem: it does not add privacy risk to the organization, but it creates additional work for the human reviewer.

During this test we found that LLMs fall short in terms of false negatives. The good news is that the number of false negatives is quite low, and it's comparable to that of a junior human reviewer. What we typically do with junior reviewers is team them up with experts, and having more SMEs review the same document ensures that all issues are identified; it would be interesting to test whether having different LLMs reach consensus would similarly reduce the number of false negatives. We also tried LLMs as summarization tools, where the model has to summarize the document and highlight privacy risks. We found that LLMs reduce the workload of a human reviewer by around 85% when measured as the number of words to be reviewed.
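
For concreteness, here is a minimal sketch, under assumed data structures, of the two measurements Stefano mentions: false negatives and false positives for the classifier, and workload reduction in words for the summarizer. The `Case` fields are illustrative, not a description of HERE's actual setup.

```python
# Illustrative evaluation harness for the two experiments described above.
from dataclasses import dataclass

@dataclass
class Case:
    document: str         # design document submitted for review
    has_risk: bool        # ground truth from past human reviews
    llm_flags_risk: bool  # LLM classifier verdict
    llm_summary: str      # risk-focused summary produced by the LLM

def classifier_metrics(cases: list[Case]) -> dict:
    false_negatives = sum(c.has_risk and not c.llm_flags_risk for c in cases)  # missed risks
    false_positives = sum(c.llm_flags_risk and not c.has_risk for c in cases)  # extra review work
    return {"false_negatives": false_negatives,
            "false_positives": false_positives,
            "total": len(cases)}

def workload_reduction(cases: list[Case]) -> float:
    # Workload measured as the number of words a human still has to read.
    original = sum(len(c.document.split()) for c in cases)
    summarized = sum(len(c.llm_summary.split()) for c in cases)
    return 1 - summarized / original  # ~0.85 in the experiments described here
```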

There is still a high false positive ratio, but improvements in LLM reasoning could probably reduce that further. When we looked into these false positives, we noticed a couple of things. LLMs tend to highlight unlikely risks because they misinterpret information. For example, they may highlight a risk that users can provide personal data as input to a product when, in fact, that product is not exposed to end users. This is typically caused by insufficient documentation. We know that teams are short on time; they tend to be concise and omit well-known information. But the same information may not be well known to the LLM, and if information is ambiguous or incomplete, LLMs tend to make assumptions based on their training, which means they hallucinate or produce errors. So I think documentation quality is the main challenge with any review, whether it's done by a human or by an AI. To address this, we built a knowledge base that contains the most used acronyms, descriptions of data sources, descriptions of products, and so on.
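
A minimal sketch of how such a knowledge base might be injected into the review prompt so the model answers from provided context rather than guessing. The entries and instruction wording below are hypothetical.

```python
# Hypothetical grounding of the review prompt with an internal knowledge base
# (acronyms, data sources, products), as a guard against hallucinated risks.
KNOWLEDGE_BASE = {
    "acronyms": {"PSP": "payment service provider"},                  # illustrative entry
    "data_sources": {"probe_data": "anonymized vehicle sensor pings"},
    "products": {"routing_api": "backend service with no end-user input"},
}

def build_review_prompt(design_doc: str) -> str:
    context = "\n".join(
        f"{term}: {definition}"
        for section in KNOWLEDGE_BASE.values()
        for term, definition in section.items()
    )
    return (
        "Use only the context and document below. If something is not stated, "
        "answer 'not specified' rather than assuming.\n\n"
        f"Context:\n{context}\n\nDocument:\n{design_doc}\n\n"
        "List the privacy risks you can support with evidence from the text."
    )
```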

Another cause of false positives was visuals. Architects prefer visualizations and graphs to explain flows and complex concepts, so it's crucial to provide this information to the LLM. We tested multimodal LLMs, but at the time of testing they were not accurate at understanding the visuals. We tried creating textual descriptions from the diagram sources, but we found it difficult to support all the various formats and syntaxes that are out there.

So, if I understand you correctly: document quality has a big effect on the outcome, because LLMs try to make a judgment even when information is lacking, which can lead to a lot of false positives; the false negative rate, meaning LLMs missing risks, is low, or comparable to a junior expert; and visuals are a difficult problem. Okay, this is very interesting. But stepping out from the review side, let's hear from Ur about other use cases. Ur, I know you've been working on code scanning using LLMs and interpreting results, and you also work with scanning DPAs. So can you tell us more about that?

Of course, yeah, sure. We do have a somewhat unique use case: we work with the source code, and sometimes we work with the network traffic data as well. In our experience, LLMs in general work best when they're complementary to the main approach, and the main approach in this case is code scanning. The code scanning engine creates a reliable representation of the software, which is a graph, and then AI comes in to find which parts of this graph are interesting: which parts of the graph are collecting personal data, where the storage events happen in the graph, and what the paths are from the collection points to the third-party sharing nodes. In all of these tasks, LLMs are very effective, with very high accuracy, and this helps us to improve the coverage.

But in addition to this, we get some results directly from the scanning engine, for example the data flows that go to third parties or to log files. We can have false positives in those coming from code scanning, because of some approximations made in the code scanning algorithm. However, AI is very effective at telling us, "hey, there is an incorrect flow being reported here because of that approximation; you are showing a data flow which actually is not real." AI is helping us achieve about 95% accuracy on our data flow results there.
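
A rough sketch of the pattern Ur describes: a scanner builds a program graph, an LLM labels candidate collection, storage, and sharing nodes, and candidate flows between them are re-checked by the model before being reported. The graph structure, node attributes, and the two placeholder LLM calls are assumptions for illustration, not Privado's implementation.

```python
# Illustrative only: LLM-guided search over a code-scanner-produced graph.
import networkx as nx

def classify_node(code_snippet: str) -> str:
    """Placeholder LLM call: returns 'collection', 'storage', 'sharing', or 'other'."""
    raise NotImplementedError

def flow_is_real(path_snippets: list[str]) -> bool:
    """Placeholder LLM call: filters flows caused by scanner over-approximation."""
    raise NotImplementedError

def find_data_flows(graph: nx.DiGraph) -> list[list[str]]:
    labels = {n: classify_node(graph.nodes[n]["code"]) for n in graph.nodes}
    sources = [n for n, label in labels.items() if label == "collection"]
    sinks = [n for n, label in labels.items() if label == "sharing"]
    flows = []
    for src in sources:
        for dst in sinks:
            for path in nx.all_simple_paths(graph, src, dst):
                snippets = [graph.nodes[n]["code"] for n in path]
                if flow_is_real(snippets):  # drop false positives before reporting
                    flows.append(list(path))
    return flows
```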

The shortcomings are mostly about cost and data-sharing concerns. We do work with customers' data, and it's often the case that customers don't want to share their source code with us. In those cases we have to deploy all of our engines to their own on-prem environment, and that actually limits how big we can go. So we have to do all of this with a relatively small model that can be deployed on a single GPU machine, and ideally not at a large cost: we want to keep the cost to a maximum of around 10,000 a year with a single GPU machine, while still achieving very good results with fine-tuned models that are tailored for specific use cases.

Maybe another challenge is customizability. We sometimes see, for example, that personal data one customer thinks is very critical is not that critical for another customer, or customizability in terms of regulations: a US-governed company might have different policies and different priorities compared to a company that operates in the EU, and AI becomes challenging to generalize in those cases. DPAs are an example of that. For the data processing addendum there is a template provided by the GDPR regulators, but that template is rarely followed, and that makes extracting the important information from the document very challenging. And even when you achieve good results and extract all the information, some companies say, for example, "this is not very interesting for us; what we actually want is for you to tell me the security measures mentioned in the document, or the hosting locations mentioned in the document," and the document has no mention of that. So these are the challenges when it comes to applying LLMs in these contexts.
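
As a sketch of the DPA extraction problem just described, the snippet below asks the model for specific fields and forces an explicit "not mentioned" answer when the contract is silent. The question list and the `call_llm` parameter are hypothetical.

```python
# Hypothetical field extraction from a data processing addendum (DPA).
DPA_QUESTIONS = [
    "What technical and organizational security measures are described?",
    "Which data hosting locations are stated?",
]

def extract_dpa_facts(dpa_text: str, call_llm) -> dict:
    answers = {}
    for question in DPA_QUESTIONS:
        prompt = (
            "Answer using only the contract text below. If the contract does not "
            'address the question, answer exactly "not mentioned".\n\n'
            f"Question: {question}\n\nContract:\n{dpa_text}"
        )
        answers[question] = call_llm(prompt)
    return answers
```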

So if I hear correctly, there's a lot of promising work to support human assessment, especially in the code scanning piece, but it's essentially a dilemma: if you want to do it on-prem, with more privacy, you need investment in the hardware; or you could use something at a vendor, but then you have to think about privacy and security.

Yes, security and confidentiality maybe more, because this is the company's source code; for some it would probably be considered intellectual property, IP data. So yes, we definitely do not share, for example, any customer data with third parties; all the AI models that we develop are internal, they stay in our cloud. But again, for some customers even this is not okay; they don't want to share any data even with us, and that restricts how big we can go: everything has to be deployable to their own premises, basically.

Very interesting. So I think we discussed code scanning, DPA scanning, privacy review summarization, lots of different use cases. I want to talk about some practical considerations for adoption. Many companies have minimal privacy engineering teams, if any. Should they even be considering LLMs at this stage, and if they do, what points should they really prioritize? Let's start with Stefano. Stefano, what do you think?

Yeah, I think that most privacy teams I know are stretched thin and have very little time for innovation, and in this case I think that teaming up with other teams is a good idea. For example, there may be teams in your organization that are experimenting with LLMs for other purposes: a compliance team that is building an FAQ chatbot, a development team that is working on automatic documentation, or a security team building some automated security testing, and so on. These teams have already completed the groundwork to justify investment in building AI tools for their use cases, and they have also gained the support of their leadership. This experience can help you demonstrate value for your own use case with your own leadership. And also, from a technical point of view, reusing and combining parts of these systems can speed up building and testing a privacy-focused LLM, especially if your team lacks the required technical skills. There is one critical asset that is the full responsibility of the privacy team, though, which is building a dataset for testing, for example a collection of past privacy decisions. This can be time-consuming.

So essentially, look into your organization, see who is doing what, reuse, join forces, and also get the support of leadership. Interesting. Ios, what do you think?

The last part, about the support of leadership, I think is critical. Any application of any technology really needs to be context dependent: what is the upside, what is the appetite for risk, and so on. For example, Stefano earlier said that this is as good as a junior privacy analyst. Okay, great; in a certain context this might be an appropriate thing to do, for example in a privacy review. In some other context, maybe the regulator is already investigating you on a particular product, and maybe you don't want to make the claim that you had your brightest junior analyst on it. So it is very context dependent; that is the first pillar I think of as a practical consideration.

The second, and I think we touched on it a little bit earlier with confidentiality, is how you deploy this thing. Are you going to do it in-house, where you need to build the infra, or maybe you have the infra but then you have to agree on the ROI? Or are you going to use a vendor, and then what is the business agreement with the vendor? Because in this case, for the privacy review game, you don't send them your code, but you do send them your design documents and your past privacy decisions and so on. What is the appropriate choice here? Again, this is a decision that engineering leadership and organization leadership have to make.

And the last thing I have in mind, about how we think about changing the status quo to enable scaling: when we measure success for an automation, for an LLM in this case, how exactly do you measure it? Because if you say, "I'm going to measure time to complete a review," of course it's going to be a success. But is that what you should be measuring? Should we celebrate that, or should we be mindful of what the false positive rate is, what the false negative rate is, and which is more important? Because, and I think Ur touched on it, we shouldn't throw accountability out of the window; the machine should be complementary to the main practice and to the mission of what we're trying to accomplish. So as we adopt the technology, these three things should be in our minds, because in practice we're going to have to make decisions on these questions: you have to think about the privacy risk, you have to think about your metric and what you are actually measuring, and automation shouldn't be a substitute for accountability.

All great points. And Ur, as a vendor serving different players in the industry, you must sometimes hear, "AI just doesn't work for us." What do you say to that?

Yeah, I think that is probably a misconception, I would say. But before coming to that, I want to touch on the size of the team again. I'm inclined to say the smaller the privacy team, the stronger the reason to use AI to achieve privacy governance. But this decision should also align with the company's general AI approach: if, for example, the engineers are able to use Copilot for development and the privacy team doesn't use AI for governance, that's a losing game. You will never catch up with what the engineers are producing; the products will keep evolving, and you will be trying to govern them with your current practices.

And coming to the AI pitfalls: when I survey my non-technical friends working in the privacy field, what I usually hear is, "oh, it didn't work, it doesn't do the job." But I think the misconception is that we are expecting the AI to do our job. That's not going to work. AI will not do our jobs for us; instead, we should be using AI to become more productive in the privacy field. We should be using AI with the right context to solve problems that we know how to solve ourselves. If you don't know how to solve the problem, or you don't provide sufficient context to the AI to solve it, then yes, you are going to get poor results. But when it's integrated into existing systems, with the right context, to solve a very well-defined problem, with sufficient examples, it will do a very good job. I'm pretty confident about that.

So let's try to think about the future. Ur, you said this is a good tool for support, and you don't want to fall behind, especially if all of your engineers are utilizing it. But let's look five years into the future. Nobody can predict the future, but we can always make guesses. Is it realistic to think that in five years most privacy reviews will be automated using LLMs, or do you think we will still be debating the same trade-offs and limitations that we are discussing today? Ios, do you want to start?

Yeah, I'm not a gambling person, but if I had to bet: automation is the way forward, 100%, there's no question in my mind. We've seen this in computer science time and time again, whether it's going from writing machine code to writing assembly to writing high-level code, and so on. I can't see a world where the most repetitive tasks are conducted by the most brilliant minds. We should leverage these machines to free up cycles for us so that we're able to tackle the gnarliest things, whether it's a lawyer drafting the company policy for a new regulation, or us building differential privacy or this or that. I think that's the future. You can't have humans reviewing every single UI change and every single logging addition to the logging systems. Privacy will evolve, and some things need to be automated. That's my take.

That's your take, yeah. So a large percentage will probably be automated, but there will still be a need for human assessment. Ur, what do you think? You already touched on it a bit, that the privacy problem will get bigger, so we need automation, but what is your view?

Yeah, maybe to repeat a little: it's a losing game if you are not using AI to solve the privacy problems, because the engineers are using it, and it will never go down from here, it will always escalate. There will always be more data collected, more AI in our lives. For example, in the health sphere, and again this is a little bit of fortune telling, there will be a trade-off between AI, health, and privacy. This is already happening, but it will only grow bigger: these wearable health devices will collect a lot of personal data from our bodies, from our day-to-day lives, and all that personal data needs to be governed. Again, with the scale of AI, this is going to be a huge privacy problem if you are not using automated tools to prevent privacy incidents and to govern, starting at the design stage and continuing through the later prevention steps; otherwise we are not going to cope with the scale. So, to wrap it up, I would say that with AI the scale is growing very fast, and the best tool to address that is, again, using AI to mitigate it.

That's very interesting, so we will need AI to tackle some of the AI issues. Stefano, your thoughts?

Yeah, I agree with Ios and Ur, great points. I also think that automation is needed for the reasons they said, but I also think that human oversight of privacy reviews will be required for a long time, even if the whole review itself gets automated by AI, because oversight is a key principle of responsible AI. And I expect that in the future, responsible AI discussions will become more and more central to product development, driving AI products to specialize in narrow functionalities. Specialized AI models that require less data to work will, first of all, raise fewer privacy concerns; they will have lower network costs for transferring data (you can think of some proposals that involve sending one screenshot per second over the internet); and they will have lower compute costs and may even allow running models locally, saving even more on compute.

So we're approaching the end of our panel. I'm checking whether there are any questions, but otherwise, any closing remarks on the future of AI and privacy engineering? Ur, do you want to start?

Yeah. I mentioned the multi-layered approach to privacy governance: privacy by design at the design phase, and at implementation I think code scanning provides a very nice solution; and then when you deploy these solutions you still need monitoring there. Even pushing further into the middle phase, while you are implementing: one thing we are thinking about at Privado is that currently we are scanning the code in the repository, and shifting this further to the left would mean, for example, your company having a privacy assistant, much like Copilot, baked into the development environment of your engineers. It could immediately warn the engineer about a violation if they are, for example, writing a function that shares data with a social media company that is not sanctioned by the company policy.
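
As a toy illustration of that shift-left idea, the check below flags outbound calls to domains that are not on an approved list. A real assistant, Privado's or anyone else's, would reason over a program graph rather than a regex; the domain list and rule here are purely hypothetical.

```python
# Toy "shift-left" policy check over a code diff, for illustration only.
import re

APPROVED_DOMAINS = {"analytics.approved-example.com"}  # hypothetical sanctioned list
OUTBOUND_URL = re.compile(r"https?://([\w.-]+)")

def check_diff(diff_text: str) -> list[str]:
    warnings = []
    for match in OUTBOUND_URL.finditer(diff_text):
        domain = match.group(1)
        if domain not in APPROVED_DOMAINS:
            warnings.append(f"Possible data sharing with unsanctioned domain: {domain}")
    return warnings
```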

So yeah, I would say there is a lot of room in the company for using AI to solve privacy problems. Thank you.

Yes, your closing thoughts?

Some of the words that came up, like responsibility, oversight, and scale: we should consider all of these things as we deploy these technologies and be mindful of what we're doing. Generally, I'm cautiously optimistic about the future. And we do need the machines, right? There are not enough PhD students coming out of universities to help us, so we need the machines to help us here somehow.

Yeah, thank you. Stefano?

Yeah, I think AI has great potential to facilitate the work of privacy engineers, and not only privacy engineers, and in turn to make products more private. We just have to understand how to best use AI for it.

All right, thank you. [Music]
