Stopping AI Oversharing with Knostic

Large language models are most useful to your business when they have access to your data. But these models also overshare by default, exposing need-to-know information unless sophisticated access controls are in place. Organizations that try to limit the data an LLM can access, however, risk undersharing, leaving users without the information they need to do their jobs more efficiently.

In this episode, Sounil Yu, CTO at Knostic, explains how they address internal knowledge segmentation, offer continuous assessments, and help prevent oversharing while also identifying undersharing opportunities. Joining him are our panelists, Ross Young, CISO-in-residence at Team8, and David Cross, CISO at Atlassian.

Got feedback? Join the conversation on LinkedIn.

Huge thanks to our sponsor, Knostic

Knostic protects enterprises from LLM oversharing by applying need-to-know access controls to AI tools like Microsoft 365 Copilot. Get visibility into overshared data, fix risky exposures, and deploy AI confidently—without data leakage. If you’re rolling out Copilot or Glean, you need Knostic.

Full Transcript

[Voiceover] Connecting security solutions with security leaders. Security You Should Know starts now.

[Rich Stroffolino] Welcome to Security You Should Know. Today, we’re talking with Knostic and what they’re doing in AI security. And the problem that they’re addressing is that AI models are oversharing, and we have to find a way to solve it if we’re going to be using these. So, helping us get answers to these questions are David Cross, CISO at Atlassian, and Ross Young, CISO-in-residence at Team8. So, Ross, I’m going to start with you here. Why are AI models oversharing so much?

[Ross Young] Yeah, I think we have to understand how much we need dynamic permissions. Let’s say Jackie is on the accounting team and needs all the access to the accounting data and we train the model, and it learns everything from all the different systems. And then later on, Jackie isn’t on the accounting team; she’s on the procurement team. Did we make sure that model now removes Jackie’s old accounting access, or can she get everything that she used to have even though it’s no longer necessary for her new role? Those little questions as we start to think about dynamic access and changing roles are going to be really, really important because if Jackie gets phished and they say, “Hey, you lost all this accounting data,” and she no longer has need-to-know, that’s a big issue for you and your regulators.

[Rich Stroffolino] All right, David, I’m going to throw the question to you. Why are these AI models oversharing? Do you agree with what Ross said?

[David Cross] I do. I think I’d give some more context to that, no pun intended. I think there’s an element where in many companies, you’re taking all the data in. Let’s take earnings data, right? And that everyone should have access to earnings data. But actually, for public companies, there’s a window of who should have access and when, right? And so, there’s times that maybe after earnings everyone can have access to that or for a period of time. But when you’re in that closed period, right, should everyone have access? Well, the answer is probably not, right?

And so how can you have the full context in your environment of when someone should have access to data, and maybe there’s restrictions on just the time windows? And I think that’s some of the more difficult things now is LLMs are the best place to…supporting all these scenarios, but then how do you have an access control model that’s not the traditional DACL model? Read access, execute access, deny, I think it’s much more complex than that.
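The closed-period scenario David describes can be illustrated as a time-aware access check. This is only a minimal sketch: `EarningsWindow`, `can_view_earnings`, and the `insiders` set are hypothetical names chosen for illustration, not anything described in the episode or taken from a real product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EarningsWindow:
    """Hypothetical closed period during which earnings data is restricted."""
    closed_from: date
    closed_until: date  # inclusive end of the closed period

    def is_closed(self, today: date) -> bool:
        return self.closed_from <= today <= self.closed_until


def can_view_earnings(user: str, insiders: set[str],
                      window: EarningsWindow, today: date) -> bool:
    """Everyone may view outside the closed period; only insiders inside it."""
    if window.is_closed(today):
        return user in insiders
    return True
```

The point of the sketch is that the same user and the same data can yield a different answer depending on the date, which a static read/execute/deny list cannot express on its own.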

[Rich Stroffolino] Well, today helping us navigate that complexity, we’re going to be talking with Sounil Yu, co-founder and CTO from Knostic. So, to start out, Sounil, we need the answers to three essential questions. First, how do I explain the value of your solution to my CEO? What does your solution do and what does it not do? And then the pricing model. So, can you give us these preliminaries?

[Sounil Yu] Sure. So, thanks for having me on. What we do is primarily focused on giving large language models discretion. Think of the large language models that we deploy for enterprise search as being like a CEO who knows everything inside the organization, but if a marketing intern asks a question such as, “Hey, what’s next quarter sales revenue?” the CEO has discretion and says, “Well, let me talk to you about how maybe your marketing campaigns are helping our sales numbers, but I’m not going to give you the specific sales numbers.” Ability to have that discretion is what humans have, but the machines don’t have that.

And so, what we’re doing is teaching these machines how to have discretion by offering a way to segment knowledge based on your role and your job function. So, the way that we do that is by actually probing our systems to figure out what is being shared, what’s not being shared, and defining those rules within the organization.

The pricing model is framed around probing from different vantage points: the number of different profiles in your organization and the number of different topics. The multiplication of those two values – the number of profiles and the number of topics – determines what the pricing is for our services.
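The profiles-times-topics idea can be made concrete by enumerating the probe combinations. This is a rough sketch only: the profile and topic names are hypothetical, and the episode does not state a unit price, so the code counts combinations rather than computing a cost.

```python
from itertools import product

# Hypothetical vantage points and topics, for illustration only.
profiles = ["everyone", "marketing_intern", "ceo"]
topics = ["sales_forecast", "earnings"]


def probe_pairs(profiles: list[str], topics: list[str]) -> list[tuple[str, str]]:
    """Every (profile, topic) combination that would be probed."""
    return list(product(profiles, topics))


pairs = probe_pairs(profiles, topics)
print(len(pairs))  # 3 profiles x 2 topics = 6
```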

[Rich Stroffolino] Excellent. Okay. So, we’ve gotten a taste for the solution. Thank you so much for that, Sounil. I’m sure though – Ross, David – you have a lot of questions. David, I’m going to come to you first. What other questions do you have about Knostic?

[David Cross] Sounil, one of the things I’ve been wondering is I, it was a couple weeks ago, I listened to a Gartner webcast, and they’re talking about customers using like Copilot and things like that and they got all this data. But then they realized they don’t know how to restrict the access into the Copilot world for maybe scenarios like earnings or things like that. Does Knostic help with anything like that?

[Sounil Yu] Yeah. So, one of the big challenges that we have with these large language models is that the value comes from actually grabbing a lot of the underlying content that’s available inside your enterprise. And really, the reason why most people roll out large language models is to actually address the undersharing problem.

We have a bigger undersharing problem than we have an oversharing problem, but the oversharing problem is particularly acute and that’s what we noticed first. But the challenge is if I try to fix the oversharing problem by removing permissions, by removing content, I just exacerbate the undersharing problem that we started with. And so, the challenge is that we can’t fix it by just fixing the data problems that we have. And by the way, everyone has data problems.

Things are overshared already. And what we’re seeing is that these large language models are just exacerbating or accelerating the discovery of this overshared content. So, our view is we got to fix a knowledge problem with a knowledge-level approach which is how do you actually define need-to-know inside the organization so that you can then instruct and provide proper guardrails for each individual, each role within the organization?

[Rich Stroffolino] All right. Ross, I’m going to throw to you. What questions do you have for Sounil and Knostic?

[Ross Young] Yeah, I’d love to hear you talk a little bit more about the types of data. For a long time, it’s been very easy to classify data and say this is secret, top secret, and top secret doesn’t leave the organization. But what I’m finding is a lot of times, CISOs have things like source code where we’ve never actually classified that. How are you using knowledge sharing on other pieces like that and how do we control that access to making sure that doesn’t leave the company when someone changes jobs to our competitor?

[Sounil Yu] So, first of all, we’re focused on the problem around the content being shared internally, so we’re not actually focused on the external sharing of this content. There’s going to be plenty of other solutions, and we’re not focused on that particular problem. We’re focused primarily on internal knowledge segmentation. So, the way that we currently classify content, it’s at this very crude level of confidential, highly confidential, or in the national security space – secret, top secret, and so on and so forth. Anyone who’s been in that space also recognizes just because you have a secret clearance doesn’t mean that you can see all things secret. And it’s the same sort of construct inside of an organization as well.

Just because you are part of the organization doesn’t mean you should have access to all things confidential. So, what we have to do is to move it up a level, have a semantic understanding of what this content really is about, and then align that to your job function itself. What is it that your job function needs to be able to get your job done and to provide, again, those sort of boundaries to the large language model so that it can answer it in the proper context?

[David Cross] Yeah. So, Sounil, I can’t resist now saying that having the context and then the knowledge graph and things like that is certainly, as Ross, you brought up the scenario of things leaking out. Then I start getting worried like, “Oh, are we going to have mandatory access controls and actually access controls to [Inaudible 00:07:08] data?” But if we don’t have that, then how do we prevent the scenarios where we’ve seen certainly at RSA last week where people like, hey, doing all these prompts, injections, things like that, to get around it to find the data that they know is there but they’re restricted to. Does Knostic help in those scenarios as well without making everyone do MACL and ACL scenarios?

[Sounil Yu] Yeah. So, first of all, let’s actually frame what we’re doing. So, well, you mentioned mandatory access controls, and that’s actually what we have today for these large language models. It’s a one-size-fits-all. Everyone applies the same guardrail. If you’re trying to, for example, use the public ChatGPT, no one can ask about biological weapons. But what if your job function actually deals with biological weapons? Maybe you should have a need-to-know for that.

Well, it’s the same sort of… The problem within the enterprise is similar. What we need is discretionary access controls, and as I mentioned at the beginning, these large language models lack discretion. So, what we need to do is to give it the parameters for discretion and that’s what we’re calling context. What’s the context? It’s based on your job function and what you do and what you need to know to get your job done. Now, there’s another piece which is we are right now acting as a policy decision point. What you were bringing up was a policy enforcement function.

And what we’ve seen repeatedly is that the policy enforcement is very weak today and that’s just where we are with the technology. I’m not sure if we’re going to be able to solve that. In fact, that’s not the focus that we have today. It’s something that we’re going to rely upon other providers to do at this point in time.
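The split Sounil draws between a policy decision point and policy enforcement can be sketched as follows. This is a minimal illustration under stated assumptions: the `NEED_TO_KNOW` map, the role and topic names, and the `decide` function are all hypothetical and do not describe how Knostic works internally.

```python
from enum import Enum


class Decision(Enum):
    PERMIT = "permit"
    DENY = "deny"


# Hypothetical need-to-know map: job function -> topics needed to do the job.
NEED_TO_KNOW = {
    "marketing_intern": {"marketing_campaigns"},
    "sales_director": {"marketing_campaigns", "sales_forecast"},
}


def decide(role: str, topic: str) -> Decision:
    """Decision only: is this topic within the role's need-to-know?

    Enforcement (actually constraining what the LLM answers) is assumed to
    live in a separate component, as in the discussion above.
    """
    allowed = NEED_TO_KNOW.get(role, set())
    return Decision.PERMIT if topic in allowed else Decision.DENY
```

For example, `decide("marketing_intern", "sales_forecast")` returns `Decision.DENY`, while the same topic for `"sales_director"` returns `Decision.PERMIT`. An unknown role is denied by default.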

[Ross Young] So, when I start to think about different companies, one of the things I always like to look at is what are the metrics that I’m going to use from your tool each month to demonstrate the value of a tool. Can you talk about what are the things you like to measure with your tool and how that shows value to the CISO and their leadership team?

[Sounil Yu] Sure. So, one of the ways that we’re actually showing value is by measuring the extent of oversharing but what it also enables us to do is to also evaluate the extent of the undersharing as well. Which is actually quite fascinating because the reason why we can assess both angles is because we’re coming in and evaluating what is shared from the vantage point of a user that has really no permissions or just part of the everyone group as well as a user that has permissions to everything.

Again, the CEO, right? Now, what that does is it sets the lower and upper boundaries of what knowledge is inside your organization and to whom it’s shared. So, now imagine on the lower bound, I can say, “What is being shared that’s sensitive to the person that has no permissions?” That gives us a sense of what’s overshared. But on the other end, there are people who have a need to know for various topics, and what is actually being shared with them?

Do they not get what they should know? And that’s the upper bounds and that gives us an understanding of both the potential risks but also the potential possibilities what can be shared inside the organization itself.
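The lower-bound and upper-bound comparison described here amounts to two set differences per profile. The sketch below assumes the probe results and the need-to-know definitions are already available as plain sets of topic names. The function name `assess` and the sample data are hypothetical.

```python
def assess(shared: dict[str, set[str]],
           need_to_know: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """Compare what each profile actually received with what it needs.

    overshared  = received but not needed
    undershared = needed but not received
    """
    report = {}
    for profile in shared.keys() | need_to_know.keys():
        got = shared.get(profile, set())
        needs = need_to_know.get(profile, set())
        report[profile] = {
            "overshared": got - needs,
            "undershared": needs - got,
        }
    return report


# Hypothetical probe results for a no-permissions user and a finance analyst.
observed = {"everyone": {"holiday_schedule", "earnings"},
            "finance_analyst": {"holiday_schedule"}}
needed = {"everyone": {"holiday_schedule"},
          "finance_analyst": {"holiday_schedule", "earnings"}}
report = assess(observed, needed)
```

In this sample, the `everyone` profile is overshared `earnings`, while `finance_analyst` is undershared the same topic, showing the two sides of the problem in one report.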

[David Cross] Coming back to that, it made me think of another question in this area. Sometimes it’s just having the policies in place and decision-making, things like that, but it’s also reporting, right? We know we all live in a world of audits and compliance, right? And if we don’t really have good reporting, then how can you actually pass these audits? Any thoughts on how this LLM space, right, can deal with that?

[Sounil Yu] Sure. So, one of the things that we have learned through just the practice of security is that the way that you have assurance is by doing an end-to-end test. You do an end-to-end test and some of those, we call it a pentest, some people call it a red team, but at the end of the day, you’re performing a test to see what is actually happening with a system. And that’s essentially what we’re doing. We’re basically performing end-to-end tests to determine what is actually being shared or undershared with respect to these different topics and to whom. When it comes to audits or auditing, that’s essentially what they’re doing too.

They’re saying, “Hey, let me go in and actually try it and see what happens.” And we see our tool as being potentially one that auditors would use but we certainly would want to see practitioners using it as well to be proactive before the auditor comes. But we are talking to audit firms and working through some arrangements to see if we can equip them with these tools, which will ultimately provide that end-to-end view to see whether or not the oversharing is happening within someone’s organization.

[David Cross] Yeah, I can’t resist following up on one element of that. It sounds fantastic. I think so those of us that have our Dart and SoC [Phonetic 00:11:23] and things like that, we want to have all these synthetic tests, like are my alerts actually capturing these things? So, it kind of sounds like you’ve been thinking the same way with LLMs. It’s like yes, is the right person getting the right access or not, right? And having the same kind of philosophy.

[Sounil Yu] That’s right. And you know what? AI is a really big black box to a lot of folks, and because of that, we can really effectively evaluate the output whereas we can’t really effectively evaluate what happens in between the input and the output. It’s sort of like pollution. If you’re trying to regulate pollution, a lot of times we don’t really know how the pollution gets created. At the end of the day, we would just basically regulate the output and say, “Look, you need to stay within certain boundaries.” And so, that’s essentially how we’re looking at it today. These AI systems – big black box – at the end of the day, does it overshare or not? And if it does, we got a problem, let’s go fix it. If it doesn’t – hey, you’re good.

[Ross Young] I really like the concept that you mentioned before of the undersharing opportunity. Can you maybe just tell a simple success story of a customer or someone else who found that, hey, if we had just granted this access, it would have really helped the mission?

[Sounil Yu] Sure. So, one of the things that many organizations do, especially with Microsoft 365, well, a lot of people have that turned on and they’re using Copilot for M365 as well, many folks have turned on what’s called restricted SharePoint search. And what it does is essentially turns SharePoint from a default share and deny by exception to a default deny and share by exception. Essentially it turns SharePoint into unSharePoint. And when you deploy, when you turn it into unSharePoint, it kind of ruins the value of Copilot itself. And so, we’ve actually done that test and say what do you lose?

What value do you lose when you turn on restricted SharePoint search? And it’s pretty substantial. It kind of, again, you’re paying 30 bucks a head for this resource but if you turn on something like SharePoint restricted search, you end up with a dumb Copilot, a dumb LLM, and what’s the value in that? And so, the ability to measure the lower bound and the upper bound I mentioned earlier is actually pretty beneficial because the number one reason why we roll these out is because we’re trying to solve the undersharing problem. But the number two reason why we don’t roll them out is because of the oversharing problem. They’re two sides of the same coin.

[David Cross] So, Sounil, coming back to the element about the end-to-end perspective, so then it sounds like you can actually help companies by having the continuous pentesting itself to make sure. Like what’s being shared, does the person have the need to know or not? And not having to constantly do one-off kind of tests of the functionalities. Is that correct?

[Sounil Yu] That’s correct. So, what we’re doing is essentially providing that continuous assessment to see what’s overshared or undershared. But beyond that, actually, what we’re really trying to do is to define those discretionary need-to-know boundaries. That is actually the core of what we’re really doing – capturing those boundaries. Because those boundaries are not just for Copilot or your internal LLM.

They are also for Salesforce, they’re also for Slack, they’re also for Notion, they’re also for Atlassian, they’re also for all these different products that we’re using that have LLMs in there as well. Each of those need to follow the same rules of discretion, and what we’re looking at is how do we provide those rules in a consistent way across all these different applications that we have.

[Ross Young] And I love what you said there because I hate going to the managers to say, “Hey, do a quarterly review of all your employees and remove all the access.” It’s a lot of time from a lot of people, and if there’s a tool that does that, that’s a lot of time savings for managers across a large company.

[Sounil Yu] Yeah, and if you think about it, all our access is predicated on a starting point which is what is your need to know. We’re using that as a first-class citizen. We’re treating that with our first principles approach to say let’s define need to know first, and everything trickles down from there.

[Rich Stroffolino] All right, Sounil. What’s one thing we didn’t ask about that we need to know about Knostic?

[Sounil Yu] Well, I use the word need-to-know, and a lot of us see it as a restrictive thing, “Sorry, I can’t tell you because you don’t have a need to know.” But as I suggested earlier, need-to-know is also a permissive thing, and when we consider that sort of approach, it’s revolutionary. It really changes the nature of who we are in the security function because typically, we’re the department of no, N-O, but if we look at it as a permissive thing, we can become the department of K-N-O-W. How do I deliver content that is within your need-to-know boundary but doesn’t include things that are not in your need-to-know boundary? That is, I think, transformative within how we look at ourselves in security, but also how we can help enable the business and accelerate the business through tools like Copilot and LLMs inside the enterprise.

[Rich Stroffolino] Well, that’s just about it for this episode of Security You Should Know. To learn more, head on over to Knostic.ai. A big thank you to Ross Young and David Cross for helping us learn more about Knostic, and thank you to you, Sounil Yu from Knostic, for your time and being game to answer all of these questions. And thank you for listening to Security You Should Know.

[Voiceover] That wraps up another episode of Security You Should Know. If you like this program, please subscribe, tell your friends, and leave us a review. All companies showcased on this program are sponsors of CISO Series. If your company would like to be spotlighted and interviewed by our security leaders, go to our contact page on CISOseries.com or just email us at info@CISOseries.com. Thank you for listening to Security You Should Know, connecting security solutions with security leaders.

Rich Stroffolino
Rich Stroffolino is a podcaster, editor, and writer based out of Cleveland, Ohio. Since 2015, he's worked in technology news podcasting and media. He dreams of someday writing the oral history of Transmeta.