Great question? Maybe.
The answer? It obviously depends on how you view it. In a nutshell, CASB monitors overall cloud usage, while DLP is built to protect data in the cloud or wherever else it lives.
In fact, the question itself is a bit rhetorical. It is perhaps equivalent to asking: if you want to drink a great cup of coffee, would you go to Starbucks or Cheesecake Factory? Cheesecake Factory will give you sub-standard coffee with a great cheesecake. Starbucks will give you a great cup of coffee with a sub-standard (or no) cheesecake. It truly depends on what you really want and how you prioritize it.
Who might this article resonate with?
If you are an IT/security leader working at an organization that was either born in the cloud or has gone through the necessary digital revolution to live more or less in the cloud, you might find the contents of this article useful for shaping your cloud security strategy. If you are not an IT/security leader, I really appreciate your time, and I will try to keep my language simple with minimal use of excessively technical words.
So, what is this all about?
CASB stands for Cloud Access Security Broker. The term gained popularity between 2012 and 2018 with the rise in cloud adoption across all industry verticals. CASB was the technology used to detect and prevent users from signing up for and using cloud applications that were not authorized by the company's IT/security policies. As companies started easing restrictions on which cloud applications could be used, employees started using more and more applications that were not necessarily sanctioned by the organization's IT team. This problem was popularly referred to as Shadow IT. It called for a tool that monitors the use of unsanctioned applications by the organization's employees and notifies IT so that those applications can be decommissioned.
Why is this important? Security and cost. IT needs to know the security posture of every cloud application that the organization's employees are using for any business purposes from a confidentiality, integrity and availability standpoint. IT additionally needs to know every dollar that is spent on every SaaS license to ensure IT spend stays within the budget.
DLP stands for Data Loss Prevention. Data security challenges have existed for a while, but they keep taking on new forms as the market landscape shifts. In simple words,
This is exactly what DLP should be doing to protect an organization's data assets. Data loss prevention is quite different from data loss protection, yet the two terms are often used as though they mean the same thing, which they do not. We will discuss why they are different in a separate article. Data is indeed your most valuable commodity and the primary intellectual property of your organization. It may be your customers' sensitive data, which needs protection under global or local consumer privacy regulations like GDPR, CCPA, HIPAA, etc., or it may be your own organization's internal sensitive documents, like copyrighted material or intellectual property. Any such information landing in the hands of an untrusted user or an untrusted application could be detrimental to your organization.
Why is this important? Even if your organization sets tight security policies, the weakest security link is always the inadvertent end user. It is nearly impossible to prevent the end user from sending that email containing that sensitive data to that unintended external recipient. Or to prevent the end user from uploading customer data to a personal Google Drive; or sharing the Salesforce password with all users on a public Slack or Microsoft Teams channel; or taking the customer's credit card number and SSN over a phone call and recording them in the company's CRM; or making a SOX file accessible outside the organization on Box or Dropbox. That is why you need a tool that continuously monitors your end users' actions across all applications and devices to maintain continuous data security and privacy compliance.
Now let's get to the meat.
Is DLP a feature of CASB or is it a pillar by itself? This is a deep question, and to answer it well let us peel the onion layer by layer. Let me list some scenarios and categorize each one as a CASB or DLP use case based on the definitions above.
1. A user creates an account on Airtable using a personal email address and uploads company customer information: this may look like a DLP use case, but since Airtable was not used with a company account, it is actually a CASB use case.
2. A user emails a file containing HIPAA data from a company email address: this is a DLP use case.
3. A user takes the company codebase and uploads it to a personal GitHub account: this may again look like a DLP use case, since the company codebase is critical data, but it is in fact a CASB use case, because using a personal GitHub account for company work is not sanctioned usage.
4. A user checks AWS keys into the enterprise GitHub account: this is a DLP use case.
5. A user uploads company data to a personal Google Drive: this is again a CASB use case, although it may look like a DLP use case.
6. A user makes a SOX file in the enterprise Box account accessible outside the finance org: this is a DLP use case.
What is the underlying assumption in each of the above examples? CASB is all about preventing users from using unauthorized or unsanctioned applications. One primary use case is keeping sensitive company data from landing in unsanctioned applications, but it truly does not matter how sensitive the data is: the mere fact that the user is using an unsanctioned application for company work is itself a strong reason to shut down the usage of that application. A true DLP solution, by contrast, needs to first perform deep content inspection to fully understand what the data means from a sensitivity and criticality point of view, and then govern the data as per the company's information security or compliance policies. DLP needs to contextually understand every aspect of the data using vision, deep learning, and (really) every other buzzword you've come across in the ML/AI industry. In a nutshell,
CASB is all about answering the following question: does the data landing in an unsanctioned application used by the end user match any of the organization's or its partners' data?
DLP is all about answering the following question: is the data landing in or leaving via my enterprise-sanctioned application in line with my enterprise information security and compliance policies, given its sensitivity, criticality and access levels?
But does that mean CASB cannot have DLP as a feature, or that DLP cannot offer CASB capabilities as features? Of course not. Cheesecake Factory does sell very expensive and probably uninteresting coffee. And a few Starbucks locations do sell poor-tasting cheesecakes. And there are real reasons behind it. Listed below are three primary reasons that make it really hard for CASB vendors to sell great DLP and for DLP vendors to offer good CASB features.
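The six scenarios and the two questions above boil down to one decision rule, which can be sketched in a few lines of Python. This is only an illustration of the rule of thumb in this article, not how any real product works: the `SANCTIONED_APPS` list, the `Event` type and the `classify` function are all made-up names, and the rule ignores everything except whether the application and the account are sanctioned.

```python
from dataclasses import dataclass

# Hypothetical allow-list; in practice this comes from the company's IT policy.
SANCTIONED_APPS = {"email", "github", "box", "salesforce", "slack"}


@dataclass(frozen=True)
class Event:
    app: str                 # application the data is going to
    corporate_account: bool  # True if the user is on a company-managed account


def classify(event: Event) -> str:
    """Return "CASB" or "DLP" using the article's rule of thumb."""
    sanctioned = event.app.lower() in SANCTIONED_APPS and event.corporate_account
    if not sanctioned:
        # Unsanctioned app or personal account: the usage itself is the problem,
        # regardless of how sensitive the data is.
        return "CASB"
    # Sanctioned usage: the content and its exposure have to be inspected.
    return "DLP"


# The six scenarios listed above, in order.
scenarios = [
    Event("airtable", corporate_account=False),
    Event("email", corporate_account=True),
    Event("github", corporate_account=False),
    Event("github", corporate_account=True),
    Event("google drive", corporate_account=False),
    Event("box", corporate_account=True),
]
results = [classify(e) for e in scenarios]
print(results)
```

Notice that `classify` never looks at the data itself. That is the point: the CASB decision can be made from where the data is going, while everything that returns "DLP" still needs a content-level check afterwards.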
1. Market Dynamics: CASB use cases stem from IT and Information Security policies, like preventing the use of unauthorized or unsanctioned cloud applications until the necessary approval is provided. DLP use cases, on the other hand, stem from Compliance, Data Privacy, Legal and Information Security combined: given my company, its vertical, its customers and its sanctioned cloud applications, what are the data privacy regulations and other data security compliance requirements that I need to implement?
2. Coverage Requirements: CASB needs to cast a wide net and monitor all internet-bound traffic from the end user's device so that it can inform IT of the use of any non-sanctioned business applications. DLP, on the other hand, needs to stay laser focused on the most critical business applications that are sanctioned and heavily adopted organization wide. Additionally, DLP needs to solve the problem horizontally across all devices on which the sanctioned application is used.
3. Technology Gaps: CASB primarily needs to analyze the telemetry data captured from network and endpoint logs. And since CASB's goal is often to instantly block the user from using an unsanctioned application, it needs to perform quick and dirty regular expression checks on those logs. It is not technically feasible to deploy complex machine learning models when the traffic needs to be monitored in path. DLP, on the other hand, needs to perform automatic and thorough data classification of the company's data assets across all business applications. It needs to perform deep data inspection and possibly even look at historical and archived data going back to the start of the organization. And then it needs to continuously monitor for any user actions that might escalate access to the company's sensitive data beyond what the corporate information security and compliance policies allow.
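To give a feel for the "quick and dirty regular expression checks" on the CASB side, here is a minimal Python sketch that scans proxy-style log lines and counts visits to domains outside a sanctioned list. Everything specific in it is an assumption made for illustration: the log format, the `SANCTIONED_DOMAINS` set, the sample lines, and the naive two-label domain extraction, which would misfire on suffixes such as `co.uk`.

```python
import re
from collections import Counter

# Hypothetical sanctioned domains; a real deployment would load these from IT policy.
SANCTIONED_DOMAINS = {"salesforce.com", "box.com", "slack.com"}

# Assumed log format: "<timestamp> <user> <method> <url>"
LOG_LINE = re.compile(r"^(\S+)\s+(\S+)\s+(?:GET|POST|PUT)\s+https?://([^/\s:]+)")


def registered_domain(host: str) -> str:
    """Naively keep the last two labels of the host name."""
    return ".".join(host.lower().split(".")[-2:])


def unsanctioned_usage(lines):
    """Count requests per (user, domain) for domains not on the sanctioned list."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that do not fit the assumed format
        user, host = match.group(2), match.group(3)
        domain = registered_domain(host)
        if domain not in SANCTIONED_DOMAINS:
            hits[(user, domain)] += 1
    return hits


# Made-up sample lines in the assumed format.
sample_log = [
    "2021-03-01T09:00:00Z alice POST https://app.box.com/upload",
    "2021-03-01T09:01:10Z bob POST https://airtable.com/v0/base",
    "2021-03-01T09:02:30Z bob GET https://airtable.com/v0/base",
    "2021-03-01T09:03:00Z carol PUT https://drive.google.com/upload",
    "malformed line",
]
report = unsanctioned_usage(sample_log)
for (user, domain), count in sorted(report.items()):
    print(user, domain, count)
```

A check like this is cheap enough to run on every request, which is exactly why it fits in path. It also says nothing about what was uploaded, which is the part DLP has to answer.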
So if a CASB company comes and sells you DLP at a premium rate, do not forget to ask the following hard questions, step by step:
1. Does your DLP engine perform automatic data classification, or does it rely on my IT/security team to classify and tag the data manually or with some other tool?
2. If yes to the above, what are your false positive and true positive detection rates? Do you classify images, documents, and all other file formats, or just plain inline text?
3. If yes to the above, what level of contextual analysis does your engine perform? Does it look at the complete conversation or document and perform natural language processing to understand the semantic meaning, or does it use quick and dirty regular expression patterns for detection? When you look at an image, do you perform plain old OCR, or do you take multiple vision-based signals into account for more accurate classification?
4. If yes to the above, is your DLP engine able to tie into your CASB engine to perform instant blocking of actions or instant remediation of violations? What is the latency involved?
5. If yes to the above, is your DLP engine deployed inline with the CASB engine, or does it need a separate deployment?
6. If yes to the above, call it a non-technical salesperson's bluff. I have my explanation below. The same list of arguments can be reversed for anyone selling DLP and adding CASB functionality as a premium feature (although hardly anyone is doing that right now).
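Questions 2 and 3 are easier to appreciate with a toy example. The Python sketch below compares a regex-only credit card detector with the same regex followed by a Luhn checksum, the standard check-digit test for payment card numbers. To be clear, the checksum is not the natural language processing that question 3 asks about. It is a simple stand-in chosen for this illustration to show how even one extra signal cuts false positives. The sample messages are made up, and `4111 1111 1111 1111` is a widely used test card number.

```python
import re

# Pattern-only detector: 16 digits, optionally grouped in fours by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    total = 0
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def regex_only(text: str) -> list:
    """Flag everything that merely looks like a card number."""
    return CARD_PATTERN.findall(text)


def regex_plus_checksum(text: str) -> list:
    """Flag only candidates that also pass the Luhn check."""
    return [m for m in CARD_PATTERN.findall(text) if luhn_valid(m)]


# Made-up messages: one test card number and two look-alikes.
messages = [
    "Customer card: 4111 1111 1111 1111, expires next year",
    "Tracking number 1234567812345678 shipped today",
    "Order ref 1234-5678-9012-3456",
]
flagged_by_regex = sum(len(regex_only(m)) for m in messages)
flagged_with_checksum = sum(len(regex_plus_checksum(m)) for m in messages)
print(flagged_by_regex, flagged_with_checksum)
```

On these three messages the regex alone flags all three numbers, while the checksum keeps only the test card number. A vendor who can only show the first kind of detection is going to bury your security team in alerts.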
It is very clear that CASB and DLP are built to solve orthogonal (yet similar looking) challenges and need different deployment models, different coverage requirements and different technology stacks. To enable coverage across all web and internet-bound applications, CASB should ideally be deployed as an agent on the user's device or on top of the corporate VPN (as a SWG, a forward or reverse proxy, or as part of a SASE architecture). To ensure coverage across every device on which the enterprise-sanctioned application is used, DLP should ideally be deployed at the application API layer (as a plugin on top of the SaaS application). Because the ideal deployment of CASB is at the endpoint or network VPN layer, it is not technically feasible for CASB to run complex machine learning models while staying in the path of network-bound traffic without slowing down the user's device, adding mammoth latencies to business applications and killing end user productivity. DLP must perform deeper data classification using the most advanced natural language processing, vision and deep learning technologies for continuous monitoring and prevention of data misuse or abuse. DLP for cloud applications should literally be a 1-click deploy via the API layer, which additionally gives it the ability to quickly train on the organization's historical data without requiring months of configuration and tuning.
So do you somewhat agree that it is unfair to sell DLP with CASB or vice versa? Please leave your comments below. And based on your understanding, is there a simplified stack that you can use for your cloud security strategy? Would love to discuss with you!
If you are interested in a horizontal DLP solution covering all your major cloud applications with the features discussed above, check out what we are building here at Gamma AI.