Friday, January 17, 2025 — CSOH Meeting Recap

Friday, January 17, 2025 — Welcome and AI Presentation Discussion

Quick recap. The meeting involved discussions on the use of AI models, with a focus on the risks associated with them and the importance of protecting enterprise data privacy. The team also explored the potential risks and benefits of using private versus open-source language models for sensitive data, and discussed a technique for using large language models without exposing sensitive data. Lastly, the meeting touched on the importance of systems-level orchestration skills in the AI field and the need for learning from open-source AI communities.

2025-01AIVulnerabilitiesGuest SpeakerCommunity

Show 6 discussion topics

Welcome and AI Presentation Discussion

The meeting began with greetings and introductions, including a welcome to new members. Shawn expressed gratitude for the team's participation and mentioned the group's 2-year anniversary. A LinkedIn thread was shared for networking purposes. Mario then presented a topic related to AI, and the meeting was recorded for future reference. Katie, a new member from Oxford, introduced herself and shared her interest in cloud security. The conversation ended with Mario's presentation on AI.

Protecting Enterprise Data Privacy With LLMs

The meeting discusses the importance of protecting enterprise data privacy from threats posed by large language models (LLMs) like ChatGPT. Mario and Walid highlight that 21% of data shared with LLMs is private, with 27% of employee-shared data being confidential. They emphasize upcoming AI data privacy regulations across the US, mandating data preprocessing, removing personal identifiable information, and robust data classification before LLM interaction. Noncompliance can lead to significant fines up to 4% of global revenue or $20 million. The speakers stress the need to address data privacy as a pressing requirement given the increasing adoption of LLMs and the risks of exposing sensitive information inadvertently.

AI Model Security Risks Discussed

Mario discussed the top 10 security risks associated with AI models, with prompt injection being the most significant. He explained two types of prompt injection: direct, where users bypass security controls by manipulating data, and indirect, where users trick the AI model into revealing confidential information. Mario also highlighted the challenges of undoing these actions once they've been done, given the significant investment in creating the AI models. He further discussed the risks of data loss prevention, code injection, and the use of AI agents for software development. Lastly, he touched on the potential risks of multimodal AI, which includes voice and image inputs. The conversation ended with Mario preparing to demonstrate a website that can be used for indirect injection.

Private LLMs for Sensitive Data

The team discussed the potential risks and benefits of using private versus open-source language models (LLMs) for sensitive data. They agreed that owning a private LLM in an air-gapped environment is the most secure way to handle confidential information. However, they also noted that running a private LLM can be expensive and complex. The team also discussed the idea of fine-tuning a base model on a specific task to enhance its performance and security. They concluded that the best approach is to not send sensitive data to an LLM in the first place, but to obfuscate or mask the data before sending it to a third-party LLM. The team also touched on the topic of RBAC and RAG, but did not delve into detail.

Training AI for PII Masking

Walid discussed the process of training an AI model to identify and mask critical personal information (PII) for businesses. He explained that the model could be fine-tuned on specific PII types, such as names or addresses, and then deployed to mask data before it's sent to an LLM. Walid also demonstrated how to use the Ubi platform to upload a dataset, label it, and fine-tune a model. He emphasized the importance of achieving high accuracy in the model's performance and suggested ways to improve it. The conversation ended with Walid demonstrating the model in action, anonymizing data before sending it to an LLM.

Protecting Data With LLMs and AI

The meeting discussed a technique for using large language models (LLMs) without exposing sensitive data. Walid and Mario presented a multi-step process where an entity recognition model first identifies and masks sensitive information like names and account numbers in the input text. This masked text is then sent to an LLM which generates a response without ever seeing the actual sensitive data. Finally, the sensitive information is de-anonymized and inserted back into the LLM's response. This allows utilizing LLMs while protecting private data. The discussion also touched on topics like prompt engineering for controlled data access, joining open-source AI communities for learning, and the need for systems-level orchestration skills in the AI field.

↑ All meeting recaps

Friday, January 17, 2025 — Meeting Recap