White Paper | SafeIdea

I. Executive summary

Large language models have moved from novelty to daily practice tool. With that adoption comes a structural conflict: frontier AI requires data to deliver reasoning, and the duty of confidentiality requires that client data not be shared with third parties the attorney does not control. This conflict cannot be resolved by contract or by terms of service. It is architectural.

Two recent rulings make the architectural problem operational. In February 2026, the U.S. District Court for the Southern District of New York held in U.S. v. Heppner that AI-assisted documents processed through a consumer-cloud LLM are not protected by attorney-client privilege on the facts of that case. In January 2026, Judge Sidney H. Stein affirmed Magistrate Judge Ona T. Wang's order compelling OpenAI to produce 20 million ChatGPT conversation logs in the In re OpenAI consolidated copyright litigation.

These rulings may be revised, distinguished, or overturned. The structural problem they expose will not be. Cloud AI providers retain conversation data, are subject to compulsory legal process, and are increasingly the target of training-data extraction research. The technical question is no longer whether to use AI, but whether the attorney can demonstrate that confidential client information stayed under their control.

SafeIdea is compliance infrastructure for AI in legal practice: a local application that runs on the attorney's machine. Its patent-pending Masking Engine identifies confidential entities in documents and prompts and replaces them with stable placeholders before transmission to any cloud LLM. Real names are restored locally when the response returns. Original documents never leave the firm's control. Every matter can produce a signed, chained, tamper-evident Compliance Receipt, the documented "reasonable efforts" artifact under Rule 1.6 and Formal Opinion 512.

II. The AI transformation in legal practice

Artificial intelligence has moved from experimental curiosity to daily practice tool. Attorneys use AI systems for document review, contract analysis, legal research, correspondence drafting, and strategic planning. The efficiency gains are substantial: tasks requiring hours can be completed in minutes.

This transformation brings corresponding risks. Most AI tools operate as cloud services: content leaves the attorney's computer, travels to remote servers operated by the AI provider, and is processed in an environment the attorney does not control. AI systems may learn from inputs, store representations in model weights, and potentially reproduce content in ways that static databases cannot.

The central question is not whether attorneys will use AI; that ship has sailed. The question is how attorneys can use AI while satisfying their professional obligations to protect client information.

III. Legal and ethical framework

3.1 Model Rule 1.6: confidentiality of information

Model Rule 1.6(a) provides that a lawyer shall not reveal information relating to the representation of a client unless the client gives informed consent. Rule 1.6(c) requires lawyers to make reasonable efforts to prevent the inadvertent or unauthorized disclosure of, or unauthorized access to, information relating to the representation.

The Comments to Rule 1.6 acknowledge that what constitutes "reasonable efforts" depends on the circumstances. Comment [18] specifically addresses electronic communications, noting that lawyers must take reasonable precautions to prevent information from coming into the hands of unintended recipients.

3.2 Model Rule 1.1: competence

Model Rule 1.1 requires lawyers to provide competent representation. Comment [8] explicitly addresses technology: lawyers should keep abreast of changes in the law and its practice, including the benefits and risks associated with relevant technology. For AI tools, this means attorneys must understand not just how to use these systems, but how they process and potentially retain information.

3.3 ABA Formal Opinion 512

On July 29, 2024, the ABA released Formal Opinion 512 addressing generative AI in legal practice. The opinion emphasizes that attorneys must understand whether AI systems are "self-learning" and mandates informed consent before using client data in AI tools. Critically, the opinion states that boilerplate consent in engagement letters is insufficient; specific, informed consent is required.

3.4 U.S. v. Heppner (S.D.N.Y., February 2026)

In U.S. v. Heppner, Judge Jed Rakoff held that the defendant's exchanges with the Claude AI platform were not protected by attorney-client privilege or the work-product doctrine. The holding is fact-bound and rests on three findings: the conversations were not the product of communication between client and attorney; they were not undertaken for the purpose of obtaining legal advice; and the platform's terms permitted disclosure to the provider and to regulators, eliminating any reasonable expectation of confidentiality.

The court did not hold that AI-assisted documents through any third-party cloud are unprotected; commentary observes that the analysis might differ if counsel had directed the use or if the AI platform's terms preserved confidentiality and prohibited training on user inputs. The opinion's load-bearing signal for legal practice is structural: where the AI service has the right to view, retain, or train on user content, the user's reasonable expectation of confidentiality is at minimum contested.

3.5 In re OpenAI and the Wang preservation/production orders

In the consolidated copyright multidistrict litigation captioned In re OpenAI, Magistrate Judge Ona T. Wang issued preservation and production orders compelling OpenAI to retain ChatGPT output logs and produce a 20-million-log anonymized sample to plaintiffs. On January 5, 2026, District Judge Sidney H. Stein affirmed the production order.

The order's direct application is to the copyright dispute and to OpenAI's preservation duty as a defendant in it. The broader signal for legal practice is the structural one: AI providers can be compelled to produce user logs by third-party legal process in litigation the firm has nothing to do with. The logs sit at the provider. The attorney does not control whether they are produced.

IV. The threat model

This section presents documented security incidents and peer-reviewed research demonstrating specific confidentiality risks associated with cloud AI services.

4.1 Training data extraction

The risk that confidential information submitted to AI models could be extracted by adversaries is not speculative. Peer-reviewed research demonstrates that training data extraction attacks succeed against production systems.

In research published at ICLR 2025, Nasr et al. demonstrated that alignment, the safety training designed to make models refuse harmful requests, provides an "illusion of privacy" but does not eliminate memorization. Using a "divergence attack," researchers extracted training data from ChatGPT at a rate 150× higher than standard prompting. Over 5% of output under attack conditions consisted of verbatim copies from training data, including real personally identifiable information.

Research published in January 2026 by Stanford and Yale researchers extended these findings. Claude 3.7 Sonnet reproduced 95.8% of Harry Potter and the Sorcerer's Stone when prompted with jailbreak techniques. Gemini 2.5 Pro achieved 76.8% recall without jailbreaks. A control book published after all models' training cutoffs returned 0% recall, confirming actual memorization rather than hallucination.

Implication. If production models can reproduce near-complete copyrighted books, they can reproduce any sufficiently distinctive content memorized during training. Content submitted to AI systems that becomes part of training data could theoretically be extracted by adversarial users.

4.2 Policy reversals

Until August 2025, Anthropic's Claude was marketed as the privacy-first alternative: user data was not used for model training and was generally deleted within 30 days. On August 28, 2025, Anthropic announced that user conversations would be used for training unless users opt out, with data retention extended to 5 years for users who do not opt out, an increase of roughly 6,000%. The toggle permitting training was preset to "On" and shown alongside a prominent "Accept" button.
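The retention figure above can be checked with a few lines of arithmetic. The sketch below assumes a 365-day year and ignores leap days; the function name is illustrative only.

```python
def retention_increase_pct(old_days: float, new_days: float) -> float:
    """Percentage increase from the old retention period to the new one."""
    return (new_days - old_days) / old_days * 100


old_days = 30          # prior retention: roughly 30 days
new_days = 5 * 365     # new retention: 5 years, assuming 365-day years

print(round(new_days / old_days, 1))                       # 60.8 (times longer)
print(round(retention_increase_pct(old_days, new_days)))   # 5983 (percent)
```

The exact result is about 5,983%, which rounds to the 6,000% quoted; equivalently, retention became roughly 61 times longer.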

Implication. Attorneys using consumer AI products have no contractual rights to prevent policy changes. Providers can unilaterally extend retention periods, enable training on previously-protected data, or modify access controls.

4.3 Active exploitation

On January 8, 2026, security researchers disclosed ZombieAgent, a zero-click prompt injection attack targeting ChatGPT's connected services. Attackers embed hidden instructions in emails (white text on white background). When users ask ChatGPT to summarize their inbox, the AI reads and executes the hidden instructions, exfiltrating data server-side, invisible to the user and to enterprise security tools. OpenAI patched the specific vulnerability but noted that prompt injection "is unlikely to ever be fully 'solved.'"

In January 2026, OX Security discovered two Chrome extensions, with 900,000 combined users, exfiltrating complete AI conversation data every 30 minutes while requesting only "anonymous analytics" permissions. One extension had achieved "Featured" badge status in the Chrome Web Store before detection.

Implication. Data can be exfiltrated through vectors entirely outside the AI provider's control. Endpoint security and content-isolation are necessary; provider-side protections are not sufficient.

4.4 Structural access

Even when providers offer "incognito" or "temporary chat" modes that exclude conversations from training, authorized personnel retain access. Trust & Safety teams review conversations for policy enforcement. Engineers access data for debugging. Legal teams respond to subpoenas. OpenAI retains "temporary" chats for 30 days for abuse monitoring. The In re OpenAI preservation order requires OpenAI to retain consumer ChatGPT conversations, overriding stated retention policies.

Implication. "Incognito" addresses training exclusion but not employee access, legal process, or breach exposure.

V. Why policy-based protections are insufficient

The threat model in Section IV reveals that policy-based protections (incognito modes, opt-out settings, privacy policies, no-training contractual clauses) address only a subset of confidentiality risks.

Threat vector	Policy protection available?
Training data extraction	Partial (incognito mode)
Policy reversals	No
Prompt-injection attacks	No
Malicious browser extensions	No
Employee access	No
Legal process (subpoenas, preservation orders)	No
Provider breach	No

The fundamental issue is architectural. When content leaves the attorney's computer and resides on the provider's servers, protection depends entirely on the provider's policies, security practices, and legal resistance, none of which the attorney controls or can verify.

This creates an irreducible trust problem. Using a cloud AI service requires trusting engineering staff with system-level access, Trust & Safety reviewers, DevOps personnel, third-party contractors, the effectiveness of the provider's security team, and the provider's legal team's resistance to legal process. The total number of personnel with access is not disclosed by any major provider.

Heppner and Wang are the operational expression of this structural problem. Heppner removed privilege protection from a defendant's own AI conversations on the facts before the court. Wang demonstrated that AI providers can be compelled to produce user logs by third-party legal process. Both rulings could be revised on appeal. The structural vulnerability they expose is independent of any specific ruling.

VI. Technical safeguards: the SafeIdea architecture

The alternative to policy-based protection is architectural: ensure that confidential content never reaches environments the attorney does not control.

6.1 Local processing, defined

A local application is software that runs on the attorney's own computer (a native desktop application like Microsoft Word or Adobe Acrobat) rather than in a web browser connected to remote servers. Content processed by a local application remains on the attorney's machine unless explicitly transmitted elsewhere.

SafeIdea runs locally on macOS and Windows. The Local Models that drive entity detection run on the attorney's workstation via Ollama; no internet round-trip is required to identify privileged entities in a document. The cloud is reached only when the attorney has approved the masked content for transmission to a chosen AI provider.

6.2 The Masking Engine

SafeIdea's patent-pending Masking Engine intercepts content before transmission to cloud AI services and replaces privileged entities with stable masked placeholders. The cloud AI processes only the masked content. Responses are re-mapped locally to restore original identifiers before presentation to the attorney.

The mask format follows a single template, used everywhere: [entitytype_NN]. Each mask uses single square brackets, a lowercase entity type, an underscore separator, and an integer counter that is stable within a matter. Examples: [person_1], [org_42], [email_7], [case_number_15].
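The mask-and-restore round trip can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not SafeIdea's implementation: the class and method names are hypothetical, the entity list is supplied by hand rather than detected by a local model, and counters are assumed to be kept per entity type (the paper does not say whether the counter is per type or shared).

```python
import re
from dataclasses import dataclass, field

# Matches the mask template, e.g. [person_1] or [case_number_15].
MASK_PATTERN = re.compile(r"\[[a-z]+(?:_[a-z]+)*_\d+\]")


@dataclass
class MatterMasker:
    """Hypothetical per-matter masker; names are illustrative only."""
    forward: dict = field(default_factory=dict)    # (type, real text) -> mask
    reverse: dict = field(default_factory=dict)    # mask -> real text
    counters: dict = field(default_factory=dict)   # assumed: one counter per type

    def mask_for(self, entity_type: str, text: str) -> str:
        """Return the stable mask for an entity, creating it on first use."""
        key = (entity_type, text)
        if key not in self.forward:
            self.counters[entity_type] = self.counters.get(entity_type, 0) + 1
            mask = f"[{entity_type}_{self.counters[entity_type]}]"
            self.forward[key] = mask
            self.reverse[mask] = text
        return self.forward[key]

    def mask(self, content: str, entities: list) -> str:
        """Replace every listed (type, text) entity in a single pass."""
        by_text = {text: etype for etype, text in entities}
        if not by_text:
            return content
        # Longest alternatives first so "Acme Corp" is preferred over "Acme".
        alternation = "|".join(
            re.escape(t) for t in sorted(by_text, key=len, reverse=True)
        )
        return re.sub(
            alternation,
            lambda m: self.mask_for(by_text[m.group(0)], m.group(0)),
            content,
        )

    def unmask(self, content: str) -> str:
        """Restore real identifiers locally; unknown masks are left untouched."""
        return MASK_PATTERN.sub(
            lambda m: self.reverse.get(m.group(0), m.group(0)), content
        )


masker = MatterMasker()
entities = [("person", "Jane Roe"), ("org", "Acme Corp")]  # made-up names
masked = masker.mask("Jane Roe sued Acme Corp. Jane Roe seeks damages.", entities)
print(masked)   # [person_1] sued [org_1]. [person_1] seeks damages.

reply = "Advise [person_1] that [org_1] may settle."  # stand-in for a cloud response
print(masker.unmask(reply))   # Advise Jane Roe that Acme Corp may settle.
```

Only the masked string would be transmitted in this sketch; the forward and reverse maps stay in local memory, which is what lets the response be restored without the provider ever seeing the real names.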

This architectural approach addresses the threat model directly:

Threat vector	Local masking protection
Training data extraction	Masked content contains no real identifiers to extract.
Policy reversals	Retained data contains no identifying information.
Prompt-injection attacks	Exfiltrated data lacks identifying context.
Malicious browser extensions	Same: no identifiers to capture.
Employee access	Personnel see only masked content.
Legal process	Subpoenaed data contains no client identifiers.
Provider breach	Breached data is non-identifying.

6.3 The three-scope Masking Dictionary

The Masking Engine writes to a Masking Dictionary that ships at three scopes, all live at launch:

Matter-Level Masking Dictionary. Per-matter store of entities (people, organizations, addresses, case numbers) that SafeIdea masks consistently within a single matter. Created when a matter is opened. Lives on the attorney's machine. The default scope.
Local Cross-Matter Masking Dictionary. Per-attorney store of entities the attorney has promoted from a Matter-Level Masking Dictionary. This is the furthest scope to which an attorney can promote an entry. Recurring entities mask consistently across all of the attorney's matters.
Firm Masking Dictionary. Firm-scoped store of canonical firm-wide entities (firm partners, frequent opposing counsel, courts, standard vendors). Built by the Firm Administrator using the SafeIdea Indexer in Dictionary-only mode against the firm's seed sources, and distributed to firm attorneys through the firm's normal file-sharing channels.

Promotion is a deliberate action in the UI, never automatic. Local Cross-Matter entries do not propagate to the Firm Masking Dictionary; the boundary is by design. The Firm Administrator builds the firm's institutional masking layer on their own cadence using the SafeIdea Indexer; SafeIdea does not include its own sync service.
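The scope boundaries described above can be modeled roughly as follows. This is a hypothetical sketch: the class, its field names, and the lookup precedence (matter first, then cross-matter, then firm) are assumptions made for illustration, and how mask counters are reconciled across scopes is not specified in this paper and is left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MaskingDictionaries:
    """Hypothetical model of the three scopes; names are illustrative only."""
    matter: dict = field(default_factory=dict)        # entity -> mask, per matter
    cross_matter: dict = field(default_factory=dict)  # entity -> mask, per attorney
    firm: dict = field(default_factory=dict)          # copy built by the Firm Administrator

    def lookup(self, entity: str) -> Optional[str]:
        # Assumed precedence: most specific scope wins.
        for scope in (self.matter, self.cross_matter, self.firm):
            if entity in scope:
                return scope[entity]
        return None

    def promote(self, entity: str) -> None:
        """Explicit, attorney-initiated promotion: matter -> cross-matter only."""
        if entity not in self.matter:
            raise KeyError(f"{entity!r} is not in the matter-level dictionary")
        self.cross_matter[entity] = self.matter[entity]
        # Deliberately no code path writes to self.firm: attorney entries
        # never propagate to the firm scope.


dicts = MaskingDictionaries(firm={"Example District Court": "[org_900]"})
dicts.matter["Jane Roe"] = "[person_1]"   # made-up entity
dicts.promote("Jane Roe")                 # happens only when called explicitly

print(dicts.lookup("Jane Roe"))                # [person_1]
print(dicts.lookup("Example District Court"))  # [org_900]
print("Jane Roe" in dicts.firm)                # False
```

Keeping promotion as the only cross-scope write, and giving it no route into the firm dictionary, mirrors the one-way boundary the section describes.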

6.4 Compliance Receipts

SafeIdea generates Compliance Receipts on demand for any matter. A Compliance Receipt is a signed, cryptographically chained, tamper-evident PDF that documents:

Principal: the authenticated identity that used SafeIdea on the matter.
Timestamps: issuance and session window.
Dictionary reference: which Firm Masking Dictionary version and Matter-Level Masking Dictionary were in force.
Actions: counts of masking actions (masks, unmasks, promotes).
Model invoked: which AI model the attorney sent masked content to.
Signature and chain: cryptographic signature plus chain to the Audited Session Records the Receipt summarizes.

The Receipt never captures the underlying confidential content. It is the attorney-owned artifact that the attorney produces on demand for a managing partner, ethics counsel, a malpractice carrier, a regulator, a client, or a court. It is what makes "reasonable efforts" tangible rather than rhetorical.
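To make "chained" and "tamper-evident" concrete, the sketch below builds a small hash chain of session records and signs a summary of it. It is an illustration under assumptions, not SafeIdea's receipt format: the field names are hypothetical, the records hold only action counts, and HMAC-SHA256 from the Python standard library stands in for whatever signature scheme the product actually uses (a production receipt would more plausibly use an asymmetric signature).

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of a record body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def append_record(chain: list, action: str, count: int) -> None:
    """Append a record that commits to the hash of the previous record."""
    prev = chain[-1]["hash"] if chain else GENESIS
    body = {"action": action, "count": count, "prev": prev}
    chain.append({**body, "hash": record_hash(body)})


def verify_chain(chain: list) -> bool:
    """Recompute every hash and link; any edited record breaks the chain."""
    prev = GENESIS
    for rec in chain:
        body = {k: v for k, v in rec.items() if k != "hash"}
        if rec["prev"] != prev or record_hash(body) != rec["hash"]:
            return False
        prev = rec["hash"]
    return True


def sign_receipt(chain: list, key: bytes) -> dict:
    """Summarize the chain (counts only, no content) and sign the summary."""
    head = chain[-1]["hash"] if chain else GENESIS
    summary = {"records": len(chain), "chain_head": head}
    payload = json.dumps(summary, sort_keys=True).encode()
    signature = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return {**summary, "signature": signature}


def verify_receipt(receipt: dict, key: bytes) -> bool:
    summary = {k: v for k, v in receipt.items() if k != "signature"}
    payload = json.dumps(summary, sort_keys=True).encode()
    expected = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, receipt["signature"])


chain = []
append_record(chain, "mask", 12)
append_record(chain, "unmask", 12)
receipt = sign_receipt(chain, key=b"demo-key-not-for-production")

print(verify_chain(chain))                                       # True
print(verify_receipt(receipt, b"demo-key-not-for-production"))   # True

chain[0]["count"] = 3        # simulate after-the-fact tampering
print(verify_chain(chain))   # False
```

Because each record commits to the hash of its predecessor, altering any earlier record invalidates every later link, and the signed chain head ties the receipt to one specific history.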

6.5 The privilege question, narrowly

Courts have consistently declined to find privilege waiver when attorneys use third-party technology services such as cloud storage, e-discovery platforms, and document management systems. AI may present a structurally different risk: the January 2026 book-extraction research demonstrates that content submitted for AI processing can become permanently encoded in model weights in ways traditional storage cannot.

Even if courts ultimately decline to find that AI use waives privilege, attorneys face substantial uncertainty. ABA Formal Opinion 512 explicitly requires attorneys to understand whether AI systems are "self-learning" and mandates informed consent before using client data. Local masking provides a technical mechanism to comply with these requirements regardless of how courts eventually resolve the privilege question. When client-identifying information never reaches the AI provider, the question of whether AI processing could waive privilege becomes moot.

VII. Implementation framework

Not all AI uses require the same safeguards. The risk profile depends on whether client-identifying information is in scope.

Use case	Recommended approach
General legal research (no client facts)	Cloud AI acceptable.
Routine correspondence referencing client names	Matter-Level masking via SafeIdea.
Contract analysis with identifying details	Matter-Level + Local Cross-Matter masking; review and approve before transmission.
M&A strategy, litigation planning, privilege-sensitive matters	Maximum masking; consider running the AI step against a local model entirely.
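A firm could encode the table above as a simple pre-transmission check. The sketch below is hypothetical: the enum, the scope labels, and the function name are invented for illustration; "maximum masking" is assumed to mean all three dictionary scopes; and the review requirement on the most sensitive tier is an assumption that it should be at least as strict as the tier below it.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    NO_CLIENT_FACTS = 1        # general legal research
    CLIENT_NAMES = 2           # routine correspondence
    IDENTIFYING_DETAILS = 3    # contract analysis
    PRIVILEGE_SENSITIVE = 4    # M&A strategy, litigation planning


@dataclass(frozen=True)
class Safeguards:
    required_scopes: frozenset   # masking scopes that must have been applied
    review_required: bool        # attorney must review and approve before sending
    prefer_local_model: bool     # advisory: consider a local model instead


POLICY = {
    Sensitivity.NO_CLIENT_FACTS: Safeguards(frozenset(), False, False),
    Sensitivity.CLIENT_NAMES: Safeguards(frozenset({"matter"}), False, False),
    Sensitivity.IDENTIFYING_DETAILS: Safeguards(
        frozenset({"matter", "cross_matter"}), True, False
    ),
    # Assumption: "maximum masking" means every scope, with review required.
    Sensitivity.PRIVILEGE_SENSITIVE: Safeguards(
        frozenset({"matter", "cross_matter", "firm"}), True, True
    ),
}


def cloud_send_allowed(level: Sensitivity, applied_scopes: set, reviewed: bool) -> bool:
    """True only if every required scope was applied and review happened if needed."""
    rule = POLICY[level]
    if not rule.required_scopes <= applied_scopes:
        return False
    if rule.review_required and not reviewed:
        return False
    return True


print(cloud_send_allowed(Sensitivity.NO_CLIENT_FACTS, set(), False))       # True
print(cloud_send_allowed(Sensitivity.CLIENT_NAMES, set(), False))          # False
print(cloud_send_allowed(
    Sensitivity.IDENTIFYING_DETAILS, {"matter", "cross_matter"}, True))    # True
```

Expressing the tiers as data rather than prose makes the firm's internal guidelines checkable and easy to revise as the recommended approach changes.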

7.1 Recommendations for managing partners and ethics counsel

For matters involving identifiable client information, implement local masking before cloud AI processing.
Do not rely solely on incognito mode for privileged communications or sensitive strategy.
Develop internal guidelines distinguishing appropriate use cases by sensitivity level.
Ensure engagement letters address AI use and obtain specific, informed consent per ABA Formal Opinion 512.
Monitor AI provider policy changes; the August 2025 reversal shows that protections can evaporate.
Produce a Compliance Receipt for every matter that touches client-identifying information. Treat the Receipt as the firm's "reasonable efforts" artifact under Rule 1.6.

VIII. Conclusion

Policy-based protections address only a subset of confidentiality risks. Training data extraction has been demonstrated on production systems. Policy changes have eliminated protections users relied on. Active exploitation occurs through vectors providers cannot control. Heppner and Wang are recent evidence of the structural problem; the structural problem is independent of any specific ruling.

For attorneys handling sensitive matters, the question is not whether to implement technical safeguards, but whether they can professionally justify not doing so.

Local-first masking, processing content on the attorney's own computer before any transmission to cloud services, provides a technical mechanism to satisfy confidentiality obligations regardless of AI provider policies, practices, or vulnerabilities. When confidential identifiers never leave the firm's control, the structural risks identified in this white paper become irrelevant to client confidentiality. The Compliance Receipt makes that "reasonable efforts" case tangible for the firm, the regulator, the carrier, and the court.

SafeIdea incorporates patent-pending masking technology that implements these capabilities, enabling attorneys to use AI effectively while maintaining the confidentiality protections their clients expect and their professional obligations require.

IX. Citations and references

Peer-reviewed research

Nasr, M., Carlini, N., et al. (2025). Scalable Extraction of Training Data from (Production) Language Models. ICLR 2025.
Ahmed, A., Cooper, A. F., Koyejo, S., & Liang, P. (2026). Extracting books from production language models. arXiv:2601.02671.

Provider policy changes

Anthropic. (2025, August 28). Updates to Consumer Terms and Privacy Policy.
Coldewey, D. (2025, August 28). Anthropic users face a new choice, opt out or share your chats for AI training. TechCrunch.

Security incidents

Radware. (2026, January 8). ZombieAgent: A Newly Discovered Zero-Click AI Agent Vulnerability.
OX Security. (2026, January). Malicious Chrome extensions exfiltrate AI conversations.

Case law and regulatory

United States v. Heppner, S.D.N.Y., Judge Jed Rakoff (February 10, 2026).
In re OpenAI, S.D.N.Y., Magistrate Judge Ona T. Wang (preservation order May 13, 2025; production order November 7, 2025; affirmed by District Judge Sidney H. Stein, January 5, 2026).
ABA Formal Opinion 512 (July 29, 2024). Generative Artificial Intelligence Tools.
ABA Model Rules of Professional Conduct, Rule 1.6 (Confidentiality of Information) and Rule 1.1 (Competence).
