Jan 29, 2026
The practical application of Section 17(2)(b) of the DPDPA is defined by a strict boundary between utility and individual impact. While the Act grants fiduciaries the flexibility to process data for research, archiving, and statistical analysis without the standard consent hurdles, this privilege is entirely contingent on a non-decisional framework. This creates a technical challenge: fiduciaries must prove that insights derived from research never loop back to influence a specific data principal’s status, such as their insurance premiums or creditworthiness. Consequently, compliance is shifting from static legal policies toward Automated Governance and Policy-as-Code, ensuring that the procedural mandates of Rule 14 are embedded directly into the data architecture itself.
This requirement is reinforced by Section 8(4), which mandates that fiduciaries implement appropriate technical and organisational measures to ensure effective compliance. In high-volume environments, this essentially necessitates a Privacy by Design approach. These frameworks translate complex regulatory text like the procedural mandates of Rule 14 of the DPDP Rules, 2025, directly into machine-readable logic. By baking rights-handling (such as identity verification via identifiers like Enrollment IDs or Customer IDs) into the data architecture itself, fiduciaries can maintain the integrity of research exemptions while ensuring that the Right to Erasure or Right to Access remains actionable without manual intervention.
Understanding Section 17(2)(b)
Section 17 of the DPDPA provides specific exemptions from certain provisions of the Act, such as notice and consent requirements. Section 17(2)(b) specifically addresses the processing of personal data necessary for research, archiving, or statistical purposes. This exemption is fundamental to the continued viability of longitudinal health studies, historical record-keeping, and socio-economic analysis.
The legal validity of the Section 17(2)(b) exemption is bifurcated into two mandatory conditions. First, personal data must not be used to make any decision specific to a data principal. This requirement is the primary safeguard preventing the repurposing of research data for individualised actions such as credit profiling or medical insurance adjustments. Second, the processing must be carried out in accordance with the standards specified in the Second Schedule of the DPDP Rules.
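As a conceptual illustration, the exemption can be modelled as a simple gate in code, where both conditions must hold before research processing proceeds; the field and function names below are assumptions, not terms drawn from the Act.

```python
# Illustrative only: models the two cumulative conditions of Section 17(2)(b).
# Field and function names here are assumptions, not terms drawn from the Act.
from dataclasses import dataclass

@dataclass
class ProcessingRequest:
    purpose: str                      # e.g. "research", "archiving", "statistics"
    decisional: bool                  # True if the output affects a specific data principal
    second_schedule_compliant: bool   # True if Second Schedule standards are met

def qualifies_for_exemption(req: ProcessingRequest) -> bool:
    """Both conditions must hold: non-decisional use AND Second Schedule compliance."""
    exempt_purposes = {"research", "archiving", "statistics"}
    return (
        req.purpose in exempt_purposes
        and not req.decisional
        and req.second_schedule_compliant
    )
```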
Rule 14 and the Right of Data Principals
While Section 17(2)(b) enables aggregate data use, Rule 14 empowers individuals to retain control over their digital footprint. Rule 14 provides the procedural mechanics for exercising the rights granted under Chapter III of the DPDPA, including the right to access information, the right to correction, the right to erasure, and the right of grievance redressal.
A core requirement of Rule 14(1) is transparency; data fiduciaries must prominently publish on their website or app the specific means by which a principal may make a rights request. This includes specifying the identifiers required to verify the principal’s identity under the terms of service.
To fulfill rights requests efficiently, especially in high-frequency data environments, fiduciaries must automate the identification process. Rule 14(5) provides a broad definition of identifier, which systems must be configured to recognise.
| Identifier Type | Technical Representation and Verification |
| --- | --- |
| Account Metadata | Username, email address, or mobile number registered with the fiduciary. |
| Customer Identification | Unique customer IDs, account numbers, or reference numbers. |
| Regulatory Tokens | Enrollment IDs or license numbers issued by the state or the fiduciary. |
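As a rough illustration of how such identifiers might be resolved in practice, the sketch below assumes a simple in-memory index and hypothetical field names; neither is prescribed by Rule 14(5).

```python
# Hypothetical sketch: resolving a Rule 14 rights request to a data principal
# using the identifier classes above. Store layout and field names are assumed.
from typing import Optional

# Toy identifier index; in practice this would be a verified lookup service.
PRINCIPAL_INDEX = {
    ("email", "asha@example.com"): "principal-0041",
    ("customer_id", "CUST-99812"): "principal-0041",
    ("enrollment_id", "ENR-2024-118"): "principal-0073",
}

def resolve_principal(identifier_type: str, value: str) -> Optional[str]:
    """Return the internal principal record ID if the identifier is recognised."""
    return PRINCIPAL_INDEX.get((identifier_type, value))

def open_rights_request(identifier_type: str, value: str, right: str) -> dict:
    """Accept a rights request only when the supplied identifier can be verified."""
    principal_id = resolve_principal(identifier_type, value)
    if principal_id is None:
        # Unverifiable identifier: the request cannot proceed without further proof.
        return {"status": "identity_not_verified", "right": right}
    return {"status": "accepted", "principal": principal_id, "right": right}
```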
Automated Controls and Policy-as-Code
The manual management of these legal nuances is impossible at scale, especially for platforms handling millions of data points. This has led to the emergence of Policy-as-Code (PaC) as a foundational methodology for embedding compliance directly into software systems. PaC involves translating written regulatory documents such as the Second Schedule and Rule 14 requirements into machine-readable instructions.
Policy-as-Code Implementation
A high-performance compliance framework for managing Section 17(2)(b) and Rule 14 typically consists of three integrated layers.
The Policy Definition: This utilises domain-specific languages (DSLs), such as Open Policy Agent’s Rego or Python-based policy engines, to represent regulatory requirements. For instance, a policy can be written to allow access to a dataset only if the purpose tag is set to research and the decisional flag is false (a minimal sketch follows this list).
The Enforcement: This acts as a runtime interceptor, evaluating data access requests against the defined policies. By decoupling the policy intention (the what) from the enforcement mechanism (the how), organisations can maintain consistent governance across multi-cloud environments.
The Attestation and Audit: To meet the accountability requirements of Rule 14(3) and the 72-hour breach reporting window in Rule 7, every enforcement decision must be logged. These logs provide the verifiable audit trail required by the Data Protection Board (DPB).
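To make these layers concrete, the following is a minimal Python sketch; the text names Rego as one option, but a Python-based engine is shown here for brevity, and the tag names ("purpose", "decisional") and log format are illustrative assumptions.

```python
# Illustrative policy-as-code sketch covering all three layers. Tag names
# ("purpose", "decisional") and the log format are assumptions for demonstration.
import json
from datetime import datetime, timezone

# Layer 1 - policy definition: regulatory intent expressed as data.
POLICY = {
    "allowed_purposes": {"research", "archiving", "statistics"},
    "require_non_decisional": True,
}

# Layer 2 - enforcement: a runtime interceptor for dataset access requests.
def evaluate(request: dict) -> bool:
    purpose_ok = request.get("purpose") in POLICY["allowed_purposes"]
    non_decisional = not request.get("decisional", True)
    return purpose_ok and (non_decisional or not POLICY["require_non_decisional"])

# Layer 3 - attestation and audit: every decision is logged for later review.
def enforce(request: dict, audit_log: list) -> bool:
    allowed = evaluate(request)
    audit_log.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "request": request,
        "decision": "allow" if allowed else "deny",
    })
    return allowed

if __name__ == "__main__":
    log: list = []
    ok = enforce({"dataset": "claims_2024", "purpose": "research", "decisional": False}, log)
    print(ok)
    print(json.dumps(log, indent=2))
```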
Organisations adopting this approach reported identifying 35-55% more relevant requirements through structured cataloguing than through static compliance solutions. This indicates that automation improves not only the speed but also the depth of compliance.
Automated Classification and the Search for Personally Identifiable Information
Safe configuration of research exemptions requires precise identification of personally identifiable information (PII) within vast datasets. In an era of Big Data and AI, manual tagging is no longer feasible. Fiduciaries are increasingly deploying automated classification engines that use natural language processing (NLP) to flag sensitive documents before they enter a research archive or an AI training pipeline.
NLP-Driven Segment and Sentence Classification
Research has shown that users and regulators often find privacy policies and data structures difficult to comprehend. Automated classification addresses this by:
Segment Classification: Assigning category labels to paragraphs or data segments to provide a high-level overview of the sensitivity level.
Sentence Classification: Using advanced models like BERT and XLNet to extract critical, actionable information from unstructured data files.
Active Data Tagging: Propagating confidentiality and integrity ratings down to the column level through real-time lineage.
This automated discovery is a non-negotiable control for audit-ready transparency, particularly in the financial and healthcare sectors, where research datasets must exclude embargoed or sensitive content. By labelling data as confidential, sensitive, or public, automated systems can apply the appropriate Second Schedule safeguards without human intervention.
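As a rough illustration of sentence-level sensitivity tagging, the sketch below substitutes a generic zero-shot classifier from the Hugging Face transformers library for the fine-tuned BERT/XLNet models mentioned above; the model choice and the confidential/sensitive/public label set are assumptions for demonstration.

```python
# Illustrative sentence-level sensitivity tagging using a zero-shot classifier.
# Production systems described above would use fine-tuned BERT/XLNet models;
# this generic model and the label set are assumptions for demonstration.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
LABELS = ["confidential", "sensitive", "public"]

def tag_sentences(sentences: list[str]) -> list[dict]:
    """Assign the highest-scoring sensitivity label to each sentence."""
    tagged = []
    for sentence in sentences:
        result = classifier(sentence, candidate_labels=LABELS)
        tagged.append({
            "text": sentence,
            "label": result["labels"][0],
            "score": round(result["scores"][0], 3),
        })
    return tagged

if __name__ == "__main__":
    sample = [
        "Patient 4211 was diagnosed with Type 2 diabetes in March.",
        "The quarterly newsletter is available on our public website.",
    ]
    for row in tag_sentences(sample):
        print(row)
```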
How Organisations Are Handling It
The practical utility of these automated compliance systems is best illustrated through their application across diverse, data-intensive industries. The following use cases demonstrate how organisations can scale their operations while strictly adhering to the non-decisional and transparency mandates of the DPDP Act.
AI Model Training and the Research Exemption
The intersection of AI development and the Section 17(2)(b) research exemption is a focal point of intense legal debate. AI companies often claim that the processing of personal data to train models falls under research or statistical purposes, thus exempting them from consent requirements.
The Tension Between Innovation and Privacy
If personal data is included in the corpus used to train a foundational model, its influence manifests as statistical impressions diffused throughout the architecture. Because these patterns may still carry pieces of the original information, legal scholars argue that they are not truly anonymous.
| Challenge in AI Training | Regulatory and Technical Implication |
| --- | --- |
| Inseparability of Data | There is often no practical mechanism to excise an individual's data without complete retraining. |
| Re-identification Risk | Patterns in AI models could be used to infer information about real individuals indirectly. |
| Decisional Impact | If a model is used for creditworthiness or hiring, the research exemption is voided. |
| Right to Erasure | Regulators may apply a proportionality test, accepting alternative safeguards where erasure would cripple innovation. |
To maintain compliance, AI startups must adopt Privacy by Design principles, testing models for bias and unintended harms before release. Techniques like differential privacy, which adds calibrated random noise to datasets or query results, make it statistically impractical to infer any one person’s information from the training results, thus supporting the Section 17(2)(b) non-decisional requirement.
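A minimal sketch of the noise-addition idea, using the Laplace mechanism on an aggregate count; the epsilon value, sensitivity, and query are illustrative assumptions rather than values prescribed by the Act.

```python
# Minimal Laplace-mechanism sketch: noisy aggregate counts so that no single
# record can be confidently inferred from the published statistic.
# Epsilon and sensitivity values here are illustrative assumptions.
import numpy as np

def dp_count(records: list[dict], predicate, epsilon: float = 1.0) -> float:
    """Release a differentially private count of records matching the predicate."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # adding or removing one person changes a count by at most 1
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

if __name__ == "__main__":
    cohort = [{"age": 34, "condition": "diabetes"}, {"age": 51, "condition": "none"}]
    noisy = dp_count(cohort, lambda r: r["condition"] == "diabetes", epsilon=0.5)
    print(f"Noisy count released for research: {noisy:.2f}")
```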
Public Health Research and Epidemic Modelling
Hospitals and medical institutions frequently share patient datasets for public health initiatives and research into disease patterns. To maintain the Section 17(2)(b) exemption, these fiduciaries must implement automated "scrubbing" pipelines that remove explicit identifiers such as names and health IDs before the data enters the research archive.
Automated controls ensure the processing remains strictly non-decisional; for instance, the system can be configured to block any request that attempts to use research insights to deny insurance coverage or adjust treatment costs for specific individuals. Under Rule 14, if a patient requests the erasure of their records, automated lineage tools can identify whether that data has already been irreversibly anonymised for research (rendering it exempt) or if it remains in an identifiable format requiring immediate deletion.
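A simplified sketch of both controls, assuming hypothetical column names and an anonymised flag to mark records that have already passed through the scrubbing pipeline.

```python
# Simplified sketch: identifier scrubbing plus erasure routing for health data.
# Column names and the "anonymised" flag are illustrative assumptions.
DIRECT_IDENTIFIERS = {"name", "health_id", "mobile", "address"}

def scrub(record: dict) -> dict:
    """Remove explicit identifiers before the record enters the research archive."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["anonymised"] = True
    return cleaned

def handle_erasure_request(record: dict) -> str:
    """Route a Rule 14 erasure request based on the record's current state."""
    if record.get("anonymised"):
        return "exempt: already irreversibly anonymised for research"
    return "delete: identifiable record removed from all stores"

if __name__ == "__main__":
    raw = {"name": "A. Rao", "health_id": "HID-889", "diagnosis": "asthma", "age": 42}
    archived = scrub(raw)
    print(archived)                          # identifiers removed before archiving
    print(handle_erasure_request(archived))  # exempt
    print(handle_erasure_request(raw))       # delete
```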
National Macroeconomic Policy and Official Statistics
The Ministry of Statistics and Programme Implementation (MoSPI) manages massive volumes of data used to calculate social and economic indicators like GDP. To comply with the DPDPA’s stringent standards, statistical offices deploy automated checking tools, such as the Argus software suite for Statistical Disclosure Control (SDC), which perform real-time confidentiality checks on researcher queries.
These systems automatically detect and block routines that could lead to statistical disclosure, where an individual's financial information might be inferred from a complex table. This automated gatekeeping allows researchers to access Unit Level Data for policy analysis while ensuring that the Right to Access under Rule 14 does not compromise the security of the broader national dataset.
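A toy sketch of this gatekeeping idea, assuming a minimum cell-count threshold; production systems such as the Argus suite apply far richer disclosure rules, so the threshold and data layout here are assumptions for illustration.

```python
# Toy statistical-disclosure-control check: block tabulations where any output
# cell rests on fewer than MIN_CELL_COUNT respondents. Threshold is an assumption.
from collections import Counter

MIN_CELL_COUNT = 5

def safe_tabulation(records: list[dict], group_key: str) -> dict:
    """Return the tabulation only if every cell meets the minimum count rule."""
    cells = Counter(r[group_key] for r in records)
    unsafe = {cell: n for cell, n in cells.items() if n < MIN_CELL_COUNT}
    if unsafe:
        raise PermissionError(f"Query blocked: cells below threshold {unsafe}")
    return dict(cells)

if __name__ == "__main__":
    units = [{"district": "A"}] * 12 + [{"district": "B"}] * 3
    try:
        print(safe_tabulation(units, "district"))
    except PermissionError as err:
        print(err)  # district B has only 3 respondents, so the query is blocked
```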
Conclusion
Ultimately, the successful implementation of the DPDPA rests on the ability of fiduciaries to harmonise the technical with the legal. The case studies demonstrate that automation through Policy-as-Code and NLP-driven classification is no longer optional; it is the primary mechanism for maintaining trust at scale. By strictly isolating research datasets from individual decision-making and providing clear, automated pathways for citizens to exercise their rights, organisations can move past the fear of regulatory friction. As these frameworks mature, the research exemption will likely become the gold standard for how a modern society uses its collective information to drive progress without ever compromising the dignity or privacy of the individual.