AI Training Data

The EU Artificial Intelligence Act (AIA) introduces four definitions related to data. Three of them are what one would expect for segregating data:

  1. Training Data: data used for training an AI system through fitting its learnable parameters. [Article 3 (29) AIA]
  2. Validation Data: data used for providing an evaluation of the trained AI system and for tuning its non-learnable parameters and its learning process (e.g., in order to prevent underfitting or overfitting). [Article 3 (30) AIA]
  3. Testing Data: data used for providing an independent evaluation of the AI system in order to confirm the expected performance of that system before its placing on the market or putting into service. [Article 3 (32) AIA]

The fourth definition, Validation Data Set [Article 3 (31) AIA], simply means a separate data set or part of the training data set, either as a fixed or variable split. It is not a distinct type of data.
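The AIA does not spell out what a "fixed or variable split" looks like in practice. One common reading, offered here only as an assumption, is that a fixed split is a single hold-out subset carved out once, while a variable split rotates the validation subset, as in k-fold cross-validation. The following Python sketch illustrates that reading; the function names and the default fraction are hypothetical and do not come from the AIA.

```python
import random


def fixed_split(records, validation_fraction=0.2, seed=0):
    """Fixed split (assumed reading): one validation subset is carved out once."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    cut = len(shuffled) - n_validation
    return shuffled[:cut], shuffled[cut:]


def variable_split(records, k=5, seed=0):
    """Variable split (assumed reading): k-fold rotation of the validation subset."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    # Deal the shuffled records into k folds of near-equal size.
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield training, validation
```

In both cases the validation records are drawn from the same pool as the training records, which is consistent with the AIA wording that the validation data set may be "part of the training data set."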

The AIA applies well beyond medical devices, but medical device manufacturers marketing in the EU are considering its requirements. Its naming differs slightly from what FDA used in the 2025 Predetermined Change Control Plan for Artificial Intelligence-Enabled Device Software Functions (AI-DSF) guidance. FDA noticeably avoided the word “validation,” which in the medical device industry is most often associated with “Design Validation” control processes. FDA describes its three groups as follows:

  1. Training Data: These data are used by the manufacturer of an AI-DSF in procedures and training algorithms to build an AI model, including to define model weights, connections, and components. These data typically should be representative of the proposed intended use populations (e.g., with respect to race, ethnicity, disease severity, sex, age, or others, as appropriate) and intended environments.
  2. Tuning Data: These data are typically used by the manufacturer of an AI-DSF to evaluate a small number of trained AI models. This process involves exploring various aspects, including different architectures or hyperparameters. The tuning phase happens before the testing phase of the AI-DSF and is part of the training process. While the AI and ML communities sometimes use the term “validation” to refer to the tuning data and phase, FDA does not use the word “validation” in this context.
  3. Test Data: These data are used to characterize the performance of an AI-DSF. These data are never shown to the algorithm during training and are used to estimate the AI model’s performance after training. Testing is conducted to generate evidence to establish the performance of an AI-DSF before it is deployed or marketed. The testing phase is also expected to provide evidence to demonstrate a reasonable assurance of safety and effectiveness of an AI-DSF before it is deployed or marketed. These data typically should be representative of the proposed intended use populations (e.g., with respect to race, ethnicity, disease severity, sex, age, or others, as appropriate) and intended environments. Test data should be independent of data used for training and tuning and should generally be from multiple sites different from those that were used to generate training and tuning data.
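To make the independence requirement for test data concrete, here is a minimal Python sketch of a site-level split: whole sites are reserved for testing, and only the remaining sites feed training and tuning. The record layout (a dict with a "site" key), the function name split_by_site, and the 20% tuning fraction are assumptions made for illustration; none of them come from the FDA guidance.

```python
import random


def split_by_site(records, test_sites, tuning_fraction=0.2, seed=0):
    """Reserve whole sites for test data; split the rest into training and tuning.

    records: list of dicts, each with a "site" key (hypothetical layout).
    test_sites: set of site identifiers held out entirely for testing.
    """
    test = [r for r in records if r["site"] in test_sites]
    development = [r for r in records if r["site"] not in test_sites]

    # Shuffle only the development pool; test records are never touched
    # by anything that feeds training or tuning.
    random.Random(seed).shuffle(development)
    n_tuning = round(len(development) * tuning_fraction)
    tuning = development[:n_tuning]
    training = development[n_tuning:]
    return training, tuning, test


def sites_of(records):
    """Return the set of sites represented in a group of records."""
    return {r["site"] for r in records}
```

A quick sanity check after splitting is that sites_of(test) shares no site with sites_of(training) or sites_of(tuning). A real pipeline would likely also need to check for leakage at the patient level and confirm that each group is representative of the intended use population, which this sketch does not attempt.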

Those of us who have been in the industry for more than two decades might prefer FDA's name of “tuning data” for group 2, but would argue that group 3 is more appropriately named “validation data.”

About the author

Partner and General Manager Brian Pate is ISO 13485:2016 Lead Auditor certified for Medical Device Quality Management Systems (MD), and ISO 19011:2018 Management Systems Auditing (AU) and Leading Management Systems Audit Teams (TL). Brian started his medical device career in anesthesia clinical research in 1985 and has since worked in both academia and industry, including many years with Johnson & Johnson, Baxter Healthcare, and GE Medical. Brian’s roles have included software engineering, systems engineering, quality assurance, and regulatory affairs. Brian has served on multiple AAMI TIR working groups, including TIR32-2008 (Application of ISO 14971 Risk Management to Software; now IEC 80002-1) and TIR45-2012 (Guidance on the use of Agile practices in the development of medical device software), and served as a reviewer for the 2nd edition of TIR45. Brian serves on the AAMI Software Committee and as an AAMI instructor for the software, design controls, and agile methods courses. Brian is also a member of the Underwriters’ Laboratories (UL) Standards Technical Panel for UL1998 (Software in Programmable Components) and for UL5500 (Remote Software Updates).
