Training data in patent specifications – the direction of travel

Bulletin 03 October 2023

With the recent expansion in artificial intelligence (AI) and machine learning technologies, and an associated rise in the number of AI-related patent applications, there has been increased interest in best practice for preparing such applications.

One of the requirements of patent systems around the world is that the invention must be sufficiently disclosed in its description such that a skilled person can put the invention into effect. This condition forms the inventor’s half of the patent bargain in which an inventor fully discloses his invention to the state in exchange for a legal monopoly on using that invention for a limited time. This requirement, typically referred to as sufficiency, is codified in Art. 83 EPC for European patents and S.14(3) of the Patents Act 1977 in the UK.

The requirement of sufficiency raises a specific question for AI technology. Many types of AI algorithm (neural networks, for example) are only effective once they have been trained on training data to ‘learn’ the weights and parameters that allow the algorithm to give accurate results. Even if a patent application describes an AI algorithm in intricate detail, could a skilled person put the invention into effect if the description does not disclose the data on which the algorithm is trained? In other words, can a patent fulfil the requirement of sufficiency without adequately disclosing the training data that the AI invention uses? This question arose in case T 0161/18 before the EPO boards of appeal, in which the refusal of an AI patent application was upheld because the training data was not disclosed in enough detail (it is interesting to note that a corresponding US application did not suffer the same fate).

There is accordingly a clear need to disclose the training data used in an AI patent application in enough detail that a skilled person can put the invention into effect. The exact level of detail required, however, has not yet been clarified. This is less of an issue when the algorithm is trained on a publicly available dataset, in which case an explicit reference to the dataset should be enough. If the data is not available to the public, for example if it is private data carefully collected by the proprietor, is the invention disclosed sufficiently?

One option that has been floated is to create a data deposit system so that these private datasets can be made available as part of the application. A similar deposit system, for biological material which cannot be adequately described in an application, has been in place since the Budapest Treaty came into force in 1980, so the legal framework for such a system should not be too hard to implement. The owners of large, valuable, private datasets may however be loath to make them publicly available. From their point of view, it may be commercially beneficial to keep the data as a trade secret, and a deposit requirement may therefore discourage AI inventors from pursuing patents.

An alternative is to require that the application describe the nature of the dataset in detail, such that the skilled person could assemble their own dataset and then put the algorithm into effect. This may need to include all the categories of information in the dataset, and the size of the sample. Whether this would be enough where the inventors had access to a specialised private dataset which took decades to compile is yet to be clarified, though it seems that this should not be an issue for sufficiency.

In conclusion, for an AI patent application to pass the test of sufficiency, applicants should consider whether the training data is disclosed in enough detail that a skilled person can put the invention into effect. This should be easy where a publicly available dataset is used. Even if the data used for the invention is private and confidential, describing the nature of the data such that a skilled person can assemble their own dataset may be enough. In any event, a rise in open-source data for AI technology, which can be easily referenced, may go some way to solving this issue before it is fully considered by patent systems.


Relevant sectors

Artificial Intelligence and Machine Learning
