
Neural Patent Classification beyond Title and Abstract: Leveraging Patent Text and Metadata

  • Tuesday, 30 July 2024, 14:00
  • SR 1 (02.101)
    • Subhash Chandra Pujari
  • Address: Seminar Room 1 (02.101)

Intellectual property violations entail significant litigation and licensing costs, which underlines the need for efficient patent search systems. With millions of patents in existence, manual search is impractical, so automated classification techniques are required. This thesis improves the automated classification of patents into the Cooperative Patent Classification (CPC) and International Patent Classification (IPC) schemes, which supports application routing and prior art search in patent examination offices. It also addresses classification in the context of Patent Landscape Studies (PLS), which allow organizations to categorize and analyze patents for valuable insights.

Key contributions of the thesis include the release of a comprehensive CPC classification dataset with full patent texts, which overcomes limitations of existing datasets. In addition, three open-source datasets are curated and released to enable PLS automation. To address the hierarchical multi-label nature of CPC/IPC classification, which involves hundreds of labels, a memory-efficient model architecture is developed. This architecture shares a single transformer-based language model across multiple classification heads, significantly improving performance, particularly for infrequent labels.
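The idea of one shared encoder feeding several classification heads can be sketched as follows. This is a minimal illustration only, not the thesis implementation: `shared_encoder` is a hypothetical toy stand-in for the transformer-based language model, `LinearHead` and `classify` are made-up names, and assigning one head per hierarchy level is an assumption. The point it shows is that the text is encoded once and every head reuses the same features, so adding a head adds only a small weight matrix.

```python
import math
import random
from typing import Dict, List

DIM = 8  # size of the shared feature vector (arbitrary for this sketch)


def shared_encoder(text: str, dim: int = DIM) -> List[float]:
    """Toy stand-in for the shared language model (assumption, not a transformer)."""
    features = [0.0] * dim
    for token in text.lower().split():
        # Deterministic bucket per token, so the sketch needs no external library.
        features[sum(ord(ch) for ch in token) % dim] += 1.0
    return features


class LinearHead:
    """One multi-label head: a linear layer followed by a per-label sigmoid."""

    def __init__(self, in_dim: int, n_labels: int, rng: random.Random) -> None:
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                        for _ in range(n_labels)]
        self.bias = [0.0] * n_labels

    def __call__(self, features: List[float]) -> List[float]:
        logits = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(self.weights, self.bias)]
        return [1.0 / (1.0 + math.exp(-z)) for z in logits]


def classify(text: str, heads: Dict[str, LinearHead]) -> Dict[str, List[float]]:
    """Encode the text once, then let every head score its own label set."""
    features = shared_encoder(text)
    return {name: head(features) for name, head in heads.items()}
```

Each head returns independent per-label probabilities, which matches the multi-label setting: a patent can receive several labels from the same head.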

Furthermore, the thesis proposes a novel document representation technique that truncates the text of each patent section, embeds each section separately, and combines the section embeddings by vector summation; this representation outperforms existing methods. For PLS, the representation is enriched by combining CPC/IPC labels with patent texts to predict PLS-oriented categories. The versatility and effectiveness of the proposed techniques are demonstrated by applying them to the task of classifying research publications.
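The summation of truncated section embeddings can be illustrated with a small sketch. It rests on assumptions: `embed_section` is a hypothetical toy embedding standing in for a language model encoder, and the section names, `max_tokens`, and `dim` values are chosen only for illustration. What it shows is the combination step: each section is truncated and embedded on its own, and the document vector is the element-wise sum of the section vectors.

```python
from typing import Dict, List


def embed_section(text: str, max_tokens: int, dim: int) -> List[float]:
    """Truncate one section to max_tokens and embed it (toy stand-in encoder)."""
    tokens = text.lower().split()[:max_tokens]
    vector = [0.0] * dim
    for token in tokens:
        # Deterministic bucket per token; a real system would use a language model.
        vector[sum(ord(ch) for ch in token) % dim] += 1.0
    return vector


def document_vector(sections: Dict[str, str],
                    max_tokens: int = 128,
                    dim: int = 8) -> List[float]:
    """Combine per-section embeddings into one vector by element-wise summation."""
    doc = [0.0] * dim
    for text in sections.values():
        section_vec = embed_section(text, max_tokens, dim)
        doc = [d + s for d, s in zip(doc, section_vec)]
    return doc


patent = {
    "title": "neural patent classification",
    "abstract": "one two three four five six",
    "claims": "a b",
}
vector = document_vector(patent, max_tokens=3, dim=4)
```

Because the sections are summed rather than concatenated, the document vector keeps the same dimensionality no matter how many sections are included.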