DAGECC

ICPR 2024 Competition on Domain Adaptation and GEneralization for Character Classification

News

  • July 4th: Number of submissions allowed per day increased to 6
  • July 4th: The EasyOCR library is now available in the Docker image. Requirements.txt has been updated accordingly.
  • July 2nd: Extended deadline to August 4th for closing submissions on validation data! (see Important Dates section)
  • June 28th: Participants can now select their submission to be shown in the leaderboard (see FAQ section)
  • June 14th: FAQ section added to the website
  • June 10th: Submissions are now open on Codabench!
    Domain Generalization [here]
    Domain Adaptation [here]

Welcome to the ICPR 2024 Competition on Domain Adaptation and GEneralization for Character Classification (DAGECC)!

This competition aims to foster advances in domain adaptation and domain generalization applied to serial number character recognition.

Serial number recognition is important in many industries, predominantly in quality control, tracking and tracing of parts, and inventory management. However, recognizing serial numbers reliably can be challenging because of the varied surfaces on which the numbers are inscribed. Numbers may be etched, painted, or stamped onto materials ranging from shiny metals and transparent glass to rough-textured composites.

Domain adaptation and domain generalization come into play as essential tools when addressing the multifaceted problem of character recognition under varying conditions.

Domain adaptation can substantially enhance the accuracy of character recognition in target domains by leveraging the knowledge acquired from related source domains. In an industrial context, this could mean building robust models that have been trained on a set of images of numbers captured under specific conditions, and adapting those models to recognize serial numbers under different conditions. This technique reduces the need for extensive and time-consuming data collection under every potential condition, providing efficient and scalable solutions.

Domain generalization seeks to develop models that are capable of generalizing and performing well across multiple unseen domains, ultimately improving the model's adaptability. In the realm of character recognition, this potential translates into the creation of models that are capable of recognizing serial numbers on varied surfaces, in disparate lighting conditions, or from different viewing angles, all the while maintaining a consistent level of accuracy.

The Datasets

To address this challenge, we are introducing two new datasets: Safran-MNIST-D and Safran-MNIST-DLS. Both datasets comprise images of serial numbers extracted from diverse avionic parts manufactured by SAFRAN, the international high-technology group and world leader operating in the aviation (propulsion, equipment and interiors), defense and space markets. These datasets resemble the well-known MNIST dataset, but with a focus on industrial contexts, encompassing variations in lighting conditions, orientations, writing styles and surface textures.

The ultimate goal is to inspire innovative solutions that can robustly adapt to new domains and generalize well to unseen target domains, thereby pushing the boundaries of current state-of-the-art techniques in these critical research areas. The competition’s dataset serves as a valuable resource for the computer vision community, facilitating progress in developing more adaptable and universally applicable models for character recognition and beyond.

We warmly invite both academic and industry professionals to participate in this competition and offer their valuable expertise and perspectives.

Competition Tracks

The proposed competition encompasses two primary tracks: Domain Generalization and Unsupervised Domain Adaptation.

Track 1: Domain Generalization

The aim of this track is to develop models that can generalize well to an unseen target domain, without requiring access to any target domain data during training.

Source Data:

Participants are granted the freedom to use any publicly available data as of April 26, 2024 or generated data from various source domains. Nonetheless, we provide a noncomprehensive list of datasets representing letters, digits and symbols, which participants may utilize, namely: MNIST, MNIST-M, SVHN, HASYv2, Synthetic Digits, EMNIST (Extended MNIST), CROHME. Proprietary data is not allowed.

Target Data:

The target domain is the new Safran-MNIST-D dataset, which contains images of digits ranging from 0 to 9. Participants are not allowed to use any data from the target domain for training.

Track 2: Unsupervised Domain Adaptation

This task is focused on unsupervised domain adaptation methods, in which we provide unlabeled data from a target domain: the new Safran-MNIST-DLS dataset, which comprises images of 32 classes depicting numbers, alphabetic characters, and symbols, namely: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F, G, H, J, K, L, M, N, P, R, S, T, U, W, Y, /, .].
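As an illustration of the Track 2 label space, the 32 classes listed above can be written down as a lookup table. This is a minimal sketch: the names `CLASSES` and `CLASS_TO_INDEX` and the index order are assumptions made for this example, and the label encoding used in the official starting kit may differ.

```python
# Characters copied from the Track 2 class list.
# Assumption: the index order below is illustrative only; check the
# starting kit for the encoding actually expected by the scoring program.
DIGITS = "0123456789"
LETTERS = "ABCDEFGHJKLMNPRSTUWY"  # note: no I, O, Q, V, X or Z
SYMBOLS = "/."

CLASSES = list(DIGITS + LETTERS + SYMBOLS)
CLASS_TO_INDEX = {char: index for index, char in enumerate(CLASSES)}

# Sanity check against the class count stated in the track description.
assert len(CLASSES) == 32
```

A table like this makes it easy to verify that a model's output layer has one unit per class and that characters absent from the list are never predicted.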

Source Data:

Participants are granted the freedom to use any publicly available data as of April 26, 2024 or generated data from various source domains. Nonetheless, we provide a noncomprehensive list of datasets representing letters, digits and symbols, which participants may utilize, namely: MNIST, MNIST-M, SVHN, HASYv2, Synthetic Digits, EMNIST (Extended MNIST), CROHME. Proprietary data is not allowed.

Target Data:

Participants have access to the unlabeled target data of Safran-MNIST-DLS during the training phase.

Rules

Participants:

  • Participants from both academic and industrial institutions are welcome to participate in this challenge. The list of team members must be provided during registration, and cannot be changed throughout the competition. Each individual can only participate in one team and should provide an institutional or corporate email address at registration.
  • The maximum team size allowed is 5 people.
  • Researchers belonging to the institutes of the organizers are not allowed to participate to avoid potential conflict of interest.
  • This challenge features two tracks, and participants can decide if they participate in one of them or both.

Dataset:

  • For both tracks, participants may use any publicly available and appropriately licensed data to pretrain their models. Each participant must ensure that their use of any data in connection with this competition complies with applicable law and all other applicable legal requirements. Furthermore, any public pre-trained models can be used, such as CLIP, OpenCLIP, or models pre-trained on ImageNet, etc.
  • It is forbidden to use data from one track of the competition for the other track (e.g. using the unlabeled target data of Safran-MNIST-DLS for Track 1).
  • For Track 2 - Unsupervised Domain Adaptation, models can be adapted using the unsupervised dataset from the target domain without access to the labels. Solutions that involve manual labeling of the target domain will be disqualified.
  • Safran-MNIST-D and Safran-MNIST-DLS datasets along with associated ground-truth will be released at the end of the competition.

Evaluation:

  • For both tracks, the well-known macro average F1-score evaluated on the unseen test set will be the metric used to rank submissions.
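For reference, the macro average F1-score is the unweighted mean of the per-class F1-scores, so every class counts equally regardless of how many samples it has. The following is a minimal sketch of that definition in plain Python, not the official scoring code: the function name `macro_f1` and the handling of classes with no samples are assumptions for this example.

```python
from collections import Counter


def macro_f1(y_true, y_pred, labels=None):
    """Return the unweighted mean of per-class F1-scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN)
        denominator = 2 * tp[label] + fp[label] + fn[label]
        # Assumption: a class with no true and no predicted samples scores 0.
        scores.append(2 * tp[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


y_true = ["0", "0", "1", "1", "2", "2"]
y_pred = ["0", "1", "1", "1", "2", "0"]
print(round(macro_f1(y_true, y_pred), 3))  # per-class F1: 0.5, 0.8, 0.667 -> 0.656
```

Because the mean is unweighted, a model that ignores a rare class is penalized as much as one that ignores a frequent class, which matters when class frequencies in the target data are unbalanced.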

Submission:

  • Submissions are made on Codabench, following the structure described in the starting kit of each competition. Submissions will be executed in a dedicated Docker container.
  • Each team can make up to 6 submissions per day per task in the development phase. Using multiple accounts to increase the number of submissions is strictly prohibited.
  • To avoid overfitting the testing set, we only offer one successful submission opportunity on the testing set.
  • Any attempt to recover input data from the validation and test sets using the prediction code that is solely intended to run the model is a reason for exclusion from the competition.

Prizes:

  • Cash prizes and selected SAFRAN goodies will be awarded to the top 3 teams of each track.
  • An award certificate will be provided to the top 3 teams of each track.

Important Dates

Release Training Data: Apr 30th, 2024
Open submissions (on validation data): June 10th, 2024 (originally May 27th, 2024)
Close submissions (on validation data): August 4th, 2024 (originally Jul 21st, 2024)
Winners announcement: August 8th, 2024 (originally Jul 31st, 2024)
Report submission deadline (optional): Aug 18th, 2024

How to participate

To participate in this competition, you must follow these steps:

  1. Register your team [here]
  2. Create an account on Codabench if you do not have one already
  3. Register to the competition on Codabench [here] and/or [here]
  4. Download the Safran-MNIST-DLS training data for Track 2 [here]

FAQ

  • "Participants are granted the freedom to use any publicly available data as of April 26, 2024 or generated data from various source domains": does that mean that participants are allowed to use Generative AI to create a dataset to train their model?
    Yes, participants are allowed to use Generative AI to create a dataset, provided that the generative model has been trained only with public data. This means that, for example, participants are allowed to use images generated by Stable Diffusion (trained on the open-source LAION dataset) but they are not allowed to use images generated by Midjourney or DALL-E (trained on proprietary data).

  • Are participants allowed to use a model that was pre-trained with images coming from Generative AI?
    Yes, participants are allowed to use a model that was pretrained with images coming from Generative AI, provided that the generative model has been trained only with public data.

  • Submissions with score n/a appear in the leaderboard as my best result, but I have done submissions with a higher valid score which do not appear.
    We have indeed noticed a bug in Codabench that causes submissions scored n/a to be reported in the leaderboard. At first we deleted those submissions to avoid having n/a values in the leaderboard. However, we think it may be easier to allow participants to manually select the submission to be shown in the leaderboard. This option is thus now available. In the meantime, we are in contact with Codabench to try to solve this issue. We apologize for the inconvenience.

  • Submissions sometimes remain stuck in 'Submitted' or 'Scoring' state for more than a day, without getting a final score. These submissions count in the daily quota. Is it possible to increase the quota to bypass the problem temporarily?
    We have indeed identified an issue where some submissions occasionally remain in 'Submitted' or 'Scoring' state. We invite the participants to delete those submissions and re-run them. Since some participants reported that the problem appears quite frequently, we increased the number of daily submissions to 6. We apologize for the inconvenience.

Organisers

Frederic Jurie
Professor
@University of Caen
Emanuel Aldea
Associate Professor
@University Paris-Saclay
Sylvie Le-Hegarat Mascle
Professor
@University Paris-Saclay
Jennifer Vandoni
Research Scientist
@SafranTech, Paris
Sofia Marino
Research Scientist
@SafranTech, Paris
Ichraq Lemghari
Ph.D. Student
@University Paris-Saclay

Contact Us

Send us an email to: dagecc.icpr24 [at] gmail.com

Acknowledgments

Special thanks to SAFRAN, especially Basile MUSQUER (SAFRAN Aircraft Engines) and Thierry ARSAUT (SAFRAN Helicopter Engines), for participating in the acquisition of the images and the creation of the dataset, and for allowing us to publish the data.