Importing annotations from Excel
In many regression problems, a large number of annotated objects is needed to build good models. perClass Mira supports importing point annotations from Excel.
To demonstrate, we will use a different data set with leaves. The goal is to estimate dry-matter-content (DMC) per leaf.
We have added several scans to a perClass Mira project and built a pixel classifier separating leaf from background. The "leaf" class is flagged as foreground.
The ground-truth annotation is available in a separate Excel file:
To import object annotations from this Excel file, we need to match each scan name to its dry-matter-content (DMC) value.
We select Import point meta-data from the Regression menu. A dialog will appear:
We have pressed the Load Excel file button and selected the file containing the meta-data.
Now we need to specify the first cell of the column containing scan names and the first cell of the column containing numerical values.
In our example, these are E2 for scan names and H2 for numerical values, respectively. We can then click the Refresh button. The scan-value pairs will load in the meta-data panel on the left.
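To illustrate what the two first-cell references mean, here is a minimal Python sketch, not perClass Mira code. It assumes the sheet is already available as a list of rows, and the helper names cell_to_indices and read_pairs are hypothetical. Stopping at the first empty cell is also an assumption about how the columns are read. Only the n003 value comes from our example; the n004 row is a made-up placeholder.

```python
import re

def cell_to_indices(cell):
    """Convert an A1-style reference such as 'E2' to zero-based (row, column)."""
    match = re.fullmatch(r"([A-Za-z]+)(\d+)", cell)
    if match is None:
        raise ValueError(f"not a cell reference: {cell!r}")
    letters, row = match.groups()
    column = 0
    for letter in letters.upper():
        # Column letters form a base-26 number: A=1 ... Z=26, AA=27, ...
        column = column * 26 + (ord(letter) - ord("A") + 1)
    return int(row) - 1, column - 1

def read_pairs(rows, name_cell, value_cell):
    """Collect (scan name, value) pairs going down from the two first cells."""
    name_row, name_col = cell_to_indices(name_cell)
    value_row, value_col = cell_to_indices(value_cell)
    pairs = []
    for offset in range(len(rows)):
        r_name, r_value = name_row + offset, value_row + offset
        if r_name >= len(rows) or r_value >= len(rows):
            break
        name, value = rows[r_name][name_col], rows[r_value][value_col]
        if name is None or value is None:
            break  # assumption: the first empty cell ends the column
        pairs.append((str(name), float(value)))
    return pairs

# In-memory stand-in for the Excel sheet (columns A..H, row 1 is a header).
sheet = [
    [None, None, None, None, "scan", None, None, "DMC"],
    [None] * 4 + ["n003"] + [None] * 2 + [0.121016],  # value from our example
    [None] * 4 + ["n004"] + [None] * 2 + [0.25],      # hypothetical placeholder
]

pairs = read_pairs(sheet, "E2", "H2")
print(pairs)
```

With a real file, the rows would come from an Excel reader of your choice; the sketch only shows how E2 and H2 select the starting cells of the two columns.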
Now we need to match the scan names of the selected images (the single leaf_dry_matter/n003b image in our example) to the meta-data values.
In the simple situation where the image name exactly matches the field in the Excel file, we can keep the default Exact image name option and press the Match to meta-data button.
Often, the scan names contain additional characters. In our example, the enclosing directory name ('leaf_dry_matter/') precedes the scan name and an extra letter follows it.
In perClass Mira, we can use a regular expression to extract only the scan name. A regular expression is a text pattern describing how to match or extract a substring from a string.
If you are not familiar with regular expressions, simply update your Excel file to list exact scan names. TIP: If you set the top-level data directory exactly above the scans, you will not see the enclosing directory as a part of the scan name ("leaf_dry_matter" in our example). This will simplify the task.
In our situation, we use a regular expression: (\w\d\d\d)
This means: a word character (\w, i.e. a letter, digit or underscore; here the letter "n") followed by three digits (\d\d\d). The parentheses define a "capture", i.e. the part of the string that is returned. We need the parentheses to capture the scan name without the trailing letter.
To make sure this pattern does not match anywhere earlier, we could also use: leaf_dry_matter/(\w\d\d\d)
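The behaviour of both patterns can be checked outside perClass Mira. The sketch below uses Python's re module, assuming it treats these simple patterns the same way as the regular expression engine in perClass Mira. The helper extract_scan_name and the directory name set2024 are hypothetical and serve only to show why anchoring can matter.

```python
import re

# Image name from our example: enclosing directory, scan name, extra letter.
image_name = "leaf_dry_matter/n003b"

# Pattern from the text: one word character followed by three digits.
loose = re.compile(r"(\w\d\d\d)")
# Anchored variant: the capture must directly follow the directory name.
anchored = re.compile(r"leaf_dry_matter/(\w\d\d\d)")

def extract_scan_name(pattern, name):
    """Return the first captured group, or None if the pattern does not match."""
    match = pattern.search(name)
    return match.group(1) if match else None

loose_result = extract_scan_name(loose, image_name)
anchored_result = extract_scan_name(anchored, image_name)
print(loose_result, anchored_result)

# Hypothetical directory name containing digits: the unanchored pattern
# matches inside the directory name instead of the scan name.
print(extract_scan_name(loose, "set2024/n003b"))
```

Both patterns return n003 for our image. For the hypothetical set2024/n003b name, the unanchored pattern returns t202, which is the situation the anchored form avoids.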
We click the Match to meta-data button and see that our selected image is matched to the value 0.121016:
Now we can press OK. By default, a point annotation will be created inside the largest object in the scan. We can also annotate all objects larger than the minimum object size.
Note that existing point annotations are not removed automatically. If needed, you can remove point annotations from the images prior to import, as described here.
Naturally, the import dialog can process multiple scans at once. To do that, select the desired scans in the image list.
The import dialog remembers its settings within a perClass Mira session, so you can return to it later and annotate several more images from the same Excel file with one click.
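Before importing many scans, it can help to check which image names will find a value. The following Python sketch mimics the matching step under stated assumptions: the function match_to_metadata is hypothetical, the n004 value and the n004a and n999c image names are made-up placeholders, and the actual matching logic inside perClass Mira may differ.

```python
import re

def match_to_metadata(image_names, metadata, pattern=None):
    """Split images into matched {name: value} and a list of unmatched names.

    With pattern=None the full image name must equal a scan name (exact match).
    Otherwise the first capture of the regular expression is used as the key.
    """
    regex = re.compile(pattern) if pattern else None
    matched, unmatched = {}, []
    for name in image_names:
        key = name
        if regex is not None:
            found = regex.search(name)
            key = found.group(1) if found else None
        if key in metadata:
            matched[name] = metadata[key]
        else:
            unmatched.append(name)
    return matched, unmatched

# Scan-value pairs as loaded from Excel; n004 is a hypothetical placeholder.
metadata = {"n003": 0.121016, "n004": 0.25}

# Selected images; n004a and n999c are hypothetical names.
images = [
    "leaf_dry_matter/n003b",
    "leaf_dry_matter/n004a",
    "leaf_dry_matter/n999c",
]

matched, unmatched = match_to_metadata(
    images, metadata, r"leaf_dry_matter/(\w\d\d\d)"
)
print(matched)
print(unmatched)
```

Listing the unmatched names separately makes it easy to spot scans that are missing from the Excel file or that need a different pattern.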