Score an entire competition (or a whole AOI) using cw-eval

This recipe describes how to run evaluation of a proposal CSV for an entire competition against a ground truth CSV.

Things to understand before starting

When we score entire competitions, we want to ensure that competitors provide submissions for the entire area of interest (AOI), not just a self-selected subset, so that competitors cannot leave out chips they predict poorly on. Therefore, proposal files scored using this pipeline should contain predictions for every chip in the ground truth CSV. The score outputs also provide chip-by-chip results, which can be used to remove non-predicted chips if needed.
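One way to check coverage before submitting is to compare the set of chip IDs in each file. A minimal sketch using pandas; the toy tables and chip names below are hypothetical stand-ins for real CSVs loaded with pd.read_csv:

```python
import pandas as pd

# Toy stand-ins for the ground truth and proposal tables (real ones
# come from pd.read_csv on the competition CSVs).
truth = pd.DataFrame({'ImageId': ['chip_1', 'chip_1', 'chip_2', 'chip_3']})
proposals = pd.DataFrame({'ImageId': ['chip_1', 'chip_2']})

# Chips present in the ground truth but absent from the proposals
# would be scored as all false negatives.
missing = set(truth['ImageId']) - set(proposals['ImageId'])
print(sorted(missing))  # ['chip_3']
```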

When CosmiQ Works runs competitions in partnership with TopCoder, we set some cutoffs for scoring buildings:

  • An IoU score of > 0.5 is required to count a building as correctly identified.
  • Ground truth buildings smaller than 20 pixels in extent are ignored; however, competitors are responsible for filtering out their own small footprint predictions.
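The small-footprint cutoff can be applied to your own proposals before submission. A minimal sketch using pandas and shapely (which cw_eval itself depends on); the toy table below is hypothetical, and real proposals follow the CSV format described later in this recipe:

```python
import pandas as pd
from shapely import wkt

# Hypothetical proposal table with one large and one small footprint.
preds = pd.DataFrame({
    'ImageId': ['chip_1', 'chip_1'],
    'BuildingId': [0, 1],
    'PolygonWKT_Pix': [
        'POLYGON ((0 0, 100 0, 100 100, 0 100, 0 0))',  # 10000 px
        'POLYGON ((0 0, 4 0, 4 4, 0 4, 0 0))',          # 16 px
    ],
    'Confidence': [1, 1],
})

# Compute pixel areas from the WKT geometries and drop footprints
# under the 20-pixel cutoff.
areas = preds['PolygonWKT_Pix'].apply(lambda p: wkt.loads(p).area)
filtered = preds[areas >= 20].reset_index(drop=True)
print(len(filtered))  # 1
```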

Imports

For this test case we only need cw_eval installed. See the installation instructions for cw_eval.

[1]:
# imports
import os
import cw_eval
from cw_eval.challenge_eval.off_nadir_dataset import eval_off_nadir  # runs eval
from cw_eval.data import data_dir  # get the path to the sample eval data
import pandas as pd  # just for visualizing the outputs in this recipe


Ground truth CSV format

The following shows a sample ground truth CSV and the elements it must contain.

[2]:
ground_truth_path = os.path.join(data_dir, 'sample_truth_competition.csv')

pd.read_csv(ground_truth_path).head(10)
[2]:
ImageId BuildingId PolygonWKT_Pix PolygonWKT_Geo
0 Atlanta_nadir8_catid_10300100023BC100_743501_3... 0 POLYGON ((476.88 884.61, 485.59 877.64, 490.50... 1
1 Atlanta_nadir8_catid_10300100023BC100_743501_3... 1 POLYGON ((459.45 858.97, 467.41 853.09, 463.37... 1
2 Atlanta_nadir8_catid_10300100023BC100_743501_3... 2 POLYGON ((407.34 754.17, 434.90 780.55, 420.27... 1
3 Atlanta_nadir8_catid_10300100023BC100_743501_3... 3 POLYGON ((311.00 760.22, 318.38 746.78, 341.02... 1
4 Atlanta_nadir8_catid_10300100023BC100_743501_3... 4 POLYGON ((490.49 742.67, 509.81 731.14, 534.12... 1
5 Atlanta_nadir8_catid_10300100023BC100_743501_3... 5 POLYGON ((319.28 723.07, 339.97 698.22, 354.29... 1
6 Atlanta_nadir8_catid_10300100023BC100_743501_3... 6 POLYGON ((466.49 709.69, 484.26 696.45, 502.59... 1
7 Atlanta_nadir8_catid_10300100023BC100_743501_3... 7 POLYGON ((433.84 673.34, 443.90 663.96, 448.70... 1
8 Atlanta_nadir8_catid_10300100023BC100_743501_3... 8 POLYGON ((459.24 649.03, 467.38 641.90, 472.84... 1
9 Atlanta_nadir8_catid_10300100023BC100_743501_3... 9 POLYGON ((403.55 643.50, 416.98 630.51, 440.36... 1

Important points about the CSV format:

  • The column denoting the chip ID for a given geospatial location must be titled ImageId.
  • The column containing geometries must be in WKT format and should be titled PolygonWKT_Pix.
  • The BuildingId column provides a numeric identifier sequentially numbering each building within each chip. Order doesn’t matter.
  • For chips with no buildings, a single row should be provided with BuildingId=-1 and PolygonWKT_Pix="POLYGON EMPTY".
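The rules above can be seen in a minimal, hypothetical ground truth table (the chip names and geometry are made up for illustration):

```python
import io
import pandas as pd

# chip_1 has one building; chip_2 has none, so it gets the
# sentinel row with BuildingId=-1 and an empty polygon.
truth = pd.DataFrame({
    'ImageId': ['chip_1', 'chip_2'],
    'BuildingId': [0, -1],
    'PolygonWKT_Pix': [
        'POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))',
        'POLYGON EMPTY',
    ],
})

# Serialize to CSV (an in-memory buffer here; a file path in practice).
buf = io.StringIO()
truth.to_csv(buf, index=False)
print(truth.shape)  # (2, 3)
```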

Proposal CSV format

[3]:
proposals_path = os.path.join(data_dir, 'sample_preds_competition.csv')
pd.read_csv(proposals_path).head(10)
[3]:
ImageId BuildingId PolygonWKT_Pix Confidence
0 Atlanta_nadir8_catid_10300100023BC100_743501_3... 0 POLYGON ((0.00 712.83, 158.37 710.28, 160.59 6... 1
1 Atlanta_nadir8_catid_10300100023BC100_743501_3... 1 POLYGON ((665.82 0.00, 676.56 1.50, 591.36 603... 1
2 Atlanta_nadir8_catid_10300100023BC100_743501_3... 0 POLYGON ((182.62 324.15, 194.25 323.52, 197.97... 1
3 Atlanta_nadir8_catid_10300100023BC100_743501_3... 1 POLYGON ((92.99 96.94, 117.20 99.64, 114.72 12... 1
4 Atlanta_nadir8_catid_10300100023BC100_743501_3... 2 POLYGON ((0.82 29.96, 3.48 40.71, 2.80 51.00, ... 1
5 Atlanta_nadir8_catid_10300100023BC100_743501_3... 0 POLYGON ((476.88 884.61, 485.59 877.64, 490.50... 1
6 Atlanta_nadir8_catid_10300100023BC100_743501_3... 1 POLYGON ((459.45 858.97, 467.41 853.09, 463.37... 1
7 Atlanta_nadir8_catid_10300100023BC100_743501_3... 2 POLYGON ((407.34 754.17, 434.90 780.55, 420.27... 1
8 Atlanta_nadir8_catid_10300100023BC100_743501_3... 3 POLYGON ((311.00 760.22, 318.38 746.78, 341.02... 1
9 Atlanta_nadir8_catid_10300100023BC100_743501_3... 4 POLYGON ((490.49 742.67, 509.81 731.14, 534.12... 1

The only difference between the ground truth CSV format and the prediction CSV format is the Confidence column, which can be used to provide prediction confidence for a polygon. Alternatively, it can be set to 1 for all polygons to indicate equal confidence.
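For instance, if your model emits no per-polygon confidences, a ground-truth-style table becomes a valid proposal simply by adding a constant Confidence column (the toy table below is hypothetical):

```python
import pandas as pd

# Hypothetical predictions lacking a Confidence column.
preds = pd.DataFrame({
    'ImageId': ['chip_1'],
    'BuildingId': [0],
    'PolygonWKT_Pix': ['POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))'],
})

# A constant value of 1 marks every polygon as equally confident.
preds['Confidence'] = 1
print(list(preds.columns))
```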


Running eval on the Off-Nadir challenge: Python API

cw-eval currently contains code for scoring proposals from the Off-Nadir Building Detection challenge. There are two ways to run scoring: using the Python API or using the CLI (see later in this recipe). The following provides an example using the Python API.

If you provide proposals and ground truth formatted as described earlier, no additional arguments are required unless you would like to alter the default scoring settings. If so, see the API docs linked above.

The scoring function provides two outputs:

  • results_DF, a summary Pandas DataFrame with scores for the entire AOI, split into the nadir/off-nadir/very off-nadir bins
  • results_DF_Full, a DataFrame with chip-by-chip score outputs for detailed analysis

For large AOIs, scoring takes a fair amount of time to run.
[4]:
results_DF, results_DF_Full = eval_off_nadir(proposals_path, ground_truth_path)
100%|██████████| 33/33 [00:14<00:00,  2.11it/s]
[5]:
results_DF
[5]:
F1Score FalseNeg FalsePos Precision Recall TruePos
nadir-category
Nadir 1.0 0 0 1.0 1.0 2319

(This ground truth dataset only contains nadir imagery, hence the absence of the other bins.)

[6]:
results_DF_Full.head(10)
[6]:
F1Score FalseNeg FalsePos Precision Recall TruePos imageID iou_field nadir-category
0 1.0 0 0 1.0 1.0 96 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
1 1.0 0 0 1.0 1.0 3 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
2 1.0 0 0 1.0 1.0 43 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
3 1.0 0 0 1.0 1.0 67 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
4 1.0 0 0 1.0 1.0 3 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
5 1.0 0 0 1.0 1.0 91 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
6 1.0 0 0 1.0 1.0 80 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
7 1.0 0 0 1.0 1.0 96 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
8 1.0 0 0 1.0 1.0 112 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
9 1.0 0 0 1.0 1.0 78 Atlanta_nadir8_catid_10300100023BC100_743501_3... iou_score Nadir
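Because the chip-level counts are additive, results_DF_Full can be filtered and re-aggregated by hand, e.g. to remove non-predicted chips as mentioned earlier. A sketch using a hypothetical stand-in for results_DF_Full with made-up counts:

```python
import pandas as pd

# Toy stand-in for results_DF_Full, keeping only the count columns
# needed to recompute aggregate metrics.
chips = pd.DataFrame({
    'imageID': ['chip_1', 'chip_2', 'chip_3'],
    'TruePos': [90, 0, 50],
    'FalsePos': [10, 0, 0],
    'FalseNeg': [0, 40, 10],
})

# Drop a chip (e.g. one a competitor never predicted on), then
# recompute precision, recall, and F1 from the summed counts.
kept = chips[chips['imageID'] != 'chip_2']
tp, fp, fn = kept[['TruePos', 'FalsePos', 'FalseNeg']].sum()
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.933 0.933 0.933
```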

Running eval on the Off-Nadir Challenge using the CLI

The cw-eval CLI allows competition scoring without even needing to open a Python shell. Its usage is as follows:

$ spacenet_eval --proposal_csv [proposal_csv_path] --truth_csv [truth_csv_path] --output_file [output_csv_path]

Argument details:

  • --proposal_csv, -p: Path to the proposal CSV. Required argument. See the API usage details above for CSV specifications.
  • --truth_csv, -t: Path to the ground truth CSV. Required argument. See the API usage details above for CSV specifications.
  • --output_file, -o: Path to save the output CSVs to. This script will produce two CSV outputs: [output_file].csv, which is the summary DataFrame described above, and [output_file]_full.csv, which contains the chip-by-chip scoring results.

Not implemented yet: The CLI also provides a --challenge argument, which is not yet implemented but will be available in future versions to enable scoring of other SpaceNet challenges.

Example:

[7]:
%%bash -s "$proposals_path" "$ground_truth_path" # ignore this line - magic line to run bash shell command
spacenet_eval --proposal_csv $1 --truth_csv $2 --output_file results  # argument values taken from magic line above
                F1Score  FalseNeg  FalsePos  Precision  Recall  TruePos
nadir-category
Nadir               1.0         0         0        1.0     1.0     2319
Writing summary results to results.csv
Writing full results to results_full.csv
100%|██████████| 33/33 [00:17<00:00,  1.16it/s]