Reporting standard
Data Cleaning Process
This is how we turn messy imported competition data into a report that reads cleanly, groups correctly and stays consistent across future imports.
Principle
Accuracy before presentation
We do not publish raw imported labels as-is when they create duplicates or inconsistent grouping. The goal is a report that reads like a single cleaned record set, not a stack of separate uploads.
Principle
Permanent decisions
When we merge a name or map an event, that decision is saved so future imports follow the same standard automatically. That keeps the report stable over time instead of drifting as new files arrive.
Principle
Human review where it matters
A human still confirms the final mapping when the source data is ambiguous. Cleaning is a judgement step, not just a string-replacement step.
Athlete Name Merging
1. Detect duplicates and near-duplicates
We compare athlete names across imported rows, look for spelling variants, and check whether the same athlete appears under multiple slugs or slightly different name formats. This catches common issues like punctuation differences, spacing differences and transliteration variations.
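The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the production matcher: the function names `normalise_name` and `find_candidate_duplicates` and the 0.85 similarity threshold are assumptions chosen for the example. It uses only the Python standard library (`unicodedata`, `re`, `difflib`).

```python
import re
import unicodedata
from difflib import SequenceMatcher
from itertools import combinations


def normalise_name(name: str) -> str:
    """Reduce a name to a comparison key: no accents, punctuation or case."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse repeated whitespace.
    no_punct = re.sub(r"[^\w\s]", " ", stripped)
    return " ".join(no_punct.casefold().split())


def find_candidate_duplicates(names, threshold=0.85):
    """Return (name_a, name_b, score) pairs that look similar enough to review.

    The threshold is a hypothetical starting point, not a tuned value.
    """
    pairs = []
    for a, b in combinations(sorted(set(names)), 2):
        score = SequenceMatcher(None, normalise_name(a), normalise_name(b)).ratio()
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return pairs
```

A sketch like this only proposes candidates. Whether two similar names really belong to the same athlete is still a review decision.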
2. Merge to one canonical athlete
When two records clearly represent the same athlete, we merge them into one profile so their results, event history and published summary sit together. This prevents a single athlete from being split across two public pages and undercounted in reports.
3. Preserve the useful history
We keep the original imported result rows and profile identifiers linked to the final canonical athlete. That means the cleaning process does not erase evidence of where the data came from; it just makes the reporting layer consistent.
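One way to picture a merge that keeps its history is shown below. The `AthleteProfile` structure and its field names are hypothetical, invented for this sketch; the real schema may differ. The point is that the duplicate's result rows and identifier are attached to the canonical profile rather than deleted.

```python
from dataclasses import dataclass, field


@dataclass
class AthleteProfile:
    # Hypothetical fields, for illustration only.
    profile_id: str
    name: str
    result_row_ids: list = field(default_factory=list)
    # Identifiers of profiles that were merged into this one.
    merged_from: list = field(default_factory=list)


def merge_profiles(canonical: AthleteProfile, duplicate: AthleteProfile) -> AthleteProfile:
    """Fold `duplicate` into `canonical` without discarding its provenance."""
    # Move result rows across, skipping any that are already linked.
    for row_id in duplicate.result_row_ids:
        if row_id not in canonical.result_row_ids:
            canonical.result_row_ids.append(row_id)
    # Record where the data came from, including earlier merges.
    canonical.merged_from.append(duplicate.profile_id)
    canonical.merged_from.extend(duplicate.merged_from)
    return canonical
```

Because `merged_from` carries the old identifiers, a later audit can still trace each result row back to the profile it was originally imported under.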
Club Name Merging
1. Standardise club naming
Club names often arrive with extra prefixes, suffixes, team labels or uppercase noise. We normalise these labels so one club is not split into several public entries just because different imports wrote the name slightly differently.
2. Remove team suffix noise
Many data sources append markers such as Team A, Team B, Team 1, Team 2, or similar variants. We strip those suffixes when they are only identifying sub-teams and not the actual club name, so the report groups everyone under the correct organisation.
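As a rough sketch of the suffix stripping, assuming the markers always trail the club name and take the form "Team" plus a single letter or a number. The pattern and the function name `clean_club_name` are illustrative assumptions, and the club names in the usage note are made up.

```python
import re

# Matches a trailing marker such as " Team A", " - Team 2" or " (Team B)".
# \u2013 is an en dash, which some sources use as a separator.
TEAM_SUFFIX = re.compile(
    r"[\s\-\u2013(]*\bteam\s+(?:[a-z]|\d+)\)?\s*$",
    re.IGNORECASE,
)


def clean_club_name(raw: str) -> str:
    """Collapse whitespace and drop a trailing sub-team marker, if any."""
    collapsed = " ".join(raw.split())
    without_suffix = TEAM_SUFFIX.sub("", collapsed).strip()
    # If stripping would leave nothing, the marker was the whole name: keep it.
    return without_suffix or collapsed
```

With this pattern, "Riverside Club Team A" and "Riverside Club - Team 2" both reduce to "Riverside Club", while "Team Spirit Academy" is left alone because the marker is not at the end. A club whose real name ends in something like "Team A" would be stripped incorrectly, which is why ambiguous cases still go to a reviewer.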
3. Merge organisations carefully
When two organisation labels are clearly the same real-world club, we merge them into a single representation. This helps the directory, athlete profile summaries and competition reports all point to the same place.
Event Name Mapping
1. Identify non-English event labels
Imported competition files may contain Chinese event names, mixed-language labels or inconsistent romanisation. We surface those entries on the Data Management page so they are easy to review before they become public-facing labels.
2. Map to one English label
Each source label is mapped to one permanent English event name. Once saved, every matching result row is updated so the event appears as one canonical category in the report rather than many separate versions of the same event.
3. Reuse the mapping on future imports
The mapping is stored permanently. If the same source label appears in a future file, we apply the saved English name automatically so the same cleanup decision does not need to be made twice.
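The save-once, reuse-later behaviour could look something like the sketch below. It assumes a JSON file as the store and rows represented as dictionaries with an `event` key; the class name `EventMappingStore`, the `source_event` field and the file format are all assumptions for illustration, not a description of the actual storage.

```python
import json
from pathlib import Path


class EventMappingStore:
    """Persists source-label to English-name decisions in a JSON file."""

    def __init__(self, path):
        self.path = Path(path)
        self.mappings = {}
        # Load decisions saved by earlier imports, if any exist.
        if self.path.exists():
            self.mappings = json.loads(self.path.read_text(encoding="utf-8"))

    def save_mapping(self, source_label, english_name):
        """Record a decision and write it to disk immediately."""
        self.mappings[source_label.strip()] = english_name
        self.path.write_text(
            json.dumps(self.mappings, ensure_ascii=False, indent=2),
            encoding="utf-8",
        )

    def apply(self, rows):
        """Return (updated_rows, unmapped_labels) for a batch of result rows."""
        updated, unmapped = [], set()
        for row in rows:
            label = row["event"].strip()
            if label in self.mappings:
                # Keep the source label alongside the English name.
                updated.append(
                    {**row, "event": self.mappings[label], "source_event": label}
                )
            else:
                updated.append(row)
                unmapped.add(label)
        return updated, sorted(unmapped)
```

For example, after `save_mapping("男子100米", "Men's 100m")` has been called once, any later import containing that label is rewritten by `apply` without a new decision, and labels with no saved mapping come back in the second return value so they can be queued for review.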
Why permanent mapping matters
If a source label is mapped once but the decision is not saved, later imports can bring the raw label back and split the same event into multiple buckets. Permanent mapping removes that drift. The cleaned English label becomes the single public category used in reports, export files and future imports.
That matters especially when the same competition is imported more than once, when a file is corrected later, or when different source documents use slightly different Chinese wording for the same event.
Our cleaning workflow
Import the source file and review the raw names that came in.
Normalise obvious formatting issues such as spacing, suffix noise and inconsistent casing.
Merge athlete and club duplicates when the records clearly refer to the same entity.
Map non-English event labels to a permanent English name and save the mapping.
Re-run the cleaned results through the report views so totals and categories are consistent.
Check the final output for duplicate-looking rows, mismatched labels or unexpectedly split categories.
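The final check for unexpectedly split categories can be partly automated. The sketch below is one possible approach, assuming labels are plain strings: it groups labels that differ only in casing or spacing and reports any group with more than one visible variant. The function name `find_split_labels` is hypothetical.

```python
from collections import defaultdict


def find_split_labels(labels):
    """Return groups of labels that differ only in case or spacing.

    Keys are the normalised form; values are the sorted visible variants.
    Only groups with more than one variant are returned.
    """
    groups = defaultdict(set)
    for label in labels:
        key = " ".join(label.casefold().split())
        groups[key].add(label)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

An empty result means no label in the output is split by casing or spacing alone. It does not catch differences in wording, so a visual check of the report is still needed.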
Reporting result
Clean labels, consistent totals, easier reports.
Once athlete names, club names and event names are cleaned, the report reads like a single coherent dataset. That makes the public directory easier to trust and the internal data easier to maintain.
Duplicate athletes collapse into one visible record.
Club labels become stable across imports and reports.
Event categories stay in one permanent English form.