2026-01-06 22:59:58 +08:00

Design Document

Overview

This design replaces the $unionWith aggregation stage (available from MongoDB 4.4) with a MongoDB 3.x compatible approach. The solution queries each collection separately, merges the results in application memory, deduplicates them with LINQ, and then applies sorting and pagination. This maintains functional equivalence while supporting older MongoDB versions, at the cost of loading every matching document into application memory before pagination.

Architecture

Current Architecture (MongoDB 4.4+)

┌─────────────────────────────────────────────────────────┐
│  QueryPagedDeduplicatedResultsAcrossCollectionsAsync    │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│  Single Aggregation Pipeline with $unionWith            │
│  - $match (filter)                                       │
│  - $project (fields)                                     │
│  - $unionWith (merge collections)                        │
│  - $sort                                                 │
│  - $group (deduplication)                                │
│  - $replaceRoot                                          │
│  - $sort (final)                                         │
│  - $facet (count + pagination)                           │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
                  MongoDB Server

New Architecture (MongoDB 3.x Compatible)

┌─────────────────────────────────────────────────────────┐
│  QueryPagedDeduplicatedResultsAcrossCollectionsAsync    │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│  Parallel Collection Queries                             │
│  For each collection:                                    │
│    - Find with filter                                    │
│    - Project fields                                      │
│    - Sort by sortField                                   │
│    - ToListAsync                                         │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────┐
│  In-Memory Processing (LINQ)                             │
│  - Merge all results                                     │
│  - Sort by sortField                                     │
│  - GroupBy (Year, Month, Day, ItemId, Status)            │
│  - Select first from each group                          │
│  - Sort again (final order)                              │
│  - Count total                                           │
│  - Skip + Take (pagination)                              │
└─────────────────────────────────────────────────────────┘
                          │
                          ▼
                Return (results, count)

Components and Interfaces

Modified Method: QueryPagedDeduplicatedResultsAcrossCollectionsAsync

Signature:

private async Task<(List<SecondaryCircuitInspectionResult> Results, long TotalCount)> 
    QueryPagedDeduplicatedResultsAcrossCollectionsAsync(
        List<string> collectionNames,
        FilterDefinition<SecondaryCircuitInspectionResult> filter,
        string sortField,
        bool isDescending,
        int skipCount,
        int pageSize,
        CancellationToken cancellationToken = default)

Algorithm:

  1. Input Validation

    • Check if collectionNames is null or empty
    • Return empty results if no collections
  2. Prepare Query Components

    • Render filter to BsonDocument
    • Build projection document
    • Create sort definition
  3. Query Each Collection in Parallel

    // Build the sort once from the field name. The driver cannot translate a
    // reflection call such as GetPropertyValue into a server-side sort.
    var sort = string.IsNullOrWhiteSpace(sortField)
        ? null
        : (isDescending
            ? Builders<SecondaryCircuitInspectionResult>.Sort.Descending(sortField)
            : Builders<SecondaryCircuitInspectionResult>.Sort.Ascending(sortField));
    
    var tasks = collectionNames.Select(async collectionName => {
        var collection = GetCollection<SecondaryCircuitInspectionResult>(collectionName);
        var query = collection.Find(filter);
    
        if (sort != null) {
            query = query.Sort(sort);
        }
    
        return await query
            .Project<SecondaryCircuitInspectionResult>(projection)
            .ToListAsync(cancellationToken);
    });
    
    var collectionResults = await Task.WhenAll(tasks);
    
  4. Merge Results

    var allResults = collectionResults.SelectMany(x => x).ToList();
    
  5. Sort Before Deduplication

    if (!string.IsNullOrWhiteSpace(sortField)) {
        allResults = isDescending
            ? allResults.OrderByDescending(x => GetPropertyValue(x, sortField)).ToList()
            : allResults.OrderBy(x => GetPropertyValue(x, sortField)).ToList();
    }
    
  6. Deduplication

    var deduplicatedResults = allResults
        .GroupBy(x => new {
            x.Year,
            x.Month,
            x.Day,
            x.SecondaryCircuitInspectionItemId,
            Status = x.Status ?? string.Empty
        })
        .Select(g => g.First())
        .ToList();
    
  7. Final Sort

    if (!string.IsNullOrWhiteSpace(sortField)) {
        deduplicatedResults = isDescending
            ? deduplicatedResults.OrderByDescending(x => GetPropertyValue(x, sortField)).ToList()
            : deduplicatedResults.OrderBy(x => GetPropertyValue(x, sortField)).ToList();
    }
    
  8. Count and Paginate

    var totalCount = deduplicatedResults.Count;
    var paginatedResults = deduplicatedResults
        .Skip(skipCount)
        .Take(pageSize)
        .ToList();
    
    return (paginatedResults, totalCount);
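
Steps 4 to 8 can be exercised without a database. The following is a minimal sketch, not the service code: Row is a hypothetical stand-in for SecondaryCircuitInspectionResult, InMemoryPipeline is an illustrative name, and the sort field is assumed to be ExecutionTime. It relies on two documented LINQ guarantees: OrderBy is a stable sort, and GroupBy yields groups in first-occurrence order while keeping source order inside each group. Because of the second guarantee, the deduplicated list is already in sort order, so the sketch omits the final sort of step 7.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for SecondaryCircuitInspectionResult (only the fields used here).
public sealed class Row
{
    public int Year { get; set; }
    public int Month { get; set; }
    public int Day { get; set; }
    public Guid ItemId { get; set; }
    public string Status { get; set; }
    public DateTime ExecutionTime { get; set; }
}

public static class InMemoryPipeline
{
    // Steps 4-8 on already-fetched lists, sorting by ExecutionTime (an assumed sort field).
    public static (List<Row> Results, long TotalCount) Run(
        IEnumerable<IEnumerable<Row>> perCollection, bool isDescending, int skipCount, int pageSize)
    {
        var merged = perCollection.SelectMany(rows => rows);

        // OrderBy/OrderByDescending are stable, so ties keep their merge order.
        var sorted = isDescending
            ? merged.OrderByDescending(r => r.ExecutionTime)
            : merged.OrderBy(r => r.ExecutionTime);

        // First() is the group's minimum (ascending) or maximum (descending),
        // because each group keeps the order of the sorted source.
        var deduplicated = sorted
            .GroupBy(r => new { r.Year, r.Month, r.Day, r.ItemId, Status = r.Status ?? string.Empty })
            .Select(g => g.First())
            .ToList();

        var page = deduplicated.Skip(skipCount).Take(pageSize).ToList();
        return (page, deduplicated.Count);
    }
}
```

Calling InMemoryPipeline.Run(lists, false, 0, 10) on two lists that share a key returns one row per key, and TotalCount reflects the deduplicated size regardless of skipCount and pageSize.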
    

Helper Method: GetPropertyValue

Purpose: Dynamically retrieve property values for sorting

Signature:

private object GetPropertyValue(SecondaryCircuitInspectionResult obj, string propertyName)

Implementation:

private object GetPropertyValue(SecondaryCircuitInspectionResult obj, string propertyName)
{
    var property = typeof(SecondaryCircuitInspectionResult).GetProperty(propertyName);
    if (property == null)
    {
        Log4Helper.Warning($"Property '{propertyName}' not found on SecondaryCircuitInspectionResult");
        return null;
    }
    return property.GetValue(obj);
}
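
One possible variation, shown only as a sketch, resolves the property once per query instead of once per element. SortKeyResolver and Resolve are hypothetical names that do not exist in the codebase. The caller is assumed to skip sorting and log a single warning when Resolve returns null. Note that Type.GetProperty matches names case-sensitively by default.

```csharp
using System;
using System.Reflection;

public static class SortKeyResolver
{
    // Returns a key selector for the named public property, or null when the
    // name is blank or no such property exists (the caller decides how to log).
    public static Func<T, object> Resolve<T>(string propertyName)
    {
        if (string.IsNullOrWhiteSpace(propertyName))
        {
            return null;
        }

        // Looked up once; the returned delegate reuses the same PropertyInfo.
        PropertyInfo property = typeof(T).GetProperty(propertyName);
        if (property == null)
        {
            return null;
        }

        return obj => property.GetValue(obj);
    }
}
```

For example, SortKeyResolver.Resolve<SecondaryCircuitInspectionResult>(sortField) could be called once before the in-memory sort, and the resulting delegate passed to OrderBy or OrderByDescending.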

Data Models

SecondaryCircuitInspectionResult (Existing)

Key fields used in deduplication and sorting:

  • Year (int): Year component of execution time
  • Month (int): Month component of execution time
  • Day (int): Day component of execution time
  • SecondaryCircuitInspectionItemId (Guid): Inspection item identifier
  • Status (string, nullable): Inspection status
  • ExecutionTime (DateTime): When inspection was executed
  • Id (string): MongoDB document ID

Deduplication Key

Anonymous type used for grouping:

new {
    Year,
    Month,
    Day,
    SecondaryCircuitInspectionItemId,
    Status = Status ?? string.Empty  // Handle null
}
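
Grouping on an anonymous type works because the C# compiler generates Equals and GetHashCode over all of its properties. The sketch below illustrates this with a hypothetical DedupKeyDemo helper, which is not part of the service: a null Status and an empty Status produce equal keys, while a different Day does not.

```csharp
using System;

public static class DedupKeyDemo
{
    // Builds the same shape of key as the GroupBy call, with null Status normalized.
    public static object MakeKey(int year, int month, int day, Guid itemId, string status)
    {
        return new
        {
            Year = year,
            Month = month,
            Day = day,
            SecondaryCircuitInspectionItemId = itemId,
            Status = status ?? string.Empty
        };
    }
}
```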

Query Result Structure

(List<SecondaryCircuitInspectionResult> Results, long TotalCount)
  • Results: Paginated list of deduplicated inspection results
  • TotalCount: Total number of deduplicated results (before pagination)

Correctness Properties

A property is a characteristic or behavior that should hold across all valid executions of a system; in essence, it is a formal statement of what the system should do. Properties bridge human-readable specifications and machine-verifiable correctness guarantees.

Property 1: Deduplication Removes Duplicates by Key Fields

For any list of inspection results, when deduplication is applied, the output should contain at most one result for each unique combination of (Year, Month, Day, SecondaryCircuitInspectionItemId, Status).

Validates: Requirements 1.4, 3.1

Property 2: Sorting Produces Correct Order

For any list of inspection results and any valid sort field, when sorting is applied in ascending order, each result should have a sort field value less than or equal to the next result's value. When sorting in descending order, each result should have a sort field value greater than or equal to the next result's value.

Validates: Requirements 1.5

Property 3: First Record Selection After Sorting

For any group of inspection results with the same deduplication key (Year, Month, Day, SecondaryCircuitInspectionItemId, Status), when sorted by a field and deduplicated, the selected result should be the one with the minimum (for ascending) or maximum (for descending) value of the sort field within that group.

Validates: Requirements 3.2

Property 4: Pagination Returns Correct Slice

For any deduplicated and sorted list of inspection results, when pagination is applied with skip count N and page size M, the returned results should be exactly the elements from index N to index N+M-1 (or end of list if shorter) from the sorted list.

Validates: Requirements 3.3

Property 5: Total Count Matches Deduplicated Count

For any list of inspection results, the total count returned should equal the number of unique combinations of (Year, Month, Day, SecondaryCircuitInspectionItemId, Status) in the input, regardless of pagination parameters.

Validates: Requirements 3.4

Error Handling

Collection Query Failures

Strategy: Resilient querying with partial failure handling

  • Each collection query is wrapped in try-catch
  • OperationCanceledException is rethrown so that cancellation is not mistaken for a collection failure
  • Failed collection queries are logged with collection name and error details
  • Successful collection results are still processed
  • If all collections fail, return empty results with error logged

Implementation:

var tasks = collectionNames.Select(async collectionName => {
    try {
        var collection = GetCollection<SecondaryCircuitInspectionResult>(collectionName);
        // ... query logic ...
        return await query.ToListAsync(cancellationToken);
    }
    catch (OperationCanceledException) {
        // Cancellation is not a collection failure; let it propagate to the caller.
        throw;
    }
    catch (Exception ex) {
        Log4Helper.Error($"Failed to query collection {collectionName}: {ex.Message}", ex);
        return new List<SecondaryCircuitInspectionResult>();
    }
});

Null Status Handling

Strategy: Treat null as empty string for grouping

  • Null Status values are converted to empty string in the grouping key
  • This ensures consistent deduplication behavior
  • Matches the original MongoDB implementation's $ifNull behavior

Implementation:

.GroupBy(x => new {
    x.Year,
    x.Month,
    x.Day,
    x.SecondaryCircuitInspectionItemId,
    Status = x.Status ?? string.Empty  // Null becomes empty string
})

Invalid Sort Field

Strategy: Log warning and continue without sorting

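  • If the sort field doesn't exist on the entity, GetPropertyValue logs a warning and returns null
  • Every in-memory sort key is then null; LINQ OrderBy is a stable sort, so the results keep their existing order
  • GetPropertyValue runs once per element in each in-memory sort, so the warning is logged once per element rather than once per query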

Cancellation Support

Strategy: Pass CancellationToken through all async operations

  • CancellationToken is passed to all MongoDB queries
  • Task.WhenAll takes no token of its own; its task ends in the Canceled state when a query task is canceled and none has faulted
  • If cancelled, OperationCanceledException is thrown and propagated
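
The propagation path can be checked in isolation. In this sketch, CancellationDemo and QueryAsync are hypothetical names, and Task.Delay stands in for a MongoDB query that observes the token. Awaiting Task.WhenAll surfaces a TaskCanceledException, which derives from OperationCanceledException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Hypothetical stand-in for one collection query; it only ends via cancellation.
    private static async Task<List<int>> QueryAsync(int id, CancellationToken cancellationToken)
    {
        await Task.Delay(Timeout.Infinite, cancellationToken);
        return new List<int> { id };
    }

    // Returns true when cancelling the token surfaces as OperationCanceledException.
    public static async Task<bool> CancellationPropagatesAsync()
    {
        using (var cts = new CancellationTokenSource())
        {
            // ToArray starts all three queries before the token is cancelled.
            var tasks = Enumerable.Range(0, 3)
                .Select(i => QueryAsync(i, cts.Token))
                .ToArray();

            cts.Cancel();

            try
            {
                await Task.WhenAll(tasks);
                return false;
            }
            catch (OperationCanceledException)
            {
                return true;
            }
        }
    }
}
```

A per-collection catch (Exception) block that does not rethrow cancellation would turn this result into an empty list instead, which is why the order of the catch clauses matters.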

Empty Collection List

Strategy: Early return with empty results

if (collectionNames == null || collectionNames.Count == 0)
{
    return (new List<SecondaryCircuitInspectionResult>(), 0);
}

Testing Strategy

Dual Testing Approach

This feature requires both unit tests and property-based tests:

  • Unit tests: Verify specific examples, edge cases, and error conditions
  • Property tests: Verify universal properties across many randomly generated inputs
  • Both are complementary and necessary for comprehensive coverage

Property-Based Testing

Framework: Use FsCheck for C# property-based testing (or CsCheck as an alternative)

Configuration:

  • Minimum 100 iterations per property test
  • Each test must reference its design document property
  • Tag format: Feature: mongodb-compatibility-fix, Property {number}: {property_text}

Test Structure:

Each correctness property will be implemented as a single property-based test:

  1. Property 1 Test: Generate random lists of inspection results with intentional duplicates, verify deduplication
  2. Property 2 Test: Generate random lists and sort fields, verify sort order
  3. Property 3 Test: Generate groups with multiple records, verify first selection
  4. Property 4 Test: Generate random skip/limit values, verify correct slice
  5. Property 5 Test: Generate random lists, verify count accuracy

Generators:

Custom generators needed:

  • InspectionResultGenerator: Generates random SecondaryCircuitInspectionResult objects
  • DuplicateGroupGenerator: Generates groups of results with same deduplication key
  • SortFieldGenerator: Generates valid property names for sorting
  • PaginationParamsGenerator: Generates valid skip/limit combinations
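
As a framework-free illustration of what such a generator and property check might look like, the sketch below uses System.Random in place of FsCheck or CsCheck, whose generator APIs are not shown here. Everything in it is an assumption made for illustration: the DedupPropertyCheck and Sample names, the int ItemId (the real field is a Guid), and the status values. Narrow value ranges are used so that duplicate keys occur often.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DedupPropertyCheck
{
    // Hypothetical simplified record: ItemId is an int here instead of a Guid.
    public sealed class Sample
    {
        public int Year, Month, Day, ItemId;
        public string Status;
    }

    // Illustrative status values, including null to exercise the null handling.
    private static readonly string[] Statuses = { null, "", "OK", "Failed" };

    // Narrow ranges make key collisions likely.
    public static List<Sample> Generate(Random rng, int count)
    {
        return Enumerable.Range(0, count).Select(_ => new Sample
        {
            Year = 2025,
            Month = rng.Next(1, 3),
            Day = rng.Next(1, 4),
            ItemId = rng.Next(0, 3),
            Status = Statuses[rng.Next(Statuses.Length)]
        }).ToList();
    }

    private static (int, int, int, int, string) Key(Sample s)
    {
        return (s.Year, s.Month, s.Day, s.ItemId, s.Status ?? string.Empty);
    }

    // Returns true when Properties 1 and 5 hold for every generated list.
    public static bool Holds(int seed, int iterations = 100)
    {
        var rng = new Random(seed);
        for (var i = 0; i < iterations; i++)
        {
            var input = Generate(rng, rng.Next(0, 50));
            var output = input.GroupBy(s => Key(s)).Select(g => g.First()).ToList();

            // Property 1: no key appears twice in the output.
            var outputKeys = output.Select(s => Key(s)).ToList();
            var noDuplicates = outputKeys.Distinct().Count() == outputKeys.Count;

            // Property 5: output size equals the number of distinct input keys.
            var distinctInputKeys = new HashSet<(int, int, int, int, string)>(input.Select(s => Key(s)));
            var countMatches = output.Count == distinctInputKeys.Count;

            if (!noDuplicates || !countMatches) return false;
        }
        return true;
    }
}
```

With a property-testing library, the loop and the Generate method would be replaced by the library's own generators and shrinking, while the two checks inside the loop would stay the same.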

Edge Cases to Include:

  • Null Status values
  • Empty collection lists
  • Single collection
  • Large skip values (beyond result count)
  • Zero page size
  • Results with identical sort field values

Unit Testing

Focus Areas:

  1. Specific Examples

    • Query with 3 collections, verify merge
    • Deduplication with known duplicates
    • Pagination edge cases (first page, last page, beyond end)
  2. Error Conditions

    • All collections fail to query
    • Invalid sort field name
    • Cancellation during query
    • Null or empty collection list
  3. Integration Points

    • MongoDB query execution
    • Filter rendering
    • Projection building

Test Organization:

  • Create SecondaryCircuitInspectionResultAppServiceTests.cs in test project
  • Group tests by functionality (deduplication, sorting, pagination, error handling)
  • Use descriptive test names: QueryPagedDeduplicatedResults_WithDuplicates_RemovesDuplicates

Performance Testing

While not part of correctness properties, performance should be monitored:

  • Measure query time for 1, 6, and 12 collections
  • Verify performance warnings are logged when exceeding thresholds
  • Compare performance with original $unionWith implementation (on MongoDB 4.4+)

Performance Expectations:

  • 1-3 collections: < 500ms
  • 4-6 collections: < 1000ms
  • 7-12 collections: < 2000ms
  • Warning threshold: 2000ms
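
The timing measurement could be wrapped around the query call roughly as follows. This is a sketch under assumptions: QueryTimer and MeasureAsync are hypothetical names, and the threshold default mirrors the 2000ms figure above. Logging is left to the caller, which would presumably use the project's own Log4Helper.Warning when the flag is set.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class QueryTimer
{
    // Runs the query, measures wall-clock time, and reports whether the
    // elapsed time exceeded the warning threshold.
    public static async Task<(T Result, long ElapsedMs, bool ExceededThreshold)> MeasureAsync<T>(
        Func<Task<T>> query, long warnThresholdMs = 2000)
    {
        var stopwatch = Stopwatch.StartNew();
        var result = await query();
        stopwatch.Stop();

        var elapsedMs = stopwatch.ElapsedMilliseconds;
        return (result, elapsedMs, elapsedMs > warnThresholdMs);
    }
}
```

The same wrapper can be reused for the 1, 6, and 12 collection measurements by passing a different query delegate each time.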