@karm1000 karm1000 commented Jul 23, 2025

Fixes: #1969

Sensitive Info Masking Refactor

Problem Statement

The original mask_sensitive_info method in the BaseAPI class had several issues:

  1. Generic key lookup: It searched for sensitive keys in all data structures (headers, output, data, request_body) regardless of where they actually belong
  2. Inefficient: It checked every sensitive key against every data structure
  3. Not context-aware: Some keys like "password" might only be relevant in request bodies, while "x-api-key" is only relevant in headers
  4. Potential false positives: A key like "password" could accidentally mask legitimate data in response outputs
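
The false-positive case in point 4 can be reproduced with a minimal sketch. The function name `mask_flat`, the placeholder string, and the sample log are illustrative assumptions, not code taken from BaseAPI.

```python
# Illustrative only: a flat key list applied to every part of a log entry.
SENSITIVE_INFO = ("x-api-key", "password")  # assumed sample keys
PLACEHOLDER = "*****"  # assumed placeholder value


def mask_flat(log: dict) -> dict:
    """Mask every sensitive key wherever it appears (the generic approach)."""
    # Copy each section so the caller's log is left untouched.
    masked = {section: dict(values) for section, values in log.items()}
    for values in masked.values():
        for key in SENSITIVE_INFO:
            if key in values:
                values[key] = PLACEHOLDER
    return masked


log = {
    "request_headers": {"x-api-key": "secret-key"},
    # Here "password" is a legitimate response field, not a credential.
    "output": {"password": "expires in 30 days", "status": "ok"},
}

masked = mask_flat(log)
# masked["output"]["password"] is now the placeholder, although nothing
# secret was stored there: this is the false positive described above.
```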

Solution

Before (Generic Approach)

def mask_sensitive_info(self, log):
    # ... setup code ...
    for key in self.SENSITIVE_INFO:
        if key in request_headers:
            request_headers[key] = placeholder
        if output and key in output:
            output[key] = placeholder
        if data and key in data:
            data[key] = placeholder
        if request_body and key in request_body:
            request_body[key] = placeholder

After (Specific + Override Approach)

def mask_sensitive_info(self, log):
    # ... setup code ...
    sensitive_info_mapping = self._get_sensitive_info_mapping()

    # Mask sensitive info in specific locations only
    if request_headers and sensitive_info_mapping.get("headers"):
        for key in sensitive_info_mapping["headers"]:
            if key in request_headers:
                request_headers[key] = placeholder
    # ... similar for output, data, body ...

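def _get_sensitive_info_mapping(self):
    # Deep-copy the defaults (needs `import copy`) so the class constant is never mutated
    mapping = copy.deepcopy(self.DEFAULT_MASK_MAP)
    overrides = self._get_sensitive_info_overrides()
    # An override replaces the whole key list for its location
    mapping.update(overrides)
    return mapping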

def _get_sensitive_info_overrides(self):
    # Override this in subclasses to customize specific locations
    return {}
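
A self-contained sketch of the merge step, for anyone who wants to try it outside the app. The class names `SketchBaseAPI` and `SketchCustomAPI` are hypothetical, the default lists are copied from the mapping listed under Benefits, and the merge shown here (deep copy, then replace per location) is an assumption about the implementation rather than a copy of it.

```python
import copy


class SketchBaseAPI:
    # Assumed defaults, copied from the mapping in this description.
    DEFAULT_MASK_MAP = {
        "headers": ["x-api-key", "auth-token", "auth_token"],
        "output": ["auth-token", "auth_token", "sek", "rek"],
        "data": ["app_key", "AppKey"],
        "body": ["password", "Password", "app_key", "AppKey"],
    }

    def _get_sensitive_info_overrides(self) -> dict:
        # Subclasses return only the locations they want to replace.
        return {}

    def _get_sensitive_info_mapping(self) -> dict:
        # Work on a deep copy so the class-level defaults stay intact.
        mapping = copy.deepcopy(self.DEFAULT_MASK_MAP)
        for location, keys in self._get_sensitive_info_overrides().items():
            mapping[location] = list(keys)  # replace, do not extend
        return mapping


class SketchCustomAPI(SketchBaseAPI):
    def _get_sensitive_info_overrides(self) -> dict:
        return {"output": []}  # nothing sensitive in responses


default_mapping = SketchBaseAPI()._get_sensitive_info_mapping()
custom_mapping = SketchCustomAPI()._get_sensitive_info_mapping()
```

With this shape, `custom_mapping["output"]` is empty while the other three locations match the defaults, and mutating a returned mapping does not touch `DEFAULT_MASK_MAP`.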

Benefits

1. Location-Specific Masking

# x-api-key only masked in headers, not everywhere
"headers": ["x-api-key", "auth-token", "auth_token"],
"output": ["auth-token", "auth_token", "sek", "rek"],  # no x-api-key
"data": ["app_key", "AppKey"],                         # no x-api-key
"body": ["password", "Password", "app_key", "AppKey"], # no x-api-key

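To make the effect of the mapping concrete, here is a rough sketch that applies it to a sample log. `mask_by_location`, `LOG_FIELDS`, the placeholder value, and the sample data are all assumptions for illustration; only the four key lists come from the mapping above.

```python
import copy

PLACEHOLDER = "*****"  # assumed placeholder value

MASK_MAP = {
    "headers": ["x-api-key", "auth-token", "auth_token"],
    "output": ["auth-token", "auth_token", "sek", "rek"],
    "data": ["app_key", "AppKey"],
    "body": ["password", "Password", "app_key", "AppKey"],
}

# Assumed correspondence between mapping locations and log fields.
LOG_FIELDS = {
    "headers": "request_headers",
    "output": "output",
    "data": "data",
    "body": "request_body",
}


def mask_by_location(log: dict) -> dict:
    """Return a copy of the log with each field masked by its own key list."""
    masked = copy.deepcopy(log)
    for location, keys in MASK_MAP.items():
        target = masked.get(LOG_FIELDS[location])
        if not target:  # field missing, None, or empty
            continue
        for key in keys:
            if key in target:
                target[key] = PLACEHOLDER
    return masked


sample = {
    "request_headers": {"x-api-key": "k-123"},
    "output": {"x-api-key": "echoed-by-server", "sek": "session-key"},
    "data": None,
    "request_body": {"password": "hunter2", "gstin": "sample"},
}
result = mask_by_location(sample)
```

In this sample, `x-api-key` is masked in the headers but left alone in the output, because the output list does not include it.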
2. Incremental Overrides

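Subclasses list only the locations that differ. Each listed location replaces the default key list for that location; unlisted locations keep their defaults: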

# Old way - had to specify everything
class EInvoiceAPI(BaseAPI):
    SENSITIVE_INFO = BaseAPI.SENSITIVE_INFO + ("password", "Password", "app_key", "AppKey")

# New way - only specify what's different
class EInvoiceAPI(BaseAPI):
    def _get_sensitive_info_overrides(self):
        return {
            "output": [],                                # only override output
            "data": ["AppKey"],                          # only override data
            "body": ["password", "Password", "AppKey"],  # only override body
            # headers automatically uses default: ["x-api-key", "auth-token", "auth_token"]
        }

3. Easy Customization Examples

Example 1: API that only needs custom headers

class CustomHeaderAPI(BaseAPI):
    def _get_sensitive_info_overrides(self):
        return {
            "headers": ["x-api-key", "custom-auth-header"],
            # All other locations use defaults
        }

Example 2: API that doesn't return sensitive info

class ReadOnlyAPI(BaseAPI):
    def _get_sensitive_info_overrides(self):
        return {
            "output": [],  # No sensitive info in responses
            # Other locations use defaults
        }

Example 3: API with special body fields

class SpecialAPI(BaseAPI):
    def _get_sensitive_info_overrides(self):
        return {
            "body": ["password", "secret_key", "private_token"],
            # Headers, output, data use defaults
        }

Migration Guide

For Base API Usage

No changes needed - existing code continues to work.

For Subclasses Using Full Override

# Before
class MyAPI(BaseAPI):
    SENSITIVE_INFO = BaseAPI.SENSITIVE_INFO + ("password", "Password", "app_key", "my-special-key", "my-secret")

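# After (recommended)
class MyAPI(BaseAPI):
    def _get_sensitive_info_overrides(self):
        # An override replaces the default list for that location,
        # so repeat any default keys that must stay masked.
        return {
            "data": ["app_key", "my-special-key"],
            "body": ["password", "Password", "app_key", "my-secret"],
        }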
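
When migrating a flat tuple, it is easy to drop a key by accident. A small check along these lines could catch that; `find_unmapped_keys` and the sample keys are hypothetical and not part of this PR.

```python
def find_unmapped_keys(old_keys, mapping):
    """Return old flat keys that no location in the new mapping covers."""
    covered = {key for keys in mapping.values() for key in keys}
    return sorted(set(old_keys) - covered)


# Hypothetical example: "legacy-token" was in the old tuple but was
# not carried over to any location in the new mapping.
old_flat_keys = ("x-api-key", "password", "legacy-token")
new_mapping = {
    "headers": ["x-api-key"],
    "output": [],
    "data": [],
    "body": ["password"],
}

missing = find_unmapped_keys(old_flat_keys, new_mapping)
```

An empty result only shows that every old key appears somewhere; whether it appears in the right location still needs a manual look.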

Testing

The refactor includes comprehensive tests that verify:

  • Default mapping works correctly
  • Override system works correctly
  • Only specified locations are overridden
  • Empty overrides use defaults
  • No false positives occur
  • Existing functionality is preserved
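
For reference, the kind of checks listed above could look roughly like this. This is a sketch against a stand-in class (`StandInAPI`, `NoOutputAPI`, and the `mask` helper are hypothetical) using plain `unittest.TestCase`; the actual tests in this PR run against BaseAPI inside the app's test framework.

```python
import copy
import unittest

PLACEHOLDER = "*****"  # assumed placeholder value


class StandInAPI:
    """Hypothetical stand-in that mimics the mapping and override hooks."""

    DEFAULT_MASK_MAP = {"headers": ["x-api-key"], "output": ["sek"]}

    def _get_sensitive_info_overrides(self):
        return {}

    def _get_sensitive_info_mapping(self):
        mapping = copy.deepcopy(self.DEFAULT_MASK_MAP)
        mapping.update(self._get_sensitive_info_overrides())
        return mapping

    def mask(self, headers, output):
        mapping = self._get_sensitive_info_mapping()
        pairs = ((headers, mapping["headers"]), (output, mapping["output"]))
        for target, keys in pairs:
            for key in keys:
                if target and key in target:
                    target[key] = PLACEHOLDER


class NoOutputAPI(StandInAPI):
    def _get_sensitive_info_overrides(self):
        return {"output": []}


class MaskingSketchTest(unittest.TestCase):
    def test_empty_overrides_use_defaults(self):
        mapping = StandInAPI()._get_sensitive_info_mapping()
        self.assertEqual(mapping, StandInAPI.DEFAULT_MASK_MAP)

    def test_only_specified_location_is_overridden(self):
        mapping = NoOutputAPI()._get_sensitive_info_mapping()
        self.assertEqual(mapping["output"], [])
        self.assertEqual(mapping["headers"], ["x-api-key"])

    def test_no_false_positive_in_output(self):
        headers = {"x-api-key": "k-123"}
        output = {"x-api-key": "echoed"}
        StandInAPI().mask(headers, output)
        self.assertEqual(headers["x-api-key"], PLACEHOLDER)
        self.assertEqual(output["x-api-key"], "echoed")
```

Saved as a module, this runs with `python -m unittest`.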

Summary by CodeRabbit

  • Refactor
    • Improved sensitive-information masking in API logs: masking rules are now organized by location (headers, output, data, body) and applied per-location for more accurate redaction.
    • Removed redundant per-subclass sensitive-key declarations so subclasses inherit the centralized masking configuration.
  • Tests
    • Added comprehensive tests validating masking behavior across locations, nested structures, missing data, false-positive avoidance, and configurable overrides.

coderabbitai bot commented Jul 23, 2025

Walkthrough

Refactors BaseAPI's sensitive-info masking to use a location-aware mapping (PLACEHOLDER, DEFAULT_MASK_MAP) with helpers (_get_sensitive_info_mapping, _get_sensitive_info_overrides, _mask_sensitive_info). Removes subclass SENSITIVE_INFO overrides and adds comprehensive tests for the new masking behavior. No other runtime flows changed.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Base masking refactor: india_compliance/gst_india/api_classes/base.py | Replaced flat SENSITIVE_INFO with PLACEHOLDER and DEFAULT_MASK_MAP; added _get_sensitive_info_mapping, _get_sensitive_info_overrides, _mask_sensitive_info; updated mask_sensitive_info to apply location-aware masking; added copy import; removed SENSITIVE_INFO. |
| Subclass cleanup: india_compliance/gst_india/api_classes/nic/e_invoice.py, india_compliance/gst_india/api_classes/nic/e_waybill.py, india_compliance/gst_india/api_classes/taxpayer_base.py | Removed subclass-level SENSITIVE_INFO extensions so they inherit base mapping. |
| Tests: india_compliance/gst_india/api_classes/test_mask_sensitive_info.py | Added integration/unit tests exercising mapping retrieval, per-location masking, nested-body masking, overrides behavior, missing-data handling, and false-positive avoidance. |

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant BaseAPI

    Caller->>BaseAPI: mask_sensitive_info(log)
    BaseAPI->>BaseAPI: _get_sensitive_info_mapping()
    BaseAPI->>BaseAPI: _mask_sensitive_info(log.request_headers, mapping["headers"])
    BaseAPI->>BaseAPI: _mask_sensitive_info(log.data, mapping["data"])
    BaseAPI->>BaseAPI: _mask_sensitive_info(log.request_body, mapping["body"])
    BaseAPI->>BaseAPI: _mask_sensitive_info(log.output, mapping["output"])
    BaseAPI-->>Caller: masked log

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Assessment against linked issues

Objective (issue) Addressed Explanation
Use more specific key lookup and not generic (1969)
Make masking location-aware and extensible (1969)

Poem

"I nibble through logs with careful paws,
I hide the keys with thoughtful laws.
Headers, bodies, outputs — each has its place,
A masked little hop in a safe, tidy space.
🐇✨"


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9913f61 and c6af8dc.

📒 Files selected for processing (1)
  • india_compliance/gst_india/api_classes/base.py (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • india_compliance/gst_india/api_classes/base.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Codacy Static Code Analysis
  • GitHub Check: Python Unit Tests
  • GitHub Check: Mergify Merge Protections
  • GitHub Check: Summary

@karm1000 karm1000 requested review from vorasmit and sagarvora July 23, 2025 12:28
@karm1000
Member Author

@sagarvora
What do you think of the design?

codecov bot commented Jul 23, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.03%. Comparing base (0e6dd12) to head (c6af8dc).
⚠️ Report is 44 commits behind head on develop.

@@             Coverage Diff             @@
##           develop    #3555      +/-   ##
===========================================
+ Coverage    60.98%   61.03%   +0.04%     
===========================================
  Files          134      135       +1     
  Lines        13788    13841      +53     
===========================================
+ Hits          8409     8448      +39     
- Misses        5379     5393      +14     
| Files with missing lines | Coverage Δ |
|---|---|
| india_compliance/gst_india/api_classes/base.py | 87.84% <100.00%> (+1.79%) ⬆️ |
| ..._compliance/gst_india/api_classes/nic/e_invoice.py | 76.33% <ø> (-0.18%) ⬇️ |
| ..._compliance/gst_india/api_classes/nic/e_waybill.py | 64.00% <ø> (-0.43%) ⬇️ |
| ..._compliance/gst_india/api_classes/taxpayer_base.py | 28.08% <ø> (-0.31%) ⬇️ |

... and 12 files with indirect coverage changes
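
The percentages in the diff above follow from the Hits and Lines rows. A quick sketch of the arithmetic, with the caveat that truncating to two decimals (rather than rounding) is an inference from these particular numbers and not a documented rule; `coverage_pct` and `truncate` are illustrative helpers.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    """Coverage as a percentage of covered lines."""
    return hits / lines * 100


def truncate(value: float, digits: int = 2) -> float:
    """Drop digits beyond `digits` decimals without rounding up."""
    factor = 10 ** digits
    return math.floor(value * factor) / factor


base = coverage_pct(8409, 13788)  # develop
head = coverage_pct(8448, 13841)  # this PR

base_shown = truncate(base)
head_shown = truncate(head)
delta_shown = truncate(head - base)

# The +53 new lines split into +39 hits and +14 misses.
new_lines = 13841 - 13788
new_hits = 8448 - 8409
new_misses = 5393 - 5379
```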


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (4)
india_compliance/gst_india/api_classes/test_mask_sensitive_info.py (4)

70-70: Inconsistent BaseAPI instantiation pattern.

The test uses BaseAPI.__new__(BaseAPI) here but BaseAPI() in other tests. For consistency and clarity, use the standard constructor pattern throughout.

-        api = BaseAPI.__new__(BaseAPI)
+        api = BaseAPI()

125-125: Inconsistent BaseAPI instantiation pattern.

Same issue as the previous test - use consistent instantiation pattern.

-        api = BaseAPI.__new__(BaseAPI)
+        api = BaseAPI()

156-156: Inconsistent instantiation pattern for custom API class.

For consistency with the BaseAPI instantiation pattern, use the standard constructor.

-        api = CustomAPI.__new__(CustomAPI)
+        api = CustomAPI()

172-172: Inconsistent instantiation pattern for custom API class.

For consistency, use the standard constructor.

-        api = NoOverrideAPI.__new__(NoOverrideAPI)
+        api = NoOverrideAPI()
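
Worth noting when weighing these nitpicks: the two instantiation patterns are not equivalent in Python. `Cls.__new__(Cls)` allocates the object without running `__init__`, which tests sometimes rely on to skip constructor work. Whether that matters for BaseAPI is not shown in this thread; the `Client` class below is a hypothetical stand-in.

```python
class Client:
    """Hypothetical class whose constructor sets up state."""

    def __init__(self):
        self.session = "opened"


via_constructor = Client()         # runs __init__
via_new = Client.__new__(Client)   # allocates only, __init__ is skipped

has_session_constructor = hasattr(via_constructor, "session")
has_session_new = hasattr(via_new, "session")
```

So switching to the plain constructor is only a pure style change if `__init__` has no side effects the tests were avoiding.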
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0e6dd12 and 9913f61.

📒 Files selected for processing (5)
  • india_compliance/gst_india/api_classes/base.py (3 hunks)
  • india_compliance/gst_india/api_classes/nic/e_invoice.py (1 hunks)
  • india_compliance/gst_india/api_classes/nic/e_waybill.py (1 hunks)
  • india_compliance/gst_india/api_classes/taxpayer_base.py (0 hunks)
  • india_compliance/gst_india/api_classes/test_mask_sensitive_info.py (1 hunks)
🧠 Learnings (3)
india_compliance/gst_india/api_classes/nic/e_waybill.py (3)

Learnt from: karm1000
PR: #3532
File: india_compliance/gst_india/utils/e_waybill.py:327-336
Timestamp: 2025-07-22T11:45:43.419Z
Learning: In the e-waybill update functions (india_compliance/gst_india/utils/e_waybill.py), fields like place_of_change and state use hardcoded "-" values in old_values dictionaries because these values are not stored in the system, so there's no way to retrieve the actual previous values.

Learnt from: karm1000
PR: #3354
File: india_compliance/gst_india/report/summary_of_itc_availed/summary_of_itc_availed.py:278-281
Timestamp: 2025-06-25T08:19:02.607Z
Learning: In the Summary of ITC Availed report (india_compliance/gst_india/report/summary_of_itc_availed/summary_of_itc_availed.py), the summary dictionaries created by get_initial_summary() are never empty - they always contain tax field keys (igst_amount, cgst_amount, sgst_amount, cess_amount) with 0 values, so checking for empty dictionaries is unnecessary.

Learnt from: vorasmit
PR: #3326
File: india_compliance/gst_india/api_classes/nic/e_invoice.py:84-92
Timestamp: 2025-07-12T13:31:12.352Z
Learning: In the e-Invoice API implementation (india_compliance/gst_india/api_classes/nic/e_invoice.py), the base class and StandardEInvoiceAPI subclass have different is_ignored_error method implementations because enriched and standard APIs receive errors in different formats. The base class checks if the error message starts with an error code from the message field, while StandardEInvoiceAPI checks the ErrorCode field in ErrorDetails.

india_compliance/gst_india/api_classes/nic/e_invoice.py (2)

Learnt from: vorasmit
PR: #3326
File: india_compliance/gst_india/api_classes/nic/e_invoice.py:84-92
Timestamp: 2025-07-12T13:31:12.352Z
Learning: In the e-Invoice API implementation (india_compliance/gst_india/api_classes/nic/e_invoice.py), the base class and StandardEInvoiceAPI subclass have different is_ignored_error method implementations because enriched and standard APIs receive errors in different formats. The base class checks if the error message starts with an error code from the message field, while StandardEInvoiceAPI checks the ErrorCode field in ErrorDetails.

Learnt from: vorasmit
PR: #3326
File: india_compliance/gst_india/api_classes/nic/e_invoice.py:137-139
Timestamp: 2025-07-06T06:08:22.165Z
Learning: Sandbox credentials for e-Invoice API testing can be hardcoded in the EnrichedEInvoiceAPI class. These are test credentials provided by the API service (company_gstin: "02AMBPG7773M002", username: "adqgsphpusr1", password: "Gsp@1234") and are meant to be shared for ease of testing by developers.

india_compliance/gst_india/api_classes/test_mask_sensitive_info.py (1)

Learnt from: vorasmit
PR: #3399
File: india_compliance/gst_india/utils/gstr_1/test_gstr_1_books_data.py:791-822
Timestamp: 2025-05-29T15:22:04.761Z
Learning: In the india_compliance test suite, the IntegrationTestCase framework automatically handles database rollbacks after test completion, which means functions that modify global state (like GST Settings and Item Tax Templates) in setUpClass won't persist beyond the test run and don't require manual cleanup.

🧬 Code Graph Analysis (1)
india_compliance/gst_india/api_classes/test_mask_sensitive_info.py (1)
india_compliance/gst_india/api_classes/base.py (4)
  • BaseAPI (19-375)
  • _get_sensitive_info_mapping (314-338)
  • mask_sensitive_info (297-312)
  • _get_sensitive_info_overrides (340-348)
💤 Files with no reviewable changes (1)
  • india_compliance/gst_india/api_classes/taxpayer_base.py
🔇 Additional comments (8)
india_compliance/gst_india/api_classes/nic/e_invoice.py (1)

14-14: LGTM: Clean migration to structured sensitive info masking.

The removal of the SENSITIVE_INFO extension aligns perfectly with the new structured masking approach in the base class. The previously masked keys ("password", "Password", "AppKey") are now properly handled by the DEFAULT_MASK_MAP in their appropriate locations (headers, data, body).

india_compliance/gst_india/api_classes/nic/e_waybill.py (1)

19-19: LGTM: Consistent migration to structured masking approach.

The removal of the SENSITIVE_INFO extension is consistent with the refactor. The previously masked keys ("password", "app_key") are now covered by the base class's DEFAULT_MASK_MAP in their appropriate data locations.

india_compliance/gst_india/api_classes/test_mask_sensitive_info.py (1)

7-176: Excellent comprehensive test coverage for the new masking system.

The test suite thoroughly validates all aspects of the refactored sensitive information masking functionality:

  • ✅ Basic mapping structure validation
  • ✅ Location-specific masking (headers vs. output vs. data vs. body)
  • ✅ Request body handling
  • ✅ Missing data edge cases
  • ✅ False positive prevention
  • ✅ Override mechanism functionality
  • ✅ Default fallback behavior

The tests properly verify that sensitive information is masked only in appropriate locations and that the new structured approach works as intended.

india_compliance/gst_india/api_classes/base.py (5)

1-1: Appropriate import addition for deep copy functionality.

The copy import is necessary for the deep copy operation in _get_sensitive_info_mapping to prevent modification of the class constant.
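
A short illustration of why a deep copy is the safer choice for a dict of lists. The dictionaries here are made-up samples, not the actual DEFAULT_MASK_MAP.

```python
import copy

# Shallow copy: the outer dict is new, but the inner lists are shared.
defaults_a = {"headers": ["x-api-key"]}
shallow = copy.copy(defaults_a)
shallow["headers"].append("custom-header")
leaked = "custom-header" in defaults_a["headers"]

# Deep copy: inner lists are copied too, so the original stays intact.
defaults_b = {"headers": ["x-api-key"]}
deep = copy.deepcopy(defaults_b)
deep["headers"].append("custom-header")
isolated = "custom-header" not in defaults_b["headers"]
```

With the shallow copy, the appended key leaks back into the source dict; with the deep copy it does not.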


22-43: Excellent structured approach to sensitive information mapping.

The new DEFAULT_MASK_MAP provides significant improvements over the previous flat tuple approach:

  • Location-aware masking: Keys are only masked in relevant contexts (e.g., "x-api-key" only in headers)
  • Comprehensive coverage: Includes all sensitive key variations and locations
  • Clear categorization: Easy to understand which keys are masked where

The PLACEHOLDER constant provides consistent masking across the codebase.


303-312: Clean and focused refactor of the masking logic.

The refactored mask_sensitive_info method is much cleaner and more maintainable:

  • Delegates location-specific masking to the helper method
  • Uses the structured mapping for precise masking
  • Maintains the same interface while improving functionality

314-348: Well-designed override mechanism with proper deep copy protection.

The _get_sensitive_info_mapping method implementation is excellent:

  • Deep copy protection: Prevents accidental modification of the class constant
  • Flexible override system: Allows subclasses to customize specific locations only
  • Complete replacement strategy: Overrides replace entire lists per location for clarity
  • Null handling: Properly handles empty override lists

The template method pattern in _get_sensitive_info_overrides provides a clean extension point for subclasses.


350-360: Simple and focused helper method for masking operations.

The _mask_sensitive_info helper method is well-implemented:

  • Null-safe: Handles missing target or sensitive_keys gracefully
  • Simple logic: Clear and focused on the single responsibility of masking
  • Consistent placeholder: Uses the class constant for consistent masking

@vorasmit vorasmit changed the title feat: enhance mask_sensitive_info in BaseAPI refactor: enhance mask_sensitive_info in BaseAPI Jul 30, 2025
@karm1000 karm1000 requested a review from vorasmit August 12, 2025 17:18
@vorasmit vorasmit merged commit 191e68e into resilient-tech:develop Aug 21, 2025
14 of 15 checks passed
Development

Successfully merging this pull request may close these issues.

refactor needed: mask_sensitive_info
2 participants