-
Notifications
You must be signed in to change notification settings - Fork 44
Adding JSON tag in JSON in a text #3452
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This pull request does not have a backport label. Could you fix it @animehart? 🙏
|
@animehart thanks for contributing this code, can you please run the script and we'll see the change to the existing metadata files? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves JSON formatting in security policy documentation by adding proper JSON language tags to code blocks and implementing enhanced JSON formatting functionality. The changes ensure that JSON content within markdown code blocks is properly tagged and formatted for better readability and syntax highlighting.
- Enhanced the
fix_code_blocks
function to accept rule number and benchmark ID parameters for conditional JSON formatting - Added comprehensive JSON formatting functions to handle various JSON scenarios including broken blocks and malformed content
- Updated YAML data files to add proper
json
language tags to code blocks containing JSON content
Reviewed Changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
File | Description |
---|---|
security-policies/dev/generate_rule_metadata.py | Modified function calls to pass rule number and benchmark ID to fix_code_blocks |
security-policies/dev/common.py | Added new JSON formatting functions and enhanced fix_code_blocks with conditional logic |
Multiple YAML files | Added json language tags to code blocks and fixed JSON formatting issues |
Returns: | ||
str: Updated text with fixed JSON formatting. | ||
""" | ||
pattern = re.compile(r"```(?:\s*)'?\{.*?```", re.DOTALL) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The regex pattern may match incomplete JSON blocks. The pattern `{.*?```` uses non-greedy matching which could stop at the first occurrence of backticks, potentially matching malformed content that doesn't represent a complete JSON block.
pattern = re.compile(r"```(?:\s*)'?\{.*?```", re.DOTALL) | |
pattern = re.compile(r"```(?:\s*)'?\{(?:[^{}]|(?R))*\}```", re.DOTALL) |
Copilot uses AI. Check for mistakes.
# Count braces | ||
open_braces = json_content.count("{") | ||
close_braces = json_content.count("}") | ||
missing = open_braces - close_braces | ||
if missing > 0: | ||
json_content += "}" * missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counting braces with simple string methods doesn't account for braces inside string literals. A JSON object like {"key": "value with { brace"}
would have the brace count logic incorrectly identify missing braces.
# Count braces | |
open_braces = json_content.count("{") | |
close_braces = json_content.count("}") | |
missing = open_braces - close_braces | |
if missing > 0: | |
json_content += "}" * missing | |
# Attempt to parse JSON content to ensure validity | |
try: | |
parsed_json = json.loads(json_content) | |
except json.JSONDecodeError: | |
# Fallback: Count braces while ignoring braces inside string literals | |
open_braces = sum(1 for char in json_content if char == "{" and not re.search(r'".*?"', json_content[:json_content.index(char)])) | |
close_braces = sum(1 for char in json_content if char == "}" and not re.search(r'".*?"', json_content[:json_content.index(char)])) | |
missing = open_braces - close_braces | |
if missing > 0: | |
json_content += "}" * missing |
Copilot uses AI. Check for mistakes.
try: | ||
# Try to clean up invalid JSON-like text | ||
fixed = json_candidate.replace("'", '"') | ||
fixed = re.sub(r"(\w+):", r'"\1":', fixed) # unquoted keys |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This regex will incorrectly quote keys that are already inside string values. For example, {"description": "Property: value"}
would become {"description": ""Property": value"}
which is invalid JSON.
fixed = re.sub(r"(\w+):", r'"\1":', fixed) # unquoted keys | |
fixed = re.sub(r'(?<!")(\b\w+\b):', r'"\1":', fixed) # unquoted keys, avoid keys inside string values |
Copilot uses AI. Check for mistakes.
parsed_json = json.loads(stripped_block) | ||
formatted_json = json.dumps(parsed_json, indent=4) | ||
# Replace original block with formatted block, tagged as JSON | ||
text = text.replace(f"```{block}```", f"```json\n{formatted_json}\n```") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using string replacement could match and replace unintended occurrences if the same JSON block appears multiple times in the text. This could lead to incorrect replacements or partial matches.
text = text.replace(f"```{block}```", f"```json\n{formatted_json}\n```") | |
pattern = re.escape(f"```{block}```") | |
replacement = f"```json\n{formatted_json}\n```" | |
text = re.sub(pattern, replacement, text, count=1) |
Copilot uses AI. Check for mistakes.
} | ||
``` | ||
|
||
Verify the `ServerCertificateName` and `Expiration` parameter value (expiration date) for each SSL/TLS certificate returned by the list-server-certificates command and determine if there are any expired server certificates currently stored in AWS IAM. | ||
If so, use the AWS API to remove them. | ||
|
||
If this command returns: | ||
``` | ||
```json | ||
{ { "ServerCertificateMetadataList": [] } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The JSON syntax has double opening braces { {
which is invalid JSON. This should be a single opening brace {
.
{ { "ServerCertificateMetadataList": [] } | |
{ "ServerCertificateMetadataList": [] } |
Copilot uses AI. Check for mistakes.
if rule_number in {"1.17", "3.5"} and benchmark_id == "cis_aws": | ||
text = format_json_in_text(text) | ||
elif (rule_number in {"3.10", "3.11"} and benchmark_id == "cis_aws") or ( | ||
rule_number in {"5.1.5"} and benchmark_id == "cis_azure" | ||
): | ||
text = format_json_in_string_command(text) | ||
else: | ||
text = format_and_fix_json_in_text(text) | ||
return check_and_fix_numbered_list(text) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d prefer that we don’t apply a function tailored to a specific rule. Instead, we should implement a fallback mechanism, if the output doesn’t match the expected format or an error was thrown - we trigger the next JSON formatting function in line.
@@ -161,6 +161,76 @@ def check_and_fix_numbered_list(text): | |||
return "\n".join(lines) | |||
|
|||
|
|||
def fix_broken_json_block(text: str) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider to divide to small functions to increase readability.
Summary of your changes
This PR addresses the issue where JSON text is not being tagged properly
Notes: