Skip to content

[Draft] Small improvement on sanity checks per MI repository provided on the autogenerated JSON#1851

Draft
srbarrios wants to merge 2 commits intomasterfrom
validate-maintenance-repos
Draft

[Draft] Small improvement on sanity checks per MI repository provided on the autogenerated JSON#1851
srbarrios wants to merge 2 commits intomasterfrom
validate-maintenance-repos

Conversation

@srbarrios
Copy link
Copy Markdown
Member

@srbarrios srbarrios commented Jan 14, 2026

Related card: https://github.com/SUSE/spacewalk/issues/29401

This just makes a sanity check to find .repo file inside the repo URL, if it doesn't find it it logs an error. But we need to improve this logic to consider Debian-like repos too.

@srbarrios srbarrios changed the title First small iteration Small improvement on sanity checks per MI repository provided on the autogenerated JSON Jan 14, 2026
@srbarrios srbarrios requested a review from a team January 14, 2026 15:40
@ktsamis
Copy link
Copy Markdown
Member

ktsamis commented Jan 14, 2026

I think because of the way we iterate on all possible urls and only use the valid ones this might give all the false negatives. Basically we try to see if an MI has a specific client tools repo and if it doesn't we don't use it, which is a valid case since not all MIs have all the suffixes. I think the way this would check, it would log all these non-repos and show them as errors. But I will review this better later on

@srbarrios
Copy link
Copy Markdown
Member Author

srbarrios commented Jan 19, 2026

Take this PR as an example :)
It was automatically generated by GH Copilot, and I did not test it.

res: requests.Response = requests.get(url, timeout=6)
res.raise_for_status()
except requests.RequestException as e:
logging.error(f"Error requesting: {url}; {e}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As pointed out by Kostas, this will log all 404s that we get.
Since they tend to be quite a lot, we may want to skip logging here.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we get 404 on repos that we expect to be valid?

Copy link
Copy Markdown
Contributor

@NamelessOne91 NamelessOne91 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to feed the MI with ID 42236, using 50-sles, as version it correctly recognized these as valid, because of a link to a .repo:

Not a fan of handling this by parsing HTML with regEx, but it seems to work with RPM repos.

The 1st case, where the URL itself is pointing to a .repo file, is probably not going to happen in our setup.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds sanity checks to validate maintenance repository URLs by detecting whether they contain .repo files (for RPM repositories) or Release/InRelease files (for Debian repositories). The validation is performed by fetching the URL content and using regex patterns to search for repository indicator files.

Changes:

  • Enhanced create_url function to validate repository URLs by checking for .repo, Release, or InRelease files in the response body
  • Changed return type from str to str | None to explicitly handle invalid repositories
  • Added comprehensive error logging when repositories fail validation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

return url if res.ok else ""
try:
res: requests.Response = requests.get(url, timeout=6)
res.raise_for_status()
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code now calls 'res.raise_for_status()' on line 59, which is the correct requests library pattern. However, the test mock at jenkins_pipelines/scripts/tests/mock_response.py:18-19 currently always raises HTTPError regardless of the response status. This means all test cases will fail because even successful (200) responses will raise an exception. The MockResponse.raise_for_status() method should only raise HTTPError when status_code indicates an error (4xx, 5xx), not for successful responses.

Copilot uses AI. Check for mistakes.
Comment on lines +67 to +70
# If the URL itself points directly to a .repo file
if url.lower().endswith('.repo'):
return url.rsplit('/', 1)[0] + '/'

Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic appears problematic. If the URL itself ends with '.repo', it means the URL was constructed to point directly to a .repo file (which seems unlikely based on how create_url is called with MI IDs and suffixes). The code then returns the parent directory by stripping the filename, but this parent directory hasn't been validated to exist. Additionally, this check happens before checking the response body, so even if the URL returns a 404, it would still return the parent directory. Consider moving this check after validating the response, or removing it if URLs are never expected to point directly to .repo files.

Suggested change
# If the URL itself points directly to a .repo file
if url.lower().endswith('.repo'):
return url.rsplit('/', 1)[0] + '/'

Copilot uses AI. Check for mistakes.

@cache
def create_url(mi_id: str, suffix: str) -> str:
def create_url(mi_id: str, suffix: str) -> str | None:
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The return type of create_url has changed from returning an empty string "" for invalid URLs to returning None. However, the existing test at jenkins_pipelines/scripts/tests/test_maintenance_json_generator.py:99 still expects an empty string ("") for invalid cases. This test will fail because it checks 'expected_url: str = ... if valid else ""' and then asserts equality with the result. The test needs to be updated to expect None instead of "" for invalid URLs.

Copilot uses AI. Check for mistakes.
Comment on lines +71 to +88
# Search for links to .repo files in the HTML body
# Matches: href="something.repo"
rpm_regex = re.compile(r'href=["\'](?P<link>[^"\']+?\.repo)\b["\']', re.IGNORECASE)
if rpm_regex.search(body):
return url

# Fallback for plain text listings of .repo files
token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
if token_repo_re.search(body):
return url

# --- Strategy 2: Debian/Ubuntu Repository Check ---
# Debian repos define a 'Release' or 'InRelease' file at the repository root.
# Matches: href="Release" or href="InRelease"
deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)
if deb_regex.search(body):
logging.info(f"Identified Debian-like repository at: {url}")
return url
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new validation logic searches for .repo files or Release/InRelease files in the response body (lines 71-88). However, the test mock at jenkins_pipelines/scripts/tests/mock_response.py:22-23 returns a MockResponse with no content/text for successful URLs. This means even the valid test case will fail the new validation checks and return None instead of the expected URL. The mock needs to be updated to include realistic HTML content with .repo file references for successful test cases.

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +53
Validates if it's a valid RPM repo (contains .repo) or Debian repo (contains Release/InRelease).
Returns the directory URL if valid, None otherwise.
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring states "Validates if it's a valid RPM repo (contains .repo) or Debian repo (contains Release/InRelease)". However, this is slightly misleading as the function doesn't just validate - it also constructs and returns the URL. Consider updating the docstring to clarify that the function both validates and returns the URL, or returns None if validation fails. For example: "Validates if the URL points to a valid RPM or Debian repository and returns the URL if valid, None otherwise."

Suggested change
Validates if it's a valid RPM repo (contains .repo) or Debian repo (contains Release/InRelease).
Returns the directory URL if valid, None otherwise.
Validates if the URL points to a valid RPM repository (contains a .repo file) or Debian repository
(contains Release/InRelease), and returns the URL if valid, None otherwise.

Copilot uses AI. Check for mistakes.
Comment on lines +73 to +85
rpm_regex = re.compile(r'href=["\'](?P<link>[^"\']+?\.repo)\b["\']', re.IGNORECASE)
if rpm_regex.search(body):
return url

# Fallback for plain text listings of .repo files
token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
if token_repo_re.search(body):
return url

# --- Strategy 2: Debian/Ubuntu Repository Check ---
# Debian repos define a 'Release' or 'InRelease' file at the repository root.
# Matches: href="Release" or href="InRelease"
deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex defines a named group 'link' but the named group is not used. The match object from 'search()' is not captured, so the named group serves no purpose. Consider simplifying the regex by removing the named group, as only the existence of a match is checked.

Suggested change
rpm_regex = re.compile(r'href=["\'](?P<link>[^"\']+?\.repo)\b["\']', re.IGNORECASE)
if rpm_regex.search(body):
return url
# Fallback for plain text listings of .repo files
token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
if token_repo_re.search(body):
return url
# --- Strategy 2: Debian/Ubuntu Repository Check ---
# Debian repos define a 'Release' or 'InRelease' file at the repository root.
# Matches: href="Release" or href="InRelease"
deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)
rpm_regex = re.compile(r'href=["\'][^"\']+?\.repo\b["\']', re.IGNORECASE)
if rpm_regex.search(body):
return url
# Fallback for plain text listings of .repo files
token_repo_re = re.compile(r'https?://[^\s"\'<>]+?\.repo\b|[^\s"\'<>]+?\.repo\b', re.IGNORECASE)
if token_repo_re.search(body):
return url
# --- Strategy 2: Debian/Ubuntu Repository Check ---
# Debian repos define a 'Release' or 'InRelease' file at the repository root.
# Matches: href="Release" or href="InRelease"
deb_regex = re.compile(r'href=["\'](?:In)?Release["\']', re.IGNORECASE)

Copilot uses AI. Check for mistakes.
return url

# Fallback for plain text listings of .repo files
token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This regex has two named groups both capturing .repo file patterns: 'link' and 'link2'. Having duplicate group names is problematic as only one will be accessible in match results. Additionally, these named groups are never used since only the existence of a match is checked. Consider simplifying to a single pattern without named groups.

Suggested change
token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
token_repo_re = re.compile(r'https?://[^\s"\'<>]+?\.repo\b|[^\s"\'<>]+?\.repo\b', re.IGNORECASE)

Copilot uses AI. Check for mistakes.
# --- Strategy 2: Debian/Ubuntu Repository Check ---
# Debian repos define a 'Release' or 'InRelease' file at the repository root.
# Matches: href="Release" or href="InRelease"
deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to the RPM regex, the named group 'link' is defined but never used. Consider removing the named group to simplify the code.

Suggested change
deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)
deb_regex = re.compile(r'href=["\'](?:In)?Release["\']', re.IGNORECASE)

Copilot uses AI. Check for mistakes.
logging.error(f"Error requesting: {url}; {e}")
return None

body = res.text or ""
Copy link

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new create_url function now reads 'res.text' on line 64 to get the response body for validation. However, the test mock at jenkins_pipelines/scripts/tests/mock_response.py:9-13 only defines 'content' attribute, not 'text'. The tests will fail with AttributeError when trying to access res.text. The MockResponse class needs to be updated to include a 'text' property that returns the content.

Copilot uses AI. Check for mistakes.
@nodeg
Copy link
Copy Markdown
Member

nodeg commented Feb 16, 2026

It was automatically generated by GH Copilot, and I did not test it.

This is problematic and we need to set clear rules for AI generated code. Testing the code, however, should always be done, nevertheless the author of the code.

@srbarrios
Copy link
Copy Markdown
Member Author

srbarrios commented Feb 16, 2026

It was automatically generated by GH Copilot, and I did not test it.

This is problematic and we need to set clear rules for AI generated code. Testing the code, however, should always be done, nevertheless the author of the code.

@nodeg To make it clear. I opened a GH Card as discussed in our meeting. After that, and just as example, I opened this proposal vibe coded by Gemini. I did not test it because that was an example and I expected someone else to pick the card.

Right now, I'm assigned on the card, and ofc, I will test it before going further. But this card had never been prioritized, so for now, I did not start working on it, it still in "New" column.

@srbarrios srbarrios marked this pull request as draft March 11, 2026 10:26
@srbarrios srbarrios changed the title Small improvement on sanity checks per MI repository provided on the autogenerated JSON [Draft] Small improvement on sanity checks per MI repository provided on the autogenerated JSON Mar 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants