[Draft] Small improvement on sanity checks per MI repository provided on the autogenerated JSON by srbarrios · Pull Request #1851 · SUSE/susemanager-ci

srbarrios · 2026-01-14T15:33:56Z

Related card: https://github.com/SUSE/spacewalk/issues/29401

This just makes a sanity check to find .repo file inside the repo URL, if it doesn't find it it logs an error. But we need to improve this logic to consider Debian-like repos too.

ktsamis · 2026-01-14T15:59:03Z

I think because of the way we iterate on all possible urls and only use the valid ones this might give all the false negatives. Basically we try to see if an MI has a specific client tools repo and if it doesn't we don't use it, which is a valid case since not all MIs have all the suffixes. I think the way this would check, it would log all these non-repos and show them as errors. But I will review this better later on

srbarrios · 2026-01-19T13:09:59Z

Take this PR as an example :)
It was automatically generated by GH Copilot, and I did not test it.

NamelessOne91 · 2026-01-19T15:33:14Z

+        res: requests.Response = requests.get(url, timeout=6)
+        res.raise_for_status()
+    except requests.RequestException as e:
+        logging.error(f"Error requesting: {url}; {e}")


As pointed out by Kostas, this will log all 404s that we get.
Since they tend to be quite a lot, we may want to skip logging here.

Why we get 404 on repos that we expect to be valid?

NamelessOne91

I tried to feed the MI with ID 42236, using 50-sles, as version it correctly recognized these as valid, because of a link to a .repo:

Not a fan of handling this by parsing HTML with regEx, but it seems to work with RPM repos.

The 1st case, where the URL itself is pointing to a .repo file, is probably not going to happen in our setup.

Copilot

Pull request overview

This PR adds sanity checks to validate maintenance repository URLs by detecting whether they contain .repo files (for RPM repositories) or Release/InRelease files (for Debian repositories). The validation is performed by fetching the URL content and using regex patterns to search for repository indicator files.

Changes:

Enhanced create_url function to validate repository URLs by checking for .repo, Release, or InRelease files in the response body
Changed return type from str to str | None to explicitly handle invalid repositories
Added comprehensive error logging when repositories fail validation

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-27T14:11:39Z

-    return url if res.ok else ""
+    try:
+        res: requests.Response = requests.get(url, timeout=6)
+        res.raise_for_status()


The code now calls 'res.raise_for_status()' on line 59, which is the correct requests library pattern. However, the test mock at jenkins_pipelines/scripts/tests/mock_response.py:18-19 currently always raises HTTPError regardless of the response status. This means all test cases will fail because even successful (200) responses will raise an exception. The MockResponse.raise_for_status() method should only raise HTTPError when status_code indicates an error (4xx, 5xx), not for successful responses.

Copilot · 2026-01-27T14:11:40Z

+    # If the URL itself points directly to a .repo file
+    if url.lower().endswith('.repo'):
+        return url.rsplit('/', 1)[0] + '/'
+


This logic appears problematic. If the URL itself ends with '.repo', it means the URL was constructed to point directly to a .repo file (which seems unlikely based on how create_url is called with MI IDs and suffixes). The code then returns the parent directory by stripping the filename, but this parent directory hasn't been validated to exist. Additionally, this check happens before checking the response body, so even if the URL returns a 404, it would still return the parent directory. Consider moving this check after validating the response, or removing it if URLs are never expected to point directly to .repo files.

Suggested change

# If the URL itself points directly to a .repo file

if url.lower().endswith('.repo'):

return url.rsplit('/', 1)[0] + '/'

Copilot · 2026-01-27T14:11:40Z


 @cache
-def create_url(mi_id: str, suffix: str) -> str:
+def create_url(mi_id: str, suffix: str) -> str | None:


The return type of create_url has changed from returning an empty string "" for invalid URLs to returning None. However, the existing test at jenkins_pipelines/scripts/tests/test_maintenance_json_generator.py:99 still expects an empty string ("") for invalid cases. This test will fail because it checks 'expected_url: str = ... if valid else ""' and then asserts equality with the result. The test needs to be updated to expect None instead of "" for invalid URLs.

Copilot · 2026-01-27T14:11:40Z

+    # Search for links to .repo files in the HTML body
+    # Matches: href="something.repo"
+    rpm_regex = re.compile(r'href=["\'](?P<link>[^"\']+?\.repo)\b["\']', re.IGNORECASE)
+    if rpm_regex.search(body):
+        return url
+
+    # Fallback for plain text listings of .repo files
+    token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
+    if token_repo_re.search(body):
+        return url
+
+    # --- Strategy 2: Debian/Ubuntu Repository Check ---
+    # Debian repos define a 'Release' or 'InRelease' file at the repository root.
+    # Matches: href="Release" or href="InRelease"
+    deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)
+    if deb_regex.search(body):
+        logging.info(f"Identified Debian-like repository at: {url}")
+        return url


The new validation logic searches for .repo files or Release/InRelease files in the response body (lines 71-88). However, the test mock at jenkins_pipelines/scripts/tests/mock_response.py:22-23 returns a MockResponse with no content/text for successful URLs. This means even the valid test case will fail the new validation checks and return None instead of the expected URL. The mock needs to be updated to include realistic HTML content with .repo file references for successful test cases.

Copilot · 2026-01-27T14:11:41Z

+    Validates if it's a valid RPM repo (contains .repo) or Debian repo (contains Release/InRelease).
+    Returns the directory URL if valid, None otherwise.


The docstring states "Validates if it's a valid RPM repo (contains .repo) or Debian repo (contains Release/InRelease)". However, this is slightly misleading as the function doesn't just validate - it also constructs and returns the URL. Consider updating the docstring to clarify that the function both validates and returns the URL, or returns None if validation fails. For example: "Validates if the URL points to a valid RPM or Debian repository and returns the URL if valid, None otherwise."

Suggested change

Validates if it's a valid RPM repo (contains .repo) or Debian repo (contains Release/InRelease).

Returns the directory URL if valid, None otherwise.

Validates if the URL points to a valid RPM repository (contains a .repo file) or Debian repository

(contains Release/InRelease), and returns the URL if valid, None otherwise.

Copilot · 2026-01-27T14:11:41Z

+    rpm_regex = re.compile(r'href=["\'](?P<link>[^"\']+?\.repo)\b["\']', re.IGNORECASE)
+    if rpm_regex.search(body):
+        return url
+
+    # Fallback for plain text listings of .repo files
+    token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
+    if token_repo_re.search(body):
+        return url
+
+    # --- Strategy 2: Debian/Ubuntu Repository Check ---
+    # Debian repos define a 'Release' or 'InRelease' file at the repository root.
+    # Matches: href="Release" or href="InRelease"
+    deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)


The regex defines a named group 'link' but the named group is not used. The match object from 'search()' is not captured, so the named group serves no purpose. Consider simplifying the regex by removing the named group, as only the existence of a match is checked.

Suggested change

rpm_regex = re.compile(r'href=["\'](?P<link>[^"\']+?\.repo)\b["\']', re.IGNORECASE)

if rpm_regex.search(body):

return url

# Fallback for plain text listings of .repo files

token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)

if token_repo_re.search(body):

return url

# --- Strategy 2: Debian/Ubuntu Repository Check ---

# Debian repos define a 'Release' or 'InRelease' file at the repository root.

# Matches: href="Release" or href="InRelease"

deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)

rpm_regex = re.compile(r'href=["\'][^"\']+?\.repo\b["\']', re.IGNORECASE)

if rpm_regex.search(body):

return url

# Fallback for plain text listings of .repo files

token_repo_re = re.compile(r'https?://[^\s"\'<>]+?\.repo\b|[^\s"\'<>]+?\.repo\b', re.IGNORECASE)

if token_repo_re.search(body):

return url

# --- Strategy 2: Debian/Ubuntu Repository Check ---

# Debian repos define a 'Release' or 'InRelease' file at the repository root.

# Matches: href="Release" or href="InRelease"

deb_regex = re.compile(r'href=["\'](?:In)?Release["\']', re.IGNORECASE)

Copilot · 2026-01-27T14:11:42Z

+        return url
+
+    # Fallback for plain text listings of .repo files
+    token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)


This regex has two named groups both capturing .repo file patterns: 'link' and 'link2'. Having duplicate group names is problematic as only one will be accessible in match results. Additionally, these named groups are never used since only the existence of a match is checked. Consider simplifying to a single pattern without named groups.

Suggested change

token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)

token_repo_re = re.compile(r'https?://[^\s"\'<>]+?\.repo\b|[^\s"\'<>]+?\.repo\b', re.IGNORECASE)

Copilot · 2026-01-27T14:11:42Z

+    # --- Strategy 2: Debian/Ubuntu Repository Check ---
+    # Debian repos define a 'Release' or 'InRelease' file at the repository root.
+    # Matches: href="Release" or href="InRelease"
+    deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)


Similar to the RPM regex, the named group 'link' is defined but never used. Consider removing the named group to simplify the code.

Suggested change

deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)

deb_regex = re.compile(r'href=["\'](?:In)?Release["\']', re.IGNORECASE)

Copilot · 2026-01-27T14:11:42Z

+        logging.error(f"Error requesting: {url}; {e}")
+        return None
+
+    body = res.text or ""


The new create_url function now reads 'res.text' on line 64 to get the response body for validation. However, the test mock at jenkins_pipelines/scripts/tests/mock_response.py:9-13 only defines 'content' attribute, not 'text'. The tests will fail with AttributeError when trying to access res.text. The MockResponse class needs to be updated to include a 'text' property that returns the content.

nodeg · 2026-02-16T09:25:44Z

It was automatically generated by GH Copilot, and I did not test it.

This is problematic and we need to set clear rules for AI generated code. Testing the code, however, should always be done, nevertheless the author of the code.

srbarrios · 2026-02-16T13:36:13Z

It was automatically generated by GH Copilot, and I did not test it.

This is problematic and we need to set clear rules for AI generated code. Testing the code, however, should always be done, nevertheless the author of the code.

@nodeg To make it clear. I opened a GH Card as discussed in our meeting. After that, and just as example, I opened this proposal vibe coded by Gemini. I did not test it because that was an example and I expected someone else to pick the card.

Right now, I'm assigned on the card, and ofc, I will test it before going further. But this card had never been prioritized, so for now, I did not start working on it, it still in "New" column.

srbarrios added 2 commits January 14, 2026 16:32

First small iterarion

0aa72ef

Second iteration including Debian-like logic

5ea2a60

srbarrios changed the title ~~First small iteration~~ Small improvement on sanity checks per MI repository provided on the autogenerated JSON Jan 14, 2026

srbarrios requested a review from a team January 14, 2026 15:40

NamelessOne91 reviewed Jan 19, 2026

View reviewed changes

nodeg requested a review from Copilot January 27, 2026 14:06

nodeg assigned srbarrios Jan 27, 2026

Copilot started reviewing on behalf of nodeg January 27, 2026 14:07 View session

Copilot AI reviewed Jan 27, 2026

View reviewed changes

srbarrios marked this pull request as draft March 11, 2026 10:26

srbarrios changed the title ~~Small improvement on sanity checks per MI repository provided on the autogenerated JSON~~ [Draft] Small improvement on sanity checks per MI repository provided on the autogenerated JSON Mar 11, 2026

	# If the URL itself points directly to a .repo file
	if url.lower().endswith('.repo'):
	return url.rsplit('/', 1)[0] + '/'

		Validates if it's a valid RPM repo (contains .repo) or Debian repo (contains Release/InRelease).
		Returns the directory URL if valid, None otherwise.

	token_repo_re = re.compile(r'(?P<link>https?://[^\s"\'<>]+?\.repo\b)\|(?P<link2>[^\s"\'<>]+?\.repo\b)', re.IGNORECASE)
	token_repo_re = re.compile(r'https?://[^\s"\'<>]+?\.repo\b\|[^\s"\'<>]+?\.repo\b', re.IGNORECASE)

	deb_regex = re.compile(r'href=["\'](?P<link>(?:In)?Release)["\']', re.IGNORECASE)
	deb_regex = re.compile(r'href=["\'](?:In)?Release["\']', re.IGNORECASE)

Conversation

srbarrios commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ktsamis commented Jan 14, 2026

Uh oh!

srbarrios commented Jan 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NamelessOne91 Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

srbarrios Jan 19, 2026

Choose a reason for hiding this comment

Uh oh!

NamelessOne91 left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Jan 27, 2026

Choose a reason for hiding this comment

Uh oh!

nodeg commented Feb 16, 2026

Uh oh!

srbarrios commented Feb 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

srbarrios commented Jan 14, 2026 •

edited

Loading

srbarrios commented Jan 19, 2026 •

edited

Loading

srbarrios commented Feb 16, 2026 •

edited

Loading