Restore XML conformant logic for escaping < and > characters #2965

Open
kwokcb wants to merge 3 commits into AcademySoftwareFoundation:main from kwokcb:pugixml_escaped_chars

kwokcb commented Jun 8, 2026

Contributor

Changes

Update #2956

From a user's point of view, all run-time strings will still show < and >. The only difference is that this enforces writing conformant escaped characters to file. It does not prevent users from using unescaped characters when hand-editing.

Implementation

Replace MaterialX specific code for writing unescaped characters < and > with original PugiXML logic.
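
As a rough illustration of what escaping on write means here, the sketch below replaces the characters that should not appear literally in a double-quoted attribute value. It is not the PugiXML source; `escapeAttributeValue` is a hypothetical name chosen for this example, and the real writer also deals with control characters and encodings that are ignored here.

```cpp
#include <string>

// Hypothetical helper, NOT the PugiXML implementation: escape the characters
// that should not be written literally inside a double-quoted XML attribute.
std::string escapeAttributeValue(const std::string& value)
{
    std::string out;
    out.reserve(value.size());
    for (char c : value)
    {
        switch (c)
        {
            case '&': out += "&amp;";  break;
            case '<': out += "&lt;";   break;
            case '>': out += "&gt;";   break;
            case '"': out += "&quot;"; break;
            default:  out += c;        break;
        }
    }
    return out;
}
```

With this sketch, a value such as `grid.<UDIM>.png` would be written as `grid.&lt;UDIM&gt;.png`, while the in-memory string is left untouched.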

Tests

  • Add unit test for writing escaped characters
  • Add test to read back escaped output to make sure that unescaping works properly.

Example

Here is an original UDIM test file:

<?xml version="1.0"?>
<materialx version="1.38">
  <image name="image_color" type="color3">
    <input name="file" type="filename" value="resources/Images/grid_udim/grid.<UDIM>.png" colorspace="srgb_texture" />
  </image>
  ...
</materialx>

Here is the escaped version written out:

<?xml version="1.0"?>
<materialx version="1.39">
  <image name="image_color" type="color3" xpos="1.724638" ypos="0.000000">
    <input name="file" type="filename" value="resources/Images/grid_udim/grid.&lt;UDIM&gt;.png" colorspace="srgb_texture" />
  </image>
  ...
</materialx>

Loading either one results in the same unescaped user string being stored at run-time and shown in the UI. This is using the GraphEditor:

image
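
To make the "either form loads to the same string" point concrete, here is a minimal sketch of entity decoding on read. It assumes only the five predefined XML entities matter and uses a hypothetical `unescapeXml` helper; the actual parser in PugiXML is more general (numeric character references, for example, are not handled here).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: decode the five predefined XML entities in one pass.
// Text without entities (e.g. a hand-edited "<UDIM>") passes through unchanged.
std::string unescapeXml(const std::string& text)
{
    static const struct { const char* entity; char ch; } table[] = {
        { "&lt;", '<' }, { "&gt;", '>' }, { "&amp;", '&' },
        { "&quot;", '"' }, { "&apos;", '\'' }
    };

    std::string out;
    out.reserve(text.size());
    std::size_t i = 0;
    while (i < text.size())
    {
        bool matched = false;
        if (text[i] == '&')
        {
            for (const auto& e : table)
            {
                const std::string entity(e.entity);
                if (text.compare(i, entity.size(), entity) == 0)
                {
                    out += e.ch;          // emit the decoded character
                    i += entity.size();   // skip past the whole entity
                    matched = true;
                    break;
                }
            }
        }
        if (!matched)
        {
            out += text[i++];             // ordinary character, copy as-is
        }
    }
    return out;
}
```

Under these assumptions, both `grid.&lt;UDIM&gt;.png` and `grid.<UDIM>.png` decode to `grid.<UDIM>.png`, which is why the run-time value does not depend on which form was on disk.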

…< and > characters.

Expose a "strict" XML mode on XMLWriteOptions. The default is false, which means < and > are not escaped.
As read supports both escaped and non-escaped < and >, read support has not changed.
Added a test case that writes either form and checks that the escaped form is read back correctly.

kwokcb commented Jun 8, 2026

Contributor Author

Pinging @ashwinbhat, @ld-kerley, and @jstone-lucasfilm for feedback / review.

I'm not sure why < and > were ever changed to not be escaped. From a user perspective, the only time you'd notice is when hand-editing a file, and even then you can write either the escaped or the unescaped form; the run-time string is always unescaped. A standard XML parser will always handle this, as PugiXML did before it was modified to have this non-conformant behaviour.

Comment thread source/MaterialXFormat/External/PugiXML/pugixml.cpp Outdated
Comment thread source/MaterialXFormat/External/PugiXML/pugixml.cpp Outdated
Comment thread source/MaterialXFormat/XmlIo.cpp Outdated
REQUIRE(fileInput != nullptr);
std::string unescaped_string = fileInput->getValueString();
// There should not be any escaped characters in the value string
REQUIRE(unescaped_string.find("&lt;&gt;") == std::string::npos);

Contributor Author

This was always the logic as read always unescapes.
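
One observation on the check quoted above: it searches for the adjacent sequence `&lt;&gt;`, which would not match a value like `grid.&lt;UDIM&gt;.png` because `UDIM` sits between the two entities. A possibly stricter variant, sketched here with a hypothetical `containsEscapedAngleBrackets` helper that is not part of the PR, tests each entity on its own.

```cpp
#include <string>

// Hypothetical helper (not in the PR): true if either escaped angle-bracket
// entity is still present in a value that should have been unescaped on read.
bool containsEscapedAngleBrackets(const std::string& value)
{
    return value.find("&lt;") != std::string::npos ||
           value.find("&gt;") != std::string::npos;
}
```

In the test this could be used as `REQUIRE(!containsEscapedAngleBrackets(unescaped_string));`, assuming the same Catch-style `REQUIRE` macro as the surrounding code.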

@ashwinbhat

Contributor

Hi @kwokcb, having the option is helpful, but it does create a situation where mtlx files generated by MaterialX will be non-conformant. As per

filevalue = "cmsscheme:myassetdiffuse.<UDIM>.tif?ver=current"
UDIM tags contain < and >. Should this be updated to mention that these should be encoded for XML?

kwokcb commented Jun 8, 2026

Contributor Author

I have put this up to allow maintaining "previous" (undesired) behaviour for discussion.

I would advocate removing this option and always requiring encoding for any SGML derivative (XML, HTML, etc.).
(HTML5, I believe, has looser constraints.)

Hence the spec would need the clarification you note.

kwokcb commented Jun 9, 2026

Contributor Author

After discussing with @ashwinbhat offline, I have changed this PR to restore the PugiXML behaviour for < and >, so that conformant XML is always written.

kwokcb changed the title from "Add XML option to disable writing of invalid unescaped <> characters" to "Restore XML conformant logic for escaping < and > characters" on Jun 9, 2026