How to work around XmlReader not normalizing whitespace in attribute values

Asked 2 years ago, Updated 2 years ago, 98 views

When you read XML data with System.Xml.XmlReader in the .NET Framework, whitespace characters in attribute values are not normalized.

By whitespace normalization of attribute values I mean the following:

  • "foo&#xD;&#xA;bar" → "foo bar"
  • " foo   bar " → "foo bar"

Note:

When I use XmlReader directly I can normalize the values myself, but I can't think of a good way to normalize them when I pass the XmlReader to XmlSerializer for deserialization.
Is there any way to avoid this problem other than the following two approaches?

  • Use XmlReader directly to produce normalized XML data, then deserialize that with XmlSerializer
    → The XML data is parsed twice, so processing efficiency drops and half the benefit of using XmlReader is lost
  • Create a new class derived from XmlReader and define its Value property so that it normalizes attribute values
    → Every abstract and virtual member must be overridden and delegated to the instance created by XmlReader.Create, which makes the implementation laborious. I also won't know whether this actually works until I try it

Something like the following (only a sketch of the idea) would be efficient in both processing and implementation, if it were possible:

using (var xmlReader = XmlReader.Create(streamReader, xmlSettings))
{
    // Pseudocode: if only something like Ruby's singleton methods were possible...
    public override string xmlReader.Value
    {
        get
        {
            var value = /* delegate to the original .Value */;
            if (NodeType == XmlNodeType.Attribute)
            {
                // normalize value here
            }
            return value;
        }
    }

    t = (T)xmlSerializer.Deserialize(xmlReader);
}

c# .net xml

2022-09-30 12:08

2 Answers

The linked page explains:

"First, for a character reference, the character it denotes is added to an initially empty box. Adding to the normalized value means adding the character to that box, so a character written as a character reference is taken as that character as-is and is not processed any further. In this respect the result clearly differs from the cases described later. Writing a character as a reference can therefore be used as a deliberate technique."

(Note the last sentence in particular.) Of course, Extensible Markup Language (XML) 1.0 (Fifth Edition), 3.3.3 Attribute-Value Normalization also says

  • For a character reference, append the referenced character to the normalized value.

so isn't it the normal behavior that &#xD;&#xA; expands to \r\n and is not replaced by a space?
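In fact, with

<element attribute="a
    b&#xD;&#xA;c"></element>

the value of attribute was "a     b\r\nc": the literal line break and the indentation came back as spaces, while the character references remained \r\n.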
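
For anyone who wants to check this behavior themselves, here is a minimal sketch. The class and method names (AttributeValueDemo, ReadAttribute, Escape) are made up for illustration. It assumes a reader created with XmlReader.Create and no DTD, so the attribute is treated as CDATA.

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical demo class; not part of the .NET Framework.
public static class AttributeValueDemo
{
    // Returns the value of the "attribute" attribute on the root element.
    public static string ReadAttribute(string xml)
    {
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            return reader.GetAttribute("attribute");
        }
    }

    // Makes CR and LF visible when the value is printed.
    public static string Escape(string s)
    {
        return s.Replace("\r", "\\r").Replace("\n", "\\n");
    }

    public static void Main()
    {
        // A literal line break plus indentation, then character references for CR and LF.
        string xml = "<element attribute=\"a\n    b&#xD;&#xA;c\"></element>";
        Console.WriteLine("<{0}>", Escape(ReadAttribute(xml)));
    }
}
```

The literal line break should come back as a space, and the two character references as a real CR and LF, which is what 3.3.3 describes for CDATA attributes.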

So the real issue is that " foo   bar " does not become "foo bar" (leading and trailing whitespace removed, and runs of whitespace in the middle collapsed to a single space).

This, too, is exactly as the linked page describes.

The first half describes the conditions under which this rule applies. If no DTD is used, attributes are treated as CDATA, so the rule is meaningful mainly when a DTD declares the attribute types.

If you declare the attribute as NMTOKENS in a DTD, it is normalized as the specification describes. If its type is undeclared or CDATA, that extra normalization is not applied, which is also what the specification says. Note that XmlReaderSettings.DtdProcessing must be set to DtdProcessing.Parse for the DTD to be processed.

using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Test
{
    [XmlAttribute]
    public string Nmtokens { get; set; }
    [XmlAttribute]
    public string Cdata { get; set; }
    public string Elem { get; set; }

    public static void Main()
    {
        // The DTD declares Nmtokens as NMTOKENS and Cdata as CDATA.
        var xml = @"<!DOCTYPE Test [
          <!ELEMENT Test (Elem)>
          <!ATTLIST Test Nmtokens NMTOKENS #REQUIRED Cdata CDATA #REQUIRED>
          <!ELEMENT Elem (#PCDATA)>
        ]>
        <Test Nmtokens=' foo   bar ' Cdata=' foo   bar '>
          <Elem> foo   bar </Elem>
        </Test>
        ";
        // DtdProcessing.Parse is required for the DTD to be read at all.
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };
        var reader = XmlReader.Create(new StringReader(xml), settings);
        var test = (Test)new XmlSerializer(typeof(Test)).Deserialize(reader);
        Console.WriteLine("Nmtokens=<{0}>, Cdata=<{1}>, Elem=<{2}>", test.Nmtokens, test.Cdata, test.Elem);
        // Nmtokens=<foo bar>, Cdata=< foo   bar >, Elem=< foo   bar >
    }
}

The question as a whole is phrased as if the behavior violated the specification ("whitespace characters in attribute values are not normalized"), but the conversion you are seeing follows the specification. You should recognize that what you want is a special conversion that goes beyond the spec.


2022-09-30 12:08

If you do not need asynchronous processing, you can derive from XmlTextReader and override only the Value property.
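
A minimal sketch of that idea follows. The names NormalizingXmlTextReader, Item and NormalizingReaderDemo are made up for illustration. It assumes XmlSerializer reads attribute text through the reader's Value property, which I have not verified on every runtime. The normalization rule (collapse each run of whitespace to one space, then trim) is my reading of what the question asks for, not something the XML specification requires for CDATA attributes.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical reader for illustration; the class name is not part of .NET.
public class NormalizingXmlTextReader : XmlTextReader
{
    public NormalizingXmlTextReader(TextReader input) : base(input) { }

    public override string Value
    {
        get
        {
            string value = base.Value;
            if (NodeType == XmlNodeType.Attribute)
            {
                // Collapse each run of whitespace to one space, then trim both ends.
                value = Regex.Replace(value, @"[ \t\r\n]+", " ").Trim(' ');
            }
            return value;
        }
    }
}

// Hypothetical type used only to exercise the reader.
public class Item
{
    [XmlAttribute]
    public string Name { get; set; }
}

public static class NormalizingReaderDemo
{
    public static Item Load(string xml)
    {
        using (var reader = new NormalizingXmlTextReader(new StringReader(xml)))
        {
            return (Item)new XmlSerializer(typeof(Item)).Deserialize(reader);
        }
    }

    public static void Main()
    {
        Item item = Load("<Item Name=' foo   bar&#xD;&#xA;baz ' />");
        Console.WriteLine("<{0}>", item.Name);
    }
}
```

Because only Value is overridden, element text is left untouched and the XML is still parsed once. Code that reads attributes by other routes (for example GetAttribute) would bypass the override, so treat this as a starting point and not a complete solution.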


2022-09-30 12:08



© 2024 OneMinuteCode. All rights reserved.