Working around XmlReader not normalizing attribute values

When you read XML data with System.Xml.XmlReader in the .NET Framework, whitespace characters in attribute values are not normalized.

The whitespace normalization of attribute values I have in mind is as follows:

~~"foo
bar" → "foo bar"~~
"  foo   bar  " → "foo bar"

Reference:

@IT: An Easy Read of the "XML 1.0 Recommendation", Part 19: A Pitfall: Hidden Attribute-Value Normalization

When I use XmlReader directly I can normalize the values myself, but I can't think of a good way to normalize them when I pass the XmlReader to XmlSerializer for deserialization.
Is there any way to avoid this problem other than the following two methods?

Use XmlReader directly to produce normalized XML data, then deserialize that with XmlSerializer
→ The XML data is parsed twice, so processing efficiency drops and the benefit of using XmlReader is halved.
Create a new class derived from XmlReader and define its Value property to normalize attribute values
→ Every abstract and virtual member has to be overridden and forwarded to the instance created by XmlReader.Create, which makes the implementation laborious. I also won't know whether this approach actually works until I try it.
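For reference, the first method could be sketched roughly as below. This is only an illustration under assumptions: it uses XDocument as the intermediate representation instead of a hand-written XmlReader loop, and the names Record, TwoPassDeserializer, and Collapse are hypothetical. The whitespace rule shown (trim both ends, collapse inner runs) is the one described in this question, not something the XML specification requires for CDATA attributes.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using System.Xml.Serialization;

// Hypothetical sample type, used only for this illustration.
public class Record
{
    [XmlAttribute]
    public string Name { get; set; }
}

public static class TwoPassDeserializer
{
    // Trim both ends and collapse each run of whitespace to a single space.
    static string Collapse(string value)
    {
        var parts = value.Split(new[] { ' ', '\t', '\r', '\n' },
                                StringSplitOptions.RemoveEmptyEntries);
        return string.Join(" ", parts);
    }

    public static T Deserialize<T>(TextReader input)
    {
        // Pass 1: load the whole document and rewrite every attribute value.
        var doc = XDocument.Load(input);
        foreach (var attr in doc.Descendants().Attributes().ToList())
        {
            // Leave xmlns declarations untouched.
            if (!attr.IsNamespaceDeclaration)
                attr.Value = Collapse(attr.Value);
        }

        // Pass 2: let XmlSerializer read the in-memory tree.
        using (var reader = doc.CreateReader())
        {
            return (T)new XmlSerializer(typeof(T)).Deserialize(reader);
        }
    }
}
```

Calling TwoPassDeserializer.Deserialize<Record>(new StringReader("<Record Name='  foo   bar  ' />")) should yield a Record whose Name is "foo bar". The whole document is held in memory, which is exactly the efficiency cost noted above.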

Something like the following (an illustration only) would be ideal for both processing and implementation efficiency:

using (var xmlReader = XmlReader.Create(streamReader, xmlSettings))
{
    // Pseudocode: if only C# had something like Ruby's singleton methods...
    public override string xmlReader.Value
    {
        get
        {
            var value = /* delegate to the original xmlReader.Value */;
            if (NodeType == XmlNodeType.Attribute)
            {
                // normalize value here
            }
            return value;
        }
    }

    t = (T)xmlSerializer.Deserialize(xmlReader);
}

c# .net xml

2022-09-30 12:08

2 Answers

The linked article says:

First, a character reference such as &#x1234; means appending the character it denotes to the empty box. "Appending to the normalized value" means adding characters to the empty box; in other words, a character written as a character reference is treated as that character and receives no further processing. In this respect the result clearly differs from the whitespace handling described later. Writing a character as a reference can therefore be a useful technique.

(note the last sentence in particular). And of course Extensible Markup Language (XML) 1.0 (Fifth Edition), 3.3.3 Attribute-Value Normalization also says:

For a character reference, append the referenced character to the normalized value.

So isn't it the correct behavior for &#xD;&#xA; to expand to \r\n rather than be replaced by a space?
In fact,

<element attribute="a
    b&#xD;&#xA;c"></element>

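Here the value of attribute was a     b\r\nc: the literal line break and indentation became spaces, while the character references stayed as \r\n.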
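A small check along these lines can reproduce the behavior. It is a minimal sketch, assuming a reader obtained from XmlReader.Create with default settings; the class and method names (CharRefDemo, ReadAttribute) are made up for illustration.

```csharp
using System;
using System.IO;
using System.Xml;

public static class CharRefDemo
{
    // Returns the value of the "attribute" attribute on the root element.
    public static string ReadAttribute(string xml)
    {
        using (var reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent(); // position on the root element
            return reader.GetAttribute("attribute");
        }
    }
}
```

For the input "<element attribute=\"a\nb&#xD;&#xA;c\"></element>" (a literal line feed between a and b, no indentation), the returned string is expected to be a, one space, b, then CR LF, then c.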

The real issue is that "  foo   bar  " does not become "foo bar" (leading and trailing whitespace removed, and runs of whitespace in the middle collapsed to one space).

This, too, is exactly as the linked article describes:

The first half of the sentence states the condition under which this rule applies. When no DTD is used, attributes are to be treated as CDATA, so this rule is meaningful mainly when a DTD declares the attribute's type.

If the DTD declares the attribute as NMTOKENS, the value is normalized as the specification requires; if the attribute is undeclared or declared as CDATA, the value is, again as the specification requires, left untransformed. Note that XmlReaderSettings.DtdProcessing must be set to DtdProcessing.Parse for the DTD to be processed.

using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

public class Test
{
    [XmlAttribute]
    public string Nmtokens { get; set; }
    [XmlAttribute]
    public string Cdata { get; set; }
    public string Elem { get; set; }

    public static void Main()
    {
        // The DTD declares Nmtokens as NMTOKENS and Cdata as CDATA.
        var xml = @"<!DOCTYPE Test [
          <!ELEMENT Test (Elem)>
          <!ATTLIST Test Nmtokens NMTOKENS #REQUIRED Cdata CDATA #REQUIRED>
          <!ELEMENT Elem (#PCDATA)>
        ]>
        <Test Nmtokens='  foo   bar  ' Cdata='  foo   bar  '>
          <Elem>  foo   bar  </Elem>
        </Test>
        ";
        // DtdProcessing.Parse is required so that the reader sees the attribute types.
        var reader = XmlReader.Create(new StringReader(xml), new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse });
        var test = (Test)new XmlSerializer(typeof(Test)).Deserialize(reader);
        Console.WriteLine("Nmtokens=<{0}>, Cdata=<{1}>, Elem=<{2}>", test.Nmtokens, test.Cdata, test.Elem);
        // Only the NMTOKENS attribute is trimmed and collapsed:
        // Nmtokens=<foo bar>, Cdata=<  foo   bar  >, Elem=<  foo   bar  >
    }
}

Throughout, the question describes behavior that conforms to the specification as if it were a defect ("whitespace characters in attribute values are not normalized"). You should recognize that what you are asking for is a special transformation that goes beyond the specification.

2022-09-30 12:08

If you don't need asynchronous processing, you can inherit from XmlTextReader and override only the Value property.
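A minimal sketch of that idea follows. The names NormalizingXmlTextReader, Item, and Example are hypothetical, and the normalization rule (trim both ends, collapse whitespace runs) is an assumption taken from the question rather than anything XmlTextReader provides. It relies on XmlSerializer reading attribute values through the reader's Value property while positioned on an attribute node.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical reader: only Value is overridden, everything else is inherited.
public class NormalizingXmlTextReader : XmlTextReader
{
    public NormalizingXmlTextReader(TextReader input) : base(input) { }

    public override string Value
    {
        get
        {
            var value = base.Value;
            if (NodeType != XmlNodeType.Attribute)
                return value; // element text and other nodes pass through unchanged

            // Trim both ends and collapse each run of whitespace to a single space.
            var parts = value.Split(new[] { ' ', '\t', '\r', '\n' },
                                    StringSplitOptions.RemoveEmptyEntries);
            return string.Join(" ", parts);
        }
    }
}

// Hypothetical sample type, used only for this illustration.
public class Item
{
    [XmlAttribute]
    public string Name { get; set; }
}

public static class Example
{
    public static string ReadName(string xml)
    {
        using (var reader = new NormalizingXmlTextReader(new StringReader(xml)))
        {
            var item = (Item)new XmlSerializer(typeof(Item)).Deserialize(reader);
            return item.Name;
        }
    }
}
```

With this sketch, Example.ReadName("<Item Name='  foo   bar  ' />") should return "foo bar". Note that every attribute is affected, including xmlns declarations, and that members such as GetAttribute are not routed through the override.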

2022-09-30 12:08
