MCE discussion points

Fri Apr 5 02:46:24 CEST 2013

Hi all -
I tracked down a few of the original authors of MCE and have been in an e-mail discussion with them for a while.  The discussion has largely run its course, and it's worth sharing some of their comments with you.  The first comment is a larger question I asked based on some of our past discussion in WG 4.  The second is about the proposed ExtensionElements.  The remaining ones are comments on the list of open issues Murata-san noted in the slides he presented in Copenhagen, WG 4 N 0253 <http://www.itscj.ipsj.or.jp/sc34/wg4/archive/sc34-wg4-2013-0253.zip>.  Perhaps we can discuss some of these on the call next week.  I hope you find this information useful!

John

===== COMMENT 1: Suspension and resumption of MCE processing =====
John asked:
Section 12 of IS 29500 Part 3 (MCE) indicates that, even though MCE processing is normally suspended within application-defined extension elements, a markup specification may resume MCE processing within an extension element.  However, the text ("A specification for an element nested somewhere within...") gives no indication of how this is to be specified.  As part of the ongoing revision of MCE, it has been proposed either to make this explicit via new MCE markup or to remove this "feature".  Are you aware of any implementations of MCE that actually use this?

Reply:
Both of the features referenced were used in XAML.

The quoted part of the specification appears to be intended to capture the behavior of the XAML parser processing content of an XmlDataSource. A XAML document author can supply the XML data inline in a XAML document using an XmlDataSource element. The XML is taken literally with no markup compatibility rules applied.

This feature allows literal quoting of XML that might or might not contain its own markup-compatibility attributes. It prevents declarations that were intended to apply to the container from applying to nested XML content.
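
To make the XmlDataSource behavior concrete, here is a minimal Python (ElementTree) sketch of a walker that visits elements for MCE processing but does not descend into a literal-XML container. The namespace `urn:example:app`, the element name standing in for XmlDataSource, and the helper `mce_scope` are hypothetical; this is an illustration of the suspension rule, not how the XAML parser is actually implemented.

```python
import xml.etree.ElementTree as ET
from typing import AbstractSet, Iterator

# Hypothetical stand-ins for a XAML-like vocabulary.
APP_NS = "urn:example:app"
LITERAL_CONTAINERS = frozenset({f"{{{APP_NS}}}XmlDataSource"})


def mce_scope(elem: ET.Element,
              literal_containers: AbstractSet[str]) -> Iterator[ET.Element]:
    """Yield the elements MCE processing should visit, in document order.

    A literal-XML container is itself visited, but nothing nested inside it is:
    its content is quoted XML and passes through untouched.
    """
    yield elem
    if elem.tag in literal_containers:
        return  # MCE processing is suspended below this point
    for child in elem:
        yield from mce_scope(child, literal_containers)


doc = ET.fromstring(
    '<Window xmlns="urn:example:app">'
    '<XmlDataSource>'
    '<data xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"'
    ' xmlns:x="urn:example:x" mc:Ignorable="x"><x:item/></data>'
    '</XmlDataSource>'
    '<Button/>'
    '</Window>'
)
print([e.tag.split("}")[1] for e in mce_scope(doc, LITERAL_CONTAINERS)])
# ['Window', 'XmlDataSource', 'Button']
```

The mc:Ignorable attribute on the quoted `data` element is never seen by the walker, so it cannot leak into (or be applied by) the container's processing. Resuming MCE processing somewhere deeper inside the container, the feature the question asks about, would need an additional rule saying where to start walking again; the sketch deliberately has none.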

ProcessContent was intended to allow markup-compatibility documents to simulate HTML's default processing of unknown elements.  The motivating example was:
    <p>This is some <v2:emphasis>important</v2:emphasis> text</p>
where v2:emphasis was an element introduced in version 2 that would display the text as emphasized in some version-2-specific way, but the author was OK with the text appearing as normal text in v1.  ProcessContent allows the emphasis to be ignored without losing the word "important".
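
A minimal Python (ElementTree) sketch of what ProcessContent does to the example above. It assumes a v1 consumer that does not understand the hypothetical `urn:example:v2` namespace, that this namespace is declared Ignorable, and that its elements are listed for ProcessContent; for brevity, namespaces are passed in as URI sets rather than read from mc:Ignorable/mc:ProcessContent attributes, and the helper names are illustrative only.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Set

V2_NS = "urn:example:v2"  # hypothetical namespace for the version-2 vocabulary


def _append_text(parent: ET.Element, index: int, text: Optional[str]) -> None:
    """Append text just before child position `index` of parent."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap(parent: ET.Element, child: ET.Element) -> None:
    """Replace child by its content (text, child elements, tail), keeping order."""
    index = list(parent).index(child)
    moved = list(child)
    parent.remove(child)
    _append_text(parent, index, child.text)
    for offset, grandchild in enumerate(moved):
        parent.insert(index + offset, grandchild)
    _append_text(parent, index + len(moved), child.tail)


def apply_process_content(elem: ET.Element, understood: Set[str],
                          process_content: Set[str]) -> None:
    """Unwrap non-understood elements whose namespace is listed for ProcessContent."""
    for child in list(elem):
        # Handle the subtree first, so content lifted out is already processed.
        apply_process_content(child, understood, process_content)
        ns = child.tag[1:].split("}", 1)[0] if child.tag.startswith("{") else ""
        if ns not in understood and ns in process_content:
            unwrap(elem, child)


p = ET.fromstring(
    '<p xmlns:v2="urn:example:v2">This is some '
    '<v2:emphasis>important</v2:emphasis> text</p>'
)
apply_process_content(p, understood=set(), process_content={V2_NS})
print(p.text)  # This is some important text
```

Without ProcessContent, an ignorable v2:emphasis would be dropped together with its content, and v1 would render "This is some  text".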

===== COMMENT 2: ExtensionElements =====
John note: We had a lengthy discussion about the utility of specifying what is an extension element.  After a bit of back-and-forth, I tend to agree with the argument that it's not necessary to know which elements are extension elements.  In short, the MCE preprocessor doesn't care that something is an extension element; it just cares whether the namespace of a given element is understood by whatever gets the processed markup.  That next module may know and care that an element is an extension so that it knows how to parse it, but that's out of scope for the MCE preprocessor.

Comment:
Let me try to explain my confusion. I think it can best be explained by the concepts below of the processed and raw info-sets. MCE can be thought of as a processing algorithm to turn raw info-sets into processed info-sets. If the processing is successful, the processed info-set would only contain elements and attributes declared as understood to the MCE processor. That is, all non-understood elements and attributes are either removed (if they are ignorable), re-namespaced (for subsumed namespaces) or cause the processing to fail (MustUnderstand).

Given this, I don't see where ExtensionElements fits.  Either an element or attribute is understood or it is not.  I am not sure how to think about an element that occupies some middle ground.  If a consumer of the processed info-set can make sense of the element (e.g., it is valid in its schema), then the namespace is understood.  Understood, from an MCE perspective, just means that the namespace should be part of the processed info-set, not that it is fully understood by the consumer of the processed info-set.  Either the namespace appears or it doesn't; I don't see a middle ground.  In this light, extensions are just understood content, unless I am missing something important.
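
The raw-to-processed model described in this comment can be sketched in a few lines of Python (ElementTree). This is a simplified illustration under stated assumptions, not the normative algorithm of Part 3: namespaces are passed as URI sets, unqualified names count as understood, the root element is assumed understood, MCE's own attributes (mc:Ignorable etc.) are assumed to have been consumed already, ProcessContent is left out, and `process`/`MCEError` are hypothetical names. Note there is no third outcome for an "extension" namespace: each name is kept, re-namespaced, dropped, or an error.

```python
import xml.etree.ElementTree as ET
from typing import AbstractSet, Dict, Optional, Tuple


class MCEError(Exception):
    """Raised when a raw info-set cannot be turned into a processed one."""


def split_name(name: str) -> Tuple[str, str]:
    """Split ElementTree's '{uri}local' form into (uri, local); '' means no namespace."""
    if name.startswith("{"):
        ns, local = name[1:].split("}", 1)
        return ns, local
    return "", name


def join_name(ns: str, local: str) -> str:
    return f"{{{ns}}}{local}" if ns else local


def _drop(parent: ET.Element, child: ET.Element) -> None:
    """Remove child with all of its content, but keep the text that followed it."""
    index = list(parent).index(child)
    parent.remove(child)
    if child.tail:
        if index == 0:
            parent.text = (parent.text or "") + child.tail
        else:
            prev = parent[index - 1]
            prev.tail = (prev.tail or "") + child.tail


def process(root: ET.Element, understood: AbstractSet[str],
            ignorable: AbstractSet[str] = frozenset(),
            must_understand: AbstractSet[str] = frozenset(),
            subsumed: Optional[Dict[str, str]] = None) -> ET.Element:
    """Turn a raw tree into a processed tree in place (simplified model)."""
    subsumed = subsumed or {}

    def resolve(ns: str) -> Optional[str]:
        ns = subsumed.get(ns, ns)          # re-namespace subsumed content
        if ns == "" or ns in understood:
            return ns                      # understood: keep it
        if ns in ignorable and ns not in must_understand:
            return None                    # ignorable and not understood: drop it
        raise MCEError(f"namespace {ns!r} is not understood")

    def walk(elem: ET.Element) -> None:
        for name in list(elem.attrib):
            ns, local = split_name(name)
            value = elem.attrib.pop(name)
            kept = resolve(ns)
            if kept is not None:
                elem.set(join_name(kept, local), value)
        for child in list(elem):
            ns, local = split_name(child.tag)
            kept = resolve(ns)
            if kept is None:
                _drop(elem, child)
            else:
                child.tag = join_name(kept, local)
                walk(child)

    walk(root)
    return root


V1, V2, OLD = "urn:example:v1", "urn:example:v2", "urn:example:old"
raw = ET.fromstring(
    f'<a xmlns="{V1}" xmlns:v2="{V2}" xmlns:o="{OLD}" v2:hint="x">'
    '<b/>before<v2:c>gone</v2:c>after<o:d/></a>'
)
process(raw, understood={V1}, ignorable={V2}, subsumed={OLD: V1})
```

After the call, the ignorable v2:hint attribute and v2:c element are gone, o:d has been re-namespaced into the v1 namespace, and a non-understood namespace that is not ignorable (or is ignorable but listed as MustUnderstand) raises `MCEError`.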

===== COMMENT 3: Whether to copy namespace declarations to ProcessContent children =====
I agree with the info-set recommendation.  Some of these issues become trivial if we are talking about transforms on an info-set instead of textual transformations.  We discussed this when originally designing MCE, but we didn't want to require info-sets on schemas incorporating MCE.

===== COMMENT 4: Whether to disallow certain seemingly nonsensical combinations of MCE attributes/elements =====
I don't have a strong opinion on this.  It is important semantically that the MCE XML be well formed and valid, but whether you disregard or reject nonsensical combinations is a style choice, as long as you clearly define what is disregarded.

John note: We should do some due diligence to check for usage of any combinations we explicitly disallow.

===== COMMENT 5: Whether only xml:lang and xml:space should be prohibited or also xml:base (others?) =====
We only rejected xml:lang and xml:space because they affect the text content of an element and, if the containing element is removed, there is no place to copy them.  If an attribute can be copied to the lifted child elements while preserving the intent of the original document, then it should be allowed.  With xml:lang and xml:space we would have had to introduce an element just to hold these attributes.  It was simpler just to reject them.  If xml:base affects the meaning of the text content, then it should also be rejected.  If MCE were moved to an info-set model, these could be brought back, since they affect how the info-set is constructed and are not strictly part of the info-set itself.
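
As a small illustration, a processor could run a check like the following on any element that may be unwrapped, before lifting its children. This is a hedged Python (ElementTree) sketch: `check_unwrappable` and `REJECTED` are hypothetical names, and whether xml:base belongs in the rejected set is exactly the open question above, so it is left as an opt-in parameter.

```python
import xml.etree.ElementTree as ET
from typing import AbstractSet

XML_NS = "http://www.w3.org/XML/1998/namespace"  # namespace bound to the xml: prefix

# The two attributes the original design rejects on elements that may be unwrapped:
# they change how the element's text is interpreted, and once the element is gone
# there is nowhere to put them.
REJECTED = frozenset({f"{{{XML_NS}}}lang", f"{{{XML_NS}}}space"})


def check_unwrappable(elem: ET.Element,
                      rejected: AbstractSet[str] = REJECTED) -> None:
    """Raise ValueError if elem carries an attribute that cannot survive unwrapping.

    Pass a larger set (for example one that adds xml:base) if a profile decides
    that attribute also affects the meaning of the text content.
    """
    bad = sorted(name for name in elem.attrib if name in rejected)
    if bad:
        raise ValueError(f"unwrappable element {elem.tag!r} carries {bad}")


ok = ET.fromstring('<v2:emphasis xmlns:v2="urn:example:v2">important</v2:emphasis>')
check_unwrappable(ok)  # no xml:lang or xml:space, so nothing is raised
```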

John note: I believe the last sentence was a slight nod to the hypothetical option of requiring info-sets when using MCE, which I don't think we want to mandate.

===== COMMENT 6: Should we try to specify the timing of mismatch errors? =====
The specifications should only dictate when an MCE document is valid or invalid, not how the errors are detected or when they are reported.

John note: I believe we also agreed on this in Copenhagen.

===== COMMENT 7: Can unwrappable/unwrapped elements specify @Ignorable, @ProcessContent, @MustUnderstand, or @ExtensionElements? =====
They must be allowed to have them, and there is no reason to disallow them.  If they were disallowed, you would run into situations where you could construct a document that is understandable by any two of three processors but not by all three.  That is, consider a v1 that would unwrap an element that is not unwrapped by v2 and v3 but that contains content that should be ignored by v2.  The document should be able to specify how this content should be interpreted by v3, even though this element is unwrapped by v1.  These attributes do not make it to the processed info-set (they are only in the MCE info-set) and would be understood by all versions, even if they are not acted on (or are acted on differently), so there is no reason to disallow them.  BTW, the phrase "unwrappable/unwrapped element" is a bit redundant.  Such rules need to be enforced on unwrappable elements, regardless of whether they are actually unwrapped.