Excel & ISO 8601 times

John Haug johnhaug at exchange.microsoft.com
Wed Mar 21 15:45:54 CET 2012


New DR submitted by Ecma attached.  Rex, can you add this to the log?  Attaching here with WG4 cc'd as FYI since the call is tomorrow.

Thanks!
John

From: John Haug [mailto:johnhaug at exchange.microsoft.com]
Sent: Wednesday, March 21, 2012 6:55 AM
To: e-SC34-WG4 at ecma-international.org
Subject: FW: Excel & ISO 8601 times

Hi all -
I've been talking with Gareth, Jesper, Francis and the member formerly known as Chris about an issue with the ISO 8601 dates/times requirements that were added by Amendment 2 (:2011 Amd 1).  We wrapped up discussion last week and I submitted a new proposed DR to Ecma TC45.  I expect TC45 to approve that today, in which case I will have that DR out to WG 4 today.  Sorry for the short notice, but that's just how the timing worked.  I'd like to add this topic to the DRs agenda item.

This one will probably have a bit of discussion and I'm forwarding our discussions below as a basis for the one on the call tomorrow.

Thanks,
John

From: John Haug
Sent: Tuesday, March 20, 2012 1:43 PM
To: 'Horton, Gareth'; 'Jesper Lund Stocholm'; Chris Rae
Cc: 'Francis Cave'
Subject: RE: Excel & ISO 8601 times

TC45 has been looking at the new DR I wrote up and I expect I'll be able to submit it tomorrow.  Anyone mind if I forward this thread to WG 4?  I think it would be useful for others to see the discussion we've had and not have to completely rehash all the pros and cons from scratch on the call this week.

From: John Haug
Sent: Wednesday, March 14, 2012 12:44 PM
To: 'Horton, Gareth'; Jesper Lund Stocholm; Chris Rae
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

Good comments.  I'll amend what I submit in the DR to note a few options for the suggested fix.

From: Horton, Gareth [mailto:Gareth_Horton at datawatch.com]<mailto:[mailto:Gareth_Horton at datawatch.com]>
Sent: Wednesday, March 14, 2012 12:29 PM
To: Jesper Lund Stocholm; John Haug; Chris Rae
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

Hi all,

Yes, I think "should" is certainly necessary unless we want to add a lot more text and content to the standard, as well as make schema changes.

Although John is of the view that more precision is better, that is only true from an intra-application perspective, not an interop perspective.

In terms of financial applications and standards concerning finance, it is common to specify levels of precision within the standard - so you assert the level of precision the value is being stored in perhaps an element or attribute related the value. That way one knows whether the value is exact, or could be already be subject to rounding errors, even at the source.

For example XBRL encountered many issues coming from inconsistent levels of precision, which was addressed by adding a precision attribute IIRC.

See this document: http://www.xbrl.org/RFC/PDU/PWD-2008-10-09/PDU-RFC-PWD-2008-10-09.html

Applications will need to be mindful of dealing with values carefully and not entering into a precision arms race as John seems to favour.

For example 1,000,000.00 may be exactly that - applications that take that value and then store it as 1,000,000.000000 are incorrect, since it relates to cash transactions and there is no such thing as a microcent.

Although these examples are not date/time specific, they are illustrative.

I did find some useful information from an old communications spec:

"Standards developers shall make provision for receiving systems that require less than six digits of precision in the Fractional Second component to ignore any unneeded trailing decimal places without adversely affecting interoperability. (Explanatory note: A receiving application entity that represents Time values with low precision shall not return error messages to a sending application entity that represents Time values with higher precision.)"

Again, the problem is that precision is explicitly asserted in instance document elsewhere (as in Databases), whereas in SML it is not.

If you want examples of interop issues with fractional seconds, you only have to look at Microsoft Office & SQL Server

http://support.microsoft.com/kb/225334

Note this problem was never fixed and described as "by design".  This affected our product which depended on these Microsoft components. Essentially a complete failure occurred in the MS component, since there was no code to deal with precision mismatches.

A SQL Server issue we also came up against

http://rightondevelopment.blogspot.com/2009/10/sql-server-native-client-100-datetime.html

And of course, the world-famous MySQL problem with not even supporting millisecond precision (There's even a "Downfall" video):

http://stackoverflow.com/questions/2572209/why-doesnt-mysql-support-millisecond-microsecond-precision

Note that this is due to the fact that SQL92 compliance does not dictate any precision beyond seconds.

On another note, we do not support fractional seconds in Monarch (and hence would not be able to fully support Strict if we go with a minimum of 3), since we use Microsoft Access as an underlying database. Access does not support fractional seconds in the Date/Time datatype.

Although I support the notion that we should allow flexibility to go beyond millisecond precision, I just want to explain that it does come at the cost of interoperability. Giving a strong steer that using millisecond precision is a good idea is what we should aim for. Perhaps in the future, we could add information to explicitly assert the precision of values into the spec in another attribute.


Gareth

________________________________
From: John Haug
Sent: Wednesday, March 14, 2012 12:06 PM
To: 'Jesper Lund Stocholm'; Horton, Gareth; Chris Rae
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

I'll get a DR written up ASAP and try to get it through Ecma before our call next week.  Thanks, all.


From: Jesper Lund Stocholm [mailto:jesper.stocholm at ciber.com]<mailto:[mailto:jesper.stocholm at ciber.com]>
Sent: Wednesday, March 14, 2012 9:14 AM
To: John Haug; Horton, Gareth; Chris Rae
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

Gentlemen,

If you look at the ISO definitions of the word "should", it says:

The verbal forms shown in Table H.2 shall be used to indicate that among several possibilities
one is recommended as particularly suitable, without mentioning or excluding others, or that a
certain course of action is preferred but not necessarily required, or that (in the negative form)
a certain possibility or course of action is deprecated but not prohibited.

Verbal form

Equivalent expressions for use in exceptional cases

Should

it is recommended that
ought to

Should not

it is not recommended that
ought not to


I think this conveys what Gareth mentioned (and I support him on this) pretty well - that we as "standard makers" communicate our intent in terms of providing a suggested minimum level of interop while still allowing implementers to deviate from this intent. Using the word "should" is indeed the clear opening for implementers that you seek.

I do agree with Francis, though - let's get a DR in the pipe-line for this and discuss it at the next Teleconference.


Med venlig hilsen / Best regards

Jesper Lund Stocholm

CIBER Danmark A/S
Mobil: +45 3094 5570
Email: jesper.stocholm at ciber.com<mailto:jesper.stocholm at ciber.com>

[Description: Description: mvp_avatar]

CONFIDENTIALITY NOTICE AND NOTICE REGARDING NO ELECTRONIC SIGNATURE: The materials in this electronic transmission (including attachments) may be subject to attorney-client or work product privilege, may be private and confidential and are the property of the sender. The information contained is intended only for the named addressee(s) and should not be considered evidence of intent to be bound to any agreement. The taking of any action in reliance on the contents is strictly prohibited. If you are not the intended recipient of this message, please immediately notify the sender and promptly delete this message.

________________________________
From: Francis Cave [mailto:francis at franciscave.com]<mailto:[mailto:francis at franciscave.com]>
Sent: Tuesday, March 13, 2012 4:26 PM
To: John Haug; 'Jesper Lund Stocholm'; 'Horton, Gareth'; Chris Rae
Subject: RE: Excel & ISO 8601 times

Hi all - I've stayed out of this discussion because I'm not that familiar with the detail and Gareth knows far more than I do about what makes sense in dates in SpreadsheetML. However, my impression from the discussion is that there is definitely some kind of defect in the current text, since it would appear that it was never the intention of Gareth and Chris to limit implementations to store a maximum of 3 decimal places for seconds. Filing a DR will get this discussed by the WG as a whole, and in the unlikely event that no change is needed, the DR can be closed without action.

Francis

From: John Haug [mailto:johnhaug at exchange.microsoft.com]<mailto:[mailto:johnhaug at exchange.microsoft.com]>
Sent: Tuesday, March 13, 2012 11:23 PM
To: Jesper Lund Stocholm; Horton, Gareth; Chris Rae
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

Any comments on that suggestion?

From: John Haug
Sent: Thursday, March 08, 2012 10:51 AM
To: 'Jesper Lund Stocholm'; Horton, Gareth; Chris Rae
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

I think "should" still says that the standard implies that 3 is what implementations should target.  I'd be more in favor of a clear opening for implementations to be better, but with a minimum bar for the consistency that Gareth is looking for.

So it seems there is budding consensus that this needs to be changed.  Since I started this kerfluffle, er, discussion, I can file a DR through Ecma and we can all discuss it on the next call in a few weeks.  Sound OK?

From: Jesper Lund Stocholm [mailto:jesper.stocholm at ciber.com]<mailto:[mailto:jesper.stocholm at ciber.com]>
Sent: Thursday, March 08, 2012 1:53 AM
To: John Haug; Horton, Gareth; Chris Rae
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

Hi all,

I think a maximum of three decimal places is waaay too low for any real use. You don't need to add more than a couple of non-integer numbers before you ram into rounding errors that would seem illogical to the users.

The best way to deal with this in our situation would be to switch the "shall" with "should". That would allow implementations to use a broader range - while preserving the intent of the makers of the standard (us) on a setting a suggested minimum interoperability level.



Med venlig hilsen / Best regards

Jesper Lund Stocholm

CIBER Danmark A/S
Mobil: +45 3094 5570
Email: jesper.stocholm at ciber.com<mailto:jesper.stocholm at ciber.com>

[Description: Description: mvp_avatar]

CONFIDENTIALITY NOTICE AND NOTICE REGARDING NO ELECTRONIC SIGNATURE: The materials in this electronic transmission (including attachments) may be subject to attorney-client or work product privilege, may be private and confidential and are the property of the sender. The information contained is intended only for the named addressee(s) and should not be considered evidence of intent to be bound to any agreement. The taking of any action in reliance on the contents is strictly prohibited. If you are not the intended recipient of this message, please immediately notify the sender and promptly delete this message.

From: John Haug [mailto:johnhaug at exchange.microsoft.com]<mailto:[mailto:johnhaug at exchange.microsoft.com]>
Sent: Thursday, March 08, 2012 12:43 AM
To: Horton, Gareth; Chris Rae; Jesper Lund Stocholm
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

> which is why 3 should be the default and minimum
That seems a reasonable thing to do in the standard.  Is it an error that the Amendment says "no more than"?  Should that have been "no fewer than"?  If so, that's an easy COR to the AMD that would run through the "rapid COR" path Makoto talked about when we were discussing the 2-year publish cycle question in Prague.

*         The decimal separator shall be a full stop (period), and fractional seconds shall be expressed with no more than three decimal places.

If the standard allowed an implementation to store more than 3 decimal places without that being a deviation, I think that would be much better than the current situation.  It would set a minimum bar for consistency without restricting finer precision, introducing interoperability problems (i.e., readers would be on notice that there may be more than 3 decimal places) or requiring an external extension defined over in left field.  Is this acceptable to people?


> there is no reason it can't decide to write in transitional instead
True, but limitations like this would make it hard to ever see Strict become the default file format.  We can't have the native format be lossy by definition.

> they are not the core user community of spreadsheet users and indeed, are using them for a job they were not intended to perform
Not the core, no, but desktop spreadsheet software has become a widely accepted analytic tool in more than a couple industries, in no small part because Excel has provided reliable fine precision in its calculation engine.  Scary as it may seem on first blush, Excel sheets run half the world's financial systems!  This is why we're so skittish about any sort of data loss, particularly in spreadsheets, even if it's not in the core mathematical calculations.


From: Horton, Gareth [mailto:Gareth_Horton at datawatch.com]<mailto:[mailto:Gareth_Horton at datawatch.com]>
Sent: Wednesday, March 07, 2012 3:25 PM
To: John Haug; Chris Rae; Jesper Lund Stocholm
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

Hi John,

I think flexibility is useful, which is why 3 should be the default and minimum, but we should consider allowing implementations to store more if they document it.

The second part of your argument would of course be the same if the analyst etc was consuming in different applications.

If the precision causes backward compatibility issues for Excel, there is no reason it can't decide to write in transitional instead. It's one solution, but more top end flexibility is probably a reasonable tradeoff at the cost of interoperability.

We should probably keep in mind that we are dealing with document formats, not database or analysis software file formats.

No doubt there are edge case users of Excel that want to store time in sub-millisecond precision, but they are not the core user community of spreadsheet users and indeed, are using them for a job they were not intended to perform.

Perhaps it is a case for an extension to store "high precision" data, so as to maintain a reasonable bar for implementors of software that want to produce and consume typical spreadsheets in a very interoperable way and not have to deal with such edge cases.

Gareth

Gareth Horton
Senior Product Manager
Datawatch Corporation
________________________________
From: John Haug
Sent: 07/03/2012 17:50
To: Horton, Gareth; Chris Rae; Jesper Lund Stocholm
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times
Hmm, I'm not sure that attempting to enforce consistency across implementations by creating a lowest common denominator for data storage is the right thing.  I don't see how that is more beneficial than data loss and creating inconsistency between what a user sees at creation time and after saving/loading the file.  I don't see how interoperability is harmed by allowing freedom in data precision.  Given that different implementations will have different internal precisions (e.g., single- vs. double-precision), perhaps for a good reason depending on what a particular implementation's usage scenario focus is, hobbling the data format seems detrimental.  I think - and I'll bet most or all users would agree - that consistency within a single implementation is 100% expected, but there is a slightly lower bar for expecting n-th decimal consistency when sharing between different implementations since there will be variances in each implementation's internal precision and calculation engines.

What happens if an analyst is dealing with timestamped financial transactions and the data changes between the initial user session for some piece of data and the next session?  How about a scientist/engineer sifting through myriad timestamped measurements?  What if time data is calculated in a very specific way, perhaps by a formula?  It leads to unexpected changes that can cause problems for users.

From: Horton, Gareth [mailto:Gareth_Horton at datawatch.com]<mailto:[mailto:Gareth_Horton at datawatch.com]>
Sent: Wednesday, March 07, 2012 2:21 PM
To: John Haug; Chris Rae; Jesper Lund Stocholm
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

In an attempt to try and get consistent results across implementations.

Gareth

Gareth Horton
Senior Product Manager
Datawatch Corporation
________________________________
From: John Haug
Sent: 07/03/2012 17:17
To: Chris Rae; Jesper Lund Stocholm; Horton, Gareth
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times
I'm confused as to why we would mandate the number of decimal places in the XML/decimal representation of the data in the first place.  What does it matter?  Why not let an implementation write out as much precision as it can (or wants to) and read in as much as it is able (or wants to)?

John

From: Chris Rae
Sent: Wednesday, March 07, 2012 11:21 AM
To: Jesper Lund Stocholm; Horton, Gareth; John Haug
Cc: Francis Cave
Subject: RE: Excel & ISO 8601 times

Thanks for digging this up, as my memory of the situation appears to be entirely wrong. I remember being against restricting this, and Gareth wanting to have 3 dps.

It looks to me like Gareth and I were intending increasing the number of DPs and then just forgot to work it into the draft?

Chris

From: Jesper Lund Stocholm [mailto:jesper.stocholm at ciber.com]<mailto:[mailto:jesper.stocholm at ciber.com]>
Sent: 07 March 2012 05:09
To: Horton, Gareth; John Haug
Cc: Chris Rae; Francis Cave
Subject: RE: Excel & ISO 8601 times

Hi all,

The thread starts here: http://mailman.vse.cz/pipermail/sc34wg4/2010-April/001401.html

I have a private email from Chris to Gareth and me (document attached here) with a first draft of what became to be the ISO profile where the "3 decimal places" is included - I have no idea of how it got there.

On July 8th Chris sent a first public draft of the amendment to the email list (document also attached here).

I'd be happy to submit a defect report / comment on this.





Med venlig hilsen / Best regards

Jesper Lund Stocholm

CIBER Danmark A/S
Mobil: +45 3094 5570
Email: jesper.stocholm at ciber.com<mailto:jesper.stocholm at ciber.com>

[Description: Description: mvp_avatar]

CONFIDENTIALITY NOTICE AND NOTICE REGARDING NO ELECTRONIC SIGNATURE: The materials in this electronic transmission (including attachments) may be subject to attorney-client or work product privilege, may be private and confidential and are the property of the sender. The information contained is intended only for the named addressee(s) and should not be considered evidence of intent to be bound to any agreement. The taking of any action in reliance on the contents is strictly prohibited. If you are not the intended recipient of this message, please immediately notify the sender and promptly delete this message.

From: Horton, Gareth [mailto:Gareth_Horton at datawatch.com]<mailto:[mailto:Gareth_Horton at datawatch.com]>
Sent: Wednesday, March 07, 2012 1:45 PM
To: Jesper Lund Stocholm; John Haug
Cc: Chris Rae; Francis Cave
Subject: RE: Excel & ISO 8601 times

Hi all,

I'm travelling at the moment, so a little difficult to check.

However, there were mails sent to the WG4 mailing list on this. Search for "precision".

As far as I recall, I posited a straw man of using milliseconds precision and Chris went back to some experts on the Excel team.

It seemed acceptable, so we went with it.

However, I think we should have gone with a minimum of 3 dp for fractional seconds, but allowed greater precision documented by those implementations, as in the XML Schema datatypes spec.

I'm not sure if we can fix this, as I only have my phone and can't recall the state of the amendment and whether it's possible for a NB to submit a comment to address this before publication.

I've copied Francis on this for his input.

Gareth

Gareth Horton
Senior Product Manager
Datawatch Corporation
________________________________
From: Jesper Lund Stocholm
Sent: 07/03/2012 07:27
To: John Haug
Cc: Horton, Gareth; Chris Rae
Subject: RE: Excel & ISO 8601 times
Hi guys,

Even though Chris and Gareth were the main drivers on the ISO date thingy, I was a bit involved too.

Sadly, I have absolutely no recollection of how/when the "three decimal places" got into the spec.

(adding Chris to the thread)



Med venlig hilsen / Best regards

Jesper Lund Stocholm

CIBER Danmark A/S
Mobil: +45 3094 5570
Email: jesper.stocholm at ciber.com<mailto:jesper.stocholm at ciber.com>

[Description: Description: mvp_avatar]

CONFIDENTIALITY NOTICE AND NOTICE REGARDING NO ELECTRONIC SIGNATURE: The materials in this electronic transmission (including attachments) may be subject to attorney-client or work product privilege, may be private and confidential and are the property of the sender. The information contained is intended only for the named addressee(s) and should not be considered evidence of intent to be bound to any agreement. The taking of any action in reliance on the contents is strictly prohibited. If you are not the intended recipient of this message, please immediately notify the sender and promptly delete this message.

From: John Haug [mailto:johnhaug at exchange.microsoft.com]<mailto:[mailto:johnhaug at exchange.microsoft.com]>
Sent: Wednesday, March 07, 2012 12:56 AM
To: Horton, Gareth (Gareth_Horton at datawatch.com<mailto:Gareth_Horton at datawatch.com>)
Cc: Jesper Lund Stocholm
Subject: Excel & ISO 8601 times

Hi guys -
Jesper filed a bug about Excel "15" not truncating time-only values to three decimal places.  That seemed rather odd but the amendment does say that in 18.17.4.  I recall this was something Chris and Gareth worked on, before I got re-involved.  I'm curious what the rationale is for that truncation, as it seems to clearly lead to loss of precision / data loss.

Thanks,
John
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20120321/c7099b1b/attachment-0001.htm>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.jpg
Type: image/jpeg
Size: 2095 bytes
Desc: image001.jpg
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20120321/c7099b1b/attachment-0001.jpg>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: _New DR - ISO 8601 time precision.docx
Type: application/vnd.openxmlformats-officedocument.wordprocessingml.document
Size: 89043 bytes
Desc: _New DR - ISO 8601 time precision.docx
URL: <http://mailman.vse.cz/pipermail/sc34wg4/attachments/20120321/c7099b1b/attachment-0001.bin>


More information about the sc34wg4 mailing list