This is so obscure i hesitate to blog about it, except that it took me so long to figure out that i’d love to save somebody else the trouble. You won’t care unless:
- You’re designing an XML Schema definition (.xsd) to validate an XML file
- You’re defining an element to contain regular text, or multiple elements, in any order, from zero to many times
Here’s an example: suppose you have a plain text description of events that includes people, places, and Bible references.
Jesus heals Simon’s mother-in-law (Matt 8:14-17; Mark 1:29-34; Luke 4:38-41)
You want to link person references with a Link element, Bible references with a Reference element, and otherwise leave the plain text as is. This results in something like this (using square brackets since otherwise WordPress gets confused):
[Link]Jesus[/Link] heals [Link]Simon[/Link]’s mother-in-law ([Reference]Matt 8:14-17[/Reference]; [Reference]Mark 1:29-34[/Reference]; [Reference]Luke 4:38-41[/Reference])
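For reference, here is a sketch of the same line with real angle brackets. The enclosing TextItem element is an assumption borrowed from the grammar further down; any parent element name would do.

```xml
<!-- Hypothetical instance: TextItem is an assumed parent element name -->
<TextItem><Link>Jesus</Link> heals <Link>Simon</Link>’s mother-in-law (<Reference>Matt 8:14-17</Reference>; <Reference>Mark 1:29-34</Reference>; <Reference>Luke 4:38-41</Reference>)</TextItem>
```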
Now imagine several of these in the same element, so potentially you can have any arbitrary sequence of Links, References, and plain text, in any order, any number of times. Describing this with a BNF grammar is trivial:
LinkRef ::= Link | Reference
TextItem ::= ( text | LinkRef )+
A cursory reading of the XML Schema description (which i’d never actually done before, instead depending on XMLSpy which generally lets me avoid thinking that hard) might make you think grouping models like sequence, choice, and all in conjunction with attributes like minOccurs and maxOccurs would do what you need. But there’s a surprisingly complex set of interactions between these, that i still don’t really understand, and so what seemed so simple proved surprisingly hard. Here are a few examples of what i tried, where XMLSpy’s validation model for XSD files (which i’m assuming is correct) wouldn’t allow it:
- while all is for an unordered group of elements, it’s restricted to maxOccurs=1. So it doesn’t handle unbounded occurrence (though it does allow minOccurs=0, i.e. optionality). Furthermore, it can’t be nested inside other model groups like sequence.
- choice groupings can be neither optional nor unbounded.
- trying to specify multiple occurrences of both Link and Reference, each both optional and unbounded, is flagged as an ambiguous model.
The solution i finally discovered (after embarrassingly many other permutations, more by trial and error than anything else):
- define a LinkRef group that allows a sequence of either Link or Reference, both optional and unbounded (zero to many occurrences)
- the TextItem (enclosing parent) element allows an optional and unbounded sequence of LinkRef groups.
For the more visually oriented, here’s how it looks in XMLSpy:
Sorry to hear of your frustrations. It seems that XML Spy is misleading you in a couple of regards.
1. Choice groups can be optional and/or unbounded (using the minOccurs and maxOccurs attributes).
2. I don’t see how your final solution allows text, or at least it’s obscured. The key is the presence of mixed="true", but I don’t see that (or know what to look for) in the diagram.
“Mixed content” is the term you’re looking for. It describes content that can consist of both elements and text. And XSD uses mixed="true" on the complexType element as a way of signifying that text is allowed. If you got it to work, then I suspect that it’s happening under the covers, but only as a side effect of the unnecessarily convoluted approach you ended up taking.
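To make that concrete, here is a minimal sketch of what a direct XSD definition might look like, combining mixed="true" with an optional, unbounded choice. The element names come from the post; typing Link and Reference as xs:string and leaving out a target namespace are assumptions for illustration, and this has not been checked against XMLSpy.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- TextItem: plain text interleaved with Link and Reference children -->
  <xs:element name="TextItem">
    <!-- mixed="true" is what permits character data between the child elements -->
    <xs:complexType mixed="true">
      <!-- zero or more of either element, in any order -->
      <xs:choice minOccurs="0" maxOccurs="unbounded">
        <!-- xs:string content is an assumption; the post does not say what Link and Reference hold -->
        <xs:element name="Link" type="xs:string"/>
        <xs:element name="Reference" type="xs:string"/>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because the choice itself repeats, there is no need for a separate named group or a wrapping sequence, and the two alternatives are distinct elements, so the model should not be flagged as ambiguous.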
I can totally understand your reason for using a tool like XML Spy (so you don’t have to mess directly with XSD’s ugly syntax), but it sounds like it has its own learning curve too.
Fortunately, you can have your cake and eat it too (simple/intuitive interface and XSD as the result). Use RELAX NG instead and use Trang to convert to XSD automatically.
If you know BNF, then you’ll love the RELAX NG Compact Syntax. For one thing, it doesn’t treat mixed content as a special, strange beast like XSD does. Instead, it’s just another kind of node like you’d expect (using the keyword “text” as below). There are a couple of ways you could define what you want, but here’s one that’s basically isomorphic to your BNF example:
# "start" names the root pattern; a grammar with named definitions needs one
start =
  element example {
    ( LinkRef
    | text
    )*
  }

LinkRef =
  ( element Link { text }
  | element Reference { text }
  )
Here’s the tutorial:
http://relaxng.org/compact-tutorial-20030326.html
You can validate documents directly against RELAX NG, using Jing, avoiding XSD entirely (and its limitations):
http://www.thaiopensource.com/relaxng/jing.html
Or you can use Trang to convert your .rnc file to an .xsd file at the push of a button:
http://www.thaiopensource.com/relaxng/trang.html
Whenever I need to create an XSD schema, I start with RELAX NG Compact. Ideally, if my environment/context allows it, I’ll stick with it and auto-generate the XSD every time I make an update. Better yet, I’ll avoid XSD altogether, allowing me to utilize RELAX NG’s greater power. But when I don’t have that luxury, James Clark’s Trang is a lifesaver.
Evan:
I’ll have to check more closely: it could be, as you say, that XMLSpy is misleading me. I guess we’re always a little captive to our tools, though i’ve had very good success with XMLSpy in the past (and have therefore come to rely on it).
I’ve been hearing about RELAX NG for a while, and yours is another significant vote in its favor. The main downside for me (here comes that tool thing again) is that XMLSpy doesn’t support it, so one (or two) more layers of indirection get put into the process. But i’ll take a closer look.