XML

June 19, 2008

DMS 2008 theme #2, part 2: XQuery to the rescue

(Oops -- sorry about letting so much time sneak by since part 1 on this topic; I got detoured preparing for our Burton Group Catalyst conference next week and attending Microsoft TechEd last week, but the DMS team is committed to making this blog a sustained and regularly updated conversation on data-centric topics, so expect the post frequency to increase as we get into and beyond Catalyst.)

Part 1 on this topic concluded with a summary of ideal attributes for an XML query (general-purpose XML content manipulation, actually) language.  XQuery does a great job of addressing those goals, and its role in the common XML processing pipeline is depicted below:

[Figure: XQuery2]

Specifically:

  • XQuery is an efficient and effective means of working with information resources including databases, documents, and programming language data structures.
  • It doesn't require developer brain transplants, but it is generally more accessible to people familiar with SQL than to people who have been working with content/document-focused systems in the past.
  • XQuery has strong potential to replace the use of multiple programming languages often used for XML query and structural transformation operations.  In this respect, it's a lot like the shift to SQL more than 20 years ago, in that a lot of difficult-to-maintain procedural code can be replaced with a more declarative, set-oriented (and easier to maintain) approach.
  • XQuery is a W3C Recommendation, building on other W3C work including XPath (updated in conjunction with XQuery) and XML Schema (the XSD data type model, although XQuery is not exclusively tied to XSD), along with other related standards such as SQL.
  • XQuery is not, despite its name, just for queries.  It is an XML data manipulation language, designed for declarative expressions that can be optimized by servers, but it also includes variables, conditional expressions, function and modular declarations, and extension points. The W3C XQuery working group is also expanding XQuery to include insert, update, and delete operations.
  • XQuery is not a replacement for SQL; it's designed to be used in conjunction with SQL, as the languages are designed for different data models -- SQL is for the extended relational model, and XQuery is for (ideally well-structured/schema-described) XML content.
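
To make the declarative, set-oriented point above a bit more concrete, here's a minimal sketch. It is Python standing in for XQuery, not XQuery itself, and the catalog document, element names, and the `titles_over` helper are all invented for illustration. The docstring shows roughly what the same request looks like as an XQuery FLWOR (for/where/order by/return) expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog document, invented for this illustration.
CATALOG = """
<catalog>
  <book><title>SQL Basics</title><price>25.00</price></book>
  <book><title>XML in Depth</title><price>45.50</price></book>
  <book><title>Advanced XQuery</title><price>39.95</price></book>
</catalog>
"""


def titles_over(xml_text, min_price):
    """Return titles of books priced above min_price, sorted by title.

    Roughly the shape of this XQuery FLWOR expression (which returns
    title elements rather than plain strings):

        for $b in doc("catalog.xml")/catalog/book
        where $b/price > 30
        order by $b/title
        return $b/title
    """
    root = ET.fromstring(xml_text)
    return sorted(
        book.findtext("title")                           # return
        for book in root.findall("book")                 # for
        if float(book.findtext("price")) > min_price     # where
    )


print(titles_over(CATALOG, 30))
```

The Python version still has to spell out parsing, type conversion, and sorting by hand; in XQuery those steps are part of one expression that a server is free to optimize.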

Overall, XQuery has significant potential to simplify XML query and structural transformation concerns, and, as support for XQuery increases in a variety of software product categories, XQuery is poised to become, for XML content, what SQL is for relational databases.

Blogger: Peter O'Kelly

May 29, 2008

DMS 2008 theme #2, part 1: XQuery

Another post in the series of key DMS topics for 2008; see this post for part 1 (topic: DBMS redux)

XQuery ("XQuery 1.0: An XML Query Language") is a W3C Recommendation for an XML content/data manipulation language. Burton Group's DMS team believes XQuery is going to become one of the most significant (transformational, if you'll excuse the pun) information management advances since SQL went mainstream in the 1980s, although XQuery is also something of a paradox in some respects.

Why XQuery is needed:

  • Organizations are doing a lot more with XML these days, whether in XML data-centric domains such as inter-organization information exchanges (that might have been done in EDI, in the previous generation) or more document-oriented domains such as the use of ODF or OOXML for productivity application file formats.
  • While the leading DBMSs are getting better at handling XML (as in full-fidelity XML data model management, rather than just "shredding" into relational tables or storing XML documents as blobs), SQL is not well-suited to address the somewhat idiosyncratic nature of XML -- hierarchy and sequence are fundamentally important to document-centric XML management, for instance, and SQL (indeed, the relational data model) was not designed to accommodate such document-centric concerns. XML usage patterns also vary considerably in the use of elements and attributes, another issue that does not map well to SQL.
  • Many traditional content and document management systems use proprietary content manipulation languages for beyond-the-basics needs, sort of like how things were in the DBMS market more than 20 years ago, before SQL became the norm.  Proprietary == badness in terms of vendor lock-in, higher training/admin costs, etc.
  • XPath is useful for intra-XML document expressions, but it doesn't address more data-centric, multi-document XML usage scenarios; for that, we need something that is more analogous to the relational calculus approach embodied in SQL.
  • Overall, along with explosive growth in XML information (content and/or data) management, there's a market need for something more powerful and succinct than today's common approach, which is often to use 7 +/- 2 programming/scripting languages (with lots of custom coding), SQL, XPath, and XSLT, in order to handle the common pipeline scenario depicted below:
[Figure: XQuery1]
  • This also tends to get into ugly pair-wise "impedance mismatch" challenges, e.g., object/relational, object/XML, relational/XML, etc.
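
As a small illustration of the XPath point in the list above, here's a sketch (in Python, with two made-up documents and a hypothetical `totals_by_customer` helper) of what tends to happen today: a path expression handles the question inside one document, but as soon as two documents need to be combined, the join ends up as hand-written procedural code.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents, invented for this illustration.
ORDERS = """
<orders>
  <order id="1" customer="c2"><total>120.00</total></order>
  <order id="2" customer="c1"><total>35.50</total></order>
  <order id="3" customer="c2"><total>80.00</total></order>
</orders>
"""
CUSTOMERS = """
<customers>
  <customer id="c1"><name>Acme</name></customer>
  <customer id="c2"><name>Globex</name></customer>
</customers>
"""

orders = ET.fromstring(ORDERS)
customers = ET.fromstring(CUSTOMERS)

# Within ONE document, a path expression is enough.
# (ElementTree supports only a limited subset of XPath.)
c2_order_ids = [o.get("id") for o in orders.findall("order[@customer='c2']")]


def totals_by_customer(orders_root, customers_root):
    """Join orders to customers by id and sum order totals per customer name."""
    # ACROSS documents, the join is custom code: build a lookup, then loop.
    names = {
        c.get("id"): c.findtext("name")
        for c in customers_root.findall("customer")
    }
    totals = {}
    for order in orders_root.findall("order"):
        name = names[order.get("customer")]
        totals[name] = totals.get(name, 0.0) + float(order.findtext("total"))
    return totals


print(c2_order_ids)
print(totals_by_customer(orders, customers))
```

Multiply that lookup-and-loop pattern across a half-dozen languages and a few dozen document types, and the appeal of a single declarative language for the job becomes clearer.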

To recap, we need:

  • An efficient and effective means of exploiting the best of all possible worlds -- data management, document/content management, and programming language data structure models (typically object orientation)
  • With models and languages that don't require widespread brain transplants:
    • There's no way to press the restart button on existing content, tools already deployed, organizations already entrenched...
    • The industry needs a vast simplification (e.g., relative to SQL + XSLT + XPath + custom coding in a half-dozen programming languages, in many XML pipeline scenarios), but also more synergy -- a new gestalt...
  • And of course it should all be done with a true industry standard, community-driven and open.

And that's where XQuery enters the picture -- more on that in part 2 of this post series.

May 22, 2008

DMS 2008 theme #1: DBMS redux, part 2: one size still fits plenty

(See this post for part 1)

Okay, time for a DBMS reality check.

First, IBM, Microsoft, and Oracle collectively control, by most estimates, around 85% of the commercial DBMS market (which was generally assumed to be ~$21B in 2007). Sun's $1B acquisition of MySQL AB earlier this year also suggests that perhaps there's still some mileage left in the DBMS market as we've known it for the last decade or two.

Second, DBMS and RDBMS (relational DBMS) are not 1:1; the latest DBMS products from IBM, Microsoft, and Oracle, in particular, manage multiple database models (e.g., relational, XML, files, streaming, and OLAP). And they can do so with much lower total administration/license/etc. costs, compared with the alternative of using "best-of-breed"/specialized products for each database model domain.

Some architectural diagrams of what this looks like, when the bits hit the disk:

[Figures: IBM, Microsoft, and Oracle architecture diagrams]

(Wow, these vendors are so fiercely competitive, they even use the same general color schemes:)...)

XQuery, the subject of my next post in this series, is a game-changer that brings significant potential for the DBMS leaders to leapfrog specialized XML content/data management systems (and SQL certainly isn't going away, nor are proprietary DBMS programming languages such as Transact-SQL or PL/SQL).

With XQuery and full-fidelity XML database model management in the leading DBMSs, it will also be possible for enterprises to shift some of their document management/content management infrastructure to DBMSs (rather than continuing to rely on traditional, proprietary specialized alternatives). This means enterprises will be able to have fewer moving parts in their data management infrastructure, vastly simplified data/document (a.k.a. structured/unstructured information) integration/synchronization opportunities, and fewer product licenses to renew each year.

The specialized vendors will also continue to have considerable market opportunities for the immediate future, of course, as enterprises don't undertake DBMS upgrades lightly, and some edge cases will remain out of the scope of the leading DBMSs for a while (e.g., petabyte XML document management scenarios). For the most part, though, it's likely the next generation of market leaders in the enterprise DBMS market will look a lot like the current generation, perhaps with relative market share changes along the way.

To close by addressing the questions from part 1 of this post:

  • It is not the end of the road for RDBMSs as we've known them for the last ~25 years; they're morphing to become multi-database model DBMSs, and they're likely to gain rather than lose momentum over the next few years.
  • Hang on to those SQL books; you're going to need them. The relational database model is still the most robust and powerful general-purpose database model, and it has significant synergy with XML (as the leading document model).
  • It would indeed be prudent to also invest some time and attention in learning more about XML (data and documents), along with XQuery (which will be relatively easy to learn, if you've already mastered SQL). Burton Group has a bunch of research content in these domains, incidentally...
  • Specialized XML data management systems are gaining momentum, but don't bother exploring IMS, IDMS, or other pre-relational systems for career enhancement potential, and it is still not the case that "everything is an object", so don't expect OODBMSs to rise from their graves anytime soon.  Some object database systems are being re-treaded and repositioned as XML database systems, but you probably want to look very closely before betting your next app on XML systems that aren't fundamentally designed to support XQuery (as opposed to bolting XQuery support on top of an existing object database system).
  • Don't worry about Larry Ellison's financial stability; he's probably going to be on that Forbes list for the foreseeable future...

The next DMS 2008 theme: XQuery

Blogger: Peter O'Kelly

DMS 2008 theme #1: DBMS redux, part 1: the end of the "one-size-fits-all" era?

Context-setting:

  • If you haven't seen the post titled "A few things you should know about Burton Group and this blog", please skim it for context-setting, before reading further in this post
  • This post is the first in a series I referenced in the initial post for this blog, when I mentioned data management-related topics the DMS team thinks will be most influential over the next few years.  The list is neither exhaustive nor rank-ordered, and we (the DMS team) will get around to the rest of the initial post series set -- on XQuery, data modeling, open source DBMS, and the simultaneous commoditization, democratization, and specialization of business intelligence -- over the next week or two.

So -- about database management systems (DBMSs).  Many people assume the evolution of DBMSs peaked about a dozen years ago, coinciding with the rush to web apps. Before then, DBMSs were often serving as the full application server stack, handling database, transaction, identity/authentication/access control, and even application logic needs. It was a pretty fun and lucrative time to be a database designer on, say, Oracle Database or Sybase SQL Server, circa 1993.

Web app servers often displaced DBMSs at the center of application platform stack priorities, however, as organizations rushed to integrate disparate back-end systems (often running on multiple types of DBMSs) into web apps. DBMSs were often relegated to basic storage tier services, and application logic, integrity constraints, and even transactions sometimes migrated to the mid-tier.

In another important dimension, DBMS vendors have not always been the most lovable suppliers over the last couple decades, e.g., with unpopular (oligopoly-centric) pricing/licensing programs that helped to create mainstream market opportunities for open source DBMS vendors and projects.

The rapid growth of XML content/data management over the last decade has presented another challenge to business-as-usual in the DBMS market, as the XML database model cannot be completely accommodated in the relational database model ("shredding" XML documents into relational database tables, for instance, is an apt image, as it can entail metadata loss).
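
Here's a deliberately naive sketch of what that loss can look like, in Python with an in-memory SQLite table. The document, the table layout, and the `shred` helper are all invented for illustration; real shredding schemes are more careful (position columns, separate text-node rows, and so on), but they pay for that fidelity with extra schema and code.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical document-centric XML with mixed content (text between elements).
DOC = "<para>Press <key>Ctrl</key> then <key>C</key> to copy.</para>"


def shred(xml_text):
    """Naively shred each child element into a (tag, text) row."""
    root = ET.fromstring(xml_text)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE element (tag TEXT, text TEXT)")
    conn.executemany(
        "INSERT INTO element VALUES (?, ?)",
        [(child.tag, child.text) for child in root],
    )
    # No ORDER BY: a relational table has no inherent row order.
    rows = conn.execute("SELECT tag, text FROM element").fetchall()
    conn.close()
    return rows


rows = shred(DOC)

# What the full document says, reading all text nodes in document order.
original_text = "".join(ET.fromstring(DOC).itertext())

print(rows)
print(original_text)
```

The rows keep the two `key` values, but the surrounding words and the guaranteed sequence of the pieces are gone, so the original sentence can't be rebuilt from the table alone.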

Even DBMS market pioneers such as Michael Stonebraker have been asserting, over the last couple years, that the "one-size-fits-all" DBMS era is history, and that specialized data management systems (for streaming data, columnar data domains such as analytics, and XML, for example) will increasingly displace traditional relational DBMSs.  A couple of Stonebraker's most recent companies, StreamBase Systems and Vertica, are leading examples of the latest wave of specialized data management vendors, as is XML content platform Mark Logic (there are compelling open source projects in these domains as well, e.g., the eXist-db open source XML database system).

Is it the end of the road for RDBMSs as we've known them for the last ~25 years?  Time to sell your SQL books on eBay, before they become worthless?  To sign up for a distance-learning XML management course, in order to invest in your future career potential?  Perhaps some of the "legacy" database models, such as hierarchical and object-oriented DBMSs, will finally start to gain (regain, for hierarchical) meaningful market momentum, with the market shift to XML for content and data?  Is Oracle CEO Larry Ellison at risk of being bumped off the short list of the richest people on the planet (he's already down to #14 on this year's Forbes "The World's Billionaires" list...)? 

(In the interest of keeping these posts to a reasonable length, I'll continue on this topic in a second post.)

Blogger: Peter O'Kelly
