The vision of integrating information from a variety of sources, into the way people work, to improve decisions and process, is one of the cornerstones of biomedical informatics. Thoughts on how this vision might be realized have evolved as improvements in information and communication technologies, together with discoveries in biomedical informatics, have changed the art of the possible. This review identified three distinct generations of "integration" projects. First-generation projects create a database and use it for multiple purposes. Second-generation projects integrate by bringing information from various sources together through enterprise information architecture. Third-generation projects inter-relate disparate but accessible information sources to provide the appearance of integration. The review suggests that the ideas developed in the earlier generations have not been supplanted by ideas from subsequent generations. Instead, the ideas represent a continuum of progress along the three dimensions of workflow, structure, and extraction.
Member of the audience:
I’d like to ask a question about capturing ontologies from multiple people. Imagine for a moment that knowledge freezes long enough for us to try to catch it. Do you have a vision of a tool that will allow multiple knowledge-domain people to act at once? To work out discrepancies in their visions?
Mark Musen: Put differently, the question was how do we deal with the fact that there is no overarching ontology? How do we build the tools that will allow us to try to achieve consensus in ontologies? I think the answer to that question is that we do not know. I’m being a little bit facetious, but philosophers have been trying to deal with that problem for 2,000 to 3,000 years.
I think you see two different approaches in the computer science community. You see the approach that Doug Lenat has taken. He is trying to create an ontology that he believes will provide all the knowledge that one needs to read the Encyclopaedia Britannica. Such an overarching ontology would need to capture most of human existence. The real problem, though, is how you ever validate the distinctions made in that ontology and have confidence that things have been captured in a way that is consistent and understandable. How do you record all the assumptions that you make while constructing the ontology? When you have concepts like "semi-tangible object" and "semi-intangible object," it’s very hard to know for sure whether what one records about those distinctions really makes sense.
At the other end of the spectrum, you see people who really want a thousand flowers to bloom and who are not trying to achieve that kind of perfect alignment among views of the world. For example, the Knowledge Systems Laboratory at Stanford is trying to make constrained ontologies that deal with very narrow domains, so that the kinds of problems that you allude to do not happen, because the number of concepts in the ontology is relatively small. The answer lies somewhere between Doug Lenat’s view of the world, that all we have to do is work hard enough and everything will fall into place, and the view that we can’t possibly do this, so we have to have just a small number of constrained ontologies. We need to elucidate a set of principles that will provide the basis for tools that will help us try to, if not merge small ontologies, at least create the kinds of alignments that will allow us to bring them together in ways that make them useful.
Randy Miller: One of the things that I learned from my mentor, Jack Myers, is that as an informatician, as opposed to a philosopher or a computer scientist, you do not need to represent everything. If you have a problem at hand, you represent it at a level that is tractable and doable. If you do what Doug Lenat’s doing, you can spend your entire career representing stuff that is not ever going to be used in a real system, because there is no way to apply it. While that may sound harsh, the reality is that we do not know how to represent time, severity of finding, and severity of illness well at all, but we can still build systems that do diagnosis or a good job of making recommendations for therapy. So you do not have to capture the world in all its infinite detail. The trick is to understand what the critical information is and represent things at that level. Otherwise, you get mired in detail.
Mark Musen: Let me underscore your last point. Doug Lenat actually felt pretty confident that his ontology covered all the areas that one would want to deal with, until last year, when HotBot contracted to use CYC as the basis for indexing Web pages. This contract showed, first of all, that ontologies have incredible commercial potential, but it also pointed out to Doug Lenat that there was a whole realm of human experience that was not well represented in the ontology. Specifically, there was a need to categorize different kinds of pornography, which Lenat had not thought about previously.
Member of the audience:
Health Level Seven’s development of a set of reference information models is one of the major efforts for creating a structure for ontologies in the United States. Can you talk about how your organizations are participating in the development of that reference information model (RIM) and how you are using your academic experiences to contribute to that effort among providers, academics, and vendors?
Bill Stead: Vanderbilt is an institutional member and a strong advocate of HL7. The central core of our communication subsystem uses HL7, and we build middleware as needed to bridge between the core and legacy products. We have not put direct energy into the process for defining the reference information model. We use the HL7 model as a starting point, but we extend it as needed. In this way we incorporate it into immediate solutions to real problems, while providing useful information about future directions.
Bill Hersh: None of us has been involved directly in that effort. However, our research into the nature of ontologies and the vocabulary projects such as the Cannon Grouping should be useful to the effort.
Mark Musen: I will just add that I think the vendor community is in the best position to work on ontology content, because they have the most direct connection with the needs of end users. I think that academicians need to follow this work very carefully. We are, we hope, in the best position to be developing the kinds of tools that will help us examine ontologies, relate them to each other, and allow them to evolve as our understanding of the world changes.
Randy Miller: I have a slightly contrary view, partly out of ignorance about HL7 RIM. The key question is what problems it is trying to solve. That should drive what the content is. If you can state the problems it is going to be used to solve, then you can say whether it should be clinically rich. In that case it will require lots of input from academic clinicians. If it is to solve the problem of interchange of data among vendors, then it needs vendor input. But until you explicitly state what it’s going to be used for, just building it for the sake of building it is not useful. I know that the HL7 RIM is not being built that way. I am just saying that I think that’s the way to address your question, to seek the specific purpose before giving an answer.
Tuesday, June 22, 2010
Ontology: Lenat's Approach Versus Those That Would Let 'Many Flowers Bloom'
From a paper, Integration and Beyond: Linking Information from Disparate Sources and into Workflow, presented in part as the keynote to the Cornerstone on Integrating Information, one of four Cornerstone sessions included in the program of the AMIA Annual Fall Symposium, Washington, DC, November 6–10, 1999, and published in the Journal of the American Medical Informatics Association, Volume 7, Number 2, Mar/Apr 2000, p. 135.