Exploring the use of natural language systems for fact identification

Towards the automatic construction of healthcare portals

Frederick A. Peck, Suresh Bhavnani, Marilyn H. Blackmon, Dragomir R. Radev

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

In prior work we observed that expert searchers follow well-defined search procedures in order to obtain comprehensive information on the Web. Motivated by that observation, we developed a prototype domain portal called the Strategy Hub that provides expert search procedures to benefit novice searchers. The search procedures in the prototype were entirely handcrafted by search experts, making further expansion of the Strategy Hub cost-prohibitive. However, a recent study on the distribution of healthcare information on the web suggested that search procedures can be automatically generated from pages that have been rated based on the extent to which they cover facts relevant to a topic. This paper presents the results of experiments designed to automate the process of rating the extent to which a page covers relevant facts. To automatically generate these ratings, we used two natural language systems, Latent Semantic Analysis and MEAD, to compute the similarity between sentences on the page and each fact. We then used an algorithm to convert these similarity scores to a single rating that represents the extent to which the page covered each fact. These automatic ratings are compared with manual ratings using inter-rater reliability statistics. Analysis of these statistics reveals the strengths and weaknesses of each tool, and suggests avenues for improvement.
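The abstract describes a three-step pipeline:

1. Score each sentence on a page against a fact.
2. Collapse those scores into one coverage rating per fact.
3. Check agreement between the automatic ratings and manual ratings.

It does not give the similarity measure, the conversion algorithm, or the specific reliability statistic, so the following is only a minimal sketch under stated assumptions:

- A plain bag-of-words cosine similarity stands in for LSA and MEAD. Real LSA would first project term vectors into a reduced SVD space, and neither tool's actual scoring is reproduced here.
- `rate_fact_coverage` is a hypothetical rule that maps the best-matching sentence to a 0/1/2 rating using made-up thresholds.
- Cohen's kappa is used as one common inter-rater reliability statistic.

All function names, thresholds, sentences, and ratings below are illustrative and are not taken from the paper.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercased word counts. ASSUMPTION: a stand-in for LSA/MEAD sentence vectors."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rate_fact_coverage(page_sentences, fact, thresholds=(0.3, 0.6)):
    """HYPOTHETICAL conversion rule, not the authors' algorithm:
    0 = not covered, 1 = partly covered, 2 = fully covered,
    decided by the single best-matching sentence on the page."""
    fact_vec = bag_of_words(fact)
    best = max((cosine(bag_of_words(s), fact_vec) for s in page_sentences), default=0.0)
    low, high = thresholds  # made-up cut-offs for illustration
    if best >= high:
        return 2
    if best >= low:
        return 1
    return 0


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters assigning categorical labels to the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    n = len(ratings_a)
    observed = sum(x == y for x, y in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every item
    return (observed - expected) / (1 - expected)


# Hypothetical page and facts, invented for illustration.
page = [
    "Type 2 diabetes is treated with diet, exercise and metformin.",
    "Our clinic is open on weekdays.",
]
facts = [
    "Metformin is used to treat type 2 diabetes.",
    "Insulin pumps need regular calibration.",
]
auto_ratings = [rate_fact_coverage(page, fact) for fact in facts]
print(auto_ratings)  # [1, 0]

# Hypothetical automatic vs. manual ratings for six page-fact pairs.
automatic = [2, 1, 0, 2, 1, 0]
manual = [2, 1, 0, 1, 1, 0]
print(round(cohens_kappa(automatic, manual), 2))  # 0.75
```

Taking the maximum over sentences assumes a fact is usually stated in a single sentence. Summing scores across sentences would instead credit pages that cover a fact piecemeal. The abstract does not say which choice the authors made.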

Original language: English (US)
Pages (from-to): 327-338
Number of pages: 12
Journal: Proceedings of the ASIST Annual Meeting
Volume: 41
DOI: 10.1002/meet.1450410139
State: Published - Nov 2004
Externally published: Yes

ASJC Scopus subject areas

  • Information Systems
  • Library and Information Sciences

Cite this

Exploring the use of natural language systems for fact identification: Towards the automatic construction of healthcare portals. / Peck, Frederick A.; Bhavnani, Suresh; Blackmon, Marilyn H.; Radev, Dragomir R.

In: Proceedings of the ASIST Annual Meeting, Vol. 41, 11.2004, p. 327-338.

@article{b3c57b0526a743d589bf3796534ce5e5,
title = "Exploring the use of natural language systems for fact identification: Towards the automatic construction of healthcare portals",
abstract = "In prior work we observed that expert searchers follow well-defined search procedures in order to obtain comprehensive information on the Web. Motivated by that observation, we developed a prototype domain portal called the Strategy Hub that provides expert search procedures to benefit novice searchers. The search procedures in the prototype were entirely handcrafted by search experts, making further expansion of the Strategy Hub cost-prohibitive. However, a recent study on the distribution of healthcare information on the web suggested that search procedures can be automatically generated from pages that have been rated based on the extent to which they cover facts relevant to a topic. This paper presents the results of experiments designed to automate the process of rating the extent to which a page covers relevant facts. To automatically generate these ratings, we used two natural language systems, Latent Semantic Analysis and MEAD, to compute the similarity between sentences on the page and each fact. We then used an algorithm to convert these similarity scores to a single rating that represents the extent to which the page covered each fact. These automatic ratings are compared with manual ratings using inter-rater reliability statistics. Analysis of these statistics reveals the strengths and weaknesses of each tool, and suggests avenues for improvement.",
author = "Peck, {Frederick A.} and Bhavnani, Suresh and Blackmon, {Marilyn H.} and Radev, {Dragomir R.}",
year = "2004",
month = nov,
doi = "10.1002/meet.1450410139",
language = "English (US)",
volume = "41",
pages = "327--338",
journal = "Proceedings of the ASIST Annual Meeting",
issn = "0044-7870",
publisher = "Learned Information",

}

TY - JOUR

T1 - Exploring the use of natural language systems for fact identification

T2 - Towards the automatic construction of healthcare portals

AU - Peck, Frederick A.

AU - Bhavnani, Suresh

AU - Blackmon, Marilyn H.

AU - Radev, Dragomir R.

PY - 2004/11

Y1 - 2004/11

AB - In prior work we observed that expert searchers follow well-defined search procedures in order to obtain comprehensive information on the Web. Motivated by that observation, we developed a prototype domain portal called the Strategy Hub that provides expert search procedures to benefit novice searchers. The search procedures in the prototype were entirely handcrafted by search experts, making further expansion of the Strategy Hub cost-prohibitive. However, a recent study on the distribution of healthcare information on the web suggested that search procedures can be automatically generated from pages that have been rated based on the extent to which they cover facts relevant to a topic. This paper presents the results of experiments designed to automate the process of rating the extent to which a page covers relevant facts. To automatically generate these ratings, we used two natural language systems, Latent Semantic Analysis and MEAD, to compute the similarity between sentences on the page and each fact. We then used an algorithm to convert these similarity scores to a single rating that represents the extent to which the page covered each fact. These automatic ratings are compared with manual ratings using inter-rater reliability statistics. Analysis of these statistics reveals the strengths and weaknesses of each tool, and suggests avenues for improvement.

UR - http://www.scopus.com/inward/record.url?scp=33645274719&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33645274719&partnerID=8YFLogxK

U2 - 10.1002/meet.1450410139

DO - 10.1002/meet.1450410139

M3 - Article

VL - 41

SP - 327

EP - 338

JO - Proceedings of the ASIST Annual Meeting

JF - Proceedings of the ASIST Annual Meeting

SN - 0044-7870

ER -