Exploring Information Extraction Resilience

dc.creator	Gregg, Dawn
dc.date	2008
dc.date.accessioned	2024-02-06T12:56:40Z
dc.date.available	2024-02-06T12:56:40Z
dc.description	Developers face many challenges when attempting to reliably extract data from the Web. One of these challenges is the resilience of the extraction system to changes in the web pages from which information is being extracted. This article compares the resilience of information extraction systems that use position-based extraction with an ontology-based extraction system and with a system that combines position-based and ontology-based extraction. The findings demonstrate the advantages of using a system that combines multiple extraction techniques, especially in environments where web sites change frequently and where data collection is conducted over an extended period of time.
dc.format	text/html
dc.identifier	https://doi.org/10.3217/jucs-014-11-1911
dc.identifier	https://lib.jucs.org/article/29105/
dc.identifier.uri	https://openrepository.mephi.ru/handle/123456789/9774
dc.language	en
dc.publisher	Journal of Universal Computer Science
dc.relation	info:eu-repo/semantics/altIdentifier/eissn/0948-6968
dc.relation	info:eu-repo/semantics/altIdentifier/pissn/0948-695X
dc.rights	info:eu-repo/semantics/openAccess
dc.rights	J.UCS License
dc.source	JUCS - Journal of Universal Computer Science 14(11): 1911-1920
dc.subject	information extraction
dc.subject	semi-structured data
dc.subject	ontologies
dc.title	Exploring Information Extraction Resilience
dc.type	Research Article