Exploring Information Extraction Resilience

dc.creator: Gregg, Dawn
dc.date: 2008
dc.date.accessioned: 2024-02-06T12:56:40Z
dc.date.available: 2024-02-06T12:56:40Z
dc.description: Developers face many challenges when attempting to extract data reliably from the Web. One of these challenges is the resilience of the extraction system to changes in the web pages from which information is being extracted. This article compares the resilience of information extraction systems that use position-based extraction with that of an ontology-based extraction system and of a system that combines position-based and ontology-based extraction. The findings demonstrate the advantages of a system that combines multiple extraction techniques, especially in environments where web sites change frequently and where data collection is conducted over an extended period of time.
dc.format: text/html
dc.identifier: https://doi.org/10.3217/jucs-014-11-1911
dc.identifier: https://lib.jucs.org/article/29105/
dc.identifier.uri: https://openrepository.mephi.ru/handle/123456789/9774
dc.language: en
dc.publisher: Journal of Universal Computer Science
dc.relation: info:eu-repo/semantics/altIdentifier/eissn/0948-6968
dc.relation: info:eu-repo/semantics/altIdentifier/pissn/0948-695X
dc.rights: info:eu-repo/semantics/openAccess
dc.rights: J.UCS License
dc.source: JUCS - Journal of Universal Computer Science 14(11): 1911-1920
dc.subject: information extraction
dc.subject: semi-structured data
dc.subject: ontologies
dc.title: Exploring Information Extraction Resilience
dc.type: Research Article