Quality of Word Vectors and Its Impact on Named Entity Recognition in Czech

dc.contributor.author: Dařena, František
dc.contributor.author: Süss, Martin
dc.date.accessioned: 2022-04-17T00:02:15Z
dc.date.available: 2022-04-17T00:02:15Z
dc.date.issued: 2020
dc.date.updated: 2022-04-17T00:02:15Z
dc.description.abstract: Named Entity Recognition (NER) focuses on finding named entities in text and classifying them into one of the entity types. Modern state-of-the-art NER approaches avoid using hand-crafted features and rely on feature-inferring neural network systems based on word embeddings. The paper analyzes the impact of different aspects related to word embeddings on the process and results of the named entity recognition task in Czech, which has not been investigated so far. Various aspects of word vectors preparation were experimentally examined to draw useful conclusions. The suitable settings in different steps were determined, including the used corpus, number of word vectors dimensions, used text preprocessing techniques, context window size, number of training epochs, and word vectors inferring algorithms and their specific parameters. The paper demonstrates that focusing on the process of word vectors preparation can bring a significant improvement for NER in Czech even without using additional language independent and dependent resources. [en]
dc.description.version: OA
dc.format: 154-169
dc.identifier.issn: 2336-6494
dc.identifier.orcid: Dařena, František 0000-0001-8892-4256
dc.identifier.uri: https://repozitar.mendelu.cz/xmlui/handle/20.500.12698/1545
dc.publisher: Mendelova univerzita v Brně
dc.relation.ispartof: European Journal of Business Science and Technology
dc.relation.uri: https://doi.org/10.11118/ejobsat.2020.010
dc.rights: CC BY-SA 4.0
dc.rights.uri: https://creativecommons.org/licenses/by-sa/4.0/
dc.subject: Named Entity Recognition [en]
dc.subject: word embeddings [en]
dc.subject: word vectors training [en]
dc.subject: natural language processing [en]
dc.subject: Czech language [en]
dc.title: Quality of Word Vectors and Its Impact on Named Entity Recognition in Czech [en]
dc.type: J_ČLÁNEK
local.contributor.affiliation: PEF
local.identifier.doi: 10.11118/ejobsat.2020.010
local.identifier.e-issn: 2694-7161
local.identifier.obd: 43920378
local.identifier.scopus: 2-s2.0-85099840520
local.number: 2
local.volume: 6

Files

Original bundle

Name: J-Süss-EJOBSAT-2-2020.pdf
Size: 200.03 KB
Format: Adobe Portable Document Format
