Please use this identifier to cite or link to this record:
https://hdl.handle.net/1822/42615
Title: | Processing Annotated TMX Parallel Corpora |
Author(s): | Brito, Rui Miguel Magalhães; Almeida, J. J.; Simões, Alberto |
Keywords: | Parallel corpora; TMX; NLP; Annotated corpora |
Date: | Nov-2014 |
Citation: | Brito, Rui, José João Almeida, and Alberto Simões. 2014. Processing annotated TMX parallel corpora. In IberSpeech 2014: VIII Jornadas en Tecnologías del Habla and IV Iberian SLTech Workshop, pp. 188-197, Las Palmas de Gran Canaria, Spain, November 2014. |
Abstract: | In recent years the amount of freely available multilingual corpora has grown exponentially. Unfortunately, these corpora are made available in very diverse ways, ranging from simple text files or specific XML schemas to supposedly standard formats such as the XML Corpus Encoding Initiative, the Text Encoding Initiative, or the Translation Memory Exchange (TMX) format. In this document we defend the use of Translation Memory Exchange documents, but we enrich their structure to support annotating the documents with information such as lemmas, multi-words, or entities. To support the adoption of the proposed formats, we present a set of tools for manipulating the different formats in an agile way. |
Type: | Conference paper (article in conference proceedings) |
URI: | https://hdl.handle.net/1822/42615 |
ISBN: | 978-84-617-2862-6 |
Publisher version: | http://iberspeech2014.ulpgc.es/images/Iberspeech2014_OnlineProceedings.pdf |
Peer reviewed: | yes |
Access: | Open access |
Appears in collections: | CEHUM - Artigos em livros de atas |