Publications

2015

Candido Junior, A., Magalhães, C., Caseli, H.M. & Zangirolami, R. (2015), "Topic Modeling for Keyword Extraction: using Natural Language Processing methods for keyword extraction in Portal Min@s", Revista de Estudos da Linguagem. Vol. 23(3), pp. 695-726.

Abstract: This article aims to evaluate the application of two efficient automatic keyword extraction methods, used by the Corpus Linguistics and Natural Language Processing communities to generate keywords from literary texts: WordSmith Tools and Latent Dirichlet Allocation (LDA). The two tools chosen for this work have their own specificities and different extraction techniques, which led us to a performance-oriented analysis. We therefore set out to understand how each method works and to evaluate its application to literary texts. To that end, we used human analysis by analysts with knowledge of the domain of the texts used. The LDA method was used to extract keywords through its integration with Portal Min@s: Corpora de Fala e Escrita, a general corpus processing system designed for different kinds of Corpus Linguistics research. The results of the experiment confirm the effectiveness of WordSmith Tools and LDA in extracting keywords from a literary corpus, and also indicate that human analysis of the lists is needed at a stage prior to the experiments to complement the automatically generated list, cross-checking the results of WordSmith Tools and LDA. They also indicate that the human analyst's linguistic intuition about the lists generated separately by the two methods used in this study was more favorable to the use of the WordSmith Tools keyword list.

BibTeX:

@article{CandidoJr_etal_RELIN2015,
  author = {Candido Junior, Arnaldo and Magalhães, Célia and Caseli, Helena Medeiros and Zangirolami, Régis},
  title = {Topic Modeling for Keyword Extraction: using Natural Language Processing methods for keyword extraction in Portal Min@s},
  journal = {Revista de Estudos da Linguagem},
  year = {2015},
  volume = {23},
  number = {3},
  pages = {695--726},
  url = {http://www.periodicos.letras.ufmg.br/index.php/relin/article/view/8916}
}

Inácio, M.L. & Caseli, H.M. (2015), "Etiquetação morfossintática de textos em português do Brasil no domínio do e-commerce", In Anais do IV Student Workshop on Information and Human Language Technology., pp. 1-6.

BibTeX:

@inproceedings{Inacio_Caseli_TILIC2015,
  author = {Inácio, Márcio Lima and Caseli, Helena Medeiros},
  title = {Etiquetação morfossintática de textos em português do Brasil no domínio do e-commerce},
  booktitle = {Anais do IV Student Workshop on Information and Human Language Technology},
  year = {2015},
  pages = {1--6},
  url = {http://www.lbd.dcc.ufmg.br/colecoes/tilic/2015/008.pdf}
}

Ito, F.T., Erdmann, H., Takabayashi, D., Santos, D.N. & Moreira, J. (2015), "Preprocessing Images to Improve Deep Neural Networks Classification", In Proceedings of XI Workshop de Visão Computacional. São Carlos, SP. October 2015., pp. 328-333.

BibTeX:

@inproceedings{Ito_etal_WVC_2015,
  author = {Ito, F. T. and Erdmann, H. and Takabayashi, D. and Santos, D. N. and Moreira, J.},
  title = {Preprocessing Images to Improve Deep Neural Networks Classification},
  booktitle = {Proceedings of XI Workshop de Visão Computacional},
  year = {2015},
  pages = {328--333},
  url = {http://wvc2015.eesc.usp.br/Proceedings_WVC2015.pdf}
}

Rondon, A.C., Caseli, H.M. & Ramisch, C. (2015), "Never-Ending Multiword Expressions Learning", In Proceedings of NAACL-HLT 2015. Denver, Colorado. June 2015., pp. 45-53.

BibTeX:

@inproceedings{Rondon_etal_MWE2015,
  author = {Rondon, Alexandre Coelho and Caseli, Helena Medeiros and Ramisch, Carlos},
  title = {Never-Ending Multiword Expressions Learning},
  booktitle = {Proceedings of NAACL-HLT 2015},
  year = {2015},
  pages = {45--53},
  url = {http://www.aclweb.org/anthology/W15-0908}
}

Silva, L.H. & Caseli, H.M. (2015), "Reconhecimento de entidades nomeadas em textos em português do Brasil no domínio do e-commerce", In Anais do IV Student Workshop on Information and Human Language Technology., pp. 1-7.

BibTeX:

@inproceedings{Silva_Caseli_TILIC2015,
  author = {Silva, Lucas Hochleitner and Caseli, Helena Medeiros},
  title = {Reconhecimento de entidades nomeadas em textos em português do Brasil no domínio do e-commerce},
  booktitle = {Anais do IV Student Workshop on Information and Human Language Technology},
  year = {2015},
  pages = {1--7},
  url = {http://www.lbd.dcc.ufmg.br/colecoes/tilic/2015/010.pdf}
}

Teixeira, R.O., Seno, E.R.M. & Caseli, H.M. (2015), "NEPaLE: Uma ferramenta computacional de suporte à avaliação de paráfrases", In Anais do IV Student Workshop on Information and Human Language Technology., pp. 1-5.

BibTeX:

@inproceedings{Teixeira_etal_TILIC2015,
  author = {Teixeira, Rafael Oliveira and Seno, Eloize Rossi Marques and Caseli, Helena Medeiros},
  title = {NEPaLE: Uma ferramenta computacional de suporte à avaliação de paráfrases},
  booktitle = {Anais do IV Student Workshop on Information and Human Language Technology},
  year = {2015},
  pages = {1--5},
  url = {http://www.lbd.dcc.ufmg.br/colecoes/tilic/2015/012.pdf}
}

Volpe, L.H.T. & Caseli, H.M. (2015), "Extração de relações semânticas de textos em português do Brasil no domínio do e-commerce", In Anais do IV Student Workshop on Information and Human Language Technology., pp. 1-7.

BibTeX:

@inproceedings{Volpe_Caseli_TILIC2015,
  author = {Volpe, Leonardo Henrique Tozzatto and Caseli, Helena Medeiros},
  title = {Extração de relações semânticas de textos em português do Brasil no domínio do e-commerce},
  booktitle = {Anais do IV Student Workshop on Information and Human Language Technology},
  year = {2015},
  pages = {1--7},
  url = {http://www.lbd.dcc.ufmg.br/colecoes/tilic/2015/013.pdf}
}

2014

Martins, D.B.J. & Caseli, H.M. (2014), "Automatic machine translation error identification", Machine Translation. Vol. 29(1), pp. 1-24.

Abstract: Although machine translation (MT) has been an object of study for decades now, the texts generated by the state-of-the-art MT systems still present several errors for many language pairs. Aiming at coping with this drawback, lots of efforts have been made to post-edit those errors either manually or automatically. Manual post-editing is more accurate but can be prohibitive when too many changes have to be made. Automatic post-editing demands less effort but can also be less effective and give rise to new errors. A way to avoid unnecessary automatic post-editing and new errors is by previously selecting only the machine-translated segments that really need to be post-edited. Thus, this paper describes the experiments carried out to automatically identify MT errors generated by a state-of-the-art phrase-based statistical MT system. Despite the fact that our experiments have been carried out using a statistical MT engine, we believe the approach can also be applied to other types of MT systems. The experiments investigated the well-known machine-learning algorithms Naive Bayes, Decision Trees and Support Vector Machines. Using the decision tree algorithm it was possible to identify wrong segments with around 77% precision and recall when a small training corpus of only 2,147 error instances was used. Our experiments were performed on English-to-Brazilian Portuguese MT, and although some of the features are language-dependent, the proposed approach is language-independent and can be easily generalized to other language pairs.

BibTeX:

@article{Martins_Caseli_MT2014,
  author = {Martins, Débora Beatriz Jesus and Caseli, Helena Medeiros},
  title = {Automatic machine translation error identification},
  journal = {Machine Translation},
  year = {2014},
  volume = {29},
  number = {1},
  pages = {1--24},
  url = {http://dx.doi.org/10.1007/s10590-014-9163-y},
  doi = {10.1007/s10590-014-9163-y}
}

Polastri, P.C., Caseli, H.M. & Seno, E.R.M. (2014), "Extração de paráfrases em português a partir de léxicos bilíngues: um estudo de caso", In Proceedings of the Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish., pp. 1-6.

BibTeX:

@inproceedings{Polastri_etal_TorPorEsp_2014,
  author = {Polastri, Paulo César and Caseli, Helena Medeiros and Seno, Eloize Rossi Marques},
  title = {Extração de paráfrases em português a partir de léxicos bilíngues: um estudo de caso},
  booktitle = {Proceedings of the Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish},
  year = {2014},
  pages = {1--6},
  url = {http://www.lbd.dcc.ufmg.br/colecoes/torporesp/2014/015.pdf}
}

Taba, L.S. & Caseli, H. (2014), "Automatic Semantic Relation Extraction from Portuguese Texts", In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). Reykjavik, Iceland. May 2014. European Language Resources Association (ELRA).

BibTeX:

@inproceedings{Taba_Caseli_LREC2014,
  author = {Leonardo Sameshima Taba and Helena Caseli},
  title = {Automatic Semantic Relation Extraction from Portuguese Texts},
  booktitle = {Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)},
  publisher = {European Language Resources Association (ELRA)},
  year = {2014},
  url = {http://www.lrec-conf.org/proceedings/lrec2014/pdf/522_Paper.pdf}
}

Vieira, T.L. & Caseli, H.M. (2014), "Aprendizado de Máquina Sem-Fim para Indução Automática de Léxico Bilíngue", In Proceedings of the Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish., pp. 1-8.

BibTeX:

@inproceedings{Vieira_Caseli_TorPorEsp_2014,
  author = {Vieira, Thiago Lima and Caseli, Helena Medeiros},
  title = {Aprendizado de Máquina Sem-Fim para Indução Automática de Léxico Bilíngue},
  booktitle = {Proceedings of the Workshop on Tools and Resources for Automatically Processing Portuguese and Spanish},
  year = {2014},
  pages = {1--8},
  url = {http://www.lbd.dcc.ufmg.br/colecoes/torporesp/2014/008.pdf}
}

Vieira, T.L. & Caseli, H.M. (2014), "NEBEL: Never-Ending Bilingual Equivalent Learner", In Proceedings of the Human-Inspired Computing and Its Applications: 13th Mexican International Conference on Artificial Intelligence -- MICAI. Tuxtla Gutiérrez, Mexico. November 16-22 2014. (Part I), pp. 99-103. Springer International Publishing.

BibTeX:

@inproceedings{Vieira_Caseli_MICAI2014,
  author = {Vieira, Thiago Lima and Caseli, Helena Medeiros},
  title = {NEBEL: Never-Ending Bilingual Equivalent Learner},
  booktitle = {Proceedings of the Human-Inspired Computing and Its Applications: 13th Mexican International Conference on Artificial Intelligence -- MICAI},
  publisher = {Springer International Publishing},
  year = {2014},
  number = {Part I},
  pages = {99--103},
  url = {http://dx.doi.org/10.1007/978-3-319-13647-9_11},
  doi = {10.1007/978-3-319-13647-9_11}
}

2013

Beck, D.E. & Caseli, H.M. (2013), "Tree-based Statistical Machine Translation: Experiments with the English and Brazilian Portuguese Pair", Learning and Nonlinear Models. Vol. 11(1), pp. 11-25.

BibTeX:

@article{Beck_Caseli_LNM_2013,
  author = {Beck, Daniel Emilio and Caseli, Helena Medeiros},
  title = {Tree-based Statistical Machine Translation: Experiments with the English and Brazilian Portuguese Pair},
  journal = {Learning and Nonlinear Models},
  year = {2013},
  volume = {11},
  number = {1},
  pages = {11--25}
}

Martins, D.B.J., Avanço, L.V., Nunes, M.G.V. & Caseli, H.M. (2013), "Annotating translation errors in Brazilian Portuguese automatic translated sentence: first step to automatic post-edition", In Proceedings of the Corpus Linguistics Conference.

BibTeX:

@inproceedings{Martins_etal_CL2013,
  author = {Martins, Débora Beatriz Jesus and Avanço, Lucas Vinicius and Nunes, Maria Graças Volpe and Caseli, Helena Medeiros},
  title = {Annotating translation errors in Brazilian Portuguese automatic translated sentence: first step to automatic post-edition},
  booktitle = {Proceedings of the Corpus Linguistics Conference},
  year = {2013},
  url = {http://ucrel.lancs.ac.uk/cl2013/doc/CL2013-ABSTRACT-BOOK.pdf}
}

2012

Beck, D.E. & Caseli, H.M. (2012), "Bayesian Induction of Syntactic Language Models for Brazilian Portuguese", In Proceedings of the 10th International Conference for Computational Processing of the Portuguese Language. April 2012. Volume 7243, pp. 157-167. Springer-Verlag Berlin Heidelberg.

BibTeX:

@inproceedings{Beck_Caseli_PROPOR2012,
  author = {Beck, Daniel Emilio and Caseli, Helena Medeiros},
  title = {Bayesian Induction of Syntactic Language Models for Brazilian Portuguese},
  booktitle = {Proceedings of the 10th International Conference for Computational Processing of the Portuguese Language},
  publisher = {Springer-Verlag Berlin Heidelberg},
  year = {2012},
  volume = {7243},
  pages = {157--167},
  url = {http://www.springer.com/br/book/9783642288845?referer=www.springeronline.com}
}

Beck, D.E. & Caseli, H.M. (2012), "Portuguese-English Statistical Machine Translation using Tree Transducers", In Anais do IX Encontro Nacional de Inteligência Artificial (ENIA-2012)., pp. 1-12.

BibTeX:

@inproceedings{Beck_Caseli_ENIA2012,
  author = {Beck, Daniel Emilio and Caseli, Helena Medeiros},
  title = {Portuguese-English Statistical Machine Translation using Tree Transducers},
  booktitle = {Anais do IX Encontro Nacional de Inteligência Artificial (ENIA-2012)},
  year = {2012},
  pages = {1--12},
  url = {http://www.ppgia.pucpr.br/ enia/anais/enia/artigos/105729_2.pdf}
}

Taba, L.S. & Caseli, H.M. (2012), "Automatic Hyponymy Identification from Brazilian Portuguese Texts", In Proceedings of the 10th International Conference for Computational Processing of the Portuguese Language. April 2012. Volume 7243, pp. 186-192. Springer-Verlag Berlin Heidelberg.

BibTeX:

@inproceedings{Taba_Caseli_PROPOR2012,
  author = {Taba, Leonardo Sameshima and Caseli, Helena Medeiros},
  title = {Automatic Hyponymy Identification from Brazilian Portuguese Texts},
  booktitle = {Proceedings of the 10th International Conference for Computational Processing of the Portuguese Language},
  publisher = {Springer-Verlag Berlin Heidelberg},
  year = {2012},
  volume = {7243},
  pages = {186--192},
  url = {http://www.springer.com/br/book/9783642288845?referer=www.springeronline.com}
}

2011

Antonio, M.M. & Caseli, H.M. (2011), "Tradução orientada a dados", In Anais de Eventos da UFSCar. São Carlos, SP. Volume 7

BibTeX:

@inproceedings{CIC_Miguel_2011,
  author = {Antonio, Miguel M. and Caseli, Helena M.},
  title = {Tradução orientada a dados},
  booktitle = {Anais de Eventos da UFSCar},
  year = {2011},
  volume = {7}
}

Araújo, J.G. & Caseli, H.M. (2011), "Combining Models for the Alignment of Parallel Syntactic Trees", In Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology. Cuiabá, MT, Brazil. October 24-26 2011., pp. 169-173. Sociedade Brasileira de Computação.

Abstract: The alignment of syntactic trees is the task of aligning the internal
and leaf nodes of two sentences in different languages structured
as trees. The output of the alignment can be used, for instance,
as knowledge resource for learning translation rules (for rule-based
machine translation systems) or models (for statistical machine translation
systems). This paper presents some experiments carried out based
on two syntactic tree alignment algorithms presented in [Lavie et
al. 2008] and [Tinsley et al. 2007]. Aiming at improving the performance
of internal nodes alignment, some approaches for combining the output
of these two algorithms were evaluated in Brazilian Portuguese and
English parallel trees.

BibTeX:

@inproceedings{STIL_Josue_2011,
  author = {Araújo, Josué G. and Caseli, Helena M.},
  title = {Combining Models for the Alignment of Parallel Syntactic Trees},
  booktitle = {Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology},
  publisher = {Sociedade Brasileira de Computação},
  year = {2011},
  pages = {169--173},
  url = {http://www.nilc.icmc.usp.br/til/stil2011_English/stil/artigos/Short/STIL2011_SP4.pdf}
}

Beck, D.E. (2011), "Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers", In Proceedings of the ACL 2011 Student Session. Portland, Oregon, USA. 19-24 June 2011., pp. 36-40.

Abstract: In this paper I present a Master’s thesis proposal in syntax-based
Statistical Machine Translation. I propose to build discriminative
SMT models using both tree-to-string and tree-to-tree approaches.
Translation and language models will be represented mainly through
the use of Tree Automata and Tree Transducers. These formalisms have
important representational properties that make them well-suited
for syntax modeling. I also present an experiment plan to evaluate
these models through the use of a parallel corpus written in English
and Brazilian Portuguese.

BibTeX:

@inproceedings{ACL_Daniel_2011,
  author = {Beck, Daniel Emilio},
  title = {Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers},
  booktitle = {Proceedings of the ACL 2011 Student Session},
  year = {2011},
  pages = {36--40},
  url = {http://aclweb.org/anthology-new/P/P11/P11-3007.pdf}
}

Kawamorita, C.T. & Caseli, H.M. (2011), "Memórias de tradução: recursos e ferramentas para auxiliar o humano a traduzir", In Anais de Eventos da UFSCar. São Carlos. Volume 7

BibTeX:

@inproceedings{CIC_Cleber_2011,
  author = {Kawamorita, Cleber T. and Caseli, Helena M.},
  title = {Memórias de tradução: recursos e ferramentas para auxiliar o humano a traduzir},
  booktitle = {Anais de Eventos da UFSCar},
  year = {2011},
  volume = {7}
}

Schreiner, P., Villavicencio, A., Zilio, L. & Caseli, H.M. (2011), "Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques", In Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology. Cuiabá, MT, Brazil. October 24-26 2011., pp. 97-106. Sociedade Brasileira de Computação.

Abstract: Automatic lexical alignment is a vital step for empirical machine
translation, and although good results can be obtained with existent
models (e.g. Giza++), more precise alignment is still needed for
successfully handling complex constructions such as multiword expressions.
In this paper we propose an approach for lexical alignment combining
statistical and linguistic information. We describe the development
of a baseline discriminative aligner and a set of language dependent
post-processing functions that allow the inclusion of shallow linguistic
knowledge. The post-processing functions were designed to significantly
improve word alignment mainly on verb-particle constructs both over
our baseline and over Giza++.

BibTeX:

@inproceedings{STIL_Paulo_2011,
  author = {Schreiner, Paulo and Villavicencio, Aline and Zilio, Leonardo and Caseli, Helena M.},
  title = {Improving Lexical Alignment Using Hybrid Discriminative and Post-Processing Techniques},
  booktitle = {Proceedings of the 8th Brazilian Symposium in Information and Human Language Technology},
  publisher = {Sociedade Brasileira de Computação},
  year = {2011},
  pages = {97--106},
  url = {http://www.nilc.icmc.usp.br/til/stil2011_English/stil/artigos/Long/STIL2011_P11.pdf}
}

Sugiyama, B.A., Anacleto, J.C. & Caseli, H.M. (2011), "Assisting users in a cross-cultural communication by providing culturally contextualized translations", In Proceedings of SIGDOC 2011., pp. 1-6.
