Hello, I'm Hanh Tran 🔥

I am an NLP Engineer at Arkhn.

I received my Ph.D. through a cotutelle program between La Rochelle University, France, and the Jožef Stefan Institute, Slovenia, supervised by Prof. Antoine Doucet and Assist. Prof. Senja Pollak. Previously, I worked as a Data Scientist at Samsung SDSV.

My research interests are natural language processing, information extraction, low-resource languages, generative AI, and large-scale language models.

PUBLICATIONS

For a complete list of publications, please refer to my Google Scholar page.

LIAS: Layout Information-Based Article Separation in Historical Newspapers

Wenjun Sun, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Mickaël Coustaty, Antoine Doucet.

International Conference on Theory and Practice of Digital Libraries (TPDL 2024)

We propose LIAS, a method based on layout information, and conduct experiments on historical newspapers. The method first identifies the separator lines of the newspaper, analyzes the layout information to reconstruct the information flow of the document, performs segmentation based on the semantic relationship of each text block in the information flow, and ultimately achieves article separation.

LIT: Label-Informed Transformers on Token-Based Classification

Wenjun Sun, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Mickaël Coustaty, Antoine Doucet

International Conference on Theory and Practice of Digital Libraries (TPDL 2024)

We propose LIT, an end-to-end pipeline architecture that integrates the transformer’s encoder-decoder mechanism with additional label semantics for token classification tasks.

Leveraging Open Large Language Models for Historical Named Entity Recognition

Carlos-Emiliano González-Gallardo, Hanh Thi Hong Tran, Ahmed Hamdi, Antoine Doucet

International Conference on Theory and Practice of Digital Libraries (TPDL 2024)

(Best Paper Awards)

We develop methods to detect semantically ambiguous and complex entities in short and low-context settings of Complex NER using three different prompt-based approaches.

Is Prompting What Term Extraction Needs?

Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Julien Delaunay, Antoine Doucet, Senja Pollak

International Conference on Text, Speech, and Dialogue (TSD 2024)

We evaluate the applicability of open- and closed-source LLMs to the ATE task, compared to two benchmarks where we consider ATE as sequence-labeling (iobATE) and seq2seq (templATE) tasks.

Global-SEG: Text Semantic Segmentation Based on Global Semantic Pair Relations

Wenjun Sun, Hanh Thi Hong Tran, Carlos-Emiliano González-Gallardo, Mickaël Coustaty, Antoine Doucet

International Conference on Document Analysis and Recognition (ICDAR 2024). 

We propose Global-SEG, utilizing global semantic pair relations from both token- and sentence-level language models for text semantic segmentation.

Can cross-domain term extraction benefit from cross-lingual transfer and nested term labeling?

Hanh Thi Hong Tran, Matej Martinc, Andraz Repar, Nikola Ljubešić, Antoine Doucet & Senja Pollak

Machine Learning, 2024.

We propose a novel NOBI annotation regime and evaluate the abilities of cross-lingual and multilingual versus monolingual learning in cross-domain automatic term extraction.

L3I++ at SemEval-2024 Task 8: Can Fine-tuned LLM Detect Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text?

Hanh Thi Hong Tran, Tien Nam Nguyen, Antoine Doucet, Senja Pollak

Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

We propose a comparative study among three groups of methods to trigger the detection: (1) Using metric-based models; (2) Using a fine-tuned sequence-labeling language model (LM); and (3) Using a fine-tuned large-scale language model (LLM).

L3I++ at SemEval-2023 Task 2: Prompting for Multilingual Complex NER

Carlos-Emiliano González-Gallardo, Hanh Thi Hong Tran, Nancy Girdhar, Emanuela Boros, Jose G Moreno, Antoine Doucet

Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

We develop methods to detect semantically ambiguous and complex entities in short and low-context settings of Complex NER using three different prompt-based approaches.

Ensembling Transformers for Cross-domain Automatic Term Extraction

Hanh Thi Hong Tran, Matej Martinc, Andraz Pelicon, Antoine Doucet, Senja Pollak

International Conference on Asian Digital Libraries (ICADL, 2022)

We propose a comparative study on the predictive power of Transformers at extracting single- and multi-word terms in a multilingual cross-domain setting with and without ensembling approaches.

Can Cross-domain Term Extraction Benefit from Cross-lingual Transfer?

Hanh Thi Hong Tran, Matej Martinc, Antoine Doucet, Senja Pollak

International Conference on Discovery Science (DS, 2022)

We evaluate the abilities of cross-lingual and multilingual versus monolingual learning in cross-domain automatic term extraction.

Named Entity Recognition Architecture Combining Contextual and Global Features

Hanh Thi Hong Tran, Antoine Doucet, Nicolas Sidere, Jose G Moreno, Senja Pollak

International Conference on Asian Digital Libraries (ICADL, 2021)

We propose the combination of contextual features from XLNet and global features from the Graph Convolution Network (GCN) to enhance NER performance.