Research Projects

Research Opportunities at VinNLP

1. NLP Research Assistant – Bilingual Speech-to-Text Development

Overview: We are seeking a motivated NLP Research Assistant to support the development of a bilingual speech-to-text application for the medical field. This role focuses on advancing speech recognition models capable of processing English and Vietnamese medical terminology with high accuracy.

Key Responsibilities:

Qualifications:

Preferred Skills:

2. NLP Research Assistant – English-Vietnamese Neural Machine Translation

Overview: We are seeking a skilled NLP Research Assistant to assist in the development of a neural machine translation (NMT) system focused on English-Vietnamese medical content. This role involves implementing advanced NMT techniques to enhance the quality and reliability of medical translations.

Key Responsibilities:

Qualifications: Graduate student in CS, strong NLP and Python skills, familiarity with NMT.

Preferred Skills: Vietnamese proficiency, medical background, attention to detail.

Project duration: 12 months starting May or June 2025.

Apply

3. NLP Research Assistant – Vietnamese Semantic Tagging System (ViSaS)

Overview: We are seeking a dedicated and curious NLP Research Assistant to support the development of ViSaS, an open-source Vietnamese Semantic Tagging System inspired by the AraSAS/USAS framework. This role will contribute to building a semantic lexicon and tagging system tailored for Vietnamese, facilitating semantic enrichment of text and enabling cross-linguistic research and applications. The project will build on PyMUSAS (Python Multilingual Ucrel Semantic Analysis System), a rule-based semantic tagger for single tokens and multi-word expressions (MWEs). PyMUSAS supports any semantic tagset and provides pre-configured spaCy components for the UCREL Semantic Analysis System (USAS). The appointed research assistant (RA) will work with a researcher at Lancaster University, who will support the integration of ViSaS into the PyMUSAS framework. The Vietnamese semantic tagger will start with a blank lexicon and MWE list, which the RA will populate and iteratively refine.

Key Responsibilities:

Qualifications: Graduate student in CS or Computational Linguistics, experience in lexical semantics, Python.

Preferred Skills: Vietnamese linguistic knowledge, work with lexicons, interest in multilingual NLP.

USAS Tagger: USAS is currently available in more than 10 languages, including English, Welsh, Arabic, Chinese, Dutch, Italian, Portuguese, Spanish, Finnish and Malay. We are looking forward to the Vietnamese version. You can try the English demo here: https://ucrel-api.lancaster.ac.uk

Collaboration: On this project, you will collaborate with researchers from Lancaster University in England, UK.

Project duration: 12 months starting May or June 2025.

Apply

4. NLP Research Assistant – English-Vietnamese Free-Text Survey Analysis (FreeTxt-Vi) [Văn Bản Tự Do]

Overview: We are looking for a highly motivated NLP Research Assistant to support the development of FreeTxt-Vi – an English–Vietnamese version of the FreeTxt toolkit designed for the automated analysis and visualisation of bilingual free-text responses from surveys and questionnaires. Building on the success of FreeTxt (English–Welsh), this role will focus on extending the functionality to handle Vietnamese, enabling wider adoption of the tool in multilingual and cross-cultural settings.

Key Responsibilities:

Qualifications: Graduate student in CS or related field, experience with NLP, Python, and Vietnamese.

Preferred Skills: Experience with Vietnamese NLP tools (e.g. PhoBERT), visualisation, survey data, and social science methods.

FreeTxt App: FreeTxt currently supports English and Welsh. The app is live, and we aim to build something similar for Vietnamese: https://freetxt.app/

Collaboration: On this project, you will collaborate with researchers from Lancaster University in England, UK, and Cardiff University in Wales, UK.

Project Context: FreeTxt-Vi builds on the open-source FreeTxt tool (https://freetxt.app), a bilingual English–Welsh toolkit funded by AHRC (AH/W004844/1) to support automated analysis of qualitative free-text feedback from surveys and public consultations. This new initiative expands this capability to Vietnamese, enabling better civic, healthcare, and service-oriented data understanding in multilingual communities.

Project duration: 12 months starting May or June 2025.

Apply