1. NLP Research Assistant – Bilingual Speech-to-Text Development
Overview: We’re seeking a motivated NLP Research Assistant to support the development of a bilingual speech-to-text application for the medical field. This role focuses on advancing speech recognition models capable of processing English and Vietnamese medical terminology with high accuracy.
Key Responsibilities:
- Contribute to the development of a bilingual (English-Vietnamese) speech-to-text system tailored for healthcare.
- Research and evaluate open-source speech-to-text frameworks, identifying those suitable for integration.
- Fine-tune speech recognition models to interpret specialized medical terminology accurately.
- Incorporate a bilingual medical dictionary for precise transcription.
- Conduct system testing, identify and resolve bugs, and optimize platform performance.
- Maintain detailed documentation of methodologies, experiments, and outcomes.
Qualifications:
- Graduate student in Computer Science or a related discipline.
- Background in Natural Language Processing (NLP) or related subjects.
- Proficiency in Python.
- Familiarity with machine learning frameworks (e.g., TensorFlow, PyTorch).
- Knowledge of speech recognition technologies.
Preferred Skills:
- Prior experience with speech-to-text projects or systems.
- Proficiency in the Vietnamese language is a strong asset.
- Familiarity with medical terminology or experience within healthcare settings.
Project duration: 12 months starting May or June 2025.
2. NLP Research Assistant – English-Vietnamese Neural Machine Translation
Overview: We are seeking a skilled NLP Research Assistant to assist in the development of a neural machine translation (NMT) system focused on English-Vietnamese medical content. This role involves implementing advanced NMT techniques to enhance the quality and reliability of medical translations.
Key Responsibilities:
- Conduct literature reviews on state-of-the-art NMT approaches, applying relevant insights to improve translation accuracy.
- Work on refining Vietnamese-English translations for healthcare content, enhancing clarity and accuracy.
- Integrate a bilingual medical dictionary to improve translation reliability.
- Develop methods for flagging uncertain translations, for example by offering alternative translations or raising error flags.
- Test and evaluate translation models using real-world medical texts, analyzing results to identify areas for enhancement.
- Collaborate with linguists and medical professionals to ensure the system meets end-user needs.
- Keep comprehensive documentation of methodologies, experiments, and findings.
Qualifications: Graduate student in CS, strong NLP and Python skills, familiarity with NMT.
- Graduate student in Computer Science or a related field.
- Strong foundation in Natural Language Processing (NLP) or similar disciplines.
- Proficiency in Python.
- Experience with machine learning libraries (e.g., TensorFlow, PyTorch), particularly for training and tuning models.
- Familiarity with neural machine translation techniques.
Preferred Skills: Vietnamese proficiency, medical background, attention to detail.
- Vietnamese language proficiency is highly desirable.
- Background in medical terminology or healthcare-related projects.
- Experience with NMT projects or translation software.
- Strong attention to detail and commitment to accuracy in translation outputs.
Project duration: 12 months starting May or June 2025.
3. NLP Research Assistant – Vietnamese Semantic Tagging System (ViSaS)
Overview: We are seeking a dedicated and curious NLP Research Assistant to support the development of ViSaS, an open-source Vietnamese Semantic Tagging System inspired by the AraSAS/USAS framework. This role will contribute to building a semantic lexicon and tagging system tailored for Vietnamese, facilitating semantic enrichment of text and enabling cross-linguistic research and applications.
The project will build on PyMUSAS (Python Multilingual Ucrel Semantic Analysis System), a rule-based token and Multi Word Expression (MWE) semantic tagger. PyMUSAS supports any semantic tagset and provides pre-configured spaCy components for the UCREL Semantic Analysis System (USAS).
The appointed researcher will work in collaboration with a researcher at Lancaster University, who will support integration of ViSaS into the PyMUSAS framework. The Vietnamese semantic tagger will start with a blank lexicon and MWE list, which the appointed RA will populate and iteratively refine.
Key Responsibilities:
- Assist in building a Vietnamese semantic lexicon, mapping word senses to semantic domains adapted from the USAS framework.
- Work on the alignment of Vietnamese lexemes to coarse-grained semantic fields using English glosses and bilingual corpora.
- Explore and adapt existing Vietnamese morphological analysers and lemmatisers to support ViSaS tagging.
- Develop and test a Vietnamese semantic tagging pipeline using Python and open-source NLP tools.
- Evaluate system coverage and accuracy across Vietnamese corpora, and support manual correction where needed.
- Evaluate Vietnamese tagging performance of existing chatbots (e.g. ChatGPT, Gemini), focusing on semantic consistency and error patterns.
- Contribute to the development of a hybrid Vietnamese semantic tagging pipeline that integrates outputs from commercial LLMs with rule-based or lightweight ML components.
- Help maintain clean documentation and support the open-source release of ViSaS tools and lexicons.
Qualifications: Graduate student in CS or Computational Linguistics, experience in lexical semantics, Python.
- Graduate student in Computer Science, Computational Linguistics, or a related discipline.
- Background in Natural Language Processing, particularly lexical semantics or annotation.
- Experience in Python programming.
- Familiarity with corpus linguistics and/or Vietnamese linguistic structures.
- Basic knowledge of linguistic resources like POS taggers, lemmatisers, or morphological analysers.
Preferred Skills: Vietnamese linguistic knowledge, work with lexicons, interest in multilingual NLP.
- Knowledge of Vietnamese language (reading and linguistic understanding).
- Experience working with language resources, lexicons, or tagging systems.
- Familiarity with frameworks such as NLTK, spaCy, or similar, as well as LLMs.
- Interest in low-resource language technologies and multilingual NLP.
USAS Tagger: Currently, USAS is available in more than 10 languages, including English, Welsh, Arabic, Chinese, Dutch, Italian, Portuguese, Spanish, Finnish and Malay. We are looking forward to the Vietnamese version. You can try the English demo here: https://ucrel-api.lancaster.ac.uk
Collaboration: Working on this project, you'll collaborate with researchers from Lancaster University in England, UK.
Project duration: 12 months starting May or June 2025.
4. NLP Research Assistant – English-Vietnamese Free-Text Survey Analysis (FreeTxt-Vi) [Văn Bản Tự Do]
Overview: We are looking for a highly motivated NLP Research Assistant to support the development of FreeTxt-Vi – an English–Vietnamese version of the FreeTxt toolkit designed for the automated analysis and visualisation of bilingual free-text responses from surveys and questionnaires. Building on the success of FreeTxt (English–Welsh), this role will focus on extending the functionality to handle Vietnamese, enabling wider adoption of the tool in multilingual and cross-cultural settings.
Key Responsibilities:
- Support the extension of the FreeTxt architecture to process Vietnamese free-text alongside English.
- Analyse how existing LLMs (e.g. ChatGPT, Gemini) handle English–Vietnamese free-text responses, with a focus on sentiment, summarisation, and meaning extraction.
- Develop, adapt, or integrate Vietnamese NLP resources (tokenisers, POS taggers, sentiment analysis tools, summarisation models, etc.).
- Curate and annotate bilingual survey datasets (English–Vietnamese) for tool training and evaluation.
- Work with the team to integrate semantic and sentiment analysis modules compatible with Vietnamese.
- Contribute to the evaluation and testing of the system using real-world Vietnamese survey data.
- Assist in the production of user-facing documentation, including written guides and video tutorials.
Qualifications: Graduate student in CS or related field, experience with NLP, Python, and Vietnamese.
- Graduate student in Computer Science, Computational Linguistics, or a related field.
- Experience with Natural Language Processing (NLP), preferably with multilingual or low-resource languages.
- Strong programming skills in Python.
- Familiarity with frameworks such as spaCy, Hugging Face Transformers, NLTK, or related.
- Knowledge of the Vietnamese language (reading proficiency essential; writing/speaking desirable).
Preferred Skills: Experience with Vietnamese NLP tools (e.g. PhoBERT), visualisation, survey data, and social science methods.
- Experience working with Vietnamese NLP tools (e.g. VnCoreNLP, PhoBERT, or similar).
- Familiarity with survey data, qualitative text analysis, or social science research methods.
- Experience with data visualisation and front-end libraries is a plus (e.g. D3.js, Plotly).
- Interest in working on open-source, impactful projects serving diverse user groups.
FreeTxt App: Currently, FreeTxt supports English and Welsh. The app is live, and we are looking to build something similar for Vietnamese: https://freetxt.app/
Collaboration: Working on this project, you'll collaborate with researchers from Lancaster University in England, UK, as well as Cardiff University in Wales, UK.
Project Context: FreeTxt-Vi builds on the open-source FreeTxt tool (https://freetxt.app), a bilingual English–Welsh toolkit funded by AHRC (AH/W004844/1) to support automated analysis of qualitative free-text feedback from surveys and public consultations. The new initiative extends this capability to Vietnamese, enabling better understanding of civic, healthcare, and service-oriented data in multilingual communities.
Project duration: 12 months starting May or June 2025.