New Benchmarks for Learning on Non-Homophilous Graphs
In recent years, there has been a strong emphasis on mining medical data using
machine learning techniques. A common problem is obtaining a noise-free set of
textual documents whose content is relevant to the research question, and then
developing a Question Answering (QA) model for a specific medical field. The
purpose of this paper is to present a new methodology for building a medical
dataset and obtaining a QA model for analyzing symptoms and their impact on
daily life in a specific disease domain. The data source was the ``Mental
Health'' forum, which is dedicated to people suffering from schizophrenia and
other mental disorders. Relevant posts by active users who participate
regularly were extracted, providing a new method of obtaining low-bias content
free of privacy issues. Furthermore, it is shown how to pre-process the dataset to
convert it into a QA dataset. The Bidirectional Encoder Representations from
Transformers (BERT), DistilBERT, RoBERTa, and BioBERT models were fine-tuned
and evaluated using F1 score, Exact Match, precision, and recall. Empirical
experiments demonstrated the effectiveness of the proposed method for
obtaining an accurate dataset for QA model implementation. By fine-tuning the
BioBERT QA model, we achieved an F1 score of 0.885, a considerable improvement
that outperforms the state-of-the-art model for the mental disorders
domain.
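
The abstract does not say how the evaluation metrics are computed. As a minimal sketch, assuming the common SQuAD-style convention for extractive QA (answers normalized by lowercasing and by stripping punctuation and English articles, then compared token by token), Exact Match, precision, recall, and F1 for a single predicted answer could be computed as follows. The function names and example strings are illustrative assumptions and are not taken from the paper or its dataset.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def token_prf(prediction, gold):
    """Token-overlap precision, recall, and F1 between two answer strings."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Assumed convention: two empty answers count as a perfect match.
    if not pred_tokens or not gold_tokens:
        match = float(pred_tokens == gold_tokens)
        return match, match, match
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example answers, invented for illustration only.
p, r, f1 = token_prf("the auditory hallucinations",
                     "auditory hallucinations at night")
em = exact_match("The Insomnia.", "insomnia")
```

Under this convention, corpus-level scores are usually the average of the per-question values, taking the maximum over gold answers when a question has several.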