A major hurdle in the development of natural language processing (NLP) methods for Electronic Health Records (EHRs) is the lack of large, annotated datasets. Privacy concerns prevent the distribution of EHRs, and annotating data is known to be costly and cumbersome. Synthetic data is a promising solution to the privacy concern, provided that it has utility comparable to real data and preserves the privacy of patients. However, synthetic text generated without annotations is of little use for NLP. In this work, we propose the use of neural language models (LSTM and GPT-2) for generating artificial EHR text jointly with annotations for named-entity recognition. Our experiments show that artificial...
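One way to picture generating text jointly with named-entity annotations is to serialize the annotations inline, so that a language model trained on the serialized text emits the entity markers together with the words. The sketch below illustrates that idea only. The `<Label> ... </Label>` marker format, the function names `to_inline` and `from_inline`, and the example sentence are assumptions made for illustration and are not taken from the work described above.

```python
import re
from typing import List, Tuple

# (start_token, end_token_exclusive, label); hypothetical span format
Span = Tuple[int, int, str]


def to_inline(tokens: List[str], spans: List[Span]) -> str:
    """Wrap each annotated token span in <Label> ... </Label> markers.

    Assumes spans do not overlap. The marker syntax is an illustrative
    choice, not the format used by the original authors.
    """
    starts = {start: label for start, _, label in spans}
    ends = {end: label for _, end, label in spans}
    out: List[str] = []
    for i, tok in enumerate(tokens):
        if i in starts:
            out.append(f"<{starts[i]}>")
        out.append(tok)
        if i + 1 in ends:
            out.append(f"</{ends[i + 1]}>")
    return " ".join(out)


def from_inline(text: str) -> Tuple[List[str], List[Span]]:
    """Recover tokens and spans from inline-marked (e.g. generated) text.

    A closing marker that does not match the currently open label is
    ignored, since generated text may contain malformed markup.
    """
    tokens: List[str] = []
    spans: List[Span] = []
    open_label = None
    open_start = 0
    for piece in text.split():
        m_open = re.fullmatch(r"<([A-Za-z_]+)>", piece)
        m_close = re.fullmatch(r"</([A-Za-z_]+)>", piece)
        if m_open:
            open_label, open_start = m_open.group(1), len(tokens)
        elif m_close:
            if open_label == m_close.group(1):
                spans.append((open_start, len(tokens), open_label))
            open_label = None
        else:
            tokens.append(piece)
    return tokens, spans


# Made-up example sentence, not real patient data.
example_tokens = ["Patient", "seen", "by", "Dr", "Jansen", "on", "12", "March"]
example_spans = [(3, 5, "Name"), (6, 8, "Date")]
example_text = to_inline(example_tokens, example_spans)
```

Under these assumptions, a model would be trained on strings like `example_text`, and `from_inline` would turn its samples back into token and span pairs suitable for training a named-entity recognizer.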
A vast amount of crucial information about patients resides solely in unstruct...
In sensitive domains, the sharing of corpora is restricted due to confidential...
One broad goal of biomedical informatics is to generate fully-synthetic, faithfully representative e...
Sensitive data is normally required to develop rule-based or train machine learning-based models for...