Applying machine learning to real problems is non-trivial because many important steps are needed to prepare for learning and to interpret the results after learning. This dissertation investigates four problems that arise before and after applying learning algorithms. First, how can we verify that a dataset contains "good" information? I propose cross-data validation for quantifying the quality of a dataset relative to a benchmark dataset, and I define a data efficiency ratio that measures how efficiently the dataset in question collects information (relative to the benchmark). Using these methods, I demonstrate the quality of bird observations collected by the eBird citizen science project, which has few quality controls. Second, can off-the-shelf ...
Due to the prevalence of machine learning algorithms and the potential for their decisions to profou...
Thesis (Ph.D.)--University of Washington, 2020. Modern machine learning algorithms have been able to a...
Data analysis usually aims to identify a particular signal, such as an intervention effect. Conventi...
We are surrounded by data in our daily lives. The rent of our houses, the amount of electricity unit...
Wide-ranging digitalization has made it possible to capture increasingly larger amounts of data. In ...
The quality and quantity (we call it suitability from now on) of data that are used for a machine le...
Machine learning algorithms are used to train the machine to learn on its own and improve from exper...
Determining the optimal amount of training data for machine learning algorithms is a critical task i...
What do citation screening for evidence-based medicine and generating land-cover maps of the Earth ha...
This thesis explores one of the most fundamental questions in Machine Learning, namely, how should t...
Designing Machine Learning algorithms requires answering three main questions: ...
Developing state-of-the-art approaches for specific tasks is a major driving force in our research c...
'Machine Learning' brings together all the state-of-the-art methods for making sense of data. With h...
One of the fundamental machine learning tasks is that of predictive classification. Given that organ...
This dissertation seeks to clarify and resolve a number of fundamental issues surrounding algorithmi...