Date of Graduation

Summer 2021


Master of Natural and Applied Science in Computer Science


Computer Science

Committee Chair

Razib Iqbal


Fourier-transform infrared (FTIR) spectra of organic compounds can be used to compare and identify compounds. A mid-FTIR spectrum gives the absorbance values of a compound over the 400-4000 cm⁻¹ range. Spectral matching is the process of comparing the spectral signatures of two or more compounds and returning a value for the similarity of the compounds based on how closely their spectra match. This process is commonly used to identify an unknown compound by searching for its spectrum’s closest match in a database of known spectra. A major limitation of this process is that it can only identify substances already in the database. An unknown compound not found in the database will likely be matched to a similar yet structurally different compound. Alternatively, FTIR has been used to identify characteristics, substructures, or functional groups of a compound based on the compound’s IR spectral features. However, most works have attempted to predict only a limited set of substructures, and there has been only limited success in predicting the full structure of an unknown compound based purely on its FTIR spectrum. For this thesis, I investigated the possibility of identifying compounds, and the substructures present in their structures, by analyzing their FTIR spectra. This approach relies on the property that the infrared (IR) absorbances of a compound result from the physical interactions between bonded sets of atoms in the compound’s structure. I hypothesized that different instances of the same substructure will give either similar spectral signatures or some pattern of spectral signatures that can be learned using machine learning. In this thesis, I show that it is possible to use convolutional neural networks (CNNs) to predict the presence or absence of substructures within a compound. Finally, I demonstrate a method of making predictions for the full structure of these compounds based on the substructure predictions and the compound’s FTIR spectrum.
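
The spectral matching process described above can be illustrated with a minimal sketch. It assumes that each spectrum has already been resampled onto a common wavenumber grid and that cosine similarity is the matching score; the abstract does not name the similarity measure used in the thesis, so that choice, the function names, and the toy absorbance values are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two absorbance vectors sampled on the same wavenumber grid."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a flat (all-zero) spectrum matches nothing
    return dot / (norm_a * norm_b)

def best_match(query, library):
    """Return the name and score of the library spectrum closest to the query."""
    scores = {name: cosine_similarity(query, spectrum)
              for name, spectrum in library.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]

# Toy spectra (hypothetical values, four grid points each).
library = {
    "compound_A": [0.1, 0.9, 0.2, 0.0],
    "compound_B": [0.8, 0.1, 0.0, 0.3],
}
query = [0.1, 0.8, 0.3, 0.0]

name, score = best_match(query, library)
print(name, round(score, 3))
```

Note that `best_match` always returns some library entry, even when the query compound is not in the library. This is the limitation the abstract points out: an unknown compound is matched to its most similar known neighbor.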


Fourier-transform infrared spectroscopy, chemistry, chemical structure, chemical substructures, deep learning, convolutional neural networks, evolutionary optimization, deep Q-learning, reinforcement learning

Subject Categories

Artificial Intelligence and Robotics | Organic Chemistry | Other Computer Sciences


© Joshua D. Ellis

Open Access