Trends in Sciences

Trends Sci. 2026; 23(2): 11396

Feature‑Optimized Electronic‑Nose Classification of Fermented and Non‑Fermented Teas

Ummi Kaltsum¹,², Kombo Othman Kombo¹,³, Roto Roto⁴, Sholihun Sholihun¹ and Kuwat Triyana¹,*

¹Department of Physics, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta 55281, Indonesia

²Department of Physics Education, Faculty of Education of Mathematics, Natural Sciences, and Information Technology, Universitas PGRI Semarang, Semarang 50125, Indonesia

³Department of Natural Sciences, College of Science and Technical Education, Mbeya University of Science and Technology, Mbeya 3C58+6C9, Tanzania

⁴Department of Chemistry, Faculty of Mathematics and Natural Sciences, Universitas Gadjah Mada, Yogyakarta 55281, Indonesia

(*Corresponding author’s e-mail: [email protected])

Received: 26 July 2025, Revised: 21 August 2025, Accepted: 28 August 2025, Published: 10 November 2025

Abstract

Fermentation plays a crucial role in establishing the extraordinary variety of flavors, fragrances, colors, nutritional attributes, and health advantages associated with teas. Considering the substantial disparities in quality and consumer preferences, it is essential to differentiate between fermented (yellow, red, and black) and non-fermented (green) teas for quality assurance and market viability. This study proposes a method to enhance tea classification performance using an electronic nose (e-nose) system. Five feature extraction methods (max, mean, median, gradient, and standard deviation) were applied to capture informative signals from the e-nose data. A silhouette score-based feature selection technique was then used to guide the determination of the optimal combination of these features. The best performance was achieved by combining the mean, median, and gradient features with a linear discriminant analysis (LDA) model, reaching a testing accuracy of 0.85, precision of 0.86, recall of 0.86, and an AUC of 0.97. The confusion matrix indicated perfect classification for green tea, with only minor misclassifications between red and black teas. To validate the e-nose results, gas chromatography-mass spectrometry (GC-MS) was employed. It identified key marker compounds for differentiating tea types. For instance, (1R)-4,7,7-trimethylbicyclo[2.2.1]heptan-2-one was found exclusively in yellow, red, and black teas, while nickel tetracarbonyl was unique to green tea. Overall, the study highlights the benefits of using feature selection to enhance e-nose classification performance and supports its use as a reliable, non-invasive tool for distinguishing fermented and non-fermented teas.

Keywords: E-nose, Feature selection, Fermented tea, Non-fermented tea, LDA, Tea classification, Silhouette score

Introduction

After water, tea is one of the most consumed beverages in the world [1]. Its widespread popularity stems from its delicious taste, unique aroma, and notable health benefits [2]. Over the past decade, global tea production has increased by an average of 3.2% per year, making tea one of the world’s most traded commodities and influencing the global economy [3]. Tea manufacturing involves several steps, from withering to drying. Based on these steps, tea is classified into 2 main categories, namely fermented and non-fermented. Non-fermented teas, commonly known as green teas, undergo minimal processing without any fermentation, resulting in their characteristic bright color and refreshing flavor. In contrast, fermented teas undergo a complex process involving fermentation, where enzymatic oxidation converts catechin compounds into theaflavins and thearubigins, leaving non-fermented teas with the highest catechin levels [4,5]. Fermented teas can be further classified by their degree of fermentation into slightly fermented (yellow tea), semi-fermented (red and oolong teas), fully fermented (black tea), and post-fermented (dark tea) [6,7]. These teas have a darker color, rich flavor, and distinct aroma [8]. The varying degrees of fermentation impart each tea category with unique tastes, aromas, colors, nutritional profiles, and health benefits [9,10]. However, not all tea products on the market provide accurate information about their fermentation degrees. Moreover, consumer knowledge concerning the degree of fermentation of tea is still limited. Therefore, differentiating teas by their degree of fermentation is crucial to ensure consumers get the product they expect [11].

Sensory evaluation by professional panels is commonly used to distinguish between fermented and non-fermented teas. However, this method is inherently subjective as human physical and mental conditions influence it [12-14]. Analytical techniques such as gas chromatography-mass spectrometry (GC-MS) and high-performance liquid chromatography (HPLC) are also utilized to complement this approach. GC-MS effectively identifies high-volatile compounds, while HPLC targets low-volatile compounds [15]. Despite their high sensitivity in compound identification, both methods have limitations, including time-consuming processes and high costs [16,17], which makes them less practical for rapid and large-scale tea classification. As an alternative approach, the electronic nose (e-nose) offers a faster, non-invasive, and cost-effective solution, making it beneficial for industries that require real-time quality control.

An e-nose is a device that mimics the principles of biological olfaction to identify and differentiate various odors [18]. This technology utilizes a non-invasive method for sample analysis, eliminating the need for chemical extraction or destruction of the sample [19]. Due to its objectivity, rapidity, and non-destructive nature, e-nose technology is widely used in tea differentiation and quality assessment [20-22]. The main components of the e-nose include sensor arrays and pattern recognition algorithms. The sensor array consists of nonspecific sensors that interact with a variety of chemical compounds [23,24]. These interactions lead to physicochemical changes in the sensing materials, generating electrical signals [25]. E-nose technology has been successfully applied across various fields, including medicine, agriculture, military systems, and indoor and outdoor monitoring [26]. In the food and beverage industry, e-noses are extensively utilized for quality control, process monitoring, contaminant identification, shelf-life assessment, and freshness detection [27]. Several studies have reported on these applications, such as monitoring the quality of dry-fermented sausages [28], evaluating the quality of bread enriched with cobia [29], testing the freshness of yogurt [30], identifying adulteration of raw bovine milk with soy milk [31], and investigating changes in the aroma of green tea during the drying process [32].

E-nose data commonly exhibit high dimensionality due to having multiple sensors and applying various feature extraction to each sensor’s response. This can lead to the curse of dimensionality, causing overfitting and increased model complexity [33]. One strategy to address this issue is to apply feature selection methods, which involve identifying the most informative subset of features relevant for classification.

This study introduces a novel systematic approach for selecting feature extraction method combinations based on silhouette scores computed in the linear discriminant analysis (LDA) space. Unlike previous studies that used fixed features [34,35] or selected them in the original space [36,37], this study assesses each feature’s contribution to class separation in the LDA space, which explicitly enhances class separability.

Five feature extraction methods (mean, median, standard deviation, gradient, and maximum) were used. The silhouette score was selected as the feature ranking criterion because it quantifies how well the transformed data points are grouped into distinct classes. Each method was evaluated by calculating its silhouette score in the LDA space. The features were then ranked based on their scores and combined sequentially according to this ranking. Each combination was tested using an LDA model, and the feature set that achieved the highest classification accuracy with the fewest features was selected. This strategy not only enhances classification performance but also maintains model efficiency and interpretability. To the best of the authors’ knowledge, the use of silhouette scores in the LDA space to guide the selection of feature extraction combinations for e-nose data has not been previously reported, particularly in the context of classifying tea based on the fermentation degree.

Accordingly, this study aimed to determine whether an e-nose, enhanced by silhouette score-based feature selection, can effectively classify teas based on their degree of fermentation. To achieve this, the optimal combination of statistical features was identified using silhouette scores, classification performance was evaluated across multiple machine learning models, and the results were validated with GC-MS analysis of volatile compounds.

This study contributes to developing a rapid and interpretable tool for differentiating fermented and non-fermented teas, while enhancing the broader understanding of complex aroma and flavor chemistry. It also holds considerable significance for the tea industry by promoting transparency and accurate labeling of tea products.

Materials and methods

Sample preparation

This study used 4 types of tea (green, yellow, red, and black) made from the leaves of Camellia sinensis var. assamica, sourced from a local tea plantation and factory in the Kulon Progo area (7°39'04.36"S, 110°08'04.83"E), Yogyakarta, Indonesia. Each tea type contributed 100 samples, for a total of 400 samples. These samples were collected from different production batches dated February, March, April, June, July, September, and November (2022–2023). The required sample size was determined using analysis of variance (ANOVA) with a confidence level of 95% (Z-value of 1.96), power of 0.95, effect size of 0.25, and k groups of 4 to ensure reliable statistical analysis [38]. Accordingly, 280 samples were considered sufficient to classify the 4 tea types. However, the sample size was expanded to 400 to account for potential sensor noise and batch variability, ensuring more reliable results. Table 1 details the tea samples used, including visuals, codes, degree of fermentation, manufacturing process, and composition.

Table 1 List of tea samples with their visuals, codes, degree of fermentation, manufacturing process, and composition.

Dried tea leaves	Green tea	Yellow tea	Red tea	Black tea
Sample visual
Sample code	G	Y	R	B
Fermentation degree	Non-fermented	Lightly fermented	Semi-fermented	Fully fermented
Manufacturing process	Withering-rolling-1st drying-2nd drying	Withering-firing-rolling-lightly fermented-drying	Withering-rolling-partially fermented-drying	Withering-rolling-fully fermented-drying
Composition*	70% leaf buds and young leaves, 30% older leaves	70% leaf buds and young leaves, 30% older leaves	59% leaf buds and young leaves, 41% older leaves	59% leaf buds and young leaves, 41% older leaves

*The composition of each tea (percentage of buds, young leaves, older leaves) was obtained from the tea factory’s standard recipe for that tea type

E-nose system

This study utilized a commercial portable e-nose (GeNose) developed by PT Swayasa Prakarsa in Yogyakarta, Indonesia. The device was equipped with 10 metal oxide semiconductor (MOS) gas sensors, each designed to selectively detect various volatile compounds, as listed in Table 2. These MOS sensors identified sample gases through a redox reaction, a type of chemical reaction that occurred between the active material of the MOS and gas molecules [39]. This interaction altered the potential barrier at the grain boundaries of the sensors, leading to a change in conductivity [40]. For each sample tested, the sensor array produced 10 response curves, effectively creating a unique fingerprint for each sample.

Table 2 Sensor array in GeNose [41].

Sensor	Target compound
S1	Carbon monoxide, ethanol, hydrogen, isobutane, and methane
S2	Ammonia, ethanol, hydrogen, hydrogen sulfide, and toluene
S3	Ethanol, hydrogen, isobutane, and methane
S4	Carbon monoxide, ethanol, hydrogen, isobutane, and methane
S5	Carbon monoxide, ethanol, hydrogen, isobutane, methane, and propane
S6	Carbon monoxide, ethanol, hydrogen, isobutane, methane, and propane
S7	Carbon monoxide, ethanol, hydrogen, and methane
S8	Acetone, benzene, carbon monoxide, ethanol, isobutane, methane, and n-hexane
S9	Ammonia, ethanol, hydrogen, and isobutane
S10	Chlorofluorocarbons, ethanol, and hydrofluorocarbons

E-nose measurements

Before e-nose measurements, the device was preheated for 30 min to ensure stable responses. Tea samples (2 g) were placed in 100 mL glass beakers, sealed for 5 min to allow gas to accumulate, and then connected to the e-nose device. The time phase settings were configured as follows: 5 s for baseline, 60 s for sampling, and 240 s for purging. The signals generated by the e-nose across these 3 phases were recorded using data acquisition software at 100 ms intervals. Each tea sample, therefore, yielded data with dimensions of 3050×10 (i.e., 3050 time-series data points for 10 sensing outputs). The samples were tested in random order, with results showing the voltage values for each sensor. The measurement and data analysis procedures are illustrated in Figure 1. In this study, all e-nose measurements were conducted at an ambient temperature of 28 ± 2 °C and a relative humidity of 78 ± 2%. The working environment temperature for the e-nose was set in the range of 18–35 °C. Within this range, volatile compounds can be detected effectively while minimizing rapid evaporation, which can lead to inaccuracies in aroma analysis [41]. Additionally, tea stored at temperatures between 25 and 40 °C has been shown to exhibit minimal variation in volatile compound stability, which is beneficial for preserving aroma profiles during testing and storage [43]. Some studies measured tea samples using e-nose technology at a relative humidity of 75–85% to obtain results that are more practically relevant and representative of actual environmental conditions for tea [14,21,44].
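The 3050×10 data shape follows from the phase durations and the 100 ms logging interval. The arithmetic can be sketched in Python as below. The variable names are illustrative and not taken from the acquisition software, and the row ordering (baseline, then sampling, then purging) is an assumption.

```python
# Phase durations in seconds, as stated in the text.
PHASES_S = {"baseline": 5, "sampling": 60, "purging": 240}
INTERVAL_MS = 100   # logging interval
N_SENSORS = 10

def rows_per_phase(phases_s, interval_ms):
    """Number of logged rows in each phase (integer arithmetic in ms)."""
    return {name: sec * 1000 // interval_ms for name, sec in phases_s.items()}

rows = rows_per_phase(PHASES_S, INTERVAL_MS)
total_rows = sum(rows.values())
shape = (total_rows, N_SENSORS)

# Row range of the sampling phase, assuming rows are stored in phase order.
sampling_start = rows["baseline"]
sampling_stop = sampling_start + rows["sampling"]
```

Under these assumptions, the sampling phase used as raw data would occupy rows 50 to 649 of each recording.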

Figure 1 The procedure of sample measurement and data analysis. Tea samples were analyzed using PCA and LDA models.

The e-nose sample testing yielded voltage values from the sensor array recorded during 3 phases (i.e., baseline, sampling, and purging). The voltage values obtained during the sampling phase were selected as raw data. These data were subjected to several preprocessing steps to extract vital information from sensor responses and prepare data for pattern recognition analysis [45]. Preprocessing steps included baseline correction, normalization, feature extraction, and feature selection, as shown in Figure 1. Baseline correction was performed by subtracting the baseline from the original sensor responses using Eq. (1) [46]. The adjusted sensor responses were subsequently normalized using scaling and centering methods (Eq. (2)) [47].

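$$y^{s}_{i,j}(t) = x_{i,j}(t) - x_{i,j(\mathrm{baseline})} \qquad (1)$$

$$Y_{i,j}(t) = \frac{y^{s}_{i,j}(t) - \mu}{\sigma} \qquad (2)$$

where $y^{s}_{i,j}(t)$ is the response of the i-th sensor to the j-th sample at time t after baseline correction, $x_{i,j}(t)$ is the response of the i-th sensor to the j-th sample at time t, $x_{i,j(\mathrm{baseline})}$ is the response of the i-th sensor to the j-th sample at baseline time, $Y_{i,j}(t)$ is the normalized response of the i-th sensor to the j-th sample at time t, $\mu$ is the mean, and $\sigma$ is the standard deviation.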
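A minimal Python sketch of the two preprocessing steps for a single sensor trace is given below. It rests on two assumptions that the text does not specify: the baseline level is taken as the mean of the baseline-phase readings, and the mean and standard deviation are computed over the trace itself (population form). The function names and the example trace are hypothetical.

```python
from statistics import mean, pstdev

def baseline_correct(trace, n_baseline):
    """Eq. (1): subtract the baseline level from every reading.
    Assumption: the baseline level is the mean of the first n_baseline readings."""
    base = mean(trace[:n_baseline])
    return [x - base for x in trace]

def standardize(trace):
    """Eq. (2): centre on the mean and scale by the standard deviation.
    Assumption: mu and sigma are taken over this one trace (population sigma)."""
    mu, sigma = mean(trace), pstdev(trace)
    if sigma == 0:
        return [0.0 for _ in trace]   # flat trace: nothing to scale
    return [(y - mu) / sigma for y in trace]

# Hypothetical voltage trace: 2 baseline readings, then a rising response.
raw = [1.0, 1.0, 1.5, 2.0, 3.0]
corrected = baseline_correct(raw, n_baseline=2)
normalized = standardize(corrected)
```

After standardization the trace has zero mean and unit standard deviation, which puts sensors with different voltage ranges on a common scale.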

The normalized responses were then subjected to feature extraction to characterize each sample [34]. Five statistical features were extracted: maximum, mean, median, gradient, and standard deviation, as outlined in Table 3. The rationale for each is as follows. The maximum captures the peak sensor response, reflecting the strongest interaction between the sensors and the sample. The mean indicates the central tendency of the sensor responses, offering insight into the general behavior of the sensors across the sample. The median represents the center of the response distribution, providing a robust measure of stable sensor responses that is unaffected by extreme values. The gradient measures the rate of change in sensor responses over time, capturing the dynamic interaction between the sensor and the sample. Finally, the standard deviation describes the variability of sensor responses around the mean [24,48].
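As an illustration, the five features for one normalized sensor trace could be computed as in the following Python sketch. Two definitions here are assumptions rather than the study's stated formulas: the gradient is taken as the average slope between the first and last reading, and the standard deviation uses the sample (n - 1) form. The function name and example trace are hypothetical.

```python
from statistics import mean, median, stdev

def extract_features(trace, dt=0.1):
    """Return the five statistical features of one normalized sensor trace.

    Assumptions (not specified in the text): the gradient is the average
    slope between the first and last reading, and the standard deviation
    is the sample (n - 1) form. dt is the sampling interval in seconds.
    """
    duration = (len(trace) - 1) * dt
    return {
        "max": max(trace),
        "mean": mean(trace),
        "med": median(trace),
        "grad": (trace[-1] - trace[0]) / duration,
        "std": stdev(trace),
    }

# Hypothetical 5-point trace sampled every 0.1 s.
features = extract_features([0.0, 1.0, 2.0, 3.0, 4.0], dt=0.1)
```

Applied to all 10 sensors, each method yields 10 values per sample, so a three-method combination would give a 30-element feature vector if the features are simply concatenated.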

The statistical feature methods were optimized through the following 5 steps. First, silhouette scores were calculated. The silhouette score was selected as the feature ranking criterion because it quantifies how well the transformed data points are grouped into distinct classes by comparing the average distance of a sample to points in its own class with the average distance to points in the nearest other class. In this study, the silhouette score was computed in the LDA space rather than in the original feature space, so that the score reflects the contribution of each feature method to the class separation achieved by the classifier. The calculation used only the training data to avoid information leakage. Each feature extraction method was first applied individually, transformed into the LDA space, and then assigned a silhouette score using Eq. (3) [39].

$$s(i) = \frac{b(i) - a(i)}{\max\left(a(i),\, b(i)\right)} \qquad (3)$$

where a(i) is the average distance from point i to all other points within the same cluster, b(i) is the average distance from point i to all points within the nearest neighboring cluster, and max(a(i), b(i)) is the larger of a(i) and b(i). This score indicated cluster separation quality, with higher scores reflecting better-defined cluster structures. Second, the feature extraction methods were ranked in descending order of their silhouette scores. This ranking highlighted which feature extraction methods contributed most to distinguishing between different tea categories. Third, feature extraction combinations were built progressively, starting with the top-ranked method and adding 1 method at a time. For example, the first combination included only the top-ranked method, the second combination included the top 2 methods, and so forth. Fourth, each combination from step 3 was evaluated using the accuracy obtained from the LDA model. Fifth, the optimal combination was selected based on both the classification accuracy and the number of features. The goal was to identify a feature set that provided high accuracy while using as few features as possible, thereby reducing model complexity and the risk of overfitting.
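The silhouette calculation in the first step can be sketched in plain Python as follows. This is an illustrative implementation of the standard silhouette definition with the class labels used as clusters, not the authors' code. It assumes the points have already been projected into the LDA space (the projection itself is not shown), uses Euclidean distance, and requires at least 2 classes. The example coordinates are hypothetical.

```python
from math import dist
from statistics import mean

def silhouette_score(points, labels):
    """Mean of s(i) = (b(i) - a(i)) / max(a(i), b(i)) over all points,
    using the known class labels as clusters."""
    classes = set(labels)
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from point i to every other point, grouped by class.
        by_class = {c: [] for c in classes}
        for j, (q, lab) in enumerate(zip(points, labels)):
            if j != i:
                by_class[lab].append(dist(p, q))
        if not by_class[own]:        # only member of its class: s(i) = 0
            scores.append(0.0)
            continue
        a = mean(by_class[own])
        b = min(mean(d) for c, d in by_class.items() if c != own and d)
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return mean(scores)

# Hypothetical 2-D coordinates in an LDA space for two tea classes.
points = [(2.0, 0.0), (2.0, 1.0), (-2.0, 0.0), (-2.0, 1.0)]
labels = ["G", "G", "B", "B"]
score = silhouette_score(points, labels)
```

Scores near 1 indicate compact, well-separated classes, while scores near 0 indicate overlapping classes.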

Table 3. Feature extraction methods used and their description.

Feature extraction methods	Description
Maximal	(4)
Mean	(5)
Median	(6)
Gradient	(7)
Standard deviation	(8)

The data generated from each combination of feature extractions were subsequently fed into the LDA model. As an initial step, principal component analysis (PCA) was applied to reduce data dimensionality and visualize trends. Prior to machine learning analysis, all e-nose data (400 samples) were split into an 80% training and a 20% testing dataset using a random sampling method. The split was performed once using a fixed random seed to ensure reproducibility, resulting in 320 training and 80 testing samples. This split is widely recognized for effectively balancing the model’s ability to learn patterns and perform predictive validation, often resulting in statistically stable model performance [49,50]. Several studies have confirmed the effectiveness of this division technique in e-nose applications [21,41,51]. Repeated 10-fold cross-validation was employed within the training set to evaluate the classification performance of each feature extraction combination. Silhouette scores were calculated from the training data and used to guide the feature selection process. The test set was reserved solely for final model evaluation to prevent any data leakage. The performance of the model was evaluated using accuracy, precision, and recall, as shown in Eqs. (9)–(11). Accuracy is defined as the ratio of correct predictions to the total number of predictions. Precision measures the proportion of true positive predictions among all positive predictions made by the model, while recall indicates the proportion of actual positives that were correctly identified.

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (9)$$

$$\mathrm{Precision} = \frac{TP}{TP + FP} \qquad (10)$$

$$\mathrm{Recall} = \frac{TP}{TP + FN} \qquad (11)$$

Here, TP, TN, FP, and FN represent true positives, true negatives, false positives, and false negatives, respectively.
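The data partitioning described above can be sketched with the Python standard library as shown below. The seed value of 42 is an arbitrary placeholder because the text does not report the seed. The split is a plain random shuffle (the text does not say whether it was stratified by tea type), and 10 repeats are used as stated in the caption of Table 5. The function names are hypothetical.

```python
import random

def train_test_split_indices(n, test_fraction=0.2, seed=42):
    """Shuffle indices 0..n-1 once with a fixed seed and cut off the test part."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)
    n_test = round(n * test_fraction)
    return idx[n_test:], idx[:n_test]

def repeated_kfold(train_idx, k=10, repeats=10, seed=42):
    """Yield (fit, validate) index lists for k-fold CV repeated `repeats` times."""
    rng = random.Random(seed)
    for _ in range(repeats):
        shuffled = train_idx[:]
        rng.shuffle(shuffled)               # new fold assignment per repeat
        folds = [shuffled[i::k] for i in range(k)]
        for f in range(k):
            fit = [i for g, fold in enumerate(folds) if g != f for i in fold]
            yield fit, folds[f]

train, test = train_test_split_indices(400)
splits = list(repeated_kfold(train))
```

Only the training indices enter the cross-validation loop, so the 80 test samples stay untouched until the final evaluation.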

GC-MS analysis

The tea sample (1 g) was placed in a 20 mL vial. The GC-MS system used in this study consisted of a Thermo Scientific Trace 1310 Gas Chromatograph and a Thermo Scientific ISQ LT Single Quadrupole Mass Spectrometer. Volatile compounds were separated using an HP-5MS UI capillary column (30 m × 0.25 mm × 0.25 μm) with the following oven temperature program: initially held at 38 °C for 1 min, ramped at 5 °C/min to 150 °C and held for 2 min, then ramped at 10 °C/min to 250 °C and held for 3 min. Helium (>99.99%) was used as the carrier gas with a constant flow rate of 1 mL/min. The inlet type was split, with the temperature set at 200 °C. The mass spectrometer operated in electron ionization (EI) mode with an electron energy of 70 eV. The MS transfer line and ion source temperatures were set at 230 °C. The mass spectrometer operated in full-scan mode (m/z 25–400). Detection was achieved through an electron multiplier. Compounds were identified by comparing their mass spectra with the NIST library. The compounds detected by GC-MS were then analyzed using a PCA score plot.
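The duration of one GC run implied by this oven program can be worked out with a short Python sketch. The calculation covers only the programmed holds and ramps and ignores any equilibration or cool-down time, which the text does not report. The names are illustrative.

```python
# Oven program from the text: initial hold, then two ramps with holds.
START_C, START_HOLD_MIN = 38, 1
RAMPS = [  # (target temperature in deg C, rate in deg C/min, hold at target in min)
    (150, 5, 2),
    (250, 10, 3),
]

def program_duration(start_c, start_hold, ramps):
    """Total oven program time in minutes."""
    total, temp = start_hold, start_c
    for target, rate, hold in ramps:
        total += (target - temp) / rate + hold   # ramp time plus hold time
        temp = target
    return total

run_time_min = program_duration(START_C, START_HOLD_MIN, RAMPS)
```

The program sums to roughly 38.4 min per sample (1 + 22.4 + 2 + 10 + 3), compared with about 5 min for one e-nose measurement cycle.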

Results and discussion

E-nose response

Tea samples were tested using an e-nose over a total measurement time of 305 s. This included a 5-second baseline, 60-second sampling, and 240-second purging. Each sensor in the device provided a unique response to each sample. Figure 2(a) shows the raw signal of the e-nose from tea samples, while Figure 2(b) presents the reconstructed signal after baseline correction. During the baseline phase (0–5 s), the e-nose sensors measured the background level of the environment without any tea sample present. In the sampling phase (5–65 s), the sensors detected changes in the environment due to the presence of the sample, continuing until they reached their peak response. The peak response indicated the maximum detection of the sample characteristics. Subsequently, during the purging phase (65–305 s), the environment was cleared, and the sensor response gradually returned to the baseline level [52,53].

Figure 2 (a) Raw e-nose signal from tea samples as a function of time, and (b) baseline-corrected e-nose signal from tea samples as a function of time.

Optimization of feature extraction methods

This study focused on optimizing combinations of feature extraction to improve tea classification. The optimization process was performed using feature selection based on silhouette scores, which enabled the selection of feature extractions with the highest separation performance. Silhouette scores for the 5 feature extraction methods (maximum, mean, median, gradient, and standard deviation) were computed using Eq. (3). Table 4 displays each feature extraction alongside its corresponding silhouette score, arranged from highest to lowest. A high silhouette score indicates that a feature extraction effectively separated fermented and non-fermented tea samples. Among the tested feature extraction methods, the mean value exhibited the best separation performance, with a silhouette score of 0.28, while the maximal value provided the lowest score of 0.18. Based on these scores, 5 combinations of feature extraction, including mean, mean-med, mean-med-grad, mean-med-grad-std, and mean-med-grad-std-max, were formed and subsequently utilized for further analysis. The silhouette values in the 0.18–0.28 range indicate only moderate cluster separation, likely due to overlapping aroma profiles among some tea types (e.g., red vs. black teas). This overlap in chemical profiles was also reflected in GC-MS results (Table S1 and Figure 6(a)). However, the relative differences in silhouette scores were still useful in evaluating the discriminatory potential of each feature extraction. We emphasize that the moderately low silhouette scores did not compromise classification robustness. The final LDA model achieved a high test accuracy of 0.85 (Table 5), indicating robust performance despite the moderate silhouette values.

Table 4. Feature extraction methods and their silhouette scores.

Feature extraction methods	Silhouette score
Mean	0.28
Med	0.27
Grad	0.22
Std	0.21
Max	0.18
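The ranking and the progressive combination building can be reproduced from the scores in Table 4 with a few lines of Python. The sketch below only illustrates the procedure described in the text; the function names are hypothetical.

```python
# Silhouette scores from Table 4.
SCORES = {"max": 0.18, "mean": 0.28, "med": 0.27, "grad": 0.22, "std": 0.21}

def rank_methods(scores):
    """Method names sorted from highest to lowest silhouette score."""
    return sorted(scores, key=scores.get, reverse=True)

def nested_combinations(ranking):
    """Top-1, top-2, ..., top-n combinations, adding one method at a time."""
    return ["-".join(ranking[:n]) for n in range(1, len(ranking) + 1)]

ranking = rank_methods(SCORES)
combinations = nested_combinations(ranking)
```

Because the combinations are nested, only 5 candidate feature sets need to be evaluated instead of all 31 non-empty subsets of the 5 methods.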

Data exploration using PCA and LDA

In this study, we initially employed PCA to reduce data dimensionality and explore the trends of spectral data, aiming to achieve a qualitative overview of tea samples. Figure 3 (left) presents PCA visualization plots showing the first 2 principal components (PC1 and PC2) obtained from the 5 feature extraction combinations (i.e., mean, mean-med, mean-med-grad, mean-med-grad-std, and mean-med-grad-std-max). PC1 captured between 67.21% and 83.31% of the data variance, and PC2 contributed an additional 12.47% to 20.35%. The 2 components revealed distinct clustering patterns. In all PCA plots, green tea samples consistently formed a distinct cluster with slight overlap, primarily located on the positive side of PC1 across all 5 feature extraction combinations, showing its unique spectral characteristics. Meanwhile, yellow, red, and black teas, which were fermented, were positioned closely together and showed considerable overlap on both the positive and negative sides of PC1 and PC2, suggesting significant similarity in spectral features among these fermented teas.

Besides PCA, LDA was applied, using its first 2 linear discriminant components (LD1 and LD2), to enhance data visualization and achieve clearer class separation among different tea types. Figure 3 (right) shows LDA score plots based on the 5 feature extraction combinations (i.e., mean, mean-med, mean-med-grad, mean-med-grad-std, and mean-med-grad-std-max). As depicted, LD1 explained the majority of the variance, effectively maximized the distances between different tea types, and provided better separation. Notably, green tea formed a well-defined cluster along the positive side of LD1 in all 5 feature extraction combinations, effectively separating it from the fermented teas, which clustered along the negative side of LD1. Although some overlap still existed between yellow, red, and black teas, LDA provided improved discrimination compared to PCA, particularly in the mean-med-grad, mean-med-grad-std, and mean-med-grad-std-max combinations. In these cases, yellow tea exhibited slight separation along LD2, occupying the lower region of the negative LD2. However, red and black teas remained closely aligned, reflecting their spectral similarity.

Figure 3 PCA and LDA visualizations for classifying different tea samples with 5 feature extraction combinations, including (a) mean, (b) mean-med, (c) mean-med-grad, (d) mean-med-grad-std, (e) mean-med-grad-std-max. In the plots, B, G, R, and Y represent black, green, red, and yellow teas, respectively.

These results suggested that while PCA successfully separated green tea from fermented varieties along a combined PC1 and PC2 axis, it struggled to distinctly differentiate yellow, red, and black teas in all feature extraction combinations. In contrast, LDA proved more effective in distinguishing between tea types by focusing on class separability. It achieved clearer separation between green tea (non-fermented) and fermented teas, as well as notable distinctions among the fermented varieties, with more complex feature extraction combinations. The analysis highlights the effectiveness of the LDA method in differentiating tea types based on fermentation degree.

Impact of feature extraction method combinations on cluster separation

The effect of each feature extraction combination (i.e., mean, mean-med, mean-med-grad, mean-med-grad-std, and mean-med-grad-std-max) on cluster separation was further explored by calculating the Euclidean distances between cluster centroids (Eq. (12)), with each centroid determined as the average of the data points within its cluster (Eq. (13)) [54].

$$d(A, B) = \sqrt{\sum_{q=1}^{n} \left(C_{A,q} - C_{B,q}\right)^{2}} \qquad (12)$$

$$C_{q} = \frac{1}{k} \sum_{p=1}^{k} x_{p,q} \qquad (13)$$

where d(A, B) is the distance between clusters A and B, n is the number of dimensions, $C_{q}$ is the q-th component of a cluster centroid ($C_{A,q}$ and $C_{B,q}$ for clusters A and B), k is the number of data points in the cluster, and $x_{p,q}$ is the q-th component of the p-th data point.
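The centroid and distance calculations of Eqs. (12) and (13) can be sketched in Python as follows. The cluster coordinates are hypothetical 2-D values chosen for illustration and are not the study's data.

```python
from math import dist

def centroid(points):
    """Eq. (13): per-dimension average of the points in one cluster."""
    k = len(points)
    return tuple(sum(col) / k for col in zip(*points))

def centroid_distance(cluster_a, cluster_b):
    """Eq. (12): Euclidean distance between two cluster centroids."""
    return dist(centroid(cluster_a), centroid(cluster_b))

# Hypothetical 2-D scores for two clusters.
green = [(3.0, 0.0), (5.0, 2.0)]
black = [(-1.0, 1.0), (1.0, 5.0)]
d_gb = centroid_distance(green, black)
```

Repeating the calculation for every pair of tea types gives the 6 pairwise distances plotted in Figure 4.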

Figure 4 depicts the distances between clusters in all feature extraction combinations for all samples. As can be observed, increasing the number of feature extractions generally enhanced the cluster separation. This trend was evident for all cluster pairs except the red-black pair, whose distance remained unchanged as feature extractions were added and was the smallest among all cluster pairs. This demonstrated a high similarity between red and black teas. In contrast, the green and black clusters exhibited the largest distance, which indicated a notable distinction between green and black teas. Furthermore, the separation between fermented and non-fermented tea clusters (green-yellow, green-red, green-black) was larger than that among the fermented tea clusters (yellow-red, yellow-black, red-black). This indicated that yellow, red, and black teas, classified as fermented teas, were clustered together in 1 region and distinguishable from the green cluster (non-fermented tea). This clear separation emphasized the effectiveness of PCA and LDA in classifying teas based on their spectral characteristics, particularly highlighting the differences between fermented (yellow, red, black) and non-fermented (green) teas.

Figure 4 Effect of the feature extraction combinations on the distance between clusters: Green-yellow (G-Y), green-red (G-R), green-black (G-B), yellow-red (Y-R), yellow-black (Y-B), red-black (R-B), and fermented-non-fermented. Yellow, red, and black teas are categorized as fermented tea, while green tea is categorized as non-fermented tea.

Classification performance evaluation

For classification, the performance of each combination of feature extractions was evaluated using an LDA model. Table 5 presents the validation and testing results of the LDA model using the 5 combinations of feature extractions: mean, mean-med, mean-med-grad, mean-med-grad-std, and mean-med-grad-std-max. The highest cross-validation accuracy (0.82, with a standard deviation of 0.05–0.06) was observed in 3 combinations: mean, mean-med-grad-std, and mean-med-grad-std-max. Meanwhile, the mean-med and mean-med-grad combinations each yielded a slightly lower accuracy of 0.81 ± 0.06. Notably, the mean-med-grad combination achieved the highest testing accuracy of 0.85, indicating better generalization and robustness. This suggests that the model built with the mean-med-grad features was able to capture the most relevant information for class separation while maintaining low model complexity. Therefore, considering both the testing performance and the number of features used, the mean-med-grad combination offered the most effective balance between classification accuracy and simplicity.

Table 5. The accuracy of the LDA model using 5 feature extraction method combinations evaluated using 10-fold cross-validation repeated 10 times.

Feature extraction method combinations	Validation ± std	Testing
Mean	0.82 ± 0.05	0.78
Mean-med	0.81 ± 0.06	0.80
Mean-med-grad	0.81 ± 0.06	0.85
Mean-med-grad-std	0.82 ± 0.06	0.84
Mean-med-grad-std-max	0.82 ± 0.05	0.81
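The evaluation protocol behind Table 5 (10-fold cross-validation repeated 10 times, followed by a held-out test) can be sketched as follows. This is a minimal illustration using scikit-learn on synthetic data: the feature matrix, the 80/20 split, the stratification, and the random seeds are assumptions, not the study's actual data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import (RepeatedStratifiedKFold, cross_val_score,
                                     train_test_split)

# Synthetic stand-in for an e-nose feature matrix with 4 tea classes (assumption)
X, y = make_classification(n_samples=400, n_features=8, n_informative=6,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)

# Hold out a test set; the 80/20 ratio is an assumption for illustration
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 10-fold cross-validation repeated 10 times gives 100 validation scores
cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
lda = LinearDiscriminantAnalysis()
scores = cross_val_score(lda, X_train, y_train, cv=cv, scoring="accuracy")
print(f"validation accuracy: {scores.mean():.2f} ± {scores.std():.2f}")

# Refit on the full training set and score once on the held-out test set
test_accuracy = lda.fit(X_train, y_train).score(X_test, y_test)
print(f"testing accuracy: {test_accuracy:.2f}")
```

Reporting the mean and standard deviation over all 100 folds corresponds to the "Validation ± std" column, while the single held-out score corresponds to the "Testing" column.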

The classification performance of the LDA model using 5 combinations of feature extractions was evaluated using additional metrics, namely precision, recall, the receiver operating characteristic (ROC) curve with AUC values, and confusion matrix analysis (Table 6 and Figure 5). As shown in Table 6, both precision and recall increased as feature extractions were added, up to the mean-med-grad combination. The best performance was obtained with this combination, which achieved a precision and recall of 0.86. However, including more features did not further improve the performance and even led to a slight decrease. This suggests that while adding relevant features enhances the model’s discriminative ability, excessive or redundant features may introduce noise and reduce classification performance.

Table 6 Precision and recall of the LDA model on the testing data using 5 feature extraction method combinations.

Feature extraction method combinations	Precision	Recall
Mean	0.79	0.80
Mean-med	0.81	0.82
Mean-med-grad	0.86	0.86
Mean-med-grad-std	0.84	0.85
Mean-med-grad-std-max	0.81	0.83
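For a 4-class problem, a single precision and recall value requires averaging over classes. The sketch below shows one common choice, macro-averaging from a confusion matrix. The averaging scheme is an assumption, since this section does not state which one was used, and the confusion matrix is a hypothetical example rather than the study's result.

```python
import numpy as np


def macro_precision_recall(cm):
    """Macro-averaged precision and recall from a confusion matrix.

    Rows are true classes and columns are predicted classes. Every class
    is assumed to be predicted at least once (no zero column sums).
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # column sums: samples predicted as each class
    recall = tp / cm.sum(axis=1)     # row sums: samples truly in each class
    return float(precision.mean()), float(recall.mean())


# Hypothetical confusion matrix (order: green, yellow, red, black)
cm = [[10, 0, 0, 0],
      [1, 8, 1, 0],
      [0, 0, 7, 3],
      [0, 0, 2, 8]]

precision, recall = macro_precision_recall(cm)
print(f"precision = {precision:.3f}, recall = {recall:.3f}")
```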

To evaluate the model’s ability to distinguish between all classes, ROC curves were generated, and AUC values were computed. A one-vs-rest strategy was employed for the 4-class classification task, in which each class was compared against the others. Micro-averaging was then applied to obtain a single AUC value by pooling the one-vs-rest decisions of all classes before computing the true positive and false positive rates. This provided an overall measure of the model’s classification performance. The AUC values were 0.93, 0.95, 0.97, 0.96, and 0.96 for the mean, mean-med, mean-med-grad, mean-med-grad-std, and mean-med-grad-std-max combinations, respectively. The AUC therefore increased as feature extractions were added, from the lowest value of 0.93 for the mean method alone to the highest value of 0.97 for the mean-med-grad combination, and then decreased slightly to 0.96 for both larger combinations. The highest AUC value indicated that the mean-med-grad combination yielded better performance than the others. As shown in the confusion matrix in Figure 5, the LDA model with the mean-med-grad feature extraction combination correctly classified nearly all samples in each tea category. Every green tea sample was predicted correctly, with only minimal errors occurring between red and black tea classifications. This pattern aligns with the overlapping volatile compound profiles of these 2 tea types, as revealed by the GC-MS analysis.
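The one-vs-rest, micro-averaged AUC described above can be sketched with scikit-learn. This is a hedged illustration on synthetic data: the data set, split, and seeds are assumptions, and the code follows the common practice of flattening the binarized labels and class probabilities into a single binary problem.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

# Synthetic 4-class data standing in for e-nose features (assumption)
X, y = make_classification(n_samples=400, n_features=8, n_informative=6,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

lda = LinearDiscriminantAnalysis().fit(X_train, y_train)
proba = lda.predict_proba(X_test)                      # shape (n_samples, 4)
y_bin = label_binarize(y_test, classes=lda.classes_)   # one-vs-rest indicators

# Micro-average: pool every (sample, class) decision into one binary problem
fpr, tpr, _ = roc_curve(y_bin.ravel(), proba.ravel())
micro_auc = auc(fpr, tpr)
print(f"micro-averaged AUC: {micro_auc:.2f}")
```

Because all classes are pooled, larger classes contribute more decisions to the micro-averaged curve than smaller ones.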

In addition to LDA, the classification capability of the mean-med-grad combination was further evaluated using 4 other supervised learning models, namely quadratic discriminant analysis (QDA), support vector machine (SVM), k-nearest neighbor (kNN), and random forest (RF). Table 7 presents the test set performance of these models. Among them, the LDA model achieved the highest accuracy (0.85), precision (0.86), and recall (0.86). The kNN model performed closest to LDA, followed by RF, SVM, and QDA. Based on its strong performance and practical advantages, LDA was selected as the final model. It is computationally efficient and easy to implement, while also providing clear class separation in the discriminant space (LD1 and LD2), supporting better visualization and interpretation [34,55]. Due to its simplicity, interpretability, and competitive performance compared to more complex models, LDA is a suitable choice for tea classification using the e-nose system. Overall, these findings suggest that the LDA model with the mean-med-grad combination is an effective method for distinguishing fermented and non-fermented teas.

Table 7 Performance of several models using a selected combination of feature extraction method (mean-med-grad) on testing data

Model	Accuracy	Precision	Recall
LDA	0.85	0.86	0.86
QDA	0.76	0.78	0.79
SVM	0.82	0.82	0.84
kNN	0.84	0.84	0.85
RF	0.82	0.83	0.84
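A comparison of this kind can be sketched as below. The example uses scikit-learn with default hyperparameters, standardization before SVM and kNN, macro-averaged precision and recall, and synthetic data; all of these are assumptions for illustration, as the study's model settings are not given in this section.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 4-class data standing in for the selected features (assumption)
X, y = make_classification(n_samples=400, n_features=8, n_informative=6,
                           n_redundant=0, n_classes=4, n_clusters_per_class=1,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Default hyperparameters throughout; scaling is added for SVM and kNN
models = {
    "LDA": LinearDiscriminantAnalysis(),
    "QDA": QuadraticDiscriminantAnalysis(),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "kNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "RF": RandomForestClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    y_pred = model.fit(X_train, y_train).predict(X_test)
    results[name] = (
        accuracy_score(y_test, y_pred),
        precision_score(y_test, y_pred, average="macro", zero_division=0),
        recall_score(y_test, y_pred, average="macro", zero_division=0),
    )

for name, (acc, prec, rec) in results.items():
    print(f"{name}\t{acc:.2f}\t{prec:.2f}\t{rec:.2f}")
```

The ranking obtained on synthetic data says nothing about the ranking in Table 7; the sketch only shows how the same test split can be reused across models.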


Figure 5 Receiver Operating Characteristic (ROC) curve analysis and confusion matrix illustrating the performance of the LDA model on the testing data across 5 feature extraction combinations: (a) mean, (b) mean-med, (c) mean-med-grad, (d) mean-med-grad-std, and (e) mean-med-grad-std-max.

GC-MS analysis

Based on previous mass-spectrometry reports, teas with different fermentation degrees contain several compound groups. Green tea was characterized by the presence of amine, alcohol, aldehyde, ketone, alkane, ester, heterocyclic compound, and others, with geraniol, linalool, 3-methylbutanal, and pentanal identified as the main volatile compounds [56]. Yellow tea, on the other hand, contained aldehyde, alkane, ketone, ester, alcohol, heterocyclic compound, aromatic compound, phenol, and alkene, with geraniol, octyl 4-methoxycinnamate, 1,3-dimethylbenzene, and diisobutyl phthalate being the predominant volatile compounds [57]. Meanwhile, red tea was composed of alcohol, ketone, acid, ester, heterocyclic compound, and hydrocarbon, featuring acetic acid, benzaldehyde, furfural, and phenethyl alcohol as its primary volatile compounds [58]. Lastly, black tea exhibited the presence of alcohol, heterocyclic compound, aldehyde, ketone, ester, acid, and olefin, with linalool, (E)-2-hexen-1-ol, acetone, and 2-butanone identified as its main volatile compounds [59].

In this study, the volatile compounds in various fermentation degrees of tea were analyzed by GC-MS. A total of 35 volatile compounds were identified from all samples. These compounds consisted of amine, alcohol, aldehyde, hydrocarbon, ester, acid, ketone, sulfide, and carbon, as presented in Table S1. The concentration of the compound was determined based on the peak area of the chromatogram, which is directly proportional to the compound’s concentration in the sample [60]. A compound with a high peak area in a sample is referred to as the main compound in that sample. Amine and alcohol groups were predominant, constituting approximately 70% to 90% of the total compounds. Green, yellow, red, and black teas contained 13, 13, 28, and 14 compounds, respectively.
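The use of peak area as a proxy for concentration can be illustrated with a short sketch. It assumes that relative abundance is expressed as each compound's share of the total peak area in a sample; the function name `relative_abundance`, the compound names, and the peak areas are hypothetical and are not taken from Table S1.

```python
def relative_abundance(peak_areas):
    """Share of the total peak area (in %) contributed by each compound."""
    total = sum(peak_areas.values())
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Hypothetical peak areas in arbitrary units (not the study's data)
peak_areas = {"compound A": 6.0e6, "compound B": 3.0e6, "compound C": 1.0e6}

shares = relative_abundance(peak_areas)
main_compound = max(shares, key=shares.get)  # compound with the largest peak area
print(shares, main_compound)
```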

Various amines (e.g., (2-aziridinylethyl)amine, 3-ethyl-1H-pyrrole) were predominantly detected by sensors S2 and S9, while alcohols and hydrocarbons were detected across all sensors of the e-nose. Aldehydes and sulfides were primarily detected by S2. Esters (e.g., acetic acid methyl ester, acetic acid 3-ethylcyclobutyl ester), found exclusively in yellow, red, and black teas, were captured by S2. Acids were identified by S2, S6, and S9, while ketones were mainly observed by S8. Notably, nickel tetracarbonyl, a compound unique to green tea, was detected by multiple sensors, including S1, S4, S5, S6, S7, and S8. A complete list of identified compounds and their corresponding sensor responses is provided in Table S1.

All tea samples contained (2-aziridinylethyl)amine, methyl alcohol, acetone, dimethyl sulfide, 2-methylpropanal, cis-3-cyclopentene-1,2-diol, 3-methylbutanal, and 2-methylbutanal. Several studies have also identified 2-methylbutanal and 3-methylbutanal in green, yellow, and black teas [57,59,61,62]. Additionally, (2-aziridinylethyl)amine has been found in black tea [63], acetone has been observed in green tea [56], and dimethyl sulfide has been detected in green, yellow, and black teas [64,65]. The presence of 2-methylpropanal has been reported in both green [56] and black teas [61].

All compounds in the tea samples were analyzed using a PCA score plot, as shown in Figure 6(a). PC1, accounting for 49.3% of the variance, and PC2, accounting for 16.9%, together explained 66.2% of the data variability. The PCA plot revealed a distinct separation between green tea and the fermented teas (yellow, red, and black). An overlap was observed between red and black teas, indicating a similarity in their volatile compound profiles. The contribution of each volatile compound to PC1 and PC2 is shown in Figure 6(b). Compounds located far from the origin (0,0), such as compounds 11, 12, and 13, have the greatest influence in separating the clusters.
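The relationship between the score plot, the loading plot, and the explained variance can be sketched with a small PCA computed by singular value decomposition. This is an illustrative example only: the peak-area matrix is simulated, and mean-centering without further scaling is an assumption, since the preprocessing used for Figure 6 is not described in this section.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA by SVD of the mean-centered data matrix (rows are samples)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # center each compound (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = S**2 / np.sum(S**2)               # variance ratio per component
    scores = U[:, :n_components] * S[:n_components]   # sample coordinates
    loadings = Vt[:n_components].T                    # compound contributions
    return scores, loadings, explained[:n_components]


# Simulated peak-area table: 12 samples x 6 compounds (not the study's data)
rng = np.random.default_rng(1)
X = rng.normal(size=(12, 6))
X[:6, 0] += 5.0  # the first 6 samples are given more of compound 0

scores, loadings, explained = pca(X)
# Compounds ranked by distance from the origin of the loading plot
influence = np.argsort(-np.linalg.norm(loadings, axis=1))
print(np.round(100 * explained, 1), influence)
```

Compounds whose loadings lie far from the origin, as ranked by `influence`, are those that contribute most to the first 2 components.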

Several compounds with potential as biomarkers for tea types are presented in Figures 6(c)-6(h). Nickel tetracarbonyl was detected exclusively in green tea (Figure 6(c)) and was absent in fermented teas. This compound is likely associated with the minimal oxidation process in non-fermented teas, making it a key marker for identifying green tea. In contrast, (1R)-4,7,7-trimethylbicyclo[2.2.1]heptan-2-one was found in all fermented teas (yellow, red, and black) (Figure 6(d)), suggesting that it serves as a reliable marker of fermentation, likely produced during catechin oxidation. Cis-2-methylcyclohexyl acetate was present in both red and black teas and contributes to the chemical similarity observed between these 2 tea types (Figure 6(e)). Furthermore, (2-aziridinylethyl)amine was the main compound in all tea types, with the highest concentration found in black tea (Figure 6(f)). An increase in the degree of fermentation tended to result in higher concentrations of 2-methylbutanal (Figure 6(g)), which is likely related to the increasing oxidation in fermented teas. Meanwhile, an increase in the degree of fermentation tended to result in lower concentrations of methyl alcohol (Figure 6(h)). This is likely related to its conversion into secondary metabolites during the enzymatic oxidation that occurs in the fermentation process.

Figure 6 The metabolomic analysis performed using GC-MS: (a) PCA score plot, (b) PCA loading plot, (c)-(h) representative volatile compounds identified as potential biomarkers indicative of tea types.

Integration of e-nose and GC-MS findings

The integration of the e-nose and GC-MS was reported to successfully identify various types of dark tea [66], detect aroma changes during yellow tea processing [67], and characterize volatile organic compounds in brown rice tea of different colors [68]. This study integrated e-nose and GC-MS results to cross-validate the findings: The e-nose distinguished fermented vs. non-fermented teas based on aroma profiles, and GC-MS identified specific volatile compounds differentiating those same groups. The PCA visualization of the e-nose data aligned well with that of the GC-MS results, indicating consistent separation patterns across both analytical techniques. In particular, green tea appeared separated in both analyses, while red, black, and yellow teas showed overlapping clusters. This may be due to the presence of specific biomarker compounds in each tea category: Nickel tetracarbonyl in green tea, and (1R)-4,7,7-trimethylbicyclo[2.2.1]heptan-2-one in yellow, red, and black teas. Green tea was classified as non-fermented tea, while yellow, red, and black teas were categorized as fermented tea.

A significant overlap between red and black teas was observed in both the LDA visualization of the e-nose data and the PCA visualization of GC-MS results. This overlap suggests a chemical similarity between these 2 types of fermented tea, reflected in compounds such as acetic acid 3-ethylcyclobutyl ester, 2,2-dimethylpropanehydrazide, and cis-2-methylcyclohexyl acetate. This could be due to similarities in the fermentation process, where red tea undergoes partial fermentation and black tea undergoes full fermentation, resulting in similar volatile compound profiles. This overlap highlights a potential limitation of the e-nose system, which may require further refinement (e.g., additional sensor types or more complex feature extraction techniques) to differentiate between teas with subtle chemical differences. In addition, while the model was developed using tea samples from multiple batches, all samples originated from the same geographic region. The absence of external validation using geographically independent samples may limit the generalizability of the classification model across teas from different environments. Moreover, variability among production batches and similarities in volatile profiles, particularly between red and black teas, posed challenges for clear separation. Both the limited geographic diversity and the batch-to-batch variability of the samples are therefore acknowledged as constraints on the generalizability of the study. Future work should include external validation with datasets from diverse regions to assess and improve the model’s broader applicability.

These consistent patterns observed between the e-nose and GC-MS analyses indicate that the e-nose did not merely capture arbitrary sensor responses, but rather detected aroma patterns that reflect the actual chemical composition of the samples. The GC-MS results provided mechanistic explanations for the observed aroma patterns, thereby validating and strengthening the interpretation of e-nose classifications. This study also explored the association between specific sensor responses and GC-MS-identified compounds to elucidate the chemical basis of e-nose discrimination. As shown in Table S1, sensor signals are linked to the presence of volatiles. While this mapping supports a mechanistic interpretation of classification results, it does not quantify correlation strength or direction. Further analysis using statistical metrics (e.g., Pearson correlation or regression) is needed to establish quantitative relationships between compound concentrations and sensor outputs.
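The quantitative follow-up suggested above could take the form sketched below, which computes a Pearson correlation coefficient and a least-squares line between the peak area of one compound and the response of one sensor. The paired values are hypothetical and serve only to illustrate the calculation; no such analysis was performed in this study.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    return float(np.corrcoef(x, y)[0, 1])


# Hypothetical paired measurements across 5 samples (not the study's data)
peak_area = [1.0, 2.0, 3.0, 4.0, 5.0]             # GC-MS peak area, arbitrary units
sensor_response = [0.11, 0.19, 0.32, 0.41, 0.48]  # e-nose sensor output, arbitrary units

r = pearson_r(peak_area, sensor_response)
# Least-squares line: sensor_response is approximated by slope * peak_area + intercept
slope, intercept = np.polyfit(peak_area, sensor_response, 1)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The sign of `r` and `slope` gives the direction of the relationship, and the magnitude of `r` gives its strength, which are the 2 quantities the qualitative mapping in Table S1 does not provide.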

The study demonstrates that e-nose and GC-MS effectively distinguish between fermented and non-fermented teas. The optimal classification was achieved by the e-nose with selected feature extractions and was further validated with GC-MS analysis. These 2 methods complement each other in the task of tea classification: GC-MS provides chemical profiles of the teas, while e-nose offers rapid sensory evaluation. This study focused on selecting feature extraction combinations using silhouette scores. The proposed method could be combined with other feature selection methods to improve the classification performance.

Conclusions

An e-nose integrated with an LDA model was developed to differentiate between fermented and non-fermented teas. Although the silhouette scores were relatively low, indicating weak class separation and notable overlap between red and black tea classes, the model still achieved good performance, with an accuracy of 0.85, precision of 0.86, recall of 0.86, and an AUC of 0.97 on the testing data. These results suggest that the proposed method has potential for rapid and non-invasive tea classification. However, its applicability may be limited by the use of samples from a single geographic region (Yogyakarta) and should be validated with more diverse datasets. Future research could focus on improving class separation, particularly between partially and fully fermented teas, and testing the system in real industrial and retail environments to assess its robustness under varied conditions.

Acknowledgements

This work was financially supported by the Ministry of Education and Culture of the Republic of Indonesia through BPI (Indonesian Education Scholarships), Pusat Pelayanan Pembiayaan dan Asesmen Pendidikan Tinggi (Center for Higher Education Funding and Assessment, Ministry of Higher Education, Science, and Technology of the Republic of Indonesia), and LPDP (Indonesia Endowment Fund for Education).

Declaration of Generative AI in Scientific Writing

The authors recognize the use of the generative AI tool ChatGPT (developed by OpenAI) to assist with language refinement and grammatical corrections during the preparation of this manuscript. The AI was not involved in generating content or interpreting data. The authors assume full responsibility for the content and conclusions presented in this work.

CRediT Author Statement

Ummi Kaltsum: Data curation; Formal analysis; Investigation; Validation; Writing-original draft. Kombo Othman Kombo: Data curation; Formal analysis; Validation; Visualization. Roto Roto: Methodology; Validation; Visualization. Sholihun Sholihun: Methodology; Project administration; Resources; Supervision; Validation; Writing-original draft. Kuwat Triyana: Conceptualization; Methodology; Supervision; Validation; Funding acquisition; and Writing and Editing-original draft.

References

[1] L Xing, H Zhang, R Qi, R Tsao and Y Mine. Recent advances in the understanding of the health benefits and molecular mechanisms associated with green tea polyphenols. Journal of Agricultural and Food Chemistry 2019; 67(4), 1029-1043.

[2] S Kaushal, P Nayi, D Rahadian and HH Chen. Applications of electronic nose coupled with statistical and intelligent pattern recognition techniques for monitoring tea quality: A review. Agriculture 2022; 12(9), 1359.

[3] Food and Agriculture Organization. Current global market situation and medium-term outlook. Food and Agriculture Organization, Rome, Italy, 2024.

[4] HF He. Research progress on theaflavins: Efficacy, formation, and preparation. Food and Nutrition Research 2017; 61(1), 1344521.

[5] T Tanaka and Y Matsuo. Production mechanisms of black tea polyphenols. Chemical and Pharmaceutical Bulletin 2020; 68(12), 1131-1142.

[6] LEA Camargo, LS Pedroso, SC Vendrame, RM Mainardes and NM Khalil. Antioxidant and antifungal activities of Camellia sinensis (L.) Kuntze leaves obtained by different forms of production. Brazilian Journal of Biology 2016; 76(2), 428-434.

[7] J Xu, M Wang, J Zhao, YH Wang, Q Tang and IA Khan. Yellow tea (Camellia sinensis L.), a promising Chinese tea: Processing, chemical constituents and health benefits. Food Research International 2018; 107, 567-577.

[8] KRJ Pou. Fermentation: The key step in the processing of black tea. Journal of Biosystems Engineering 2016; 41(2), 85-92.

[9] A Shang, J Li, DD Zhou, RY Gan and HB Li. Molecular mechanisms underlying health benefits of tea compounds. Free Radical Biology and Medicine 2021; 172, 181-200.

[10] Z Feng, Y Li, M Li, Y Wang, L Zhang, X Wan and X Yang. Tea aroma formation from six model manufacturing processes. Food Chemistry 2019; 285, 347-354.

[11] LF Wang, JY Lee, JO Chung, JH Baik, S So and SK Park. Discrimination of teas with different degrees of fermentation by SPME-GC analysis of the characteristic volatile flavour compounds. Food Chemistry 2008; 109(1), 196-206.

[12] H Yuan, X Chen, Y Shao, Y Cheng, Y Yang, M Zhang, J Hua, J Li, Y Deng, J Wang, C Dong, Y Jiang, Z Xie and Z Wu. Quality evaluation of green and dark tea grade using electronic nose and multivariate statistical analysis. Journal of Food Science 2019; 84(12), 3411-3417.

[13] Q Chen, J Zhao, Z Chen, H Lin and DA Zhao. Discrimination of green tea quality using the electronic nose technique and the human panel test, comparison of linear and nonlinear classification tools. Sensors and Actuators B: Chemical 2011; 159(1), 294-300.

[14] SN Hidayat, K Triyana, I Fauzan, T Julian, D Lelono, Y Yusuf, N Ngadiman, ACA Veloso and AM Peres. The electronic nose coupled with chemometric tools for discriminating the quality of black tea samples in situ. Chemosensors 2019; 7(3), 29.

[15] AY Yashin, BV Nemzer, E Combet and YI Yashin. Determination of the chemical composition of tea by chromatographic methods: A review. Journal of Food Research 2015; 4(3), 56-88.

[16] EJ Want, BF Cravatt and G Siuzdak. The expanding role of mass spectrometry in metabolite profiling and characterization. ChemBioChem 2005; 6(11), 1941-1951.

[17] MW Dong. The essence of modern HPLC: Advantages, limitations, fundamentals, and opportunities. LCGC North America 2020; 31(6), 472-479.

[18] M Bernabei, S Pantalei and I Sherrington. Development of an artificial olfactory system for lubricant degradation monitoring. International Journal of Condition Monitoring and Diagnostic Engineering Management 2020; 23, 3-12.

[19] B Aouadi, JLZ Zaukuu, F Vitális, Z Bodor, O Fehér, Z Gillay, G Bazar and Z Kovacs. Historical evolution and food control achievements of near infrared spectroscopy, electronic nose, and electronic tongue—critical overview. Sensors 2020; 20(19), 5479.

[20] M Xu, J Wang and L Zhu. Tea quality evaluation by applying e-nose combined with chemometrics methods. Journal of Food Science and Technology 2021; 58(4), 1549-1561.

[21] X Lu, J Wang, G Lu, B Lin, M Chang and W He. Quality level identification of West Lake Longjing green tea using electronic nose. Sensors and Actuators B: Chemical 2019; 301, 127056.

[22] Q Chen, A Liu, J Zhao and Q Ouyang. Classification of tea category using a portable electronic nose based on an odor imaging sensor array. Journal of Pharmaceutical and Biomedical Analysis 2013; 84, 77-83.

[23] S Kiani, S Minaei and M Ghasemi-Varnamkhasti. Application of electronic nose systems for assessing quality of medicinal and aromatic plant products: A review. Journal of Applied Research on Medicinal and Aromatic Plants 2016; 3(1), 1-9.

[24] J Yan, X Guo, S Duan, P Jia, L Wang, C Peng and S Zhang. Electronic nose feature extraction methods: A review. Sensors 2015; 15(11), 27804-27831.

[25] S Cui, P Ling, H Zhu and HM Keener. Plant pest detection using an artificial nose system: A review. Sensors 2018; 18(2), 378.

[26] D Karakaya, O Ulucan and M Turkan. Electronic nose and its applications: A survey. International Journal of Automation and Computing 2020; 17, 179-209.

[27] M Wang and Y Chen. Electronic nose and its application in the food industry: A review. European Food Research and Technology 2024; 250, 21-67.

[28] Y Hu, Y Li, X Li, H Zhang, Q Chen and B Kong. Application of lactic acid bacteria for improving the quality of reduced-salt dry fermented sausage: Texture, color, and flavor profiles. LWT 2022; 154, 112723.

[29] GA Fagundes, S Benedetti, MA Pagani, AM Fiorentini, J Severo and M Salas-Mellado. Electronic sensory assessment of bread enriched with cobia (Rachycentron canadum). Journal of Food Process Engineering 2022; 45(7), e13656.

[30] L Qiu, M Zhang, AS Mujumdar and L Chang. Effect of edible rose (Rosa rugosa cv. Plena) flower extract addition on the physicochemical, rheological, functional and sensory properties of set-type yogurt. Food Bioscience 2021; 43, 101249.

[31] H Tian, J Xiong, S Chen, H Yu, C Chen, J Huang, H Yuan and X Lou. Rapid identification of adulteration in raw bovine milk with soymilk by electronic nose and headspace-gas chromatography ion-mobility spectrometry. Food Chemistry: X 2023; 18, 100696.

[32] Y Yang, J Chen, Y Jiang, MC Qian, Y Deng, J Xie, J Li, J Wang, C Dong and H Yuan. Aroma dynamic characteristics during the drying process of green tea by gas phase electronic nose and gas chromatography-ion mobility spectrometry. LWT 2022; 154, 112691.

[33] S Nanga, AT Bawah, BA Acquaye, MI Billa, FD Baeta, NA Odai, SK Obeng and AD Nsiah. Review of dimension reduction methods. Journal of Data Analysis and Information Processing 2021; 9, 189-231.

[34] LA Putri, I Rahman, M Puspita, SN Hidayat, AB Dharmawan, A Rianjanu, S Wibirama, R Roto, K Triyana and HS Wasisto. Rapid analysis of meat floss origin using a supervised machine learning-based electronic nose towards food authentication. NPJ Science of Food 2023; 7(1), 31.

[35] GG Teixeira, LG Dias, N Rodrigues, ÍMG Marx, ACA Veloso, JA Pereira and AM Peres. Application of a lab-made electronic nose for extra virgin olive oils commercial classification according to the perceived fruitiness intensity. Talanta 2021; 226, 122122.

[36] C Zhang, W Wang and Y Pan. Enhancing electronic nose performance by feature selection using an improved grey wolf optimization based algorithm. Sensors 2020; 20(15), 4065.

[37] RA Putri, SI Sabilla and R Sarno. Optimal feature selection algorithm (FSA) for electronic nose signal. In: Proceedings of the 1st International Conference on Advanced Engineering and Technologies, East Java, Indonesia. 2023, p. 310-315.

[38] B Langenberg, M Janczyk, V Koob, R Kliegl and A Mayer. A tutorial on using the paired t test for power calculations in repeated measures ANOVA with interactions. Behavior Research Methods 2023; 55, 2467-2484.

[39] T Lin, X Lv, Z Hu, A Xu and C Feng. Semiconductor metal oxides as chemoresistive sensors for detecting volatile organic compounds: A review. Sensors 2019; 19(2), 233.

[40] S Kim, G Singh, M Oh and K Lee. An analysis of a highly sensitive and selective hydrogen gas sensor based on a 3D Cu-doped SnO₂ sensing material by efficient electronic sensor interface. ACS Sensors 2021; 6(11), 4145-4155.

[41] SN Hidayat, T Julian, AB Dharmawan, M Puspita, L Chandra, A Rohman, M Julia, A Rianjanu, DK Nurputra, K Triyana and HS Wasisto. Hybrid learning method based on feature clustering and scoring for enhanced COVID-19 breath analysis by an electronic nose. Artificial Intelligence in Medicine 2022; 129, 102323.

[42] Z Wu, H Wang, X Wang, H Zheng, Z Chen and C Meng. Development of electronic nose for qualitative and quantitative monitoring of volatile flammable liquids. Sensors 2020; 20(7), 1817.

[43] X Zhao, P Yu, N Zhong, H Huang and H Zheng. Impact of storage temperature on green tea quality: Insights from sensory analysis and chemical composition. Beverages 2024; 10(2), 35.

[44] Y Sun, J Wang and S Cheng. Discrimination among tea plants either with different invasive severities or different invasive times using MOS electronic nose combined with a new feature extraction method. Computers and Electronics in Agriculture 2017; 143, 293-301.

[45] R Gutierrez-Osuna, HT Nagle, B Kermani and SS Schiffman. Handbook of machine olfaction. WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim, Germany, 2004, p. 105-132.

[46] K Qian, Y Bao, J Zhu, J Wang and Z Wei. Development of a portable electronic nose based on a hybrid filter-wrapper method for identifying the Chinese dry-cured ham of different grades. Journal of Food Engineering 2021; 290, 110250.

[47] A Gorji-Chakespari, AM Nikbakht, F Sefidkon, M Ghasemi-Varnamkhasti and EL Valero. Classification of essential oil composition in Rosa damascena Mill. genotypes using an electronic nose. Journal of Applied Research on Medicinal and Aromatic Plants 2017; 4, 27-34.

[48] KO Kombo, SN Hidayat, M Puspita, A Kusumaatmaja, R Roto, H Nirwati, R Susilowati, EL Haksari, T Wibowo, S Wandita, Wahyono, M Julia and K Triyana. A machine learning-based electronic nose for detecting neonatal sepsis: Analysis of volatile organic compound biomarkers in fecal samples. Clinica Chimica Acta 2025; 565, 119974.

[49] K Korjus, MN Hebart and R Vicente. An efficient data partitioning to improve classification performance while keeping parameters interpretable. PLoS One 2016; 11(8), e0161788.

[50] A Nazarkar, H Kuchulakanti, CS Paidimarry and S Kulkarni. Impact of various data splitting ratios on the performance of machine learning models in the classification of lung cancer. In: Proceedings of the 2nd International Conference on Emerging Trends in Engineering, Hyderabad, India. 2023, p. 96-104.

[51] T Wang, H Zhang, Y Wu, W Jiang, X Chen, M Zeng, J Yang, Y Su, N Hu and Z Yang. Target discrimination, concentration prediction, and status judgment of electronic nose system based on large-scale measurement and multi-task deep learning. Sensors and Actuators B: Chemical 2022; 351, 130915.

[52] IC Valenzuela, LKS Tolentino and ROS Juan. Utilization of e-nose sensory modality as add-on feature for advanced driver assistance system. International Journal of Advanced Trends in Computer Science and Engineering 2019; 8(4), 1783-1788.

[53] P Borowik, T Grzywacz, R Tarakowski, M Tkaczyk, S Ślusarski, V Dyshko and T Oszako. Development of a low-cost electronic nose with an open sensor chamber: Application to detection of Ciboria batschiana. Sensors 2023; 23(2), 627.

[54] J Bu, W Liu, Z Pan and K Ling. Comparative study of hydrochemical classification based on different hierarchical cluster analysis methods. International Journal of Environmental Research and Public Health 2020; 17(24), 9515.

[55] R Graf, M Zeldovich and S Friedrich. Comparing linear discriminant analysis and supervised learning algorithms for binary classification—A method comparison study. Biometrical Journal 2024; 66(1), 2200098.

[56] N Liu, S Shen, L Huang, G Deng, Y Wei, J Ning and Y Wang. Revelation of volatile contributions in green teas with different aroma types by GC-MS and GC-IMS. Food Research International 2023; 169, 112845.

[57] Y Shi, M Wang, Z Dong, Y Zhu, J Shi, W Ma, Z Lin and H Lv. Volatile components and key odorants of Chinese yellow tea (Camellia sinensis). LWT 2021; 146, 111512.

[58] W Hu, G Wang, S Lin, Z Liu, P Wang, J Li, Q Zhang and H He. Digital evaluation of aroma intensity and odor characteristics of tea with different types—based on OAV-splitting method. Foods 2022; 11(15), 2204.

[59] Y Yang, H Zhu, J Chen, J Xie, S Shen, Y Deng, J Zhu, H Yuan and Y Jiang. Characterization of the key aroma compounds in black teas with different aroma types by using gas chromatography electronic nose, gas chromatography-ion mobility spectrometry, and odor activity value analysis. LWT 2022; 163, 113492.

[60] X Chang, Y Long, C Wang and Y Xiao. Chemical fingerprinting of volatile organic compounds from asphalt binder for quantitative detection. Construction and Building Materials 2023; 371, 130766.

[61] V Kraujalyte, E Pelvan and C Alasalvar. Volatile compounds and sensory characteristics of various instant teas produced from black tea. Food Chemistry 2016; 194, 864-872.

[62] JY Jeon and SH Choi. Aroma characteristics of dried citrus fruits-blended green tea. Journal of Life Science 2011; 21(5), 739-745.

[63] JS Gong, C Tang and CX Peng. Characterization of the chemical differences between solvent extracts from Pu-erh tea and Dian Hong black tea by CP-Py-GC/MS. Journal of Analytical and Applied Pyrolysis 2012; 95, 189-197.

[64] X Zhai, J Wang, H Wang, M Xue, X Yao, M Li, J Yu, L Zhang and X Wan. Formation of dimethyl sulfide from the decomposition of S-methylmethionine in tea (Camellia sinensis) during manufacturing process and infusion brewing. Food Research International 2022; 162, 112106.

[65] J Chen, Y Yang, Y Deng, Z Liu, J Xie, S Shen, H Yuan and Y Jiang. Aroma quality evaluation of Dianhong black tea infusions by the combination of rapid gas phase electronic nose and multivariate statistical analysis. LWT 2022; 153, 112496.

[66] S Gong, Z Zhang, J Chen, H Wu, H Jiang, C Teng and Z Dai. Enhanced understanding of dark tea quality through integrated GC-IMS and e-nose analysis. LWT 2025; 224, 117806.

[67] X Guo, W Schwab, CT Ho, C Song and X Wan. Characterization of the changes of aroma profiles in large-leaf yellow tea during processing using GC-MS and electronic nose analysis. Food Chemistry: X 2025; 27, 102507.

[68] L Zhou, W Zheng, Y Sui, Z Zhu, S Li, J Shi, T Xiong, F Cai, J Wen, Z Zheng and X Mei. Characterization of volatile organic compounds in selenium-enriched brown rice tea of different colors using e-nose, HS-GC-IMS and HS-SPME-GC-MS. LWT 2025; 224, 117830.