Erdem Önal

I am a first-year MSc student in Computer Science (CPS2: AI and IoT) at Université Jean Monnet and Ecole Nationale Supérieure des Mines de Saint-Etienne.

I am interested in how machine learning can be integrated into intelligent systems.

Publications

Text Mining-Based Profiling of Chemical Environments in Protein-Ligand Binding Assays Across Analytical Techniques
Kalaycioğlu, Z., & Önal, E. (2025)
Chemometrics and Intelligent Laboratory Systems (under review)
Access the journal

Research

A Systematic Mapping Study on Green Load Balancing Algorithms for Cloud Data Centers
Önal, E., Alkhori, F., & Schramm, M.
Read the paper

Notes

Gradient Descent
Gradient descent locates local minima by repeatedly stepping in the direction of steepest decrease. It uses the update rule xₙ₊₁ = xₙ - α∇f(xₙ), where α is the learning rate. The negative gradient -∇f(x) points in the direction of the most rapid decrease in the function's value. Selecting the learning rate is a trade-off between convergence speed and stability: if α is too small, convergence is slow; if it is too large, the iterates may overshoot the minimum or diverge.
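A minimal sketch of this update rule in Python, applied to the toy function f(x) = ‖x‖², whose gradient is 2x; the function, step size, and stopping tolerance are illustrative assumptions rather than anything prescribed above.

```python
import numpy as np

def gradient_descent(grad_f, x0, alpha=0.1, steps=100, tol=1e-8):
    """Iterate x <- x - alpha * grad_f(x) until the update becomes tiny."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        step = alpha * grad_f(x)           # move against the gradient
        x = x - step
        if np.linalg.norm(step) < tol:     # updates have effectively stopped
            break
    return x

grad_f = lambda x: 2 * x                   # gradient of f(x) = ||x||^2
print(gradient_descent(grad_f, x0=[3.0, -4.0]))   # converges toward [0, 0]
```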
Hessian Matrix
The Hessian matrix, H, describes the local curvature of a function f(x₁, x₂, ..., xₙ). It is an n×n matrix whose (i,j) entry is ∂²f/∂xᵢ∂xⱼ, and it is symmetric when the second derivatives are continuous. In optimization, the Hessian at a critical point indicates whether the function has a local minimum (positive definite), a local maximum (negative definite), or a saddle point (indefinite).
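A hedged sketch of how this classification can be checked numerically: a central-difference Hessian for the toy saddle f(x, y) = x² − y², followed by an eigenvalue test; the example function and step size are illustrative choices.

```python
import numpy as np

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the Hessian of f at x."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    I = np.eye(n)
    for i in range(n):
        for j in range(n):
            # ∂²f/∂xᵢ∂xⱼ via a four-point central difference
            H[i, j] = (f(x + h*I[i] + h*I[j]) - f(x + h*I[i] - h*I[j])
                       - f(x - h*I[i] + h*I[j]) + f(x - h*I[i] - h*I[j])) / (4 * h * h)
    return H

f = lambda v: v[0]**2 - v[1]**2            # saddle at the origin
H = hessian(f, [0.0, 0.0])
print(H)
print(np.linalg.eigvalsh(H))               # mixed signs -> indefinite -> saddle point
```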
Jacobian Matrix
The Jacobian matrix, J, is an m×n matrix containing all the first-order partial derivatives of a vector function f: ℝⁿ → ℝᵐ; the entry in position (i,j) is ∂fᵢ/∂xⱼ. The Jacobian gives the best linear approximation of the function near a given point. When m = n, the Jacobian is square and its determinant is the scaling factor used to change variables in multiple integrals. If the determinant at a point is nonzero, the Inverse Function Theorem guarantees that the function is locally invertible with a continuously differentiable inverse.
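A hedged sketch: a forward-difference Jacobian applied to the polar-to-Cartesian map (r, θ) ↦ (r cos θ, r sin θ), whose determinant is the familiar change-of-variables factor r; the example map and step size are illustrative assumptions.

```python
import numpy as np

def jacobian(f, x, h=1e-6):
    """Forward-difference approximation of the Jacobian of f at x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += h
        J[:, j] = (np.asarray(f(xp), dtype=float) - fx) / h   # column of ∂fᵢ/∂xⱼ
    return J

polar = lambda v: [v[0] * np.cos(v[1]), v[0] * np.sin(v[1])]
J = jacobian(polar, [2.0, np.pi / 4])
print(J)
print(np.linalg.det(J))    # ≈ 2, i.e. the radius r
```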
K-Nearest Neighbors
K-Nearest Neighbors is a lazy learning method that makes predictions based on the K closest training points to a query. For classification it takes a majority vote; for regression it averages the neighbors' values. The method computes the distance from the query to every training point and keeps the K closest. Choosing K is a balance: a small K is sensitive to noise, while a large K produces smoother decision boundaries but may underfit the data. Odd values of K are preferred in binary classification to avoid ties.
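A minimal classifier sketch following this description, with Euclidean distance and a majority vote; the toy training points and K = 3 are illustrative assumptions.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_query, k=3):
    dists = np.linalg.norm(X_train - x_query, axis=1)    # distance to every training point
    nearest = np.argsort(dists)[:k]                      # indices of the K closest
    votes = Counter(y_train[i] for i in nearest)
    return votes.most_common(1)[0][0]                    # majority vote

X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]])
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_predict(X_train, y_train, np.array([4.5, 5.0])))   # -> 1
```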
Lagrange Multipliers
Lagrange multipliers solve constrained optimization problems: to optimize f(x) subject to g(x)=0, form the Lagrangian L(x,λ)=f(x)−λg(x). Setting all partial derivatives of the Lagrangian to zero gives the conditions ∇f(x)=λ∇g(x) and g(x)=0. The multiplier λ measures how the optimal value changes as the constraint is relaxed.
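A hedged worked example: extremizing f(x, y) = x + y on the unit circle g(x, y) = x² + y² − 1 = 0 by solving the stationarity conditions above with SymPy; the objective and constraint are illustrative choices.

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + y
g = x**2 + y**2 - 1
L = f - lam * g                                   # Lagrangian L(x, y, λ)

# Solve ∂L/∂x = 0 and ∂L/∂y = 0 together with the constraint g = 0
solutions = sp.solve([sp.diff(L, x), sp.diff(L, y), g], [x, y, lam], dict=True)
for s in solutions:
    print(s, 'f =', f.subs(s))                    # maximum at x = y = 1/√2, minimum at x = y = -1/√2
```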
Logistic Regression
Logistic regression is a classification model. It first computes a weighted sum of the input features plus a bias term, written z = w⋅x + b, and then applies the sigmoid function g(z) = 1/(1 + e⁻ᶻ), which maps any real number to a value between 0 and 1. The output can be interpreted as the probability that the input belongs to the positive class, which is what makes logistic regression useful for binary classification.
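A minimal sketch of this forward pass; the weights, bias, and input vector are made-up illustrative values, not fitted parameters.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_proba(w, b, x):
    z = np.dot(w, x) + b          # weighted sum of features plus bias
    return sigmoid(z)             # probability of the positive class

w = np.array([0.8, -0.4])
b = -0.1
x = np.array([1.5, 2.0])
p = predict_proba(w, b, x)
print(p, int(p >= 0.5))           # probability and the thresholded class label
```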
Newton-Raphson Method
The Newton-Raphson method finds roots of the equation f(x)=0 using the iterative formula xₙ₊₁ = xₙ - f(xₙ)/f'(xₙ). Starting from an initial guess x₀, each step follows the tangent line at the current point to estimate where the function crosses the x-axis. The method works when the initial guess is close enough to the root, f'(x)≠0 near the root, and the function is smooth. Common convergence checks are |xₙ₊₁ - xₙ| < ε or |f(xₙ)| < ε, where ε is a small tolerance.
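A minimal sketch of the iteration for f(x) = x² − 2, whose positive root is √2; the function, derivative, starting point, and tolerance are illustrative assumptions.

```python
def newton_raphson(f, df, x0, eps=1e-10, max_iter=50):
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / df(x)          # tangent-line update
        if abs(x_new - x) < eps:          # |x_{n+1} - x_n| < eps stopping test
            return x_new
        x = x_new
    return x

f = lambda x: x**2 - 2
df = lambda x: 2 * x
print(newton_raphson(f, df, x0=1.0))      # ≈ 1.4142135623...
```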
ReLU (Rectified Linear Unit)
ReLU is defined as f(x) = max(0, x): positive inputs pass through unchanged, while negative inputs are mapped to zero. It is computationally cheap and helps with the vanishing gradient problem, since its derivative is 1 for positive inputs and 0 for negative inputs. By zeroing out negative values, ReLU introduces sparsity, which can speed up networks and reduce overfitting.
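ReLU and its derivative written out in NumPy, with a few sample inputs chosen only for illustration.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)               # pass positives through, zero out negatives

def relu_grad(x):
    return (x > 0).astype(float)          # derivative: 1 for positive inputs, 0 otherwise

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(z))        # [0.  0.  0.  1.5 3. ]
print(relu_grad(z))   # [0. 0. 0. 1. 1.]
```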
Regularization
Regularization prevents overfitting by adding a penalty to the cost function that discourages large parameter values. For both linear and logistic regression, the regularized cost function is J(w,b) = original_cost + (λ/2m)∑wⱼ². The regularization parameter λ controls the trade-off between fitting the training data and keeping the parameters small. In gradient descent, the penalty adds (λ/m)wⱼ to the gradient with respect to wⱼ, which shrinks each weight by a factor of (1 - αλ/m) on every update. The bias parameter b is not regularized; only the weights w are penalized.
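A hedged sketch of the regularized cost and a single gradient-descent step for linear regression; the toy data, α, and λ are illustrative assumptions.

```python
import numpy as np

def ridge_cost(w, b, X, y, lam):
    m = len(y)
    err = X @ w + b - y
    return (err @ err) / (2 * m) + (lam / (2 * m)) * np.sum(w**2)   # original cost + penalty

def gradient_step(w, b, X, y, lam, alpha):
    m = len(y)
    err = X @ w + b - y
    grad_w = (X.T @ err) / m + (lam / m) * w    # extra (λ/m)·wⱼ term from the penalty
    grad_b = np.mean(err)                       # the bias b is not regularized
    return w - alpha * grad_w, b - alpha * grad_b

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.1, 1.9, 3.2])
w, b = np.array([0.0]), 0.0
for _ in range(200):
    w, b = gradient_step(w, b, X, y, lam=0.1, alpha=0.1)
print(w, b, ridge_cost(w, b, X, y, lam=0.1))
```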
Softmax
Softmax converts a vector of real numbers into a probability distribution, σ(zᵢ) = eᶻⁱ / ∑ⱼeᶻʲ. Each output lies between 0 and 1, and the outputs sum to 1, which makes it a natural choice for the output layer in multiclass classification. Softmax amplifies differences between the inputs, so larger values receive higher probabilities and smaller values receive lower ones. The function is differentiable everywhere, which makes it well suited to backpropagation.
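A numerically stable softmax sketch: subtracting max(z) before exponentiating leaves the result unchanged but avoids overflow; the example logits are illustrative.

```python
import numpy as np

def softmax(z):
    z = np.asarray(z, dtype=float)
    shifted = z - z.max()                 # stability shift, does not change the output
    exp_z = np.exp(shifted)
    return exp_z / exp_z.sum()            # values in (0, 1) that sum to 1

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())                 # ≈ [0.659 0.242 0.099] 1.0
```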