Machine Learning Interview Questions 2026: Algorithms, Metrics, and Deep Learning


Machine learning interview questions in 2026 cover statistics, model selection, evaluation metrics, common algorithms, deep learning concepts, and production ML. This guide walks through the most common ML questions for data scientist and ML engineer roles.

Core ML Concepts

1. What is the bias-variance tradeoff?

Bias: error from wrong assumptions. The model is too simple and underfits the training data
Variance: error from sensitivity to the training data. The model is too complex and overfits
Trade-off: reducing bias typically increases variance, and vice versa

Problem	Signs	Fix
High bias (underfitting)	Low training and test accuracy	More features, a more complex model, longer training
High variance (overfitting)	High training accuracy, low test accuracy	More data, regularization, dropout, a simpler model
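
The signs in the table can be reproduced on a toy problem. The sketch below is illustrative only: the noisy sine data, the sample sizes, and the polynomial degrees (1, 4, 15) are assumptions chosen for the demo, not values from this guide. A low degree tends to show high error on both splits (high bias), while a very high degree drives training error down and usually leaves test error higher (high variance).

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    # Hypothetical ground truth: one period of a sine wave plus Gaussian noise
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

def mse(model, x, y):
    return float(np.mean((model(x) - y) ** 2))

results = {}
for degree in (1, 4, 15):
    # Polynomial.fit rescales x internally, which keeps high degrees well conditioned
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree={degree:2d}  train MSE={results[degree][0]:.3f}  test MSE={results[degree][1]:.3f}")
```

Training error can only go down as the degree grows, because each larger model contains the smaller ones. The gap between training and test error is the quantity to watch.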

2. Explain precision, recall, and F1 score

# For binary classification:
# True Positive (TP): correctly predicted positive
# False Positive (FP): predicted positive, actually negative
# False Negative (FN): predicted negative, actually positive
# True Negative (TN): correctly predicted negative

# Precision = TP / (TP + FP)
# "Of all predicted positives, how many were actually positive?"
# Use when false positives are costly (spam filter: don't block legitimate email)

# Recall (Sensitivity) = TP / (TP + FN)
# "Of all actual positives, how many did we catch?"
# Use when false negatives are costly (cancer detection: don't miss cancer)

# F1 = 2 * (Precision * Recall) / (Precision + Recall)
# Harmonic mean — balanced metric when both matter

from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.80
print(f"Recall: {recall_score(y_true, y_pred):.2f}")        # 0.80
print(f"F1: {f1_score(y_true, y_pred):.2f}")                # 0.80
print(classification_report(y_true, y_pred))
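
Precision and recall usually move in opposite directions as the decision threshold changes. The pure-Python sketch below shows this under stated assumptions: the `scores` list is hypothetical (chosen so that a 0.5 threshold reproduces the `y_pred` above), and `precision_recall` is an illustrative helper, not a scikit-learn function.

```python
labels = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
# Hypothetical model scores; thresholding at 0.5 gives the y_pred used above
scores = [0.9, 0.4, 0.2, 0.8, 0.1, 0.7, 0.6, 0.3, 0.95, 0.05]

def precision_recall(labels, scores, threshold):
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

results = {t: precision_recall(labels, scores, t) for t in (0.25, 0.5, 0.65)}
for t, (p, r) in results.items():
    print(f"threshold={t:.2f}  precision={p:.2f}  recall={r:.2f}")
# threshold=0.25  precision=0.71  recall=1.00
# threshold=0.50  precision=0.80  recall=0.80
# threshold=0.65  precision=1.00  recall=0.80
```

A lower threshold catches every positive at the cost of more false alarms, and a higher one does the reverse. The right operating point depends on which error is more expensive.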

3. What is cross-validation and why is it needed?

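from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Problem: a single train/test split can be misleading
# Solution: cross-validation rotates the held-out fold, so every sample
# is used for both training and validation

# Assumption: X and y are not defined in the original snippet;
# a synthetic dataset stands in for real data here
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

model = RandomForestClassifier(random_state=42)

# k=5 is the most common choice. For classifiers, an integer cv makes
# scikit-learn use StratifiedKFold without shuffling, not plain KFold
cv_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')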
print(f"CV scores: {cv_scores}")
print(f"Mean: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")

# Stratified K-Fold — maintain class balance in each fold (for imbalanced data)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
stratified_scores = cross_val_score(model, X, y, cv=skf, scoring='f1')

# Key insight: cross-validation gives a DISTRIBUTION of scores, not a single number
# Use mean ± std to understand model reliability
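
To see what the splitter does under the hood, here is a minimal NumPy sketch of plain (unstratified) k-fold index generation. `k_fold_indices` is a hypothetical helper written for illustration; scikit-learn's `KFold` is the tool to use in practice.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), k)
    for i in range(k):
        val_idx = folds[i]
        # Every fold except the i-th one forms the training set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, val_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for fold, (train_idx, val_idx) in enumerate(splits):
    print(f"fold {fold}: train={sorted(train_idx.tolist())} val={sorted(val_idx.tolist())}")
```

Each sample lands in exactly one validation fold, which is why the k scores together cover the whole dataset.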

4. Explain regularization (L1, L2, dropout)

from sklearn.linear_model import Lasso, Ridge, ElasticNet

# L1 (Lasso): penalty = lambda * sum(|w_i|)
# Effect: sparse weights, some become exactly 0 (feature selection!)
lasso = Lasso(alpha=0.1)  # alpha = lambda

# L2 (Ridge): penalty = lambda * sum(w_i^2)
# Effect: all weights shrink toward 0 but generally do not reach exactly 0
ridge = Ridge(alpha=1.0)

# ElasticNet: combination of L1 + L2
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)

# For deep learning: Dropout
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Dropout(p=0.3),    # randomly zero 30% of neurons during training
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 1),
)
# Dropout is only disabled in evaluation mode: call model.eval() before
# inference and model.train() before resuming training
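
Why does L1 produce exact zeros while L2 does not? One way to see it is a simplified one-weight view, sketched below in NumPy. Each function returns the value `v` that minimizes a squared distance to the unpenalized weight `w` plus the penalty. The weights and `lam` are made-up numbers, and `l1_shrink` and `l2_shrink` are illustrative helpers, not scikit-learn APIs.

```python
import numpy as np

weights = np.array([3.0, -0.5, 0.05, -2.0, 0.2])  # hypothetical unpenalized weights
lam = 0.3                                          # hypothetical penalty strength

def l1_shrink(w, lam):
    # Minimizes 0.5*(v - w)^2 + lam*|v|: soft-thresholding.
    # Any weight with |w| <= lam is set to exactly 0
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def l2_shrink(w, lam):
    # Minimizes 0.5*(v - w)^2 + 0.5*lam*v^2: proportional shrinkage.
    # A non-zero weight stays non-zero
    return w / (1.0 + lam)

l1_weights = l1_shrink(weights, lam)
l2_weights = l2_shrink(weights, lam)
print("L1:", l1_weights, "non-zero:", np.count_nonzero(l1_weights))
print("L2:", l2_weights, "non-zero:", np.count_nonzero(l2_weights))
```

L1 subtracts a fixed amount from every weight and clips at zero, so small weights vanish. L2 scales every weight by the same factor, so small weights only get smaller.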

5. Explain gradient descent and its variants

# Gradient Descent: update weights in direction that reduces loss
# weights = weights - learning_rate * gradient

# Batch Gradient Descent:
# - Uses ALL training data to compute gradient
# - Stable but slow for large datasets

# Stochastic Gradient Descent (SGD):
# - Uses ONE sample per update
# - Fast but noisy/unstable

# Mini-batch SGD (most common):
# - Uses a batch (32, 64, 128, 256 samples) per update
# - Balance of speed and stability

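# Adam (a widely used default optimizer):
# - Adaptive learning rates per parameter
# - Combines momentum + RMSprop
import torch.optim as optim
# `model` is the nn.Sequential network defined in question 4.
# weight_decay in Adam adds an L2 term to the gradient; optim.AdamW applies decoupled decay
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Learning rate schedulers
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)
# Halves the LR once the monitored value has stopped improving for `patience` epochs.
# Call scheduler.step(val_loss) once per epoch so it can track the validation loss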
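
The update rule above is short enough to write out by hand. The sketch below runs mini-batch SGD on a one-feature linear regression in NumPy. The data (true slope 3, intercept 2), the learning rate, and the batch size are all assumptions picked for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: y = 3x + 2 plus a little noise
X = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * X + 2.0 + rng.normal(0.0, 0.1, size=200)

w, b = 0.0, 0.0
learning_rate = 0.1
batch_size = 32

for epoch in range(200):
    order = rng.permutation(len(X))           # reshuffle every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = (w * X[idx] + b) - y[idx]     # prediction error on this batch
        grad_w = 2.0 * np.mean(error * X[idx])  # d(MSE)/dw
        grad_b = 2.0 * np.mean(error)           # d(MSE)/db
        w -= learning_rate * grad_w           # weights = weights - lr * gradient
        b -= learning_rate * grad_b

print(f"learned w={w:.2f}, b={b:.2f}")
```

Setting `batch_size = len(X)` turns this into batch gradient descent, and `batch_size = 1` gives single-sample SGD. The loop itself does not change.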

6. What is the difference between classification and regression?

Classification: predict a category (spam/not spam, disease/no disease)
Regression: predict a continuous value (house price, temperature, stock price)
Binary classification: 2 classes
Multi-class: 3+ classes
Multi-label: multiple labels per example
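
The difference between multi-class and multi-label shows up in how raw model outputs become predictions. The NumPy sketch below assumes a common setup that this guide does not prescribe: softmax with argmax for multi-class, and an independent sigmoid per label with a 0.5 threshold for multi-label. The logits are hypothetical.

```python
import numpy as np

logits = np.array([2.0, -1.0, 0.5])  # hypothetical raw scores for 3 classes/labels

# Multi-class: softmax makes the classes compete, so probabilities sum to 1
exp = np.exp(logits - logits.max())
softmax_probs = exp / exp.sum()
predicted_class = int(np.argmax(softmax_probs))

# Multi-label: one independent sigmoid per label, each thresholded on its own
sigmoid_probs = 1.0 / (1.0 + np.exp(-logits))
predicted_labels = (sigmoid_probs >= 0.5).astype(int)

print("multi-class prediction:", predicted_class)             # a single class index
print("multi-label prediction:", predicted_labels.tolist())   # any number of labels
```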

7. How do you handle imbalanced datasets?

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.utils.class_weight import compute_class_weight

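# X_train, y_train are assumed to come from an earlier train/test split.
# Resample the training split only, never the test set.

# Option 1: Oversampling (SMOTE generates synthetic minority samples)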
sm = SMOTE(random_state=42)
X_resampled, y_resampled = sm.fit_resample(X_train, y_train)

# Option 2: Undersampling (reduce majority class)
rus = RandomUnderSampler(random_state=42)
X_resampled, y_resampled = rus.fit_resample(X_train, y_train)

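# Option 3: Class weights (penalize mistakes on the minority class more)
import numpy as np
# classes must be a NumPy array; recent scikit-learn versions reject a plain list
classes = np.unique(y_train)
weights = compute_class_weight('balanced', classes=classes, y=y_train)
class_weight_dict = dict(zip(classes, weights))

from sklearn.ensemble import RandomForestClassifier
# class_weight='balanced' computes the same weights internally;
# class_weight=class_weight_dict would be the explicit equivalent
model = RandomForestClassifier(class_weight='balanced', random_state=42)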

# Option 4: Use appropriate metric
# Don't use accuracy for imbalanced data!
# Use: F1, AUC-ROC, Precision-Recall AUC
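
The 'balanced' heuristic is a one-line formula: `n_samples / (n_classes * count_per_class)`. The NumPy sketch below reproduces it on a hypothetical 90/10 label vector. `y_example` is made up for illustration.

```python
import numpy as np

y_example = np.array([0] * 90 + [1] * 10)  # hypothetical 90/10 class imbalance

counts = np.bincount(y_example)            # samples per class: [90, 10]
n_classes = len(counts)
balanced_weights = len(y_example) / (n_classes * counts)

print({cls: round(float(w), 3) for cls, w in enumerate(balanced_weights)})
# {0: 0.556, 1: 5.0}
```

With these weights a mistake on the minority class costs nine times as much as one on the majority class, which offsets the 9:1 imbalance.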

8. Explain the Transformer architecture

Transformers (introduced in "Attention Is All You Need", 2017) power LLMs, BERT, GPT, and most modern NLP and vision models:

Self-attention: each token attends to every other token, capturing long-range dependencies
Multi-head attention: multiple attention heads learn different relationships
Positional encoding: injects sequence position information
Feed-forward layers: transform the attended representations
Encoder-decoder: the encoder encodes the input; the decoder generates the output (translation, summarization)
Decoder-only: GPT style, generates text autoregressively
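
Self-attention itself is only a few lines. The NumPy sketch below implements single-head scaled dot-product attention with an optional causal mask, as used in decoder-only models. The random projection matrices stand in for learned weights, and the sizes (4 tokens, 8 dimensions) are arbitrary assumptions. Multi-head attention, positional encoding, and the feed-forward layers are left out.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, w_q, w_k, w_v, causal=False):
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # similarity of every token pair
    if causal:
        # Decoder-only style: a token may not attend to later positions
        future = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(future, -np.inf, scores)
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
tokens = rng.normal(size=(seq_len, d_model))              # hypothetical token embeddings
w_q, w_k, w_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, attn = self_attention(tokens, w_q, w_k, w_v)
causal_output, causal_attn = self_attention(tokens, w_q, w_k, w_v, causal=True)
print(attn.round(2))
print(causal_attn.round(2))   # upper triangle is zero
```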

To succeed in an ML interview: understand the bias-variance tradeoff intuitively (underfitting vs. overfitting), explain evaluation metrics clearly (when to use precision vs. recall vs. F1), understand regularization, and show familiarity with gradient descent variants. Production ML questions cover feature engineering, model monitoring, and MLOps patterns.
