Machine Learning Interview Questions 2026: Algorithms, Metrics, and Deep Learning


In 2026, machine learning interview questions cover statistics, model selection, evaluation metrics, common algorithms, deep learning concepts, and production ML. This guide covers the ML questions most frequently asked for data scientist and ML engineer roles.

Core ML Concepts

1. What is the bias-variance tradeoff?

Bias: error from wrong assumptions. The model is too simple and underfits the training data.
Variance: error from sensitivity to the training data. The model is too complex and overfits.
Tradeoff: reducing bias typically increases variance, and vice versa.

Issue	Symptom	Fix
High bias (underfitting)	Low train and test accuracy	More features, a more complex model, longer training
High variance (overfitting)	High train accuracy, low test accuracy	More data, regularization, dropout, a simpler model
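The two rows of the table can be reproduced with a toy experiment. The sketch below is illustrative only: it uses a hand-written k-nearest-neighbours classifier (not a model the article discusses) on synthetic 1-D data with 20% label noise, and the names `knn_predict`, `accuracy`, and `make_data` are hypothetical helpers. A small k memorizes the training set (high variance), while k equal to the training set size predicts one constant class (high bias).

```python
import random

def knn_predict(train, x, k):
    """Majority vote among the k training points nearest to x (1-D features)."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if 2 * votes > k else 0

def accuracy(train, data, k):
    """Fraction of points in `data` classified correctly using `train` as memory."""
    return sum(knn_predict(train, x, k) == y for x, y in data) / len(data)

def make_data(n, rng):
    # True rule: label is 1 when x > 0.5; 20% of labels are flipped as noise
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.2:
            y = 1 - y
        data.append((x, y))
    return data

rng = random.Random(0)
train, test = make_data(200, rng), make_data(200, rng)

# k=1: memorizes noise (overfit); k=len(train): constant prediction (underfit)
for k in (1, 15, len(train)):
    print(f"k={k:3d}  train={accuracy(train, train, k):.2f}  "
          f"test={accuracy(train, test, k):.2f}")
```

With k=1 every training point is its own nearest neighbour, so training accuracy is perfect even though a fifth of the labels are noise; the gap between the train and test columns is the symptom to look for.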

2. Explain precision, recall, and F1 score

# For binary classification:
# True Positive (TP): correctly predicted positive
# False Positive (FP): predicted positive, actually negative
# False Negative (FN): predicted negative, actually positive
# True Negative (TN): correctly predicted negative

# Precision = TP / (TP + FP)
# "Of all predicted positives, how many were actually positive?"
# Use when false positives are costly (spam filter: don't block legitimate email)

# Recall (Sensitivity) = TP / (TP + FN)
# "Of all actual positives, how many did we catch?"
# Use when false negatives are costly (cancer detection: don't miss cancer)

# F1 = 2 * (Precision * Recall) / (Precision + Recall)
# Harmonic mean — balanced metric when both matter

from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.80
print(f"Recall: {recall_score(y_true, y_pred):.2f}")        # 0.80
print(f"F1: {f1_score(y_true, y_pred):.2f}")                # 0.80
print(classification_report(y_true, y_pred))
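Which metric to favour usually comes down to where the decision threshold sits. The sketch below uses made-up scores and labels (purely illustrative, not output from a real model) and a hypothetical helper `precision_recall_at` to show how raising the threshold trades recall for precision.

```python
def precision_recall_at(threshold, scores, labels):
    """Precision and recall when score >= threshold counts as a positive prediction."""
    preds = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(preds, labels))
    fp = sum(p == 1 and y == 0 for p, y in zip(preds, labels))
    fn = sum(p == 0 and y == 1 for p, y in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical model scores, sorted from most to least confident
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]

for t in (0.75, 0.55, 0.25):
    p, r = precision_recall_at(t, scores, labels)
    print(f"threshold={t:.2f}  precision={p:.3f}  recall={r:.3f}")
```

A strict threshold suits the spam-filter case (few false positives); a lenient one suits the screening case (few false negatives).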

3. What is cross-validation and why is it needed?

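from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Problem: a single train/test split can be misleading
# Solution: cross-validation; every sample is used for both training and testing

# Synthetic data so the snippet runs standalone; substitute your own X, y
X, y = make_classification(n_samples=500, n_features=20, random_state=42)

model = RandomForestClassifier(random_state=42)

# 5-fold CV (k=5 is the most common choice). With an integer cv and a
# classifier, scikit-learn uses unshuffled StratifiedKFold under the hood
cv_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')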
print(f"CV scores: {cv_scores}")
print(f"Mean: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")

# Stratified K-Fold — maintain class balance in each fold (for imbalanced data)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
stratified_scores = cross_val_score(model, X, y, cv=skf, scoring='f1')

# Key insight: cross-validation gives a DISTRIBUTION of scores, not a single number
# Use mean ± std to understand model reliability
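To make the mechanics concrete, here is a minimal sketch of how unshuffled k-fold splitting partitions sample indices. `k_fold_indices` is a hypothetical helper written for illustration, not a scikit-learn function; in practice `KFold` or `StratifiedKFold` would be used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    # The first n_samples % k folds get one extra sample
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(10, 3))
for i, (train_idx, test_idx) in enumerate(folds):
    print(f"fold {i}: train={train_idx} test={test_idx}")
```

Each sample lands in exactly one test fold, which is why the k scores together cover the whole dataset.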

4. Explain regularization (L1, L2, dropout)

from sklearn.linear_model import Lasso, Ridge, ElasticNet

# L1 (Lasso): penalty = lambda * sum(|weights|)
# Effect: sparse weights — some become exactly 0 (feature selection!)
lasso = Lasso(alpha=0.1)  # alpha = lambda

# L2 (Ridge): penalty = lambda * sum(weights^2)
# Effect: all weights shrink toward 0 but don't reach 0
ridge = Ridge(alpha=1.0)

# ElasticNet: combination of L1 + L2
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)

# For deep learning: Dropout
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Dropout(p=0.3),    # randomly zero 30% of neurons during training
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 1),
)
# Dropout is active only in training mode; call model.eval() before
# inference to disable it (and model.train() to re-enable it)
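Why L1 produces exact zeros while L2 only shrinks can be seen in the one-dimensional case. The sketch below is a simplified illustration, assuming a single weight `w` and a quadratic loss centred on it; `l1_shrink` and `l2_shrink` are hypothetical helper names giving the closed-form minimizers of the penalized objective.

```python
def l1_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + lam*abs(v): soft-thresholding."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0  # small weights are set exactly to zero

def l2_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + lam*v**2: proportional shrinkage."""
    return w / (1.0 + 2.0 * lam)

weights = [1.5, -0.8, 0.05, -0.02]  # illustrative unregularized weights
lam = 0.1
print("L1:", [round(l1_shrink(w, lam), 3) for w in weights])
print("L2:", [round(l2_shrink(w, lam), 3) for w in weights])
```

Any weight whose magnitude is below `lam` is zeroed by the L1 rule, which is the feature-selection effect; the L2 rule scales every weight by the same factor, so none reach zero.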

5. Explain gradient descent and its variants

# Gradient Descent: update weights in direction that reduces loss
# weights = weights - learning_rate * gradient

# Batch Gradient Descent:
# - Uses ALL training data to compute gradient
# - Stable but slow for large datasets

# Stochastic Gradient Descent (SGD):
# - Uses ONE sample per update
# - Fast but noisy/unstable

# Mini-batch SGD (most common):
# - Uses a batch (32, 64, 128, 256 samples) per update
# - Balance of speed and stability

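# Adam (a common default optimizer):
# - Adaptive learning rates per parameter
# - Combines momentum + RMSprop
import torch.optim as optim
# `model` is the nn.Sequential network from question 4.
# Note: weight_decay in Adam is an L2 penalty folded into the gradient;
# optim.AdamW applies decoupled weight decay instead
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Learning rate schedulers
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)
# Halves the LR when the monitored metric has stopped improving for
# `patience` epochs; call scheduler.step(val_loss) after each validation pass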
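The update rule can be written out by hand. The following is a minimal sketch of mini-batch SGD fitting a line, assuming noise-free synthetic data from y = 2x + 1 and arbitrarily chosen hyperparameters (learning rate 0.1, batch size 16, 200 epochs); it is not tuned and only illustrates the loop structure.

```python
import random

rng = random.Random(0)
# Noise-free synthetic data from y = 2x + 1 (values chosen for illustration)
xs = [rng.random() for _ in range(128)]
data = [(x, 2.0 * x + 1.0) for x in xs]

w, b = 0.0, 0.0
learning_rate, batch_size = 0.1, 16

for epoch in range(200):
    rng.shuffle(data)  # new random batches each epoch
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Gradient of mean squared error over the mini-batch
        grad_w = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
        grad_b = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
        # weights = weights - learning_rate * gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(f"w={w:.4f}, b={b:.4f}")
```

Setting `batch_size = len(data)` turns this into batch gradient descent, and `batch_size = 1` into per-sample SGD, so the three variants differ only in how much data feeds each update.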

6. What is the difference between classification and regression?

Classification: predict a category (spam/not spam, disease/no disease)
Regression: predict a continuous value (house price, temperature, stock price)
Binary classification: 2 classes
Multiclass: 3+ classes
Multi-label: multiple labels per example
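One way to see the difference between these task types is in how raw model outputs are turned into predictions. The sketch below assumes a hypothetical list of raw outputs (`logits`) and shows the usual conventions: identity for regression, sigmoid for binary, softmax for multiclass, and an independent sigmoid per label for multi-label.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 0.5, -1.0]  # hypothetical raw model outputs

# Regression: the raw output is the prediction
regression_pred = logits[0]

# Binary classification: one sigmoid probability, thresholded at 0.5
binary_pred = int(sigmoid(logits[0]) >= 0.5)

# Multiclass: softmax over classes, exactly one class is chosen
probs = softmax(logits)
multiclass_pred = probs.index(max(probs))

# Multi-label: independent sigmoid per label, any number can be active
multilabel_pred = [int(sigmoid(z) >= 0.5) for z in logits]

print(regression_pred, binary_pred, multiclass_pred, multilabel_pred)
```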

7. How do you handle imbalanced datasets?

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.utils.class_weight import compute_class_weight

# Option 1: Oversampling (SMOTE — synthetic minority)
sm = SMOTE(random_state=42)
X_resampled, y_resampled = sm.fit_resample(X_train, y_train)

# Option 2: Undersampling (reduce majority class)
rus = RandomUnderSampler(random_state=42)
X_resampled, y_resampled = rus.fit_resample(X_train, y_train)

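# Option 3: Class weights (penalize mistakes on the minority class more)
import numpy as np
# `classes` must be a NumPy array in recent scikit-learn releases
weights = compute_class_weight('balanced', classes=np.array([0, 1]), y=y_train)
class_weight_dict = {0: weights[0], 1: weights[1]}

from sklearn.ensemble import RandomForestClassifier
# class_weight='balanced' applies the same weighting internally;
# pass class_weight=class_weight_dict instead to set the weights explicitly
model = RandomForestClassifier(class_weight='balanced', random_state=42)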

# Option 4: Use appropriate metric
# Don't use accuracy for imbalanced data!
# Use: F1, AUC-ROC, Precision-Recall AUC
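The warning about accuracy is easy to demonstrate. This sketch uses a synthetic 95/5 class split and a degenerate "model" that always predicts the majority class; it also computes the weights that the 'balanced' heuristic, n_samples / (n_classes * class_count), would assign.

```python
y_true = [0] * 95 + [1] * 5   # 95 negatives, 5 positives (synthetic)
y_pred = [0] * 100            # a "model" that always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
recall = tp / (tp + fn)

# 'balanced' weighting: n_samples / (n_classes * class_count)
n_samples, n_classes = len(y_true), 2
class_weights = {c: n_samples / (n_classes * y_true.count(c)) for c in (0, 1)}

print(f"accuracy={accuracy:.2f}, recall={recall:.2f}")
print(f"class weights: {class_weights}")
```

The accuracy looks strong while recall shows that not a single positive was found, which is why F1 or precision-recall AUC is the safer headline metric here.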

8. Explain the Transformer architecture

The Transformer (introduced in "Attention Is All You Need", 2017) powers LLMs, BERT, GPT, and most modern NLP/vision models:

Self-attention: each token attends to every other token, capturing long-range dependencies
Multi-head attention: multiple attention heads learn different relationships
Positional encoding: injects sequence position information
Feed-forward layers: transform the attended representations
Encoder-decoder: the encoder encodes the input; the decoder generates the output (translation, summarization)
Decoder-only: GPT-style, generates text autoregressively
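The self-attention step in the list above can be sketched in a few lines. This is a deliberately simplified single-head version in plain Python: queries, keys, and values are the input vectors themselves (a real layer applies learned projections first), and `self_attention` is a hypothetical helper name. The `causal` flag mimics the masking a decoder-only model uses so that a token cannot attend to later positions.

```python
import math

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, causal=False):
    """Single-head scaled dot-product self-attention over a list of token vectors.

    Simplification: queries, keys, and values are the inputs themselves;
    a real layer applies learned projections W_q, W_k, W_v first.
    """
    d = len(x[0])
    weights = []
    for i, q in enumerate(x):
        scores = []
        for j, k in enumerate(x):
            if causal and j > i:
                scores.append(float("-inf"))  # hide future tokens
            else:
                scores.append(sum(a * b for a, b in zip(q, k)) / math.sqrt(d))
        weights.append(softmax(scores))
    # Each output is a weighted average of the value vectors
    output = [[sum(w * v[c] for w, v in zip(row, x)) for c in range(d)]
              for row in weights]
    return output, weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy 2-d token embeddings
out, attn = self_attention(tokens)
_, causal_attn = self_attention(tokens, causal=True)

for row in attn:
    print([round(w, 3) for w in row])
```

Each row of `attn` is a probability distribution over all tokens; in `causal_attn` the entries above the diagonal are zero, so the first token can only attend to itself.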

Succeeding in ML interviews: know the bias-variance tradeoff intuitively (underfitting vs. overfitting), explain evaluation metrics clearly (when to use precision vs. recall vs. F1), understand regularization, and demonstrate knowledge of the gradient descent variants. Production ML questions cover feature engineering, model monitoring, and MLOps patterns.
