Machine Learning Interview Questions 2026: Algorithms, Metrics, and Deep Learning


Machine learning interview questions in 2026 cover statistics, model selection, evaluation metrics, common algorithms, deep learning concepts, and production ML. This guide covers the ML questions most frequently asked for data scientist and ML engineer roles.

Core ML Concepts

1. What is the bias-variance tradeoff?

Bias: error from wrong assumptions. The model is too simple and underfits the training data.
Variance: error from sensitivity to the training data. The model is too complex and overfits.
Tradeoff: reducing bias typically increases variance, and vice versa.

Issue	Symptom	Fix
High bias (underfitting)	Low train and test accuracy	More features, a more complex model, longer training
High variance (overfitting)	High train accuracy, low test accuracy	More data, regularization, dropout, a simpler model
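
The two rows of the table can be reproduced with a small, dependency-free sketch. The setup is an assumption chosen purely for illustration: a k-nearest-neighbour regressor on noisy samples of a sine wave, where k controls model complexity. With k equal to the training-set size the model predicts the global mean (very simple), and with k=1 it memorizes every training point (very flexible).

```python
import math
import random

def make_data(n, seed, offset=0.0):
    """Noisy samples of sin(2*pi*x) on an evenly spaced grid in [0, 1)."""
    rng = random.Random(seed)
    xs = [(i + offset) / n for i in range(n)]
    ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    order = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return sum(train_y[i] for i in order[:k]) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

train_x, train_y = make_data(40, seed=0)
test_x, test_y = make_data(40, seed=1, offset=0.5)  # held-out points between the training ones

results = {}
for k in (40, 5, 1):  # from simplest (global mean) to most flexible (memorization)
    results[k] = (mse(train_x, train_y, train_x, train_y, k),
                  mse(train_x, train_y, test_x, test_y, k))
    print(f"k={k:2d}  train MSE={results[k][0]:.3f}  test MSE={results[k][1]:.3f}")
```

In this sketch, k=40 should show similar, high error on both sets (the underfitting row), while k=1 reaches zero training error but a clearly nonzero test error (the overfitting row). An intermediate k sits between the two.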

2. Explain precision, recall, and F1 score

# For binary classification:
# True Positive (TP): correctly predicted positive
# False Positive (FP): predicted positive, actually negative
# False Negative (FN): predicted negative, actually positive
# True Negative (TN): correctly predicted negative

# Precision = TP / (TP + FP)
# "Of all predicted positives, how many were actually positive?"
# Use when false positives are costly (spam filter: don't block legitimate email)

# Recall (Sensitivity) = TP / (TP + FN)
# "Of all actual positives, how many did we catch?"
# Use when false negatives are costly (cancer detection: don't miss cancer)

# F1 = 2 * (Precision * Recall) / (Precision + Recall)
# Harmonic mean — balanced metric when both matter

from sklearn.metrics import precision_score, recall_score, f1_score, classification_report

y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print(f"Precision: {precision_score(y_true, y_pred):.2f}")  # 0.80
print(f"Recall: {recall_score(y_true, y_pred):.2f}")        # 0.80
print(f"F1: {f1_score(y_true, y_pred):.2f}")                # 0.80
print(classification_report(y_true, y_pred))
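
To show where those numbers come from, the same metrics can be computed by hand from the four confusion-matrix counts. This is a minimal sketch without scikit-learn; the helper names `confusion_counts` and `precision_recall_f1` are illustrative, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1; each falls back to 0.0 if its denominator is zero."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Algebraically the same as the harmonic mean of precision and recall
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1

y_true = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

print(confusion_counts(y_true, y_pred))     # (4, 1, 1, 4)
print(precision_recall_f1(y_true, y_pred))  # (0.8, 0.8, 0.8)
```

With 4 true positives, 1 false positive, and 1 false negative, precision and recall are both 4/5, which is why all three scores come out at 0.80.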

3. What is cross-validation and why is it needed?

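from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier

# Problem: a single train/test split can be misleading
# Solution: cross-validation, where every sample is used for both training and testing

# Stand-in data so the snippet runs end to end; replace with your own X, y
X, y = make_classification(n_samples=500, n_features=20, weights=[0.8, 0.2], random_state=42)

model = RandomForestClassifier(random_state=42)

# K-Fold (k=5 is most common). Pass a KFold object explicitly: for classifiers,
# an integer cv makes scikit-learn use StratifiedKFold instead
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(model, X, y, cv=kf, scoring='accuracy')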
print(f"CV scores: {cv_scores}")
print(f"Mean: {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")

# Stratified K-Fold — maintain class balance in each fold (for imbalanced data)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
stratified_scores = cross_val_score(model, X, y, cv=skf, scoring='f1')

# Key insight: cross-validation gives a DISTRIBUTION of scores, not a single number
# Use mean ± std to understand model reliability
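
What the splitter does under the hood can be sketched in a few lines of plain Python. The function `k_fold_indices` below is a hypothetical helper written for this illustration, not scikit-learn's implementation; it shuffles the indices once and then lets each fold serve as the test set exactly once.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; every index is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal-sized folds
    for i, test_idx in enumerate(folds):
        # Training set = all folds except the one currently held out
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

splits = list(k_fold_indices(n_samples=10, k=5))
for fold, (train_idx, test_idx) in enumerate(splits):
    print(f"fold {fold}: train on {len(train_idx)} samples, test on {sorted(test_idx)}")
```

Each of the five models is trained on 8 samples and scored on the 2 it never saw, so the five scores together cover the whole dataset.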

4. Explain regularization (L1, L2, dropout)

from sklearn.linear_model import Lasso, Ridge, ElasticNet

# L1 (Lasso): penalty = lambda * sum(|w_i|)
# Effect: sparse weights, some become exactly 0 (built-in feature selection)
lasso = Lasso(alpha=0.1)  # alpha = lambda

# L2 (Ridge): penalty = lambda * sum(w_i^2)
# Effect: all weights shrink toward 0 but rarely reach exactly 0
ridge = Ridge(alpha=1.0)

# ElasticNet: combination of L1 + L2
elastic = ElasticNet(alpha=0.1, l1_ratio=0.5)

# For deep learning: Dropout
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(100, 256),
    nn.ReLU(),
    nn.Dropout(p=0.3),    # randomly zero 30% of neurons during training
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 1),
)
# Dropout is only disabled in evaluation mode: call model.eval() before inference
# (and model.train() to switch it back on for training)
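
Why L1 produces exact zeros while L2 only shrinks, and what dropout does to activations, can be seen in a one-dimensional sketch. The assumptions here are deliberate simplifications: each penalty is applied to a single unregularized weight `w` (the closed-form minimizers of a quadratic loss plus the penalty), and `dropout` is a hand-written stand-in for the inverted-dropout scheme that `nn.Dropout` uses.

```python
import random

def l1_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + lam*|v|: soft-thresholding, exact zeros for small w."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

def l2_shrink(w, lam):
    """Minimizer of 0.5*(v - w)**2 + 0.5*lam*v**2: rescales w but never zeroes it."""
    return w / (1.0 + lam)

def dropout(values, p, training, rng):
    """Inverted dropout: zero each value with probability p, rescale survivors by 1/(1-p)."""
    if not training:
        return list(values)  # evaluation mode: identity
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in values]

weights = [2.0, 0.3, -0.05, -1.5]
print([l1_shrink(w, 0.5) for w in weights])  # [1.5, 0.0, 0.0, -1.0]
print([l2_shrink(w, 0.5) for w in weights])  # every weight is smaller, none is zero

rng = random.Random(0)
activations = [1.0] * 8
print(dropout(activations, p=0.5, training=True, rng=rng))   # each entry is 0.0 or 2.0
print(dropout(activations, p=0.5, training=False, rng=rng))  # unchanged
```

The L1 rule subtracts a fixed amount and clips at zero, so small weights vanish; the L2 rule divides by a constant, so weights only get smaller. The rescaling in training mode keeps the expected activation the same as in evaluation mode.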

5. Explain gradient descent and its variants

# Gradient Descent: update weights in direction that reduces loss
# weights = weights - learning_rate * gradient

# Batch Gradient Descent:
# - Uses ALL training data to compute gradient
# - Stable but slow for large datasets

# Stochastic Gradient Descent (SGD):
# - Uses ONE sample per update
# - Fast but noisy/unstable

# Mini-batch SGD (most common):
# - Uses a batch (32, 64, 128, 256 samples) per update
# - Balance of speed and stability

# Adam (a widely used default optimizer):
# - Adaptive learning rates per parameter
# - Combines momentum + RMSprop
import torch.optim as optim
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Learning rate schedulers
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, factor=0.5)
# Reduces LR when validation loss stops improving; call scheduler.step(val_loss) once per epoch
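
The three basic variants differ only in how many samples feed each update, which a plain-Python sketch makes explicit. Everything below is an illustrative assumption: a toy line-fitting problem with noise-free targets, a hypothetical `fit_line` helper, and a fixed learning rate of 0.1.

```python
import random

def fit_line(xs, ys, batch_size, lr=0.1, epochs=1000, seed=0):
    """Fit y = w*x + b by minimizing mean squared error with (mini-batch) gradient descent."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Gradient of the batch MSE with respect to w and b
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # weights = weights - learning_rate * gradient
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

xs = [i / 100 for i in range(100)]
ys = [3.0 * x + 2.0 for x in xs]  # noise-free targets, so every variant can recover w=3, b=2

fits = {}
for name, batch_size in [("batch", 100), ("mini-batch", 16), ("stochastic", 1)]:
    fits[name] = fit_line(xs, ys, batch_size)
    print(f"{name:>10}: w={fits[name][0]:.3f}, b={fits[name][1]:.3f}")
```

With `batch_size=100` there is one update per epoch, with `batch_size=1` there are 100. On real, noisy data the single-sample updates would jitter around the optimum, which is the instability the comments above refer to.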

6. What is the difference between classification and regression?

Classification: predict a category (spam/not spam, disease/no disease)
Regression: predict a continuous value (house price, temperature, stock price)
Binary classification: 2 classes
Multiclass: 3+ classes, exactly one per example
Multi-label: multiple labels per example
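
The difference between these task types shows up in how a model's raw outputs are turned into predictions. The sketch below uses made-up scores and a made-up linear model purely for illustration: softmax plus argmax for multiclass, an independent sigmoid per label for multi-label, and no post-processing at all for regression.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1 (multiclass: pick one class)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(z):
    """Independent probability for a single label (binary or multi-label)."""
    return 1.0 / (1.0 + math.exp(-z))

logits = [2.0, 0.5, -1.0]  # hypothetical scores for three classes/labels

# Multiclass: exactly one class wins
probs = softmax(logits)
multiclass_prediction = probs.index(max(probs))

# Multi-label: each label is decided on its own, so several can be active
multilabel_prediction = [sigmoid(z) >= 0.5 for z in logits]

# Regression: the raw continuous output is the prediction, no thresholding
regression_prediction = 0.7 * 120.0 + 15.0  # hypothetical linear model: weight*feature + bias

print(multiclass_prediction)  # 0
print(multilabel_prediction)  # [True, True, False]
print(regression_prediction)
```

The same three scores yield one winning class in the multiclass reading and two active labels in the multi-label reading, which is the distinction interviewers usually probe.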

7. How do you handle imbalanced datasets?

from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.utils.class_weight import compute_class_weight

# Option 1: Oversampling (SMOTE generates synthetic minority samples)
# Resample the training split only, never the validation or test data
sm = SMOTE(random_state=42)
X_resampled, y_resampled = sm.fit_resample(X_train, y_train)

# Option 2: Undersampling (reduce majority class)
rus = RandomUnderSampler(random_state=42)
X_resampled, y_resampled = rus.fit_resample(X_train, y_train)

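# Option 3: Class weights (penalize mistakes on the minority class more heavily)
import numpy as np
classes = np.unique(y_train)  # compute_class_weight expects an ndarray of class labels
weights = compute_class_weight('balanced', classes=classes, y=y_train)
class_weight_dict = dict(zip(classes, weights))

from sklearn.ensemble import RandomForestClassifier
# class_weight='balanced' applies the same heuristic; class_weight=class_weight_dict also works
model = RandomForestClassifier(class_weight='balanced', random_state=42)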

# Option 4: Use appropriate metric
# Don't use accuracy for imbalanced data!
# Use: F1, AUC-ROC, Precision-Recall AUC
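
Two of the points above can be checked with simple arithmetic: what the 'balanced' weights actually are, and why accuracy misleads. The sketch assumes a hypothetical dataset of 95 negatives and 5 positives and a deliberately useless model that always predicts the majority class; `balanced_class_weights` is an illustrative helper implementing the `n_samples / (n_classes * class_count)` heuristic.

```python
from collections import Counter

def balanced_class_weights(labels):
    """The 'balanced' heuristic: n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}

# Hypothetical dataset: 95 negatives, 5 positives
y_true = [0] * 95 + [1] * 5
weights = balanced_class_weights(y_true)
print(weights[1])  # 10.0

# A useless model that always predicts the majority class
y_pred = [0] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = true_positives / sum(y_true)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

The model scores 95% accuracy while finding none of the positives, which is why recall, F1, or precision-recall AUC are the safer metrics here. The minority class weight of 10.0 means each positive counts roughly 19 times as much as a negative in the loss.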

8. Explain the Transformer architecture

Transformers (introduced in "Attention Is All You Need", 2017) power LLMs, BERT, GPT, and most modern NLP/vision models:

Self-attention: each token attends to every other token, capturing long-range dependencies
Multi-head attention: multiple attention heads learn different relationships
Positional encoding: injects sequence position information
Feed-forward layers: transform the attended representations
Encoder-decoder: the encoder encodes the input; the decoder generates the output (translation, summarization)
Decoder-only: GPT-style, generates text autoregressively
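
The self-attention step, and the causal masking that decoder-only models add, can be sketched in plain Python. This is a simplified illustration under stated assumptions: a single head, no learned projections (queries, keys, and values are the token vectors themselves), and three made-up 2-dimensional embeddings.

```python
import math

def softmax(scores):
    """Normalize scores into weights that sum to 1; -inf scores get weight 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x, causal=False):
    """Scaled dot-product self-attention over a list of token vectors.

    Simplification: queries, keys, and values are the inputs themselves
    (a real Transformer layer applies learned projections first).
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    outputs, all_weights = [], []
    for i, query in enumerate(x):
        scores = []
        for j, key in enumerate(x):
            if causal and j > i:
                scores.append(float("-inf"))  # decoder-only: no peeking at future tokens
            else:
                scores.append(sum(q * k for q, k in zip(query, key)) / math.sqrt(d))
        weights = softmax(scores)
        all_weights.append(weights)
        # Each output is a weighted average of all value vectors
        outputs.append([sum(w * v[c] for w, v in zip(weights, x)) for c in range(d)])
    return outputs, all_weights

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three hypothetical 2-d token embeddings
out, attn = self_attention(tokens)
causal_out, causal_attn = self_attention(tokens, causal=True)

print([round(sum(row), 6) for row in attn])  # [1.0, 1.0, 1.0]
print(causal_attn[0])                        # [1.0, 0.0, 0.0]
```

Without the mask every token mixes information from the whole sequence; with `causal=True` the first token can only attend to itself, which is what allows GPT-style models to generate text one token at a time.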

Succeeding in ML interviews: understand the bias-variance tradeoff intuitively (underfitting vs. overfitting), explain evaluation metrics clearly (when to use precision vs. recall vs. F1), understand regularization, and demonstrate knowledge of gradient descent variants. Production ML questions cover feature engineering, model monitoring, and MLOps patterns.
