Training Workflow and Sample Selection
The goal is to build a useful classifier quickly, not to create a perfect dataset before the first training run.
In practice, the fastest path is iterative: sample selection, training, result review, targeted corrections, and retraining.
Be consistent before you scale up
Expressed cells can appear at many intensity levels.
It is important to decide early what should count as ON and what should count as OFF.
For example, you may decide that even a faint signal is enough for ON, or you may decide that only strongly expressed cells should count as ON.
The important part is not which rule you choose, but that you apply the same rule consistently throughout labeling.
Inconsistent labels make the classifier unreliable.
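One way to keep the rule consistent is to write it down once and apply it mechanically. The sketch below assumes an intensity-based rule; the threshold value and function names are hypothetical, and your actual criterion may be visual rather than numeric:

```python
# Hypothetical labeling rule: a single function applied to every candidate
# cell, so the ON/OFF decision is the same throughout labeling.
ON_THRESHOLD = 0.3  # assumed intensity cutoff; decide once, then do not change

def label_cell(mean_intensity: float) -> str:
    """Return "ON" or "OFF" using the one agreed-upon rule."""
    return "ON" if mean_intensity >= ON_THRESHOLD else "OFF"

cells = [0.05, 0.31, 0.80, 0.29]
labels = [label_cell(c) for c in cells]
```

The point is not the threshold itself but that every labeled cell passes through the same decision.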
Start with a small set of slices
It is usually best to work with only 3 to 5 slices, unless there is additional visual variety you need to cover.
Pick those slices early and stay with them for the initial training workflow.
Ideally, they should cover multiple animals.
If you spread samples across too many slices too early, later correction becomes harder.
Start with an initial sample set
A practical starting point is around 200 samples total, with a first target of roughly 100 ON and 100 OFF.
Prefer clear examples first.
Focus on interesting regions and add samples from regions that look different to build variety into the training set.
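Keeping a running tally makes it easy to see how far you are from the initial 100/100 target. A minimal sketch, assuming labels are collected as a simple list (the `TARGET` values come from the milestone above):

```python
from collections import Counter

TARGET = {"ON": 100, "OFF": 100}  # first milestone: ~200 balanced samples

def remaining(labels):
    """Return how many samples of each class are still needed."""
    counts = Counter(labels)
    return {cls: max(0, goal - counts.get(cls, 0)) for cls, goal in TARGET.items()}
```

For example, after 40 ON and 90 OFF labels, `remaining` reports 60 ON and 10 OFF still to collect.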
Use region-based sampling as the main workflow
Rectangle selection should be the main workflow for fast sample collection.
Select an interesting region and label the sampled cells as one class, for example OFF.
Then correct the sampled cells that belong to the other class, for example by selecting the cells that should be ON.
This works well because the random sampling gives a good representation of the real data while keeping the workflow fast.
Usually you should not try to label every visible cell.
In practice, batches of around 20 to 50 cells work well.
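The label-everything-then-correct pattern can be expressed as a tiny helper. This is only a sketch of the bookkeeping, with hypothetical names; in an interactive tool the same thing happens through region selection and clicks:

```python
def label_region(cell_ids, on_ids):
    """Label every cell sampled from a region OFF by default,
    then flip the explicitly corrected ones to ON."""
    labels = {cid: "OFF" for cid in cell_ids}
    for cid in on_ids:
        labels[cid] = "ON"
    return labels

# A batch of 4 sampled cells, one of which is corrected to ON.
batch = label_region([1, 2, 3, 4], on_ids=[2])
```

Correcting the minority class is fast because, in a typical region, most sampled cells already carry the default label.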
Train the first model early
Once you have around 200 samples, train the first model.
The purpose of the first model is mainly to show what the classifier has already learned and where it still fails.
This gives you much better guidance for what samples to add next.
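The first model does not need to be sophisticated to be informative. As a stand-in for whatever classifier your tool trains, here is a stdlib-only sketch that fits a one-feature threshold from labeled intensities (all names and the midpoint rule are illustrative assumptions):

```python
from statistics import mean

def fit_threshold(samples):
    """samples: list of (intensity, label) pairs.
    Fit a trivial 1-feature classifier: the midpoint between class means."""
    on = [x for x, y in samples if y == "ON"]
    off = [x for x, y in samples if y == "OFF"]
    return (mean(on) + mean(off)) / 2

def predict(threshold, intensity):
    return "ON" if intensity >= threshold else "OFF"

t = fit_threshold([(0.1, "OFF"), (0.2, "OFF"), (0.8, "ON"), (0.9, "ON")])
```

Even a model this crude, applied to real slices, immediately shows which cells fall near the decision boundary and where more samples are needed.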
Review the classifier output on real slices
Inspect a few real slices qualitatively after training.
Look for systematic mistakes, not only isolated errors.
Check whether the model misses specific kinds of ON cells, specific OFF cells, weak signals, strong signals, background patterns, or borderline cases.
The model is good enough when its output looks qualitatively correct on the inspected slices.
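While the review itself is qualitative, tallying the error types you notice helps reveal systematic mistakes. A minimal sketch, assuming you record (true, predicted) pairs for the cells you inspect:

```python
from collections import Counter

def confusion(pairs):
    """pairs: (true_label, predicted_label) tuples for each reviewed cell."""
    return Counter(pairs)

review = [("ON", "ON"), ("ON", "OFF"), ("OFF", "OFF"), ("OFF", "OFF"), ("ON", "OFF")]
c = confusion(review)
missed_on = c[("ON", "OFF")]   # positives the model misses
false_on = c[("OFF", "ON")]    # cells wrongly called ON
```

A skewed count in one cell of this table (e.g. many missed weak-signal ON cells) points at a systematic gap rather than an isolated error.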
Use error-mining to improve the model
The most effective next step is usually to add training samples from the model's mistakes.
Correct misclassified cells and use them as new training samples.
Focus on cases the classifier currently gets wrong rather than adding more easy examples it already handles well.
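Error-mining reduces to collecting exactly the reviewed cells where the prediction disagrees with your label. A sketch with hypothetical record structure:

```python
def mine_errors(reviewed):
    """reviewed: list of (cell_id, true_label, predicted_label).
    Return (cell_id, true_label) for every misclassified cell,
    ready to be added to the training set."""
    return [(cid, true) for cid, true, pred in reviewed if true != pred]

new_samples = mine_errors([
    (1, "ON", "ON"),    # already correct: skip
    (2, "ON", "OFF"),   # missed positive: add as ON
    (3, "OFF", "ON"),   # false positive: add as OFF
])
```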
Steer the ON/OFF balance deliberately
In practice, expressed cells are often imbalanced, with many more OFF than ON cells, although this depends strongly on the marker.
Steer the classifier's behavior by controlling how many ON and OFF cells are present in the training set overall.
If the classifier misses too many positives, add more ON examples of those missed cases.
If it calls too many cells ON, add more OFF examples from those confusing regions.
The training set should reflect the behavior you want from the classifier.
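The steering rule above can be made explicit: compare the two error directions from your review and top up the corresponding class. A deliberately simple sketch (the function name and tie-breaking are assumptions):

```python
def class_to_add(missed_on: int, false_on: int) -> str:
    """Decide which class to add more samples of:
    missing positives  -> add ON examples of the missed cases;
    over-calling ON    -> add OFF examples from the confusing regions."""
    return "ON" if missed_on >= false_on else "OFF"
```

For example, a review with 5 missed ON cells and 1 false ON suggests adding ON samples next.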
Repeat the loop
- Select samples
- Train the model
- Review the results
- Add targeted new samples
- Train again
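The loop above can be sketched as a small driver. All four callables are assumed project-specific hooks (the names, the score scale, and the stopping criterion are illustrative, not part of any real API):

```python
def training_loop(select, train, review, correct, max_rounds=5, good_enough=0.95):
    """Sketch of the iterate-train-review cycle.

    select()        -> initial sample list
    train(samples)  -> trained model
    review(model)   -> (qualitative score in [0, 1], list of errors)
    correct(errors) -> new targeted samples mined from the errors
    """
    samples = select()
    model = None
    for _ in range(max_rounds):
        model = train(samples)
        score, errors = review(model)
        if score >= good_enough:
            break
        samples = samples + correct(errors)
    return model, samples

# Toy hooks: the "model" is just the sample count, and the score
# rises as targeted samples are added, so the loop converges.
model, samples = training_loop(
    select=lambda: ["s1", "s2"],
    train=lambda s: len(s),
    review=lambda m: (m / 4, ["e"]),
    correct=lambda errs: errs,
)
```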
Focus on quality over quantity
Usually, once you reach around 2000 training samples, classification does not improve much from simply adding more.
At that point, it is better to focus on sample quality, consistency, and difficult cases rather than quantity.
