The output is a data frame of counts that have been back-transformed from the \(\log_{2}\) scale and kept as integers.
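As a rough illustration of what such a back-transformation can look like, the sketch below assumes the values were stored as \(\log_{2}(\text{count} + 1)\). That offset is an assumption made for this example only; the text above does not state which offset the package uses. The data and object names are hypothetical.

```r
# Hypothetical example data: two genes, three samples, stored as log2(count + 1).
# The "+ 1" offset is an assumption of this sketch, not something the package documents here.
log_expr <- data.frame(
  S1 = log2(c(10, 0) + 1),
  S2 = log2(c(250, 3) + 1),
  S3 = log2(c(7, 1023) + 1),
  row.names = c("GeneA", "GeneB")
)

# Undo the log2 transform, then round so that floating-point error
# does not leave values such as 9.999999 in place of 10.
counts <- round(2^log_expr - 1)

# Store the columns as integers, which count-based methods such as ComBat-seq expect.
counts[] <- lapply(counts, as.integer)

str(counts)
```

If the data were transformed with a different offset, the subtraction in the back-transformation step would need to change accordingly.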
Users do not need to set parameters such as auto_mode and default_input. While the function runs, it prints interactive prompts. At these prompts, users can keep the sample types that are well represented and filter out those with only a few samples, so that noise from a handful of data points does not dominate the data or distort the batch correction.
Please note: the examples in this chapter only illustrate the workflow using public data. In real applications, batches and groupings may be defined differently. The batch correction performed by the functions in this chapter rests on a broad assumption that different tumor types belong to different batches. Users should replace this assumption with the batch structure of their own data.
If the database lacks the target sample type, users can input “skip” in the default_input parameter of the Combat_Normal function.
Tip
The ComBat_seq function called internally requires its input data as a matrix, not a data frame. Users who modify the function settings should keep this requirement in mind.
The function converts the result back to a dataframe for the output, making it convenient for users’ subsequent data analysis.
The batch and group parameters inside the function are vectors. Users can set these vectors directly, but each vector must follow the same order as the samples in the count data.
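The points in this tip can be sketched as follows. This is a minimal illustration on simulated data, not the package's internal code: the objects count_df and sample_info and their column names are hypothetical, and the call to sva::ComBat_seq only runs if the sva package is installed.

```r
# Hypothetical count data frame: genes in rows, samples in columns.
set.seed(1)
count_df <- as.data.frame(matrix(
  rnbinom(200 * 8, mu = 100, size = 5),
  nrow = 200,
  dimnames = list(paste0("Gene", 1:200), paste0("Sample", 1:8))
))

# Hypothetical sample annotation whose row order differs from the count columns.
sample_info <- data.frame(
  sample = paste0("Sample", c(8, 1, 2, 3, 4, 5, 6, 7)),
  batch  = c(2, 1, 1, 1, 1, 2, 2, 2),
  group  = c(1, 0, 1, 0, 1, 0, 1, 0),
  stringsAsFactors = FALSE
)

# Reorder the annotation so that row i describes column i of the counts.
sample_info <- sample_info[match(colnames(count_df), sample_info$sample), ]
stopifnot(identical(sample_info$sample, colnames(count_df)))

batch <- sample_info$batch
group <- sample_info$group

# ComBat_seq expects a matrix, so convert the data frame before the call.
count_mat <- as.matrix(count_df)

if (requireNamespace("sva", quietly = TRUE)) {
  adjusted_mat <- sva::ComBat_seq(count_mat, batch = batch, group = group)
  # Convert back to a data frame for downstream analysis.
  adjusted_df <- as.data.frame(adjusted_mat)
}
```

The explicit stopifnot check is a design choice for this sketch: a batch vector that is silently out of order would still run but would assign samples to the wrong batch.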
Numbers 01 to 09 represent different types of tumor samples, and 10 to 19 represent different types of normal samples.
01 (primary solid tumors) and 11 (normal solid tissues) are the most common, while 06 denotes metastatic samples.
It is generally recommended that users select 01, 06, and 11 at the interactive prompts. Whether to include other codes depends on the specific dataset. Types with very few samples tend to add more noise than information to the overall data; they are not suitable for batch correction and are therefore not recommended for selection.
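The selection logic described above can be sketched outside the package as follows. The barcodes are made up for illustration, and the threshold min_n is an arbitrary choice for this example rather than a value used by the package. The sketch relies on the TCGA barcode layout, in which characters 14 and 15 hold the two-digit sample type code.

```r
# Hypothetical TCGA-style barcodes, e.g. the column names of a count table.
barcodes <- c(
  "TCGA-AA-0001-01A", "TCGA-AA-0002-01A", "TCGA-AA-0003-01A",
  "TCGA-AA-0004-06A", "TCGA-AA-0005-06A",
  "TCGA-AA-0006-07A",
  "TCGA-AA-0007-11A", "TCGA-AA-0008-11A"
)

# Characters 14-15 of a TCGA barcode hold the two-digit sample type code.
type_code <- substr(barcodes, 14, 15)

# Tabulate how many samples each type contributes.
type_counts <- table(type_code)
print(type_counts)

# Keep only types with at least min_n samples (arbitrary threshold for this sketch).
min_n <- 2
keep_types <- names(type_counts)[type_counts >= min_n]
kept_barcodes <- barcodes[type_code %in% keep_types]

# Split the kept samples into tumor (01-09) and normal (10-19) codes.
code_num <- as.integer(substr(kept_barcodes, 14, 15))
tumor_barcodes  <- kept_barcodes[code_num <= 9]
normal_barcodes <- kept_barcodes[code_num >= 10 & code_num <= 19]
```

Here the single 07 sample is dropped, which mirrors the recommendation not to select sparsely represented types.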
3.1 Different tumor types in TCGA
Please note: The TumorHistologicalTypes/NormalHistologicalTypes classifications are specific to the data in the TCGA database.
TumorHistologicalTypes
01 06 07
103 367 1
Found 2 batches
Using null model in ComBat-seq.
Adjusting for 0 covariate(s) or covariate level(s)
Estimating dispersions
Fitting the GLM model
Shrinkage off - using GLM estimates for parameters
Adjusting the data
NormalHistologicalTypes
11
113
Found 2 batches
Using null model in ComBat-seq.
Adjusting for 0 covariate(s) or covariate level(s)
Estimating dispersions
Fitting the GLM model
Shrinkage off - using GLM estimates for parameters
Adjusting the data
NormalHistologicalTypes
11
58
Found 2 batches
Using null model in ComBat-seq.
Adjusting for 0 covariate(s) or covariate level(s)
Estimating dispersions
Fitting the GLM model
Shrinkage off - using GLM estimates for parameters
Adjusting the data