2  UtilsFunction2

This section documents the helper functions used by the AutogluonSelectML and AutogluonTimeLimit functions.

2.1 splitdata.py

Reads the gene expression and class data, processes it, and splits it into training and testing sets.

2.1.1 Parameters

  • gene_data_path (str):
    • Path to the CSV file containing the gene expression data.
    • For example: '../data/gene_tpm.csv'
  • class_data_path (str):
    • Path to the CSV file containing the class data.
    • For example: '../data/tumor_class.csv'
  • class_name (str):
    • The name of the class column in the class data.
  • test_size (float, optional):
    • The proportion of the data to be used as the testing set.
    • Default is 0.2.
  • random_state (int, optional):
    • The seed used by the random number generator.
    • Default is 42.
  • threshold (float, optional):
    • The threshold used to filter out rows based on the proportion of non-zero values.
    • Default is 0.9.
  • random_feature (int, optional):
    • The number of features to select at random. If None, no random feature selection is performed.
    • Default is None.
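
The source does not spell out how threshold and random_feature are applied, so the following is only a minimal sketch of one plausible reading. It assumes that rows are genes and columns are samples, and that a row is kept when its proportion of non-zero values is at least threshold. The helper name filter_and_sample_genes and the toy table are hypothetical and are not part of splitdata.py.

```python
import pandas as pd


def filter_and_sample_genes(gene_data, threshold=0.9, random_feature=None, random_state=42):
    """Hypothetical helper: filter rows by non-zero proportion, then optionally subsample.

    Assumes rows are genes and columns are samples.
    """
    # Fraction of samples in which each gene has a non-zero value.
    nonzero_fraction = (gene_data != 0).mean(axis=1)
    # Assumption: a row is kept when that fraction reaches the threshold.
    filtered = gene_data.loc[nonzero_fraction >= threshold]
    if random_feature is not None:
        # Draw a reproducible random subset of the remaining genes.
        n = min(random_feature, len(filtered))
        filtered = filtered.sample(n=n, random_state=random_state)
    return filtered


# Toy expression table (made-up values).
toy = pd.DataFrame(
    {"s1": [5.0, 0.0, 2.0], "s2": [3.0, 0.0, 1.0], "s3": [4.0, 1.0, 0.0]},
    index=["geneA", "geneB", "geneC"],
)
kept = filter_and_sample_genes(toy, threshold=0.6)
print(list(kept.index))  # ['geneA', 'geneC']
```

With threshold=0.6, geneB is dropped because it is non-zero in only one of three samples. Passing random_feature=1 would additionally reduce the result to a single randomly chosen gene.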

2.1.2 Returns

  • train_data (pd.DataFrame):
    • The training data.
  • test_data (pd.DataFrame):
    • The testing data.

2.1.3 Usage

train_data, test_data = split_data(
    gene_data_path='../data/gene_tpm.csv',
    class_data_path='../data/tumor_class.csv',
    class_name=class_name,  # name of the class column in the class data
    test_size=0.2,
    random_state=42,
    threshold=0.9,
    random_feature=None,
)
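
To illustrate what test_size and random_state control, here is a rough sketch of a reproducible split on an in-memory table. It is not the code in splitdata.py: the function split_frame, the column name tumor_class, and the toy values are all hypothetical. The parameter names match those of scikit-learn's train_test_split, but the source does not say which splitter is used, so this sketch relies on NumPy only.

```python
import numpy as np
import pandas as pd


def split_frame(data, test_size=0.2, random_state=42):
    """Hypothetical stand-in for the splitting step of split_data."""
    # Shuffle row positions reproducibly.
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(data))
    # Reserve the requested proportion of rows for testing (at least one row).
    n_test = max(1, int(round(len(data) * test_size)))
    test_data = data.iloc[order[:n_test]]
    train_data = data.iloc[order[n_test:]]
    return train_data, test_data


# Toy table: one row per sample, gene columns plus a class column (made-up values).
samples = pd.DataFrame(
    {
        "geneA": np.arange(10, dtype=float),
        "geneB": np.arange(10, dtype=float) * 2,
        "tumor_class": ["A", "B"] * 5,
    },
    index=[f"sample{i}" for i in range(10)],
)
train_data, test_data = split_frame(samples, test_size=0.2, random_state=42)
print(train_data.shape, test_data.shape)  # (8, 3) (2, 3)
```

Calling split_frame again with the same random_state yields the same partition, which is the point of fixing the seed.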

2.2 LogTransform.py

Evaluates the data and applies a log2 transformation if needed. The function checks the data against a set of criteria to determine whether a log2 transformation is required and applies it only when the criteria are met.

2.2.1 Parameters

  • data (np.ndarray):
    • A numerical numpy array.

2.2.2 Returns

  • result (np.ndarray):
    • The original data or the data transformed with log2.

2.2.3 Usage

result = log_transform(
    data
    )
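
The criteria that log_transform checks are not listed in this section. As an illustration only, the sketch below assumes a quantile-based check: data with very large values, or with a wide range and a positive lower quartile, are treated as being on a linear scale and are log2-transformed. The function name log_transform_sketch and the cut-off values 100 and 50 are assumptions, not the documented behaviour.

```python
import numpy as np


def log_transform_sketch(data):
    """Hypothetical version of log_transform; the criteria below are assumptions."""
    data = np.asarray(data, dtype=float)
    # Summarise the value distribution with a few quantiles, ignoring NaNs.
    q0, q25, q99, q100 = np.nanquantile(data, [0.0, 0.25, 0.99, 1.0])
    # Assumed criteria: very large values, or a wide range with a positive
    # lower quartile, suggest the data are still on a linear scale.
    needs_log = (q99 > 100) or (q100 - q0 > 50 and q25 > 0)
    if not needs_log:
        return data
    # Non-positive values have no real log2, so mark them as missing first.
    cleaned = np.where(data > 0, data, np.nan)
    return np.log2(cleaned)


raw = np.array([[1.0, 8.0], [1024.0, 0.0]])            # linear-scale values
already_logged = np.array([[2.0, 5.5], [7.1, 3.3]])    # small range, left as-is

transformed = log_transform_sketch(raw)
unchanged = log_transform_sketch(already_logged)
```

Here raw is transformed (the zero entry becomes NaN), while already_logged is returned unchanged because neither assumed criterion is met.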