stark_qa.tools

stark_qa.tools.api

stark_qa.tools.api.complete_texts_claude(inputs, **kwargs)
stark_qa.tools.api.complete_texts_hf(inputs, **kwargs)
stark_qa.tools.api.get_gpt_outputs(inputs, **kwargs)
stark_qa.tools.api.get_llm_output(message, model='gpt-4-0125-preview', max_tokens=2048, temperature=1, json_object=False)

A general function to complete a prompt using the specified model.

Parameters:
  • message (str or list) – The input message or a list of message dicts.

  • model (str) – The model to use for completion.

  • max_tokens (int) – Maximum number of tokens to generate.

  • temperature (float) – Sampling temperature.

  • json_object (bool) – Whether to output in JSON format.

Returns:

The completed text generated by the model.

Return type:

str

Raises:

ValueError – If the model is not recognized.

stark_qa.tools.api.get_llm_outputs(inputs, **kwargs)
stark_qa.tools.api.parallel_func(func, n_max_nodes=5)

A general function to call a function on a list of inputs in parallel.

Parameters:
  • func (callable) – The function to apply.

  • n_max_nodes (int) – Maximum number of parallel processes.

Returns:

A wrapper function that applies func in parallel.

Return type:

callable
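
The wrapper pattern described above can be sketched as follows. This is an illustration only, not the library's implementation: the name parallel_func_sketch is hypothetical, and a thread pool stands in for the parallel processes mentioned in the parameter description.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_func_sketch(func, n_max_nodes=5):
    """Return a wrapper that applies `func` to each item of a list concurrently."""

    def wrapper(inputs, **kwargs):
        # Assumption: threads replace worker processes here; pool.map keeps input order.
        with ThreadPoolExecutor(max_workers=n_max_nodes) as pool:
            return list(pool.map(lambda item: func(item, **kwargs), inputs))

    return wrapper


# Usage: wrap a single-input function, then call it on a list of inputs.
square_all = parallel_func_sketch(lambda x: x * x, n_max_nodes=2)
results = square_all([1, 2, 3])
```

The wrapper takes (inputs, **kwargs), which matches the shape of the list-based functions in this module such as get_llm_outputs(inputs, **kwargs).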

stark_qa.tools.args

stark_qa.tools.args.load_args(args_dict)

Convert a dictionary into an argparse.Namespace object.

Parameters:

args_dict (dict) – Dictionary of arguments to be converted.

Returns:

Namespace object with the arguments.

Return type:

argparse.Namespace
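
A minimal sketch of the conversion, assuming it amounts to unpacking the dictionary into the Namespace constructor. The name load_args_sketch is hypothetical and the library's own code may differ.

```python
import argparse


def load_args_sketch(args_dict):
    """Build an argparse.Namespace whose attributes mirror the dictionary keys."""
    return argparse.Namespace(**args_dict)


args = load_args_sketch({"model": "gpt-4-0125-preview", "max_tokens": 2048})
```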

stark_qa.tools.args.merge_args(args_1, args_2)

Merge two argparse.Namespace objects. Arguments from args_2 have higher priority.

Parameters:
  • args_1 (argparse.Namespace) – First namespace object.

  • args_2 (argparse.Namespace) – Second namespace object.

Returns:

Merged namespace object.

Return type:

argparse.Namespace
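
The priority rule can be illustrated with a small sketch. The name merge_args_sketch is hypothetical, and the dictionary-merge approach is an assumption about how the behavior could be achieved.

```python
import argparse


def merge_args_sketch(args_1, args_2):
    """Merge two namespaces; on a shared key, the value from args_2 wins."""
    merged = {**vars(args_1), **vars(args_2)}
    return argparse.Namespace(**merged)


base = argparse.Namespace(lr=0.1, batch_size=32)
override = argparse.Namespace(lr=0.01, seed=0)
merged = merge_args_sketch(base, override)
```

Here merged keeps batch_size from base, takes lr from override, and gains seed.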

stark_qa.tools.download_hf

stark_qa.tools.download_hf.download_hf_file(repo, file, repo_type='dataset', save_as_file=None)

Downloads a file from a Hugging Face repository and saves it to the specified path.

Parameters:
  • repo (str) – The repository name.

  • file (str) – The file path within the repository to download.

  • repo_type (str) – The type of the repository (e.g., 'dataset').

  • save_as_file (str, optional) – The local file path to save the downloaded file. If not provided, saves the file in the current directory with the same name as the original file.

stark_qa.tools.download_hf.download_hf_folder(repo, folder, repo_type='dataset', save_as_folder=None)

Downloads a folder from a Hugging Face repository and saves it to the specified directory.

Parameters:
  • repo (str) – The repository name.

  • folder (str) – The folder path within the repository to download.

  • repo_type (str) – The type of the repository (e.g., 'dataset').

  • save_as_folder (str, optional) – The local directory to save the downloaded folder. Defaults to "data/".

stark_qa.tools.graph

stark_qa.tools.graph.k_hop_subgraph(node_idx, num_hops, edge_index, relabel_nodes=False, num_nodes=None, flow='source_to_target', directed=False)

Extracts the k-hop subgraph around a given node or a list of nodes.

Parameters:
  • node_idx (Union[int, List[int], Tensor]) – The central node or a list of central nodes.

  • num_hops (int) – The number of hops to consider.

  • edge_index (Tensor) – The edge indices of the graph.

  • relabel_nodes (bool, optional) – If True, the nodes will be relabeled to a contiguous range. Defaults to False.

  • num_nodes (Optional[int], optional) – The number of nodes in the graph. Defaults to None.

  • flow (str, optional) – The flow direction ('source_to_target', 'target_to_source', 'bidirectional'). Defaults to 'source_to_target'.

  • directed (bool, optional) – If True, the graph is treated as directed. Defaults to False.

Returns:

The node indices, the edge indices, the indices of the original nodes, and the edge mask.

Return type:

Tuple[Tensor, Tensor, Tensor, Tensor]
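
To illustrate the idea of k-hop extraction, here is a plain-Python sketch. It is not the library's tensor-based implementation: the name k_hop_nodes_sketch is hypothetical, edges are given as a list of (source, target) pairs instead of an edge_index tensor, the flow and directed options are ignored (every edge is treated as undirected), and only the node subset and the edge mask are returned.

```python
def k_hop_nodes_sketch(node_idx, num_hops, edges):
    """Collect all nodes within `num_hops` of the seed node(s), treating edges as undirected."""
    seeds = [node_idx] if isinstance(node_idx, int) else list(node_idx)
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(num_hops):
        next_frontier = set()
        for src, dst in edges:
            # Expand one hop from the current frontier in either direction.
            if src in frontier and dst not in visited:
                next_frontier.add(dst)
            if dst in frontier and src not in visited:
                next_frontier.add(src)
        visited |= next_frontier
        frontier = next_frontier
    subset = sorted(visited)
    # An edge belongs to the subgraph when both of its endpoints were reached.
    edge_mask = [src in visited and dst in visited for src, dst in edges]
    return subset, edge_mask


# Path graph 0 - 1 - 2 - 3 - 4, one hop around node 2.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
subset, edge_mask = k_hop_nodes_sketch(2, 1, edges)
```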

stark_qa.tools.graph.relabel_graph(subset, edge_index, num_nodes)

Relabels the nodes in the graph to a contiguous range.

Parameters:
  • subset (Tensor) – The subset of nodes.

  • edge_index (Tensor) – The edge indices of the graph.

  • num_nodes (int) – The number of nodes in the graph.

Returns:

The relabeled edge indices.

Return type:

Tensor
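
Relabeling to a contiguous range can be sketched in plain Python. The name relabel_edges_sketch is hypothetical; it works on (source, target) pairs instead of tensors and omits the num_nodes argument, so treat it as an illustration of the mapping only.

```python
def relabel_edges_sketch(subset, edges):
    """Map each node in `subset` to its position in the subset and rewrite the edges."""
    new_id = {old: new for new, old in enumerate(subset)}
    # Assumption: edges that touch a node outside the subset are dropped.
    return [(new_id[s], new_id[d]) for s, d in edges if s in new_id and d in new_id]


relabeled = relabel_edges_sketch([1, 2, 3], [(1, 2), (2, 3)])
```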

stark_qa.tools.io

stark_qa.tools.io.load_files(save_path)

Load all files from a specified directory.

Parameters:

save_path (str) – Directory to load the files from.

Returns:

Dictionary with filenames (without extension) as keys and file contents as values.

Return type:

dict

stark_qa.tools.io.read_from_file(file_path)

Read content from a file based on its extension.

Parameters:

file_path (str) – Path to the file.

Returns:

Content of the file.

Return type:

object (the type depends on the file extension)

Raises:

NotImplementedError – If the file type is not supported.

stark_qa.tools.io.save_files(save_path, **kwargs)

Save multiple files in a specified directory.

Parameters:
  • save_path (str) – Directory to save the files.

  • **kwargs – Keyword arguments where keys are filenames (without extension) and values are the contents.

stark_qa.tools.io.write_to_file(file_path, content)

Write content to a file based on its extension.

Parameters:
  • file_path (str) – Path to the file.

  • content – Content to write.

Raises:

NotImplementedError – If the file type is not supported.
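
Extension-based dispatch for reading and writing can be sketched as below. The set of supported extensions is not listed in this reference, so the two handled here (.json and .txt) are assumptions, and the function names are hypothetical.

```python
import json
import os
import tempfile


def write_to_file_sketch(file_path, content):
    """Write `content` using a format chosen from the file extension."""
    ext = os.path.splitext(file_path)[1]
    if ext not in (".json", ".txt"):
        raise NotImplementedError(f"Unsupported file type: {ext}")
    with open(file_path, "w", encoding="utf-8") as f:
        if ext == ".json":
            json.dump(content, f)
        else:
            f.write(content)


def read_from_file_sketch(file_path):
    """Read a file using a parser chosen from the file extension."""
    ext = os.path.splitext(file_path)[1]
    if ext not in (".json", ".txt"):
        raise NotImplementedError(f"Unsupported file type: {ext}")
    with open(file_path, "r", encoding="utf-8") as f:
        return json.load(f) if ext == ".json" else f.read()


# Round trip through a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.json")
    write_to_file_sketch(path, {"answer": 42})
    loaded = read_from_file_sketch(path)
```

Checking the extension before opening the file means an unsupported type fails early, without creating or touching the file.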

stark_qa.tools.node

class stark_qa.tools.node.Node

Bases: object

stark_qa.tools.node.df_row_to_dict(row, column_names=None)

Convert a row of a DataFrame to a dictionary.

Parameters:
  • row (pandas.Series) – A row of a DataFrame.

  • column_names (list, optional) – The list of column names. Defaults to None.

Returns:

A dictionary that contains the same information as the row.

Return type:

dict

stark_qa.tools.node.dict_tree(dictionary, indent=0)

Create a visual tree representation of a dictionary.

Parameters:
  • dictionary (dict) – The dictionary to represent as a tree.

  • indent (int) – The current indentation level.

Returns:

A string representing the dictionary as a tree.

Return type:

str
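
One way such a tree string could be built is sketched below. The exact output format of dict_tree is not specified here, so the two-space indentation and "key: value" layout are assumptions, and dict_tree_sketch is a hypothetical name.

```python
def dict_tree_sketch(dictionary, indent=0):
    """Render a nested dictionary as indented lines, one key per line."""
    lines = []
    prefix = "  " * indent
    for key, value in dictionary.items():
        if isinstance(value, dict):
            # Nested dictionaries are rendered one indentation level deeper.
            lines.append(f"{prefix}{key}:")
            lines.append(dict_tree_sketch(value, indent + 1))
        else:
            lines.append(f"{prefix}{key}: {value}")
    return "\n".join(lines)


tree = dict_tree_sketch({"name": "aspirin", "details": {"type": "drug"}})
```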

stark_qa.tools.node.register_node(node, dictionary)

Register a dictionary into a Node object.

Parameters:
  • node (Node) – The node to register the dictionary to.

  • dictionary (dict) – The dictionary to register.
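
A possible reading of "registering" a dictionary is copying its entries onto the node as attributes. The sketch below assumes that reading, and also assumes nested dictionaries become nested nodes; NodeSketch and register_node_sketch are hypothetical stand-ins for the library's Node and register_node.

```python
class NodeSketch:
    """Empty container object, standing in for the library's Node class."""


def register_node_sketch(node, dictionary):
    """Copy dictionary entries onto `node` as attributes, recursing into nested dicts."""
    for key, value in dictionary.items():
        if isinstance(value, dict):
            child = NodeSketch()
            register_node_sketch(child, value)
            setattr(node, key, child)
        else:
            setattr(node, key, value)


root = NodeSketch()
register_node_sketch(root, {"name": "aspirin", "details": {"type": "drug"}})
```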

stark_qa.tools.process_text

stark_qa.tools.process_text.chunk_text(text, chunk_size)

Split text into chunks of specified size.

Parameters:
  • text (str) – Input text to be chunked.

  • chunk_size (int) – Size of each chunk.

Returns:

List of text chunks.

Return type:

list
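
A minimal sketch of fixed-size chunking, assuming chunk_size counts characters (the unit is not stated in this reference). The name chunk_text_sketch is hypothetical.

```python
def chunk_text_sketch(text, chunk_size):
    """Split `text` into consecutive pieces of at most `chunk_size` characters."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


chunks = chunk_text_sketch("abcdefghij", 4)
```

The last chunk may be shorter than chunk_size when the text length is not a multiple of it.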

stark_qa.tools.process_text.clean_data(item)

Clean the text data.

Parameters:

item (Union[str, list, dict]) – An object that contains text data which is cleaned iteratively.

Returns:

The cleaned data in the same format as item.

stark_qa.tools.process_text.clean_dict(dictionary, remove_values=['', 'nan'])

Clean the dictionary by removing specific values.

Parameters:
  • dictionary (dict) – A dictionary to be cleaned.

  • remove_values (list) – List of values to remove from the dictionary.

Returns:

Cleaned dictionary.

Return type:

dict
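
The filtering step can be sketched as a dictionary comprehension. This assumes a flat comparison of each value against remove_values, with no recursion into nested structures; clean_dict_sketch is a hypothetical name, and a tuple default is used in place of the list default shown in the signature above.

```python
def clean_dict_sketch(dictionary, remove_values=("", "nan")):
    """Return a copy of `dictionary` without entries whose value is in `remove_values`."""
    return {key: value for key, value in dictionary.items() if value not in remove_values}


cleaned = clean_dict_sketch({"title": "Paper A", "venue": "", "year": "nan"})
```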

stark_qa.tools.process_text.compact_text(text)

Compact the text by removing unnecessary spaces and punctuation issues.

Parameters:

text (str) – Input text to be compacted.

Returns:

Compacted text.

Return type:

str

stark_qa.tools.process_text.decode_escapes(s)

Decode escape sequences in a string.

Parameters:

s (str) – Input string with escape sequences.

Returns:

Decoded string.

Return type:

str
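
One common way to decode escape sequences without disturbing other characters is to match each escape with a regular expression and decode only the matched text. The sketch below follows that approach; it is an assumption about the technique, not a copy of the library's code, and decode_escapes_sketch is a hypothetical name.

```python
import codecs
import re

# Matches the escape sequences Python itself recognizes in string literals.
ESCAPE_SEQUENCE_RE = re.compile(
    r"""
    ( \\U[0-9a-fA-F]{8}   # 8-digit hex escapes
    | \\u[0-9a-fA-F]{4}   # 4-digit hex escapes
    | \\x[0-9a-fA-F]{2}   # 2-digit hex escapes
    | \\[0-7]{1,3}        # octal escapes
    | \\[\\'"abfnrtv]     # single-character escapes
    )""",
    re.VERBOSE,
)


def decode_escapes_sketch(s):
    """Replace each literal escape sequence in `s` with the character it denotes."""

    def decode_match(match):
        return codecs.decode(match.group(0), "unicode-escape")

    return ESCAPE_SEQUENCE_RE.sub(decode_match, s)


decoded = decode_escapes_sketch(r"line one\nline two\t\u00e9")
```

Decoding match by match leaves any non-ASCII characters already present in the string untouched, which decoding the whole string with "unicode-escape" would not.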

stark_qa.tools.process_text.exact_match_score(prediction, ground_truth)

Calculate the exact match score between prediction and ground truth.

Parameters:
  • prediction (str) – Predicted text.

  • ground_truth (str) – Ground truth text.

Returns:

Exact match score.

Return type:

float
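
Exact match is usually computed on normalized strings, so that casing and punctuation do not count as mismatches. The sketch below assumes that convention; the helper _normalize and the name exact_match_score_sketch are hypothetical.

```python
import string


def _normalize(text):
    """Lowercase, drop ASCII punctuation, and collapse whitespace."""
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def exact_match_score_sketch(prediction, ground_truth):
    """Return 1.0 when the normalized strings are identical, otherwise 0.0."""
    return float(_normalize(prediction) == _normalize(ground_truth))


match = exact_match_score_sketch("Paris.", "paris")
mismatch = exact_match_score_sketch("Paris", "London")
```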

stark_qa.tools.process_text.f1_score(prediction, ground_truth)

Calculate the F1 score between prediction and ground truth.

Parameters:
  • prediction (str) – Predicted text.

  • ground_truth (str) – Ground truth text.

Returns:

F1 score.

Return type:

float
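
For text answers, F1 is commonly computed over overlapping tokens: precision is the share of predicted tokens found in the ground truth, recall is the share of ground-truth tokens found in the prediction, and F1 is their harmonic mean. The sketch below assumes that token-level definition with simple lowercasing and whitespace tokenization; f1_score_sketch is a hypothetical name and the library may normalize differently.

```python
from collections import Counter


def f1_score_sketch(prediction, ground_truth):
    """Token-level F1 between two strings, using whitespace tokens."""
    pred_tokens = prediction.lower().split()
    truth_tokens = ground_truth.lower().split()
    # Multiset intersection counts each shared token at most as often as it appears in both.
    num_same = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


# Two shared tokens: precision = 2/3, recall = 2/4, so F1 = 4/7.
score = f1_score_sketch("the cat sat", "the cat slept soundly")
```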

stark_qa.tools.process_text.normalize_answer(s)

Normalize text by removing punctuation, articles and extra whitespace, and lowercasing the text.

Parameters:

s (str) – Input text to be normalized.

Returns:

Normalized text.

Return type:

str
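
The four normalization steps listed above can be sketched as follows. The order of the steps and the article list (a, an, the) are assumptions, and normalize_answer_sketch is a hypothetical name.

```python
import re
import string


def normalize_answer_sketch(s):
    """Lowercase, strip ASCII punctuation, remove English articles, collapse whitespace."""
    s = s.lower()
    s = s.translate(str.maketrans("", "", string.punctuation))
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


normalized = normalize_answer_sketch("The  Quick, brown fox!")
```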

stark_qa.tools.process_text.pluralize(singular)

Return the plural form of a given lowercase singular word (English only).

Parameters:

singular (str) – Singular word.

Returns:

Plural form of the word.

Return type:

str

stark_qa.tools.process_text.recall_score(prediction, ground_truth)

Calculate the recall score between prediction and ground truth.

Parameters:
  • prediction (str) – Predicted text.

  • ground_truth (str) – Ground truth text.

Returns:

Recall score.

Return type:

float

stark_qa.tools.process_text.remove_punctuation(text)

Remove all punctuation from the given text.

Parameters:

text (str) – Input text from which punctuation will be removed.

Returns:

Text without punctuation.

Return type:

str
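
A minimal sketch using a translation table, assuming "punctuation" means the ASCII characters in string.punctuation. The name remove_punctuation_sketch is hypothetical.

```python
import string


def remove_punctuation_sketch(text):
    """Delete every ASCII punctuation character from `text`."""
    return text.translate(str.maketrans("", "", string.punctuation))


stripped = remove_punctuation_sketch("Hello, world!")
```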

stark_qa.tools.process_text.synonym_extractor(phrase)

Extract synonyms for a given phrase using WordNet.

Parameters:

phrase (str) – Input phrase to find synonyms for.

Returns:

List of synonyms.

Return type:

list

stark_qa.tools.seed

stark_qa.tools.seed.set_seed(seed)

Sets the random seed.
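
Which random number generators set_seed covers is not documented here. The sketch below shows the general pattern under the assumption that Python's random module is seeded, plus NumPy when it is installed; set_seed_sketch is a hypothetical name.

```python
import random


def set_seed_sketch(seed):
    """Seed Python's random module and, if available, NumPy's global generator."""
    random.seed(seed)
    try:
        import numpy as np  # optional dependency in this sketch
    except ImportError:
        return
    np.random.seed(seed)


# Re-seeding with the same value reproduces the same draw.
set_seed_sketch(0)
first_draw = random.random()
set_seed_sketch(0)
second_draw = random.random()
```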