J-Max coder, writer, problem solver:
    About     Archive     Feed     Resources

Python Imports

I currently use Spacemacs as a development environment. I open a .py file, then run-python to startup an interactive shell. In Spacemacs, the shell’s pwd is assigned to where run-python was executed. So, if it was executed in the .py file, the pwd would be the files subdirectory, rather than the projects root.

With data science development, csvs live in a data folder accessible from the top level of the directory. project_directory/data. If the Python file I am using is in project/src, if I wanted to read in a csv from the data folder I would use the relative path pd.read_csv('../data/target_csv'). This will only work if the interpreter’s pwd is the src directory.

To fix this problem, I included the following code at the top of my .py file:

import os  
  
cur_dir = os.getcwd()  
root = cur_dir.split('project_directory')[0]  

This retrieves the absolute file path to the project root, and assigns it to the root variable. Then, when loading csv, I concatenate root with the path from the top level of the directory, like so:

df = pd.read_csv(root+'project_directory/data/target.csv')

Cost versus Loss

It is helpful to me to establish conventional language. While working in machine learning, I’ve been fairly casual about my use of the terms ‘loss function’ and ‘cost function.’ This post is mostly aimed towards helping out myself in the future. Putting down the definition in writing will help my language precision.

I will follow the convention Andrew Ng uses in this Coursera lesson Logistic Regression Cost Function at 6:10.

The loss function measures how well you are doing on a single training example.

The cost function measures how well you are doing on the entire training set.