Altair SmartWorks Analytics

Modifying Pandas Data Frames Properly

The output data frame of many data prep operations can be considered a modified version of an input data frame. For example, a Column Changes node can return a data frame that contains all of the columns in the input data frame plus any new columns produced during the prep operations. If you inspect the code generated for such a Column Changes node when using the Pandas engine, you will notice an important line near the beginning:

In this line, the output data frame is initialized as an explicit copy of the input data frame. This explicit use of the .copy() method is important because, without it, the output can end up sharing data with the input. A plain assignment only gives the same data frame a second name, and some Pandas operations return views that share memory with the original to save memory. In either case, changes made to the output can silently modify the input data frame. Modified inputs to a code node make a workflow very difficult to understand, so it is best to avoid this situation by making explicit copies with the .copy() method.
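The following is a minimal sketch of the difference, not the code that SmartWorks Analytics actually generates. The function names, the `price`, `qty`, and `total` columns, and the sample data are all hypothetical. The first function adds a column through a plain assignment and so changes its input, while the second starts from `.copy()` and leaves its input untouched.

```python
import pandas as pd


def add_total_in_place(input_df: pd.DataFrame) -> pd.DataFrame:
    # Anti-pattern (hypothetical helper): "output_df" is just another name
    # for the input object, so adding a column also changes the caller's data.
    output_df = input_df
    output_df["total"] = output_df["price"] * output_df["qty"]
    return output_df


def add_total_with_copy(input_df: pd.DataFrame) -> pd.DataFrame:
    # Recommended pattern (hypothetical helper): start from an explicit copy
    # of the input, then add new columns to the copy only.
    output_df = input_df.copy()
    output_df["total"] = output_df["price"] * output_df["qty"]
    return output_df


# Hypothetical sample data standing in for a node's input.
first_input = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 4]})
add_total_in_place(first_input)
print("total" in first_input.columns)  # True: the input was changed

second_input = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 4]})
result = add_total_with_copy(second_input)
print("total" in second_input.columns)  # False: the input is untouched
print(list(result.columns))  # ['price', 'qty', 'total']
```

`DataFrame.copy()` makes a deep copy by default (`deep=True`), so the new data frame gets its own copy of the data and index rather than a reference to the original's.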