How to Retrieve a Column from a Pandas DataFrame Object in Python





In this article, we show how to retrieve a column, or multiple columns, from a pandas DataFrame object in Python.

A DataFrame object is made up of a number of pandas Series objects.

A pandas Series is a one-dimensional, labeled array of data.

A DataFrame object is most similar to a table. It is composed of rows and columns, and each column can be retrieved as a Series.

This is a form of data selection.

At times, you may not want to return the entire pandas DataFrame object. You may want only one, two, or three of its columns.

You can retrieve a column in a pandas DataFrame object by using the DataFrame object name, followed by the column label in square brackets.

So if the DataFrame object name is dataframe1 and the column we are trying to retrieve is the 'X' column, then we retrieve the column using the statement, dataframe1['X']

In the code below, we show how to retrieve each individual column of a DataFrame object in Python.
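A minimal sketch of such code, pieced together from the walkthrough in this article, might look like the following. The index= and columns= keywords are written out here for readability; passing the labels positionally, as in the statement discussed later, should behave the same way. Because randn produces random values, the numbers printed will differ on every run.

```python
import pandas as pd
from numpy.random import randn

# Build a 4x3 DataFrame of random values.
# The row labels go in the index, the column labels in columns.
dataframe1 = pd.DataFrame(randn(4, 3),
                          index=['A', 'B', 'C', 'D'],
                          columns=['X', 'Y', 'Z'])

# Output the whole table.
print(dataframe1)

# Retrieve each column on its own.
print(dataframe1['X'])
print(dataframe1['Y'])
print(dataframe1['Z'])

# Retrieve several columns at once by passing a list of labels.
print(dataframe1[['X', 'Y']])
print(dataframe1[['X', 'Z']])
```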



So let's now go over the code.

So we first have to import the pandas module. We do this with the line, import pandas as pd.

as pd means that we can reference the pandas module with pd instead of writing out the full pandas each time.

We import randn from numpy.random, so that we can populate the DataFrame with random values. In other words, we won't need to manually create the values in the table. The randn function will populate it with random values drawn from a standard normal distribution.

We create a variable, dataframe1, which we set equal to, pd.DataFrame(randn(4,3),['A','B','C','D'],['X','Y','Z'])

This creates a DataFrame object with 4 rows and 3 columns.

The rows are 'A', 'B', 'C', and 'D'.

The columns are 'X', 'Y', and 'Z'.
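One way to confirm the shape and the labels is to inspect the DataFrame's shape, index, and columns attributes. The short sketch below rebuilds dataframe1 so that it runs on its own.

```python
import pandas as pd
from numpy.random import randn

# Same construction as in the article: data, then row labels, then column labels.
dataframe1 = pd.DataFrame(randn(4, 3), ['A', 'B', 'C', 'D'], ['X', 'Y', 'Z'])

print(dataframe1.shape)          # (4, 3)
print(list(dataframe1.index))    # ['A', 'B', 'C', 'D']
print(list(dataframe1.columns))  # ['X', 'Y', 'Z']
```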

When we output the dataframe1 object, we get the DataFrame object with all of its rows and columns.

We then reference each of the individual columns.

The first column of the dataframe object is the X column. We reference this column using the statement, dataframe1['X']

The second column is the Y column. We reference this column using the statement, dataframe1['Y']

The third column is the Z column. We reference this column using the statement, dataframe1['Z']
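It may help to see what kind of object a single-column lookup returns. The sketch below, which rebuilds dataframe1 so it is self-contained, checks that dataframe1['X'] is a pandas Series that keeps the row labels as its index. The variable name column_x is only an illustrative choice.

```python
import pandas as pd
from numpy.random import randn

dataframe1 = pd.DataFrame(randn(4, 3), ['A', 'B', 'C', 'D'], ['X', 'Y', 'Z'])

# Selecting one label in single brackets returns a Series.
column_x = dataframe1['X']

print(type(column_x).__name__)  # Series
print(column_x.name)            # X
print(list(column_x.index))     # ['A', 'B', 'C', 'D']
```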

Next, we want to reference multiple columns.

To reference multiple columns, you put a list of the column labels inside the brackets, with each of the labels separated by commas. Because the list has its own brackets, this looks like double brackets.

To reference both the X and Y columns, we use the statement, dataframe1[['X','Y']]

To reference the X and Z columns, we use the statement, dataframe1[['X','Z']]
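As a rough illustration of what a multiple-column lookup returns, the sketch below shows that a list of labels produces a DataFrame rather than a Series, and that the columns come back in the order given in the list. The names subset and single are only illustrative.

```python
import pandas as pd
from numpy.random import randn

dataframe1 = pd.DataFrame(randn(4, 3), ['A', 'B', 'C', 'D'], ['X', 'Y', 'Z'])

# A list of labels returns a DataFrame, in the order the labels are listed.
subset = dataframe1[['Z', 'X']]
print(type(subset).__name__)  # DataFrame
print(list(subset.columns))   # ['Z', 'X']

# A one-item list still returns a DataFrame, with a single column.
single = dataframe1[['X']]
print(single.shape)           # (4, 1)
```

If a Series is wanted instead, the single-bracket form dataframe1['X'] is the one to use.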

So this is how you can reference a column or multiple columns of a pandas DataFrame object in Python.




