How to Retrieve a Row from a Pandas DataFrame Object in Python





In this article, we show how to retrieve a row from a pandas DataFrame object in Python.

A DataFrame object is made up of a number of pandas Series objects.

A pandas Series is a labeled, one-dimensional list of data.

A dataframe object is most similar to a table. It is composed of rows and columns.
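As a small illustration of that structure, the sketch below builds a table from two Series and checks that a single column comes back as a Series. The names x, y, and table, and the values in them, are hypothetical and used only for this example.

```python
import pandas as pd

# Two labeled Series that share the same row labels (illustrative values)
x = pd.Series([1, 2], index=['A', 'B'])
y = pd.Series([3, 4], index=['A', 'B'])

# Combine them into a DataFrame; the dict keys become the column labels
table = pd.DataFrame({'X': x, 'Y': y})
print(table)

# Selecting one column returns a Series again
print(type(table['X']))
```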

We cover retrieving a single row as well as multiple rows.

This is a form of data selection.

At times, you may not want to return the entire pandas DataFrame object. You may want only one, two, or three rows.

There are two ways to retrieve a row from a pandas DataFrame object. One way is by label-based location using loc, and the other is by index-based (integer position) location using iloc. Both are indexers rather than functions, so they take square brackets instead of parentheses.

Both will be explained in depth.

We start with label-based locations using loc.

Label-based Locations using loc

One way to retrieve a row is through label-based locations.

When you create a DataFrame object in Python, you can specify labels for the columns and for the rows.

Say, for example, we create a DataFrame object with columns 'X', 'Y', 'Z' and rows 'A', 'B', 'C', 'D'.

The labels for the rows are 'A', 'B', 'C', 'D'.

We can use these labels to retrieve a row or rows from the DataFrame.

To do this, we write the DataFrame name followed by a dot and loc. Because loc is an indexer rather than a function, the label of the row we want goes inside square brackets, not parentheses. So if we want to retrieve the row labeled 'A' from the dataframe1 DataFrame object, we use the statement dataframe1.loc['A']

This is shown in the following code.
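A minimal sketch of such a listing, assuming the DataFrame construction described in the walkthrough that follows. The variable name row_a is introduced here only for illustration, and the printed values differ on every run because they are random.

```python
import pandas as pd
from numpy.random import randn

# 4 rows labeled A-D and 3 columns labeled X-Z, filled with random values
dataframe1 = pd.DataFrame(randn(4, 3), ['A', 'B', 'C', 'D'], ['X', 'Y', 'Z'])
print(dataframe1)

# Retrieve a single row by its label; the result is a pandas Series
# whose index is the column labels X, Y, Z
row_a = dataframe1.loc['A']
print(row_a)
```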



Let's now go over the code.

We first import the pandas module. We do this with the line import pandas as pd.

The as pd part means that we can reference the pandas module as pd instead of writing out pandas each time.

We import randn from numpy.random so that we can populate the DataFrame with random values. In other words, we do not need to create the values in the table manually. The randn function fills it with random values drawn from a standard normal distribution.

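We create a variable, dataframe1, which we set equal to pd.DataFrame(randn(4,3),['A','B','C','D'],['X','Y','Z'])

This creates a DataFrame object with 4 rows and 3 columns. The first argument is the data, the second is the list of row labels, and the third is the list of column labels.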

The rows are 'A', 'B', 'C', and 'D'.

The columns are 'X', 'Y', and 'Z'.

When we output the dataframe1 object, we see the DataFrame with all of its rows and columns.

We then reference each of the individual rows.

The first row of the dataframe object is the A row. We reference this row using the statement, dataframe1.loc['A']

The second row is the B row. We reference this row using the statement, dataframe1.loc['B']

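The third row is the C row. We reference this row using the statement, dataframe1.loc['C']

The fourth row is the D row. We reference this row using the statement, dataframe1.loc['D']

Note that loc is required here. Without it, a statement such as dataframe1['C'] looks for a column named 'C' and raises a KeyError, because 'C' is a row label.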

Next, we want to reference multiple rows.

To reference multiple rows, you put the labels of the rows in a list inside the loc brackets, which produces double brackets, with the labels separated by commas.

To reference both the A and B rows, we use the statement, dataframe1.loc[['A','B']]

To reference the C and D rows, we use the statement, dataframe1.loc[['C','D']]
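A minimal sketch of selecting several rows by label, assuming the same dataframe1 construction as earlier in the article. Note that the list of labels goes inside loc[...]. The names rows_ab and rows_cd are illustrative.

```python
import pandas as pd
from numpy.random import randn

dataframe1 = pd.DataFrame(randn(4, 3), ['A', 'B', 'C', 'D'], ['X', 'Y', 'Z'])

# A list of labels inside loc[...] returns a DataFrame with just those rows
rows_ab = dataframe1.loc[['A', 'B']]
rows_cd = dataframe1.loc[['C', 'D']]
print(rows_ab)
print(rows_cd)
```

Unlike a single label, which returns a Series, a list of labels returns a DataFrame, even when the list contains only one label.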

Index-based Locations using iloc

Another way to retrieve a row is through index-based locations using iloc.

Let's say you create a DataFrame object whose rows are labeled 'A', 'B', 'C', and 'D'.

With index-based locations, you ignore these labels.

Instead, you reference each row by its integer position. In Python, indexes begin at 0.

So the first row, row A, can be referenced using the statement, dataframe1.iloc[0]

The second row, row B, can be referenced using the statement, dataframe1.iloc[1]

The third row, row C, can be referenced using the statement, dataframe1.iloc[2]

The fourth row, row D, can be referenced using the statement, dataframe1.iloc[3]

This is shown in the following code.
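A minimal sketch of index-based retrieval, again assuming the dataframe1 construction used for the label-based example. The names first_row and last_row are illustrative.

```python
import pandas as pd
from numpy.random import randn

dataframe1 = pd.DataFrame(randn(4, 3), ['A', 'B', 'C', 'D'], ['X', 'Y', 'Z'])

# Positions start at 0, so position 0 is the row labeled 'A'
first_row = dataframe1.iloc[0]
# Position 3 is the fourth row, labeled 'D'
last_row = dataframe1.iloc[3]
print(first_row)
print(last_row)

# The same row can be reached by label or by position
print(first_row.equals(dataframe1.loc['A']))
```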





So now we retrieve rows using index-based locations.

Instead of using the label of a row to retrieve its contents, we use its index: the first row has an index of 0, the second row has an index of 1, the third row has an index of 2, and so on.

If we want to retrieve multiple rows using index-based locations, we put the indexes in a list inside the iloc brackets, which produces double brackets. For example, dataframe1.iloc[[0,1]] retrieves the first two rows.
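A minimal sketch of retrieving several rows by position, assuming the same dataframe1 construction as before. The names first_two and last_two are illustrative.

```python
import pandas as pd
from numpy.random import randn

dataframe1 = pd.DataFrame(randn(4, 3), ['A', 'B', 'C', 'D'], ['X', 'Y', 'Z'])

# A list of integer positions inside iloc[...] returns those rows as a DataFrame
first_two = dataframe1.iloc[[0, 1]]   # rows labeled A and B
last_two = dataframe1.iloc[[2, 3]]    # rows labeled C and D
print(first_two)
print(last_two)
```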

And this is how we can retrieve a row or rows from a pandas dataframe object in Python.




