Introduction: MATLAB: Analyzing Biochemical Data Extracted From Tissue Engineered Cartilage Constructs

Background

Cartilage in the diarthrodial joints provides a smooth, frictionless surface that effectively cushions and distributes locomotion forces during movement. After trauma or disease, however; articular cartilage does not present the ability to regenerate itself like other musculoskeletal tissues. To better understand the healing processes of cartilage, engineers are tissue-engineering articular cartilage constructs using scaffold-less techniques and performing in-vitro and in-vivo assessments. The function of cartilage has a direct effect on its biochemical composition. Therefore, one main goal is to study the biochemical properties of healthy and diseased native articular cartilage and to compare the results to the biochemical properties of the engineered constructs. After performing biochemical assays, engineers are presented with a variety of raw data from the different tests and groups.

Objective

The goal of this project will be to compact and analyze the raw data from the biochemical assays performed on the cartilage constructs. The biochemical tests will analyze the biochemistry of tissue-engineered constructs by collecting data regarding the presence of DNA, Collagen II, hydration, wet weight, and Glycosaminoglycans. Our project will use Matlab to gather and separate data for each component and experimental group. Once this project is complete, you will be able to analyze a given set of raw data, organize variables, perform statistical tests, find methods to present as well as visualize data and make several conclusions about the results you find.

Step 1: Understanding the Raw Data

For this project, we are working with an Excel spreadsheet of raw data provided by Professor Athanasiou's laboratory, DELTAi at the University of California, Irvine. This spreadsheet contains three different sheets of data: GAG, Collagen, and DNA. Each sheet represents a different biochemical test, which we will evaluate.

There are 4 different values that are measured for each biochemical test. We will be working with these values throughout the project.

Wet Weight (WW)

The weight of the construct, with water included, measured in mg.

Dry Weight (DW)

The dry weight is the weight of the construct but, as the name suggests, it contains no water because the constructs have been lyophilized.

Volume of Papain added

Papain, also known as papaya proteinase, is used to digest the cell membrane, which allows researchers to access the contents of a cell. Varying amounts are added to the constructs.

Fun fact: this is actually extracted from papayas! Our question for you is, how did someone ever think to use papayas for tissue engineering?

Step 2: Loading the Data

So before we do any data analysis, we need to make sure the data we work with is accessible. How do we do this? Simple! We only need about 5 lines of code! But before we do that, please make sure the file shows up in your Matlab directory.

xlsData = importdata('Raw Data Only.xlsx');
<br>GAG_Data = xlsData.data.GAG;
Collagen_Data = xlsData.data.Collagen; 
DNA_Data = xlsData.data.DNA; 

Let's go through this line by line:

  1. First line:This uses theimportdatafunction to take the file and upload it as a structure we can work with in Matlab. We assigned the variable name as "xlsData" but the name can be changed to whatever you prefer
  2. Second line: loads the first sheet 'GAG' in 'Raw Data Only.xlsx' to workspace
  3. Third line:loads the 2nd sheet 'Collagen' in 'Raw Data Only.xlsx' to workspace
  4. Fourth line:loads the 3rd sheet 'DNA' in 'Raw Data Only.xlsx' to workspace

Here's the general format our code has followed so far:

variable name = workbookvariable.data.sheetname

Step 3: Assigning Variables

Now that we have data to work with, we need to extract the specific columns of data we want to work with from the spreadsheet. We will be programming one sheet at a time, so we will show you how to do the first one and you can follow the same steps for the remaining two.

Below are the variables we assigned for the first sheet, GAG:

WWGAG = GAG_Data(:,1);
DWGAG= GAG_Data(:,2);
volumePapainGAG = GAG_Data(:,3);
measuredGAG = GAG_Data(:,4);<br>

***When programming, the variables don't have to be the same as ours, but they should be descriptive enough so you know that you're working with.

Line by line description of code:

Note: The colon (:) simply means that we are selecting a column

  1. WWGAG is the variable representing first column of GAG_Data
  2. DWGAG is the variable representing the 2nd column of GAG_Data
  3. volumePapainGAG is the variable representing the 3rd column of GAG_Data
  4. measuredGAG is the variable representing the 4th column of GAG_Data

Step 4: Performing Calculations

Now that you've assigned your variables. We will perform some calculations on the first sheet of data. You'll notice that calculations identical to this happen on the other sheets as well - just with different variables/values. The calculations are listed below:

Note: The calculations for DNA differ

Total GAG

totalGAG = (measuredGAG.*volumePapainGAG).*(1000/8);

GAG over Dry Weight

GAGODW = totalGAG./(DWGAG*1000);

Percent GAG over Dry Weight

pGAGODW = GAGODW.*100;

Percent GAG over Wet Weight

pGAGOWW = 100*totalGAG./(WWGAG.*1000);

Step 5: Putting Calculated Values in a Matrix

Now that we have all of our calculated values. We want to put everything we found into one matrix, for ease of accessibility. After that, we are going to assign our rows based off the original data we were working with. Each group of data was extracted from a certain condition. The conditions are:

Here is the code that puts the calculated values in a matrix

GAGMatrix = [totalGAG,GAGODW,pGAGODW,pGAGOWW]; 

Description:

We first assign a new matrix variable, GAGMatrix. Next, we put our calculated variables in the order we would want to see them in our new matrix. For this project we have it in the following order: total GAG, GAG over Dry Weight, Percent GAG over Dry Weight, and Percent GAG over Wet Weight.

Step 6: Further Separating the Values in the New Matrix

As mentioned in the last step, we want to split our data up further by assigning a group of values to a specific time the experiment was conducted. We also got this data from the spreadsheet, so just assume it's already given.

This code might seem intimidating at first, but it works the same way as everything else we've done so far.

GAGControl4W = GAGMatrix(1:3,:); 
GAGShear4W = GAGMatrix(4:6,:); 
GAGGF4W = GAGMatrix(7:9,:); 
GAGShearGF4W = GAGMatrix(10:12,:);
GAGControl8WVitro = GAGMatrix(13:15,:); 
GAGShear8WVitro = GAGMatrix(16:18,:);
GAGShearGF8WVitro = GAGMatrix(19:21,:); 
GAGControl8WVivo = GAGMatrix(22:24,:); 
GAGShear8WVivo = GAGMatrix(25:27,:); 
GAGShearGF8WVivo = GAGMatrix(28:30,:); 

Line-by-line Description:

  1. GAGControl4W is assigned to the first 3 rows of GAGMatrix
  2. GAGShear4W is assigned to the rows 4-6 of GAGMatrix
  3. GAGGF4W is assigned to the rows 7-9 of GAGMatrix

  4. GAGShearGF4W is assigned to the rows 10-12 of GAGMatrix

Step 7: Statistics - Mean, Standard Deviation, Min, and Max

Now, we can finally work with something other than variable declaration! We are now going to calculate the mean, standard deviation, maximum, and minimum of our calculated values.

Since there's so much data to work with, we are going to use a while loop to speed up the process. Here's a general overview of how the while loop works:

  • Creates another matrix for the calculated statistical data. (Ex. the standard deviation of the total GAG would be in column 1)
  • During each iteration of the loop, a row from the selected column of data is calculated. Each iteration will perform the calculation one row at a time. Since there are 28 rows, the loop will run while "i", the number of iterations is less than or equal to 28.
i = 1; 
a = 1; 
GAG_stats = zeros(10,8);
while i <= 28
	GAG_stats(a,1) = std(GAGMatrix(i:i+2,1)); 
	GAG_stats(a,2) = mean(GAGMatrix(i:i+2,1)); 
	GAG_stats(a,3) = max(GAGMatrix(i:i+2,1)); 
	GAG_stats(a,4) = min(GAGMatrix(i:i+2,1)); 

	GAG_stats(a,5) = std(GAGMatrix(i:i+2,2)); 
	GAG_stats(a,6) = mean(GAGMatrix(i:i+2,2)); 
	GAG_stats(a,7) = max(GAGMatrix(i:i+2,2)); 
	GAG_stats(a,8) = min(GAGMatrix(i:i+2,2)); 
	a = a+1; 
	i = i+3; 
end 

Line-by-line Description

  1. i is the variable for our iteration number. Before we start, we want to set the iteration number to 1.
  2. a represents the row numbers of the new matrix we are trying to create. We want to update the first column and keep moving to the right
  3. Creates a 10 x 8 matrix of zeros. We use the zeros as a placeholder value so that it's much easier to update it with the calculated values we get in the while loop
  4. Begins the while loop. Essentially states: the operations within the loop will keep running while the number of iterations is less than or equal to 28
  5. Assigns row "a" of column 1 to the standard deviation of i:i+2 rows from the Total GAG. Example: during the first loop, rows 1-3 of column 1 would be calculated
  6. Assigns row "a" of column 2 as the mean of i:i+2 rows from the Total GAG
  7. Assigns row "a" of column 3 as the maximum of i:i+2 rows from the Total GAG
  8. Assigns row "a" of column 4 as the minimum of i:i+2 rows from the Total GAG

Lines 9-12 are programmed in the same fashion, but instead calculate the mean, standard deviation, minimum, and maximum of the GAG over Dry Weight data.

13. This line ensures that the "a" values update with each iteration. "a" needs to increase by one each time to keep going through each row

14. "i" needs to increase by 3 because it is calculating a group of data.

Step 8: Boxplots

Now that we have statistical data to work with. We can begin to visualize our data! Visualization is especially helpful in making comparisons and connections between different groups. One method that is especially helpful is the use of boxplots. Boxplots allow us to visualize the max, min, median, 25 percentile, and 75 percentile.

By using the boxplot, we can make comparisons between specific times experimental data was collected to see if any correlations exist. For the following piece of code we are plotting the percent GAG over Dry Weight.

figure(3)

boxplot([GAGControl4W(:,3),GAGShear4W(:,3),GAGGF4W(:,3),GAGShearGF4W(:,3),GAGControl8WVitro(:,3),...
    GAGShear8WVitro(:,3),GAGShearGF8WVitro(:,3),GAGControl8WVivo(:,3),GAGShear8WVivo(:,3),GAGShearGF8WVivo(:,3)],...
    'Labels',{'Control 4W','Shear 4W','GF 4W','Shear+GF 4W','Control 8W Vitro','Shear 8W Vitro','Shear+GF 8w Vitro','Control 8W Vivo','Shear 8W Vivo',...
    'Shear+GF 8W Vivo'}) 
ylabel('%GAG/Dry Weight','fontweight','bold') 
xlabel('Independent Variables','fontweight','bold')
title('Boxplot for %GAG/DW of Different Independent Variables')

Line-by-line Description

  1. Creates a new figure window.
  2. Creates a boxplot with all the data of percent GAG over dry weight. All of the variables we selected just need to be separated by commas. Since this line of code is so long we used ellipses "..." to show we are continuing the rest of the line right below.
  3. The code after the ellipses will label each set of data in the box plot. The names are assigned to the variable in the same order they appear.
  4. To make the graph readable we labeled the Y-Axis "GAG/Dry Weight" and made it bold to increase readability - this is not required
  5. Labeled the X-Axis "Independent Variables" and also made it bold
  6. Titled the graph "Boxplot for %GAG/Dry Weight

We will repeat the steps mentioned above to create a box plot for percent GAG over Wet weight. The only thing that differs, aside from the variables selected.

figure(4)

boxplot([GAGControl4W(:,4),GAGShear4W(:,4),GAGGF4W(:,4),GAGShearGF4W(:,4),GAGControl8WVitro(:,4),...
    GAGShear8WVitro(:,4),GAGShearGF8WVitro(:,4),GAGControl8WVivo(:,4),GAGShear8WVivo(:,4),GAGShearGF8WVivo(:,4)],...
    'Labels',{'Control 4W','Shear 4W','GF 4W','Shear+GF 4W','Control 8W Vitro','Shear 8W Vitro','Shear+GF 8w Vitro','Control 8W Vivo','Shear 8W Vivo',...
    'Shear+GF 8W Vivo'}) 
ylabel('%GAG/Wet Weight','fontweight','bold') 
xlabel('Independent Variables','fontweight','bold')
title('Boxplot for %GAG/WW of Different Independent Variables')

Step 9: Hypothesis Testing & Conclusions

At this point in the project, we have calculated statistical data and lots of plots. Now, we have to understand what it all means in the context of this project. To accomplish this, we will use the ANOVA (Analysis of Means) statistical test, which is ideal for this specific project because it allows us to analyze the equality between multiple means and the data we have been working with is already normalized.

Performing ANOVA is fairly simple in MATLAB; however, we can't start until we have created hypotheses! We need to create a null and alternative hypothesis. For this project, we have selected the following hypothesis:

  • Null hypothesis: The groups have the same concentration of GAG.
  • Alternative hypothesis: The groups have different concentrations of GAG

To begin the ANOVA test, we need to create a matrix with the variables that we intend to test. Our matrix will look like this in Matlab:

A = [GAGControl4W, GAGShear4W, GAGGF4W, GAGShearGF4W];

The easy part is running the test itself. It only takes 1 line:

[p, table, stats] = anova1(A);

Now we want to print our results with our hypothesis in mind. To do this, create an if statement based off the p-values:

if p > 0.05 % Not enough evidence to reject null hypothesis 
	fprintf('p = %f.Because p is greater than 0.05, there insufficient evidence to reject null hypothesis.\n',p);
elseif p <= 0.05 %Sufficient evidence to reject null hypothesis 
	fprintf('p = %f.\n Because p is less than 0.05, there is sufficient evidence to reject null hypothesis.\n',p); 
end

If the p-value is greater than 0.05, there is not enough evidence to reject the null hypothesis.

If the p-value is greater than or equal to 0.05, there is sufficient evidence to reject the null hypothesis

Step 10: Conclusions

In order to create more conclusions about the data and learn more about it. We will run through some comparisons. These will be in the format of an if-else statement. Since we have multiple options we would like to run through, we specifically use elseif statements.

Comparison of Collagen between groups @ 4 weeks

if (GAGControl4W_mean <= GAGShear4W_mean && GAGControl4W_mean < GAGGF4W_mean && GAGControl4W_mean < GAGShearGF4W)<br>	fprintf('\n\nAt 4 weeks, Control group has a smaller mean concentration of GAG than Shear Group and Shear+GF Group. \nTherefore, shear and growth factors are not effective in reducing GAG.') 
elseif (GAGControl4W_mean >= GAGShearGF4W_mean) 
	fprintf('\n\nAt 4 weeks, Control group has a higher mean concentration of GAG than the combination Shear+ GF group.\nTherefore, the combination of Shear and Growth Factors were effective in reducing GAG in the construct.')
elseif (GAGControl4W_mean >= GAGShear4W_mean) 
	fprintf('\n\nAt weeks, Control group has a higher mean concentration of GAG than the Shear group.\nTherefore, Shear was effective in reducing GAG in the construct.') 
elseif(GAGControl4W_mean >= GAGGF4W_mean) 
	fprintf('\n\nAt 4 weeks, Control group has a higher mean concentration of GAG than the Growth Factor group.\nTherefore, Growth Factors were effective in reducing GAG in the construct.') 
else 
	fprintf('The comparison of GAG between the groups at 4 weeks was inconclusive') 
end

Comparison of Collagen between groups @ 8 weeks in vitro:

%The comparison of Collagen between the groups at 8 weeks in vitro

if (GAGControl8WVitro_mean <= GAGShear8WVitro_mean && GAGControl8WVitro_mean < GAGShearGF8WVitro) 
	fprintf('\n\nAt 8 weeks, Control group in vitro has a smaller mean concentration of GAG than Shear Group and Shear+GF Group in vitro. \nTherefore, shear and growth factors are not effective in reducing GAG.') 
elseif (GAGControl8WVitro_mean >= GAGShearGF8WVitro_mean) 
	fprintf('\n\nAt 8 weeks, Control group in vitro has a higher mean concentration of GAG than the combination Shear+ GF group in vitro.\nTherefore, the combination of Shear and Growth Factors were effective in reducing GAG in the construct.') 
elseif (GAGControl8WVitro_mean >= GAGShear8WVitro_mean) fprintf('\n\nAt 8 weeks, Control group in vitro has a higher in vitro mean concentration of GAG than the Shear group in vitro.\nTherefore, Shear was effective in reducing GAG in the construct.') 
else 
	fprintf('The comparison of GAG between the groups at 8 weeks in vitro was inconclusive')
end

Comparison of Collagen between the groups at 8 weeks in vivo

if (GAGControl8WVivo_mean <= GAGShear8WVivo_mean && GAGControl8WVivo_mean < GAGShearGF8WVivo) 
	fprintf('\n\nAt 8 weeks, Control group in vivo has a smaller mean concentration of GAG than Shear Group and Shear+GF Group in vitro. \nTherefore, shear and growth factors are not effective in reducing GAG.') 
elseif (GAGControl8WVivo_mean >= GAGShearGF8WVivo_mean) 
	fprintf('\n\nAt 8 weeks, Control group in vivo has a higher mean concentration of GAG than the combination Shear+ GF group in vitro.\nTherefore, the combination of Shear and Growth Factors were effective in reducing GAG in the construct.') 
elseif (GAGControl8WVivo_mean >= GAGShear8WVivo_mean) 
	fprintf('\n\nAt 8 weeks, Control group in vivo has a higher mean concentration of GAG than the Shear group in vitro.\nTherefore, Shear was effective in reducing GAG in the construct.')
else 
	fprintf('The comparison of GAG between the groups at 8 weeks in vivo was inconclusive') 
en

Step 11: Repeat!

Now that we have performed all the necessary calculations for our first sheet, GAG. Repeat the steps for the remaining two sheets. Here is the completed code for reference.