Introduction: Gephi With Twitter Data
Gephi lets users study social media networks. There are many reasons to work with this data. It can be used personally, to study how your friends are connected to each other or who follows whose tweets the most. It can also be used in a business setting: Gephi can help show where to post content to get the biggest user response. This tutorial gives a very basic look at how to use Gephi with Twitter data.
The only requirements are a computer running Windows 8 or 10 and an internet connection. A Twitter account is also needed, but you can create a new one. Because these instructions focus on Twitter hashtags rather than on who follows whom, using a new account will not affect how much data you can acquire.
This process should take around 20 minutes.
Users need to have a basic understanding of how to use Windows for these instructions.
Step 1: Download Gephi
Click Download Gephi for Windows, or use the other links below this button if you have a Mac or Linux computer. The top picture shows how to do this.
Continue through the installation process. When you reach “Select additional tasks”, make sure to check the boxes for .gephi, .gexf, and .gdf files. The bottom picture shows which boxes to check.
Step 2: Download Data From Twitter
Note: The basic version of NodeXL can only retrieve Twitter data by searching for hashtags. To search by user on Twitter, you will have to pay for NodeXL Pro.
1. Download NodeXL.
Download NodeXL from https://nodexl.codeplex.com/.
Go through the installation process. Figure 1 shows where to find this link.
2. Open NodeXL
If you have Windows 10, click the Windows icon and scroll through your apps to the letter N. Look for NodeXL Excel Template. Figure 2 shows where to find NodeXL.
Alternatively, search for NodeXL from your taskbar. Click NodeXL Excel Template to open it.
3. Download the data
Click the NodeXL tab at the top of the Excel toolbar.
- On the far left of this tab, click Import.
- Choose the option to import from Twitter’s search network.
- Enter the hashtag you wish to search for in the box under “Search for tweets”.
- Under “What to import”, use “basic network” rather than “basic network plus friends” while you are experimenting.
- Keep the limit at 2000 tweets.
- Unless you need them, do not expand URLs in tweets. These recommendations are just to make the process faster.
Figure 3 shows the correct options to select.
You will also have to authorize your Twitter account the first time you do this. Select “I have a twitter account, but I have not yet authorized NodeXL to use my account”.
When you click “OK”, you will be taken to twitter.com, where you can log in or create a new account. After you log in to Twitter, authorize NodeXL to use your account. Twitter will then give you a PIN, as shown in figure 4.
After you get this PIN, go back to NodeXL and enter it in the box labeled “PIN from the Twitter authorization Web page”.
You have now authorized your account to be used by NodeXL to gather data.
The download may take some time, because Twitter caps downloads at 15 tweets per minute.
After the data is downloaded, NodeXL will show you all the usernames that have either tweeted with this hashtag or been mentioned in those tweets.
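To make the shape of this data concrete, here is a minimal Python sketch of how a list of tweets could be turned into usernames (nodes) and mentions (edges). This is only an illustration, not NodeXL's actual implementation; the sample tweets, the `build_mention_network` function, and the mention pattern are all assumptions made for the example.

```python
import re

# Hypothetical sample tweets as (author, text) pairs, invented for illustration.
SAMPLE_TWEETS = [
    ("alice", "Great thread on #NetNeutrality by @bob and @carol"),
    ("bob", "Thanks @alice! #NetNeutrality"),
    ("dave", "Reading about #NetNeutrality today"),
]

# Simplified pattern: an @ sign followed by letters, digits, or underscores.
MENTION_PATTERN = re.compile(r"@(\w+)")


def build_mention_network(tweets):
    """Return (nodes, edges): every author or mentioned user becomes a node,
    and each mention becomes a directed edge from author to mentioned user."""
    nodes = set()
    edges = []
    for author, text in tweets:
        author = author.lower()
        nodes.add(author)
        for mentioned in MENTION_PATTERN.findall(text):
            mentioned = mentioned.lower()
            nodes.add(mentioned)
            edges.append((author, mentioned))
    return nodes, edges


nodes, edges = build_mention_network(SAMPLE_TWEETS)
print(sorted(nodes))  # users who tweeted or were mentioned
print(edges)          # (author, mentioned user) pairs
```

Note that "dave" still appears as a node even though he mentioned nobody, which is why some points in the final graph have no connections.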
4. Export the Data
To use this data in Gephi, you need to export it in a format that Gephi can read. To do this, click Export on the NodeXL tab; it is right under Import. Next, click “To GraphML File”. This is the format in which Gephi will read the data.
Figure 5 shows which export option to choose.
This will transform the data into a format Gephi can use. Save the file wherever you like, under any name you like, but remember where it is located.
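If you are curious what is inside the exported file, GraphML is an XML format, so it can be inspected with a few lines of Python. The sketch below counts the nodes and edges in a tiny hand-written sample; the sample content and the `summarize_graphml` function are assumptions for illustration, and a real NodeXL export will contain many more attributes.

```python
import xml.etree.ElementTree as ET

# XML namespace used by GraphML files.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Tiny hand-written stand-in for an exported file (hypothetical content).
# For a real file on disk, ET.parse(path).getroot() would replace ET.fromstring.
SAMPLE_GRAPHML = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <graph id="G" edgedefault="directed">
    <node id="alice"/>
    <node id="bob"/>
    <node id="carol"/>
    <edge source="alice" target="bob"/>
    <edge source="bob" target="carol"/>
  </graph>
</graphml>"""


def summarize_graphml(xml_text):
    """Return the edge direction plus node and edge counts of the first graph."""
    root = ET.fromstring(xml_text)
    graph = root.find("g:graph", GRAPHML_NS)
    return {
        "direction": graph.get("edgedefault"),
        "nodes": len(graph.findall("g:node", GRAPHML_NS)),
        "edges": len(graph.findall("g:edge", GRAPHML_NS)),
    }


summary = summarize_graphml(SAMPLE_GRAPHML)
print(summary)
```

A quick check like this can confirm that the export actually contains data before you open it in Gephi.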
Step 3: Open Gephi
Search for Gephi from the Windows Start button and open it. To load the data, open the GraphML file from wherever you saved it.
Step 4: Graph Definitions
Before going any further, you are going to need some basic graph theory terminology.
Node – A node in this case represents a user on Twitter or Facebook; more generally, in graph theory a node is a point in a graph. A node is circled in orange in figure 1.
Edge – An edge connects two nodes together. It is shown by the blue box in figure 1.
Directed graph – A graph where every edge points from one node to another. Twitter is an example: a user can follow certain users, but those users do not have to follow back. This makes who follows whom a directed graph. To the left is an example of a directed graph.
Undirected graph – A graph where each edge points both ways between its two nodes. An example is Facebook friends: everyone is a friend of whoever is a friend of theirs. Figure 1 is an example of an undirected graph.
Degree – How many nodes the current node connects to. The degree of Node A in figure 1 is 3. The degree of Node C in figure 1 is 2.
Indegree – How many directed edges point towards a node. The indegree of Node 2 in figure 2 is 2, and the indegree of Node 10 is 3.
Outdegree – How many edges point away from a node. The outdegree of Node 4 is 4, and the outdegree of Node 2 is 1.
Eccentricity – The greatest distance from a node to any other node it can reach. This will tell you how centrally located each point is.
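These definitions can be tried out on a small graph in Python. The sketch below uses only the standard library; the five-node edge list is invented for illustration and is not the graph from the figures. It assumes the graph is connected, and it ignores edge direction when computing degree and eccentricity, which is a simplification rather than a description of how Gephi calculates them.

```python
from collections import deque

# Hypothetical directed edges (source, target), invented for illustration.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("C", "D"), ("D", "E")]


def degree_measures(edges):
    """Return (degree, indegree, outdegree, neighbors) dictionaries keyed by node."""
    nodes = {node for edge in edges for node in edge}
    indegree = {node: 0 for node in nodes}
    outdegree = {node: 0 for node in nodes}
    neighbors = {node: set() for node in nodes}
    for source, target in edges:
        outdegree[source] += 1
        indegree[target] += 1
        # Ignore direction when recording which nodes touch each other.
        neighbors[source].add(target)
        neighbors[target].add(source)
    degree = {node: len(adjacent) for node, adjacent in neighbors.items()}
    return degree, indegree, outdegree, neighbors


def eccentricity(neighbors, start):
    """Greatest shortest-path distance from start, found by breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for adjacent in neighbors[current]:
            if adjacent not in distance:
                distance[adjacent] = distance[current] + 1
                queue.append(adjacent)
    return max(distance.values())


degree, indegree, outdegree, neighbors = degree_measures(EDGES)
for node in sorted(degree):
    # Columns: node, degree, indegree, outdegree, eccentricity
    print(node, degree[node], indegree[node], outdegree[node],
          eccentricity(neighbors, node))
```

In this sample, node A has three outgoing edges and no incoming ones, so its outdegree is 3 and its indegree is 0, while node E is only reached through D.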
Step 5: Edit the Data
First, go to the bottom left and click under Layout, where it should say “choose a layout”. This is shown in figure 1.
Use Force Atlas2. This will arrange your data based on the number of connections between the nodes.
After running Force Atlas2, run Expansion to push the nodes as far apart as you would like. This will make the data easier to read. The data should look like figure 2.
Now you can add colors to make your data easier to read.
On the Nodes tab, under the paint tab, “Choose an attribute” lets you edit the color of the nodes.
Two attributes that are helpful for coloring the nodes are degree and eccentricity. The first option is coloring by degree: the higher the degree, the darker the color will be. If the data being used is from Twitter, the same can be done with indegree and outdegree.
You can change the color by double-clicking the bar highlighted in figure 3. This lets you change the color to any one you want. You can also drag the arrows on the color bar to change how dark the colors are.
You can use the same method with eccentricity instead. The tab will show how the points compare in terms of eccentricity. Many other options can also be applied.
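The idea behind coloring by an attribute can be sketched in Python as a blend between a light color and a dark color. This is an illustration of the general idea only, not Gephi's own code; the `degree_to_color` function and the two default colors are assumptions chosen for the example.

```python
def degree_to_color(degree, min_degree, max_degree,
                    light=(222, 235, 247), dark=(8, 48, 107)):
    """Blend from a light RGB color to a dark one as degree rises; return a hex string."""
    if max_degree == min_degree:
        fraction = 0.0  # every node has the same degree, so use the light color
    else:
        fraction = (degree - min_degree) / (max_degree - min_degree)
    channels = [round(low + fraction * (high - low))
                for low, high in zip(light, dark)]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# The lowest degree gets the light color and the highest gets the dark color.
print(degree_to_color(1, 1, 10))
print(degree_to_color(10, 1, 10))
```

Passing eccentricity values instead of degrees works the same way, since the function only needs a value and its minimum and maximum.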
Another very useful feature is changing the size of each node. This can be done by clicking the button with three circles. The options here are the same as before, but you choose the size of the circles instead of the color of the nodes. The location of this button is shown in figure 4.
Use size for the attribute you did not use for color: if you colored by degree, size by eccentricity, and vice versa. Keep the min size at 1, but change the max size to 50 so that a difference can be seen.
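To see why a max size of 50 makes differences visible, here is a small Python sketch of scaling an attribute value into the 1 to 50 size range. It assumes a simple straight-line scaling, which may differ from what Gephi does internally; the `scale_size` function and the sample values are hypothetical.

```python
def scale_size(value, min_value, max_value, min_size=1.0, max_size=50.0):
    """Map a value between min_value and max_value onto the node size range."""
    if max_value == min_value:
        return min_size  # no spread in the data, so every node gets the min size
    fraction = (value - min_value) / (max_value - min_value)
    return min_size + fraction * (max_size - min_size)


# Hypothetical attribute values for three nodes.
values = {"low": 0, "middle": 5, "high": 10}
sizes = {name: scale_size(value, 0, 10) for name, value in values.items()}
print(sizes)
```

With a small max size the three nodes would look nearly identical, while the range from 1 to 50 spreads them far enough apart to tell at a glance.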
Two other features that can be edited are the A and Tt buttons. With A you can change the color of the text, which is the username of the node, and with Tt you can change the size of the names. They are fine the way they are and can be left unchanged. This is shown in figure 5.
Step 6: Running Analytics
On the right-hand side, under Statistics, you can run any of the tests to see the resulting stats in the side panel. There are many different options, and it is easy to find out what each one means by searching for the term on the web.
Step 7: Analyze the Data
First, the closer together two nodes are, the more related they are. They are more likely to share friends or followers. Looking at the eccentricity, you can see which points are in the center of the data. A tweet or post is more likely to spread if you direct it at one of these users. Finally, looking at the highest degree will show who is the most connected, which could also be useful in seeing how data spreads. This would be the area to target to spread the data the furthest.
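Finding the most connected users can also be done outside Gephi. The Python sketch below counts how many edges touch each user in a small, made-up edge list and returns the top entries; the `top_connected` function and the usernames are assumptions for illustration.

```python
from collections import Counter

# Hypothetical (author, mentioned user) edges, invented for illustration.
EDGES = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("a", "b")]


def top_connected(edges, count=3):
    """Return the `count` users with the most edges, counting both directions."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common(count)


print(top_connected(EDGES, count=1))
```

In this sample, "hub" is touched by three edges, more than any other user, so it would be the first candidate to look at when deciding whom to target.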
This section gives an example of how to use this data. For this example, the data comes from searching for #NetNeutrality on Twitter.
After running Gephi and following the instructions above, the resulting graph can be seen in figure 1.
Looking at this data, the first thing you should notice is the two-ring circle in the bottom right. This can be seen in figure 2.
From this we can see how all the points are related to the center dot, so we know the green dot is definitely a leader in the conversation about Net Neutrality. This is apparent from how big the node is and from its being central to all of the other nodes. It can also be seen that this is the only point with high eccentricity. This makes it a great target either to follow if you want to learn more about Net Neutrality, or to direct tweets at if you would like to get the attention of a large pool of people.
Looking at the rest of the graph, most clusters seem to be randomly scattered. This shows that it may be hard to find another node that connects as many people as the green dot does. However, you can also tell from the graph that the gigantic dot is hardly connected to the outside. This suggests that this user might be specific to a certain area of the world or a certain topic in this debate. In this example, the green dot happens to be the leader of the Net Neutrality movement in India. This shows that you still need to understand the context of the user. This might not be the best user to target unless you are focused on that region, and you may want to look in the other scattered clusters to see if you can find a better point to use.
Step 8: Conclusion
This data can be used in a wide variety of applications, from school projects to personal curiosity about things happening on Twitter. It can also be used in a business setting to target certain groups and to get your point or product in front of other Twitter users.