I am working on Python code that counts the number of unique commenters in a chat, using each commenter's unique id, which is stored in the "_id" field nested inside the "commenter" field. The JSON looks like this:
JSON:
{
"_id":"123adfvssw",
"content_type":"video",
"content_id":"12345",
"commenter":{
"display_name":"student1",
"name":"student1",
"type":"user"
},
"source":"chat",
"state":"published",
"message":{
"body":"Hi",
"fragments":[
{
"text":"Hi"
}
],
"is_action":false
},
"more_replies":false
}
{
"_id":"123adfvssw",
"content_type":"video",
"content_id":"12345",
"commenter":{
"display_name":"student2",
"name":"student2",
"type":"user"
},
"source":"chat",
"state":"published",
"message":{
"body":"Hey!",
"fragments":[
{
"text":"Hey"
}
],
"is_action":false
},
"more_replies":false
}
{
"_id":"123adfvssw",
"content_type":"video",
"content_id":"12345",
"commenter":{
"display_name":"student1",
"name":"student1",
"type":"user"
},
"source":"chat",
"state":"published",
"message":{
"body":"How are you?",
"fragments":[
{
"text":"How are you?"
}
],
"is_action":false
},
"more_replies":false
}
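As a sketch of reading the nested field: the loops further down call json.loads once per line, which assumes each record sits on a single line of the file (the pretty-printed sample above would not parse that way, because a line such as "{" is not valid JSON by itself). The record below is a hypothetical one-line version that includes a commenter "_id", which the sample above does not show; the id value is made up.

```python
import json

# Hypothetical one-line record; the "_id" inside "commenter" is an assumed field
line = '{"_id": "123adfvssw", "commenter": {"_id": "id-1", "display_name": "student1", "name": "student1", "type": "user"}, "source": "chat"}'

record = json.loads(line)  # dict parsed from one line of JSON

# The top-level "_id" is a different field from the one nested in "commenter"
message_id = record["_id"]
commenter_id = record["commenter"]["_id"]

# .get() returns None instead of raising KeyError when a key is missing
safe_id = record.get("commenter", {}).get("_id")

print(message_id)    # 123adfvssw
print(commenter_id)  # id-1
print(safe_id)       # id-1
```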
In all, the thread received 3 comments, but student1 commented more than once, so there are only two unique commenters. My question is: how do I make sure I count only the unique commenters, using the _id field in the JSON? I can count all the commenter fields in the file, but not the unique commenters. My initial code counts every commenter field and prints 3, but the correct answer is 2.
I am now trying to put each commenter's _id into a list so that I can count the unique ids. However, I am having trouble storing multiple values in the list inside a loop. Please help if you can.
What I have tried:
Code that prints the number of commenter fields:
import json
from collections import Counter

files = "/chatinfo.txt"
with open(files) as f:
    commenters = 0
    for line in f:
        jsondata = json.loads(line)  # one JSON record per line
        # count every record that has a "commenter" field
        if "commenter" in jsondata:
            commenters += 1
print(commenters)
Output
3
An attempt at storing each commenter's _id value in a list, so that I can compare them and count only the unique ids:
import json

files = "/chatinfo.txt"
with open(files) as f:
    num_with_field = 0
    for line in f:
        jsondata = json.loads(line)
        dictjson = json.dumps(jsondata)
        if "commenter" in jsondata:
            commenterid = []
            commenterid.append(jsondata["commenter"]["_id"])
            print(commenterid)
Output:
['193984934']
['157255102']
['100365638']
However, when I print the list after the loop has finished, I get ['100365638'] instead of all three values:
print(commenterid)
Output:
['100365638']
Only one of the three values seems to have been stored in the list commenterid.
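The symptom matches what happens when the list is created inside the loop: `commenterid = []` runs on every iteration, so each pass starts from an empty list and only the last id survives. A minimal in-memory sketch, using made-up ids in place of the file's contents:

```python
ids_in_file = ["id-1", "id-2", "id-1"]  # hypothetical stand-in for the file's ids

# Buggy pattern: the list is recreated on every iteration
for commenter in ids_in_file:
    inside = []               # reset to empty each time round
    inside.append(commenter)
print(inside)   # ['id-1'] (only the last value is left)

# Fixed pattern: create the list once, before the loop
outside = []
for commenter in ids_in_file:
    outside.append(commenter)
print(outside)  # ['id-1', 'id-2', 'id-1']
```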
Problem 1:
Can anyone help me fill my list with the three values I need, using the loop? The list should contain ['193984934', '157255102', '100365638'].
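Applied to the file-reading loop, a sketch under the same assumptions as the question's code (one JSON record per line, commenter id stored under "commenter" then "_id"): the list is created before the loop and each id is appended to it, so the result is one list of strings rather than several one-element lists. `collect_commenter_ids` is a hypothetical helper name, and the usage part writes made-up records to a temporary file so the sketch runs on its own.

```python
import json
import os
import tempfile


def collect_commenter_ids(path):
    """Return every commenter _id in the file, in order, duplicates included."""
    commenter_ids = []  # created once, before the loop
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            commenter = record.get("commenter")
            # only records whose commenter has an _id contribute a value
            if commenter and "_id" in commenter:
                commenter_ids.append(commenter["_id"])
    return commenter_ids


# Usage with a throwaway file of hypothetical records (one JSON object per line)
sample_lines = [
    '{"commenter": {"_id": "id-1", "name": "student1"}}',
    '{"commenter": {"_id": "id-2", "name": "student2"}}',
    '{"commenter": {"_id": "id-1", "name": "student1"}}',
]
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(sample_lines) + "\n")

ids = collect_commenter_ids(tmp.name)
os.remove(tmp.name)
print(ids)  # ['id-1', 'id-2', 'id-1']
```

With a real file, `collect_commenter_ids("/chatinfo.txt")` would be called instead of using the temporary file.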
Problem 2:
In addition, how can I count the unique ids in that list? So far I have only seen how to count how often each id appears:
Counter(commenterid).values()  # counts each element's frequency
Do you think
len(set(commenterid))
would work? Also, if there is a better way to do this than storing the values in a list, I would love to see it. Thanks in advance.
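For reference, `len(set(...))` does count distinct values: a set keeps one copy of each element, so its length is the number of unique ids. `len(Counter(...))` gives the same number, since a Counter has one key per distinct element, and it also keeps the per-id counts. If the list itself is not needed, ids can go straight into a set while reading. A sketch, again assuming one JSON record per line and a "commenter" then "_id" field; `count_unique_commenters` is a hypothetical name and the ids are made up.

```python
import json
from collections import Counter

commenterid = ["id-1", "id-2", "id-1"]  # hypothetical ids, duplicates included

print(len(set(commenterid)))  # 2: a set keeps one copy of each id
counts = Counter(commenterid)
print(len(counts))            # 2: one Counter key per distinct id
print(counts["id-1"])         # 2: how many comments id-1 posted


def count_unique_commenters(lines):
    """Count distinct commenter _ids without building a list first."""
    seen = set()
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        commenter = json.loads(line).get("commenter") or {}
        if "_id" in commenter:
            seen.add(commenter["_id"])
    return len(seen)


sample = [
    '{"commenter": {"_id": "id-1"}}',
    '{"commenter": {"_id": "id-2"}}',
    '{"commenter": {"_id": "id-1"}}',
]
print(count_unique_commenters(sample))  # 2
```

Because the function accepts any iterable of lines, an open file object (for example one returned by `open("/chatinfo.txt")`) can be passed in directly, since iterating a file yields its lines.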