Machine Learning e-Portfolio

Unit 1: Introduction to Machine Learning

Collaborative discussion 1 - Initial post: The Fourth Industrial Revolution: What it means, how to respond - Schwab (2016)

In his article, Schwab shares his views on a fourth technological revolution and how it will change and redefine the core of our daily lives.
Although this drastic change will disrupt every industry worldwide, it could yield positive effects for businesses and the economy. However, he warns that it could also polarise the workforce. Schwab also urges governments to focus on security and to promote transparency in order to improve communication with the public. To summarise, Schwab predicts an unprecedented transformation of our society which, if we intend to manage it successfully, will require all parties to be involved.

This underlying dependency on technology can be illustrated by the consequences of disrupted IT operations in the retail industry.

Amazon Web Services is a leading online infrastructure provider that is relied upon by 2.38 million businesses worldwide and holds 50.1% of the cloud platform market. On 7 December 2021, Amazon experienced a major outage that suspended multiple services and had far-reaching repercussions. It affected colleges, causing the postponement of examinations; household appliances ceased to function; and online deliveries were cancelled. The outage had a domino effect on all of Amazon’s retailers and also affected major media players such as Disney, PlayStation Network, Slack, Netflix and Snapchat.

The scale of this disruption demonstrates how brittle our economy is and how tightly coupled it has become with digital technology, to the point that it can no longer function without it.
Because of the scale and the interwoven technical intricacies involved, preventing this sort of incident and expecting systems to run with absolute resilience seems nearly impossible. Could we expect governments and lawmakers to enforce frameworks that hold private providers accountable, in order to guarantee compensation for losses incurred in such events?

References:

The Fourth Industrial Revolution: what it means, how to respond (By Klaus Schwab)
Dead Roombas, stranded packages and delayed exams: How the AWS outage wreaked havoc across the U.S. (By Annie Palmer Published Thu, Dec 9 2021)
The AWS Ecosystem in 2024 (HG Insights market report)
AWS Outage Analysis: December 7 & 10, 2021 (By Internet Research Team | December 7, 2021)

Unit 2: Exploratory Data Analysis

Although a crucial part of the machine learning process, exploratory data analysis is used across many sectors such as healthcare, retail, finance, marketing, engineering, education, logistics... EDA enables us to extract meaningful insights and make sense of data.

The first step is setting up a Google Colab notebook by importing the tools needed for the exploratory data analysis tasks performed later in the notebook. Libraries such as Pandas, NumPy, Matplotlib, Seaborn and Missingno each provide specific functions to analyse, manipulate and visualise data. They allow us to generate various types of charts (histograms, pair plots, scatter plots) in order to understand the distribution of the data as well as the relationships between the features in the dataset, through univariate and bivariate analysis. Typical steps include:

Peek at the dataset with head() or tail()
Counting the dataset rows
Viewing the shape of the dataset (rows and columns)
Outputting the data type of each column
Assessing how much data is missing from the dataset
Viewing the mean, median, count, max
Visualising the skewness or kurtosis
Finding out how many unique values each column contains
Checking whether the type is consistent per column
Removing columns irrelevant to the analysis
Reshaping the dataset
Detecting outliers in the data
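
The checklist above can be sketched with pandas roughly as follows. This is a minimal illustration rather than the content of the notebook: the tiny inline DataFrame and its column names (mpg, horsepower, origin, car_name) are made up and only echo the auto-mpg artefact, and the 1.5 * IQR rule is just one common convention for flagging outliers.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset.
df = pd.DataFrame({
    "mpg": [18.0, 15.0, 36.0, 24.0, 22.0, 90.0],
    "horsepower": [130.0, 165.0, np.nan, 95.0, 97.0, 88.0],
    "origin": ["usa", "usa", "japan", "japan", "europe", "europe"],
    "car_name": ["a", "b", "c", "d", "e", "f"],
})

print(df.head(3))              # peek at the first rows
n_rows, n_cols = df.shape      # number of rows and columns
print(df.dtypes)               # data type of each column
missing = df.isna().sum()      # missing values per column
print(df.describe())           # count, mean, std, min, quartiles, max
skewness = df["mpg"].skew()    # asymmetry of the mpg distribution
n_unique = df.nunique()        # distinct (non-missing) values per column

# Drop a column that is irrelevant to the analysis.
df_clean = df.drop(columns=["car_name"])

# Flag outliers with the 1.5 * IQR rule (an assumed convention).
q1, q3 = df_clean["mpg"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df_clean[(df_clean["mpg"] < q1 - 1.5 * iqr) |
                    (df_clean["mpg"] > q3 + 1.5 * iqr)]
print(outliers)
```

On real data the same calls apply unchanged once the DataFrame is loaded from a file; only the column names differ.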

Artefact: Unit2-auto-mpg.ipynb

Collaborative discussion 1 - Peers response: The Fourth Industrial Revolution: What it means, how to respond - Schwab (2016)

Gavin mentions the Knight Capital software bug and the catastrophic financial loss that this technical negligence caused. It was the result of human errors dating back to 2003, when an engineer at Knight Capital left deprecated server code (the Power Peg functionality) in the SMARS application and nobody noticed the oversight. During a refactor two years later, the tests for Power Peg were breaking, so they were deleted: nobody was using the long-deprecated option, so there seemed to be no need to check its correctness. In 2012, when a new feature flag needed to be added to SMARS, an engineer reused the deprecated flag, and no test failed during deployment because the old tests had been removed.

As a result, the exponential speed at which technology and machines are evolving raises the question of whether we can keep up with technological progress and, if we can, what measures we are willing to take.

References:
The Fourth Industrial Revolution: what it means and how to respond | World Economic Forum.
Deploy Gone Wrong: The Knight Capital Story | by Alex Ponomarev | Engineering Manager’s Journal | Medium
The Knight Capital Disaster - Speculative Branches

Ben shows the precariousness our society exposes itself to by relying exclusively on interconnected information systems. To illustrate his argument, Ben has chosen the CrowdStrike incident of July 2024, which caused chaos across various sectors and industries. He attributes it to a failed security patch update that allowed attackers to infiltrate their target network. The first critical vulnerability is that the attackers exploited outdated or not appropriately secured servers (Naseer, 2024).

Data governance and security are determining factors for the underlying and ubiquitous digital layer that allows our society to function. Proactively addressing AI-based security issues is a key factor for an industrial environment with smart factories, autonomous systems, CPS, IoT, cloud computing, and big data (de Azambuja et al., 2023). The importance and scope of cyber warfare illustrate why actors need to organise proactively to protect against those threats.

Ben raises concerns about delegating cybersecurity tasks to AI and machine learning. A recent study revealed that organisations are increasing the pace of adoption of AI/ML in cybersecurity and that, overall, close to three-quarters of the firms surveyed admitted that they were testing use cases for AI/ML in cybersecurity (Kinyua and Awuah, 2021).

In handing over key responsibilities such as decision making and deployment to machine learning algorithms, Ben also points to the risk of eroding human intellectual skills, a recurrent concern associated with the rise of artificial intelligence (Ahmad et al., 2023).

References:
Ahmad, S.F. et al. (2023) ‘Impact of artificial intelligence on human loss in decision-making, laziness and safety in education’, Humanities & social sciences communications, 10(1), p. 311.
de Azambuja, A.J.G. et al. (2023) ‘Artificial intelligence-based cybersecurity in the context of Industry 4.0—A survey’, Electronics ETF [Preprint]. Available at: https://doi.org/10.3390/electronics12081920.
Kinyua, J. and Awuah, L. (2021) ‘AI/ML in security orchestration, automation and response: Future research directions’, Intelligent automation & soft computing, 28(2), pp. 527–545.
Naseer, I. (2024) ‘The crowdstrike incident: Analysis and unveiling the intricacies of modern cybersecurity breaches’, World Journal of Advanced Engineering Technology and Sciences, 13(1), pp. 728–733.

Unit 3: Correlation and Regression

Correlation and regression are statistical techniques used to study, quantify and describe the relationship between variables. Correlation quantifies the strength and direction of a relationship, whereas regression expresses the relationship between one dependent (target) variable and one or more independent variables, called predictors.

Quantify and visualise the strength and direction (positive or negative) of the relationship between variables using covariance and the Pearson correlation coefficient.

Understand linear regression in the context of machine learning by building a predictive model in Python using the slope-intercept form of a line (y = mx + b).

Build a polynomial model and plot the predicted outcome.
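
The three objectives above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the code of the artefact notebooks: the data points are invented, the helper names covariance and pearson_r are hypothetical, and plotting is left out to keep the example self-contained.

```python
import numpy as np

# Hypothetical paired observations that are roughly linear.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def covariance(a, b):
    """Sample covariance: sum((a - mean_a) * (b - mean_b)) / (n - 1)."""
    return np.sum((a - a.mean()) * (b - b.mean())) / (len(a) - 1)

def pearson_r(a, b):
    """Pearson r: covariance scaled by both sample standard deviations."""
    return covariance(a, b) / (a.std(ddof=1) * b.std(ddof=1))

cov_xy = covariance(x, y)
r_xy = pearson_r(x, y)
print(f"covariance = {cov_xy:.3f}, Pearson r = {r_xy:.3f}")

# Simple linear regression in slope-intercept form, y = m*x + b.
m = cov_xy / x.var(ddof=1)       # slope
b = y.mean() - m * x.mean()      # y-intercept
print(f"y = {m:.2f}x + {b:.2f}")

# Polynomial regression: least-squares fit of a degree-2 polynomial to
# hypothetical noise-free data generated from y = 2x^2 - 3x + 1.
x_poly = np.linspace(-3, 3, 13)
y_poly = 2 * x_poly**2 - 3 * x_poly + 1
coeffs = np.polyfit(x_poly, y_poly, deg=2)   # highest power first
poly_model = np.poly1d(coeffs)
print("polynomial coefficients:", np.round(coeffs, 3))
```

Covariance only indicates the direction of the relationship and depends on the units of the data, which is why the Pearson coefficient rescales it to the range -1 to 1.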

Artefact: Unit03 Ex1 covariance_pearson_correlation.ipynb

Artefact: Unit03 Ex2 linear_regression.ipynb

Artefact: Unit03 Ex3 multiple_linear_regression.ipynb

Artefact: Unit03 Ex4 polynomial_regression.ipynb

Collaborative discussion 1 - Summary: The Fourth Industrial Revolution: What it means, how to respond - Schwab (2016)

Many thanks for your responses, both of which are insightful. Schwab’s 2016 paper on the Fourth Industrial Revolution can be taken as a starting point for discussing the effects of modern-day technology, which is evolving at a rate that can be difficult to keep up with.

However, as we know, it is imperative that we keep pace with this rapid growth and change in order to provide a supportive and safe environment in which global business, technology and finance can flourish. A hybrid cloud architecture sounds like a very good compromise for providing security across both the private and the public cloud, since the former prioritises privacy while the latter offers easily accessible information. It is interesting to see in the Nguyen and Sondano (2023) table that proper training is the top concern for mitigating cloud security risks, followed closely by the implementation of IPS.

References:
Khan, S.U. and Ullah, N. (2016) ‘Challenges in the adoption of hybrid cloud: an exploratory study using systematic literature review’, The Journal of Engineering, 2016(5), pp. 107–118.
Nguyen, D.S. and Sondano, J. (2023) ‘Resilience and Stability in Organizations Employing Cloud Computing in the Financial Services Industry’, Journal of Computer and Communications, 11(4), pp. 103–148.
Website (2016). Available at: https://www.weforum.org/stories/2016/01/the-fourth-industrial-revolution-what-it-means-and-how-to-respond/.

Unit 4: Linear Regression with Scikit-Learn

In statistics, linear regression is a model that estimates the linear relationship between a scalar response (dependent variable) and one or more explanatory variables (regressors or independent variables). Logistic regression, by contrast, is a supervised machine learning algorithm widely used for binary classification tasks, such as identifying whether an email is spam or not and diagnosing diseases by assessing the presence or absence of specific conditions based on patient test results.

Artefact: Global_GDP.ipynb

Unit 5: Clustering

Clustering is an unsupervised machine learning technique designed to group unlabeled examples based on their similarity to each other. (If the examples are labeled, this kind of grouping is called classification.)

Jaccard Coefficient Calculations

The table shows the pathological test results for three individuals.

Name	Gender	Fever	Cough	Test-1	Test-2	Test-3	Test-4
Jack	M	Y	N	P	N	N	A
Mary	F	Y	N	P	A	P	N
Jim	M	Y	P	N	N	N	A

Calculate Jaccard coefficient for the following pairs:

(Jack, Mary)
(Jack, Jim)
(Jim, Mary)


            import numpy as np


            # Test attributes per person (Gender is excluded):
            # Jack  Y N P N N A
            # Mary  Y N P A P N
            # Jim   Y P N N N A

            # J(A, B) = |A & B| / |A | B|

            def jaccard_coef(x, y):
                """Return the Jaccard coefficient of two binary vectors."""
                intersection = np.logical_and(x, y)
                union = np.logical_or(x, y)
                similarity = intersection.sum() / float(union.sum())
                return similarity


            def binary_encode(values):
                """Encode Y and P as 1, N and A as 0."""
                mapping = {"Y": 1, "P": 1, "N": 0, "A": 0}
                return [mapping[v] for v in values]


            dataset = {
                "Jack": ["Y", "N", "P", "N", "N", "A"],
                "Mary": ["Y", "N", "P", "A", "P", "N"],
                "Jim": ["Y", "P", "N", "N", "N", "A"],
            }

            jack = binary_encode(dataset["Jack"])
            mary = binary_encode(dataset["Mary"])
            jim = binary_encode(dataset["Jim"])

            print(f'Jack: {jack}')
            print(f'Mary: {mary}')
            print(f'Jim: {jim}')

            print('Jack & Mary: ', jaccard_coef(jack, mary))
            print('Jack & Jim: ', jaccard_coef(jack, jim))
            print('Jim & Mary: ', jaccard_coef(jim, mary))


            Output:

            Jack: [1, 0, 1, 0, 0, 0]
            Mary: [1, 0, 1, 0, 1, 0]
            Jim: [1, 1, 0, 0, 0, 0]

            Jack & Mary:  0.6666666666666666
            Jack & Jim:  0.3333333333333333
            Jim & Mary:  0.25

Artefact: jaccard_coefficient.ipynb

Unit 6: Clustering with Python

Clustering groups a collection of objects in such a manner that objects in the same cluster are more similar to each other, with respect to some predefined characteristics, than to those in other clusters. Using the K-means algorithm, which refines its clusters iteratively, we experimented with clustering on the Iris dataset in order to surface different groups of flowers based on characteristics such as sepal length, sepal width, petal length and petal width. After choosing the number of centroids, we call the K-means ‘fit’ method. We use the Elbow method to determine the optimal number of clusters so that we can avoid overfitting. It relies on the sum of squared errors (SSE), which measures the total squared distance between each data point and the centroid of the cluster it belongs to.
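
The SSE and the Elbow method can be illustrated with a small NumPy sketch. It is not the notebook's code: three synthetic blobs stand in for the Iris data, the function name kmeans_sse is hypothetical, and the deterministic farthest-point initialisation is an assumption chosen to keep the run reproducible.

```python
import numpy as np

def kmeans_sse(points, k, n_iter=50):
    """Run a basic Lloyd's k-means and return (centroids, labels, sse)."""
    # Farthest-point initialisation (illustrative, deterministic choice).
    centroids = [points[0]]
    for _ in range(1, k):
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centroids], axis=0)
        centroids.append(points[int(np.argmax(d))])
    centroids = np.array(centroids, dtype=float)

    def assign(c):
        # Index of the nearest centroid for every point.
        dists = np.sum((points[:, None, :] - c[None, :, :]) ** 2, axis=2)
        return np.argmin(dists, axis=1)

    for _ in range(n_iter):
        labels = assign(centroids)
        # Move each centroid to the mean of its cluster (keep it if empty).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids

    labels = assign(centroids)
    # SSE: total squared distance from each point to its own centroid.
    sse = float(np.sum((points - centroids[labels]) ** 2))
    return centroids, labels, sse

# Synthetic data: three well-separated 2-D blobs of 30 points each.
rng = np.random.default_rng(0)
centres = np.array([[0.0, 0.0], [6.0, 0.0], [3.0, 5.0]])
points = np.vstack([c + 0.5 * rng.standard_normal((30, 2)) for c in centres])

sse_by_k = {k: kmeans_sse(points, k)[2] for k in range(1, 6)}
for k, sse in sse_by_k.items():
    print(f"k={k}: SSE={sse:.1f}")
```

Plotting SSE against k for data like this typically shows a sharp drop up to k = 3 followed by a much flatter curve; that bend is the "elbow" used to pick the number of clusters.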

Artefact: Iris_K-means.ipynb

Unit 7: Introduction to Artificial Neural Networks (ANNs)

An artificial neural network consists of connected nodes called neurons, which model the neurons in the brain. The connections between neurons serve a function similar to the brain’s synaptic connections, adjusting their strength (weight) depending on stimuli. We experimented with gradient descent on the cost (loss) function of an ANN by adjusting the learning rate, which is the size of the step taken at each iteration of the descent along the function. The cost function is crucial to the learning of a model and offers a metric to measure a model's performance by quantifying the difference between predictions and actual results. The lower the value, the better the model fits the data. When the cost function settles at its lowest value, the model has reached convergence. Applying chained partial derivatives to measure and adjust the ANN’s learning performance was fascinating, and I was finally able to apply calculus to a concrete use case.
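
To make the chained partial derivatives concrete, here is a minimal sketch of a single sigmoid neuron trained by gradient descent. The input, target, starting weights and learning rate are all made-up values, and the squared-error cost is an assumed choice; the unit's notebooks may use different ones.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical single training example and starting parameters.
x, target = 1.5, 1.0
w, b = 0.2, 0.0
learning_rate = 0.5

losses = []
for step in range(20):
    # Forward pass: weighted input, activation, squared-error cost.
    z = w * x + b
    a = sigmoid(z)
    loss = 0.5 * (a - target) ** 2
    losses.append(loss)

    # Backward pass (chain rule): dL/dw = dL/da * da/dz * dz/dw.
    dL_da = a - target
    da_dz = a * (1.0 - a)
    dL_dw = dL_da * da_dz * x
    dL_db = dL_da * da_dz * 1.0

    # Gradient descent update: step against the gradient.
    w -= learning_rate * dL_dw
    b -= learning_rate * dL_db

print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.4f}")
```

A larger learning rate takes bigger steps and lowers the cost faster here, but on less well-behaved cost surfaces it can overshoot the minimum, which is why the rate is usually tuned by experiment.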

Artefact: Ex1 simple_perceptron.ipynb

Artefact: Ex2 perceptron_AND_operator.ipynb

Artefact: Ex3 multi-layer Perceptron.ipynb

Unit 8: Training an Artificial Neural Network

Minimising the cost function using gradient descent, and adjusting the number of iterations as well as the learning rate.


        import numpy as np

        def gradient_descent(x,y):
            m_curr = b_curr = 0
            iterations = 50
            n = len(x)
            learning_rate = 0.075

            for i in range(iterations):
                y_predicted = m_curr * x + b_curr  #y = mx + b
                cost = (1/n) * sum([val**2 for val in (y-y_predicted)])
                md = -(2/n)*sum(x*(y-y_predicted))
                bd = -(2/n)*sum(y-y_predicted)
                m_curr = m_curr - learning_rate * md
                b_curr = b_curr - learning_rate * bd
                print ("m {}, b {}, cost {} iteration {}".format(m_curr,b_curr,cost, i))

        x = np.array([1,2,3,4,5])
        y = np.array([5,7,9,11,13])

        gradient_descent(x,y)


        Output
        --------
        m 4.6499999999999995, b 1.3499999999999999, cost 89.0 iteration 0
        m 1.0200000000000005, b 0.405, cost 53.734999999999985 iteration 1
        m 3.8047499999999994, b 1.2352499999999997, cost 32.557024999999975 iteration 2
        ...
        m 2.178758038970076, b 2.354664382435115, cost 0.07979893231689464 iteration 48
        m 2.174208302573649, b 2.371023607533314, cost 0.07580328391789995 iteration 49

Artefact: Gradient_descent.ipynb

Collaborative Discussion 2 - Initial post: Legal and Ethical views on ANN applications

In “The Language Machines”, Matthew Hutson undertakes a critical analysis of the technological advances of Large Language Models and the potential societal and ethical risks associated with their use.
The author discusses the outstanding language fluency of an LLM but he describes the characteristics of its process as purely mathematical. It works by observing the statistical relationships between the words and phrases it reads, but doesn’t understand their meaning (A remarkable AI can write like humans — but with no understanding of what it’s saying. By Matthew Hutson, 2021).

The lack of reasoning or understanding, which he observed in the “nonsensical answers” produced when prompting such LLMs, has raised concerns within the AI community, where the models have been described as “stochastic parrots” (Bender et al., 2021).
A related concern is inherent bias, which Dr Buolamwini named the “coded gaze”. As a student, she encountered a facial recognition software failure while working on a university project. After researching the cause of the glitch, she deduced that companies had introduced bias into their models by training their neural networks on insufficiently diverse data. As a result, the models had rarely been exposed to complexions similar to hers, and the software wasn’t able to detect her face (Buolamwini, 2023).

The use of artificial intelligence in healthcare has proven to be a logical course to take globally, and particularly for the NHS. The UK Government is investing in the National AI Strategy, funding research for the sake of progress and cost-effective budgeting. A smart stethoscope has been used to identify patients with suspected heart disease, and it proved accurate in detecting heart issues. The results so far are promising and favour its use by GPs to spot disease well in advance, saving not only time but also money spent on referrals, and shortening diagnosis time for the patient. Artificial intelligence has not only shown that it can diagnose lung cancer earlier (compared with using the Brock score), but also that it can detect early signs of other conditions and diseases (Artificial intelligence: 10 promising interventions for healthcare, 2023).

References:
A remarkable AI can write like humans — but with no understanding of what it’s saying. By Matthew Hutson (2021) in. Cham, Switzerland: Springer International Publishing.
Artificial intelligence: 10 promising interventions for healthcare (2023). NIHR Evidence. Available at: https://doi.org/10.3310/nihrevidence_59502.
Bender, E.M. et al. (2021) ‘On the dangers of stochastic parrots: Can language models be too big?’, in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. FAccT ’21: 2021 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA: ACM. Available at: https://doi.org/10.1145/3442188.3445922.
Buolamwini, J. (2023) Unmasking AI: My Mission to Protect What Is Human in a World of Machines. Random House.

Unit 9: Introduction to Convolutional Neural Networks (CNNs)

Artefact: Unit09 Ex1 Convolutional Neural Networks (CNN) - Object Recognition.ipynb

Collaborative Discussion 2 - Peers response: Legal and Ethical views on ANN applications

In his post, James highlights the opportunities and challenges posed by the use of LLMs, more specifically within the healthcare industry, where their transformative impact suggests an over-reliance on their capabilities.

Large Language Models have proven to be beneficial in many areas in this new, rapidly evolving era of machine learning. Matthew Hutson conveys a doubtful opinion in his article “The Language Machines”: there are potential risks for businesses and organisations that rely on what is still an emerging technology. Because LLMs make it easy to access information and generate text, reducing the time and money spent on researching and composing material manually, it is easy to become reliant on them even though they can be less than perfect.

Once again, the ethical risks of becoming reliant on AI chatbots such as ChatGPT are obvious when clinicians, although trained and more than capable, use GPT-4 for accessing and writing medical research papers. In the healthcare system this is beneficial, but it also has its drawbacks.

The findings of the research paper “The Global Landscape of AI Ethics” reflect the concerns of many people who nonetheless embrace the fifth revolution. It identifies five principal ethical concerns in AI: transparency, justice and fairness, non-maleficence, responsibility and privacy. The landscape is wide and, with time and prudence on our part, LLMs can work well “with us” as opposed to “for us”.

References:
Li, F., Ruijs, N. and Lu, Y. (2022) ‘Ethics & AI: A Systematic Review on Ethical Concerns and Related Strategies for Designing with AI in Healthcare’, AI, 4(1), pp. 28–53.
Website (2019). Available at: https://www.researchgate.net/publication/335579286_The_global_landscape_of_AI_ethics_guidelines.

AI is proving to be a powerful writing tool for organisations ranging from businesses to healthcare providers, and a valuable instrument for research, writing papers, composing emails and completing administrative jobs, freeing up time that can be spent more efficiently on other work. However, as with everything that is created to make life easier and more cost-effective, it comes at a cost.

The creative industry is at risk of being the worst hit: where companies use AI to produce writing (creative or content), images and artwork, authors and artists are unable to compete against the speed and efficiency of AI-generated pieces. However, generative AI tools are themselves at risk of running out of ideas, which would usually be drawn from real authors and artists, because many in the creative sector are reducing their output in response to the infringement of their work. In turn, businesses that have previously enjoyed easy access to creative ideas run the risk of struggling without human artists. So there is an ethical side to revolutionary progress which, if abused, could cause problems in the near future when humans revolt against their craft being snatched away. Of course, concern about job displacement for human workers, from manual labour to office work and the medical sector, is prevalent across the globe. However, AI is far from perfect, and Matthew Hutson encapsulates the current situation by simply stating that “What we have today is essentially a mouth without a brain”.

References:
Blogs (2023) Sogeti Labs. Available at: https://labs.sogeti.com/ (Accessed: 27 January 2025). De Cremer, D., Bianzino, N.M. and Falk, B. (2023) How Generative AI Could Disrupt Creative Work, Harvard Business Review. Available at: https://hbr.org/2023/04/how-generative-ai-could-disrupt-creative-work (Accessed: 27 January 2025). A remarkable AI can write like humans — but with no understanding of what it’s saying. By Matthew Hutson (2021) in. Cham, Switzerland: Springer International Publishing.

Unit 10: CNN Interactive Learning

Working on classifying images with a convolutional neural network and measuring its performance was probably the trickiest part of the module. Preparing the data in advance to optimise its use by the CNN went a step further than basic EDA. Each step of the dataset preparation was geared towards the specific requirements of the CNN’s image feature detection process and the operations the layers need to perform before passing the data to the classifying layers. Choosing the right number of each type of layer and tuning the model’s hyperparameters was an iterative process that required logging each setting (learning rate, epochs, activation function, and early stopping with its patience) together with the resulting training and validation scores.
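
The role of patience in early stopping can be shown without any deep-learning framework. The sketch below is a plain-Python illustration of the logic that frameworks typically wrap in an early-stopping callback; the function name early_stopping_epoch, the min_delta parameter and the logged losses are hypothetical and do not come from the assessment notebook.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best score: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    # Never triggered: training ran for every logged epoch.
    return len(val_losses) - 1, best_epoch

# Hypothetical validation losses logged once per epoch (0-indexed).
history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.49, 0.48]
stop_epoch, best_epoch = early_stopping_epoch(history, patience=3)
print(f"stopped at epoch {stop_epoch}, best epoch was {best_epoch}")
```

With a patience of 3 this run stops before reaching the later, slightly better epochs, which shows the trade-off: a small patience saves training time but can end training during a temporary plateau.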

Artefact: CNN_Summative_assessment.ipynb

Artefact: Summative Assignment Final.docx

Collaborative Discussion 2 - Summary: Legal and Ethical views on ANN applications

The use of LLMs in policing and surveillance activity has shown how predictive models can amplify the marginalisation of certain communities. The data selection process used to train these algorithms needs to be regulated and scrutinised so that we can avoid societal inequality (Jain, 2018).

Large Language Models in healthcare can be a beneficial means of assisting doctors, saving time and leading to prompt diagnoses and early intervention for patients. The transition to digital imaging in the medical industry has enabled the use of CNNs for pattern recognition and image classification. Exciting gains have also been made in predictive biomarker analysis. ML models have been used to identify predictive signatures from peripheral blood samples and tumour biopsy material, including analyses of whole-genome profiles (Hunter, Hindocha and Lee, 2022).

However, we cannot ignore the “robotic responses” conveyed by LLM-based systems. Although the accuracy of CNNs in the medical field is progressing at an amazing rate, helping with prognosis and diagnosis and assisting in tailoring patients’ treatment, the necessary bond between patients and their doctor remains paramount. In diagnosing patients, human doctors have a unique way of making behavioural observations and showing a sense of empathy with patients that no machine can (Cordero, 2023).

Collaborative AI, in an ideal world, would yield the dream team: LLMs can work on the time-consuming, tedious tasks that we as humans prefer to pass on to others, while we work on the projects that computers cannot. In turn, productivity is higher, which is conducive to an increase in the work generated jointly by humans and artificial intelligence (Slack, 2024).

References:
Cordero, D., Jr (2023) ‘The downsides of artificial intelligence in healthcare’, The Korean Journal of Pain, 37(1), p. 87.
Hunter, B., Hindocha, S. and Lee, R. (2022) ‘The role of artificial intelligence in early cancer diagnosis’, Cancers, 14. Available at: https://doi.org/10.3390/cancers14061524.
Jain, A. (2018) ‘Book review: Weapons of Math Destruction by Cathy O’Neill’, Economics of Networks eJournal [Preprint]. Available at: https://doi.org/10.2139/ssrn.3187660.
Slack (2024) Collaborative Intelligence: People and AI Working Smarter Together, Slack. Available at: https://slack.com/blog/collaboration/collaborative-intelligence-people-and-ai-working-smarter-together.

Unit 11: Model Selection and Evaluation

This unit built an understanding of model selection and evaluation, which is a hugely important procedure in the machine learning workflow. Selecting the correct model(s) is like selecting the correct tool(s) for a specific job, which in the case of machine learning is prediction or classification. Once selected, the model(s) must be evaluated on their performance. When we evaluate our model(s), we gain greater insight into what they predict well and what they do not, and this helps us move from a model that predicts our dataset with, say, 65% accuracy to one closer to 80% or 90%.
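
Accuracy alone can hide what a classifier gets wrong, so evaluation usually looks at several metrics derived from the confusion matrix. The following is a minimal plain-Python sketch with invented labels; the function name binary_metrics is hypothetical and the artefact notebook may rely on library functions instead.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"tp": tp, "tn": tn, "fp": fp, "fn": fn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value}")
```

Precision answers "of the cases predicted positive, how many were right?", while recall answers "of the truly positive cases, how many were found?"; which one matters more depends on the cost of false alarms versus missed cases.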

Artefact: Unit11_model_Performance_Measurement.ipynb

Unit 12: Industry 4.0 and Machine Learning

Discussing the fourth industrial revolution, the use of large language models in various industries and their impact on our lives was extremely interesting. Those discussions broadened my views and thinking about the impact of smart AI-powered technologies. On the one hand, they highlighted and confirmed real concerns regarding data privacy and the unemployment linked to the use of AI assistant tools. Digital forgery is also a major issue which will grow rapidly, leading to legal battles with authors and artists fighting for their copyright. On the other hand, progress in the medical field counterbalances the argument, offering invaluable help to practitioners and the hope of curing illnesses that are still incurable to this day. I am a huge believer in AI-powered robotics, and it is a field I really look forward to exploring. As we progressed through the course, I discovered many passionate researchers and writers in the AI community. Both technical and ethical researchers in this field really opened my eyes to how we could greatly benefit from a super artificial intelligence, but also to what is at stake in pursuing the development of an AGI. To name a few: Yann LeCun, Demis Hassabis, Cathy O’Neil, Joy Buolamwini, Geoffrey Hinton, Emily Bender and Meredith Broussard will shape the beginning of my journey exploring this fascinating field.