<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>#FeatureExtraction Archives - anteelo</title>
	<atom:link href="https://anteelo.com/tag/featureextraction/feed/" rel="self" type="application/rss+xml" />
	<link>https://anteelo.com/tag/featureextraction/</link>
	<description>Leading Digital Solution Firm</description>
	<lastBuildDate>Fri, 21 May 2021 16:50:02 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	<generator>https://wordpress.org/?v=6.9.4</generator>

<image>
	<url>https://anteelo.com/wp-content/uploads/2020/01/cantlogo.png</url>
	<title>#FeatureExtraction Archives - anteelo</title>
	<link>https://anteelo.com/tag/featureextraction/</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Feature Extraction and Transformation in ML</title>
		<link>https://anteelo.com/feature-extraction-and-transformation-in-ml/?utm_source=rss&#038;utm_medium=rss&#038;utm_campaign=feature-extraction-and-transformation-in-ml</link>
		
		<dc:creator><![CDATA[Anteelo Master]]></dc:creator>
		<pubDate>Wed, 03 Feb 2021 16:48:43 +0000</pubDate>
				<category><![CDATA[General]]></category>
		<category><![CDATA[#FeatureExtraction]]></category>
		<category><![CDATA[#FeatureSelection]]></category>
		<category><![CDATA[#machinelearning]]></category>
		<category><![CDATA[#ML]]></category>
		<guid isPermaLink="false">https://anteelo.com/?p=4081</guid>

					<description><![CDATA[<p>Features Any machine learning algorithm requires some training data. In training data we have values for all features for all historical records.  Consider this simple data set Height         Weight         Age            Class 165                 70   [&#8230;]</p>
<p>The post <a href="https://anteelo.com/feature-extraction-and-transformation-in-ml/">Feature Extraction and Transformation in ML</a> appeared first on <a href="https://anteelo.com">anteelo</a>.</p>
]]></description>
										<content:encoded><![CDATA[<div align="justify">
<h2>Features</h2>
<p>Any <a href="https://anteelo.com/">machine learning</a> algorithm requires training data, which contains a value for every feature of every historical record. Consider this simple data set:</p>
<table>
<thead><tr><th>Height</th><th>Weight</th><th>Age</th><th>Class</th></tr></thead>
<tbody>
<tr><td>165</td><td>70</td><td>22</td><td>Male</td></tr>
<tr><td>160</td><td>58</td><td>22</td><td>Female</td></tr>
</tbody>
</table>
<p>In this data set we have three features for each record (Height, Weight and Age).</p>
<p>Any algorithm takes all the features into account in order to learn and predict. However, not all the features of a data set are necessarily relevant.</p>
<p>Suppose a large training data set has 1000 features: using all of them may exhaust the system's memory and computational power. So we must choose the most relevant features and transform them into the input form the algorithm requires. After this process we may find that only 100 of the 1000 features actually contribute to the labels.</p>
<p><img decoding="async" class="aligncenter" src="https://miro.medium.com/max/693/1*zUATaXMAmKof27rPyBRWsg.png" alt="CNN application on structured data-Automated Feature Extraction | by Sourish Dey | Towards Data Science" /></p>
<p>We can prepare training data using two techniques:</p>
<ol>
<li>Feature Extraction</li>
<li>Feature Selection</li>
</ol>
<h2>Feature Extraction</h2>
<p>Feature extraction is the process of deriving important, non-redundant features from raw data. Suppose you have 5 text documents, and 10 words appear in all 5 of them. Because these words are common to every document, they contribute little to deciding the documents' labels. We can omit them and build new features from the remaining words.</p>
<h4>TF-IDF technique</h4>
<p>TF-IDF is a feature extraction technique based on how frequently terms occur in documents.</p>
<p>Suppose there is a set of documents D = (d1, d2, d3).</p>
<p>TF(t,d) is the term frequency: the number of times term t occurs in document d.</p>
<p>IDF(t,d,D) is the inverse document frequency of term t for document d in the set D, where N = |D| is the number of documents.</p>
<p><a href="http://blogs.quovantis.com/wp-content/uploads/2016/06/Screenshot-from-2016-06-23-154528.png"><img decoding="async" class="alignnone wp-image-5600 size-full" title="inverse document frequency" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAALEAAABGCAAAAACm1Sq4AAAGJUlEQVRo3u2Ye2xTVRzHf38Z2Vb8k8kwKo61GOPWrg9NHAXKXzVsQ/9Z5qawrYVOQyIIWzQhujGGf6gR1hkzSTbWbh0kShA2IMZAYOORKE73YDMGEVjf6/t52x7v47RdV2DQtaRN7vnj3nMf59xPT7+/1wGUaw1YYpaYJWaJWWKWmCVmiXOGOGgym/xUx2yxWv25QPw5AFSGEBp7huzszQXiwH+lnJUzCEVM0lYjkRM61hc3Qgt5dgmmcsTyNDWjsGYeIWOxPTeIIwq1bxt0I6StCeUGsUswjbQg8SKFOke8m36dE9mLYNTFn8oRYk1NGKGDUDdX7MgRYsV35GGWU/AlRZ4LxC7+NHkM18Oq9Mg488TTxU7qNAownWbiSGYC3vA7RT8GyY5Xtt6ZTmKvslnpygTx13yRuOw81RvcE0knseMDDpgzEj58Pq+H6YVQWlVBNGeGOHP5cUiVa8RBTOwwYDm7bX5kNGcxMUETB1sqSjeoKQs5ulG+7dOXV9zOamITIipllsAktwOhQ3BjXvXsLw1zWUxMWZ4GtGR3P+eqQyJxkPlWK5HlOg7VwwTZ7YdOl/hVJ8nfzTy792JWtNpFxOQae7bAGE2sQj3QNyXjuplnvktZ0caTVGFBu2CI7LaCGu35ZKe8xZfV3i2gAiO6nvcRQv5Kno1Qtdy/Y5onspjYsZ0Hjb3o+3yFTsn9FaHjQLY1dcbsJfYc02r6SBFPdzV1kT44IJSp+9V1nLpg9LVoLCGMd71PjU2vj6Z7/vAS+bEGDNTpY64V33CXMj1DfWHFJXzPZzE+Yk/KYbPbbXaHPcWELWI1m/y7i3k65rLx2BLEv+Uf8YQC1i1bo0p2ltEC8VdxxgPR2kIO8O1Dv2h+XfympFwoFNXfeODzJXLlyKkVUO4mnCo4y1RevUvVIOc38RXydc2G2Af4NLFdIoo7ELsSHlEcm/eC0mg0Dq/nXHzA07vSc49GDs5AM5VUw/HHJEYe/cikJS4eTGyTiOJC8IjXx/d4vKbhoSHtvfgERBWMUOeJApEnefpOGF5YCup1QzpNYt2qBQ151D0+8eI/MUYcX2Mj1MeS9OvSihqFomEgrlpbEcdCq30bTCZNR1RzFqSFZzeWKZqatt9MeKUZbqWDeF4swp5i+tjpAYgVx0OCUXcoHFloZVMgoxVP1MMA1uYE9jyRwd4XuNrfo28elE4GwgmDHad7x8U8J018dJlrLMaq6OC1SwtjxfE14UTSCDW9nUnKRRwl7oJahuoWvxhKBO34xX7pYiu8UiI/UA51EZr4w+UR/wXMLkkH/ITMRUVRGTd2Jw0IVTMyRmZOVBXdEM1lfO3wsw8ryrNh8abAbNH7ftTCpGEzecyvbEqNeFzNldDym1lZTSATpwr7Pg+/5wTdBueSZIzOAA9vXkXuR2UfrMYPqV+09jgzeABbXqiKUvAuWsYI9eSpKON4cmIH5Y97Nq/4ir5qgS7SXqAzaurl25vo1qCL2ftVkPnx9zuTZrOtlsUM2Fi4gxm8A1veLJC5uYuRMZnpVqx6152KKmhi5D4C1B5aQETl0fthDJeEQfmVENPiAw5hGV9+TpacAo5BKwphN2l7xZw4eBCaQ8gAdWFqnFcomQ2hlNeY9BWcTeQ8gc35duTbvHruLZwpad9enOUFqxmHO/Fayb+xe733ca+DfHigA1/s3Jc4dJByQb3Qrd5NXgzjH546sYT2bgfhj
vMLENyU44gdrGxP3Gi1/rO64JbBcLWTU2WKBd59sBG/vxP0p/i4XEC3BQMJIcaY10bMcuGi/NoCf5w6sZiOIETtuop9hwtfirnUiErQoTtxUodjnqVU8IZEUM4X1F4Oxue4IDyEe38+XyaNJQDIsHWreujEyYGoLM6VlMlHhGvfi9AL3rdsYlp+hGmeXEjLgsxlqq1BqWyMxjynnW6OxMAbj+nz+oXL6hvd06hU7ojFPKvBjXz6MEoHsV0ifmgpFSZCRMp7gdTgB91fNrFLyEvTlupjti7oXx4xupC/qe/e08KNDLcVvMcYQsMPT0pMfINN6O/DTZefFrH/s7YzWCsjfzwpcYZ28JdT/edQY4lZYpaYJWaJWeLU2v+lm5tGpYvo2wAAAABJRU5ErkJggg==" alt="inverse document frequency" /></a></p>
<p>TF(t,d) × IDF(t,d,D) is a measure of how important term t is in document d. Words for which this measure is very low can be omitted from the features.</p>
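<p>As a sketch, the TF × IDF weighting described above can be computed in a few lines of Python. This uses the plain log(N/DF) variant of IDF, and the corpus and tokenisation are made up for illustration; production libraries such as scikit-learn's <code>TfidfVectorizer</code> apply smoothing and normalisation on top of the same idea:</p>

```python
import math
from collections import Counter

def tf(term, doc):
    """Term frequency: how often `term` occurs in `doc` (a token list)."""
    return Counter(doc)[term]

def idf(term, docs):
    """Inverse document frequency: log(N / DF), where DF is the number
    of documents in which `term` appears."""
    df = sum(1 for d in docs if term in d)
    return math.log(len(docs) / df) if df else 0.0

def tf_idf(term, doc, docs):
    return tf(term, doc) * idf(term, docs)

# Tiny illustrative corpus (hypothetical).
docs = [
    "the cat sat on the mat".split(),
    "the dog chased the cat".split(),
    "machine learning extracts useful features".split(),
]

# "the" occurs in 2 of 3 documents, so its IDF is low;
# "features" occurs in only 1, so it gets a higher weight.
print(tf_idf("the", docs[0], docs))
print(tf_idf("features", docs[2], docs))
```

<p>Note how a term that appears in every document gets IDF = log(N/N) = 0, which is exactly the "omit words present in all documents" idea from the previous section.</p>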
<h2>Feature Selection</h2>
<p><img fetchpriority="high" decoding="async" class="aligncenter" src="https://miro.medium.com/max/2004/1*2zpn0kBjT-acdRZCOhpQ_g.png" alt="Feature Selection and Feature Extraction in Machine Learning: An Overview | by Mehul Ved | Medium" width="554" height="275" /></p>
<p>The <a href="https://www.instagram.com/anteelodesign/">feature selection</a> process tries to find the features that contribute most to deciding the label.</p>
<h4>Chi Square Test</h4>
<p>The Chi Square test checks a feature&#8217;s independence from the class/label, and then selects the &#8216;k&#8217; features that depend most on the class.</p>
<p>Example</p>
<p>Suppose some people live in four neighbourhoods A, B, C and D, and each person is labelled White Collar, Blue Collar or No Collar.</p>
<p>The Chi Square test may determine that the 90 people observed in neighbourhood A live there by chance, and not because they are White Collar. The Chi Square test is based on probabilities.</p>
<p>The Chi Square statistic for a feature is calculated by the formula below.</p>
<p><a href="http://blogs.quovantis.com/wp-content/uploads/2016/06/chiformula.png"><img decoding="async" class="alignnone wp-image-5603 size-full" title="chiformula" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAIkAAAA1CAIAAAD6Tl+QAAAHkElEQVR42u2aeVATVxzHVzSAFTEOCrWgVcqh0wELoV5YW7BigakFCtRiKcqh9UKtSlUORZkJKASmcokHVhBRQDlUHC4RPAgwcmg5BgiGISzJQJJJyCRMdicNVJRAbpMl4H7/YjLLvuPzfr/3++57gACVpgpApwBlgwplg7JBhbJB2aBC2aBC2UwvQVwuH2WjaYJvxMcnpRJCvJx2XWnjomw0CU2PY0QtRxg43Yk22M3ZVJSNBolzvaRHmM4gcortAoeZz2a4Oy8q/ikdFrNKOWBLTeWT+i768FTFyUDVQFUCPp8s0gGYUXnE7tuztewZntNYT0O+D8gDJ5CB6PXXjgfsj0i6W1VPLLn2l9eP+9JfsWGVrf6WrMjdm5djjWy3HzkVJtSpkEOB7l+bGDtdfQNNoCOA+/ODnMNq3oHgtl7Z7RddPQDP9FqgMczGMb1XdJi8jht+tvYhJf3vpgmmFfkuNd1TwVDhmqjYZai9Lo38HgUMZv/mk0sTV5aR0xzszjTzRv4m34/F53RwBXDffUIZbUaz8TB2KxKZcXiw4tDKxY4pnaJpjPHAA4v9qWBQVe1yG8PMMJaRr3hj0SEUuyYu9gVb7OP0uy5LfcpZwhy3DJiF0dHRwQCAyb6nQzOZDVPPLkNk8Q3VhJhjVhyrmThqZqnPQi1rQoeqDEp3Eg6zZHfVKAl6Zdy118J6mE9rIzEl5E3wspW+d8XH5D2HXy8MaR4fIPSHv2ABs7BG7qS5TMYBWvYZ/SpqmJbrPBf4IiAm8WJ8VLDr+j3lTFlxVrtX3yhSrWwkGtopCpviZYmU8T88DzYBFvlVsCaWS2D2D3OBVedej6Qgfnv06tX4tg8p3ViP/Q0xuAtEUndHQ/F5zx13+kXCRUwLMAlvqOWlJjYyDO3UiJG/Ml3EIzzw0Ae+jJk47/zuVHtt3W9TuiStK4gCUW5swWoBYqRlsC1n0mYTbo6xeLu5sxuyRi0Ln1yS08SW2FUw+XPART1sZBlaBWtQIvHo8tGRG63d6jpJLlsdv1lra2W5whg75+0MYexTSHwxxZJxQu/4H+pPmM4xj2jiiUx8f77Pp3rr8U0cAUylPkmOPLo3tpb9QZtNsp322GbzLoSfx0aVDQorZvEtQJ1Rhjp+aoobGYZWUdQw7WGgiXDOtTdceC0tCuEh8vOMcO+vFmAswxomPQh36QU+54xf0U1nV8+3imzmvrd5xAvOprjgoj5h5xmVZ86cKy0L0FkSTfoAryPcbPT0t+WPq/n44P0j7hHEodEmxLbArvKdZ/a3OvcbyYZWiXcxq49ZCunMsgqtYcl6mN9XfPLnPyf7E+5Ci0RRd8PrvBW8xSEw8VFjR+uLvLgDnp5HM5pZbx+B+XxWuddsY3wnX+leh/6xzWI2MN8+6MSo6Tx+MMDb0VxXd21C+/D/TYhrASadX2F4pEF9bKQZWqU01HAON5K0lu8byQYyxOsquPlyEsSjK3BJ5In/C7F7myoLc/NLa99M+BbAYBQ6ASuTXpaJeZWq9kAxLUBd8TYWYc08dbGRbmiVFK/tosMnI9vOjrw+SJkXULLc1kQ08uR9HASv4LBOwcGhRSCsJjZiWuDWHsd55VHlaFG5WkCGoVVafHKmO1ZIR9/1KkmZshbqueXnfFqBLAtz6Wz1+gCRFpgC5tOTTgG5FLmWnjQ2byoyklNz6gbGd55L7WFCahwK1F+48zMhHZ2Ncf8qVZzzuh5m19JhgQYKpj2jPct61C1vYEtjk1vW2Yi3
mWd1uok7Nu68mItVAzLZQL0FvQUh290kyCf0EU3yO2B65WHzESthHU5kCz5mAbKy1z03rPFo2oLAMkJM/hu5Ug2dSCdejgwPFavwc+mN0r9qsOtO28wW4jE9UD4oXwS0tLT4Tn/5+/srtN8wircbLAmspNRdirr+agipJcNrid+kCwDL9lfLFzogCP4z/ZWZmalYLcAs8TGYh9tJqB6UO4kPtxJaCWv1APHS0t90qVtGXoTAwiDbzfiXbDSnSa2dOgjWgNG+Z4rEDMSBOP2kznax6iT1cyAZVuf81jV7CkBIgO430rw/MQWfdN5Bz+QwkYNMj/i9d3bZuSa84kz53Chw2Iw8G27rrehUIgOi5jrrLztWh8AHZ5hVE+mwLvgRVWNCRpHjZgTZPIyNK6aMmpuBe9vmL/ItYQjgYZ4aZ224O+NXnHtqm2wDwO8pud3AQgCNgsfNCLGBwLvt3PefU1I8rDcEncUnlKtrF4DpVSc32oc8ll1ywKyGhB3+t/sQMJgKHzcjWwsgVDS3p7nbeqd3ybBP3N4XN8PdzHQWuucPIOH9FT5unmlsIIhWctBC22C95+9+YuTr4+nmsuW7DbhVxmNl+WK/UgYSPZNx3Dzz2fAplPsR/jvkl++B5GZEqjhpx80quGgwXXKaJkrqcfNHktM0VRKPm1mquGiAslGqYmTWpdWlST5uplaq4qLB9GeDQFpXAt6HXjRA40Z9Uv9FA81mA1OfIJPWlZD6LxpoNJuRu13IpHXlVo7aLxpodNzwWcikdY2VBu83jEJk0jrKRpmkjkxaR9loblpH2aBC2aBsUCGg/wA61Z//ALNPLQAAAABJRU5ErkJggg==" alt="chiformula" /></a></p>
<p>where the O values are observed counts and the E values are expected counts.</p>
<p>In the above example, expected values for neighbourhood A for each class can be calculated as</p>
<p>E(A,White Collar) = (150/650) * 349</p>
<p>E(A,Blue Collar) = (150/650) * 151</p>
<p>E(A,No Collar) = (150/650) * 150</p>
<p>After the Chi Square statistic has been calculated, the value closest to it is looked up in a Chi Square distribution table, in the row for the appropriate degrees of freedom.</p>
<p>In the example above, the Chi Square statistic for neighbourhood A is approximately 2 (rounded off), and the degrees of freedom are (number of rows &#8211; 1) × (number of columns &#8211; 1) = (4 &#8211; 1) × (3 &#8211; 1) = 6.</p>
<p>In the row for 6 degrees of freedom, the table value closest to 2 is 2.20, which corresponds to p = 0.9.</p>
<p>This means there is a 90% chance that the deviation between expected and observed counts is due to chance alone.</p>
<p>So we can conclude that the labelling does not really depend on neighbourhood A.</p>
<p>On the other hand, if the p value comes out at 0.01, there is only a 1% probability that the deviation is due to chance; most of it is due to other factors. In that case we cannot ignore neighbourhood A in our predictive modelling, and it has to be selected.</p>
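<p>The calculation above can be sketched in Python using only the standard library. Only the marginal totals (150 people in neighbourhood A, 650 people overall, and class totals of 349, 151 and 150) come from the worked example; the individual cell counts below are hypothetical and chosen merely to be consistent with those totals:</p>

```python
# Hypothetical counts (rows: neighbourhoods A-D; columns:
# White Collar, Blue Collar, No Collar). Only the marginal totals
# match the worked example; the cells are made up for illustration.
observed = [
    [90, 30, 30],   # A  (total 150)
    [100, 40, 40],  # B
    [80, 41, 40],   # C
    [79, 40, 40],   # D
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# E(i, j) = (row total / grand total) * column total,
# exactly as in E(A, White Collar) = (150/650) * 349 above.
chi_sq = sum(
    (observed[i][j] - row_totals[i] * col_totals[j] / grand) ** 2
    / (row_totals[i] * col_totals[j] / grand)
    for i in range(len(observed))
    for j in range(len(col_totals))
)

dof = (len(observed) - 1) * (len(col_totals) - 1)
print(f"chi-square = {chi_sq:.2f}, degrees of freedom = {dof}")
```

<p>The statistic and degrees of freedom are then looked up in a Chi Square table (or converted to a p-value with, for example, <code>scipy.stats.chi2</code>) to decide whether the feature should be kept.</p>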
</div>
<p>The post <a href="https://anteelo.com/feature-extraction-and-transformation-in-ml/">Feature Extraction and Transformation in ML</a> appeared first on <a href="https://anteelo.com">anteelo</a>.</p>
]]></content:encoded>
					
		
		
			</item>
	</channel>
</rss>
